Package 'csranks'

Title: Statistical Tools for Ranks
Description: Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023)<doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <arXiv:2310.15512>.
Authors: Daniel Wilhelm [aut, cre], Pawel Morgen [aut]
Maintainer: Daniel Wilhelm <[email protected]>
License: GPL (>= 3)
Version: 1.2.2.9001
Built: 2024-11-08 05:29:31 UTC
Source: https://github.com/danielwilhelm/r-cs-ranks

Help Index


Confidence sets for ranks

Description

Marginal and simultaneous confidence sets for ranks.

Usage

csranks(
  x,
  Sigma,
  coverage = 0.95,
  cstype = "two-sided",
  stepdown = TRUE,
  R = 1000,
  simul = TRUE,
  indices = NA,
  na.rm = FALSE,
  seed = NA
)

Arguments

x

vector of estimates containing estimated features by which the populations are to be ranked.

Sigma

estimated covariance matrix of x.

coverage

nominal coverage of the confidence set. Default is 0.95.

cstype

type of confidence set (two-sided, upper, lower). Default is two-sided.

stepdown

logical; if TRUE (default), stepwise procedure is used, otherwise single step procedure is used. See Details section for more.

R

number of bootstrap replications. Default is 1000.

simul

logical; if TRUE (default), then simultaneous confidence sets are computed, which jointly cover all populations indicated by indices. Otherwise, for each population indicated in indices a marginal confidence set is computed.

indices

vector of indices of x for whose ranks the confidence sets are computed. indices=NA (default) means computation of confidence sets for all populations.

na.rm

logical; if TRUE, then NA's are removed from x and Sigma (if any).

seed

seed for bootstrap random variable draws. If set to NA (default), then seed is not set.

Value

A csranks object, which is a list with three items:

L

Lower bounds of the confidence sets for ranks indicated in indices

rank

Estimated ranks from irank with default parameters

U

Upper bounds of the confidence sets.

Details

Suppose j=1,,pj=1,\ldots,p populations (e.g., schools, hospitals, political parties, countries) are to be ranked according to some measure θ=(θ1,,θp)\theta=(\theta_1,\ldots,\theta_p). We do not observe the true values θ1,,θp\theta_1,\ldots,\theta_p. Instead, for each population, we have data from which we have estimated these measures, θ^=(θ^1,,θ^p)\hat{\theta}=(\hat{\theta}_1,\ldots,\hat{\theta}_p). The values θ^1,,θ^p\hat{\theta}_1,\ldots,\hat{\theta}_p are estimates of the true values θ1,,θp\theta_1,\ldots,\theta_p and thus contain statistical uncertainty. In consequence, a ranking of the populations by the values θ^1,,θ^p\hat{\theta}_1,\ldots,\hat{\theta}_p contains statistical uncertainty and is not necessarily equal to the true ranking of θ1,,θp\theta_1,\ldots,\theta_p.

The function computes confidence sets for the rank of one, several or all of the populations (indices indicates which of the 1,,p1,\ldots,p populations are of interest). x is a vector containing the estimates θ^1,,θ^p\hat{\theta}_1,\ldots,\hat{\theta}_p and Sigma is an estimate of the covariance matrix of x. The method assumes that the estimates are asymptotically normal and the sample sizes of the datasets are large enough so that θ^θ\hat{\theta}-\theta is approximately distributed as N(0,Σ)N(0,\Sigma). The argument Sigma should contain an estimate of the covariance matrix Σ\Sigma. For instance, if for each population jj

nj(θ^jθj)dN(0,σj2)\sqrt{n_j} (\hat{\theta}_j-\theta_j) \to_d N(0, \sigma_j^2)

and the datasets for each population are drawn independently of each other, then Sigma is a diagonal matrix

diag(σ^12/n1,,σ^p2/np)diag(\hat{\sigma}_1^2/n_1,\ldots,\hat{\sigma}_p^2/n_p)

containing estimates of the asymptotic variances divided by the sample size. More generally, the estimates in x may be dependent, but then Sigma must be an estimate of its covariance matrix including off-diagonal terms.

Marginal confidence sets (simul=FALSE) are such that the confidence set for a population jj contains the true rank of that population jj with probability approximately equal to the nominal coverage level. Simultaneous confidence sets (simul=TRUE) on the other hand are such that the confidence sets for populations indicated in indices cover the true ranks of all of these populations simultaneously with probability approximately equal to the nominal coverage level. For instance, in the PISA example below, a marginal confidence set of a country jj covers the true rank of country jj with probability approximately equal to 0.95. A simultaneous confidence set for all countries covers the true ranks of all countries simultaneously with probability approximately equal to 0.95.

The function implements the procedures developed and described in more detail in Mogstad, Romano, Shaikh, and Wilhelm (2023). The procedure is based on on testing a large family of hypotheses for pairwise comparisons. Stepwise methods can be used to improve the power of the procedure by, potentially, rejecting more hypotheses without violating the desired coverage property of the resulting confidence set. These are employed when stepdown=TRUE. From a practical point of view, stepdown=TRUE is computationally more demanding, but often results in tighter confidence sets.

The procedure uses a parametric bootstrap procedure based on the above approximate multivariate normal distribution.

References

Mogstad, Romano, Shaikh, and Wilhelm (2023), "Inference for Ranks with Applications to Mobility across Neighborhoods and Academic Achievements across Countries", forthcoming at Review of Economic Studies cemmap working paper doi:10.1093/restud/rdad006

Examples

# simple simulated example:
n <- 100
p <- 10
X <- matrix(rep(1:p,n)/p, ncol=p, byrow=TRUE) + matrix(rnorm(n*p), 100, 10)
thetahat <- colMeans(X)
Sigmahat <- cov(X) / n
csranks(thetahat, Sigmahat)

# PISA example:
data(pisa2018)
math_score <- pisa2018$math_score
math_se <- pisa2018$math_se
math_cov_mat <- diag(math_se^2)

# marginal confidence set for each country:
csranks(math_score, math_cov_mat, simul=FALSE)

# simultaneous confidence set for all countries:
csranks(math_score, math_cov_mat, simul=TRUE)

Confidence sets for ranks based on multinomial data

Description

Marginal and simultaneous confidence sets for ranks of categories, where categories are ranked by the probabilities of being chosen.

Usage

csranks_multinom(
  x,
  coverage = 0.95,
  cstype = "two-sided",
  simul = TRUE,
  multcorr = "Holm",
  indices = NA,
  na.rm = FALSE
)

Arguments

x

vector of counts indicating how often each category was chosen.

coverage

nominal coverage of the confidence set. Default is 0.95.

cstype

type of confidence set (two-sided, upper, lower). Default is two-sided.

simul

logical; if TRUE (default), then simultaneous confidence sets are computed, which jointly cover all populations indicated by indices. Otherwise, for each population indicated in indices a marginal confidence set is computed.

multcorr

multiplicity correction to be used: Holm (default) or Bonferroni. See Details section for more.

indices

vector of indices of x for whose ranks the confidence sets are computed. indices=NA (default) means computation of confidence sets for all populations.

na.rm

logical; if TRUE, then NA's are removed from x and Sigma (if any).

Value

A csranks object, which is a list with three items:

L

Lower bounds of the confidence sets for ranks indicated in indices

rank

Estimated ranks from irank with default parameters

U

Upper bounds of the confidence sets.

Details

This function computes confidence sets for ranks similarly as csranks, but it is tailored to the special case of multinomial data. Suppose there are pp populations (for the case of multinomial data, we will refer to them as "categories") such as political parties, for example, that one wants to rank by the probabilities of them being chosen. For political parties, this would correspond to the share of votes each party obtains. Here, the underlying data are multinomial: each observation corresponds to a choice among the pp categories. The vector x contains the counts of how often each category was chosen in the data.

In this setting, link{csranks} could be applied to compute confidence sets for the ranks of each category, but instead this function implements a different method proposed by Bazylik, Mogstad, Romano, Shaikh, and Wilhelm (2023), which exploits the multinomial structure of the problem and yields confidence sets for the ranks that are valid in finite samples (whereas csranks produces confidence sets that are valid only asymptotically).

The procedure involves testing multiple hypotheses. The ⁠\code{multcorr}⁠ indicates a method for multiplicity correction. See the paper for details.

References

Bazylik, Mogstad, Romano, Shaikh, and Wilhelm. "Finite-and large-sample inference for ranks using multinomial data with an application to ranking political parties". cemmap working paper

Examples

x <- c(rmultinom(1, 1000, 1:10))
csranks_multinom(x)

Confidence sets for the tau-best

Description

Computation of confidence sets for the identities of populations among the tau best.

Usage

cstaubest(
  x,
  Sigma,
  tau = 2,
  coverage = 0.95,
  stepdown = TRUE,
  R = 1000,
  na.rm = FALSE,
  seed = NA
)

cstauworst(
  x,
  Sigma,
  tau = 2,
  coverage = 0.95,
  stepdown = TRUE,
  R = 1000,
  na.rm = FALSE,
  seed = NA
)

Arguments

x

vector of estimates containing estimated features by which the populations are to be ranked.

Sigma

estimated covariance matrix of x.

tau

the confidence set contains indicators for the elements in x whose rank is less than or equal to tau.

coverage

nominal coverage of the confidence set. Default is 0.95.

stepdown

logical; if TRUE (default), stepwise procedure is used, otherwise single step procedure is used. See Details section for more.

R

number of bootstrap replications. Default is 1000.

na.rm

logical; if TRUE, then NA's are removed from x and Sigma (if any).

seed

seed for bootstrap random variable draws. If set to NA (default), then seed is not set.

Value

logical vector indicating which of the elements of x are in the confidence set for the tau-best.

Functions

  • cstauworst(): Confidence sets for the tau-worst

    Equivalent to calling cstaubest with -x.

Details

The function computes a confidence set containing indicators for the elements in x whose rank is less than or equal to tau with probability approximately equal to the nominal coverage (coverage).

The function implements the projection confidence set for the tau-best developed and described in more detail in Mogstad, Romano, Shaikh, and Wilhelm (2023).

References

Mogstad, Romano, Shaikh, and Wilhelm (2023), "Inference for Ranks with Applications to Mobility across Neighborhoods and Academic Achievements across Countries", forthcoming at Review of Economic Studies cemmap working paper, doi:10.1093/restud/rdad006

Examples

# simple simulated example:
n <- 100
p <- 10
X <- matrix(rep(1:p,n)/p, ncol=p, byrow=TRUE) + matrix(rnorm(n*p), 100, 10)
thetahat <- colMeans(X)
Sigmahat <- cov(X) / n

# confidence set for the populations that may be among the top-3 
# (with probability approximately 0.95):
cstaubest(thetahat, Sigmahat, tau=3)

# confidence set for the populations that may be among the bottom-3 
# (with probability approximately 0.95):
cstauworst(thetahat, Sigmahat, tau=3)

Compute ranks

Description

Compute integer of fractional ranks with flexible handling of ties.

Usage

irank(x, omega = 0, increasing = FALSE, na.rm = FALSE)

frank(x, omega = 0, increasing = FALSE, na.rm = FALSE)

Arguments

x

vector of values to be ranked

omega

numeric value in [0,1], defining how ties in x (if any) are handled; default is 0. See Details.

increasing

logical; if FALSE (default), then large elements in x receive a small rank. Otherwise, large elements in x receive a large rank.

na.rm

logical; if TRUE, then NA's are removed from x. Default: FALSE.

Details

irank implements all possible definitions of ranks of the values in x. Different definitions of the ranks are chosen through combinations of the two arguments omega and increasing. Suppose x is of length pp. If increasing=TRUE, then the largest value in x receives the rank pp and the smallest the rank 11. If increasing=FALSE, then the largest value in x receives the rank 11 and the smallest the rank pp.

The value of omega indicates how ties are handled. If there are no ties in x, then the value of omega does not affect the ranks and the only choice to be made is whether the ranks should be increasing or decreasing with the values in x. When there are ties in x, however, then there are infinitely many possible ranks that can be assigned to a tied value.

When increasing=TRUE, then omega=0 leads to the smallest possible and omega=1 to the largest possible rank of a tied value. Values of omega between 0 and 1 lead to values of the rank between the largest and smallest.

frank takes the ranking returned by irank and divides the result by length(x). The result is a ranking with ranks in the interval [0,1]. An important special case occurs for increasing=TRUE and omega=1: in this case, the rank of the value x[j] is equal to the empirical cdf of x evaluated at x[j].

Value

Numeric vector of the same length as x containing the integer (for irank) or fractional (for frank) ranks.

Examples

# simple example without ties:
x <- c(3,8,-4,10,2)
irank(x, increasing=TRUE)
irank(x, increasing=FALSE)

# since there are no ties, the value of omega has no impact:
irank(x, increasing=TRUE, omega=0)
irank(x, increasing=TRUE, omega=0.5)
irank(x, increasing=TRUE, omega=1)

# simple example with ties:
x <- c(3,4,7,7,10,11,15,15,15,15)
irank(x, increasing=TRUE, omega=0) # smallest possible ranks
irank(x, increasing=TRUE, omega=0.5) # mid-ranks
irank(x, increasing=TRUE, omega=1) # largest possible ranks

# simple example of fractional ranks without ties:
x <- c(3,8,-4,10,2)
frank(x, increasing=TRUE)
frank(x, increasing=FALSE)

Compute integer ranks in another reference vector

Description

The method irank compares ranks using the same vector as reference. irank_against returns integer ranks, that values from x would assume if (individually) inserted into v. frank_against acts analogously, returning fractional ranks.

Usage

irank_against(x, v, omega = 0, increasing = FALSE, na.rm = FALSE)

frank_against(x, v, omega = 0, increasing = FALSE, na.rm = FALSE)

Arguments

x

numeric query vector.

v

numeric reference vector.

omega

numeric value in [0,1], defining how ties in x (if any) are handled; default is 0. See Details.

increasing

logical; if FALSE (default), then large elements in x receive a small rank. Otherwise, large elements in x receive a large rank.

na.rm

logical; if TRUE, then NA's are removed from x. Default: FALSE.

Details

It's useful to think about frank_against(x,v) as a generalization of Empirical Cumulative Distribution Function, created for v and evaluated for points in x. frank_agaist(x,v,increasing=TRUE,omega=1) is identical to ecdf(v)(x).

increasing switches the inequality sign in ECDF definition from FV(t)=P^(V<=t)F_V(t) = \hat P(V <= t) to P^(V>=t)\hat P(V >= t).

omega=0 introduces the strict inequality (P^(V<t)\hat P(V < t) instead of P^(V<=t)\hat P(V <= t)). Any omega in between is a weighted average of the cases omega=1 and omega=0.

Finally, irank_against is equal to frank_against multiplied by the length(v).

This particular choice of default parameters was made for compatibility with default parameters of irank and frank. irank(x) is always equal to irank_against(x,x) and frank(x) is always equal to frank_against(x,x).

Value

Numeric vector of the same length as x containing the integer (for irank_against) or fractional (for frank_against) ranks.

See Also

irank(), ecdf()

Examples

irank_against(1:10, c(4,4,4,3,1,10,7,7))

Regressions Involving Ranks

Description

Estimation and inference for regressions involving ranks, i.e. regressions in which the dependent and/or the independent variable has been transformed into ranks before running the regression.

Usage

lmranks(
  formula,
  data,
  subset,
  weights,
  na.action = stats::na.fail,
  method = "qr",
  model = TRUE,
  x = FALSE,
  qr = TRUE,
  y = FALSE,
  singular.ok = TRUE,
  contrasts = NULL,
  offset = offset,
  omega = 1,
  ...
)

## S3 method for class 'lmranks'
plot(x, which = 1, ...)

## S3 method for class 'lmranks'
proj(object, onedf = FALSE, ...)

## S3 method for class 'lmranks'
predict(object, newdata, ...)

## S3 method for class 'lmranks'
summary(object, correlation = FALSE, symbolic.cor = FALSE, ...)

## S3 method for class 'lmranks'
vcov(object, complete = TRUE, ...)

Arguments

formula

An object of class "formula": a symbolic description of the model to be fitted. Exactly like the formula for linear model except that variables to be ranked can be indicated by r(). See Details and Examples below.

data

an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called.

subset

currently not supported.

weights

currently not supported.

na.action

currently not supported. User is expected to handle NA values prior to the use of this function.

method

the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below).

model, y, qr

logicals. If TRUE the corresponding components of the fit (the model frame, the response, the QR decomposition) are returned.

x
  • For lmranks: Logical. Should model matrix be returned?

  • For plot method: An lmranks object.

singular.ok

logical. If FALSE (the default in S but not in R) a singular fit is an error.

contrasts

an optional list. See the contrasts.arg of model.matrix.default.

offset

this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. One or more offset terms can be included in the formula instead or as well, and if more than one are specified their sum is used. See model.offset.

omega

real number in the interval [0,1] defining how ties are handled (if there are any). The value of omega is passed to frank for computation of ranks. The default is 1 so that the rank of a realized value is defined as the empirical cdf evaluated at that realized value. See Details below.

...

For lm(): additional arguments to be passed to the low level regression fitting functions (see below).

which

As in plot.lm. Currently only no.1 is available.

object

A lmranks object.

onedf

A logical flag. If TRUE, a projection is returned for all the columns of the model matrix. If FALSE, the single-column projections are collapsed by terms of the model (as represented in the analysis of variance table).

newdata

An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used.

correlation

logical; if TRUE, the correlation matrix of the estimated parameters is returned and printed.

symbolic.cor

logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers.

complete

logical indicating if the full variance-covariance matrix should be returned also in case of an over-determined system where some coefficients are undefined and coef(.) contains NAs correspondingly. When complete = TRUE, vcov() is compatible with coef() also in this singular case.

Details

This function performs estimation and inference for regressions involving ranks. Suppose there is a dependent variable YiY_i and independent variables XiX_i and WiW_i, where XiX_i is a scalar and WiW_i a vector (possibly including a constant). Instead of running a linear regression of YiY_i on XiX_i and WiW_i, we want to first transform YiY_i and/or XiX_i into ranks. Denote by RiYR_i^Y the rank of YiY_i and RiXR_i^X the rank of XiX_i. Then, a rank-rank regression,

RiY=ρRiX+Wiβ+εi,R_i^Y = \rho R_i^X + W_i'\beta + \varepsilon_i,

is run using the formula r(Y)~r(X)+W. Similarly, a regression of the raw dependent variable on the ranked regressor,

Yi=ρRiX+Wiβ+εi,Y_i = \rho R_i^X + W_i'\beta + \varepsilon_i,

can be implemented by the formula Y~r(X)+W, and a regression of the ranked dependent variable on the raw regressors,

RiY=Wiβ+εi,R^Y_i = W_i'\beta + \varepsilon_i,

can be implemented by the formula r(Y)~W.

The function works, in many ways, just like lm for linear regressions. Apart from some smaller details, there are two important differences: first, in lmranks, the mark r() can be used in formulas to indicate variables to be ranked before running the regression and, second, subsequent use of summary produces a summary table with the correct standard errors, t-values and p-values (while those of the lm are not correct for regressions involving ranks). See Chetverikov and Wilhelm (2023) for more details.

Many other aspects of the function are similar to lm. For instance, . in a formula means 'all columns not otherwise in the formula' just as in lm. An intercept is included by default. In a model specified as r(Y)~r(X)+., both r(X) and X will be included in the model - as it would have been in lm and, say, log() instead of r(). One can exclude X with a -, i.e. r(Y)~r(X)+.-X. See formula for more about model specification.

The r() is a private alias for frank. The increasing argument, provided at individual regressor level, specifies whether the ranks should increase or decrease as regressor values increase. The omega argument of frank, provided at lmranks function level, specifies how ties in variables are to be handled and can be supplied as argument in lmranks. For more details, see frank. By default increasing is set to TRUE and omega is set equal to 1, which means r() computes ranks by transforming a variable through its empirical cdf.

Many functions defined for lm also work correctly with lmranks. These include coef, model.frame, model.matrix, resid, update and others. On the other hand, some would return incorrect results if they treated lmranks output in the same way as lm's. The central contribution of this package are vcov, summary and confint implementations using the correct asymptotic theory for regressions involving ranks.

See the lm documentation for more.

Value

An object of class lmranks, inheriting (as much as possible) from class lm.

Additionally, it has an omega entry, corresponding to the omega argument, a ranked_response logical entry, and a rank_terms_indices - an integer vector with indices of entries of terms.labels attribute of terms(formula), which correspond to ranked regressors.

Methods (by generic)

  • plot(lmranks): Plot diagnostics for an lmranks object

    Displays plots useful for assessing quality of model fit. Currently, only one plot is available, which plots fitted values against residuals (for homoscedacity check).

  • proj(lmranks): Projections of the data onto terms of rank-rank regression model

  • predict(lmranks): Predict method for Linear Model for Ranks Fits

  • summary(lmranks): Summarizing fits of rank-rank regressions

  • vcov(lmranks): Calculate Variance-Covariance Matrix for a Fitted lmranks object

    Returns the variance-covariance matrix of the regression coefficients (main parameters) of a fitted lmranks object. Its result is theoretically valid and asymptotically consistent, in contrast to naively running vcov(lm(...)).

Rank-rank regressions with clusters

Sometimes, the data is divided into clusters (groups) and one is interested in running rank-rank regressions separately within each cluster, where the ranks are not computed within each cluster, but using all observations pooled across all clusters. Specifically, let Gi=1,,nGG_i=1,\ldots,n_G denote a variable that indicates the cluster to which the i-th observation belongs. Then, the regression model of interest is

RiY=g=1nG1{Gi=g}(ρgRiX+Wiβg)+εi,R_i^Y = \sum_{g=1}^{n_G} 1\{G_i=g\}(\rho_g R_i^X + W_i'\beta_g) + \varepsilon_i,

where ρg\rho_g and βg\beta_g are now cluster-specific coefficients, but the ranks RiYR_i^Y and RiXR_i^X are computed as ranks among all observations YiY_i and XiX_i, respectively. That means the rank of an observation is not computed among the other observations in the same cluster, but rather among all available observations across all clusters.

This type of regression is implemented in the lmranks function using interaction notation: r(Y)~(r(X)+W):G. Here, the variable G must be a factor.

Since the theory for clustered regression mixing grouped and ungrouped (in)dependent variables is not yet developed, such a model will raise an error. Also, by default the function includes a cluster-specific intercept, i.e. r(Y)~(r(X)+W):G is internally interpreted as r(Y)~(r(X)+W):G+G-1.

contrasts of G must be of contr.treatment kind, which is the default.

Warning

As a consequence of the order, in which model.frame applies operations, subset and na.action would be applied after evaluation of r(). That would drop some rank values from the final model frame and returned coefficients and standard errors could no longer be correct. The user must handle NA values and filter the data on their own prior to usage in lmranks.

Wrapping r() with other functions (like log(r(x))) will not recognize correctly the mark (because it will not be caught in terms(formula, specials = "r")). The ranks will be calculated correctly, but their transformation will be treated later in lm as a regular regressor. This means that the corresponding regression coefficient will be calculated correctly, but the standard errors, statistics etc. will not.

r, .r_predict and .r_cache are special expressions, used internally to interpret r mark correctly. Do not use them in formula.

A number of methods defined for lm do not yield theoretically correct results when applied to lmranks objects; errors or warnings are raised in those instances. Also, the df.residual component is set to NA, since the notion of effects of freedom for the rank models is not theoretically established (at time of 1.2 release).

References

Chetverikov and Wilhelm (2023), "Inference for Rank-Rank Regressions". arXiv preprint arXiv:2310.15512

See Also

lm for details about other arguments; frank.

Generic functions coef, effects, residuals, fitted, model.frame, model.matrix, update .

Examples

# rank-rank regression:
X <- rnorm(500)
Y <- X + rnorm(500)
rrfit <- lmranks(r(Y) ~ r(X))
summary(rrfit)

# naive version of the rank-rank regression:
RY <- frank(Y, increasing=TRUE, omega=1)
RX <- frank(X, increasing=TRUE, omega=1)
fit <- lm(RY ~ RX)
summary(fit)
# the coefficient estimates are the same as in the lmranks function, but
# the standard errors, t-values, p-values are incorrect

# support of `data` argument:
data(mtcars)
lmranks(r(mpg) ~ r(hp) + ., data = mtcars)
# Same as above, but use the `hp` variable only through its rank
lmranks(r(mpg) ~ r(hp) + . - hp, data = mtcars)

# rank-rank regression with clusters:
G <- factor(rep(LETTERS[1:4], each=nrow(mtcars) / 4))
lmr <- lmranks(r(mpg) ~ r(hp):G, data = mtcars)
summary(lmr)
model.matrix(lmr)
# Include all columns of mtcars as usual covariates:
lmranks(r(mpg) ~ (r(hp) + .):G, data = mtcars)

Income of parents and children

Description

An artificial dataset containing income of children and their parents together with some information about them.

Usage

parent_child_income

Format

A data frame with 3894 rows and 4 variables:

c_faminc

Family income of a child

p_faminc

Family income of parent

gender

Gender

race

Race: hisp (Hispanic), black or neither


Cross-country comparison of students' achievement

Description

[Deprecated]

New code should use data(pisa2018) instead.

Dataset containing average scores on math, reading, and science together with standard errors for all OECD countries. These are from the 2018 Program for International Student Assessment (PISA) study by the Organization for Economic Cooperation and Development (OECD). The average scores are over all 15-year-old students in the study.

Usage

pisa

Format

A data frame with 37 rows and 7 variables:

jurisdiction

country, from which data was collected

math_score

average score in math

math_se

standard error for the average score in math

reading_score

average score in reading

reading_se

standard error for the average score in reading

science_score

average score in science

science_se

standard error for the average score in science

Source

https://www.oecd.org/en/about/programmes/pisa/pisa-data.html


Cross-country comparison of students' achievement

Description

Datasets containing average scores on math, reading, and science together with standard errors for all OECD countries. These are from the 2018 and 2022 editions of Program for International Student Assessment (PISA) study by the Organization for Economic Cooperation and Development (OECD). The average scores are over all 15-year-old students in the study.

Usage

pisa2018

pisa2022

Format

jurisdiction

country, from which data was collected

math_score

average score in math

math_se

standard error for the average score in math

reading_score

average score in reading

reading_se

standard error for the average score in reading

science_score

average score in science

science_se

standard error for the average score in science

An object of class data.frame with 38 rows and 7 columns.

Source

https://www.oecd.org/en/about/programmes/pisa/pisa-data.html


Plot ranking with confidence sets

Description

Display ranks together with their confidence set bounds.

Usage

## S3 method for class 'csranks'
plot(x, ...)

plotranking(
  ranks,
  L,
  U,
  popnames = NULL,
  title = NULL,
  subtitle = NULL,
  caption = NULL,
  colorbins = 1,
  horizontal = TRUE
)

Arguments

x

An csranks object, likely produced by csranks.

...

Other arguments, passed to plotranking.

ranks

vector of ranks

L

vector of lower bounds of confidence sets for the ranks

U

vector of lower bounds of confidence sets for the ranks

popnames

vector containing names of the populations whose ranks are in ranks. If popnames=NULL (default), then populations are automatically numbered.

title

character string containing the main title of the graph. title=NULL (default) means no title.

subtitle

character string containing the subtitle of the graph. subtitle=NULL (default) means no subtitle.

caption

character string containing the caption of the graph. caption=NULL (default) means no caption.

colorbins

integer indicating the number of quantile bins into which populations are grouped and color-coded. Value has to lie between 1 (default) and the number of populations.

horizontal

logical. Should be the bars displayed horizontally, or vertically?

Value

A ggplot plot displaying confidence sets.

Functions

  • plot(csranks): Plot csranks output

Examples

x <- seq(1, 3, length = 10)
V <- diag(rep(0.04, 10))
CS <- csranks(x, V)
grid::current.viewport()
plot(CS)
# Equivalent: 
plotranking(CS$rank, CS$L, CS$U)

# plotranking returns a ggplot object. It can be customized further:
library(ggplot2)
pl <- plot(CS)
pl + xlab("position in ranking") + ylab("population label") + theme_gray()

# horizontal = FALSE uses ggplot2::coord_flip underneath. The x and y axes swap places.
pl <- plot(CS, horizontal = FALSE)
pl + xlab("position in ranking") + # Note, that xlab refers to vertical axis now
  ylab("population label") + theme_gray()