Title: | Statistical Tools for Ranks |
---|---|
Description: | Account for uncertainty when working with ranks. Estimate standard errors consistently in linear regression with ranked variables. Construct confidence sets of various kinds for positions of populations in a ranking based on values of a certain feature and their estimation errors. Theory based on Mogstad, Romano, Shaikh, and Wilhelm (2023) <doi:10.1093/restud/rdad006> and Chetverikov and Wilhelm (2023) <arXiv:2310.15512>. |
Authors: | Daniel Wilhelm [aut, cre], Pawel Morgen [aut] |
Maintainer: | Daniel Wilhelm <[email protected]> |
License: | GPL (>= 3) |
Version: | 1.2.2.9001 |
Built: | 2024-11-08 05:29:31 UTC |
Source: | https://github.com/danielwilhelm/r-cs-ranks |
Marginal and simultaneous confidence sets for ranks.
csranks( x, Sigma, coverage = 0.95, cstype = "two-sided", stepdown = TRUE, R = 1000, simul = TRUE, indices = NA, na.rm = FALSE, seed = NA )
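x |
vector of estimates containing estimated features by which the populations are to be ranked. |
Sigma |
estimated covariance matrix of x. |
coverage |
nominal coverage of the confidence set. Default is 0.95. |
cstype |
type of confidence set. Default is "two-sided". |
stepdown |
logical; if TRUE (default), the stepwise procedure is used. |
R |
number of bootstrap replications. Default is 1000. |
simul |
logical; if TRUE (default), simultaneous confidence sets are computed; if FALSE, marginal confidence sets are computed. |
indices |
vector of indices of x indicating the populations whose ranks are of interest. Default is NA. |
na.rm |
logical; if TRUE, NA values are removed before the computation. Default is FALSE. |
seed |
seed for bootstrap random variable draws. Default is NA. |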
A csranks object, which is a list with three items:
L: Lower bounds of the confidence sets for ranks indicated in indices.
rank: Estimated ranks from irank with default parameters.
U: Upper bounds of the confidence sets.
Suppose $j = 1, \ldots, p$ populations (e.g., schools, hospitals, political parties, countries) are to be ranked according to some measure $\theta = (\theta_1, \ldots, \theta_p)$. We do not observe the true values $\theta_1, \ldots, \theta_p$. Instead, for each population, we have data from which we have estimated these measures, $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_p)$. The values $\hat{\theta}_1, \ldots, \hat{\theta}_p$ are estimates of the true values $\theta_1, \ldots, \theta_p$ and thus contain statistical uncertainty. In consequence, a ranking of the populations by the values $\hat{\theta}_1, \ldots, \hat{\theta}_p$ contains statistical uncertainty and is not necessarily equal to the true ranking of $\theta_1, \ldots, \theta_p$.
The function computes confidence sets for the rank of one, several or all of the populations (indices indicates which of the populations are of interest). x is a vector containing the estimates $\hat{\theta}_1, \ldots, \hat{\theta}_p$ and Sigma is an estimate of the covariance matrix of x. The method assumes that the estimates are asymptotically normal and the sample sizes of the datasets are large enough so that $\hat{\theta} - \theta$ is approximately distributed as $N(0, \Sigma)$. The argument Sigma should contain an estimate of the covariance matrix $\Sigma$. For instance, if for each population $j$, $\sqrt{n_j}(\hat{\theta}_j - \theta_j) \to_d N(0, \sigma_j^2)$ and the datasets for each population are drawn independently of each other, then Sigma is a diagonal matrix $\mathrm{diag}(\hat{\sigma}_1^2 / n_1, \ldots, \hat{\sigma}_p^2 / n_p)$ containing estimates of the asymptotic variances divided by the sample size. More generally, the estimates in x may be dependent, but then Sigma must be an estimate of its covariance matrix including off-diagonal terms.
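As a minimal sketch of the independent-samples case described above, the following base R code builds the inputs from raw data. The three samples, their names, and the use of sample means as the ranked feature are assumptions made purely for illustration; no function from this package is called.

```r
set.seed(42)

# Hypothetical data: three populations sampled independently,
# each with its own sample size.
samples <- list(
  a = rnorm(200, mean = 1.0, sd = 1.0),
  b = rnorm(150, mean = 1.2, sd = 2.0),
  c = rnorm(300, mean = 0.8, sd = 1.5)
)

# Estimated feature for each population (here: the sample mean).
thetahat <- sapply(samples, mean)

# Estimated asymptotic variance of each mean divided by its sample size.
var_over_n <- sapply(samples, function(s) var(s) / length(s))

# Independence across populations implies a diagonal covariance matrix.
Sigmahat <- diag(var_over_n)

print(thetahat)
print(Sigmahat)
```

The resulting thetahat and Sigmahat have the shapes expected for the x and Sigma arguments. With dependent estimates, the off-diagonal entries would have to be estimated as well.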
Marginal confidence sets (simul=FALSE) are such that the confidence set for a population contains the true rank of that population with probability approximately equal to the nominal coverage level. Simultaneous confidence sets (simul=TRUE), on the other hand, are such that the confidence sets for the populations indicated in indices cover the true ranks of all of these populations simultaneously with probability approximately equal to the nominal coverage level. For instance, in the PISA example below, a marginal confidence set for a country $j$ covers the true rank of country $j$ with probability approximately equal to 0.95. A simultaneous confidence set for all countries covers the true ranks of all countries simultaneously with probability approximately equal to 0.95.
The function implements the procedures developed and described in more detail in Mogstad, Romano, Shaikh, and Wilhelm (2023). The procedure is based on testing a large family of hypotheses for pairwise comparisons. Stepwise methods can be used to improve the power of the procedure by, potentially, rejecting more hypotheses without violating the desired coverage property of the resulting confidence set. These are employed when stepdown=TRUE. From a practical point of view, stepdown=TRUE is computationally more demanding, but often results in tighter confidence sets.
The procedure uses a parametric bootstrap procedure based on the above approximate multivariate normal distribution.
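To make the idea of pairwise comparisons with a parametric bootstrap more concrete, here is a simplified single-step sketch in base R. It is an illustration under stated assumptions, not the package's implementation: the estimates, the covariance matrix, the max-type statistic, and the way bounds are formed from rejected comparisons are all choices made for this example, and no stepdown refinement is attempted.

```r
set.seed(1)

p <- 5
thetahat <- c(1.0, 1.1, 2.0, 3.0, 3.05)  # hypothetical estimates
Sigmahat <- diag(rep(0.01, p))           # hypothetical covariance estimate
R <- 1000                                # number of bootstrap draws
coverage <- 0.95

# Standard errors of all pairwise differences thetahat[j] - thetahat[k].
v <- diag(Sigmahat)
se_diff <- sqrt(outer(v, v, "+") - 2 * Sigmahat)

# Draw R times from N(0, Sigmahat) using the Cholesky factor.
Z <- matrix(rnorm(R * p), R, p) %*% chol(Sigmahat)

# Bootstrap distribution of the largest studentized pairwise difference.
max_stat <- apply(Z, 1, function(z) {
  d <- abs(outer(z, z, "-")) / se_diff
  max(d[upper.tri(d)])
})
crit <- quantile(max_stat, coverage)

# Studentized differences; entry [j, k] compares population j with k.
tstat <- outer(thetahat, thetahat, "-") / se_diff
diag(tstat) <- 0

# Rank 1 goes to the largest value. A population's lower bound counts
# the populations that are significantly larger, its upper bound the
# ones that are significantly smaller.
L <- 1 + rowSums(tstat < -crit)
U <- p - rowSums(tstat > crit)
est_rank <- rank(-thetahat)

print(cbind(L = L, rank = est_rank, U = U))
```

Populations whose estimates are close relative to their standard errors (the first two and the last two here) cannot be separated, so their bounds span more than one rank.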
Mogstad, Romano, Shaikh, and Wilhelm (2023), "Inference for Ranks with Applications to Mobility across Neighborhoods and Academic Achievements across Countries", forthcoming at Review of Economic Studies, cemmap working paper, doi:10.1093/restud/rdad006
# simple simulated example:
n <- 100
p <- 10
X <- matrix(rep(1:p, n) / p, ncol = p, byrow = TRUE) + matrix(rnorm(n * p), n, p)
thetahat <- colMeans(X)
Sigmahat <- cov(X) / n
csranks(thetahat, Sigmahat)
# PISA example:
data(pisa2018)
math_score <- pisa2018$math_score
math_se <- pisa2018$math_se
math_cov_mat <- diag(math_se^2)
# marginal confidence set for each country:
csranks(math_score, math_cov_mat, simul = FALSE)
# simultaneous confidence set for all countries:
csranks(math_score, math_cov_mat, simul = TRUE)
Marginal and simultaneous confidence sets for ranks of categories, where categories are ranked by the probabilities of being chosen.
csranks_multinom( x, coverage = 0.95, cstype = "two-sided", simul = TRUE, multcorr = "Holm", indices = NA, na.rm = FALSE )
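x |
vector of counts indicating how often each category was chosen. |
coverage |
nominal coverage of the confidence set. Default is 0.95. |
cstype |
type of confidence set. Default is "two-sided". |
simul |
logical; if TRUE (default), simultaneous confidence sets are computed; if FALSE, marginal confidence sets are computed. |
multcorr |
multiplicity correction to be used. Default is "Holm". |
indices |
vector of indices of x indicating the categories whose ranks are of interest. Default is NA. |
na.rm |
logical; if TRUE, NA values are removed before the computation. Default is FALSE. |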
A csranks object, which is a list with three items:
L: Lower bounds of the confidence sets for ranks indicated in indices.
rank: Estimated ranks from irank with default parameters.
U: Upper bounds of the confidence sets.
This function computes confidence sets for ranks similarly to csranks, but it is tailored to the special case of multinomial data. Suppose there are $p$ populations (for the case of multinomial data, we will refer to them as "categories"), such as political parties, that one wants to rank by the probabilities of them being chosen. For political parties, this would correspond to the share of votes each party obtains. Here, the underlying data are multinomial: each observation corresponds to a choice among the $p$ categories. The vector x contains the counts of how often each category was chosen in the data.
In this setting, csranks could be applied to compute confidence sets for the ranks of each category, but instead this function implements a different method proposed by Bazylik, Mogstad, Romano, Shaikh, and Wilhelm (2023), which exploits the multinomial structure of the problem and yields confidence sets for the ranks that are valid in finite samples (whereas csranks produces confidence sets that are valid only asymptotically).
The procedure involves testing multiple hypotheses. The multcorr argument indicates the method used for multiplicity correction. See the paper for details.
Bazylik, Mogstad, Romano, Shaikh, and Wilhelm (2023), "Finite- and large-sample inference for ranks using multinomial data with an application to ranking political parties", cemmap working paper
x <- c(rmultinom(1, 1000, 1:10))
csranks_multinom(x)
Computation of confidence sets for the identities of populations among the tau best.
cstaubest( x, Sigma, tau = 2, coverage = 0.95, stepdown = TRUE, R = 1000, na.rm = FALSE, seed = NA )
cstauworst( x, Sigma, tau = 2, coverage = 0.95, stepdown = TRUE, R = 1000, na.rm = FALSE, seed = NA )
x |
vector of estimates containing estimated features by which the populations are to be ranked. |
Sigma |
estimated covariance matrix of x. |
tau |
the confidence set contains indicators for the elements in x whose rank is less than or equal to tau. |
coverage |
nominal coverage of the confidence set. Default is 0.95. |
stepdown |
logical; if TRUE (default), the stepwise procedure is used. |
R |
number of bootstrap replications. Default is 1000. |
na.rm |
logical; if TRUE, NA values are removed before the computation. Default is FALSE. |
seed |
seed for bootstrap random variable draws. Default is NA. |
logical vector indicating which of the elements of x are in the confidence set for the tau-best.
cstauworst(): Confidence sets for the tau-worst. Equivalent to calling cstaubest with -x.
The function computes a confidence set containing indicators for the elements in x whose rank is less than or equal to tau with probability approximately equal to the nominal coverage (coverage).
The function implements the projection confidence set for the tau-best developed and described in more detail in Mogstad, Romano, Shaikh, and Wilhelm (2023).
Mogstad, Romano, Shaikh, and Wilhelm (2023), "Inference for Ranks with Applications to Mobility across Neighborhoods and Academic Achievements across Countries", forthcoming at Review of Economic Studies cemmap working paper, doi:10.1093/restud/rdad006
# simple simulated example:
n <- 100
p <- 10
X <- matrix(rep(1:p, n) / p, ncol = p, byrow = TRUE) + matrix(rnorm(n * p), n, p)
thetahat <- colMeans(X)
Sigmahat <- cov(X) / n
# confidence set for the populations that may be among the top-3
# (with probability approximately 0.95):
cstaubest(thetahat, Sigmahat, tau = 3)
# confidence set for the populations that may be among the bottom-3
# (with probability approximately 0.95):
cstauworst(thetahat, Sigmahat, tau = 3)
Compute integer or fractional ranks with flexible handling of ties.
irank(x, omega = 0, increasing = FALSE, na.rm = FALSE)
frank(x, omega = 0, increasing = FALSE, na.rm = FALSE)
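x |
vector of values to be ranked |
omega |
numeric value in [0,1], defining how ties in x (if any) are handled. Default is 0. |
increasing |
logical; if FALSE (default), larger values in x receive smaller ranks; if TRUE, larger values receive larger ranks. |
na.rm |
logical; if TRUE, NA values are removed from x. Default is FALSE. |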
irank implements all possible definitions of ranks of the values in x. Different definitions of the ranks are chosen through combinations of the two arguments omega and increasing. Suppose x is of length $p$. If increasing=TRUE, then the largest value in x receives the rank $p$ and the smallest the rank $1$. If increasing=FALSE, then the largest value in x receives the rank $1$ and the smallest the rank $p$.
The value of omega indicates how ties are handled. If there are no ties in x, then the value of omega does not affect the ranks and the only choice to be made is whether the ranks should be increasing or decreasing with the values in x. When there are ties in x, however, there are infinitely many possible ranks that can be assigned to a tied value. When increasing=TRUE, omega=0 leads to the smallest possible and omega=1 to the largest possible rank of a tied value. Values of omega between 0 and 1 lead to values of the rank between the smallest and the largest.
frank takes the ranking returned by irank and divides the result by length(x). The result is a ranking with ranks in the interval [0,1]. An important special case occurs for increasing=TRUE and omega=1: in this case, the rank of the value x[j] is equal to the empirical cdf of x evaluated at x[j].
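The tie-handling rules above can be mimicked with base R's rank(), which may help build intuition. This sketch does not call irank or frank. It assumes that intermediate values of omega interpolate linearly between the smallest and largest possible rank, which is one natural reading of the description rather than a statement about the package internals.

```r
x <- c(3, 4, 7, 7, 10, 11, 15, 15, 15, 15)

# Ranks increasing in x: smallest and largest rank a tied value can take.
r_min <- rank(x, ties.method = "min")  # analogue of omega = 0
r_max <- rank(x, ties.method = "max")  # analogue of omega = 1

# Assumed linear interpolation for an intermediate omega.
omega <- 0.5
r_mid <- omega * r_max + (1 - omega) * r_min

print(rbind(r_min, r_mid, r_max))

# With omega = 0.5 the interpolation reproduces the usual mid-ranks.
stopifnot(isTRUE(all.equal(r_mid, rank(x, ties.method = "average"))))

# Dividing the largest possible rank by length(x) gives the empirical cdf.
frac_rank <- r_max / length(x)
stopifnot(isTRUE(all.equal(frac_rank, ecdf(x)(x))))
```

The last check mirrors the special case mentioned above: increasing ranks with ties resolved upwards, scaled by the vector length, coincide with the empirical cdf evaluated at each value.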
Numeric vector of the same length as x containing the integer (for irank) or fractional (for frank) ranks.
# simple example without ties:
x <- c(3,8,-4,10,2)
irank(x, increasing=TRUE)
irank(x, increasing=FALSE)
# since there are no ties, the value of omega has no impact:
irank(x, increasing=TRUE, omega=0)
irank(x, increasing=TRUE, omega=0.5)
irank(x, increasing=TRUE, omega=1)
# simple example with ties:
x <- c(3,4,7,7,10,11,15,15,15,15)
irank(x, increasing=TRUE, omega=0) # smallest possible ranks
irank(x, increasing=TRUE, omega=0.5) # mid-ranks
irank(x, increasing=TRUE, omega=1) # largest possible ranks
# simple example of fractional ranks without ties:
x <- c(3,8,-4,10,2)
frank(x, increasing=TRUE)
frank(x, increasing=FALSE)
irank ranks the values of a vector against that same vector. irank_against returns the integer ranks that the values from x would assume if (individually) inserted into v. frank_against acts analogously, returning fractional ranks.
irank_against(x, v, omega = 0, increasing = FALSE, na.rm = FALSE)
frank_against(x, v, omega = 0, increasing = FALSE, na.rm = FALSE)
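x |
numeric query vector. |
v |
numeric reference vector. |
omega |
numeric value in [0,1], defining how ties in x (if any) are handled. Default is 0. |
increasing |
logical; if FALSE (default), larger values receive smaller ranks; if TRUE, larger values receive larger ranks. |
na.rm |
logical; if TRUE, NA values are removed before the computation. Default is FALSE. |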
It is useful to think of frank_against(x,v) as a generalization of the empirical cumulative distribution function (ECDF), created for v and evaluated at the points in x.
frank_against(x,v,increasing=TRUE,omega=1) is identical to ecdf(v)(x).
Setting increasing=FALSE switches the inequality sign in the ECDF definition from $\leq$ to $\geq$.
omega=0 introduces the strict inequality ($<$ instead of $\leq$).
Any omega in between is a weighted average of the cases omega=1 and omega=0.
Finally, irank_against is equal to frank_against multiplied by length(v).
This particular choice of default parameters was made for compatibility with the default parameters of irank and frank. irank(x) is always equal to irank_against(x,x) and frank(x) is always equal to frank_against(x,x).
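The ECDF analogy can be written out by hand in base R. The sketch below covers only the omega=1 case and does not call frank_against. The helper vectors ecdf_leq and ecdf_geq are names introduced here for illustration.

```r
v <- c(4, 4, 4, 3, 1, 10, 7, 7)  # reference vector
x <- 1:10                        # query points

# ECDF of v evaluated at x, written out: share of v[i] <= x[j].
ecdf_leq <- sapply(x, function(xj) mean(v <= xj))

# Mirrored version with the inequality reversed: share of v[i] >= x[j].
ecdf_geq <- sapply(x, function(xj) mean(v >= xj))

# The hand-written version agrees with stats::ecdf.
stopifnot(isTRUE(all.equal(ecdf_leq, ecdf(v)(x))))

print(rbind(x = x, leq = ecdf_leq, geq = ecdf_geq))
```

According to the description above, ecdf_leq corresponds to the increasing=TRUE, omega=1 case, and ecdf_geq to the same setting with the inequality reversed. Multiplying either by length(v) gives the integer-valued counterpart.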
Numeric vector of the same length as x containing the integer (for irank_against) or fractional (for frank_against) ranks.
irank_against(1:10, c(4,4,4,3,1,10,7,7))
Estimation and inference for regressions involving ranks, i.e. regressions in which the dependent and/or the independent variable has been transformed into ranks before running the regression.
lmranks( formula, data, subset, weights, na.action = stats::na.fail, method = "qr", model = TRUE, x = FALSE, qr = TRUE, y = FALSE, singular.ok = TRUE, contrasts = NULL, offset = offset, omega = 1, ... )
## S3 method for class 'lmranks'
plot(x, which = 1, ...)
## S3 method for class 'lmranks'
proj(object, onedf = FALSE, ...)
## S3 method for class 'lmranks'
predict(object, newdata, ...)
## S3 method for class 'lmranks'
summary(object, correlation = FALSE, symbolic.cor = FALSE, ...)
## S3 method for class 'lmranks'
vcov(object, complete = TRUE, ...)
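formula |
An object of class "formula": a symbolic description of the model to be fitted. Variables to be ranked can be marked with r(). |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. |
subset |
currently not supported. |
weights |
currently not supported. |
na.action |
currently not supported. User is expected to handle NA values prior to the use of this function. |
method |
the method to be used; for fitting, currently only method = "qr" is supported. |
model, y, qr |
logicals. If TRUE the corresponding components of the fit (the model frame, the response, the QR decomposition) are returned. |
x |
for lmranks: logical; if TRUE, the model matrix is returned. For the plot method: an lmranks object. |
singular.ok |
logical. If FALSE, a singular fit is an error. |
contrasts |
an optional list. See the contrasts.arg of model.matrix.default. |
offset |
this can be used to specify an a priori known component to be included in the linear predictor during fitting. This should be NULL or a numeric vector or matrix of extents matching those of the response. |
omega |
real number in the interval [0,1] defining how ties are handled (if there are any). The value of omega is passed to frank for the computation of ranks. The default is 1, so that ranks are computed through the empirical cdf. |
... |
additional arguments; their meaning depends on the function or method called. |
which |
As in plot.lm. Currently only which = 1 is available. |
object |
An lmranks object. |
onedf |
A logical flag. If TRUE, a projection is returned for all the columns of the model matrix. If FALSE, the single-column projections are collapsed by terms of the model. |
newdata |
An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
correlation |
logical; if TRUE, the correlation matrix of the estimated parameters is returned and printed. |
symbolic.cor |
logical. If TRUE, print the correlations in a symbolic form (see symnum) rather than as numbers. |
complete |
logical indicating if the full variance-covariance matrix should be returned also in case of an over-determined system where some coefficients are undefined and coef(.) contains NAs correspondingly. |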
This function performs estimation and inference for regressions involving ranks. Suppose there is a dependent variable $Y_i$ and independent variables $X_i$ and $W_i$, where $X_i$ is a scalar and $W_i$ a vector (possibly including a constant). Instead of running a linear regression of $Y_i$ on $X_i$ and $W_i$, we want to first transform $Y_i$ and/or $X_i$ into ranks. Denote by $R_i^Y$ the rank of $Y_i$ and $R_i^X$ the rank of $X_i$. Then, a rank-rank regression, $R_i^Y = \rho R_i^X + W_i'\beta + \varepsilon_i$, is run using the formula r(Y)~r(X)+W. Similarly, a regression of the raw dependent variable on the ranked regressor, $Y_i = \rho R_i^X + W_i'\beta + \varepsilon_i$, can be implemented by the formula Y~r(X)+W, and a regression of the ranked dependent variable on the raw regressors, $R_i^Y = W_i'\beta + \varepsilon_i$, can be implemented by the formula r(Y)~W.
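A small base R sketch may help connect the rank-rank regression to a familiar quantity. With a constant as the only additional regressor and no ties, the OLS coefficient on the ranked regressor equals Spearman's rank correlation. The simulated data are hypothetical, and lm is used only for the point estimate; its standard errors are not valid for ranks.

```r
set.seed(1)
n <- 500
X <- rnorm(n)
Y <- X + rnorm(n)

# Ranks computed through the empirical cdf (values in (0, 1]).
RX <- ecdf(X)(X)
RY <- ecdf(Y)(Y)

# Point estimate of the rank-rank slope from ordinary least squares.
slope <- unname(coef(lm(RY ~ RX))["RX"])

# Spearman's rank correlation of the raw variables.
spearman <- cor(X, Y, method = "spearman")

print(c(slope = slope, spearman = spearman))
stopifnot(isTRUE(all.equal(slope, spearman)))
```

The equality holds because, without ties, both rank vectors have the same variance, so the regression slope reduces to the correlation of the ranks.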
The function works, in many ways, just like lm for linear regressions. Apart from some smaller details, there are two important differences: first, in lmranks, the mark r() can be used in formulas to indicate variables to be ranked before running the regression and, second, subsequent use of summary produces a summary table with the correct standard errors, t-values and p-values (while those of lm are not correct for regressions involving ranks). See Chetverikov and Wilhelm (2023) for more details.
Many other aspects of the function are similar to lm. For instance, . in a formula means 'all columns not otherwise in the formula' just as in lm. An intercept is included by default. In a model specified as r(Y)~r(X)+., both r(X) and X will be included in the model, just as would happen in lm with, say, log() instead of r(). One can exclude X with a -, i.e. r(Y)~r(X)+.-X. See formula for more about model specification.
r() is a private alias for frank. The increasing argument, provided at the level of the individual regressor, specifies whether the ranks should increase or decrease as regressor values increase. The omega argument of frank, provided at the level of the lmranks function, specifies how ties in variables are to be handled. For more details, see frank. By default, increasing is set to TRUE and omega is set equal to 1, which means r() computes ranks by transforming a variable through its empirical cdf.
Many functions defined for lm also work correctly with lmranks. These include coef, model.frame, model.matrix, resid, update and others. On the other hand, some would return incorrect results if they treated lmranks output in the same way as lm's. The central contributions of this package are the vcov, summary and confint implementations using the correct asymptotic theory for regressions involving ranks.
See the lm documentation for more.
An object of class lmranks, inheriting (as much as possible) from class lm.
Additionally, it has an omega entry, corresponding to the omega argument, a ranked_response logical entry, and a rank_terms_indices entry: an integer vector with the indices of those entries of the term.labels attribute of terms(formula) that correspond to ranked regressors.
plot(lmranks): Plot diagnostics for an lmranks object. Displays plots useful for assessing the quality of the model fit. Currently, only one plot is available, which plots fitted values against residuals (to check for homoscedasticity).
proj(lmranks): Projections of the data onto terms of rank-rank regression model
predict(lmranks): Predict method for Linear Model for Ranks Fits
summary(lmranks): Summarizing fits of rank-rank regressions
vcov(lmranks): Calculate Variance-Covariance Matrix for a Fitted lmranks object. Returns the variance-covariance matrix of the regression coefficients (main parameters) of a fitted lmranks object. Its result is theoretically valid and asymptotically consistent, in contrast to naively running vcov(lm(...)).
Sometimes, the data is divided into clusters (groups) and one is interested in running rank-rank regressions separately within each cluster, where the ranks are not computed within each cluster, but using all observations pooled across all clusters. Specifically, let $G_i = 1, \ldots, n_G$ denote a variable that indicates the cluster to which the i-th observation belongs. Then, the regression model of interest is $R_i^Y = \sum_{g=1}^{n_G} 1\{G_i = g\} (\rho_g R_i^X + W_i'\beta_g) + \varepsilon_i$, where $\rho_g$ and $\beta_g$ are now cluster-specific coefficients, but the ranks $R_i^Y$ and $R_i^X$ are computed as ranks among all observations $Y_i$ and $X_i$, respectively. That means the rank of an observation is not computed among the other observations in the same cluster, but rather among all available observations across all clusters.
This type of regression is implemented in the lmranks function using interaction notation: r(Y)~(r(X)+W):G. Here, the variable G must be a factor.
Since the theory for clustered regression mixing grouped and ungrouped (in)dependent variables is not yet developed, such a model will raise an error. Also, by default the function includes a cluster-specific intercept, i.e. r(Y)~(r(X)+W):G is internally interpreted as r(Y)~(r(X)+W):G+G-1.
contrasts of G must be of contr.treatment kind, which is the default.
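The interaction formula with a cluster-specific intercept can be illustrated with plain lm on pre-computed ranks. This is a hedged sketch with simulated data and hypothetical variable names: it reproduces only the point estimates (lm's standard errors are not valid here), and it shows that a single regression with G-specific intercepts and slopes yields the same coefficients as separate regressions within each cluster, as long as the ranks are computed once on the pooled sample.

```r
set.seed(2)
n <- 400
G <- factor(rep(c("A", "B"), each = n / 2))
X <- rnorm(n)
Y <- ifelse(G == "A", 0.8, 0.2) * X + rnorm(n)

# Ranks are computed once, pooling all observations across clusters.
RX <- ecdf(X)(X)
RY <- ecdf(Y)(Y)

# One regression with cluster-specific intercepts and slopes.
pooled_fit <- lm(RY ~ RX:G + G - 1)

# The same coefficients from separate regressions within each cluster.
fit_A <- lm(RY ~ RX, subset = G == "A")
fit_B <- lm(RY ~ RX, subset = G == "B")
separate <- c(coef(fit_A)[1], coef(fit_B)[1], coef(fit_A)[2], coef(fit_B)[2])

print(coef(pooled_fit))
stopifnot(isTRUE(all.equal(unname(coef(pooled_fit)), unname(separate))))
```

The coefficient vector of the pooled fit lists the two intercepts first, followed by the two slopes, which is why the separate estimates are arranged in that order before comparing.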
As a consequence of the order in which model.frame applies operations, subset and na.action would be applied after evaluation of r(). That would drop some rank values from the final model frame, and the returned coefficients and standard errors could no longer be correct. The user must handle NA values and filter the data on their own prior to usage in lmranks.
If r() is wrapped in another function (like log(r(x))), the mark will not be recognized correctly (because it will not be caught in terms(formula, specials = "r")). The ranks will be calculated correctly, but their transformation will later be treated in lm as a regular regressor. This means that the corresponding regression coefficient will be calculated correctly, but the standard errors, statistics etc. will not.
r, .r_predict and .r_cache are special expressions, used internally to interpret the r mark correctly. Do not use them in formula.
A number of methods defined for lm do not yield theoretically correct results when applied to lmranks objects; errors or warnings are raised in those instances. Also, the df.residual component is set to NA, since the notion of degrees of freedom for the rank models is not theoretically established (at time of 1.2 release).
Chetverikov and Wilhelm (2023), "Inference for Rank-Rank Regressions". arXiv preprint arXiv:2310.15512
lm for details about other arguments; frank.
Generic functions coef, effects, residuals, fitted, model.frame, model.matrix, update.
# rank-rank regression:
X <- rnorm(500)
Y <- X + rnorm(500)
rrfit <- lmranks(r(Y) ~ r(X))
summary(rrfit)
# naive version of the rank-rank regression:
RY <- frank(Y, increasing=TRUE, omega=1)
RX <- frank(X, increasing=TRUE, omega=1)
fit <- lm(RY ~ RX)
summary(fit)
# the coefficient estimates are the same as in the lmranks function, but
# the standard errors, t-values, p-values are incorrect
# support of `data` argument:
data(mtcars)
lmranks(r(mpg) ~ r(hp) + ., data = mtcars)
# Same as above, but use the `hp` variable only through its rank
lmranks(r(mpg) ~ r(hp) + . - hp, data = mtcars)
# rank-rank regression with clusters:
G <- factor(rep(LETTERS[1:4], each=nrow(mtcars) / 4))
lmr <- lmranks(r(mpg) ~ r(hp):G, data = mtcars)
summary(lmr)
model.matrix(lmr)
# Include all columns of mtcars as usual covariates:
lmranks(r(mpg) ~ (r(hp) + .):G, data = mtcars)
An artificial dataset containing income of children and their parents together with some information about them.
parent_child_income
A data frame with 3894 rows and 4 variables:
Family income of a child
Family income of parent
Gender
Race: hisp (Hispanic), black or neither
New code should use data(pisa2018) instead.
Dataset containing average scores on math, reading, and science together with standard errors for all OECD countries. These are from the 2018 Program for International Student Assessment (PISA) study by the Organization for Economic Cooperation and Development (OECD). The average scores are over all 15-year-old students in the study.
pisa
A data frame with 37 rows and 7 variables:
country, from which data was collected
average score in math
standard error for the average score in math
average score in reading
standard error for the average score in reading
average score in science
standard error for the average score in science
https://www.oecd.org/en/about/programmes/pisa/pisa-data.html
Datasets containing average scores on math, reading, and science together with standard errors for all OECD countries. These are from the 2018 and 2022 editions of Program for International Student Assessment (PISA) study by the Organization for Economic Cooperation and Development (OECD). The average scores are over all 15-year-old students in the study.
pisa2018 pisa2022
country, from which data was collected
average score in math
standard error for the average score in math
average score in reading
standard error for the average score in reading
average score in science
standard error for the average score in science
An object of class data.frame with 38 rows and 7 columns.
https://www.oecd.org/en/about/programmes/pisa/pisa-data.html
Display ranks together with their confidence set bounds.
## S3 method for class 'csranks'
plot(x, ...)
plotranking( ranks, L, U, popnames = NULL, title = NULL, subtitle = NULL, caption = NULL, colorbins = 1, horizontal = TRUE )
x |
An object of class csranks. |
... |
Other arguments, passed to plotranking. |
ranks |
vector of ranks |
L |
vector of lower bounds of confidence sets for the ranks |
U |
vector of upper bounds of confidence sets for the ranks |
popnames |
vector containing names of the populations whose ranks are in ranks. |
title |
character string containing the main title of the graph. |
subtitle |
character string containing the subtitle of the graph. |
caption |
character string containing the caption of the graph. |
colorbins |
integer indicating the number of quantile bins into which populations are grouped and color-coded. Value has to lie between 1 (default) and the number of populations. |
horizontal |
logical. Should the bars be displayed horizontally or vertically? |
A ggplot plot displaying confidence sets.
plot(csranks): Plot csranks output
x <- seq(1, 3, length = 10)
V <- diag(rep(0.04, 10))
CS <- csranks(x, V)
grid::current.viewport()
plot(CS)
# Equivalent:
plotranking(CS$rank, CS$L, CS$U)
# plotranking returns a ggplot object. It can be customized further:
library(ggplot2)
pl <- plot(CS)
pl + xlab("position in ranking") + ylab("population label") + theme_gray()
# horizontal = FALSE uses ggplot2::coord_flip underneath. The x and y axes swap places.
pl <- plot(CS, horizontal = FALSE)
pl + xlab("position in ranking") + # Note that xlab refers to the vertical axis now
  ylab("population label") + theme_gray()