validate.lrm {rms}  R Documentation 
The validate function, when used on an object created by lrm or orm, does resampling validation of a logistic regression model, with or without backward step-down variable deletion. It provides bias-corrected Somers' D_{xy} rank correlation, R-squared index, the intercept and slope of an overall logistic calibration equation, the maximum absolute difference in predicted and calibrated probabilities E_{max}, the discrimination index D ((model L.R. chi-square - 1)/n), the unreliability index U (the difference in -2 log likelihood between uncalibrated X beta and X beta with overall intercept and slope calibrated to the test sample, divided by n), the overall quality index (logarithmic probability score) Q = D - U, the Brier or quadratic probability score B (the last 3 are not computed for ordinal models), the g-index, and gp, the g-index on the probability scale. The corrected slope can be thought of as a shrinkage factor that takes overfitting into account. For orm fits, a subset of the above indexes is provided, Spearman's ρ is substituted for D_{xy}, and a new index is reported: pdm, the mean absolute difference between 0.5 and the predicted probability that Y ≥ the marginal median of Y.
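The shrinkage interpretation of the corrected slope can be illustrated with a minimal base R sketch of the optimism bootstrap. This is not the rms implementation: it uses glm in place of lrm, simulated data, and a hypothetical helper cal_slope, all of which are assumptions made for illustration only.

```r
# Illustrative sketch only: optimism bootstrap for the calibration slope,
# using base R glm() rather than rms::lrm(). Data and names are hypothetical.
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- rbinom(n, 1, plogis(0.5 * x1))      # only x1 truly matters
d  <- data.frame(y, x1, x2, x3)

# Calibration slope: coefficient of the linear predictor when the outcome
# is regressed on it in a logistic model (hypothetical helper)
cal_slope <- function(lp, y) unname(coef(glm(y ~ lp, family = binomial))[2])

fit  <- glm(y ~ x1 + x2 + x3, family = binomial, data = d)
orig <- cal_slope(predict(fit), d$y)      # apparent slope, 1 up to convergence error

B <- 50
optimism <- numeric(B)
for (b in seq_len(B)) {
  boot  <- d[sample(n, replace = TRUE), ]
  fb    <- glm(y ~ x1 + x2 + x3, family = binomial, data = boot)
  train <- cal_slope(predict(fb), boot$y)            # resample model on resample
  test  <- cal_slope(predict(fb, newdata = d), d$y)  # resample model on original sample
  optimism[b] <- train - test
}

corrected <- orig - mean(optimism)        # optimism-corrected slope
corrected
```

When noise predictors are present, as here, the corrected slope typically falls below 1; multiplying the fitted coefficients by it would pull predictions toward the overall mean.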
# fit <- lrm(formula=response ~ terms, x=TRUE, y=TRUE) or orm

## S3 method for class 'lrm'
validate(fit, method="boot", B=40,
         bw=FALSE, rule="aic", type="residual", sls=0.05, aics=0,
         force=NULL, estimates=TRUE,
         pr=FALSE, kint, Dxy.method=if(k==1) 'somers2' else 'lrm',
         emax.lim=c(0,1), ...)

## S3 method for class 'orm'
validate(fit, method="boot", B=40, bw=FALSE, rule="aic",
         type="residual", sls=.05, aics=0, force=NULL, estimates=TRUE,
         pr=FALSE, ...)
fit 
a fit derived by lrm or orm. The options x=TRUE and y=TRUE must have been specified.
method,B,bw,rule,type,sls,aics,force,estimates,pr 
see validate and predab.resample
kint 
In the case of an ordinal model, specify which intercept to validate.
Default is the middle intercept. For 
Dxy.method 

emax.lim 
range of predicted probabilities over which to compute the maximum error. Default is entire range. 
... 
other arguments to pass to lrm.fit and to predab.resample (note especially the group, cluster, and subset parameters)
If the original fit was created using penalized maximum likelihood estimation, the same penalty.matrix used with the original fit is used during validation.
a matrix with rows corresponding to D_{xy}, R^2, Intercept, Slope, E_{max}, D, U, Q, B, g, gp, and columns for the original index, resample estimates, indexes applied to the whole or omitted sample using the model derived from the resample, average optimism, corrected index, and number of successful resamples.
For validate.orm not all columns are provided, Spearman's rho is returned instead of D_{xy}, and pdm is reported.
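Because the result is a matrix, individual entries can be pulled out by row and column name. The sketch below assumes the rms package is installed and that the columns are named index.orig, training, test, optimism, index.corrected, and n, as printed by current versions of rms; the simulated data are purely illustrative.

```r
# Illustrative sketch: extract bias-corrected indexes from a validate() result.
# Assumes the rms package is installed; data are simulated for the example.
library(rms)

set.seed(2)
n   <- 300
age <- rnorm(n, 50, 10)
sex <- factor(sample(c("female", "male"), n, TRUE))
y   <- rbinom(n, 1, plogis(-0.5 + 0.04 * (age - 50) + 0.4 * (sex == "male")))

f <- lrm(y ~ age + sex, x = TRUE, y = TRUE)
v <- validate(f, B = 20)          # small B to keep the example fast

# Corrected calibration slope, usable as a shrinkage factor
v["Slope", "index.corrected"]

# Apparent value, average optimism, and corrected value of Dxy
v["Dxy", c("index.orig", "optimism", "index.corrected")]
```

The corrected value in each row is the original index minus the average optimism, so the two extractions above can be checked against each other.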
prints a summary, and optionally statistics for each refit
Frank Harrell
Department of Biostatistics, Vanderbilt University
f.harrell@vanderbilt.edu
Miller ME, Hui SL, Tierney WM (1991): Validation techniques for logistic regression models. Stat in Med 10:1213–1226.
Harrell FE, Lee KL (1985): A comparison of the discrimination of discriminant analysis and logistic regression under multivariate normality. In Biostatistics: Statistics in Biomedical, Public Health, and Environmental Sciences. The Bernard G. Greenberg Volume, ed. PK Sen. New York: North-Holland, p. 333–343.
lrm, calibrate, cr.setup, orm
n <- 1000    # define sample size
age            <- rnorm(n, 50, 10)
blood.pressure <- rnorm(n, 120, 15)
cholesterol    <- rnorm(n, 200, 25)
sex            <- factor(sample(c('female','male'), n,TRUE))

# Specify population model for log odds that Y=1
L <- .4*(sex=='male') + .045*(age-50) +
  (log(cholesterol - 10)-5.2)*(-2*(sex=='female') + 2*(sex=='male'))
# Simulate binary y to have Prob(y=1) = 1/[1+exp(-L)]
y <- ifelse(runif(n) < plogis(L), 1, 0)

f <- lrm(y ~ sex*rcs(cholesterol)+pol(age,2)+blood.pressure, x=TRUE, y=TRUE)
#Validate full model fit
validate(f, B=10)              # normally B=300
validate(f, B=10, group=y)
# two-sample validation: make resamples have same numbers of
# successes and failures as original sample

#Validate stepwise model with typical (not so good) stopping rule
validate(f, B=10, bw=TRUE, rule="p", sls=.1, type="individual")

## Not run:
#Fit a continuation ratio model and validate it for the predicted
#probability that y=0
u <- cr.setup(y)
Y <- u$y
cohort <- u$cohort
attach(mydataframe[u$subs,])
# x=TRUE, y=TRUE are needed so that validate can refit the model
f <- lrm(Y ~ cohort+rcs(age,4)*sex, penalty=list(interaction=2),
         x=TRUE, y=TRUE)
validate(f, cluster=u$subs, subset=cohort=='all')
#see predab.resample for cluster and subset

## End(Not run)