| survfit.formula {survival} | R Documentation | 
Computes an estimate of a survival curve for censored data using the Aalen-Johansen estimator. For ordinary (single event) survival this reduces to the Kaplan-Meier estimate.
## S3 method for class 'formula'
survfit(formula, data, weights, subset, na.action,  
        stype=1, ctype=1, id, cluster, istate, timefix=TRUE,
        etype, error,  ...)
| formula | a formula object, which must have a 
 | 
| data | a data frame in which to interpret the variables named in the formula, 
 | 
| weights | The weights must be nonnegative and it is strongly recommended that  
they be strictly positive, since zero weights are ambiguous, compared 
to use of the  | 
| subset | expression saying that only a subset of the rows of the data should be used in the fit. | 
| na.action | a missing-data filter function, applied to the model frame, after any 
 | 
| stype | the method to be used for estimation of the survival curve: 1 = direct, 2 = exp(-cumulative hazard). | 
| ctype | the method to be used for estimation of the cumulative hazard: 1 = Nelson-Aalen formula, 2 = Fleming-Harrington correction for tied events. | 
| id | identifies individual subjects, when a given person can have multiple lines of data. | 
| cluster | used to group observations for the infinitesimal jackknife variance estimate; defaults to the value of id. | 
| istate | for multi-state models, identifies the initial state of each subject or observation | 
| timefix | process times through the  | 
| etype | a variable giving the type of event. This has been superseded by multi-state Surv objects and is deprecated; see example below. | 
| error | this argument is no longer used | 
| ... | The following additional arguments are passed to internal functions
called by  
 | 
If there is a data argument, then variables in the formula,
weights, subset, id, cluster and
istate arguments will be searched for in that data set.
The routine returns both an estimated probability in state and an
estimated cumulative hazard.
The cumulative hazard estimate is either the Nelson-Aalen (NA) estimate or the
Fleming-Harrington (FH) estimate; the latter includes a correction for
tied event times.  The probability in state can be estimated
either as the exponential of the negative cumulative hazard, or as a direct
estimate using the Aalen-Johansen approach.
For single state data the AJ estimate reduces to the Kaplan-Meier and
the probability in state to the survival curve; 
for competing risks data the AJ reduces to the cumulative incidence (CI)
estimator.
For backward compatibility the type argument can be used instead.
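As a minimal sketch of how stype and ctype relate, the following uses the aml data from the examples below. The object names km, fh, na_by_hand and km_by_hand are illustrative only, and the hand computations assume unweighted, right-censored data with a single curve.

```r
library(survival)

# stype = 1, ctype = 1 (the defaults): direct (Kaplan-Meier) estimate
km <- survfit(Surv(time, status) ~ 1, data = aml)

# stype = 2: survival computed as exp(-cumulative hazard)
fh <- survfit(Surv(time, status) ~ 1, data = aml, stype = 2)

# Nelson-Aalen cumulative hazard: running sum of events / number at risk
na_by_hand <- cumsum(km$n.event / km$n.risk)
all.equal(km$cumhaz, na_by_hand)

# Direct estimate: running product of (1 - events / number at risk)
km_by_hand <- cumprod(1 - km$n.event / km$n.risk)
all.equal(km$surv, km_by_hand)

# With stype = 2 the curve is the exponential of minus the cumulative hazard
all.equal(fh$surv, exp(-fh$cumhaz))
```

Because exp(-h) is never smaller than 1 - h, the stype = 2 curve lies on or above the direct estimate at every time point.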
When the data set includes left censored or interval censored data (or both), then the EM approach of Turnbull is used to compute the overall curve. Currently this algorithm is very slow, only a survival curve is produced, and it does not support a robust variance.
If an id or cluster argument is present, or for multi-state
curves, then the standard
errors of the results will be based on an infinitesimal jackknife (IJ)
estimate; otherwise the standard model-based estimate will be used.
With the IJ estimate, the leverage values themselves can be returned
as arrays with dimensions: number of subjects, number of unique times,
and for a multi-state model, the number of unique states.
Be forewarned that these arrays can be huge.  If there is a
cluster argument, the first dimension will be the number of
clusters and the variance will be a grouped IJ estimate; this can be
an important tool for reducing the size.
A numeric value for the influence argument allows finer
control: 0= return neither (same as FALSE), 1= return the influence
array for probability in state, 2= return the influence array for the
cumulative hazard, 3= both (same as TRUE).
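The following sketch requests the influence values for a single curve. The aml2 data frame and its subject column are hypothetical additions (aml itself has no identifier variable), and the component name influence.surv is assumed from the description in survfit.object; both should be checked against the installed version of the package.

```r
library(survival)

# aml has one row per subject; add an explicit (hypothetical) identifier
aml2 <- aml
aml2$subject <- seq_len(nrow(aml2))

# influence = 1 asks for the influence values of the survival estimate only;
# supplying id also switches the standard errors to the IJ estimate
fit <- survfit(Surv(time, status) ~ 1, data = aml2,
               id = subject, influence = 1)

# First dimension: one row per subject; the other dimension indexes time
dim(fit$influence.surv)
```

With a cluster argument the rows would correspond to clusters rather than subjects, which is what keeps the array small for large data sets.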
an object of class "survfit".  
See survfit.object for 
details. Methods defined for survfit objects are  
print, plot, 
lines, and points. 
Dorey, F. J. and Korn, E. L. (1987). Effective sample sizes for confidence intervals for survival probabilities. Statistics in Medicine 6, 679-87.
Fleming, T. H. and Harrington, D. P. (1984). Nonparametric estimation of the survival distribution in censored data. Comm. in Statistics 13, 2469-86.
Kalbfleisch, J. D. and Prentice, R. L. (1980). The Statistical Analysis of Failure Time Data. New York:Wiley.
Kyle, R. A. (1997). Monoclonal gammopathy of undetermined significance and solitary plasmacytoma. Implications for progression to overt multiple myeloma. Hematology/Oncology Clinics N. Amer. 11, 71-87.
Link, C. L. (1984). Confidence intervals for the survival function using Cox's proportional hazards model with covariates. Biometrics 40, 601-610.
Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J Am Stat Assoc, 69, 169-173.
survfit.coxph for survival curves from Cox models,
survfit.object for a description of the components of a
survfit object,
print.survfit,  
plot.survfit,  
lines.survfit,   
coxph,  
Surv.  
#fit a Kaplan-Meier and plot it 
fit <- survfit(Surv(time, status) ~ x, data = aml) 
plot(fit, lty = 2:3) 
legend(100, .8, c("Maintained", "Nonmaintained"), lty = 2:3) 
#fit a Cox proportional hazards model and plot the  
#predicted survival for a 60 year old 
fit <- coxph(Surv(futime, fustat) ~ age, data = ovarian) 
plot(survfit(fit, newdata=data.frame(age=60)),
     xscale=365.25, xlab = "Years", ylab="Survival") 
# Here is the data set from Turnbull
#  There are no interval censored subjects, only left-censored (status=2),
#  right-censored (status=0) and observed events (status=1)
#
#                             Time
#                         1    2   3   4
# Type of observation
#           death        12    6   2   3
#          losses         3    2   0   3
#      late entry         2    4   2   5
#
tdata <- data.frame(time  =c(1,1,1,2,2,2,3,3,3,4,4,4),
                    status=rep(c(1,0,2),4),
                    n     =c(12,3,2,6,2,4,2,0,2,3,3,5))
fit  <- survfit(Surv(time, time, status, type='interval') ~1, 
              data=tdata, weights=n)
#
# Time to progression/death for patients with monoclonal gammopathy
#  Competing risk curves (cumulative incidence)
#
fitKM <- survfit(Surv(stop, event=='pcm') ~1, data=mgus1,
                    subset=(start==0))
fitCI <- survfit(Surv(stop, event) ~1,
                    data=mgus1, subset=(start==0))
## Not run: 
# CI curves show the probability in state
plot(fitCI, xscale=365.25, xmax=7300, mark.time=FALSE,
            col=2:3, xlab="Years post diagnosis of MGUS",
            ylab="P(state)")
lines(fitKM, fun='event', xmax=7300, mark.time=FALSE,
            conf.int=FALSE)
text(3652, .4, "Competing risk: death", col=3)
text(5840, .15,"Competing risk: progression", col=2)
text(5480, .30,"KM:prog")
## End(Not run)