Variable Selection and Estimation with Censored Data
Author: Yi Li
Total Pages: 96
Release: 2020
OCLC: 1157346073
Book excerpt: In clinical and epidemiological studies, it is possible to collect a large set of covariates that are potentially prognostic of the event time. For survival data with high-dimensional covariates, selecting the subset of covariates most strongly associated with the outcome has become an important objective. This dissertation focuses on variable selection and estimation with censored data.

In the first part, we consider robust modeling and variable selection for the accelerated failure time (AFT) model with right-censored data. We propose a unified Expectation-Maximization (EM) approach combined with the LASSO penalty to perform variable selection and parameter estimation simultaneously. Our approach can be used with general loss functions, and reduces to the well-known Buckley-James method when the squared-error loss is used without regularization. To mitigate the effects of outliers and heavy-tailed noise in real applications, we recommend the use of robust loss functions under our proposed framework. Simulation studies are conducted to evaluate the performance of the proposed approach with different loss functions, and an application to an ovarian cancer study is provided.

In the second part, we consider group and within-group variable selection for the AFT model with right-censored data. We extend the approach established in the first part by incorporating the group structure among the covariates. The LASSO penalty is replaced by the sparse group LASSO (SGL) penalty in the proposed EM approach in order to select both groups and covariates within a group. We conduct simulation studies to assess the performance of the proposed approach with the SGL penalty and compare it with the approach proposed in the first part. We provide an application to the same ovarian cancer data.

In the third part, we consider variable selection with interval-censored data. We study a class of semiparametric linear transformation models, which includes the Cox proportional hazards and proportional odds models as special cases. We propose a penalized nonparametric maximum likelihood estimation (NPMLE) approach to perform variable selection and parameter estimation simultaneously for this class of models. Efficient computation of the penalized NPMLE is achieved by a modified iterative convex minorant (ICM) algorithm combined with the coordinate descent algorithm. The proposed approach is evaluated by simulation studies and applied to the Atherosclerosis Risk in Communities (ARIC) study.
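
To illustrate how an SGL penalty can select groups and covariates within a group, here is a minimal Python sketch. It assumes the commonly used formulation of the penalty, lam * (alpha * ||beta||_1 + (1 - alpha) * sum_g sqrt(p_g) * ||beta_g||_2), and its proximal operator. The excerpt does not state which parameterization or solver the dissertation uses, so the function names (`sgl_penalty`, `sgl_prox`), the parameters `lam`, `alpha` and `step`, and the toy numbers are all illustrative assumptions, not the author's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * |.|."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def sgl_penalty(beta, groups, lam, alpha):
    """Sparse group LASSO penalty (assumed standard form).

    alpha = 1 gives the plain LASSO; alpha = 0 gives the group LASSO.
    """
    total = alpha * np.abs(beta).sum()
    for idx in groups:
        total += (1 - alpha) * np.sqrt(len(idx)) * np.linalg.norm(beta[idx])
    return lam * total

def sgl_prox(z, groups, lam, alpha, step=1.0):
    """Proximal operator of step * sgl_penalty, applied group by group."""
    out = np.zeros_like(z, dtype=float)
    for idx in groups:
        # Within-group sparsity: shrink each coefficient toward zero.
        u = soft_threshold(z[idx], step * lam * alpha)
        # Group sparsity: drop the whole group if its norm is too small.
        norm = np.linalg.norm(u)
        thresh = step * lam * (1 - alpha) * np.sqrt(len(idx))
        if norm > thresh:
            out[idx] = (1 - thresh / norm) * u
    return out

# Hypothetical toy input: two groups of two coefficients each.
groups = [np.array([0, 1]), np.array([2, 3])]
z = np.array([3.0, 0.2, 0.3, -0.1])
beta = sgl_prox(z, groups, lam=1.0, alpha=0.5)
print(beta)
print(sgl_penalty(beta, groups, lam=1.0, alpha=0.5))
```

In this toy input the second group is removed entirely, while in the first group only the small coefficient is set to zero. That is the group and within-group selection behavior the second part describes. In a penalized fit, a step like `sgl_prox` would be applied repeatedly inside an optimization loop rather than once.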