Statistical Machine Learning for Complex Data Sets
Author: Xiaowu Dai
Release: 2019
OCLC: 1117278038
Book excerpt: This thesis is focused on developing theory and computational methods for a set of problems involving complex data.

Chapter 2 studies multivariate nonparametric prediction with gradient information. Gradients can be easily estimated in stochastic simulations and computer experiments. We propose a unified framework for incorporating noisy and correlated gradients into predictions. We show theoretically, through minimax optimal rates of convergence, that incorporating gradients tends to significantly improve predictions under both deterministic and random designs.

Chapter 3 proposes high-dimensional smoothing splines with applications to Alzheimer's disease (AD) prediction. While traditional prediction based on structural MRI uses imaging acquired at a single time point, a longitudinal study is more sensitive in detecting early pathological changes of AD. Our novel method can be applied to extract features from heterogeneous and longitudinal MRI for AD prediction, outperforming existing methods.

Chapter 4 introduces a novel class of variable selection penalties called TWIN, which provides sensible data-adaptive penalization. Under a linear sparsity regime, we show that TWIN penalties select correct models with high probability and result in minimax optimal estimators. In challenging and realistic simulation settings with high correlations between active and inactive variables, we demonstrate that TWIN has high power in variable selection while controlling the number of false discoveries, outperforming standard penalties.

Chapter 5 investigates generalizations of mini-batch SGD in deep neural networks. We theoretically justify the hypothesis that large-batch SGD tends to converge to sharp minimizers by establishing new properties of SGD. In particular, we give an explicit escape time of SGD from a local minimum in the finite-time regime and prove that, regardless of the batch size, SGD tends to converge to flatter minima in the asymptotic regime (although it may take exponentially long to do so).

Chapter 6 provides another look at statistical calibration problems in computer models. This viewpoint is inspired by two overarching practical considerations: (i) many computer models are inadequate for perfectly modeling physical systems; (ii) only a finite amount of data is available from physical experiments to calibrate the related computer models. We provide a non-asymptotic theory and derive a novel prediction-oriented calibration method.
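To make the Chapter 2 setting concrete, here is a minimal sketch of prediction that uses both noisy function values and noisy gradient observations, in the style of gradient-enhanced Gaussian-process regression. The squared-exponential kernel, the noise levels, and the test function are assumptions made for this illustration; it is not the estimator or the minimax analysis developed in the thesis.

```python
# Hypothetical sketch: gradient-enhanced prediction in one dimension.
# Kernel, noise levels, and test function are assumptions for the example.
import numpy as np

def k(x1, x2, ell=0.4):
    """Squared-exponential kernel k(x, x')."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-d**2 / (2 * ell**2))

def k_fg(x1, x2, ell=0.4):
    """Cross-covariance between f(x1) and f'(x2): d k / d x'."""
    d = x1[:, None] - x2[None, :]
    return (d / ell**2) * np.exp(-d**2 / (2 * ell**2))

def k_gg(x1, x2, ell=0.4):
    """Covariance between gradient observations: d^2 k / (dx dx')."""
    d = x1[:, None] - x2[None, :]
    return (1 / ell**2 - d**2 / ell**4) * np.exp(-d**2 / (2 * ell**2))

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)        # unknown function (assumed for the demo)
df = lambda x: 3 * np.cos(3 * x)   # its gradient

x_obs = np.linspace(0, 2, 8)                    # design points
y = f(x_obs) + 0.05 * rng.standard_normal(8)    # noisy function values
g = df(x_obs) + 0.20 * rng.standard_normal(8)   # noisy gradient values

# Joint covariance of [function values; gradient values] plus observation noise.
K = np.block([[k(x_obs, x_obs),      k_fg(x_obs, x_obs)],
              [k_fg(x_obs, x_obs).T, k_gg(x_obs, x_obs)]])
K += np.diag(np.concatenate([0.05**2 * np.ones(8), 0.20**2 * np.ones(8)]))

# Predict f on a fine grid using both kinds of observations.
x_new = np.linspace(0, 2, 200)
K_star = np.hstack([k(x_new, x_obs), k_fg(x_new, x_obs)])
pred = K_star @ np.linalg.solve(K, np.concatenate([y, g]))

print("max abs prediction error:", np.abs(pred - f(x_new)).max())
```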
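A toy experiment can illustrate the Chapter 5 phenomenon that smaller batches inject more gradient noise and therefore leave a sharp minimum more readily. The one-dimensional loss (a sharp Gaussian well next to a flat one), the Gaussian model of mini-batch gradient noise, and all step sizes below are illustrative assumptions, not the setting analyzed in the thesis.

```python
# Toy illustration (assumed setup): mini-batch gradient noise scales like
# 1/sqrt(batch size), so small-batch SGD escapes a sharp local minimum
# more often than large-batch SGD.
import numpy as np

def grad(w):
    """Gradient of an assumed 1-D loss with a sharp well at w = -1
    (depth 0.3, width 0.05) and a flat well at w = 1.5 (depth 1.0, width 0.5)."""
    sharp = 0.3 * (w + 1.0) / 0.05**2 * np.exp(-(w + 1.0)**2 / (2 * 0.05**2))
    flat = 1.0 * (w - 1.5) / 0.5**2 * np.exp(-(w - 1.5)**2 / (2 * 0.5**2))
    return sharp + flat

def escape_frequency(batch_size, n_runs=100, n_steps=2000, eta=0.01, sigma=8.0, seed=0):
    """Fraction of SGD runs, started at the sharp minimum, that leave its basin."""
    rng = np.random.default_rng(seed)
    escaped = 0
    for _ in range(n_runs):
        w = -1.0
        for _ in range(n_steps):
            # Model a mini-batch gradient as the full gradient plus Gaussian noise.
            g = grad(w) + sigma / np.sqrt(batch_size) * rng.standard_normal()
            w -= eta * g
            if abs(w + 1.0) > 0.3:  # left the sharp basin
                escaped += 1
                break
    return escaped / n_runs

for b in (1, 16, 256):
    print(f"batch size {b:4d}: escape frequency {escape_frequency(b):.2f}")
```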
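For contrast with the Chapter 6 proposal, the following is a minimal sketch of classical least-squares (L2) calibration of an imperfect computer model against physical observations; the model, the discrepancy, and the parameter grid are all assumed for the example, and the thesis's prediction-oriented method is not reproduced here.

```python
# Baseline L2 calibration sketch (assumed setup, not the Chapter 6 method):
# pick the calibration parameter that best matches the computer model to
# physical observations in the least-squares sense.
import numpy as np

def computer_model(x, theta):
    """Hypothetical computer model; the true physical process differs from it."""
    return theta * np.sin(x)

rng = np.random.default_rng(1)
x_field = np.linspace(0, np.pi, 30)
# Physical observations: the model is imperfect (extra 0.2*x trend) and data are noisy.
y_field = 1.3 * np.sin(x_field) + 0.2 * x_field + 0.05 * rng.standard_normal(30)

# L2 calibration: minimize the sum of squared discrepancies over a grid of theta.
thetas = np.linspace(0.5, 2.5, 401)
sse = [np.sum((y_field - computer_model(x_field, t))**2) for t in thetas]
theta_hat = thetas[int(np.argmin(sse))]
print(f"calibrated theta: {theta_hat:.3f}")
```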