Recent advances in high-throughput biotechnologies have provided an unprecedented opportunity for biomarker discovery, which, from a statistical point of view, can be cast as a variable selection problem. This problem is challenging due to the high-dimensional and nonlinear nature of omics data and, in general, it suffers from three difficulties: (i) an unknown functional form of the nonlinear system, (ii) variable selection consistency, and (iii) highly demanding computation. To circumvent the first difficulty, we employ a feed-forward neural network to approximate the unknown nonlinear function, motivated by its universal approximation ability. To circumvent the second difficulty, we conduct structure selection for the neural network, which induces variable selection, by choosing appropriate prior distributions that lead to the consistency of variable selection. To circumvent the third difficulty, we implement the population stochastic approximation Monte Carlo algorithm, a parallel adaptive Markov chain Monte Carlo (MCMC) algorithm, on the OpenMP platform, which provides a linear speedup for the simulation. The numerical results indicate that the proposed method runs very fast on a multicore computer and works very well in identifying relevant variables for general high-dimensional nonlinear systems. The proposed method is successfully applied to the selection of anticancer drug response genes for the drug sensitivity data collected in the Cancer Cell Line Encyclopedia (CCLE) study.
|Time:||2017-12-01 09:30|