Abstract
A basic question in analyzing cDNA microarray data is normalization, the purpose of which is to remove systematic bias in the observed expression values by establishing a normalization curve across the whole dynamic range. A proper normalization procedure ensures that the normalized intensity ratios provide meaningful measures of relative expression levels. We propose a two-way semilinear model (TW-SLM) for normalization and analysis of microarray data. This method does not make the usual assumptions underlying some of the existing methods. For example, it does not assume that the percentage of differentially expressed genes is small or that there is symmetry in the expression levels of up-regulated and down-regulated genes, as required in the lowess normalization method. The TW-SLM also naturally incorporates uncertainty due to normalization into significance analysis of microarrays. We use a semiparametric approach based on polynomial splines in the TW-SLM to estimate the normalization curves and the normalized expression values. We study the theoretical properties of the proposed estimator in the TW-SLM, including the finite-sample distributional properties of the estimated gene effects and the rate of convergence of the estimated normalization curves when the number of genes under study is large. We also conduct simulation studies to evaluate the TW-SLM method and illustrate the proposed method using a published microarray dataset.
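To make the abstract's description concrete, here is a minimal sketch of fitting a two-way semilinear model. The abstract does not state the model, so the sketch assumes the form y_ij = f_i(x_ij) + beta_j + e_ij, where y_ij is the log-intensity ratio of gene j on array i, x_ij is its average log intensity, f_i is the normalization curve for array i, and beta_j is the gene effect. The truncated power spline basis, the simple alternating (backfitting-style) least-squares loop, the sum-to-zero constraint on the gene effects, and the names `truncated_power_basis` and `fit_tw_slm` are all illustrative assumptions. This is not presented as the authors' estimator.

```python
import numpy as np


def truncated_power_basis(x, knots, degree=3):
    """Polynomial spline basis: 1, x, ..., x^degree, plus (x - k)_+^degree per knot."""
    cols = [x ** p for p in range(degree + 1)]
    cols += [np.clip(x - k, 0.0, None) ** degree for k in knots]
    return np.column_stack(cols)


def fit_tw_slm(x, y, n_knots=5, degree=3, n_iter=50):
    """Alternating least-squares sketch for y[i, j] = f_i(x[i, j]) + beta[j] + noise.

    x, y: arrays of shape (n_arrays, n_genes).
    Returns (beta, fitted), where fitted[i, j] estimates f_i(x[i, j]).
    Assumption: the gene effects are constrained to sum to zero for identifiability.
    """
    n_arrays, n_genes = y.shape

    # Build one spline basis per array, on intensities rescaled to [0, 1]
    # to keep the least-squares problem well conditioned.
    bases = []
    for i in range(n_arrays):
        xs = (x[i] - x[i].min()) / (x[i].max() - x[i].min())
        knots = np.quantile(xs, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
        bases.append(truncated_power_basis(xs, knots, degree))

    beta = np.zeros(n_genes)
    fitted = np.zeros_like(y, dtype=float)
    for _ in range(n_iter):
        # Step 1: given the gene effects, fit each array's normalization curve.
        for i, B in enumerate(bases):
            coef, *_ = np.linalg.lstsq(B, y[i] - beta, rcond=None)
            fitted[i] = B @ coef
        # Step 2: given the curves, average the residuals over arrays per gene.
        beta = (y - fitted).mean(axis=0)
        beta -= beta.mean()  # enforce the sum-to-zero constraint
    return beta, fitted
```

Because the curves and the gene effects are estimated jointly, the sketch does not need most genes to be unchanged or up- and down-regulation to be symmetric, which is the point the abstract makes against lowess-style normalization. Calling `fit_tw_slm(x, y)` on two `(n_arrays, n_genes)` arrays returns the estimated gene effects and the fitted curve values. If the intensities of each gene are nearly identical across arrays, the alternating loop may converge slowly.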
Original language | English |
---|---|
Pages (from-to) | 814-829 |
Number of pages | 16 |
Journal | Journal of the American Statistical Association |
Volume | 100 |
Issue number | 471 |
Publication status | Published - 1 Sep 2005 |
Externally published | Yes |
Keywords
- Analysis of variance
- Differentially expressed gene
- High-dimensional data
- Microarray
- Noise level
- Semiparametric regression
- Spline
- Variance estimation
ASJC Scopus subject areas
- Statistics and Probability
- Statistics, Probability and Uncertainty