Abstract
We consider the problem of estimating the regression function in nonparametric logistic regression under a semi-supervised framework, in which a relatively small labeled data set collected by case-control sampling and a relatively large unlabeled data set containing only observations of the predictors are available. This problem arises in various applications where the outcome variable is expensive or difficult to observe directly. A two-stage nonparametric semi-supervised estimator based on the spline method is proposed to estimate the target regression function by maximizing the likelihood function of the labeled case-control data. The unlabeled data are used in the first stage to estimate the density function that appears in the likelihood function. The consistency and functional asymptotic normality of the semi-supervised two-stage estimator are established under mild conditions. By making use of the unlabeled data, the proposed method estimates the target function more efficiently than its traditional supervised counterpart. The performance of the proposed method is evaluated through extensive simulation studies, and an application is illustrated with an analysis of a skin segmentation data set.
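As a rough sketch of why the marginal density of the predictors enters the likelihood, one standard way to write the retrospective (case-control) log-likelihood is given below. The notation ($g$, $f$, $p_g$, $\pi_1$, $B_k$, $\beta_k$, $n_1$, $n_0$) is assumed for illustration and is not taken from the paper, whose exact formulation may differ.

```latex
% Assumed notation (illustrative, not from the paper):
%   g(x) : target regression function, with logit P(Y=1 | X=x) = g(x)
%   f(x) : marginal density of the predictors X
%   x_1, ..., x_{n_1}               : cases, sampled given Y = 1
%   x_{n_1+1}, ..., x_{n_1+n_0}     : controls, sampled given Y = 0
\begin{align*}
p_g(x) &= \frac{\exp\{g(x)\}}{1+\exp\{g(x)\}}, \qquad
\pi_1(g,f) = \int p_g(x)\, f(x)\, dx, \\
\ell(g; f) &= \sum_{i=1}^{n_1} \log \frac{p_g(x_i)\, f(x_i)}{\pi_1(g,f)}
  + \sum_{i=n_1+1}^{n_1+n_0} \log \frac{\{1-p_g(x_i)\}\, f(x_i)}{1-\pi_1(g,f)} .
\end{align*}
% Two-stage idea, under these assumptions:
%   (1) replace f by an estimate \hat f built from the unlabeled predictors;
%   (2) approximate g(x) \approx \sum_k \beta_k B_k(x) with spline basis
%       functions B_k and maximize \ell(g; \hat f) over the coefficients \beta_k.
```

Written this way, the terms $\log f(x_i)$ are additive constants in $g$, so $f$ influences the maximizer only through the normalizing constant $\pi_1(g,f)$; this is the place where an estimate based on the unlabeled data would be plugged in.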
Original language | English |
---|---|
Pages (from-to) | 2573-2589 |
Number of pages | 17 |
Journal | Statistics in Medicine |
Volume | 42 |
Issue number | 15 |
Publication status | Published - 10 May 2023 |
Keywords
- case-control studies
- nonparametric logistic regression
- semi-supervised inference
ASJC Scopus subject areas
- Epidemiology
- Statistics and Probability