Some recent works in classification show that the data obtained from various views with different sensors for an object contributes to achieving a remarkable performance. Actually, in many real-world applications, each view often contains multiple features, which means that this type of data has a hierarchical structure, while most of existing works do not take these features with multi-layer structure into consideration simultaneously. In this paper, a probabilistic hierarchical model is proposed to address this issue and applied for classification. In our model, a latent variable is first learned to fuse the multiple features obtained from a same view, sensor or modality. Particularly, mapping matrices corresponding to a certain view are estimated to project the latent variable from a shared space to the multiple observations. Since this method is designed for the supervised purpose, we assume that the latent variables associated with different views are influenced by their ground-truth label. In order to effectively solve the proposed method, the Expectation-Maximization (EM) algorithm is applied to estimate the parameters and latent variables. Experimental results on the extensive synthetic and two real-world datasets substantiate the effectiveness and superiority of our approach as compared with state-of-the-art.