TY - JOUR
T1 - Supervariants identification for breast cancer
AU - Hu, Jianchang
AU - Li, Ting
AU - Wang, Shiying
AU - Zhang, Heping
N1 - Funding Information:
The work of Heping Zhang is supported in part by the U.S. National Institutes of Health (R01HG010171 and R01MH116527) and the National Science Foundation (Grant no. DMS1722544). This study has been conducted using the UK Biobank Resource under Application Number 42009. The authors would like to thank the Yale Center for Research Computing for the guidance and use of the research computing infrastructure.
Publisher Copyright:
© 2020 Wiley Periodicals LLC
PY - 2020/11
Y1 - 2020/11
N2 - In genome-wide association studies, signals associated with rare variants and interactions between genes are hard to detect even when the sample size is in the tens of thousands. To overcome these problems, we examine the concept of a supervariant. Like the classic concept of the gene, a supervariant is a combination of alleles at multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that supervariants are easy to detect and that their aggregated signals are more stable in their associations with the disease than those from a single-nucleotide polymorphism. Using the UK Biobank databases, we develop a ranking and aggregation method for identifying supervariants. Specifically, we examine 9,377 breast cancer cases with 46,861 controls matched by sex and age. In our simulations, the use of supervariants outperforms single-nucleotide polymorphism-based association methods in detecting rare variants and signals with interactive structure. In real data analysis, we identify supervariants on Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 16, and 22 which cover previously reported loci that have associations with breast or other cancers, and several novel loci on Chromosomes 2, 5, 9, and 12. These findings demonstrate the validity of supervariants and their potential for discovering replicable and novel results for complex diseases.
AB - In genome-wide association studies, signals associated with rare variants and interactions between genes are hard to detect even when the sample size is in the tens of thousands. To overcome these problems, we examine the concept of a supervariant. Like the classic concept of the gene, a supervariant is a combination of alleles at multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that supervariants are easy to detect and that their aggregated signals are more stable in their associations with the disease than those from a single-nucleotide polymorphism. Using the UK Biobank databases, we develop a ranking and aggregation method for identifying supervariants. Specifically, we examine 9,377 breast cancer cases with 46,861 controls matched by sex and age. In our simulations, the use of supervariants outperforms single-nucleotide polymorphism-based association methods in detecting rare variants and signals with interactive structure. In real data analysis, we identify supervariants on Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 16, and 22 which cover previously reported loci that have associations with breast or other cancers, and several novel loci on Chromosomes 2, 5, 9, and 12. These findings demonstrate the validity of supervariants and their potential for discovering replicable and novel results for complex diseases.
KW - depth importance
KW - gene–gene interaction
KW - GWAS
KW - random forest
UR - http://www.scopus.com/inward/record.url?scp=85089455057&partnerID=8YFLogxK
U2 - 10.1002/gepi.22350
DO - 10.1002/gepi.22350
M3 - Journal article
C2 - 32808324
AN - SCOPUS:85089455057
SN - 0741-0395
VL - 44
SP - 934
EP - 947
JO - Genetic Epidemiology
JF - Genetic Epidemiology
IS - 8
ER -