Supervariants identification for breast cancer

Jianchang Hu, Ting Li, Shiying Wang, Heping Zhang

Research output: Journal article publicationJournal articleAcademic researchpeer-review

1 Citation (Scopus)

Abstract

In genome-wide association studies, signals associated with rare variants and interactions between genes are hard to detect even when the sample size is in tens of thousands. To overcome these problems, we examine the concept of supervariant. Like the classic concept of the gene, a supervariant is a combination of alleles in multiple loci, but the contributing loci can be anywhere in the genome. We hypothesize that supervariants are easy to detect and the aggregated signals are more stable in their associations with the disease than that from a single nucleoid polymorphism. Using the UK Biobank databases, we develop a ranking and aggregation method for identifying supervariants. Specifically, we examine 9,377 breast cancer cases with 46,861 controls matched by sex and age. In our simulations, the use of supervariants outperforms single-nucleotide polymorphism-based association method in detecting rare variants and signals with interactive structure. In real data analysis, we identify supervariants on Chromosomes 1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 16, and 22 which cover previously reported loci that have associations with breast or other cancers, and several novel loci on Chromosomes 2, 5, 9, and 12. These findings demonstrate the validity of supervariants and its potential of discovering replicable and novel results for complex disease.

Original languageEnglish
Pages (from-to)934-947
Number of pages14
JournalGenetic Epidemiology
Volume44
Issue number8
DOIs
Publication statusPublished - Nov 2020
Externally publishedYes

Keywords

  • depth importance
  • gene–gene interaction
  • GWAS
  • random forest

ASJC Scopus subject areas

  • Epidemiology
  • Genetics(clinical)

Cite this