Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model

Ying Chao Lin, Ai Ru Hsieh, Ching Lin Hsiao, Shang Jung Wu, Hui Min Wang, Iebin Lian, Cathy S.J. Fann

Research output: Contribution to journal (Article)

3 Citations (Scopus)

Abstract

Background: Genome-wide association studies have been successful in identifying common genetic variants for human diseases. However, much of the heritable variation associated with diseases such as Parkinson's disease remains unexplained, suggesting that many more risk loci are yet to be identified. Rare variants have become important in disease association studies for explaining missing heritability. Existing methods for detecting this type of association require prior knowledge of candidate genes and combine variants within a region. These methods may lose power when a region contains many neutral variants or causal variants with opposite effects.

Results: We propose a method that scans genetic variants to identify the region most likely to harbour a disease gene with rare and/or common causal variants. Our method assigns a score to each individual variant based on our scoring system and uses aggregate scores to identify the region associated with disease. We evaluate its performance by simulation based on 1000 Genomes sequencing data and compare it with three commonly used methods (CMC, WSS and SKAT-O). We use a Parkinson's disease case-control dataset as a model to demonstrate the application of our method. In simulation, our method has better power than CMC and WSS and similar power to SKAT-O, with well-controlled type I error. In real data analysis, we confirm the association of the α-synuclein gene (SNCA) with Parkinson's disease (p = 0.005). We further identify associations with hyaluronan synthase 2 (HAS2, p = 0.028) and kringle containing transmembrane protein 1 (KREMEN1, p = 0.006). KREMEN1 is associated with the Wnt signalling pathway, which has been shown to play an important role in neurodegeneration in Parkinson's disease.

Conclusions: Our method is time efficient and less sensitive to the inclusion of neutral variants and to the direction of effect of causal variants. It can narrow down a genomic region or a chromosome to a disease-associated region. Using Parkinson's disease as a model, our method not only confirms association for a known gene but also identifies two genes previously found by other studies. Despite the many existing methods, we conclude that our method serves as an efficient alternative for exploring genomic data containing both rare and common variants.
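
The abstract does not spell out the scoring system, so the following is only a minimal sketch of the general idea of scoring each variant and aggregating scores over a region. The per-variant score used here (absolute difference in allele frequency between cases and controls), the fixed window width, the function names and the toy frequencies are all assumptions made for illustration and are not the authors' method.

```python
from typing import List, Sequence, Tuple


def variant_scores(case_freq: Sequence[float],
                   control_freq: Sequence[float]) -> List[float]:
    """Hypothetical per-variant score: absolute case/control frequency difference.

    Taking the absolute value means risk and protective variants both
    contribute positively, so opposite effects do not cancel out.
    """
    if len(case_freq) != len(control_freq):
        raise ValueError("frequency lists must have the same length")
    return [abs(a - b) for a, b in zip(case_freq, control_freq)]


def best_window(scores: Sequence[float], width: int) -> Tuple[int, float]:
    """Return (start index, aggregate score) of the highest-scoring window."""
    if not 0 < width <= len(scores):
        raise ValueError("width must be between 1 and len(scores)")
    current = sum(scores[:width])
    best_start, best_sum = 0, current
    # Slide the window one variant at a time, updating the running sum.
    for start in range(1, len(scores) - width + 1):
        current += scores[start + width - 1] - scores[start - 1]
        if current > best_sum:
            best_start, best_sum = start, current
    return best_start, best_sum


# Toy allele frequencies for seven consecutive variants (made-up numbers).
case = [0.10, 0.11, 0.02, 0.09, 0.30, 0.01, 0.12]
control = [0.10, 0.10, 0.02, 0.01, 0.38, 0.05, 0.12]

scores = variant_scores(case, control)
start, total = best_window(scores, width=3)
print(f"highest-scoring window starts at variant index {start}")
```

In this toy input, two adjacent variants differ between cases and controls in opposite directions, and both still raise the aggregate score of the window that contains them. A real analysis would also need a significance assessment for the selected region, which this sketch leaves out.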

Original language: English
Article number: 88
Journal: Journal of Biomedical Science
Volume: 21
Issue number: 1
DOIs: 10.1186/s12929-014-0088-9
Publication status: Published - 2014 Aug 30

Fingerprint

Rare Diseases
Parkinson Disease
Genes
Synucleins
Genome
Kringles
Wnt Signaling Pathway
Genome-Wide Association Study
Medical Genetics
Chromosomes
Scanning

All Science Journal Classification (ASJC) codes

  • Endocrinology, Diabetes and Metabolism
  • Molecular Biology
  • Clinical Biochemistry
  • Cell Biology
  • Biochemistry, medical
  • Pharmacology (medical)

Cite this

Lin, Ying Chao ; Hsieh, Ai Ru ; Hsiao, Ching Lin ; Wu, Shang Jung ; Wang, Hui Min ; Lian, Iebin ; Fann, Cathy S.J. / Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model. In: Journal of Biomedical Science. 2014 ; Vol. 21, No. 1.
@article{ea8218740cb04de9a3776485081fcdb2,
title = "Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model",
author = "Lin, {Ying Chao} and Hsieh, {Ai Ru} and Hsiao, {Ching Lin} and Wu, {Shang Jung} and Wang, {Hui Min} and Iebin Lian and Fann, {Cathy S.J.}",
year = "2014",
month = "8",
day = "30",
doi = "10.1186/s12929-014-0088-9",
language = "English",
volume = "21",
journal = "Journal of Biomedical Science",
issn = "1021-7770",
publisher = "BioMed Central",
number = "1",

}

Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model. / Lin, Ying Chao; Hsieh, Ai Ru; Hsiao, Ching Lin; Wu, Shang Jung; Wang, Hui Min; Lian, Iebin; Fann, Cathy S.J.

In: Journal of Biomedical Science, Vol. 21, No. 1, 88, 30.08.2014.

TY - JOUR

T1 - Identifying rare and common disease associated variants in genomic data using Parkinson's disease as a model

AU - Lin, Ying Chao

AU - Hsieh, Ai Ru

AU - Hsiao, Ching Lin

AU - Wu, Shang Jung

AU - Wang, Hui Min

AU - Lian, Iebin

AU - Fann, Cathy S.J.

PY - 2014/8/30

Y1 - 2014/8/30

UR - http://www.scopus.com/inward/record.url?scp=84908457848&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84908457848&partnerID=8YFLogxK

U2 - 10.1186/s12929-014-0088-9

DO - 10.1186/s12929-014-0088-9

M3 - Article

VL - 21

JO - Journal of Biomedical Science

JF - Journal of Biomedical Science

SN - 1021-7770

IS - 1

M1 - 88

ER -