Identification of SNP-containing regulatory motifs in the myelodysplastic syndromes model using SNP arrays and gene expression arrays
Author affiliations: Department of Electrical and Computer Engineering, Northeastern University; Department of Pathology, Florida Hospital, University of Central Florida; Department of Radiology, The Methodist Hospital Research Institute, Weill Medical College & Cornell University
Publication: Chinese Journal of Cancer (Chin. J. Cancer)
Year, volume, issue: 2013, Vol. 32, No. 4
Pages: 170-185
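Subject classification: 1002 [Medicine - Clinical Medicine]; 100201 [Medicine - Internal Medicine (including cardiovascular, hematologic, respiratory, digestive, endocrine and metabolic, renal, rheumatic, and infectious diseases)]; 10 [Medicine]
Funding: Supported by grants from NIH (Nos. 1R01LM010185, 1U01CA166886, and 1U01HL111560)
Keywords: Association study, genetic variation and mutation, transcription factor-binding sites, myelodysplastic syndromes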
Abstract: Myelodysplastic syndromes have increased in frequency and incidence in the American population, but patient prognosis has not significantly improved over the last decade. Such improvements could be realized if biomarkers for accurate diagnosis and prognostic stratification were successfully identified. In this study, we propose a method that associates two state-of-the-art array technologies, the single nucleotide polymorphism (SNP) array and the gene expression array, with gene motifs considered to be transcription factor-binding sites (TFBS). We are particularly interested in SNP-containing motifs that are introduced as TFBS by genetic variation and mutation. The potential regulatory effect of such motifs comes into play only when certain mutations occur. These motifs can be identified from a group of co-expressed genes with copy number variation. We used a sliding window to identify motif candidates near SNPs on gene sequences. The candidates were filtered first by coarse thresholding and then by fine statistical testing. Using the regression-based LARS-EN algorithm and a level-wise sequence combination procedure, we identified 28 SNP-containing motifs as candidate TFBS. We confirmed 21 of the 28 motifs with ChIP-chip fragments in the TRANSFAC database. Another six motifs were validated with TRANSFAC by searching for binding fragments on coregulated genes. The identified motifs and the genes in which they are located can be considered potential biomarkers for myelodysplastic syndromes. Thus, our proposed method, a novel strategy for associating two data categories, is capable of integrating information from different sources to identify reliable candidate regulatory SNP-containing motifs introduced by genetic variation and mutation.
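Illustrative note: the abstract does not specify how the sliding window is implemented. The following is a minimal Python sketch of one plausible reading, in which every fixed-width window that covers the SNP position is collected as a motif candidate. The function name `snp_window_candidates`, its parameters, the 0-based position convention, and the toy sequence are all assumptions made for illustration and are not taken from the paper. The thresholding, statistical testing, and LARS-EN steps are not shown.

```python
def snp_window_candidates(sequence, snp_pos, width, alt_allele=None):
    """Collect every window of length `width` that covers the SNP.

    Hypothetical helper, not the authors' implementation.
    sequence   : DNA string for the gene region
    snp_pos    : 0-based index of the SNP in `sequence` (assumed convention)
    width      : window (candidate motif) length
    alt_allele : optional single base substituted at the SNP position,
                 to model a motif introduced by the variant allele
    Returns a list of (start_index, window_string) tuples.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    if not 0 <= snp_pos < len(sequence):
        raise ValueError("snp_pos is outside the sequence")

    if alt_allele is not None:
        if len(alt_allele) != 1:
            raise ValueError("alt_allele must be a single base")
        # Substitute the variant allele before sliding the window.
        sequence = sequence[:snp_pos] + alt_allele + sequence[snp_pos + 1:]

    # The earliest window that still contains the SNP starts width-1 bases
    # upstream; the latest starts at the SNP itself. Both are clipped to
    # the sequence boundaries.
    first = max(0, snp_pos - width + 1)
    last = min(snp_pos, len(sequence) - width)
    return [(s, sequence[s:s + width]) for s in range(first, last + 1)]


# Toy usage: reference windows and windows carrying a variant allele.
reference = snp_window_candidates("ACGTACGT", snp_pos=3, width=3)
variant = snp_window_candidates("ACGTACGT", snp_pos=3, width=3, alt_allele="A")
```

In this sketch a window of width w yields at most w candidates per SNP, fewer near the ends of the sequence. Comparing the reference and variant lists shows which candidate sequences exist only when the mutation is present.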