STAG-CNS: An Order-Aware Conserved Noncoding Sequences Discovery Tool for Arbitrary Numbers of Species
Author affiliations: Department of Agronomy and Horticulture, Center for Plant Science Innovation, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA; Maize Research Institute, Sichuan Agricultural University, Chengdu 611130, China
Publication: Molecular Plant
Year, volume, issue: 2017, Vol. 10, No. 7
Pages: 990-999
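Subject classification: 0710 [Science - Biology]; 07 [Science]; 08 [Engineering]; 09 [Agriculture]; 071007 [Science - Genetics]; 0901 [Agriculture - Crop Science]; 0836 [Engineering - Bioengineering]; 090102 [Agriculture - Crop Genetics and Breeding]
Funding: Supported by internal funding to J.C.S. and J.S.D. and by a China Scholarship Council fellowship awarded to X.L.
Keywords: conserved noncoding sequence; comparative genomics; suffix tree; longest path algorithm; grain crops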
Abstract: One method for identifying noncoding regulatory regions of a genome is to quantify rates of divergence between related species, as functional sequence will generally diverge more slowly. Most alignment-based approaches to identifying these conserved noncoding sequences (CNSs) have had relatively large minimum sequence lengths (≥15 bp) compared with the average length of known transcription factor binding sites. To circumvent this constraint, STAG-CNS, which can simultaneously integrate data from the promoters of conserved orthologous genes in three or more species, was developed. Using data from up to six grass species made it possible to identify conserved sequences as short as 9 bp with a false discovery rate ≤0.05. These CNSs exhibit greater overlap with open chromatin regions identified using DNase I hypersensitivity assays, and are enriched in the promoters of genes involved in transcriptional regulation. STAG-CNS was further employed to characterize loss of conserved noncoding sequences associated with retained duplicate genes from the ancient maize polyploidy. Genes with fewer retained CNSs show lower overall expression, although this bias is more apparent in samples of complex organ systems containing many cell types, suggesting that CNS loss may correspond to a reduced number of expression contexts rather than lower expression levels across the entire ancestral expression domain.
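The keywords point to two ingredients, a suffix tree for finding sequences shared by all species and a longest path search for keeping them in a consistent order. The sketch below illustrates that general idea only and is not the STAG-CNS implementation: it swaps the suffix tree for a plain dictionary lookup, considers only exact k-mers that occur once per sequence, and all function names are hypothetical. It collects k-mers present in every input sequence, then uses dynamic programming to find the longest chain of non-overlapping matches whose order is the same in every sequence.

```python
from collections import defaultdict


def unique_kmer_positions(seq, k):
    """Map each k-mer that occurs exactly once in seq to its start index."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts[0] for kmer, starts in positions.items() if len(starts) == 1}


def shared_kmers(seqs, k):
    """Return (kmer, starts) pairs for k-mers unique in every sequence.

    starts holds one start index per input sequence; the result is sorted
    by position in the first sequence.
    """
    tables = [unique_kmer_positions(s, k) for s in seqs]
    common = set(tables[0]).intersection(*tables[1:])
    pairs = ((kmer, tuple(t[kmer] for t in tables)) for kmer in common)
    return sorted(pairs, key=lambda item: item[1])


def longest_ordered_chain(seqs, k):
    """Longest chain of shared k-mers that is collinear in all sequences.

    Match i may precede match j only if i ends before j starts in every
    sequence, so the matches form a DAG and the chain is its longest path.
    """
    nodes = shared_kmers(seqs, k)
    if not nodes:
        return []
    best = [1] * len(nodes)   # longest chain ending at each node
    prev = [-1] * len(nodes)  # back-pointer used to recover the chain
    for j, (_, pos_j) in enumerate(nodes):
        for i in range(j):
            pos_i = nodes[i][1]
            in_order = all(a + k <= b for a, b in zip(pos_i, pos_j))
            if in_order and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
                prev[j] = i
    j = max(range(len(nodes)), key=best.__getitem__)
    chain = []
    while j != -1:
        chain.append(nodes[j])
        j = prev[j]
    return chain[::-1]


# Toy "promoters": two shared 4-mers in the same order in both sequences.
toy = ["ACGTGGTTCA", "CCACGTATTCA"]
print(longest_ordered_chain(toy, 4))
```

Adding a third sequence in which the two motifs appear in the opposite order, for example `"TTCAGGGACGT"`, shortens the chain to a single match, which is the order-aware behaviour the title refers to. A real tool would also need a significance test such as the false discovery rate mentioned in the abstract, which this sketch does not attempt.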