Data Analysis of Multiplex Sequencing at SOLiD Platform: A Probabilistic Approach to Characterization and Reliability Increase
Author affiliations: Engineering and Geoscience Institute, Federal University of Western Pará (UFOPa), Santarém, Brazil; Technological Institute, Federal University of Pará (UFPa), Belém, Brazil; Institute of Mathematical and Computer Sciences, University of Sao Paulo (USP), Sao Carlos, Brazil; Department of Morphology and Physiological Sciences, State University of Pará, Marabá, Brazil; Biological Science Institute, Federal University of Pará (UFPa), Belém, Brazil; Laboratory of Computing and Applied Mathematics, National Institute for Space Research (INPE), Sao José Dos Campos, Brazil
Publication: American Journal of Molecular Biology
Year, volume, issue: 2018, Vol. 8, No. 1
Pages: 26-38
Subject classification: 1002 [Medicine - Clinical Medicine]; 100214 [Medicine - Oncology]; 10 [Medicine]
Subject: Probabilistic Modeling; Health Informatics; SOLiD; Barcoding System; Statistical Analysis; Multiplex Sequencing
Abstract: New sequencing technologies such as Illumina/Solexa, SOLiD/ABI, and 454/Roche have revolutionized biological research. In this context, the SOLiD platform offers a particular sequencing mode, known as a multiplex run, which enables the sequencing of several samples in a single run. This reduces cost and simplifies the analysis of related samples. However, this sequencing mode requires an additional filtering step to ensure the reliability of the results. We therefore propose in this paper a probabilistic model that considers the intrinsic characteristics of each sequencing run to characterize multiplex runs and filter low-quality data, increasing the reliability of data analysis for multiplex sequencing performed on SOLiD. The results show that the proposed model is satisfactory because it supports: 1) the identification of faults in the sequencing process; 2) the adaptation and development of new protocols for sample preparation; 3) the assignment of a degree of confidence to the data generated; and 4) the guidance of a filtering process that does not discard useful sequences arbitrarily.