A computational framework for improving genetic variants identification from 5,061 sheep sequencing data
Affiliations: Department of Animal, Veterinary and Food Sciences, University of Idaho, Moscow, ID, USA; Superior Farms, California, USA
Journal: Journal of Animal Science and Biotechnology
Year, volume, issue: 2023, Vol. 14, No. 6
Pages: 2332-2344
Funding: Superior Farms sheep producers and IBEST (for their support); financial support from the Idaho Global Entrepreneurial Mission
Keywords: Computational framework; Genetic variants; Multiple samples; Sheep
Abstract: Background Pan-genomics is a recently emerging strategy that can provide a more comprehensive characterization of genetic *** calling is routinely used to combine identified variants across multiple related ***, the improvement of variant identification using mutual support information from multiple samples remains quite limited for population-scale *** In this study, we developed a computational framework for jointly calling genetic variants from 5,061 sheep by incorporating sequencing error and optimizing mutual support information from multiple samples' *** variants were accurately identified from multiple samples in four steps: (1) the probabilities of variants from two widely used algorithms, GATK and Freebayes, were calculated with a Poisson model that incorporates the potential for base sequencing errors; (2) variants with high mapping quality, or consistently identified in at least two samples by GATK and Freebayes, were used to construct the raw high-confidence identification (rHID) variants database; (3) high-confidence variants identified in a single sample were ordered by probability value and controlled by false discovery rate (FDR) using the rHID database; (4) to avoid eliminating potentially true variants from the rHID database, variants that failed FDR were re-examined to rescue potential true variants and ensure highly accurate identification *** results indicated that, after applying the new method, the percentage of concordant SNPs and Indels between Freebayes and GATK improved significantly, by 12%-32%, compared with the raw variants, and that the method found low-frequency variants in individual sheep involved in several traits, including nipple number (GPC5), scrapie pathology (PAPSS2), seasonal reproduction and litter size (GRM1), coat color (RAB27A), and lentivirus susceptibility (TMEM154).
Conclusion The new method uses a computational strategy to reduce the number of false positives and simultaneously improve the identifi
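Step (1) of the abstract, a Poisson model of variant probability that accounts for base sequencing error, can be illustrated with a minimal sketch. The abstract does not give the exact model, so this assumes that each read covering a site contributes its base error probability 10^(-Q/10), that the Poisson rate is the sum of these probabilities, and that the reported value is the probability of seeing at least the observed number of alternate reads from sequencing error alone. The function names `phred_to_error`, `poisson_sf`, and `error_only_pvalue` are hypothetical.

```python
import math
from typing import Sequence


def phred_to_error(q: float) -> float:
    """Convert a Phred-scaled base quality to an error probability."""
    return 10 ** (-q / 10)


def poisson_sf(k: int, lam: float) -> float:
    """Return P(X >= k) for X ~ Poisson(lam)."""
    if k <= 0:
        return 1.0
    if lam <= 0.0:
        return 0.0
    if k <= lam:
        # The upper tail is not tiny here, so 1 - CDF(k - 1) is numerically safe.
        term = math.exp(-lam)  # P(X = 0)
        cdf = term
        for i in range(1, k):
            term *= lam / i  # P(X = i) from P(X = i - 1)
            cdf += term
        return max(0.0, 1.0 - cdf)
    # The upper tail is small: sum it directly to avoid cancellation in 1 - CDF.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))  # P(X = k)
    total = 0.0
    i = k
    while term > 0.0 and term > total * 1e-17:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
    return total


def error_only_pvalue(base_quals: Sequence[float], alt_count: int) -> float:
    """Probability of >= alt_count alternate reads arising from sequencing error alone.

    Assumption (hypothetical model): the Poisson rate is the sum of the
    per-read error probabilities at the site.
    """
    lam = sum(phred_to_error(q) for q in base_quals)
    return poisson_sf(alt_count, lam)
```

For example, 30 reads at Q30 give a rate of about 0.03 expected error reads, and `error_only_pvalue([30] * 30, 10)` is about 1.6e-22, so ten alternate reads are very unlikely to be error alone. The tail is summed directly when k exceeds the rate because 1 - CDF would round such small values to 0.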
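Step (2), building the raw high-confidence identification (rHID) database, reduces to set logic over per-sample calls. The abstract's wording admits more than one reading, so this sketch assumes a variant enters rHID if any of its calls has mapping quality at or above a threshold, or if both GATK and Freebayes report it in the same sample for at least two samples. The `Call` record, `build_rhid`, and the default threshold `min_mapq=50.0` are hypothetical choices for illustration, not values from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)


@dataclass(frozen=True)
class Call:
    """One variant call by one caller in one sample (hypothetical record type)."""
    sample: str
    caller: str  # assumed to be "GATK" or "Freebayes"
    chrom: str
    pos: int
    ref: str
    alt: str
    mapq: float  # mapping quality at the site


def build_rhid(calls: Iterable[Call], min_mapq: float = 50.0, min_samples: int = 2) -> Set[VariantKey]:
    """Collect variant keys for a raw high-confidence (rHID) database.

    Assumed rule: keep a variant if any call of it has mapq >= min_mapq, or if
    both GATK and Freebayes report it in the same sample for >= min_samples samples.
    """
    # variant key -> sample -> set of callers that reported it in that sample
    support: Dict[VariantKey, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    high_mapq: Set[VariantKey] = set()
    for c in calls:
        key = (c.chrom, c.pos, c.ref, c.alt)
        support[key][c.sample].add(c.caller)
        if c.mapq >= min_mapq:
            high_mapq.add(key)

    both_callers = {"GATK", "Freebayes"}
    concordant = {
        key
        for key, per_sample in support.items()
        if sum(1 for callers in per_sample.values() if both_callers <= callers) >= min_samples
    }
    return high_mapq | concordant
```

Requiring both callers within the same sample, rather than anywhere in the cohort, is one conservative reading of "consistently identified"; a looser variant would count samples where either caller reports the site.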
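Steps (3) and (4), ordering one sample's variants by probability, controlling FDR against rHID, and re-examining the failures, can be sketched as follows. This is a swapped-in illustration, not the authors' stated formula: it assumes that a variant absent from rHID counts as a putative false positive, estimates FDR at each rank as the fraction of such variants so far, keeps the longest prefix within `alpha`, and rescues a failed variant simply if it is present in rHID. The function name `fdr_filter_and_rescue` is hypothetical.

```python
from typing import Hashable, Iterable, List, Set, Tuple


def fdr_filter_and_rescue(
    scored: Iterable[Tuple[Hashable, float]],
    rhid: Set[Hashable],
    alpha: float = 0.05,
) -> Tuple[List[Hashable], List[Hashable], List[Hashable]]:
    """Return (passed, rescued, rejected) variant keys for one sample.

    scored: (variant_key, p_value) pairs; a smaller p-value means stronger support.
    Assumptions: a variant absent from rhid counts as a putative false positive
    when estimating FDR, and a variant failing the cutoff is rescued if it is in rhid.
    """
    ranked = sorted(scored, key=lambda kv: kv[1])  # strongest support first
    cutoff = 0
    false_hits = 0
    for i, (key, _) in enumerate(ranked, start=1):
        if key not in rhid:
            false_hits += 1
        # Keep the longest prefix whose estimated FDR stays within alpha.
        if false_hits / i <= alpha:
            cutoff = i
    passed = [key for key, _ in ranked[:cutoff]]
    failed = [key for key, _ in ranked[cutoff:]]
    rescued = [key for key in failed if key in rhid]
    rejected = [key for key in failed if key not in rhid]
    return passed, rescued, rejected
```

Taking the longest qualifying prefix, rather than stopping at the first rank that exceeds `alpha`, lets a few non-rHID variants with strong probabilities pass alongside well-supported ones, which mirrors how the abstract describes FDR as a control on the ranked list rather than a per-variant filter.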