Enhancing Generic Reaction Yield Prediction through Reaction Condition-Based Contrastive Learning
Author affiliations: Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery, State Key Laboratory of Quality Research in Chinese Medicine, Macao Institute for Applied Research in Medicine and Health, Macao University of Science and Technology, Macao 999078, China; Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China; Faculty of Applied Sciences, Macao Polytechnic University, Macao 999078, China; CarbonSilicon AI Technology Co., Ltd, Hangzhou, Zhejiang 310018, China; Center of Chemistry and Chemical Biology, Guangzhou Regenerative Medicine and Health Guangdong Laboratory, Guangzhou 510530, China.
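Publication: Research (Chinese title: 研究(英文))
Year/Volume/Issue: 2024, Issue 3
Pages: 685-702
Core indexing:
Subject classification: 081704 [Engineering - Applied Chemistry]; 07 [Science]; 08 [Engineering]; 0817 [Engineering - Chemical Engineering and Technology]; 070303 [Science - Organic Chemistry]; 0703 [Science - Chemistry]
Funding: the Science and Technology Development Fund, Macao SAR (file nos. 0056/2020/AMJ, 0114/2020/A3, and 0015/2019/AMJ); Dr. Neher’s Biophysics Laboratory for Innovative Drug Discovery (file no. 002/2023/ALC)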
Abstract: Deep learning (DL)-driven efficient synthesis planning may profoundly transform the paradigm for designing novel pharmaceuticals and ***, the progress of many DL-assisted synthesis planning (DASP) algorithms has suffered from the lack of reliable automated pathway evaluation *** a critical metric for evaluating chemical reactions, accurate prediction of reaction yields helps improve the practicality of DASP algorithms in the real-world ***, accurately predicting yields of interesting reactions still faces numerous challenges, mainly including the absence of high-quality generic reaction yield datasets and robust generic yield *** compensate for the limitations of high-throughput yield datasets, we curated a generic reaction yield dataset containing 12 reaction categories and rich reaction condition ***, by utilizing 2 pretraining tasks based on chemical reaction masked language modeling and contrastive learning, we proposed a powerful bidirectional encoder representations from transformers (BERT)-based reaction yield predictor named *** achieved comparable or even superior performance to the best previous models on 4 benchmark datasets and established state-of-the-art performance on the newly curated *** found that reaction-condition-based contrastive learning enhances the model’s sensitivity to reaction conditions, and Egret is capable of capturing subtle differences between reactions involving identical reactants and products but different reaction ***, we proposed a new scoring function that incorporated Egret into the evaluation of multistep synthesis *** results showed that yield-incorporated scoring facilitated the prioritization of literature-supported high-yield reaction pathways for target *** addition, through meta-learning strategy, we further improved the reliability of the model’s prediction for reaction types with limited data and lower data *** r