Synthetic Data Generation and Shuffled Multi-Round Training Based Offline Handwritten Mathematical Expression Recognition
Author affiliation: School of Computer Science and Technology, University of Science and Technology of China, Hefei 230022, China
Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)
Year, volume, issue: 2022, Vol. 37, No. 6
Pages: 1427-1443
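Subject classification: 08 [Engineering]; 0835 [Engineering: Software Engineering]; 081202 [Engineering: Computer Software and Theory]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in Engineering or Science)]
Funding: National Key Research and Development Program of China, No. 2020YFB1313602
Keywords: handwritten mathematical expression recognition; offline; synthetic data generation; training strategy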
Abstract: Offline handwritten mathematical expression recognition is a challenging optical character recognition (OCR) task due to various ambiguities of handwritten symbols and complicated two-dimensional *** work in this area usually constructs deeper and deeper neural networks trained with end-to-end approaches to improve the ***, the higher the complexity of the network, the more the computing resources and time *** improve the performance without more computing requirements, we concentrate on the training data and the training strategy in this *** propose a data augmentation method which can generate synthetic samples with new LaTeX notations by only using the official training data of ***, we propose a novel training strategy called Shuffled Multi-Round Training (SMRT) to regularize the *** the generated data and the shuffled multi-round training strategy, we achieve the state-of-the-art result in expression accuracy, i.e., 59.74% and 61.57% on CROHME 2014 and 2016, respectively, by using attention-based encoder-decoder models for offline handwritten mathematical expression recognition.