An analysis of correctness for API recommendation: are the unmatched results useless?
Author affiliation: School of Computer Science and Engineering, Southeast University
Publication: 《Science China(Information Sciences)》 (中国科学:信息科学(英文版))
Year/Volume/Issue: 2020, Vol. 63, No. 9
Pages: 43-57
Subject classification: 08 [Engineering] 0835 [Engineering - Software Engineering] 081202 [Engineering - Computer Software and Theory] 0812 [Engineering - Computer Science and Technology (degrees conferrable in Engineering or Science)]
Funding: supported in part by the National Key R&D Program of China (Grant No. 2018YFB100-3900); in part by the National Natural Science Foundation of China (Grant Nos. 61402103, 61572126, 61872078); in part by the Open Research Fund of the Key Laboratory of Safety-Critical Software (Nanjing University of Aeronautics and Astronautics) (Grant No. NJ2019006); and in part by the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (Grant No. 93K-9)
Keywords: API recommendation; onsite programming; correctness; evaluation of recommendation
Abstract: API recommendation is a promising approach that is widely used during software development. However, the evaluation of API recommendation has not been explored with sufficient rigor. Current evaluation of API recommendation mainly focuses on correctness, which is measured by matching recommended results against ground-truth results. In most cases, there is only one set of ground-truth APIs for each recommendation attempt, yet the target code can be implemented in dozens of ways. This neglect of code diversity introduces a possible defect into the evaluation. To address the problem, we invite 15 developers to analyze the unmatched results in a user study. The online evaluation confirms that some unmatched APIs can also benefit programming owing to their functional correlation with the ground-truth APIs. We then measure API functional correlation based on relationships extracted from an API knowledge graph, API method names, and API documentation. Furthermore, we propose an approach to improve the measurement of correctness based on API functional correlation. Our measurement is evaluated on a dataset of 6141 requirements and historical code fragments from related commits. The results show that 28.2% of unmatched APIs contribute to correctness in our experiments.
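The correlation-aware correctness measurement described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the API names and the correlation table below are hypothetical stand-ins for the signals the paper derives from an API knowledge graph, method names, and documentation. The sketch only shows the scoring idea, namely that an unmatched recommended API still counts toward correctness if it is sufficiently correlated with some ground-truth API.

```python
# Hypothetical functional-correlation scores in [0, 1] between API pairs.
# In the paper these would come from knowledge-graph relationships,
# method-name similarity, and documentation; here they are made up.
CORRELATION = {
    ("java.io.BufferedReader.readLine", "java.util.Scanner.nextLine"): 0.8,
}

def correlation(a: str, b: str) -> float:
    """Symmetric lookup of the (assumed) functional-correlation score."""
    if a == b:
        return 1.0
    return max(CORRELATION.get((a, b), 0.0), CORRELATION.get((b, a), 0.0))

def correctness(recommended: list[str], ground_truth: set[str],
                threshold: float = 0.5) -> float:
    """Fraction of recommended APIs counted as correct: exact matches,
    plus unmatched APIs whose best correlation with any ground-truth
    API reaches the threshold."""
    if not recommended:
        return 0.0
    hits = 0
    for api in recommended:
        if api in ground_truth or any(
                correlation(api, gt) >= threshold for gt in ground_truth):
            hits += 1
    return hits / len(recommended)

# One recommendation is an exact miss but functionally correlated,
# the other is unrelated, so half of the list is credited.
recs = ["java.util.Scanner.nextLine", "java.lang.Math.abs"]
truth = {"java.io.BufferedReader.readLine"}
print(correctness(recs, truth))  # 0.5
```

Under a strict exact-match metric the score above would be 0.0; the threshold parameter controls how much functional correlation is required before an unmatched API earns credit.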