Optimal Dependence of Performance and Efficiency of Collaborative Filtering on Random Stratified Subsampling
Author affiliation: Department of Computational Data Science and Engineering, North Carolina A&T State University, Greensboro, NC 27401, USA
Publication: Big Data Mining and Analytics
Year, volume, issue: 2022, Vol. 5, No. 3
Pages: 192-205
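Subject classification: 081203 [Engineering - Computer Application Technology]; 08 [Engineering]; 0835 [Engineering - Software Engineering]; 0812 [Engineering - Computer Science and Technology (engineering or science degrees may be awarded)]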
Keywords: Collaborative Filtering (CF); subsampling; Training Time Improvement (TTI); performance loss; Recommendation System (RS); collaborative filtering; optimal solutions; rating matrix
Abstract: Dropping fractions of users or items judiciously can reduce the computational cost of Collaborative Filtering (CF) *** effect of this subsampling on the computing time and accuracy of CF is not fully understood, and clear guidelines for selecting optimal or even appropriate subsampling levels are not *** this paper, we present a Density-based Random Stratified Subsampling using Clustering (DRSC) algorithm in which the desired Fraction of Users Dropped (FUD) and Fraction of Items Dropped (FID) are specified, and the overall density during subsampling is ***, we develop simple models of the Training Time Improvement (TTI) and the Accuracy Loss (AL) as functions of FUD and FID, based on extensive simulations of seven standard CF algorithms as applied to various primary matrices from MovieLens, Yahoo Music Rating, and Amazon Automotive *** show that both TTI and a scaled AL are bi-linear in FID and FUD for all seven *** TTI linear regression of a CF method appears to be the same for all *** simulations illustrate that TTI can be estimated reliably with FUD and FID only, but AL requires considering additional dataset *** derived models are then used to optimize the levels of subsampling, addressing the tradeoff between TTI and AL. A simple sub-optimal approximation was found, in which the optimal AL is proportional to the optimal Training Time Reduction Factor (TTRF) for higher values of TTRF, and the optimal subsampling levels, such as the optimal FID/(1-FID), are proportional to the square root of TTRF.
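The abstract's statement that TTI is bi-linear in FUD and FID can be illustrated with a small regression sketch. It assumes "bi-linear" means the form TTI ≈ a + b·FUD + c·FID + d·FUD·FID, which the abstract does not spell out. The function names `fit_bilinear` and `predict_bilinear`, the coefficient values, and the measurements are all hypothetical and synthetic; none of them come from the paper.

```python
import numpy as np


def fit_bilinear(fud, fid, y):
    """Least-squares fit of y ~ a + b*FUD + c*FID + d*FUD*FID.

    Assumed model form; the paper's exact parameterization may differ.
    """
    fud = np.asarray(fud, dtype=float)
    fid = np.asarray(fid, dtype=float)
    # Design matrix: intercept, two main effects, and the interaction term.
    X = np.column_stack([np.ones_like(fud), fud, fid, fud * fid])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef


def predict_bilinear(coef, fud, fid):
    """Evaluate the fitted bi-linear model at the given FUD and FID."""
    a, b, c, d = coef
    return a + b * fud + c * fid + d * fud * fid


# Synthetic data only: a 5x5 grid of subsampling levels with made-up
# coefficients and a little noise standing in for timing measurements.
rng = np.random.default_rng(0)
fud_grid, fid_grid = np.meshgrid(np.linspace(0.0, 0.8, 5),
                                 np.linspace(0.0, 0.8, 5))
fud, fid = fud_grid.ravel(), fid_grid.ravel()
true_coef = np.array([0.0, 0.9, 0.7, -0.6])  # hypothetical values
tti = predict_bilinear(true_coef, fud, fid) + rng.normal(0.0, 0.002, fud.size)

coef = fit_bilinear(fud, fid, tti)
print("fitted coefficients (a, b, c, d):", np.round(coef, 3))
```

With real measurements, `fud`, `fid`, and `tti` would be replaced by the subsampling levels used and the observed training time improvements. The same fit could then be repeated for the scaled AL, which the abstract says also depends on additional dataset properties.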