HXPY: A High-Performance Data Processing Package for Financial Time-Series Data
Author affiliations: The Hong Kong University of Science and Technology, Hong Kong, China; International Digital Economy Academy, Shenzhen 518048, China; The Hong Kong University of Science and Technology (Guangzhou), Guangzhou 511455, China
Publication: Journal of Computer Science & Technology (计算机科学技术学报, English edition)
Year, volume, issue: 2023, Vol. 38, No. 1
Pages: 3-24
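Subject classification: 12 [Management] 1201 [Management: Management Science and Engineering (management or engineering degrees may be awarded)]
Keywords: dataframe; time-series data; SIMD (single instruction multiple data); CUDA (Compute Unified Device Architecture)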
Abstract: A tremendous amount of data is generated by global financial markets every day, and such time-series data needs to be analyzed in real time to explore its potential *** recent years, we have witnessed the successful adoption of machine learning models on financial data, where the need for both accuracy and timeliness demands highly effective computing ***, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as outlier handling with stock suspension in Pandas and *** this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series *** supports a range of acceleration techniques, such as the streaming algorithm, the vectorization instruction set, and memory optimization, together with various functions, such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment *** results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its *** MiBs to GiBs of data, HXPY significantly outperforms other in-memory dataframe computing rivals, by up to hundreds of times.
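The abstract names a streaming algorithm and time window functions, and mentions outlier handling with stock suspension. As a rough illustration of what a streaming window function can look like (this is not HXPY's code or API, which the abstract does not show), the hypothetical Python sketch below, `streaming_rolling_mean`, updates a rolling mean in O(1) per row by adding the value entering the window and subtracting the one leaving it, instead of re-summing the whole window. It assumes, for illustration only, that a suspended trading day is encoded as NaN and should be skipped rather than averaged in.

```python
import math

def streaming_rolling_mean(values, window):
    """Hypothetical streaming rolling mean over the last `window` entries.

    Each step costs O(1): the incoming value is added to a running sum and the
    value sliding out of the window is subtracted. NaN entries (assumed here to
    mark a suspended trading day) occupy a window slot but are excluded from
    both the sum and the count.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    out = []
    total = 0.0  # sum of the non-NaN values currently inside the window
    count = 0    # number of non-NaN values currently inside the window
    for i, x in enumerate(values):
        if not math.isnan(x):
            total += x
            count += 1
        if i >= window:
            old = values[i - window]  # value sliding out of the window
            if not math.isnan(old):
                total -= old
                count -= 1
        # Mean of the valid values in the (possibly partial) window; NaN if none.
        out.append(total / count if count else math.nan)
    return out
```

For example, `streaming_rolling_mean([1.0, 2.0, math.nan, 4.0, 5.0], 3)` returns `[1.0, 1.5, 1.5, 3.0, 4.5]`: the suspended (NaN) day is ignored instead of propagating into every later window. Unlike pandas' default `rolling(window)`, this sketch does not mask the partial windows at the start of the series.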