Distributed top-k similarity query on big trajectory streams
大轨道溪流上的分布式的 top-k 类似质问作者机构:School of Data Science and Engineering East China Normal University Shanghai 200062 China Computer School China West Normal University Nanchong 637009 China
出 版 物:《Frontiers of Computer Science》 (中国计算机科学前沿(英文版))
年 卷 期:2019年第13卷第3期
页 面:647-664页
核心收录:
学科分类:12[管理学] 1201[管理学-管理科学与工程(可授管理学、工学学位)] 08[工学]
基 金:Shanghai Knowledge Service Platform Project, (ZF1213) National Natural Science Foundation of China, NSFC, (61370101, 61402180, 61532021, U1401256, U1501252) National Basic Research Program of China (973 Program), (2016YFB1000905)
主 题:top-k similarity query trajectory stream communication cost
摘 要:Recently, big trajectory data streams are generated in distributed environments with the popularity of smartphones and other mobile devices. Distributed top?k similarity query, which finds k trajectories that are most similar to a given query trajectory from all remote sites, is critical in this field. The key challenge in such a query is how to reduce the communication cost due to the limited network bandwidth resource. Although this query can be solved by sending the query trajectory to all the remote sites, in which the pairwise similarities are computed precisely. However, the overall cost, O(n·m),is huge when nor mis huge, where n is the size of query trajectory and m is the number of remote sites. Fortunately, there are some cheap ways to estimate pairwise similarity, which filter some trajectories in advance without precise computation. In order to overcome the challenge in this query, we devise two general frameworks, into which concrete distance measures can be plugged. The former one uses two bounds (the upper and lower bound), while the latter one only uses the lower bound. Moreover, we introduce detailed implementations of two representative distance measures, Euclidean and DTW distance, after inferring the lower and upper bound for the former framework and the lower bound for the latter one. Theoretical analysis and extensive experiments on real-world datasets evaluate the efficiency of proposed methods.