Data-driven human and bot recognition from web activity logs based on hybrid learning techniques
Author affiliations: Systems Research Institute, Polish Academy of Sciences, Warsaw, Poland; Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland; EDGE NPD Ltd. Co., Warsaw, Poland
Publication: Digital Communications and Networks (数字通信与网络, English edition)
Year, volume, issue: 2024, Vol. 10, No. 4
Pages: 1178-1188
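Subject classification: 08 [Engineering]; 0812 [Engineering: Computer Science and Technology (degrees may be conferred in Engineering or Science)]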
Funding: Supported by the ABT SHIELD (Anti-Bot and Trolls Shield) project at the Systems Research Institute, Polish Academy of Sciences, in cooperation with EDGE NPD RPMA.01.02.00-14-B448/18-00 funded by the Regional Development Fund for the development of Mazovia
Topics: Web logs; Classification; Clustering; Web traffic; Bots; Interpretability
Abstract: Distinguishing between web traffic generated by bots and humans is an important task in the evaluation of online marketing ***. One of the main challenges is related to only partial availability of the performance metrics: although some users can be unambiguously classified as bots, the correct label is uncertain in many ***. This calls for the use of classifiers capable of explaining their ***. This paper demonstrates two such mechanisms based on features carefully engineered from web ***. The first is a man-made rule-based ***. The second is a hierarchical model that first performs clustering and next classification using human-centred, interpretable ***. The stability of the proposed methods is analyzed and a minimal set of features that convey the class-discriminating information is ***. The proposed data processing and analysis methodology are successfully applied to real-world data sets from online publishers.
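Illustration: the "cluster first, then classify with an interpretable model" idea from the abstract can be sketched in a few lines of Python. This is only a toy sketch under stated assumptions, not the authors' method: plain k-means and a one-threshold rule (decision stump) per cluster stand in for whatever clustering and classification algorithms the paper uses, and the two features (requests per minute, fraction of requests for images), the data, and all function names are hypothetical.

```python
from collections import Counter

def nearest(p, centroids):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))

def kmeans(points, k, iters=20):
    """Plain k-means; deterministic init from the first k distinct points."""
    centroids = list(dict.fromkeys(points))[:k]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [tuple(sum(col) / len(g) for col in zip(*g)) if g else c
                     for g, c in zip(groups, centroids)]
    return centroids

def fit_stump(rows, labels):
    """Best single rule 'feature <= threshold' by training errors.
    Returns (feature, threshold, label_if_le, label_if_gt); feature is None
    when no split beats predicting the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    best = (sum(l != majority for l in labels), None, None, majority, majority)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [l for r, l in zip(rows, labels) if r[f] <= t]
            right = [l for r, l in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            ll = Counter(left).most_common(1)[0][0]
            rl = Counter(right).most_common(1)[0][0]
            errors = sum(l != ll for l in left) + sum(l != rl for l in right)
            if errors < best[0]:
                best = (errors, f, t, ll, rl)
    return best[1:]

def fit(rows, labels, k=2):
    """Stage 1: cluster all sessions. Stage 2: one interpretable rule per cluster."""
    centroids = kmeans(rows, k)
    rules = []
    for c in range(len(centroids)):
        members = [(r, l) for r, l in zip(rows, labels) if nearest(r, centroids) == c]
        if members:
            rules.append(fit_stump([r for r, _ in members], [l for _, l in members]))
        else:
            rules.append((None, None, "unknown", "unknown"))
    return centroids, rules

def predict(model, row):
    """Return (label, human-readable reason) for one session."""
    centroids, rules = model
    c = nearest(row, centroids)
    f, t, le, gt = rules[c]
    if f is None:
        return le, f"cluster {c}: always {le}"
    label = le if row[f] <= t else gt
    return label, f"cluster {c}: feature {f} <= {t} -> {le}, else {gt}"

# Hypothetical sessions: (requests_per_minute, fraction_of_requests_for_images)
rows = [(2.0, 0.6), (3.0, 0.5), (4.0, 0.7), (3.0, 0.0),
        (2.0, 0.1), (60.0, 0.0), (80.0, 0.1), (70.0, 0.0)]
labels = ["human", "human", "human", "bot", "bot", "bot", "bot", "bot"]

model = fit(rows, labels, k=2)
for session in [(3.0, 0.4), (2.5, 0.0), (90.0, 0.5)]:
    print(session, predict(model, session))
```

Each prediction comes with the rule that produced it, which is the property the abstract asks for when labels are uncertain: a reviewer can read the rule and judge whether it is plausible.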