Abstract
Online social networks (OSNs) have seen a remarkable rise in the presence of surreptitious automated accounts. Massive human user-base and business-supportive operating model of social networks (such as Twitter) facilitates the creation of automated agents. In this paper we outline a systematic methodology and train a classifier to categorise Twitter accounts into ‘automated’ and ‘human’ users. To improve classification accuracy we employ a set of novel steps. First, we divide the dataset into four popularity bands to compensate for differences in types of accounts. Second, we create a large ground truth dataset using human annotations and extract relevant features from raw tweets. To judge accuracy of the procedure we calculate agreement among human annotators as well as with a bot detection research tool. We then apply a Random Forests classifier that achieves an accuracy close to human agreement. Finally, as a concluding step we perform tests to measure the efficacy of our results.Index Terms
Social network analysis; account classification; automated agents; bot detectionOur work has the following contributions:
(i) Use of raw historical data (60 million tweets) for attribute collection and account classification (722; 109 tweets) to cater for stealthier agents that are harder to discern from humans; (ii) A Twitter dataset divided into user popularity bands, further partitioned into lists of agents and humans (for reasons refer to xIV) using a human annotation task. This serves as a large ground truth dataset; (iii) 14 novel features from a total feature-set of 21 attributes (see xIV); (iv) Performance evaluation of current state of the art in bot detection by calculating agreement between human annotators and BOTORNOT; (v) Application of supervised learning approach – Random Forests classifier – for non-partisan account categorisation; (vi) Identification of a distinct group of features (using ablation tests) that are most informative for classifying automated agents within each popularity band (cf. Table VIII); and (vii) Hypotheses (cf. Table I) verification against our findings using t-tests (see xVI).Infotainment
References
12: Datasets can be found here – https://goo.gl/SigsQB. Classifier is available as a part of Stweeler. The link is forbidden for public.
Pages
- Index of Lessons in Technology
- Index of Book Summaries
- Index of Book Lists And Downloads
- Index For Job Interviews Preparation
- Index of "Algorithms: Design and Analysis"
- Python Course (Index)
- Data Analytics Course (Index)
- Index of Machine Learning
- Postings Index
- Index of BITS WILP Exam Papers and Content
- Lessons in Investing
- Index of Math Lessons
- Downloads
- Index of Management Lessons
- Book Requests
- Index of English Lessons
- Index of Medicines
- Index of Quizzes (Educational)
Wednesday, August 10, 2022
Classification of Twitter Accounts into Automated Agents and Human Users (Zafar Gilani, Jul 2022)
Download Research Paper
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment