Thesis Format
Monograph
Degree
Master of Science
Program
Computer Science
Supervisor
Sedig, Kamran
2nd Supervisor
Haque, Anwar
Co-Supervisor
Abstract
Social media platforms produce massive amounts of data. Twitter, as one such platform, enables users to post “tweets” on an unprecedented scale. Once analyzed in aggregate with machine learning (ML) techniques, Twitter data can be an invaluable resource for gaining insight. However, when applied to real-time data streams, existing ML approaches are affected by covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms), which result in different types of biases and uncertain outputs. This thesis describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and ML) to help users make sense of the real-time streams on Twitter. As proofs of concept, public-health and political discussions were analyzed. The system not only provides categorized and aggregate results but also enables stakeholders to diagnose errors in the outcome and to heuristically suggest fixes for them.
Summary for Lay Audience
Social media platforms produce massive amounts of data. Twitter, as a microblogging social media platform, enables users to post short updates as “tweets” on an unprecedented scale. Once analyzed in aggregate with machine learning (ML) techniques, Twitter data can be an invaluable resource for gaining insight. However, when applied to real-time data streams, existing ML approaches are affected by covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms), which result in different types of biases and uncertain outputs. This thesis describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and ML) to help users monitor, analyze, and make sense of the streams of discussions on Twitter in real time. The system helps users understand “who” is talking about “what,” and “how” and/or “why” a tweet is posted. As case studies, we use public-health and election discussions to demonstrate the capabilities of the system. The system not only provides categorized and aggregate results of such discussions but also enables stakeholders to diagnose errors in the outcome and to heuristically suggest fixes for them, resulting in a more detailed understanding of the discussions.
Recommended Citation
HaghighatiMaleki, Amir, "A Visual Analytics System for Making Sense of Real-Time Twitter Streams" (2020). Electronic Thesis and Dissertation Repository. 6809.
https://ir.lib.uwo.ca/etd/6809
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Included in
Artificial Intelligence and Robotics Commons, Computer and Systems Architecture Commons, Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons, Other Computer Engineering Commons, Science and Technology Studies Commons, Social Statistics Commons, Software Engineering Commons, Systems Architecture Commons