Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Computer Science

Supervisor

Sedig, Kamran

2nd Supervisor

Haque, Anwar

Co-Supervisor

Abstract

Social media platforms produce massive amounts of data. Twitter, as one such platform, enables users to post “tweets” on an unprecedented scale. When analyzed in aggregate with machine learning (ML) techniques, Twitter data can be an invaluable resource for gaining insight. However, when existing ML approaches are applied to real-time data streams, covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms) introduce different types of biases and make the outputs uncertain. This thesis describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and ML) to help users make sense of real-time streams on Twitter. As proofs of concept, public-health and political discussions were analyzed. The system not only provides categorized and aggregate results but also enables stakeholders to diagnose errors in the outcome and to heuristically suggest fixes for them.

Summary for Lay Audience

Social media platforms produce massive amounts of data. Twitter, a microblogging social media platform, enables users to post short updates, called “tweets”, on an unprecedented scale. When analyzed in aggregate with machine learning (ML) techniques, Twitter data can be an invaluable resource for gaining insight. However, when existing ML approaches are applied to real-time data streams, covariate shifts in the data (i.e., changes in the distributions of the inputs of ML algorithms) introduce different types of biases and make the outputs uncertain. This thesis describes a visual analytics system (i.e., a tool that combines data visualization, human-data interaction, and ML) to help users monitor, analyze, and make sense of streams of discussions on Twitter in real time. The system helps users understand “who” is talking about “what”, and “how” and/or “why” a tweet is posted. As case studies, we use public-health and election discussions to demonstrate the capabilities of the system. The system not only provides categorized and aggregate results of such discussions but also enables stakeholders to diagnose errors in the outcome and to heuristically suggest fixes for them, resulting in a more detailed understanding of the discussions.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
