Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Computer Science


Sedig, Kamran


With the rapid growth in the number of scientific documents, researchers must examine large document collections to keep up with their research fields. Numerous tools have been developed to help researchers make sense of such collections; however, because of the volume and complexity of scientific information, many of these tools cover only basic tasks or a restricted set of information items. This thesis describes a visual analytics system (i.e., a tool that integrates data visualization, human-data interaction, and machine learning) that helps researchers explore and examine scientific documents thoroughly and rapidly, with a particular focus on the textual content of those documents. Through a comparative usage scenario, we illustrate the efficiency and advantages of our system over similar tools. Finally, we discuss possible future extensions and upgrades enabled by the system's modular architecture.

Summary for Lay Audience

With the rapid growth of scientific documents and interdisciplinary studies in recent years, researchers have found it challenging to stay up to date with their research fields. Searching and exploring, filtering, reading, extracting information items, and comparing are common tasks that researchers perform to make sense of the document collections they work with. Many computational tools have been developed to support researchers in these tasks; however, they often support only some of the tasks, or only specific information items of scientific documents (e.g., the bibliographic information of document collections). Moreover, because of the complex structure and high volume of scientific information, the visualization techniques used in the interfaces of existing tools require researchers to perform extra interactions to access the information they need.

This thesis describes a visual analytics system (i.e., a tool that integrates machine learning algorithms with data visualization and human-data interaction) with an innovative visual interface design that affords researchers rapid sensemaking of scientific documents. By combining an integrated visualization component with advanced text analytics approaches, we designed a system that not only encodes a broad spectrum of scientific information items (ranging from bibliographic information to attributes derived from the textual content of scientific documents) but also supports a wide range of activities during the sensemaking process (e.g., rapid exploration, skimming, comparison). As a proof of concept, we provide a scenario in which we examine the efficiency of different components of our system compared to existing tools. Finally, we analyze the limitations and possible future extensions of our system.

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.