Electronic Thesis and Dissertation Repository

Thesis Format



Master of Science


Computer Science


Parisa Shooshtari


In many cases, combining multiple data types gives researchers deeper insight into a subject than analyzing a single dataset. Biological research can benefit from such integration as well. For instance, genome-wide association study (GWAS) data, which describes variation in DNA, provides little information on its own about the specific biological components that matter for a trait of interest. However, when combined with sequencing data such as chromatin accessibility or gene expression data, it can help identify the biological elements that are significant for that trait. In this study, I apply multiple statistical and machine learning-based integration methods to GWAS and sequencing data to find the tissues and cell types relevant to schizophrenia and the specific regulatory elements affected by this complex mental disease.

Summary for Lay Audience

As technology progresses, new datasets are generated at an ever faster pace in every field. Each of these datasets gives researchers more information about the subject of their study, but combining them can reveal insights that would be missed if each were analyzed alone. Data integration methods aim to fuse datasets together to gain better insight into the subject of interest.

Biological studies can benefit from data integration too. In this thesis, I apply three data integration methods to multiple biological datasets to obtain a deeper understanding of a complex mental disease called schizophrenia.

Some biological data, such as data on variations in DNA sequences, cannot on their own tell us much about the functional elements that play a role in a disease of interest such as schizophrenia. However, when combined with other biological data types, such as datasets generated by mapping small parts of DNA to the whole DNA sequence (sequencing data), they enable us to find the specific biological components that are important in schizophrenia.

I apply the integration methods to mouse and human datasets to find the cell types that are important in schizophrenia, as well as the biological components affected by this disease. Finally, I offer suggestions to help researchers develop further integration frameworks in the future.

Included in

Data Science Commons