Statistical methods for the analysis of RNA sequencing data

Man-Kee Maggie Chu, The University of Western Ontario

Degree

Doctor of Philosophy

Program

Statistics and Actuarial Sciences

Supervisor

Dr. Wenqing He

Abstract

RNA sequencing (RNA-seq), a next-generation sequencing technology, is increasingly preferred over traditional microarrays for transcriptome analysis. The two technologies call for different statistical methods: array-based technology measures expression as intensities, which are modeled with continuous distributions, whereas RNA-seq provides absolute quantification of gene expression as counts of reads. Reliable statistical methods are needed to exploit the information from rapidly evolving sequencing technologies, and limited work has been done on expression analysis of time-course RNA-seq data. In this dissertation, we propose a model-based clustering method for identifying gene expression patterns in time-course RNA-seq data. Our approach uses a longitudinal negative binomial mixture model to describe the over-dispersed time-course gene count data. We also modify common initialization procedures to suit our model-based clustering algorithm. The effectiveness of the proposed methods is assessed using simulated data and illustrated with real data from time-course genomic experiments. Another common issue in gene expression analysis is the presence of missing values in the datasets. Various treatments of missing values in genomic datasets have been developed, but limited work has been done on RNA-seq data. In the current work, we examine the performance of various imputation methods and their impact on the clustering of time-course RNA-seq data. We develop a cluster-based imputation method that is specifically suited to missing values in RNA-seq datasets. Simulation studies are provided to assess the performance of the proposed imputation approach.
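
To make the idea of clustering count profiles with a negative binomial mixture concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It rests on simplifying assumptions that do not come from the abstract: the dispersion parameter `size` is fixed and shared by all clusters, time points are treated as independent given the cluster (so no longitudinal correlation is modeled), and the starting mean profiles are taken from gene rows chosen by the caller. The names `nb_logpmf`, `em_nb_mixture` and `init_rows` are hypothetical.

```python
import math


def nb_logpmf(y, mu, size):
    """Log negative binomial probability of count y, given mean mu and dispersion size."""
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )


def em_nb_mixture(counts, init_rows, size=5.0, n_iter=50):
    """Cluster genes (rows of counts across time points) with an EM algorithm.

    Assumptions (illustrative only): fixed shared dispersion `size`,
    independent time points within a cluster, and initial cluster mean
    profiles copied from the gene rows listed in `init_rows`.
    Returns (labels, means, weights).
    """
    n_genes, n_times = len(counts), len(counts[0])
    n_clusters = len(init_rows)
    # Start each cluster at one gene's profile; floor at 0.5 to avoid log(0).
    means = [[max(float(y), 0.5) for y in counts[i]] for i in init_rows]
    weights = [1.0 / n_clusters] * n_clusters
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each gene,
        # computed on the log scale and normalized stably.
        resp = []
        for row in counts:
            logp = [
                math.log(weights[k])
                + sum(nb_logpmf(y, means[k][t], size) for t, y in enumerate(row))
                for k in range(n_clusters)
            ]
            top = max(logp)
            p = [math.exp(v - top) for v in logp]
            total = sum(p)
            resp.append([v / total for v in p])
        # M-step: with a fixed dispersion, the weighted sample mean is the
        # maximum likelihood estimate of each cluster mean at each time point.
        for k in range(n_clusters):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weights[k] = max(nk / n_genes, 1e-12)
            for t in range(n_times):
                weighted_sum = sum(r[k] * row[t] for r, row in zip(resp, counts))
                means[k][t] = max(weighted_sum / nk, 1e-6)
    # Assign each gene to its most probable cluster.
    labels = [max(range(n_clusters), key=lambda k: r[k]) for r in resp]
    return labels, means, weights


if __name__ == "__main__":
    # Toy data: three genes with rising expression, three with falling expression.
    toy = [
        [5, 10, 20, 40], [6, 12, 18, 38], [4, 9, 22, 41],
        [40, 20, 10, 5], [38, 22, 9, 6], [42, 19, 11, 4],
    ]
    labels, means, weights = em_nb_mixture(toy, init_rows=[0, 3])
    print(labels)
    print(means)
```

Because EM can converge to different local optima depending on where it starts, the choice of `init_rows` matters in this sketch. A fuller treatment would also estimate the dispersion and account for dependence between time points.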

Recommended Citation

Chu, Man-Kee Maggie, "Statistical methods for the analysis of RNA sequencing data" (2014). Electronic Thesis and Dissertation Repository. 1935.
https://ir.lib.uwo.ca/etd/1935

Included in

Biostatistics Commons, Longitudinal Data Analysis and Time Series Commons
