Date of Award

2009

Degree Type

Thesis

Degree Name

Master of Science

Program

Computer Science

Supervisor

Lila Kari

Second Advisor

Kathleen Hill

Abstract

String complexity has many definitions: Kolmogorov complexity [30]; Lempel-Ziv complexity [14] [27]; Linguistic complexity [42], Subword complexity [10] etc. In this thesis we will consider the n-subword complexity studied in [2] and [13]. The n-subword complexity Pw(n) of a genomic sequence w was defined in [13] as the number of distinct factors (subwords) of length n that occur in w. In [2] a new measure called the n-subword deficit was defined as the difference between the number of subwords of length n of a genomic sequence w and of a random genomic sequence of the same length. This definition was applied to short sequences (2000 base pairs). In this thesis, we will expand this definition to be applied, in addition to short sequences, also to very long sequences (from 100 base pairs to 200,000 base pairs). The aim of our work is to answer the following questions: 1. Do biological sequences show an n-subword deficit, and is their n- subword deficit length dependent? 2. Is the n-subword deficit gene specific? 3. Is the n-subword deficit genome specific? Our results indicate that the answers to questions 1 — 3 appears to be Yes, No, and No respectively. Moreover, it was found that the insects Apis mellifera and Drosophila melanogaster have genomes with the lowest maximal n-subword deficit value among other genomes in all experiments that have been conducted.

Recommended Citation

Shihadeh, Noman, "n-Subword Complexity Measure of DNA Sequences" (2009). Digitized Theses. 3841.
https://ir.lib.uwo.ca/digitizedtheses/3841

Download

COinS

Digitized Theses

n-Subword Complexity Measure of DNA Sequences

Date of Award

Degree Type

Degree Name

Program

Supervisor

Second Advisor

Abstract

Recommended Citation

Links

Browse

Author Corner

Links

Digitized Theses

n-Subword Complexity Measure of DNA Sequences

Author

Date of Award

Degree Type

Degree Name

Program

Supervisor

Second Advisor

Abstract

Recommended Citation

Share

Links

Browse

Author Corner

Links