Electronic Thesis and Dissertation Repository

Degree

Master of Science

Program

Computer Science

Supervisor

Kaizhong Zhang

Abstract

Ordered labelled trees are trees where each node has a label and the left-to-right order among siblings is significant. Ordered labelled forests are sequences of ordered labelled trees. Ordered labelled trees and forests are useful structures for hierarchical data representation. Given two ordered labelled forests F and G, the local forest similarity is to compute two sub-forests F' and G' of F and G respectively such that they are the most similar over all the possible F' and G'. Given a target forest F and a pattern forest G, the forest pattern matching problem is to compute a sub-forest F' of F which is the most similar to G over all the possible F'. This thesis presents novel efficient algorithms for the local forest similarity problem and forest pattern matching problem for sub-forest. An application of the algorithms is that it can be used to locate the structural regions in RNA secondary structures which is the necessity data in RNA secondary structure prediction and function investigation. RNA is a chain molecular, mathematically it is a string over a four letter alphabet; in computational molecular biology, labeled ordered trees are used to represent RNA secondary structures.

Share

COinS