Electronic Thesis and Dissertation Repository


Master of Science


Computer Science


Kaizhong Zhang


Ordered labelled trees are trees where each node has a label and the left-to-right order among siblings is significant. Ordered labelled forests are sequences of ordered labelled trees. Ordered labelled trees and forests are useful structures for hierarchical data representation. Given two ordered labelled forests F and G, the local forest similarity is to compute two sub-forests F' and G' of F and G respectively such that they are the most similar over all the possible F' and G'. Given a target forest F and a pattern forest G, the forest pattern matching problem is to compute a sub-forest F' of F which is the most similar to G over all the possible F'. This thesis presents novel efficient algorithms for the local forest similarity problem and forest pattern matching problem for sub-forest. An application of the algorithms is that it can be used to locate the structural regions in RNA secondary structures which is the necessity data in RNA secondary structure prediction and function investigation. RNA is a chain molecular, mathematically it is a string over a four letter alphabet; in computational molecular biology, labeled ordered trees are used to represent RNA secondary structures.