Electronic Thesis and Dissertation Repository


Doctor of Philosophy


Computer Science


Professor Lila Kari


The study of formal language operations inspired by enzymatic actions on DNA is part of ongoing efforts to provide a formal framework and rigorous treatment of DNA-based information and DNA-based computation. Other studies along these lines include theoretical explorations of splicing systems, insertion-deletion systems, substitution, hairpin extension, hairpin reduction, superposition, overlapping concatenation, conditional concatenation, contextual intra- and intermolecular recombinations, as well as template-guided recombination.

First, a formal language operation is proposed and investigated, inspired by the naturally occurring phenomenon of DNA primer extension by a DNA-template-directed DNA polymerase enzyme. Given two DNA strings u and v, where the shorter string v (called the primer) is Watson-Crick complementary and can thus bind to a substring of the longer string u (called the template) the result of the primer extension is a DNA string that is complementary to a suffix of the template which starts at the binding position of the primer. The operation of DNA primer extension can be abstracted as a binary operation on two formal languages: a template language L1 and a primer language L2. This language operation is called L1-directed extension of L2 and the closure properties of various language classes, including the classes in the Chomsky hierarchy, are studied under directed extension. Furthermore, the question of finding necessary and sufficient conditions for a given language of target strings to be generated from a given template language when the primer language is unknown is answered. The canonic inverse of directed extension is used in order to obtain the optimal solution (the minimal primer language) to this question.

The second research project investigates properties of the binary string and language operation overlap assembly as defined by Csuhaj-Varju, Petre and Vaszil as a formal model of the linear self-assembly of DNA strands: The overlap assembly of two strings, xy and yz, which share an "overlap" y, results in the string xyz. In this context, we investigate overlap assembly and its properties: closure properties of various language families under this operation, and related decision problems. A theoretical analysis of the possible use of iterated overlap assembly to generate combinatorial DNA libraries is also given.

The third research project continues the exploration of the properties of the overlap assembly operation by investigating closure properties of various language classes under iterated overlap assembly, and the decidability of the completeness of a language. The problem of deciding whether a given string is terminal with respect to a language, and the problem of deciding if a given language can be generated by an overlap assembly operation of two other given languages are also investigated.