# Decomposition and Descriptional Complexity of Shuffle on Words and Finite Languages

2009

Thesis

## Degree Name

Doctor of Philosophy

Computer Science

Mark Daley

Kai Salomaa

## Abstract

We investigate various questions related to the shuffle operation on words and finite languages.

First we investigate a special variant of the shuffle decomposition problem for regular languages, namely, when the given regular language is the shuffle of finite languages. The shuffle decomposition into finite languages is, in general not unique. Thatis,therearelanguagesL^,L2,L3,L4withLiluL2= £3luT4but{L\,L2}^

{I/3, L4}. However, if all four languages are singletons (with at least two combined letters), it follows by a result of Berstel and Boasson [6], that the solution is unique; that is {L\,L2} = {L3,L4}. We extend this result to show that if L\ and L2 are arbitrary finite sets and Lz and Z-4 are singletons (with at least two letters in each), the solution is unique. This is as strong as it can be, since we provide examples showing that the solution can be non-unique already when (1) both L\ and L2 are singleton sets over different unary alphabets; or (2) L\ contains two words and L2 is singleton.

We furthermore investigate the size of shuffle automata for words. It was shown by Campeanu, K. Salomaa and Yu in [11] that the minimal shuffle automaton of two regular languages requires 2mn states in the worst case (where the minimal automata of the two component languages had m and n states, respectively). It was also recently shown that there exist words u and v such that the minimal shuffle

iii

DFA for u and v requires an exponential number of states. We study the size of shuffle DFAs for restricted cases of words, namely when the words u and v are both periods of a common underlying word. We show that, when the underlying word obeys certain conditions, then the size of the minimal shuffle DFA for u and v is at most quadratic.

Moreover we provide an efficient algorithm, which decides for a given DFA A and two words u and v, whether u lu u C L(A).

COinS