Electronic Thesis and Dissertation Repository


Doctor of Philosophy


Computer Science


Mark Daley


The human brain is a complex, nonlinear, chaotic dynamical system that is poorly understood. When faced with such difficult-to-understand systems, it is common to observe the system and develop models so that the underlying system might be deciphered. When neurological activity within the brain is observed with functional magnetic resonance imaging (fMRI), it is common to develop linear models of functional connectivity; however, these models are incapable of describing the nonlinearities known to exist within the system.

A genetic programming (GP) system was developed to perform symbolic regression on recorded fMRI data. Symbolic regression makes fewer assumptions than traditional linear tools and can describe nonlinearities within the system. Although GP has many drawbacks (computational cost, overfitting, stochasticity), it is a powerful form of machine learning that may provide new insights into the underlying system being studied.
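To make the idea of symbolic regression by GP concrete, the following is a minimal sketch and not the system developed in this thesis. It assumes a small hypothetical primitive set (`add`, `sub`, `mul`, `sin`, `tanh`), mean squared error as the fitness measure, and a simplified mutation-only search with truncation selection; none of these choices are stated in the abstract. Each input row stands in for the signals of several regions at one time point, and the target is the signal of the region being modelled.

```python
import math
import operator
import random

# Hypothetical primitive set; the abstract does not list the operators used.
BINARY_OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}
UNARY_OPS = {"sin": math.sin, "tanh": math.tanh}


def random_tree(n_inputs, depth, rng):
    """Grow a random expression tree over n_inputs input variables."""
    if depth == 0 or rng.random() < 0.3:
        # Terminal node: an input variable or a numeric constant.
        if rng.random() < 0.7:
            return ("x", rng.randrange(n_inputs))
        return ("const", rng.uniform(-1.0, 1.0))
    if rng.random() < 0.3:
        return (rng.choice(sorted(UNARY_OPS)), random_tree(n_inputs, depth - 1, rng))
    return (
        rng.choice(sorted(BINARY_OPS)),
        random_tree(n_inputs, depth - 1, rng),
        random_tree(n_inputs, depth - 1, rng),
    )


def evaluate(tree, row):
    """Evaluate an expression tree on one row of input values."""
    kind = tree[0]
    if kind == "x":
        return row[tree[1]]
    if kind == "const":
        return tree[1]
    if kind in UNARY_OPS:
        return UNARY_OPS[kind](evaluate(tree[1], row))
    return BINARY_OPS[kind](evaluate(tree[1], row), evaluate(tree[2], row))


def mse(tree, rows, targets):
    """Mean squared error; numerically invalid trees get infinite error."""
    try:
        err = sum((evaluate(tree, r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)
    except (OverflowError, ValueError):
        return math.inf
    return err if math.isfinite(err) else math.inf


def mutate(tree, n_inputs, rng):
    """Replace one randomly chosen subtree with a freshly grown one."""
    if tree[0] in ("x", "const") or rng.random() < 0.3:
        return random_tree(n_inputs, 2, rng)
    children = list(tree[1:])
    i = rng.randrange(len(children))
    children[i] = mutate(children[i], n_inputs, rng)
    return (tree[0], *children)


def evolve(rows, targets, pop_size=50, generations=20, seed=0):
    """Mutation-only GP: keep the better half, refill with mutated copies."""
    rng = random.Random(seed)
    n_inputs = len(rows[0])
    pop = [random_tree(n_inputs, 3, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: mse(t, rows, targets))
        survivors = pop[: pop_size // 2]
        children = [
            mutate(rng.choice(survivors), n_inputs, rng)
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + children
    return min(pop, key=lambda t: mse(t, rows, targets))
```

Because the better half of each generation is carried over unchanged, the best error never increases between generations. The search is stochastic, so different seeds can return different expressions, which is one of the drawbacks noted above.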

The contents of this thesis are presented in an integrated article format. For all articles, data from the Human Connectome Project were used.

In the first article, nonlinear models for 507 subjects performing a motor task were created. These nonlinear models generated by GP contained fewer regions of interest (ROIs) than would be found with traditional, linear tools. The generated nonlinear models did not fit the data as well as the linear models; however, when compared to linear models containing a similar number of ROIs, the nonlinear models performed better.

In article two, ten subjects performing seven tasks were studied. After improvements to the GP system, the generated nonlinear models outperformed the linear models in many cases and were never significantly worse than the linear models.

In article three, forty subjects performing seven tasks were studied. Newly generated nonlinear models were applied to unseen data from the same subject performing the same task (intrasubject generalization), and many nonlinear models generalized to unseen data better than the linear models. The nonlinear models were also applied to unseen data from other subjects performing the same task (intersubject generalization), where they did not generalize as well as the linear models.
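The distinction between intrasubject and intersubject generalization can be sketched as follows. This is an illustration under stated assumptions, not the evaluation procedure used in the thesis: the error measure (mean squared error), the `scans` dictionary layout, and the `(subject_id, session_id)` keys are all hypothetical.

```python
from statistics import mean


def mean_squared_error(model, rows, targets):
    """Average squared difference between model predictions and targets."""
    return mean((model(row) - t) ** 2 for row, t in zip(rows, targets))


def generalization_errors(model, scans, train_key):
    """Score a fitted model on its training scan and on unseen scans.

    scans maps a hypothetical (subject_id, session_id) key to (rows, targets)
    for one task. Unseen scans from the training subject count as
    intrasubject; scans from any other subject count as intersubject.
    """
    subject, _session = train_key
    intra, inter = [], []
    for key, (rows, targets) in scans.items():
        if key == train_key:
            continue  # skip the data the model was fitted on
        err = mean_squared_error(model, rows, targets)
        (intra if key[0] == subject else inter).append(err)
    train_rows, train_targets = scans[train_key]
    return {
        "train": mean_squared_error(model, train_rows, train_targets),
        "intrasubject": mean(intra) if intra else None,
        "intersubject": mean(inter) if inter else None,
    }
```

Any callable that maps one row of inputs to a prediction can be passed as `model`, so the same function can score a linear model and a nonlinear model side by side.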