Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Doctor of Philosophy


Applied Mathematics

Collaborative Specialization

Scientific Computing


Karttunen, Mikko


In scientific research, understanding and modeling physical systems often involves partial differential equations (PDEs). These equations relate variables to their derivatives and describe a wide range of phenomena, from fluid dynamics to quantum mechanics. Traditionally, the discovery of PDEs has relied on mathematical derivations and expert knowledge. Data-driven approaches and machine learning (ML) techniques have transformed this process: ML algorithms can learn underlying patterns and relationships within data, making it possible to extract simplified equations that capture the essential behavior of a system. The primary goal of this thesis is to develop methodologies that automatically extract such simplified equations by training models on available data. The study considers three distinct learning categories: black-box, gray-box, and white-box learning.

The initial phase of the research focuses on black-box learning, where no prior information about the equations is available. Three neural network architectures are explored: a multi-layer perceptron (MLP), a convolutional neural network (CNN), and a hybrid architecture combining a CNN with long short-term memory (CNN-LSTM). These networks are applied to uncover the non-linear equations of motion of phase-field models with both non-conserved and conserved order parameters.

The second phase addresses explicit equation discovery in gray-box learning scenarios, where a portion of the equation is unknown. The framework employs eXtended Physics-Informed Neural Networks (X-PINNs) with domain decomposition in space to uncover part of the well-known Allen-Cahn equation. Specifically, the Laplacian part of the equation is assumed to be known, and the objective is to discover the non-linear component. Symbolic regression is then applied to deduce an explicit mathematical expression for the unknown part of the equation.
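As a rough illustration of the gray-box setting, the sketch below generates synthetic data from a one-dimensional Allen-Cahn-type equation, u_t = D u_xx + u - u^3, treats the Laplacian term as known, and recovers the non-linear term from the data. It is not the X-PINN and symbolic regression pipeline described above: a plain least-squares fit over the assumed candidate terms u and u^3 stands in for both. The grid size, time step, diffusivity D, initial condition, and all function names are assumptions made only for this example.

```python
import math

def laplacian(u, dx):
    """Second-order central difference with periodic boundaries."""
    n = len(u)
    return [(u[(i + 1) % n] - 2.0 * u[i] + u[i - 1]) / dx**2 for i in range(n)]

def simulate(u0, diffusivity, dx, dt, steps):
    """Forward-Euler integration of u_t = D u_xx + u - u^3 (synthetic data)."""
    frames = [list(u0)]
    for _ in range(steps):
        u = frames[-1]
        lap = laplacian(u, dx)
        frames.append([ui + dt * (diffusivity * li + ui - ui**3)
                       for ui, li in zip(u, lap)])
    return frames

def fit_reaction_term(frames, diffusivity, dx, dt):
    """Fit (u_t - D u_xx) = a*u + b*u^3 by solving the 2x2 normal equations."""
    s11 = s13 = s33 = r1 = r3 = 0.0
    for u_now, u_next in zip(frames[:-1], frames[1:]):
        lap = laplacian(u_now, dx)
        for ui, un, li in zip(u_now, u_next, lap):
            # Residual left over after removing the known Laplacian part.
            residual = (un - ui) / dt - diffusivity * li
            p1, p3 = ui, ui**3
            s11 += p1 * p1
            s13 += p1 * p3
            s33 += p3 * p3
            r1 += p1 * residual
            r3 += p3 * residual
    det = s11 * s33 - s13 * s13
    a = (r1 * s33 - r3 * s13) / det
    b = (s11 * r3 - s13 * r1) / det
    return a, b

# Assumed setup for the illustration (not taken from the thesis).
N, D, DT, STEPS = 64, 0.01, 0.01, 50
DX = 2.0 * math.pi / N
u0 = [0.5 * math.sin(i * DX) + 0.3 * math.cos(3 * i * DX) for i in range(N)]

frames = simulate(u0, D, DX, DT, STEPS)
a, b = fit_reaction_term(frames, D, DX, DT)
print(f"recovered non-linear term: {a:.4f} * u + {b:.4f} * u^3")
```

Because the synthetic data come from the same forward-Euler scheme that is used to form the residual, the fitted coefficients land very close to 1 and -1. With measured or noisy data the residual would only be approximate, and the candidate terms themselves would have to be discovered rather than assumed.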

The final part of the thesis focuses on white-box learning, aiming to uncover equations that offer a detailed understanding of the studied system. Specifically, a coarse parametric ordinary differential equation (ODE) is introduced to capture the spreading radius behavior of calcium-magnesium-aluminosilicate (CMAS) droplets. All terms of the ODE are assumed to be known, and its unknown parameters are determined using the Physics-Informed Neural Network (PINN) framework. This approach improves our understanding of the spreading dynamics of CMAS droplets.
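To make the white-box setting concrete, the sketch below estimates the parameters of an ODE whose terms are all known. The ODE used in the thesis for CMAS spreading is not stated here, so a hypothetical relaxation law, dr/dt = k (r_eq - r), is used purely as a stand-in, and ordinary least squares on finite-difference derivatives replaces the PINN training. The parameter names k and r_eq, their values, and the synthetic radius data are all assumptions for illustration.

```python
import math

# Hypothetical "true" parameters, used only to generate synthetic data.
K_TRUE, R_EQ_TRUE, R0 = 2.0, 1.5, 0.2
DT, N_SAMPLES = 0.01, 200

def synthetic_radius(t):
    """Exact solution of dr/dt = k (r_eq - r) with r(0) = R0."""
    return R_EQ_TRUE - (R_EQ_TRUE - R0) * math.exp(-K_TRUE * t)

def estimate_parameters(radius, dt):
    """Fit dr/dt = c0 + c1 * r by least squares; then k = -c1, r_eq = c0 / k."""
    r_mid = radius[1:-1]
    # Central differences approximate dr/dt at the interior samples.
    drdt = [(radius[i + 1] - radius[i - 1]) / (2.0 * dt)
            for i in range(1, len(radius) - 1)]
    n = len(r_mid)
    mean_r = sum(r_mid) / n
    mean_d = sum(drdt) / n
    cov = sum((r - mean_r) * (d - mean_d) for r, d in zip(r_mid, drdt))
    var = sum((r - mean_r) ** 2 for r in r_mid)
    c1 = cov / var
    c0 = mean_d - c1 * mean_r
    k = -c1
    return k, c0 / k

radius = [synthetic_radius(i * DT) for i in range(N_SAMPLES)]
k_est, r_eq_est = estimate_parameters(radius, DT)
print(f"estimated k = {k_est:.4f}, estimated r_eq = {r_eq_est:.4f}")
```

The structure of the equation is fixed in advance and only its coefficients are learned, which is what distinguishes this setting from the black-box and gray-box cases. A PINN would instead represent r(t) with a neural network and fit the parameters by minimizing a combined data and ODE-residual loss.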

Summary for Lay Audience

This thesis applies machine learning techniques to uncover hidden patterns and equations in complex physical systems. Traditional methods of deriving equations often require significant time and expertise; data-driven approaches and machine learning can automate and improve this process. The thesis examines three learning approaches: black-box, gray-box, and white-box learning. These range from scenarios where no prior knowledge of the equations is available to cases where some parts of the equations are known.

In black-box learning, neural network models are developed to uncover non-linear equations governing phase-field models without any prior knowledge of the equations. These models capture the behavior of the systems solely based on the provided data.

In gray-box learning, extended physics-informed neural networks (X-PINNs) are employed to reveal unknown components of an equation. By incorporating domain decomposition and symbolic regression techniques, we can determine the missing components of the equation using the available data.

Finally, in white-box learning, the objective is to gain a detailed understanding of a system using the physics-informed neural network (PINN) framework. Specifically, this approach estimates the parameters of a coarse parametric ordinary differential equation (ODE) that characterizes the spreading radius behavior of CMAS droplets.