Electronic Thesis and Dissertation Repository

Thesis Format

Integrated Article


Master of Engineering Science


Electrical and Computer Engineering

Collaborative Specialization

Artificial Intelligence


Grolinger, Katarina


Smart meter data are crucial for power grid management. However, missing data caused by communication and device failures reduce the quality of the data. While machine learning-based solutions have been proposed for missing data imputation, recent developments in generative models have created opportunities for improvement. This thesis proposes two approaches for missing data imputation in smart meter data. The first approach, Conditional Denoising Diffusion Model, leverages diffusion models to generate coherent imputations based on the daily load profile, together with a guidance mechanism that captures historical context. This approach outperforms existing techniques, especially when a substantial number of random or consecutive points are missing: it achieves 11.33% lower normalized root mean square error than the compared methods when 40% of the points are missing. However, since it is a deep learning technique, it requires abundant data and computational resources. The second approach, Temporally Chained Equations, reduces computational and data requirements by imputing missing points iteratively using lag and lead features, local normalization, and linear regression. It outperforms the compared baselines in random missing points scenarios, achieving 6.32% lower normalized root mean square error. However, its performance degrades when many consecutive points are missing.

Summary for Lay Audience

Smart meters are essential for managing the power grid, but sometimes they drop readings because of communication or device failures. This can make it hard to get a complete picture of how much energy people are using. To help solve this problem, many data-driven machine learning and deep learning-based solutions have been proposed. However, recent developments in generative deep learning techniques have created opportunities for improvement. Hence, in this thesis, we propose two ways to fill in the missing data. The first approach, Conditional Denoising Diffusion Model, is a deep learning-based approach that uses a technique called diffusion models. It estimates missing points based on the surrounding data points and on similar data points from the past, making sure that the estimated points are consistent with the rest of the information. The second approach, Temporally Chained Equations, is designed to lower the computational power and training data required. It uses a few data points before and after the missing values and combines them with a linear regression model to fill in the gaps. This is done iteratively to refine the estimates for the missing points. We evaluate both approaches under multiple missing data scenarios and show that our proposed methods can reliably impute missing points.
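To make the idea behind the second approach more concrete, the following is a minimal sketch of iterative imputation with lag and lead features and a linear regression model. It is not the thesis implementation: the function name `chained_impute`, its parameters, the interpolation used as a starting guess, and the choice to update all missing points at once in each pass are assumptions made for illustration, and the local normalization step mentioned in the abstract is omitted.

```python
import numpy as np

def chained_impute(series, n_lags=2, n_leads=2, n_iters=5):
    """Illustrative sketch only (hypothetical name and parameters).

    Fills NaN entries by repeatedly regressing each point on its
    neighbours before (lags) and after (leads) it.
    Assumes the series is longer than n_lags + n_leads and has
    enough observed points to fit the regression.
    """
    y = np.asarray(series, dtype=float).copy()
    missing = np.isnan(y)
    if not missing.any():
        return y

    # Starting guess (an assumption): linear interpolation across gaps.
    idx = np.arange(len(y))
    y[missing] = np.interp(idx[missing], idx[~missing], y[~missing])

    # Only positions with a full window of neighbours are modelled;
    # missing points too close to the edges keep the interpolated value.
    offsets = [o for o in range(-n_lags, n_leads + 1) if o != 0]
    valid = idx[(idx >= n_lags) & (idx < len(y) - n_leads)]
    train = ~missing[valid]   # rows whose target was actually observed
    target = missing[valid]   # rows whose target must be imputed

    for _ in range(n_iters):
        # Design matrix: intercept plus lag and lead values.
        X = np.column_stack([np.ones(len(valid))] +
                            [y[valid + o] for o in offsets])
        # Ordinary least squares fitted on observed targets only.
        coef, *_ = np.linalg.lstsq(X[train], y[valid][train], rcond=None)
        # Refine the current estimates of the missing points.
        y[valid[target]] = X[target] @ coef
    return y

# Usage: a synthetic daily-like profile with a few dropped readings.
t = np.arange(240)
load = np.sin(2 * np.pi * t / 24)
load_with_gaps = load.copy()
load_with_gaps[[20, 57, 101]] = np.nan
filled = chained_impute(load_with_gaps)
```

Because each pass recomputes the neighbour features from the latest estimates, later passes can correct earlier guesses. This also hints at why long runs of consecutive missing points are harder: the lag and lead features of a missing point are then themselves estimates.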

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Available for download on Monday, April 21, 2025