Faculty
Engineering
Supervisor Name
Miriam Capretz
Keywords
Machine Learning, Data preprocessing, Feature encoding, Predictive Maintenance
Description
Data preprocessing is an essential step when building machine learning solutions. It significantly affects both the success of machine learning modules and the quality of their output. Typically, data preprocessing consists of data sanitization, feature engineering, normalization, and transformation. This paper outlines the data preprocessing methodology implemented for a data-driven predictive maintenance solution. The project entails acquiring historical electrical data from industrial assets and creating a health index that indicates each asset's remaining useful life. The solution is built with machine learning algorithms and requires several data preprocessing steps to improve its accuracy and efficiency. The preprocessing measures implemented in this project are data sanitization, daylight saving time transformation, feature encoding, and data normalization. The purpose and results of each step are explained to highlight the importance of data preprocessing in machine learning projects.
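The four preprocessing steps named above can be illustrated with a minimal, standard-library-only sketch. It is not the project's actual pipeline. The record fields (`timestamp`, `asset_type`, `current`), the choice to drop incomplete rows, one-hot encoding, min-max scaling, and the North American daylight saving rule (02:00 on the second Sunday of March to 02:00 on the first Sunday of November) are all assumptions made for illustration.

```python
import math
from datetime import datetime, timedelta


def sanitize(records):
    """Data sanitization (assumed policy): drop records with a missing or NaN field."""
    clean = []
    for rec in records:
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in rec.values()):
            continue
        clean.append(rec)
    return clean


def nth_sunday(year, month, n):
    """Return midnight of the n-th Sunday of the given month."""
    first = datetime(year, month, 1)
    offset = (6 - first.weekday()) % 7  # weekday(): Monday=0 ... Sunday=6
    return first + timedelta(days=offset + 7 * (n - 1))


def is_dst(local_ts):
    """Assumed North American rule (in use since 2007).

    Wall-clock times in the repeated hour on the fall-back day cannot be
    disambiguated from the timestamp alone; this sketch treats them as DST.
    """
    start = nth_sunday(local_ts.year, 3, 2).replace(hour=2)
    end = nth_sunday(local_ts.year, 11, 1).replace(hour=2)
    return start <= local_ts < end


def to_standard_time(local_ts):
    """Daylight saving transformation: shift DST wall-clock times back one hour."""
    return local_ts - timedelta(hours=1) if is_dst(local_ts) else local_ts


def one_hot(values):
    """Feature encoding: map each category to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1
        encoded.append(row)
    return categories, encoded


def min_max(values):
    """Normalization: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant feature; avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(records):
    """Apply the four steps in order to a list of hypothetical asset records."""
    clean = sanitize(records)
    times = [to_standard_time(r["timestamp"]) for r in clean]
    categories, encoded = one_hot([r["asset_type"] for r in clean])
    scaled = min_max([r["current"] for r in clean])
    return times, categories, encoded, scaled
```

Shifting every timestamp onto standard time gives the series one consistent clock, so the spring-forward gap and the fall-back overlap do not appear as spurious jumps in the electrical readings. In practice a time-zone library (for example, Python's `zoneinfo`) would usually replace the hand-written rule.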
Acknowledgements
Thank you to Dr. Miriam Capretz, Dr. Luisa Liboni, and Ruiqi Tian for supporting me throughout this incredible opportunity.
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License
Document Type
Paper
Poster
Data Preprocessing for Machine Learning Modules