Faculty

Engineering

Supervisor Name

Miriam Capretz

Keywords

Machine Learning, Data Preprocessing, Feature Encoding, Predictive Maintenance

Description

Data preprocessing is an essential step in building machine learning solutions, because it strongly affects how well machine learning modules perform and what their algorithms output. It typically consists of data sanitization, feature engineering, normalization, and transformation. This paper outlines the data preprocessing methodology implemented for a data-driven predictive maintenance solution. The project involves acquiring historical electrical data from industrial assets and creating a health index that indicates each asset's remaining useful life. The solution is built with machine learning algorithms and requires several data preprocessing steps to increase its accuracy and efficiency. The preprocessing measures implemented in this project are data sanitization, daylight saving time transformation, feature encoding, and data normalization. The purpose and results of each step are explained to highlight the importance of data preprocessing in machine learning projects.
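As a rough illustration of three of the steps named above (sanitization, feature encoding, and normalization), the following is a minimal sketch and not the project's actual code. The record layout, the function names, the choice of cyclical encoding for the hour of day, and the choice of min-max scaling are all assumptions made for the example; the description does not say which techniques were used. The daylight saving time transformation is omitted.

```python
import math

def sanitize(readings):
    """Drop records whose value is missing (None or NaN). Hypothetical record layout."""
    return [
        r for r in readings
        if r["value"] is not None and not math.isnan(r["value"])
    ]

def encode_hour(hour):
    """Cyclical encoding (an assumed technique): place the hour on the unit
    circle so that 23:00 and 00:00 end up close together."""
    angle = 2 * math.pi * hour / 24
    return math.sin(angle), math.cos(angle)

def min_max_normalize(values):
    """Scale values linearly into [0, 1]; a constant series maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Made-up sample readings, purely for illustration.
readings = [
    {"hour": 0, "value": 10.0},
    {"hour": 6, "value": None},   # missing reading, removed by sanitize()
    {"hour": 12, "value": 30.0},
    {"hour": 18, "value": 20.0},
]

clean = sanitize(readings)
scaled = min_max_normalize([r["value"] for r in clean])

# Each feature row: (sin(hour), cos(hour), normalized value)
features = [(*encode_hour(r["hour"]), s) for r, s in zip(clean, scaled)]
```

Each row in `features` holds the two cyclical hour components followed by the scaled reading, which is the kind of numeric input a machine learning model could consume.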

Acknowledgements

Thank you to Dr. Miriam Capretz, Dr. Luisa Liboni, and Ruiqi Tian for supporting me throughout this incredible opportunity.

Creative Commons License

This work is licensed under a Creative Commons Attribution-Noncommercial 4.0 License.

Document Type

Paper

Research Poster.pdf (535 kB)
Poster

Data Preprocessing for Machine Learning Modules
