Electronic Thesis and Dissertation Repository

Framework for Bug Inducing Commit Prediction Using Quality Metrics

Alireza Tavakkoli Barzoki, Western University

Abstract

This thesis addresses software defect prediction within the broader area of continuous software engineering. The approach employs source code and process metrics obtained for each commit, and examines whether specific patterns that emerge as the system moves from one commit to the next can predict an impending bug-inducing commit. The thesis uses the open-source SonarQube Technical Debt data, which provides source code metrics and process metrics for each commit in 22 medium- to large-scale open-source Apache projects.

Central to this research is the use of commit-to-commit transitions to trace the path to bug-inducing commits, which supports the construction of a predictive model. In this approach, each commit is represented by a vector of metric values that have been pre-processed so that they can be used efficiently. Each such vector defines the “state” of a commit. A significant portion of the methodology is devoted to data preparation and analysis, including the delineation of commit transitions, feature selection, and data cleansing. This preparation is aimed at improving the precision and accuracy of pattern recognition, particularly in identifying transitions that lead to bug-inducing commits.
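The notion of commit states and transitions can be illustrated with a minimal sketch. It assumes, purely for illustration, that pre-processing is a min-max scaling of each metric and that a transition is the pair of a commit's state and the change to the next commit's state, labeled by whether the next commit is bug-inducing. The function names, the two-metric vectors, and the sample values are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Hypothetical commit record: (metric vector, is_bug_inducing), in commit order.
Commit = Tuple[List[float], bool]

def min_max_scale(vectors: List[List[float]]) -> List[List[float]]:
    """Scale each metric column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*vectors))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(vec, lows, highs)]
        for vec in vectors
    ]

def build_transitions(commits: List[Commit]):
    """Pair each commit state with its successor.

    Returns (state, delta, label) tuples, where delta is the change in the
    scaled metrics and label says whether the successor is bug-inducing.
    """
    states = min_max_scale([vec for vec, _ in commits])
    transitions = []
    for i in range(len(commits) - 1):
        delta = [b - a for a, b in zip(states[i], states[i + 1])]
        transitions.append((states[i], delta, commits[i + 1][1]))
    return transitions

# Illustrative data only: three commits with two made-up metrics each.
commits = [
    ([10.0, 2.0], False),
    ([20.0, 2.0], False),
    ([30.0, 4.0], True),
]
transitions = build_transitions(commits)
```

Here the second transition ends in a bug-inducing commit, so it carries a positive label; a predictive model would be trained on many such labeled transitions.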

By combining correlation analysis, clustering techniques (including K-Means and hierarchical clustering), and a suite of classification strategies such as KNN, decision trees, and a percentile-based classification, the study aims to identify emerging patterns in metric-vector state transitions that may indicate potential software bugs.
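As an illustration of one of the classification strategies named above, the following is a minimal sketch of KNN applied to transition vectors. It assumes Euclidean distance and a simple majority vote; the feature vectors, labels, and the value of k are made up for the example and do not reflect the configuration or data used in the thesis.

```python
import math
from collections import Counter
from typing import List, Tuple

# Hypothetical labeled transition: (feature vector, leads_to_bug_inducing_commit).
Labeled = Tuple[List[float], bool]

def knn_predict(train: List[Labeled], query: List[float], k: int = 3) -> bool:
    """Majority vote among the k training vectors closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Illustrative data only: small metric changes labeled clean,
# large metric changes labeled bug-inducing.
train = [
    ([0.0, 0.1], False),
    ([0.1, 0.0], False),
    ([0.1, 0.1], False),
    ([0.8, 0.9], True),
    ([0.9, 0.8], True),
    ([0.9, 0.9], True),
]

risky = knn_predict(train, [0.85, 0.85])  # nearest neighbors are all labeled True
safe = knn_predict(train, [0.05, 0.05])   # nearest neighbors are all labeled False
```

An odd k avoids ties in a two-class vote; in practice k would be chosen by validation rather than fixed in advance.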

The results indicate that the proposed technique is promising for recognizing patterns that signal impending bug-inducing commits. They also shed light on the practical implications of using commit transitions in defect prediction strategies, offering insights into improving software development processes.