Date of Award

2008

Degree Type

Thesis

Degree Name

Master of Science

Program

Computer Science

Supervisor

Dr. Hanan Lutfiyya

Abstract

Problem determination and fault localization have become a central part of managing large-scale distributed systems. The need to avoid downtime becomes paramount as the systems grow larger and more complex and the applications become integrated into today's business model for most corporations. In this thesis the problem of fault localization in large distributed systems is considered. A new technique, developed from a combination of data mining and rule-based systems, for automatically detecting failures and localizing software faults in a distributed environment is presented. The technique leverages the wealth of information about the runtime behavior of software contained within existing log files. The proposed system is designed to extract information from multiple existing log sources and create a rule set based on frequently occurring episodes of events in the log files. The rule set is able to detect the presence of a failure and start a localization algorithm for determining the location of the fault.
