Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Master of Science

Program

Computer Science

Supervisor

Madhavji, Nazim

Abstract

[Context and motivation] Project contracts for building a system contain a large number of cross-references to regulatory documents such as environmental regulations, quality standards, and regulatory “codes”. The system being developed must comply with the regulatory requirements in these documents, so a domain expert needs to read and interpret them. [Problem] This can be an arduous and time-consuming task in large projects because the relevant regulatory requirements may be scattered across numerous regulatory documents. [Principal idea and novelty] The text before or after an external cross-reference in a contract contains information that can help locate the relevant content in the target regulatory documents automatically. This study used dependency parsing, part-of-speech tagging, and regular expressions to extract the Target Phrase, which is the text referencing more elaborate content in the cited external document, and the target position, which is the location of the referenced text within the external document. The study then ran a search using Elasticsearch and its Query DSL to retrieve relevant information from the cited legal documents and standards. [Research Contribution] This thesis describes a software solution that, to our knowledge, is the first to automatically extract requirement-related information from external documents cross-referenced in a contract. [Conclusion] For each regulatory requirement, the final output displays the relevant text, the content of the relevant pages, and their page numbers, ordered by relevance score. For Target Phrase extraction, we obtained Precision = 0.81, Recall = 0.98, and F-measure = 0.89. For target position extraction, we obtained Precision = 1 and Recall = 1. Automatically extracting the relevant information from disparate sources can save considerable time and reduce the workload of requirements analysts and domain experts.
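
To make the pipeline in the abstract more concrete, the following is a minimal Python sketch, not the thesis implementation. It assumes a simplified regular expression for the target position, an invented example sentence, and a hypothetical Elasticsearch field named `content` holding page text; the function names are illustrative only. It also checks that the reported F-measure follows from the reported precision and recall.

```python
import re

# Hypothetical, simplified locator pattern (an assumption, not the thesis's own):
# matches references such as "Section 4.2" or "Clause 7".
POSITION_RE = re.compile(
    r"\b(Section|Clause|Article|Chapter|Table|Annex|Appendix)\s+(\d+(?:\.\d+)*)\b",
    re.IGNORECASE,
)


def extract_target_positions(sentence):
    """Return normalized locators (e.g. 'Section 4.2') found in a sentence."""
    return [
        f"{m.group(1).title()} {m.group(2)}"
        for m in POSITION_RE.finditer(sentence)
    ]


def build_query(target_phrase, size=5):
    """Build an Elasticsearch Query DSL body that phrase-matches the Target Phrase.

    The field name 'content' is an assumption about how pages are indexed.
    """
    return {
        "size": size,
        "query": {
            "match_phrase": {
                "content": {"query": target_phrase, "slop": 2}
            }
        },
        "highlight": {"fields": {"content": {}}},
    }


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented sentence for illustration only.
    sentence = "Noise levels shall comply with Section 4.2 of the Act and clause 7."
    print(extract_target_positions(sentence))
    print(build_query("noise levels"))
    print(round(f_measure(0.81, 0.98), 2))
```

The query body is a plain dictionary, so it can be inspected without a running cluster; passing it to a client and ranking hits by score would correspond to the relevance-ordered output the abstract describes.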

Summary for Lay Audience

Project contracts for building a system contain a large number of internal and external cross-references to regulatory documents such as environmental regulations, quality standards, and regulatory “codes”. External cross-references are citations that refer to a fragment of text within an external legal document. The system being developed must comply with these regulations to avoid defective or sub-standard systems, customer dissatisfaction, and potential penalties for violating the law. Domain experts and requirements analysts therefore need to read and interpret the relevant regulatory documents, which can be an arduous and time-consuming task in large projects because the relevant requirements may be scattered across numerous documents. The text before or after an external cross-reference in a contract contains information that can help locate the relevant content in the target regulatory documents automatically. To our knowledge, this study is the first to extract two such pieces of information: the Target Phrase, which is the text referencing more elaborate content in the cited external document, and the target position, which is the location of the referenced text within the external document. Both are used to automatically find the relevant information in the regulatory documents cited by the contract. These key phrases were extracted automatically using the dependency structure of the sentences, and the Target Phrase was then searched for within the text of the cited documents to find possible matches. Automatically extracting the relevant information from disparate sources can save considerable time, reduce workload, and make the work of domain experts more efficient.
