Thesis Format

Monograph

Automatic extraction of requirements-related information from regulatory documents cited in the project contract

Sara Fotouhi, The University of Western Ontario

Degree

Master of Science

Program

Computer Science

Supervisor

Madhavji, Nazim

Abstract

[Context and motivation] Project contracts for building a system contain a large number of cross-references to regulatory documents such as environmental regulations, quality standards, and regulatory “codes”. The system being developed must comply with the regulatory requirements in such documents. Thus, a domain expert needs to read and interpret the relevant regulatory documents. [Problem] This can be an arduous and time-consuming task in large projects because the relevant regulatory requirements may be scattered across numerous regulatory documents. [Principal idea and novelty] The text preceding or following an external cross-reference in a contract contains information that can assist in automatically locating relevant information in the target regulatory documents. This study used dependency parsing, part-of-speech tagging, and regular expressions to extract the Target Phrase, which is the text referencing more elaborate content in the cited external document, and the target position, which is the location of the referenced text within the external document. The study then ran a search using Elasticsearch and its Query DSL to retrieve relevant information from the cited legal documents and standards. [Research contribution] This thesis describes a software solution that, to our knowledge, is the first to automatically extract requirements-related information from external documents cross-referenced in the contract. [Conclusion] For each regulatory requirement, the final output displays the relevant text, the content of the relevant pages, and the page numbers, ordered by relevance score. For Target Phrase extraction, we obtained Precision = 0.81, Recall = 0.98 and F-measure = 0.89. For target position extraction, we obtained Precision = 1 and Recall = 1. Automatically extracting the relevant information from disparate sources will save a substantial amount of time and reduce the workload of requirements analysts and domain experts.
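As a rough illustration of the regular-expression step mentioned in the abstract, the sketch below pulls a target position (a locator such as a section or table number) out of a contract sentence. The abstract does not give the thesis's actual expressions, so the pattern, the keyword list, the function name `extract_target_positions`, and the sample sentence are all assumptions made for illustration only.

```python
import re

# Hypothetical pattern (an assumption, not the thesis's own expression):
# a locator keyword followed by a dotted number, e.g. "Section 4.2.1".
POSITION_RE = re.compile(
    r"\b(section|clause|article|chapter|table|page)\s+(\d+(?:\.\d+)*)",
    re.IGNORECASE,
)


def extract_target_positions(sentence):
    """Return (kind, number) pairs for every locator found in the sentence."""
    return [
        (match.group(1).lower(), match.group(2))
        for match in POSITION_RE.finditer(sentence)
    ]


# Invented example sentence; not taken from any real contract.
sentence = (
    "Emergency lighting shall comply with Section 4.2.1 of the "
    "Building Code and Table 3 of the referenced standard."
)
print(extract_target_positions(sentence))
```

A pattern like this only covers the target position. Extracting the Target Phrase would still rely on the dependency structure and part-of-speech tags of the sentence, as the abstract describes.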

Summary for Lay Audience

Project contracts for building a system contain a large number of internal and external cross-references to regulatory documents such as environmental regulations, quality standards, and regulatory “codes”. External cross-references are citations that refer to a fragment of text within an external legal document. The system being developed must comply with these regulations to avoid defective or sub-standard systems, customer dissatisfaction, and potential penalties for violating the law. Thus, domain experts and requirements analysts need to read and interpret the relevant regulatory documents, but this can be an arduous and time-consuming task in large projects because the relevant regulatory requirements may be scattered across numerous regulatory documents. The text preceding or following an external cross-reference in a contract contains information that can assist in automatically locating relevant information in the target regulatory documents. This study is the first to extract the Target Phrase, which is the text referencing more elaborate content in the cited external document, and the target position, which is the location of the referenced text within the external document, in order to automatically find relevant information in the regulatory documents cited in the contract. These keywords and key phrases were extracted automatically using the dependency structure of the sentences; a search operation then looked for the Target Phrase within the text of the cited documents to find possible matches. Automatically extracting the relevant information from disparate sources will save a substantial amount of time, reduce workloads, and make the work of domain experts more efficient.
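The search step described above can be sketched as an Elasticsearch Query DSL request body. This is a minimal sketch under stated assumptions: the field names `content` and `page_number`, the `slop` value, and the helper `build_search_body` are hypothetical, and the thesis's actual index layout and queries are not given on this page. The body is built as a plain dictionary, so no Elasticsearch client or running cluster is needed to inspect it.

```python
import json


def build_search_body(target_phrase, page=None, size=5):
    """Build a Query DSL body that looks for the Target Phrase in page text.

    Field names ("content", "page_number") are assumptions for illustration.
    """
    # match_phrase keeps the words of the phrase together; slop tolerates
    # a couple of intervening words.
    must = [{"match_phrase": {"content": {"query": target_phrase, "slop": 2}}}]
    # If a target position (here, a page) is known, restrict the search to it.
    filters = [{"term": {"page_number": page}}] if page is not None else []
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filters}},
        "_source": ["page_number", "content"],
    }


# Invented phrase and page number, used only to show the resulting structure.
body = build_search_body("emergency lighting", page=12)
print(json.dumps(body, indent=2))
```

Elasticsearch returns hits sorted by descending relevance score by default, which is consistent with the relevance-ordered output the abstract describes.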

Recommended Citation

Fotouhi, Sara, "Automatic extraction of requirements-related information from regulatory documents cited in the project contract" (2021). Electronic Thesis and Dissertation Repository. 8122.
https://ir.lib.uwo.ca/etd/8122
