Cosine Similarity for Article Section Classification: Using Structured Abstracts as a Proxy for an Annotated Corpus

Arthur T. Bugorski, The University of Western Ontario

Degree

Master of Science

Program

Computer Science

Supervisor

Dr Robert Mercer

Abstract

During the last decade, the amount of research published in biomedical journals has grown significantly and at an accelerating rate. To fully explore all of this literature, new tools and techniques are needed for both information retrieval and processing. One such tool is the identification and extraction of key claims. In an effort to work toward claim extraction, we aim to identify the key areas in the body of the article referred to by text in the abstract. In this project, our work is preliminary to that goal in that we attempt to match specific clauses in the abstract with the section of the article body to which they refer. For our data, we use journal articles from PubMed with structured abstracts. Our technique is based on the cosine measure of feature vectors using a bag-of-words approach. We refine our technique through the application of five different experimental variables: feature weighting, word- and bigram-based feature sets, text pre-processing, fixed-expression filtering, and different classifier heuristics. We found that the choice of classifier dominates all other considerations, and while their performance with feature weighting is synergistic, the other variables were found to have little or no effect.
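The cosine-measure, bag-of-words approach described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation and rests on several assumptions: raw term counts with no feature weighting, a simple lowercase tokenizer standing in for the thesis's text pre-processing, optional adjacent-word bigrams, and a hypothetical nearest-section heuristic that assigns a clause to whichever section body it is most similar to. The function names, section labels, and sample sentences are invented for illustration and do not come from the thesis data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed pre-processing: lowercase and keep alphabetic tokens only.
    return re.findall(r"[a-z]+", text.lower())


def features(text: str, use_bigrams: bool = False) -> Counter:
    # Bag-of-words term counts, optionally extended with adjacent-word bigrams.
    tokens = tokenize(text)
    bag = Counter(tokens)
    if use_bigrams:
        bag.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return bag


def cosine(u: Counter, v: Counter) -> float:
    # cos(u, v) = (u . v) / (|u| |v|); returns 0.0 if either vector is empty.
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def classify(clause: str, sections: dict[str, str], use_bigrams: bool = False) -> str:
    # Hypothetical heuristic: pick the section whose text is most similar to the clause.
    clause_vec = features(clause, use_bigrams)
    return max(sections, key=lambda name: cosine(clause_vec, features(sections[name], use_bigrams)))


if __name__ == "__main__":
    # Invented example sections; real input would be the article body sections.
    sections = {
        "METHODS": "Patients were randomized and blood samples were collected weekly.",
        "RESULTS": "Mortality was significantly lower in the treatment group.",
    }
    print(classify("Blood samples were collected from randomized patients.", sections))
```

Run as a script, this prints METHODS, because the clause shares several terms with the invented methods text and none with the results text. A term-weighting scheme such as tf-idf could be substituted for the raw counts in features() without changing cosine() or classify().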

Recommended Citation

Bugorski, Arthur T., "Cosine Similarity for Article Section Classification: Using Structured Abstracts as a Proxy for an Annotated Corpus" (2014). Electronic Thesis and Dissertation Repository. 2154.
https://ir.lib.uwo.ca/etd/2154

Included in

Artificial Intelligence and Robotics Commons, Computational Linguistics Commons
