Electronic Thesis and Dissertation Repository

Thesis Format

Degree
Master of Science

Program
Computer Science

Supervisor
Mercer, Robert E.

Abstract

The MEDLINE database currently comprises more than 35 million citations, and more than 1 million records are added each year [28]. This volume of information makes it difficult to identify and locate research articles relevant to a given search topic, which has prompted the development of various techniques for improving the efficiency and effectiveness of information retrieval from MEDLINE. PubMed is a search engine devoted to the research publications in MEDLINE, and it categorizes each article using MeSH (Medical Subject Headings), a controlled vocabulary. For decades, this indexing has been performed by human annotators, a process that is both time-consuming and expensive. Because MeSH has an enormously complex hierarchical structure, automatic MeSH indexing is a difficult machine learning problem. To address these challenges, we propose an end-to-end model that takes MeSH descriptions into account and combines them with a Knowledge Enhanced Mask attention model to index new research papers. We also calculate the journal correlation of each MeSH term in the MeSH hierarchy.
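The abstract mentions computing the journal correlation of each MeSH term but does not define it here. As a minimal sketch, assuming (hypothetically) that the journal correlation of a term is the fraction of a journal's articles annotated with that term, that is, an estimate of P(term | journal), it could be computed from (journal, MeSH terms) pairs as follows. The function name `journal_correlation` and the toy journal names are illustrative only, and the sketch ignores the MeSH hierarchy.

```python
from collections import Counter, defaultdict


def journal_correlation(articles):
    """Estimate P(term | journal) for every (journal, MeSH term) pair.

    Hypothetical definition for illustration: the fraction of a journal's
    articles that are annotated with the term.
    `articles` is an iterable of (journal, mesh_terms) pairs.
    """
    articles_per_journal = Counter()
    term_counts = defaultdict(Counter)
    for journal, mesh_terms in articles:
        articles_per_journal[journal] += 1
        # Count each term at most once per article.
        term_counts[journal].update(set(mesh_terms))
    return {
        journal: {
            term: count / articles_per_journal[journal]
            for term, count in counts.items()
        }
        for journal, counts in term_counts.items()
    }


# Toy corpus with hypothetical journals.
corpus = [
    ("Journal A", ["Humans", "Neoplasms"]),
    ("Journal A", ["Humans", "Mice"]),
    ("Journal B", ["Mice"]),
]
corr = journal_correlation(corpus)
print(corr["Journal A"]["Humans"])     # 1.0
print(corr["Journal A"]["Neoplasms"])  # 0.5
```

Under this assumed definition, a value of 1.0 means that every sampled article from the journal carries the term, while terms never seen with a journal are simply absent from its dictionary.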

Summary for Lay Audience

The growth of research in the medical field has produced an overwhelming number of research papers, making it challenging to find relevant articles on a particular topic. PubMed, a search engine for scientific and biomedical research papers, categorizes each paper using Medical Subject Headings (MeSH) terms, which serve as tags for research articles. Traditionally, human annotators have manually tagged each article with its relevant MeSH terms, a time-consuming and costly process. To address these challenges, we propose an end-to-end model that automatically analyzes the title and abstract of a new research paper and labels it with the relevant MeSH terms.

In this thesis, we aimed to simplify the assignment of MeSH terms by considering their structure and descriptions to determine their meaning. We analyzed research papers in the PubMed database based on their titles and abstracts. By matching the meaning of each MeSH term against this analysis of the text, we automatically assigned the relevant MeSH terms to each paper.