Electronic Thesis and Dissertation Repository

Thesis Format

Monograph

Degree

Doctor of Philosophy

Program

Library & Information Science

Supervisor

Ajiferuke, Isola

Abstract

This study used the semantic similarity between citation contexts to develop one scheme for weighting direct citations, and another scheme for allocating residual citations to a publication from its nth citation generation level publication. The relationship between the new direct citation weighting scheme and each of five existing schemes was investigated, while the new residual citation scheme was compared with the cascading citation scheme. Two datasets from biomedical publications were used for this study, one each for the direct and residual citation weighting aspects of the study. The sample for the direct citation aspect contained 100 publications that received 7,317 citations, 11,234 citation contexts, and 9,795 citation context pairs. A sample of 981 citation context pairs was given to two human experts for annotation into “similar”, “somewhat similar”, and “not similar” classes. Semantic similarity scores between the 11,234 citation contexts were obtained using the BioSent2Vec word-embedding model for biomedical publications. The residual citation aspect sample included ten base articles and five generations of citations, from which 5,272 citation context pairs were obtained. Results of the Spearman’s rank correlation test showed that the correlation coefficients between the proposed direct citation weighting scheme and each of the weighting schemes “number of positive sentiments,” “number of multiple citation mentions,” “sum of multiple citation mentions,” “number of citations,” and “number of citation mentions” were .83, .89, .89, .93, and .99 respectively. The average residual citations received from the 2nd, 3rd, 4th and 5th citation generation level papers were 0.47, 0.43, 0.40, and 0.37 respectively. These average residual citations were significantly different from the averages of 0.5, 0.25, 0.125, and 0.0625 suggested by the cascading citation scheme.
Even though the proposed direct citation weighting scheme and the residual citation scheme require more complex computations, they are recommended as credible alternatives to the “number of citation mentions” scheme and the cascading citation scheme respectively.
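
The comparison with the cascading citation scheme can be made concrete with a short sketch. The four cascading values quoted in the abstract (0.5, 0.25, 0.125, 0.0625 for generations 2 to 5) halve at each step, so the function below assumes the closed form 0.5^(n-1) for generation n. That closed form is inferred from the four quoted values and is not stated in the abstract; the function and variable names are illustrative only.

```python
# Average residual citations per citation generation, as reported in the abstract.
REPORTED_AVERAGES = {2: 0.47, 3: 0.43, 4: 0.40, 5: 0.37}


def cascading_weight(generation: int) -> float:
    """Weight assumed for the cascading scheme at a given generation.

    Assumption: the weight halves with each generation, starting from
    0.5 at generation 2, which reproduces the four values in the abstract.
    """
    if generation < 2:
        raise ValueError("residual citations start at generation 2")
    return 0.5 ** (generation - 1)


def gaps_from_cascading(reported: dict[int, float]) -> dict[int, float]:
    """Difference between each reported average and the cascading weight."""
    return {g: avg - cascading_weight(g) for g, avg in reported.items()}


if __name__ == "__main__":
    for generation, gap in gaps_from_cascading(REPORTED_AVERAGES).items():
        print(
            f"generation {generation}: "
            f"cascading={cascading_weight(generation):.4f}, "
            f"reported={REPORTED_AVERAGES[generation]:.2f}, "
            f"gap={gap:+.4f}"
        )
```

The sketch only restates the reported figures side by side; it does not reproduce the significance test used in the study.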

Summary for Lay Audience

One of the objectives of evaluative bibliometrics, a branch of Information Science, is to fairly and appropriately quantify the contributions of previously written (cited) papers to the citing scientific paper. Citation mention count, which is the number of times a cited publication is mentioned in the citing paper, is a popular method for weighting the contribution of citations; however, it does not take into account citation context information (the wording associated with the in-text citations). First, this research proposes a more nuanced weighting method that incorporates citation contexts into the citation mention count. Second, this study exploits the citation context information to create a system for weighting residual citations, where residual citations are accumulated by a publication depending on its contributions to other publications on its citation path. For example, on a citation path A-B-C, where publication A was cited by publication B and publication B was cited by publication C, publication A contributed to publication C if the citation context of publication A in publication B is similar to the citation context of publication B in publication C. Two datasets were used for this thesis, one each for the direct and residual citation weighting aspects of the study. The first sample contained 100 publications that received 7,317 citations, 11,234 citation contexts, and 9,795 citation context pairs. The proposed semantic similarity-based weighting allocated more weight to unique citation contexts. The indirect citation sample included ten base articles and five generations of citations, from which 5,272 citation context pairs were obtained. A statistical test revealed that the number of citation mentions was the metric most similar to the proposed citation weight. This implies the proposed weighting method is similar to the citation mention method.
As in the cascading citation system, knowledge flow from articles to their generations of citations decreased as the number of generations increased. However, the residual citations accrued to publications at every generation were statistically different between the proposed and existing systems. This implies the proposed residual citation weighting is different from the cascading citation system. Though the proposed metrics require deeper computation, they are more novel because they are based on the contribution of the cited publications.
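
The citation path idea (A cited by B, B cited by C) can be sketched in a few lines. The study obtained similarity scores from the BioSent2Vec model; the sketch below swaps in a plain bag-of-words cosine similarity as a stand-in so that it runs without any model files. The two citation context strings are invented for illustration, and how a similarity score is converted into a residual citation is not described here, so the sketch stops at the score.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a toy stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(context_a: str, context_b: str) -> float:
    """Cosine similarity between two citation contexts (0.0 if either is empty)."""
    vec_a, vec_b = bag_of_words(context_a), bag_of_words(context_b)
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical citation contexts on a path A-B-C.
    a_in_b = "Gene expression was normalized using the method of A."
    b_in_c = "We normalized gene expression following the method of B."
    score = cosine_similarity(a_in_b, b_in_c)
    print(f"similarity of A-in-B and B-in-C contexts: {score:.2f}")
```

A higher score between the two contexts would suggest, under the premise of the summary above, that more of publication A's contribution carried through publication B to publication C.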

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.
