Electronic Thesis and Dissertation Repository


Doctor of Philosophy


Library & Information Science


Burkell, Jacquelyn


The past decade has witnessed a dramatic expansion in the amount of publicly available health care information on the Web. This information, however, is of extremely variable quality. Evaluating content quality is a major challenge because manual methods of rating information content are easily overwhelmed by the sheer volume of data. This study proposes an automated approach for assessing the quality of web health care information by comparing the text content with evidence-based health care recommendations. The method relies on semantic analysis and text classification to identify the presentation of evidence-based recommendations in web documents. As a result, the semantics-based approach rates quality on the basis of information content rather than indirect quality indicators such as website authorship, sponsorship, or text keywords, as used in previous studies. Two systems were built to implement the semantics-based quality rating: a rule-based system and a prototype machine learning system. The performance of both implementations was evaluated by comparing their automated quality ratings with human ratings of the same set of depression treatment web pages. The evaluation demonstrates that the ratings generated automatically by the semantics-based approach are comparable to those from human raters: that is, there is a high Pearson correlation between computer and human ratings. The semantics-based approach has an advantage over previous automated approaches in that its rating results give information consumers feedback that is more instructive than a quality score alone.
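The agreement measure named in the abstract, the Pearson correlation between computer and human ratings, can be illustrated with a minimal Python sketch. The function name `pearson` and both score lists are assumptions introduced here for illustration; the values are invented placeholders, not data from the study, and the abstract does not specify how the correlation was actually computed.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each rating from its own mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder scores, invented for illustration only (not data from the study).
computer_ratings = [2, 5, 7, 4, 9]  # hypothetical automated quality scores
human_ratings = [3, 5, 8, 4, 10]    # hypothetical human scores, same pages
r = pearson(computer_ratings, human_ratings)
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 would indicate that the automated ratings rank and scale pages in much the same way as the human raters, which is the sense of "comparable" used in the abstract.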