Thesis Format

Monograph

Ranking comments: An Entropy-based Method with Word Embedding Clustering

Yuyang Zhang, The University of Western Ontario

Degree

Master of Science

Program

Statistics and Actuarial Sciences

Supervisor

Yu, Hao

Abstract

Automatically ranking comments by relevance plays an important role in text mining and text summarization. In this thesis, we first introduce a new text digitalization method: the bag of word clusters model. Unlike the traditional bag of words model, which treats each word as an independent item, we group semantically related words into clusters using pre-trained word2vec word embeddings and represent each comment as a distribution over word clusters. This method extracts both semantic and statistical information from texts. Next, we propose an unsupervised ranking algorithm that identifies relevant comments by their distance to the “ideal” comment. The “ideal” comment is the maximum general entropy comment with respect to the global word cluster distribution. The intuition is that the “ideal” comment highlights the aspects of a product that other comments mention most frequently, so it can serve as a standard for judging a comment’s relevance to that product. Finally, we analyze our algorithm’s performance on a real Amazon product.
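The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation and rests on several assumptions: the tiny 2-D `EMBEDDINGS` table is a hypothetical stand-in for pre-trained word2vec vectors, a plain k-means with hand-picked seed words stands in for whatever clustering the thesis uses, and the global word cluster distribution is used directly as a proxy for the “ideal” comment, because the abstract does not spell out the maximum general entropy construction. The Jensen-Shannon divergence is likewise an assumed choice of distance.

```python
import math
from collections import Counter

# Hypothetical 2-D vectors standing in for pre-trained word2vec embeddings.
EMBEDDINGS = {
    "battery": (0.9, 0.1), "charge": (0.8, 0.2), "power": (0.85, 0.15),
    "screen": (0.1, 0.9), "display": (0.2, 0.8), "bright": (0.15, 0.85),
    "shipping": (0.5, -0.9), "delivery": (0.55, -0.85),
}


def nearest(vector, centroids):
    """Index of the centroid closest to `vector` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(vector, centroids[i]))


def kmeans(vectors, centroids, iterations=10):
    """Plain k-means with caller-supplied initial centroids, so it is deterministic."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for v in vectors:
            groups[nearest(v, centroids)].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def cluster_distribution(tokens, word_to_cluster, k):
    """Bag of word clusters: share of in-vocabulary tokens falling in each cluster."""
    counts = Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)
    total = sum(counts.values())
    if total == 0:
        return None  # no known words, so no distribution can be formed
    return [counts[i] / total for i in range(k)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (assumed distance; finite even with zero entries)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def rank_comments(comments, seed_words=("battery", "screen", "shipping")):
    """Rank comments by closeness to the global cluster distribution.

    Assumption: the global distribution is a proxy for the "ideal" comment.
    """
    centroids = kmeans(list(EMBEDDINGS.values()), [EMBEDDINGS[w] for w in seed_words])
    word_to_cluster = {w: nearest(v, centroids) for w, v in EMBEDDINGS.items()}
    k = len(centroids)
    tokenized = [c.lower().split() for c in comments]
    pooled = [t for tokens in tokenized for t in tokens]
    reference = cluster_distribution(pooled, word_to_cluster, k)
    if reference is None:
        return list(comments)  # nothing to compare against
    scored = []
    for comment, tokens in zip(comments, tokenized):
        dist = cluster_distribution(tokens, word_to_cluster, k)
        score = math.inf if dist is None else js_divergence(dist, reference)
        scored.append((score, comment))
    return [comment for _, comment in sorted(scored, key=lambda pair: pair[0])]


if __name__ == "__main__":
    sample = [
        "battery charge power screen display shipping",
        "battery battery battery",
        "hello world nothing relevant",
    ]
    for position, comment in enumerate(rank_comments(sample), start=1):
        print(position, comment)
```

In this toy setup, a comment that touches several frequently mentioned aspects sits closer to the reference distribution than one that repeats a single word, and a comment with no in-vocabulary words is placed last.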

Summary for Lay Audience

Gathering information from other people’s opinions is an essential part of the purchasing decision process. With the rapid growth of the Internet, conversations in online markets provide a large amount of product information. As a result, when shopping online, consumers rely on product comments posted by other consumers to make their purchase decisions.

In this thesis, we propose a new method to identify relevant comments on a product. Our method is sensitive to the content of a comment and can successfully filter out unrelated comments. Ranking relevant comments higher helps consumers better evaluate the true underlying quality of a product.

Recommended Citation

Zhang, Yuyang, "Ranking comments: An Entropy-based Method with Word Embedding Clustering" (2020). Electronic Thesis and Dissertation Repository. 7300.
https://ir.lib.uwo.ca/etd/7300

Included in

Applied Statistics Commons, Data Science Commons
