Electronic Thesis and Dissertation Repository

Degree

Doctor of Philosophy

Program

Computer Science

Supervisor

Charles X. Ling

Abstract

Most of the existing web search engines (such as Google and Bing) are in the form of keyword-based search. Typically, after the user issues a query with the keywords, the search engine will return a flat list of results. When the query issued by the user is related to a topic, only the keyword matching may not accurately retrieve the whole set of webpages in that topic. On the other hand, there exists another type of search system, particularly in e-Commerce web- sites, where the user can search in the categories of different faceted hierarchies (e.g., product types and price ranges). Is it possible to integrate the two types of search systems and build a web search engine with a topic hierarchy? The main diffculty is how to classify the vast number of webpages on the Internet into the topic hierarchy. In this thesis, we will leverage machine learning techniques to automatically classify webpages into the categories in our hierarchy, and then utilize the classification results to build the new search engine SEE. The experimental results demonstrate that SEE can achieve better search results than the traditional keyword-based search engine in most of the queries, particularly when the query is related to a topic. We also conduct a small-scale usability study which further verifies that SEE is a promising search engine. To further improve SEE, we also propose a new active learning framework with several novel strategies for hierarchical classification.

Share

COinS