Electronic Thesis and Dissertation Repository

Degree

Doctor of Philosophy

Program

Computer Science

Supervisor

Charles X. Ling

Abstract

In traditional active learning, learning algorithms (or learners) focus mainly on the performance of the final model and on the total number of queries needed to learn a good model. However, in many real-world applications, active learners must attend to the learning process itself to achieve finer goals, such as minimizing the number of mistakes made when predicting unlabeled examples. Such goals are common and important in practice. For example, in direct marketing, a sales agent (the learner) has to focus on the process of selecting customers to approach and tries to predict correctly (i.e., with fewer mistakes) which customers will buy the product.

However, traditional active learning algorithms cannot achieve these finer learning goals because their focus is different. In this thesis, we study how to control the learning process in active learning so that those goals can be accomplished. According to the learning tasks and goals involved, we address four new active learning paradigms, as follows.

The first paradigm is learning actively and conservatively. Under this paradigm, the learner iteratively selects and predicts the most certain example (thus, conservatively) during the learning process. The goal of this paradigm is to minimize the number of mistakes in predicting unlabeled examples during active learning. Intuitively, the conservative strategy is less likely to make mistakes and is therefore more likely to achieve the learning goal. We apply this new learning strategy to educational software as well as to direct marketing.
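The conservative selection step described above can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes a binary task and a model that already outputs P(y = 1 | x) for each unlabeled example, and the function name `most_certain` is hypothetical.

```python
def most_certain(probs):
    """Pick the unlabeled example the model is most confident about.

    probs: list of P(y = 1 | x), one entry per unlabeled example
           (assumed to come from some already-trained model).
    Returns (index, predicted_label, confidence).
    """
    # Confidence in the predicted class is max(p, 1 - p).
    best_i = max(range(len(probs)), key=lambda i: max(probs[i], 1 - probs[i]))
    p = probs[best_i]
    label = 1 if p >= 0.5 else 0
    return best_i, label, max(p, 1 - p)


# Hypothetical pool of four unlabeled examples.
index, label, confidence = most_certain([0.55, 0.90, 0.20, 0.45])
print(index, label, confidence)
```

In an iterative loop, the selected example would be predicted, its true label revealed, and the model updated before the next selection.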

The second paradigm is learning actively and aggressively. Under this paradigm, unlabeled examples and multiple oracles are available. The learner iteratively selects the best group of oracles to label the most uncertain example (thus, aggressively) during the learning process. The learning goal is to learn a good model with guaranteed label quality.
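One way to picture the aggressive step is the sketch below. It rests on assumptions that the abstract does not state: the task is binary, each oracle has a known estimated accuracy, the k most accurate oracles are queried, and their answers are combined by majority vote. The names `most_uncertain`, `query_top_oracles`, `est_accuracy`, and `k` are hypothetical.

```python
from collections import Counter


def most_uncertain(probs):
    """Index of the example whose P(y = 1 | x) is closest to 0.5."""
    return min(range(len(probs)), key=lambda i: abs(probs[i] - 0.5))


def query_top_oracles(x, oracles, est_accuracy, k=3):
    """Ask the k oracles with the highest estimated accuracy and
    return their majority label (assumed aggregation rule).

    oracles: list of callables mapping an example to a label.
    est_accuracy: assumed per-oracle accuracy estimates.
    k: use an odd value to avoid ties on binary labels.
    """
    ranked = sorted(range(len(oracles)),
                    key=lambda j: est_accuracy[j], reverse=True)[:k]
    votes = Counter(oracles[j](x) for j in ranked)
    return votes.most_common(1)[0][0]


# Hypothetical oracles that ignore the input and always answer the same way.
oracles = [lambda x: 1, lambda x: 0, lambda x: 1, lambda x: 0]
est_accuracy = [0.90, 0.60, 0.80, 0.95]

pool_probs = [0.90, 0.52, 0.10]
i = most_uncertain(pool_probs)
label = query_top_oracles(i, oracles, est_accuracy, k=3)
print(i, label)
```

How many oracles to query, and how to bound the quality of the combined label, is exactly the part this sketch leaves open.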

The third paradigm is learning actively with a conservative-aggressive tradeoff. Under this paradigm, first, unlabeled examples are available and learners are allowed to select examples actively to learn from. Second, two actions are available for obtaining labels: querying oracles and making predictions. Last, a cost is incurred for querying oracles and for making wrong predictions. Trading off the two actions is necessary to achieve the learning goal: minimizing the total cost of obtaining the labels.
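The tradeoff between querying and predicting can be sketched as an expected-cost comparison. This is an assumed decision rule for illustration, not the one proposed in the thesis: it treats the model's confidence as a calibrated probability of being correct, and the names `choose_action`, `query_cost`, and `error_cost` are hypothetical.

```python
def choose_action(p_pos, query_cost, error_cost):
    """Decide whether to predict a label or pay to query an oracle.

    p_pos: model's estimate of P(y = 1 | x), assumed calibrated.
    query_cost: cost of asking an oracle for the label.
    error_cost: cost of a wrong prediction.
    Returns ("predict", label) or ("query", None).
    """
    confidence = max(p_pos, 1 - p_pos)
    # Expected cost of predicting = P(wrong) * cost of a mistake.
    expected_error_cost = (1 - confidence) * error_cost
    if expected_error_cost < query_cost:
        return ("predict", 1 if p_pos >= 0.5 else 0)
    return ("query", None)


# Hypothetical costs: a query costs 1, a wrong prediction costs 10.
for p in (0.95, 0.60, 0.02):
    print(p, choose_action(p, query_cost=1.0, error_cost=10.0))
```

With these costs, the learner predicts only when its estimated error probability is below 0.1 and queries otherwise.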

The last paradigm is learning actively with minimal/maximal effort. Under this paradigm, the labels of all examples are provided, and learners are allowed to select examples actively to learn from. The learning goal is to control the learning process by selecting examples actively so that learning is accomplished with minimal effort, or so that a good model is built quickly with maximal effort.

For each of the four learning paradigms, we propose effective learning algorithms accordingly and demonstrate empirically that related learning problems in real applications can be solved well and the learning goals can be accomplished.

In summary, this thesis focuses on controlling the learning process to achieve finer goals in active learning. Motivated by various real-world application tasks, we propose four novel learning paradigms, and for each paradigm we propose efficient learning algorithms to solve the corresponding learning problems. The experimental results show that our learning algorithms outperform other state-of-the-art learning algorithms.
