Electronic Thesis and Dissertation Repository

Degree

Master of Science

Program

Computer Science

Supervisor(s)

Professor Yuri Boykov

Abstract

This thesis proposes an optimization-based algorithm for detecting lines of text in images taken by hand-held cameras. Most existing methods for this problem assume alphabetic scripts (e.g. Latin or Greek) and rely on heuristics specific to such texts, such as close spacing between letters within a line and larger distances between separate lines. We address a more challenging problem in which images combine alphabetic and logographic characters from multiple languages whose typographic rules differ widely (e.g. English, Korean, and Chinese). The significantly higher complexity of fitting multiple lines of text in different languages calls for an energy-based formulation that combines a data fidelity term with a regularization prior. Our data cost combines geometric errors with likelihoods given by a classifier trained on low-level features in each language. Our regularization term encourages sparsity through label costs. The energy can be minimized efficiently by fusion moves. The algorithm was evaluated on a database of images from the Seoul metropolitan subway and was found to be robust.
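To make the structure of such an energy concrete, the following is a minimal sketch rather than the formulation used in the thesis. It assumes that each detected character is reduced to a 2D point, that each candidate text line is a straight line a*x + b*y + c = 0, that the geometric error is the squared point-to-line distance, that the classifier term is a negative log-likelihood, and that every line in use pays the same fixed label cost. The names `point_line_distance`, `labeling_energy`, `weight`, and `label_cost` are hypothetical. The sketch only evaluates the energy of a given labeling; it does not implement fusion moves.

```python
import math


def point_line_distance(point, line):
    """Perpendicular distance from (x, y) to the line a*x + b*y + c = 0."""
    x, y = point
    a, b, c = line
    return abs(a * x + b * y + c) / math.hypot(a, b)


def labeling_energy(points, likelihoods, lines, labels, label_cost, weight=1.0):
    """Energy of assigning every point to one candidate text line.

    points      -- list of (x, y) character positions (assumed representation)
    likelihoods -- likelihoods[i][k]: classifier probability in (0, 1] that
                   point i belongs to the language of line k (assumed input)
    lines       -- list of (a, b, c) line parameters
    labels      -- labels[i]: index of the line assigned to point i
    label_cost  -- fixed penalty paid once for each line that is used
    weight      -- hypothetical balance between geometry and appearance
    """
    data = 0.0
    for i, k in enumerate(labels):
        geometric = point_line_distance(points[i], lines[k]) ** 2
        appearance = -math.log(likelihoods[i][k])
        data += geometric + weight * appearance
    # Label-cost term: count distinct lines in use, which encourages sparsity.
    sparsity = label_cost * len(set(labels))
    return data + sparsity


# Four characters on two horizontal rows, y = 0 and y = 2.
points = [(0, 0), (1, 0), (0, 2), (1, 2)]
lines = [(0, 1, 0), (0, 1, -2)]
likelihoods = [[1.0, 1.0] for _ in points]  # uninformative classifier

two_lines = labeling_energy(points, likelihoods, lines, [0, 0, 1, 1], label_cost=1.0)
one_line = labeling_energy(points, likelihoods, lines, [0, 0, 0, 0], label_cost=1.0)
```

With a small label cost, the two-line labeling has the lower energy because it removes all geometric error. Raising `label_cost` far enough makes the single-line labeling cheaper, which is the sparsity effect that a label-cost prior is meant to produce.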

