Degree
Master of Science
Program
Computer Science
Supervisor
Professor Yuri Boykov
Abstract
This thesis proposes an optimization-based algorithm for detecting lines of text in images taken by hand-held cameras. Most existing methods for this problem assume alphabet-based texts (e.g. in Latin or Greek) and rely on heuristics specific to such texts: proximity between letters within one line, larger distance between separate lines, etc. We are interested in a more challenging problem where images combine alphabetic and logographic characters from multiple languages whose typographic rules vary widely (e.g. English, Korean, and Chinese). The significantly higher complexity of fitting multiple lines of text in different languages calls for an energy-based formulation combining a data fidelity term and a regularization prior. Our data cost combines geometric errors and likelihoods given by a classifier trained on low-level features in each language. Our regularization term encourages sparsity based on label costs. Our energy can be efficiently minimized by fusion moves. The algorithm was evaluated on a database of images from the subway system of the Seoul metropolitan area and was shown to be robust.
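As a rough illustration of the kind of energy the abstract describes, the sketch below evaluates a labeling as a sum of per-element data costs plus a label cost charged once for each line model actually used. It is not taken from the thesis: the function name `labeling_energy`, the toy cost values, and the assumption that geometric errors and classifier likelihoods have already been merged into a single data-cost table are all hypothetical, and the fusion-move minimization itself is not shown.

```python
from typing import Sequence


def labeling_energy(
    data_costs: Sequence[Sequence[float]],
    labels: Sequence[int],
    label_costs: Sequence[float],
) -> float:
    """Energy of a labeling: data fidelity plus a sparsity prior.

    data_costs[p][l] is the (assumed precomputed) cost of assigning
    element p to candidate line l; labels[p] is the line chosen for
    element p; label_costs[l] is paid once if line l is used at all.
    """
    # Data term: how poorly each element fits the line it is assigned to.
    data_term = sum(data_costs[p][l] for p, l in enumerate(labels))
    # Label-cost term: a fixed penalty per distinct line in use,
    # which favors explanations with fewer lines.
    sparsity_term = sum(label_costs[l] for l in set(labels))
    return data_term + sparsity_term


# Hypothetical toy instance: three elements, two candidate lines.
data_costs = [
    [1.0, 4.0],
    [1.5, 3.0],
    [5.0, 0.5],
]
label_costs = [2.0, 2.0]

two_lines = labeling_energy(data_costs, [0, 0, 1], label_costs)
one_line = labeling_energy(data_costs, [0, 0, 0], label_costs)
```

In this toy instance the two-line labeling has lower energy because the third element fits the first line badly enough to justify paying for a second label; raising the label costs would reverse that preference.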
Recommended Citation
Milevskiy, Igor, "Detecting Multilingual Lines of Text with Fusion Moves" (2013). Electronic Thesis and Dissertation Repository. 1780.
https://ir.lib.uwo.ca/etd/1780
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons