Master of Science
Professor Yuri Boykov
This thesis proposes an optimization-based algorithm for detecting lines of text in images taken by hand-held cameras. Most existing methods for this problem assume alphabetic text (e.g. Latin or Greek) and rely on heuristics specific to such text: letters within one line are close together, separate lines are farther apart, etc. We address a more challenging problem in which images combine alphabetic and logographic characters from multiple languages whose typographic rules vary widely (e.g. English, Korean, and Chinese). The significantly higher complexity of fitting multiple lines of text in different languages calls for an energy-based formulation that combines a data fidelity term with a regularization prior. Our data cost combines geometric errors with likelihoods given by a classifier trained on low-level features for each language. Our regularization term encourages sparsity through label costs. The energy can be minimized efficiently by fusion moves. The algorithm was evaluated on a database of images from the subway of the Seoul metropolitan area and was found to be robust.
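The energy described above can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes that each detected character is reduced to a point, that each candidate text line is a straight line a*x + b*y + c = 0, that the geometric error is a squared perpendicular distance, and that the classifier likelihood enters as a negative log-probability. The names `point_line_distance`, `energy`, `sigma`, and `label_cost` are hypothetical. The sketch only evaluates the energy of a given labeling; minimization by fusion moves is not shown.

```python
import math


def point_line_distance(point, line):
    """Perpendicular distance from (x, y) to the line a*x + b*y + c = 0."""
    x, y = point
    a, b, c = line
    return abs(a * x + b * y + c) / math.hypot(a, b)


def energy(points, likelihoods, lines, labeling, label_cost, sigma=1.0):
    """Data fidelity term plus a label-cost sparsity term (illustrative only).

    points      -- list of (x, y) character positions (assumed representation)
    likelihoods -- likelihoods[i][k]: classifier probability in (0, 1] that
                   point i matches the language of candidate line k
    lines       -- list of (a, b, c) candidate line models
    labeling    -- labeling[i]: index of the line assigned to point i
    label_cost  -- fixed penalty paid once for every line model in use
    """
    data = 0.0
    for i, k in enumerate(labeling):
        d = point_line_distance(points[i], lines[k])
        # Geometric error plus negative log-likelihood from the classifier.
        data += (d / sigma) ** 2 - math.log(likelihoods[i][k])
    # Each distinct label in use is charged once, which favors fewer lines.
    sparsity = label_cost * len(set(labeling))
    return data + sparsity


# Four points lying on two horizontal lines, y = 0 and y = 10.
points = [(0, 0), (1, 0), (0, 10), (1, 10)]
lines = [(0, 1, 0), (0, 1, -10)]
likelihoods = [[1.0, 1.0]] * 4

two_lines = energy(points, likelihoods, lines, [0, 0, 1, 1], label_cost=5.0)
one_line = energy(points, likelihoods, lines, [0, 0, 0, 0], label_cost=5.0)
print(two_lines, one_line)
```

With a small label cost the two-line labeling has the lower energy because it fits the points exactly. Raising `label_cost` far enough makes the single-line labeling cheaper, which is the trade-off a sparsity prior of this kind controls.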
Milevskiy, Igor, "Detecting Multilingual Lines of Text with Fusion Moves" (2013). Electronic Thesis and Dissertation Repository. 1780.