Abstract:
Text Line Segmentation (TLS) methods locate and separate text lines in document images to support later stages of image analysis such as word spotting, keyword search, text alignment, text recognition, and other indexing stages involved in retrieving information from handwritten documents. The design of TLS methods and the tuning of their parameters assume a level of complexity that depends on the language and writing style of a document collection. As a result, their performance is not maintained on documents of greater or lesser complexity. In this paper, we present TLS-ICI, a TLS Intrinsic Complexity Index that measures the complexity of a document for the TLS task without requiring a human gold standard. Through experimentation, we demonstrate that the proposed TLS-ICI provides an ordering of both the TLS methods and the image-based handwritten documents. With this complexity index, it is possible to select the most appropriate method for each document in a collection, reducing the time spent on exhaustive tests and increasing performance. In addition, we demonstrate that a new hybrid TLS method based on the TLS-ICI outperforms previous individual TLS methods. The dataset consists of several standard TLS collections of contemporary and ancient texts in different languages and alphabets, including English, Spanish, Arabic, Chinese, Greek, Khmer, Persian, Bengali, Oriya, Kannada, and Nahuatl.