Numerous business workflows involve printed forms, such as invoices or receipts, which are often manually digitalized to persistently search or store the data. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitalization. Here, processing algorithms need to deal with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models based on pairs of raw images and rectification meshes. The available results show promising predictive accuracies for dewarping, but generated errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and a multiplicity of per-pixel supervision signals and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve the performance for image dewarping. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including an improved local distortion of 26.1 %. We made our new dataset and all code publicly available at.

Numerous business workflows in enterprises involve printed forms, such as invoices, bills, or receipts. The receiving party has to manually digitalize the document in order to persistently access, search, or store the provided data, causing significant personnel costs. Here, existing solutions make use of scanners to create flatbed digital copies of the paper document and apply optical character recognition (OCR) to automatically extract information. This, however, creates additional costs and reduces the flexibility of the given solution. In order to overcome the hardware restriction, current state-of-the-art approaches attempt to analyze document images taken with smartphones. Prominent examples are DewarpNet and GeoTr, which learn to dewarp images using supervised learning, with the available dewarping meshes as ground truth. While these approaches generate promising results, they are still not satisfactorily robust when dealing with environmental factors, such as light incidence, shadows, occlusions, crumpled or folded paper, and perspective transformations. One potential remedy is the use of additional structured information in the form of templates, which confine the general structure of the documents in order to improve unwarping. While it might be tedious to define initial templates, the added value is significant due to the considerable increase in dewarping precision and robustness.

In this paper, we follow exactly this path and propose a novel labeled invoice dataset with additional structural information to assist image dewarping. More specifically, we present Inv3D, a large, high-resolution invoice dataset comprising both synthetic data generated from carefully designed templates and challenging real-world data. Inv3D consists of 25,000 samples, each composed of four flatbed invoice image layers, two ground-truth annotations, the 3D warped document, nine supervision signal maps, and the backward transformation map (see Fig.).

We then propose a novel supervised dewarping approach, referred to as GeoTrTemplate, which exploits the novel structured information by extending the recent GeoTr algorithm. We encode both the warped image and our template image using a convolutional neural network and combine the feature representations. The subsequent transformer encoder–decoder module learns the attention between all pairs of features, which enables the model to combine the warped image with our structural template information.

For our extensive evaluation, we trained DewarpNet without its refinement network and GeoTr in a unified framework and evaluated both approaches on the Doc3D dataset, as well as Inv3D, making use of established metrics such as MS-SSIM, LD, ED, and CER. In addition, we introduce the newer perceptual metric LPIPS as a benchmark metric for document dewarping.
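Two of the metrics named above, ED and CER, are text-based and can be illustrated compactly. The following is a minimal sketch, assuming that ED denotes the Levenshtein edit distance between the OCR output of a dewarped image and the reference text, and that CER is this distance divided by the reference length. The function names `edit_distance` and `character_error_rate` are hypothetical and chosen for illustration only; the text does not specify the exact evaluation code or OCR engine.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning one string into the other."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,         # drop a reference character
                current[j - 1] + 1,      # add a hypothesis character
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the reference length (assumed CER definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical OCR result for an invoice field, for illustration only.
    reference_text = "Invoice No. 12345"
    ocr_text = "Invoce No. 12845"
    print(edit_distance(reference_text, ocr_text))
    print(character_error_rate(reference_text, ocr_text))
```

Under these assumptions, lower values indicate that the OCR output on the dewarped image is closer to the reference text, which is why such metrics serve as a proxy for the downstream information retrieval quality discussed above.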