One of the aims of HumaReC is to use the Handwritten Text Recognition (HTR) technology developed by the Transkribus research infrastructure, which is funded by the H2020 project READ (Recognition and Enrichment of Archival Documents). We are glad to collaborate with the Transkribus team!
Transkribus offers many services for scholars working with handwritten texts. Users can upload image files and annotate them in a user-friendly tool that automatically detects text baselines.
Snapshot of Transkribus Research platform (source)
Most importantly, once a certain number of folios have been transcribed, Transkribus can train an HTR model on them and use it to transcribe further text automatically. Training usually requires 5,000 to 10,000 transcribed words.
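To keep track of how close a set of transcribed folios is to that training range, a small word count is enough. The following Python lines are only a minimal sketch under our own assumptions: the function names, the folio labels and the idea of storing each folio's transcription as plain text are hypothetical and are not part of Transkribus.

```python
import re

# Rule-of-thumb range quoted above for training an HTR model.
MIN_WORDS = 5_000
MAX_WORDS = 10_000


def count_words(text):
    """Count whitespace-separated tokens containing at least one letter or digit.

    Python's \\w is Unicode-aware for str, so Greek, Latin and Arabic
    tokens are all counted; tokens made only of punctuation are ignored.
    """
    return sum(1 for token in text.split() if re.search(r"\w", token))


def training_readiness(folios):
    """Summarise progress towards the lower bound of the training range.

    `folios` maps a (hypothetical) folio label to its transcribed text.
    """
    total = sum(count_words(text) for text in folios.values())
    return {
        "total_words": total,
        "missing_to_minimum": max(0, MIN_WORDS - total),
        "reached_minimum": total >= MIN_WORDS,
    }


if __name__ == "__main__":
    # Placeholder transcriptions, for illustration only.
    sample = {
        "folio_1r": "in principio erat verbum .",
        "folio_1v": "ἐν ἀρχῇ ἦν ὁ λόγος",
    }
    print(training_readiness(sample))
```

Counting whitespace-separated tokens is a rough measure. How words are actually counted for training purposes may differ, so the result should be read as an estimate.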
Once we have the requisite transcribed material, we plan to test the Transkribus technology on the three languages. The HTR results for Arabic will be particularly exciting, since Arabic has always been challenging for text recognition because of the particular features of its writing system. So far, HumaReC is both the first project that aims to use Transkribus HTR technology on an Arabic document and the first attempt to apply HTR to a manuscript of the New Testament.
We will keep you informed of the results and share our experiment with you!
→ Find more information about Transkribus on their website: https://transkribus.eu
References and links about OCR and HTR in Arabic:
- M. Romanov, M.T. Miller, S. Bowen Savant & B. Kiessling, 'Important New Developments in Arabographic Optical Character Recognition (OCR)' (2016). url blog url full report
→ see the open-source OCR software kraken
- N. Tagougui, M. Kherallah & A.M. Alimi, 'Online Arabic handwriting recognition: a survey'. IJDAR (2013) 16: 209-226. url
- A.-M. Awal, V. Eglin & F. LeBourgeois, 'Computer Assisted Transcription of Historical Arabic Documents'. In Reading Tomorrow. From Ancient Manuscripts to the Digital Era / Lire Demain. Des Manuscrits Antiques à L’ère Digitale, edited by C. Clivaz, J. Meizoz, F. Vallotton & J. Verheyden, Ebook, 333-44. Lausanne: PPUR, 2012. url ebook url Google Books
→ see the related project VECMAS Tombouctou
- L. M. Lorigo & V. Govindaraju, 'Offline Arabic handwriting recognition: a survey'. IEEE Transactions on Pattern Analysis and Machine Intelligence (2006) 28/5: 712-724. url
⇒ Discuss the article 'Collaboration with Transkribus' on the forum.