Automatic synthesis of tactile graphics conditioned by text prompt

Author(s): Dzhurynskyi Ye. A., Maik V. Z.
Collection number: № 2 (86)
Pages: 29-39

The development of tactile graphics for inclusive literature publishing is complicated by the specifics of producing relief (convex-tactile) illustrations. The process requires the designer to combine the basic skills of a fine-art specialist with knowledge of the technical execution of a tactile image, which is governed by a considerable number of requirements. The design is further complicated by additional factors that affect the final result of the tactile illustration, such as the age of the target audience, the genre of the publication, and the textual content that accompanies the illustration. Advances in information technology, in particular deep machine learning, have made it possible to address these problems. Artificial intelligence tools that synthesize images from a user's text prompt [1, 2, 3] have recently developed rapidly. The proposed concept is to apply this method of text-prompt-driven synthesis of tactile graphics to the applied field of inclusive illustration. The information model can thus be represented as a function mapping a set of texts into a set of tactile graphics, and modelling this mapping is the subject of this paper. The paper considers and formalizes the step-by-step process of modelling an algorithm for solving this problem. The proposed technique consists of the following stages: tokenization of the text content (optimization of the representation), language modelling, tokenization of the image content (contextual representation), and sequence-to-sequence (seq2seq) conversion of text tokens into a sequence of image tokens. Each stage is accompanied by the results of training and evaluating the developed models. The main part of the study closes with a note on the software developed for model training, which will also be used in subsequent studies on this topic. Finally, conclusions are drawn about the success and prospects of the obtained results, and examples of tactile images synthesized from a text prompt are presented.
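As a minimal illustration of the final stage of the described pipeline, the seq2seq mapping of text tokens to image tokens, the sketch below uses PyTorch's standard Transformer [17]. All vocabulary sizes, model dimensions, and layer counts are illustrative assumptions rather than the configuration reported in the paper, and positional encodings are omitted for brevity.

    # Hedged sketch: seq2seq mapping of BPE text tokens to discrete image
    # tokens (VQ-VAE codebook indices). All sizes are illustrative assumptions.
    import torch
    import torch.nn as nn

    TEXT_VOCAB = 8000    # assumed BPE text vocabulary size
    IMAGE_VOCAB = 512    # assumed VQ-VAE codebook size (image tokens)
    D_MODEL = 256

    class TextToImageTokens(nn.Module):
        """Maps a text-token sequence to logits over image tokens."""
        def __init__(self):
            super().__init__()
            self.text_emb = nn.Embedding(TEXT_VOCAB, D_MODEL)
            self.img_emb = nn.Embedding(IMAGE_VOCAB, D_MODEL)
            self.transformer = nn.Transformer(
                d_model=D_MODEL, nhead=8,
                num_encoder_layers=4, num_decoder_layers=4,
                batch_first=True)
            self.head = nn.Linear(D_MODEL, IMAGE_VOCAB)

        def forward(self, text_tokens, image_tokens):
            # Causal mask: each image token attends only to earlier ones.
            tgt_mask = self.transformer.generate_square_subsequent_mask(
                image_tokens.size(1))
            hidden = self.transformer(
                self.text_emb(text_tokens),
                self.img_emb(image_tokens),
                tgt_mask=tgt_mask)
            return self.head(hidden)  # logits over the image-token codebook

    # Usage: a 16-token prompt conditioning an 8x8 grid of image tokens.
    model = TextToImageTokens()
    text = torch.randint(0, TEXT_VOCAB, (1, 16))
    img = torch.randint(0, IMAGE_VOCAB, (1, 64))
    logits = model(text, img)  # shape (1, 64, IMAGE_VOCAB)
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, IMAGE_VOCAB), img.reshape(-1))

At generation time the decoder would run autoregressively, and the resulting grid of image tokens would be decoded back to a raster tactile image by the VQ-VAE decoder [19].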

Keywords: information technology, artificial intelligence, text prompt, model, model evaluation criteria, tokenization technique, illustration requirements, image processing, tactile graphics, inclusive illustration, inclusive literature, Braille.

doi: 10.32403/0554-4866-2023-2-86-28-39


  • 1. Midjourney AI model tool for text-to-image conversion. Retrieved from https://www.midjourney.com/ (access date: 04/05/2023) (in English).
  • 2. Stable Diffusion AI model tool for text-to-image conversion. Retrieved from https://stablediffusionweb.com/ (access date: 04/05/2023) (in English).
  • 3. DALL·E 2 AI system that can create realistic images and art from a description in natural language. Retrieved from https://openai.com/product/dall-e-2/ (access date: 04/05/2023) (in English).
  • 4. Dzhurynskyi, Ye. A., & Maik, V. Z. (2022). Analiz protsesu pidhotovky iliustratsii dlia inkliuzyvnoi literatury: Kvalilohiia knyhy, 1 (41), 7−15 (in Ukrainian).
  • 5. Way, T., & Barner, K. (1997). Automatic visual to tactile translation. Part I: Human factors, access methods, and image manipulation. IEEE Transactions on Rehabilitation Engineering, 5, 81−94 (in English).
  • 6. Way, T., & Barner, K. (1997). Automatic visual to tactile translation. Part II: Evaluation of the TACTile image creation system. IEEE Transactions on Rehabilitation Engineering, 5, 95−105 (in English).
  • 7. Way, T., & Barner, K. (1999). Towards Automatic Generation of Tactile Graphics. Applied Science and Engineering Laboratories. University of Delaware (in English).
  • 8. Pakėnaitė, K., Nedelev, P., & Kamperou, E. (2022). Communicating Photograph Content Through Tactile Images to People With Visual Impairments. Frontiers in Computer Science, 3. Doi: https://doi.org/10.3389/fcomp.2021.787735 (in English).
  • 9. Zouhar, V., Meister, C., Gastaldi, J. L., Du, L., Vieira, T., Sachan, M., & Cotterell, R. (2023). A Formal Perspective on Byte-Pair Encoding. ETH Zürich. Johns Hopkins University. Doi: https://doi.org/10.48550/arXiv.2306.16837 (in English).
  • 10. Bostrom, K., & Durrett, G. (2020). Byte Pair Encoding is Suboptimal for Language Model Pretraining. Department of Computer Science, The University of Texas at Austin. Doi: https://doi.org/10.48550/arXiv.2004.03720 (in English).
  • 11. Hulianytskyi, L. F., & Mulesa, O. Yu. (2015). Metody kombinatornoi optymizatsii. Uzhhorod, 16−26 (in Ukrainian).
  • 12. Braunskyi korpus ukrainskoi movy. Retrieved from https://github.com/brown-uk/corpus (access date: 01/09/2023) (in Ukrainian).
  • 13. Korpus khudozhnoi literatury ukrainskoiu movoiu. Retrieved from https://lang.org.ua/static/downloads/corpora/fiction.tokenized.shuffled.txt.bz2 (access date: 01/09/2023) (in Ukrainian).
  • 14. Douglas, M. R. (2023). Large Language Models. CMSA, Harvard University. Doi: https://doi.org/10.48550/arXiv.2307.05782 (in English).
  • 15. Naveed, H., Ullah Khan, A., Qiu, S., Saqib, M., & Anwar, S. (2023). A Comprehensive Overview of Large Language Models. University of Engineering and Technology (UET). Doi: https://doi.org/10.48550/arXiv.2307.06435 (in English).
  • 16. Hadi, M. U., & Al-Tashi, Q. (2023). Large Language Models: A Comprehensive Survey of its Applications, Challenges, Limitations, and Future Prospects (in English).
  • 17. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. Google Brain. Google Research. Doi: https://doi.org/10.48550/arXiv.1706.03762 (in English).
  • 18. Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. University of Freiburg. Doi: https://doi.org/10.48550/arXiv.1711.05101 (in English).
  • 19. van den Oord, A., Vinyals, O., & Kavukcuoglu, K. (2017). Neural Discrete Representation Learning. DeepMind. Doi: https://doi.org/10.48550/arXiv.1711.00937 (in English).
  • 20. Kingma, D. P., & Welling, M. (2013). Auto-Encoding Variational Bayes. Machine Learning Group, Universiteit van Amsterdam. Doi: https://doi.org/10.48550/arXiv.1312.6114 (in English).
  • 21. The American Printing House Tactile Library. Retrieved from https://imagelibrary.aph.org/portals/aphb/ (access date: 12/09/2023) (in English).
  • 22. Yu, Y., Zhang, W., & Deng, Y. (2021). Frechet Inception Distance (FID) for Evaluating GANs (in English).