INNOVATIVE LANGUAGE TECHNOLOGIES AND ARTIFICIAL INTELLIGENCE: DEVELOPMENT AND APPLICATIONS BY THE POLYHEDRON PLATFORM BASED ON UKRAINIAN LEXICOGRAPHIC THEORIES

Keywords

artificial intelligence methods; natural language processing; lexicographic technologies; dynamic ontologies; morphological and semantic analysis; neural networks; multi-criteria decision analysis; POLYHEDRON

How to Cite

НАДУТЕНКО, М. В., НАДУТЕНКО, М. В., & ФАСТ, О. Л. (2024). INNOVATIVE LANGUAGE TECHNOLOGIES AND ARTIFICIAL INTELLIGENCE: DEVELOPMENT AND APPLICATIONS BY THE POLYHEDRON PLATFORM BASED ON UKRAINIAN LEXICOGRAPHIC THEORIES. ACADEMIC STUDIES. SERIES “HUMANITIES”, (3), 38-48. https://doi.org/10.52726/as.humanities/2024.3.6

Abstract

The article presents a comprehensive approach to creating and implementing intelligent language technologies developed by the authors of POLYHEDRON and based on the fundamental lexicographic theories devised by the Ukrainian Lingua-Information Foundation of the National Academy of Sciences of Ukraine. The authors focus on the POLYHEDRON technology family, which encompasses tools for lexicographic and corpus-based text processing, systems for parsing files in various formats, modules for morphological and semantic analysis, and innovative means for constructing dynamic ontologies and decision support systems. Together these enable multilayered processing of natural language (Ukrainian, English, Russian, French, German, and Italian) and the creation of scalable information resources for both scientific and practical applications. The central element of the study is a hybrid architecture that combines statistical methods (deep neural networks of the transformer type) with lexicographic-ontological models. This approach allows the syntactic and semantic structure of sentences to be analyzed simultaneously, latent relationships between terms and concepts to be uncovered, and dynamic ontologies to be formed that are continuously updated from new textual data. The work places particular emphasis on dynamic knowledge compression technologies, which optimize the storage and processing of information and thereby enable the use of models with a smaller footprint without a loss in analytical accuracy. One of the key application areas for the described technologies is the automated monitoring and analysis of large volumes of text, including legal documents, scientific and technical publications, and media materials.
For this purpose, dedicated parsing subsystems (MxParse, MxDocArch, OCR modules) have been developed that support DOC, PDF, TXT, HTML, and other formats and that also recognize content in scanned images and audio/video files. The “AVALANCHE” and “INVISIBLE” technologies, developed by the authors, play an important role by providing unique capabilities for the rapid indexing and retrieval of data in multilingual corpora: the former is responsible for the persistent storage of billions of objects on disk, while the latter efficiently manages large structures in volatile memory. The article also introduces the INTELLIGENCE-ANALYTICS platform, which integrates three key components: a neural network, an ontological module, and multi-criteria decision analysis (MCDA) mechanisms. This integration makes it possible to identify non-obvious relationships between documents, prioritize alternatives, and generate flexible analytical reports to support real-time decision-making. Promising application areas include national security, education, legal expertise, scientific research, and information management in large organizations. The authors emphasize the critical importance of developing proprietary national language models, particularly for the Ukrainian language, that can compete with foreign counterparts. The proposed concepts demonstrate that integrating a lexicographic-ontological approach with modern neural algorithms maintains the quality of processing while reducing model size and computational resource requirements. This is particularly relevant given limited targeted funding and infrastructural challenges. Through this research and collaboration with partners from scientific and educational institutions, the authors have produced unique developments capable of accelerating the digitization of documents, supporting the development of high-tech products in Ukraine, and strengthening information security.
The article underlines the need to consolidate scientific and technological potential, establish public–private projects, and expand cooperation between institutions of the NAS of Ukraine, universities, and the private sector. The authors see this as the main impetus for creating an integrated language-information ecosystem capable of addressing the intellectual challenges of the modern era and stimulating scientific and technological progress in the country.
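
As an illustration of the multi-criteria prioritization step mentioned in the abstract, the following Python sketch ranks alternatives with a weighted-sum model over min-max normalized criteria. The abstract does not state which MCDA method the INTELLIGENCE-ANALYTICS platform uses, so the weighted-sum choice, the function name `rank_alternatives`, and the criteria names and values are all hypothetical and serve only to show the general idea.

```python
from typing import Dict, FrozenSet, List, Tuple


def rank_alternatives(
    scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    cost_criteria: FrozenSet[str] = frozenset(),
) -> List[Tuple[str, float]]:
    """Rank alternatives with a weighted-sum model (illustrative assumption).

    scores        -- alternative name -> {criterion -> raw value}
    weights       -- criterion -> non-negative importance weight
    cost_criteria -- criteria where a lower raw value is better
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    totals = {name: 0.0 for name in scores}
    for criterion, weight in weights.items():
        column = [scores[name][criterion] for name in scores]
        low, high = min(column), max(column)
        span = high - low
        for name in scores:
            if span == 0:
                # All alternatives tie on this criterion, so it adds nothing.
                normalized = 0.0
            else:
                # Min-max normalization to the range [0, 1].
                normalized = (scores[name][criterion] - low) / span
                if criterion in cost_criteria:
                    # Invert cost criteria so that higher is always better.
                    normalized = 1.0 - normalized
            totals[name] += (weight / total_weight) * normalized

    # Best alternative first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical documents scored on hypothetical criteria.
    documents = {
        "doc_a": {"relevance": 0.9, "recency": 0.2, "review_cost": 5.0},
        "doc_b": {"relevance": 0.75, "recency": 0.8, "review_cost": 2.0},
        "doc_c": {"relevance": 0.3, "recency": 1.0, "review_cost": 1.0},
    }
    criteria_weights = {"relevance": 0.6, "recency": 0.25, "review_cost": 0.15}
    for name, score in rank_alternatives(
        documents, criteria_weights, cost_criteria=frozenset({"review_cost"})
    ):
        print(f"{name}: {score:.3f}")
```

In this toy data set `doc_b` ranks first because it balances all three criteria, while `doc_a` wins only on relevance. A production system would likely derive the criterion values from the neural and ontological components rather than from hand-entered numbers.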

https://doi.org/10.52726/as.humanities/2024.3.6
