A Belgrade university offshoot develops Serbian-language artificial intelligence processing tools to make businesses faster and more efficient

Artificial intelligence tools such as ChatGPT are quickly changing the way we communicate and work, but there is still a long way to go before we realise all the benefits.

“To enable users to benefit from AI, we need to fine-tune it for specific tasks using special datasets,” says Vuk Batanović, head of the Natural Language Processing Lab within the Innovation Centre at the University of Belgrade’s School of Electrical Engineering.

At the Innovation Centre, scientists and students are developing a set of resources and tools for the automatic processing of texts in Serbian, a language spoken by 12 million people. Their COMtext.SR project focuses on legal texts, a domain not yet covered by existing Serbian-language academic or commercial tools. This domain has considerable importance for public governance, non-governmental organisations and companies, especially in the context of EU integration and convergence with EU standards.

The Centre was set up in 2006 to produce advanced electrical engineering and information technology innovations. Its equipment was financed under part of a €200 million loan from the European Investment Bank. It received support from the European Union’s Instrument for Pre-accession Assistance, the Council of Europe Development Bank, and the Serbian government.

When computers understand human language

Natural language processing, which uses machine learning and deep learning to teach computers to process human language, is used in advanced language models like BERT (Bidirectional Encoder Representations from Transformers) and GPT (Generative Pre-trained Transformer).

Using these models, computers can analyse the morphology, syntactic structure, and semantics of a text.

“The COMtext.SR project is specifically intended to create reliable annotated data, verified by experts, for the development of large language models in the Serbian language,” Batanović said. “This area can, therefore, be of huge practical value, because a vast corpus of human knowledge can be found in text format. However, computers cannot process it without adequate natural language processing solutions.”

Covering the two variants of the Serbian language – Ekavian (spoken mostly by Serbs in Serbia) and Ijekavian (by Serbs in Bosnia and Herzegovina, Croatia and Montenegro), the COMtext.SR project makes its findings publicly available for the benefit of individuals, corporations, public institutions and start-ups. For them, reviewing documents, supporting customers, searching texts, and creating content will soon be faster and more efficient. The Innovation Centre released its findings in January.

Bringing science and industry together

Projects like COMtext.SR exemplify successful collaboration between academic research and industry, connecting knowledge, creativity and ideas.

“The Innovation Centre strives to create innovative solutions and services, as well as to improve existing ones, following the needs of the market,” says Ilija Radovanović, deputy director of the Innovation Centre. “Our projects have a multidisciplinary and practical character, and new solutions are focused on end-users and on solving real industrial and social challenges.”

And what are the long-term challenges for natural language processing globally?

“One of the critical points in its future development,” says Batanović, “will be the successful combination of logical reasoning with the statistical approach to language models.”