Recent years have seen rapid development of digital technologies for processing natural language. They are used in many areas of life, from popular machine translation services and text autocorrection to conversations with chatbots.
Tools based on digital language processing are also increasingly used in science and business. Development has been particularly intensive in industries related to customer service. Chatbots not only inform us about the status of an order or answer our questions while we shop, but also call us and try to persuade us to buy a specific service.
These programs increasingly resemble human beings.
The Information Processing Center – National Research Institute (OPI PIB) has also intensified its work on projects in the field of computational linguistics. Examples include the Unified Anti-plagiarism System (JSA), which is used by all supervisors of diploma theses in Poland; Annobot, which reads human emotions; and a tool for detecting illegal provisions in contracts, which the institute plans to develop for UOKiK in 2022. These are just some of the many innovative OPI PIB projects.
To present computational linguistics and its applications in many areas of life in an accessible way, the institute has published another expert publication as part of the OPI PIB Academy. In it, Dr. Eng. Marcin Mirończuk presents the most important issues and discusses interesting projects related to computational linguistics.
Not only translations and chatbots
Scientists from the Information Processing Center – National Research Institute (OPI PIB) are guided in their work by two aspects of computational linguistics: the scientific and the technological. The latter is of particular interest to users, as it focuses on the individual components of speech and writing systems, usually with a view to implementing them. Today these components surround us, although we are usually unaware of it.
These components are the basis for the voice response tools used in mobile phones, speech-to-text applications and popular internet search engines. The aforementioned translators and dialog systems in the form of chatbots, which effectively support human work, are likewise based on natural language processing.
Computational linguistics is also at work in tools for summarizing texts, answering questions, automatically generating text, recognizing emotions and classifying specific content. The range of applications is very broad, and such systems are widely used all over the world.
– The intensive development of tools for automatic natural language processing has undoubtedly been one of the noticeable trends in the IT industry in recent years. Work on these systems is carried out by many centers around the world. Products such as intelligent voice assistants and chatbots are becoming more and more popular – says Dr. Jarosław Protasiewicz, director of the Information Processing Center – National Research Institute (OPI PIB). – At our institute, we are doing more and more work related to computational linguistics, and not only in research projects. We develop and implement new tools that effectively support human activity. Our Annobot can distinguish emotions on the basis of text, JSA finds plagiarism in diploma theses, and the new system that we are planning to prepare for UOKiK will search for prohibited provisions in contracts. The scope of our work is very wide and we place great emphasis on the functionality of our products – adds the director of OPI PIB.
Discovering innovations and converting text into biological signals
In previous years, OPI PIB experts developed and implemented tools based on computational linguistics that rely on text classification. This is how the Inventorum system was created: it discovers innovations on the web and then makes them available to entrepreneurs who may be interested in them. A similar mechanism underlies the tool for classifying multilingual scientific articles, which facilitates the work of researchers from various fields.
– We have also carried out an interesting project that uses a mechanism for extracting data from unstructured texts. We have developed a system for analyzing reports on actions carried out by the State Fire Service. Thanks to it, we can quickly and easily obtain information from a large amount of unstructured text data, e.g. about the place of the accident, the type of event, the damage caused or the location and technical condition of the fire hydrant. Interesting work is also carried out by other experts from OPI PIB. They draw inspiration from naturally occurring biological systems. They transform written text into signals corresponding to the mechanisms taking place in the human brain, creating so-called impulse (spiking) networks. These are then used to develop systems for recognizing and classifying the relevant texts – says Dr. Marcin Mirończuk from the Intelligent IT Systems Laboratory at OPI PIB.