Extracting Terminological Concept Systems

Terminology is the foundation of any specialized communication. Whether in science, healthcare, or policy, consistent terminology ensures clear understanding and the smooth exchange of knowledge, especially in multilingual settings. When terms are used inconsistently, misunderstandings can arise, leading to serious communication breakdowns.

Automatic term extraction has so far been limited to producing a list of term candidates. Our Text2TCS application goes further: it extracts terms, identifies cross-lingual synonyms, and groups them into coherent concepts. These concepts are then linked through hierarchical and semantic relations to form a terminological concept system (TCS).
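To make the structure of a TCS more concrete, here is a minimal Python sketch of one possible representation. It assumes that term candidates and cross-lingual synonym pairs have already been predicted by an upstream model. Here they are simply given as input. The sketch groups synonyms into concepts as connected components of the synonym graph using union-find. The class names, the `group_into_concepts` and `concept_of` helpers, and the `is_a` relation label are hypothetical illustrations, not the Text2TCS implementation.

```python
from dataclasses import dataclass, field

# Hypothetical term representation: (surface form, language code).
Term = tuple[str, str]


@dataclass
class Concept:
    """A concept groups synonymous terms, possibly across languages."""
    concept_id: int
    terms: set[Term] = field(default_factory=set)


@dataclass
class ConceptSystem:
    """A minimal TCS: concepts plus typed, directed relations between them."""
    concepts: dict[int, Concept] = field(default_factory=dict)
    # Each relation is (source concept id, relation label, target concept id).
    relations: list[tuple[int, str, int]] = field(default_factory=list)


def group_into_concepts(terms: list[Term],
                        synonym_pairs: list[tuple[Term, Term]]) -> ConceptSystem:
    """Group terms into concepts as connected components of the synonym graph.

    Assumes synonym_pairs were predicted upstream (hypothetical input).
    """
    parent = {t: t for t in terms}

    def find(t: Term) -> Term:
        # Follow parent links to the root, halving the path as we go.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    # Merge the components of every synonym pair.
    for a, b in synonym_pairs:
        parent[find(a)] = find(b)

    # Collect the members of each component under its root.
    groups: dict[Term, set[Term]] = {}
    for t in terms:
        groups.setdefault(find(t), set()).add(t)

    # Sort components by their smallest term so concept ids are deterministic.
    system = ConceptSystem()
    for cid, members in enumerate(sorted(groups.values(), key=min)):
        system.concepts[cid] = Concept(cid, members)
    return system


def concept_of(system: ConceptSystem, term: Term) -> int:
    """Return the id of the concept containing the given term."""
    for concept in system.concepts.values():
        if term in concept.terms:
            return concept.concept_id
    raise KeyError(term)


# Usage: two concepts, each with English and German terms.
terms = [
    ("infectious disease", "en"),
    ("Infektionskrankheit", "de"),
    ("COVID-19", "en"),
    ("coronavirus disease 2019", "en"),
    ("COVID-19", "de"),
]
synonym_pairs = [
    (("infectious disease", "en"), ("Infektionskrankheit", "de")),
    (("COVID-19", "en"), ("coronavirus disease 2019", "en")),
    (("COVID-19", "en"), ("COVID-19", "de")),
]
system = group_into_concepts(terms, synonym_pairs)

# Link the concepts with a hierarchical relation (hypothetical label "is_a").
system.relations.append((
    concept_of(system, ("COVID-19", "en")),
    "is_a",
    concept_of(system, ("infectious disease", "en")),
))

for concept in system.concepts.values():
    print(concept.concept_id, sorted(concept.terms))
print(system.relations)
```

Grouping by connected components means that synonymy is treated as transitive: if A is a synonym of B and B of C, all three end up in one concept. This is a simplifying assumption that a real system may need to relax when synonym predictions are noisy.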

The underlying models rely on state-of-the-art transformer-based NLP approaches and are trained on novel, high-quality datasets in multiple languages.

An extracted TCS is a valuable resource for communicating across language barriers, particularly in crisis situations such as the COVID-19 pandemic. It helps ensure that different parties, such as health specialists, politicians, and journalists, refer to the same phenomena consistently, using the same terms.

The outcome of the Text2TCS project is an easy-to-use extraction application, made freely available on the European Language Grid. The accompanying publications received a best paper award and won first place in a shared task on relation extraction.

Lennart Wachowiak
PhD Student at the Centre for Doctoral Training in Safe and Trusted Artificial Intelligence

I am currently researching when a robot should explain and what it should explain, by interpreting the interaction context in combination with the collaborator's social cues.