Contact info
Partner looking for project
Maciej Ogrodniczuk
Head of Department of Language Modelling
Department of Language Modelling
Poland
Expression of Interest
Institute of Computer Science, Polish Academy of Sciences
The Department of Language Modelling at the Institute of Computer Science of the Polish Academy of Sciences is a leading Polish unit in the fields of computational linguistics, linguistic engineering, natural language processing, language modelling and digital humanities. It currently employs 20 full-time staff and over 100 project collaborators, publishes the Journal of Language Modelling, and organises conferences, workshops and shared tasks (e.g. PolEval). It participates in COST actions (it is currently the Grant Holder of the UniDive action) and in leading research infrastructures (such as CLARIN-PL and DARIAH-PL), and implements several national projects (funded by, e.g., the National Science Centre, the National Centre for Research and Development, the Foundation for Polish Science, the Polish Academy of Sciences, the National Agency for Academic Exchange and the National Programme for the Development of Humanities) and international projects (e.g. within CEF, Horizon 2020, DIGITAL Europe).
In the context of the current CHIST-ERA call on "Science in your own language (SOL)", the Department of Language Modelling offers the following expertise:
1. Natural language processing:
- Advanced expertise in the development of natural language engineering solutions (e.g. all kinds of language analysis tools, corpus building and retrieval tools, etc.)
- Expertise in fine-tuning, adaptation and use of large language models (e.g. PLLuM - Polish Large Language Model)
- Experience in massive data crawling (data currently used by the Polish plagiarism detection system).
2. Adaptation to the scientific domain:
- Expertise in processing terminology and phraseology (author of TermoUD - a tool for language-independent terminology extraction)
- Expertise in processing scientific article corpora (e.g. CURLICAT - Curated Multilingual Language Resources for CEF AT)
3. Development of multilingual systems:
- Experience in multilingual data processing (multilingual corpora, e.g. ParlaMint, MARCELL - Multilingual Resources for CEF.AT in the legal domain)
- Knowledge of linguistic challenges in scientific communication (e.g. Nexus Linguarum - European network for web-centred linguistic data science, TextLink - Structuring Discourse in Multilingual Europe)