About
HuGME and the Language Technology Research Group
We are part of the Hungarian Research Centre for Linguistics (NYTK), where evaluation is one of our main missions. Recognizing the importance of thoroughly assessing language models, we developed HuLU as a benchmark collection for discriminative models. We soon realized that generative models also needed a specialized evaluation system, and so HuGME was born.
Our story and focus
The Language Technology Research Group began as the Corpus Linguistics Department in 1997. Since then, our work has evolved to include:
- Building linguistic resources:
  - We initiated the creation of the Hungarian National Corpus (MNSZ) in 2005, later expanded it with MNSZ2 (1.5 billion words), and are now aiming for MNSZ3 (10 billion words).
- Developing language technology tools:
  - Tools such as the Spelling Advisory Portal (helyesiras.mta.hu), our e-magyar Digital Language Processing Toolchain, and HuWordNet have been instrumental in advancing Hungarian NLP.
- Advancing Large Language Models:
  - We have created Hungarian versions of neural language models, from static word embeddings to transformer-based and generative models. Notable projects include PULI-3SX and PULI LlumiX Instruct (you can try some of them on our demo site).
  - Our recent work focuses on instruction-following and chat models.
Our evaluation mission
Evaluating the performance of our models is a core focus. Through projects like HuLU and HuGME, we provide robust benchmarks that:
- Measure discriminative capabilities (HuLU) and generative performance (HuGME).
- Use comprehensive datasets to assess various aspects of model output, from factual accuracy and linguistic correctness to prompt adherence.
Our impact
By continually refining our corpora, tools, and evaluation methods, we aim to keep Hungarian language technology at the forefront. Our work supports both academic research and practical applications, making it easier for practitioners and researchers to compare models and track improvements.
For more detailed information about our projects, collaborations, and tools, please visit NYTK's website or contact us.