

GRNET announces, in the context of SmartAttica EDIH (European Digital Innovation Hub), the 12th module in the series of training modules for SMEs, titled "Inference with Transformers & Semantic Search with Sentence Transformers".
Date: June 22nd, 2026
Location: Online via Zoom
Presentation Languages: Greek, English
Instructors: Nikos Bakas (GRNET), Roman Dolgopolyi (GRNET)
Duration: 3 hours
Description: This is a hands-on introduction to running open large language models and representing the meaning of text with embeddings. In the first half, participants learn how to load a pre-trained model from the Hugging Face Hub, generate text with streaming, and control the tokenizer and chat template. The second half covers Sentence Transformers, showing how text is mapped into vectors and how cosine similarity reveals semantic relationships between words and sentences, which is the foundation of modern search and retrieval systems.
Target Audience: This module is designed for SME developers, technical leads, and data scientists who want to incorporate natural language processing (NLP) into their projects. It is ideal for those looking to run models in their own environment and understand the building blocks behind semantic search and retrieval-augmented generation (RAG).
Learning Objectives:
By the end of this module, participants will be able to:
- Load and run pre-trained language models from the Hugging Face Hub using the Transformers library.
- Generate text with both streaming and standard inference, and use the pipeline abstraction.
- Apply chat templates and manage tokenizers, padding, and special tokens correctly.
- Produce embeddings from text using Sentence Transformers.
- Measure semantic similarity with cosine similarity and interpret the resulting vector space.
Prerequisites:
Participants should have:
- A basic understanding of Python programming.
- Familiarity with running code in Jupyter/Colab notebooks.
- An interest in NLP applications.
- Ideally, some experience with machine learning.
Indicative Content:
- The Transformers Library. Introduction to Hugging Face and the ecosystem for loading and running open models.
- Model and Tokenizer Setup. Downloading a model from the Hub, configuring the tokenizer, and handling pad/EOS tokens.
- Chat Templates and Messages. Structuring system and user messages and applying the model's chat template.
- Streaming Inference. Generating text token-by-token with a streamer for a responsive experience.
- Standard Inference and the Pipeline. Running batch generation and using the high-level pipeline object.
- Introduction to Embeddings. What embeddings are and why similar meanings map to nearby vectors.
- Sentence Transformers in Practice. Encoding words and sentences into vectors with a compact, fast model.
- Measuring Similarity. Using cosine similarity to compare texts and visualize a similarity matrix.
- Summary and Q&A. Key takeaways and open discussion.
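As a small taste of the "Measuring Similarity" topic, the idea of comparing texts through their vectors can be sketched in plain Python. This is an illustrative sketch only, not the course material: the function names and the three-dimensional toy vectors are invented for the example, whereas a real Sentence Transformers model would produce vectors with hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_matrix(embeddings):
    """Map every (name, name) pair to the cosine similarity of its vectors."""
    names = list(embeddings)
    return {
        (p, q): cosine_similarity(embeddings[p], embeddings[q])
        for p in names
        for q in names
    }


# Hypothetical toy "embeddings", made up for illustration (not model output).
toy_embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    matrix = similarity_matrix(toy_embeddings)
    for (p, q), score in matrix.items():
        print(f"{p:>8} vs {q:<8} {score:.3f}")
```

With these toy vectors, "cat" and "kitten" score close to 1 while "cat" and "invoice" score close to 0, which is the pattern a similarity matrix is meant to make visible.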
The project is co-funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Commission. Neither the European Union nor the granting authority can be held responsible for them.