PHAROS Training Series - Course 12 "Compute-Efficient Methods for Large Language Models"

Description

PHAROS AI Factory announces the 12th course of its Training Series, "Compute-Efficient Methods for Large Language Models". The course falls under the LLMs topic, is organised in collaboration with Pharos-CY, and will be held online via Zoom.

Date: July 17th, 2026, at 11:00 EEST 

Location: Online via Zoom

Presentation Language: English

Description: -

Audience: -

Prerequisites: -

Learning Objectives

By the end of the seminar, participants will be able to:

  • Identify key efficiency methods for training, fine-tuning, and inference.
  • Describe how LoRA enables parameter-efficient fine-tuning of LLMs.
  • Apply a basic Hugging Face workflow for dataset preparation, training, evaluation, and inference.
  • Compare trade-offs between model performance, cost, memory use, and deployment efficiency.
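
As a rough illustration of the LoRA objective above, the sketch below counts trainable parameters for a single weight matrix under full fine-tuning and under a rank-r LoRA update. The function name, the matrix size (4096 x 4096), and the rank (8) are assumptions chosen for illustration; they are not taken from the course material.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates every entry of the d_out x d_in matrix W.
    LoRA freezes W and trains two thin factors, B (d_out x rank) and
    A (rank x d_in), whose product B @ A is added to W.
    """
    full = d_out * d_in
    lora = rank * (d_out + d_in)
    return full, lora


# Illustrative sizes only (assumed, not taken from the course notebook).
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full fine-tuning: {full:,} trainable parameters")
print(f"LoRA (rank {8}):    {lora:,} trainable parameters")
print(f"ratio:            {lora / full:.4%}")
```

With these assumed sizes the LoRA factors hold well under 1% of the parameters of the full matrix, which is the reason adapter training fits in far less optimizer and gradient memory.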

Learning Outcomes: -

Note: Please enter your institutional/corporate email when registering.

 

Registration
    • 11:00 - 12:00
      Efficient training, fine-tuning and inference of large-scale ML models (1h)

      This talk presents model-centric methods for efficient generative AI. It explains why training and inference of LLMs are computationally heavy, then covers four families of model compression: quantization, neural network pruning, low-rank approximation, and knowledge distillation. It also introduces efficient pre-training (mixed-precision acceleration and PHEW); parameter-efficient fine-tuning methods such as LLM-Adapters, LLaMA-Adapter, P-Tuning, and LoraHub; and efficient inference techniques, including speculative decoding and KV-cache optimization.

      Speaker: Constantine Dovrolis (The Cyprus Institute)
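
To make the quantization idea mentioned in the abstract above concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is one simple scheme among many and is not necessarily the variant the talk covers; the function names and the sample weights are hypothetical.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization of floats to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One shared scale maps the largest magnitude to 127; fall back to 1.0
    # for an all-zero (or empty) input to avoid dividing by zero.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized: list[int], scale: float) -> list[float]:
    """Map the stored integers back to approximate float values."""
    return [q * scale for q in quantized]


# Made-up weights, for illustration only.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print("quantized:", q)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight is stored in one byte instead of two or four, at the cost of a rounding error of at most half the scale per value. Practical schemes usually refine this with per-channel or per-group scales.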
    • 12:00 - 14:00
      Fine-Tuning Transformers for Medical Reasoning with LoRA and Hugging Face Trainer (2h)

      This talk walks through a practical, end-to-end notebook for fine-tuning a reasoning-capable transformer model for medical question answering. Participants will learn how to use Hugging Face Datasets and Trainer to manage the training workflow: loading and cleaning data, tokenization, checkpointing, evaluation, and inference. The session covers parameter-efficient fine-tuning with LoRA, showing how a 3B-class Mistral reasoning model can be adapted on a single 16 GB GPU by training only small adapter weights instead of the full model. The notebook combines the MedReason and medical-o1 reasoning datasets into a unified format of question, chain-of-thought, and answer, then trains and evaluates the model on a small demo subset. By the end, attendees will understand the key engineering choices behind efficient LLM fine-tuning and will see a side-by-side comparison of base and fine-tuned model behavior on medical reasoning tasks. The session also includes practical notes on GPU setup, mixed precision, and resource cleanup for reproducible classroom demos.

      Speaker: Roman Dolgopolyi (GRNET)
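
The abstract above describes merging two reasoning datasets into one question, chain-of-thought, and answer format. The sketch below shows one way such a normalization step could look in plain Python. It is not the notebook's code: the source labels, the column names in FIELD_MAPS, and the toy records are all assumptions for illustration, and the real column names should be checked against each dataset card.

```python
# Hypothetical mapping from each source's column names to a shared schema.
# These column names are assumptions; verify them against the dataset cards.
FIELD_MAPS = {
    "medreason": {"question": "question", "reasoning": "reasoning", "answer": "answer"},
    "medical_o1": {"question": "Question", "reasoning": "Complex_CoT", "answer": "Response"},
}


def to_unified(record: dict, source: str) -> dict:
    """Rename one raw record's fields to question / reasoning / answer."""
    unified = {}
    for target, column in FIELD_MAPS[source].items():
        value = record.get(column) or ""  # treat missing or None as empty
        unified[target] = str(value).strip()
    unified["source"] = source  # keep provenance for later analysis
    return unified


def is_complete(example: dict) -> bool:
    """Keep only examples where all three text fields are non-empty."""
    return all(example[key] for key in ("question", "reasoning", "answer"))


# Toy placeholder records, not real dataset rows.
raw = [
    ({"question": " Q1? ", "reasoning": "step A, step B", "answer": "A1"}, "medreason"),
    ({"Question": "Q2?", "Complex_CoT": "step C", "Response": "A2"}, "medical_o1"),
    ({"Question": "Q3?", "Complex_CoT": None, "Response": "A3"}, "medical_o1"),
]
unified = [to_unified(record, source) for record, source in raw]
clean = [example for example in unified if is_complete(example)]
print(len(unified), "records normalized,", len(clean), "kept after cleaning")
```

With Hugging Face Datasets, a function like to_unified could plausibly be applied through Dataset.map (passing the source label via fn_kwargs) and the results combined with concatenate_datasets, though the notebook may organize this step differently.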