PHAROS Training Series - Course 12 "Compute-Efficient Methods for Large Language Models"
Friday, 17 July 2026, 11:00
11:00 - 12:00
Efficient training, fine-tuning and inference of large-scale ML models
Constantine Dovrolis (The Cyprus Institute)
This talk presents model-centric methods for efficient generative AI. It explains why training and inference of LLMs are computationally heavy, then covers model compression methods such as quantization, neural network pruning, low-rank approximations, and knowledge distillation. It also introduces efficient pre-training with mixed-precision acceleration and PHEW; parameter-efficient fine-tuning methods such as LLM-Adapters, LLaMA-Adapter, P-Tuning, and LoraHub; and efficient inference techniques including speculative decoding and KV-cache optimization.
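As a rough illustration of the simplest compression method named in the abstract, the sketch below applies symmetric absmax quantization to a handful of weights. It is not taken from the talk: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real libraries typically quantize per channel or per block rather than per list.

```python
def quantize_int8(values):
    """Symmetric absmax quantization: map floats to integers in [-127, 127]."""
    absmax = max(abs(v) for v in values)
    # One shared scale for the whole list; fall back to 1.0 for an all-zero input.
    scale = absmax / 127 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate floats from the stored integers and the scale."""
    return [qi * scale for qi in q]


# Hypothetical sample weights, chosen only for illustration.
weights = [0.02, -1.27, 0.64, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_err)
```

Each weight is stored in one byte instead of the four bytes of a 32-bit float, at the cost of an error of at most half a quantization step per value.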
12:00 - 14:00
Fine-Tuning Transformers for Medical Reasoning with LoRA and Hugging Face Trainer
Roman Dolgopolyi (GRNET)
This talk walks through a practical, end-to-end notebook for fine-tuning a reasoning-capable transformer model for medical question answering. Participants will learn how to use Hugging Face Datasets and Trainer to handle the training workflow, from loading and cleaning data to tokenization, checkpointing, evaluation, and inference. The session demonstrates parameter-efficient fine-tuning with LoRA, showing how a 3B-class Mistral reasoning model can be adapted on a single 16 GB GPU by training only small adapter weights instead of the full model. The notebook combines the MedReason and medical-o1 reasoning datasets into a unified question, chain-of-thought, and answer format, then trains and evaluates the model on a small demo subset. By the end, attendees will understand the key engineering choices behind efficient LLM fine-tuning and see a side-by-side comparison of base and fine-tuned model behavior on medical reasoning tasks. The session also includes practical notes on GPU setup, mixed precision, and resource cleanup for reproducible classroom demos.
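To make the "small adapter weights" idea concrete, the following is a minimal pure-Python sketch of a LoRA layer and its parameter count. It is not the session notebook, which uses Hugging Face tooling. The helper names, the 3072 x 3072 layer size, the rank of 16, and the toy matrices are all assumptions chosen for illustration.

```python
def lora_param_count(d_out, d_in, rank):
    """Trainable parameters when W (d_out x d_in) is frozen and only
    B (d_out x rank) and A (rank x d_in) are trained."""
    return rank * (d_out + d_in)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """y = W x + (alpha / rank) * B (A x); W is never modified."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer size and rank, not taken from the model in the session.
d_out, d_in, rank = 3072, 3072, 16
full = d_out * d_in
lora = lora_param_count(d_out, d_in, rank)
print(full, lora, f"{lora / full:.2%}")

# Toy example: B starts at zero, so the adapted layer initially matches the base layer.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.5, -0.5]]
x = [2.0, 1.0]
y_init = lora_forward(W, A, [[0.0], [0.0]], x, alpha=2, rank=1)
y_tuned = lora_forward(W, A, [[1.0], [2.0]], x, alpha=2, rank=1)
print(y_init, y_tuned)
```

Under these assumed dimensions the adapter trains about 1% of the parameters of the layer it modifies, so gradients and optimizer state are needed only for that small fraction while the base weights stay frozen.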