Efficient training, fine-tuning and inference of large-scale ML models

17 Jul 2026, 11:00
1h

Speaker

Constantine Dovrolis (The Cyprus Institute)

Description

This talk presents model-centric methods for efficient generative AI. It explains why training and inference of LLMs are computationally expensive, then covers model compression methods such as quantization, neural network pruning, low-rank approximation, and knowledge distillation. It also introduces efficient pre-training with mixed-precision acceleration and PHEW; parameter-efficient fine-tuning methods such as LLM-Adapters, LLaMA-Adapter, P-Tuning, and LoraHub; and efficient inference techniques including speculative decoding and KV-cache optimization.
