BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//CERN//INDICO//EN
BEGIN:VEVENT
SUMMARY:Ingestion\, chunking\, metadata and embeddings for RAG
DTSTART;VALUE=DATE-TIME:20260707T091000Z
DTEND;VALUE=DATE-TIME:20260707T093500Z
DTSTAMP;VALUE=DATE-TIME:20260703T180733Z
UID:indico-contribution-1150@events.grnet.gr
DESCRIPTION:Speakers: George Drosatos (ATHENA RC)\nThis methodology sessi
 on focuses on preparing knowledge so that it can be retrieved accurately b
 y a RAG system. It explains why document preparation is not a minor prepro
 cessing task but a core design decision that affects downstream retrieval 
 and answer quality. Participants will learn how text can be extracted from
  PDFs\, Markdown\, HTML or structured files while preserving headings\, se
 ctions\, tables\, article numbers\, source identifiers and other traceable
  information. The session then introduces chunking strategies\, including 
 fixed-size\, overlapping\, semantic and structure-aware chunking\, and dis
 cusses how chunk size influences retrieval precision and context completen
 ess. It also covers metadata fields that support filtering\, citation and 
 source attribution. Finally\, the session explains embeddings and vector i
 ndexing\, with emphasis on multilingual or Greek-aware representations\, i
 ndex choice\, latency\, scaling\, filtering and maintainability in prototy
 pe and production RAG settings. These foundations prepare participants for
  the following indexing and retrieval implementation exercises in Colab\, 
 including practical debugging.\n\nhttps://events.grnet.gr/event/213/contri
 butions/1150/
LOCATION:
URL:https://events.grnet.gr/event/213/contributions/1150/
END:VEVENT
END:VCALENDAR
