GRNET announces, as part of SmartAttica EDIH (European Digital Innovation Hub), the 6th module in its series of training modules for SMEs, titled "Data Science Fundamentals: Part C - Decision Trees", which will take place online on March 28th, 2025.

Date: March 28th, 2025, at 11:00 EET  

Location: Online via Zoom

Presentation Language: Greek

Instructor: Dr. Nikolaos Bakas (GRNET)

Description: This module focuses on decision trees, a versatile machine learning model used for classification and regression tasks. Through a mix of theoretical presentation and hands-on exercises, participants will learn how decision trees make predictions, understand the concept of node impurity, and measure information gain. The session will guide participants through training decision trees using a popular dataset, enabling them to build, evaluate, and fine-tune models effectively.
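
As a rough illustration of the node impurity and information gain calculations mentioned above, the following minimal sketch computes the Gini index, the entropy, and the gain of a candidate split in plain Python. The function names and the toy "yes"/"no" labels are assumptions made for this example only; they are not taken from the course material.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of a node, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((count / n) * math.log2(count / n)
                for count in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


# Hypothetical toy node: 5 "yes" and 5 "no" labels, split into two children.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

print("Gini of parent:", gini(parent))
print("Entropy of parent:", entropy(parent))
print("Gain (entropy):", information_gain(parent, [left, right]))
print("Gain (Gini):", information_gain(parent, [left, right], impurity=gini))
```

A split is preferred when it yields a larger gain, that is, when the child nodes are purer than the parent.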

Target Audience:

This module is designed for data analysts, software developers, aspiring data scientists, and machine learning enthusiasts who wish to deepen their understanding of decision trees. It is ideal for individuals who are seeking to expand their toolkit with this intuitive yet powerful machine learning method.

Learning Objectives:

By the end of this module, participants will be able to:

  1. Describe the structure and functioning of decision trees.

  2. Understand and calculate node impurity and information gain.

  3. Build decision tree models using Python and scikit-learn.

  4. Evaluate model performance and apply techniques for cross-validation.

  5. Visualize decision trees and interpret the rules they use for predictions.

  6. Identify and address potential issues such as overfitting and underfitting.

Prerequisites:

Participants should have:

  • Basic understanding of machine learning concepts.

  • Familiarity with Python programming and basic data analysis libraries.

  • Experience with data preprocessing and dataset manipulation.

Note: Please enter your institutional/corporate email when registering.

 

Indicative Contents

  1. Introduction to Decision Trees

    • Concept and structure of decision trees.

    • Overview of classification and regression trees.

  2. Understanding Impurity and Information Gain

    • Calculation and interpretation of node impurity (Gini index, entropy).

    • Use of information gain to determine optimal splits.

  3. Building Decision Trees

    • Dataset introduction: Overview of the bank marketing dataset from the UCI repository.

    • Data preprocessing: Handling missing values and encoding categorical variables.

    • Splitting data into training, validation, and test sets.

  4. Model Training and Evaluation

    • Training a decision tree using scikit-learn.

    • Evaluating model performance using accuracy and cross-validation.

    • Understanding tree depth, pruning, and criterion selection.

  5. Hands-On Exercise

    • Participants will implement a decision tree classifier on the bank marketing dataset.

    • Practice cross-validation and parameter tuning to improve model performance.

  6. Visualizing and Interpreting Decision Trees

    • Techniques for visualizing decision tree structures.

    • Interpreting decision tree rules and making predictions.

  7. Discussion on Overfitting and Underfitting

    • Strategies to avoid overfitting, including pruning and setting appropriate tree parameters.

  8. Summary and Q&A

    • Recap of decision tree concepts and practical applications.

    • Open discussion for participant questions and sharing of insights.
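
For orientation, the training, evaluation, and interpretation steps outlined above could look roughly like the sketch below in scikit-learn. It is only an assumption-laden preview: a synthetic dataset from `make_classification` stands in for the bank marketing dataset, the feature names are made up, and the parameter values (`max_depth=3`, 5-fold cross-validation, an 80/20 split) are illustrative choices rather than the settings used in the session.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in data (assumption: the session uses the UCI bank
# marketing dataset, which would need downloading and preprocessing).
X, y = make_classification(n_samples=500, n_features=6, n_informative=3,
                           random_state=0)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Hold out a test set; stratify to keep the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Limiting the depth is one simple way to reduce overfitting.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)

# 5-fold cross-validated accuracy on the training data.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")

# Fit on the full training set and evaluate once on the held-out test set.
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

# Text rendering of the learned decision rules.
rules = export_text(clf, feature_names=feature_names)

print("Mean CV accuracy:", cv_scores.mean())
print("Test accuracy:", test_accuracy)
print(rules)
```

Changing `criterion` to `"entropy"` or varying `max_depth` and comparing the cross-validated scores is one simple way to explore the parameter tuning mentioned in the hands-on exercise.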

Registration
Registration for this event is currently open.