The European Association for Data Science organises a summer school
Automated Data Science
Tuesday July 8th to Friday July 11th 2025 in Luxembourg.
The 2025 edition of the EuADS Summer School is dedicated to Automated Data Science (AutoDS) and will cover important branches of this research field in a tutorial style. With the increasing complexity of data science projects and the limited availability of human expertise, the idea of automating or partially automating the work of a data scientist has come to the fore in recent years. AutoDS aims to streamline the data science workflow, making processes such as data pre-processing, feature engineering, model selection, evaluation and deployment faster and more accessible. By reducing manual intervention, AutoDS enables both non-experts and data scientists to work more efficiently, scale projects, and make data science accessible to a broader audience. It leverages tools from automated machine learning (AutoML) frameworks, automated visualisation and interpretability techniques to enable efficient model tuning, robust evaluation and easy deployment. Despite its advantages in efficiency and scalability, challenges remain in automating subtasks that are context-dependent and require human interaction, as well as model interpretability, dependence on data quality, and ethical concerns related to bias in automated models. These and other issues will be addressed in a series of five tutorials delivered by leading experts in the field.
The Summer School emphasises the interdisciplinary nature of data science and is primarily aimed at PhD students, postdoctoral and early-career researchers with a basic grounding in data science, statistics, machine learning, AI, or related fields, and an interest in interdisciplinary research and applications.
The Summer School 2025 will take place at:
Maison d’Accueil (Convent of the Franciscan Sisters)
50 avenue Gaston Diderich
L-1420 Luxembourg-Belair
Public Event
The summer school will be preceded by a public event on Tuesday, July 8th, starting at 15:00.
The centrepiece of the public event is the Sabine Krolak-Schwerdt Lecture, held in memory of EuADS’ founding president. It will be given by Luc De Raedt (KU Leuven, B). Participation in the public event is free of charge. Please register for this event by sending an email to contact@euads.org.
Agenda
15h00–16h00 | Registration and Coffee |
16h00–17h00 | Opening and Welcome: Eyke Hüllermeier, EuADS President |
17h00–18h30 | Sabine Krolak-Schwerdt Public Lecture: Neurosymbolic AI: learning and reasoning for trustworthy AI, Luc De Raedt (KU Leuven, B) |
18h30 | Welcome Reception |
The summer school runs from the 8th to the 11th of July 2025. A limited number of rooms for smaller travel budgets will be available on the premises. Attendees are, however, not obliged to stay there.
All public transport in Luxembourg is free of charge. You can look up all schedules in the Mobilitéit.lu app.
Neurosymbolic AI: learning and reasoning for trustworthy AI
Luc De Raedt (KU Leuven, B)
The abilities to learn and to reason are central to (artificial) intelligence. The focus in AI today is very much on learning, but one should not learn what one already knows. The challenge therefore is to use the available knowledge to guide and constrain the learning, and to reason with the resulting models in a trustworthy manner. This requires the integration of different paradigms for AI, which is the focus of neurosymbolic AI, often viewed as the next wave in AI.
The talk will contrast traditional AI techniques (such as search and rule-based AI) with those of deep learning and large language models. I will argue that we need to combine the best of both worlds to arrive at trustworthy AI, especially AI that exploits knowledge, provides genuine explanations and guarantees with regard to safety and robustness.
I will also argue that Neurosymbolic AI = Logic + Probability + Neural Networks. This will allow me to specify a high-level recipe for developing neurosymbolic AI approaches: start from a logic, add a probabilistic interpretation, and then turn neural networks into ‘neural predicates’. Probability is interpreted broadly here, and is necessary to provide a quantitative and differentiable component to the logic. At the semantic and the computational level, one can then combine logical circuits (ako proof structures) labeled with probability, and neural networks in computation graphs.
Speaker's Webpage
Topics and Presenters
The schedule for the summer school including speakers and topics:
Tuesday, July 8th | 5:00 p.m. to 6:30 p.m. | Sabine Krolak-Schwerdt Lecture: Neurosymbolic AI: learning and reasoning for trustworthy AI | Luc De Raedt (KU Leuven, B) |
Wednesday, July 9th | 9:30 a.m. to 1 p.m. | Tabular learning, from data preparation to foundation models | Gaël Varoquaux (Inria, F) |
Wednesday, July 9th | 2:30 p.m. to 6 p.m. | From Theory to Hands-On Workflow Automation, Algorithm Selection, and Hyperparameter Optimization | José Raúl Romero & Carlos García (University of Córdoba, ES) |
Thursday, July 10th | 9:30 a.m. to 1 p.m. | Advances in AutoML for Deep Learning: An introduction to Neural Architecture Search, Metalearning and Learning Curves | Jan van Rijn (Leiden University, NL) |
Thursday, July 10th | 2:30 p.m. to 6 p.m. | Meta-learning for data and algorithm analysis and understanding | Ana Carolina Lorena (Instituto Tecnológico de Aeronáutica, BRA) |
Friday, July 11th | 9:00 a.m. to 12:30 p.m. | Towards Sustainable Automated Data Science… because resource efficiency is not enough! | Marcel Wever (Leibniz University Hanover, DE) |
For details see below!
Wednesday, July 9th
Tabular learning, from data preparation to foundation models
Gaël Varoquaux (Inria, F)
While much of the excitement is about AI generating images, many data-science challenges are about assembling and preparing tabular data. These typically require extensive manual transformation or "data wrangling", and gradient-boosted trees are king. I will start by discussing how this data preparation helps machine learning, and how this processing has progressively been rethought, eventually laying the groundwork for foundation models for tabular data.
Speaker's Webpage
From Theory to Hands-On Workflow Automation, Algorithm Selection, and Hyperparameter Optimization
José Raúl Romero & Carlos García (University of Córdoba, ES)
With a wide array of underlying techniques like Bayesian optimization and evolutionary computation, Automated Machine Learning (AutoML) systems significantly reduce manual effort while delivering high-performance models. This session is designed to provide a comprehensive yet practical introduction to the core principles of AutoML, tailored to both domain experts and data scientists.
We will begin by exploring the theoretical foundations of AutoML, breaking down its key components and tasks for common predictive problems. The session will highlight state-of-the-art methods for automating pipeline construction, algorithm selection, and hyperparameter optimization. Participants will also be introduced to EvoFlow, a recent grammar-based evolutionary optimization technique that unifies all AutoML tasks within a single, flexible framework—allowing customization in both pipeline structure and component selection, while tuning all hyperparameters.
During the hands-on segment, attendees will gain practical experience with PipeGenie, a Python library powered by EvoFlow. Starting with basic exercises on pipeline automation, participants will gradually move to advanced topics, such as customizing grammar rules, tuning parameters, and configuring PipeGenie to address complex data science problems. Through real-world applications in classification, regression, and time series forecasting, they will gain a solid understanding of how AutoML can streamline the development of effective machine learning solutions across diverse domains.
Thursday, July 10th
Advances in AutoML for Deep Learning: An introduction to Neural Architecture Search, Metalearning and Learning Curves
Jan van Rijn (Leiden University, NL)
Deep learning models (such as transformer models) have achieved great advances in various domains, such as object recognition, natural language processing and generative applications. However, training such a model requires some data science experience. Various hyperparameters need to be set correctly, including those that relate to the optimal architecture. Additionally, these methods are known to require plenty of data. In this presentation, we will deal with various techniques (i.e., neural architecture search, metalearning and the extrapolation of learning curves) that support the data scientist in further automating such training processes.
Speaker's Webpage
Meta-learning for data and algorithm analysis and understanding
Ana Carolina Lorena (Instituto Tecnológico de Aeronáutica, BRA)
The area of Meta-learning (MtL) leverages knowledge from problems for which successful Machine Learning (ML) solutions are known to support automated algorithm selection for new problems. But far more meta-knowledge can be extracted by relating data properties to algorithmic performance. This topic remains under-explored compared to using MtL for automated algorithm selection. For instance, one may reveal the competencies and limitations of different ML algorithms and highlight data quality issues worth investigating. By deepening such understanding, we expect to contribute to improving the comprehensibility and reliability of the usage of ML models. We also expect to generate contributions in areas that can directly benefit from data and algorithm understanding, such as data pre-processing. The idea is to guide the solution of the previous tasks using meta-knowledge extracted about the dichotomous relationship between data properties and algorithmic performance.
Speaker's Webpage
Friday, July 11th
Towards Sustainable Automated Data Science ... because resource efficiency is not enough!
Marcel Wever (Leibniz University Hanover, DE)
Automated data science has established itself as a key technology in both research and industry to customize data science and machine learning pipelines to learning tasks. While it supports its users by automating repetitive, tedious, and time-consuming tasks of selecting algorithms for preprocessing data, learning models from data, and ensembling the resulting models to achieve peak performance, automated data science remains a computationally expensive endeavor. On the one hand, the demand for computational resources required to use methods from automated data science incurs financial costs, which are also linked to energy costs. On the other hand, consuming a substantial amount of energy also significantly impacts the environment in terms of CO2 footprint and increased hardware attrition.
This session will focus on the interface between automated data science and sustainability. To this end, we will first discuss different types of efficiency when dealing with automated data science systems and, with that, the concept of energy efficiency. We will then show that sustainability can be considered on different levels in an automated data science system, ranging from optimizing for sustainable data science or machine learning pipelines, to designing sustainable automated data science systems, to designing automated data science systems that learn across tasks and leverage their knowledge to become more efficient optimizers. Furthermore, we will discuss an example where "green thinking" even helps to design better AutoML systems, thus leveraging "green thinking" as an incentive for creative thinking in research. We conclude the session with a call to action to establish sustainability in both research on and application of automated data science systems.
As a practical takeaway, we will have a brief live-coding session on a tool named PyExperimenter, which facilitates distributing computational experiments, e.g., on high-performance computers. Moreover, PyExperimenter logs the energy consumption of conducted experiments and, in addition, the estimated carbon footprint, integrating with the carbon emission measuring framework CodeCarbon.io. While PyExperimenter can be used to evaluate automated data science systems, its interface is quite generic, so it can also be used for any other type of computational experiment, eliciting the carbon footprint and, therefore, the sustainability of computational experiments in general.
Speaker's Webpage
Fees and Registration
For individual EuADS members, the fee for participating in the summer school is 150 €.
For non-members, the fee is 200 € and includes a free EuADS membership for 2025.
To ensure an interactive experience the number of participants is limited, so early registration is strongly recommended. Please register by
1. Sending an email with your personal details and the institution you come from to contact@euads.org, with reference to EuADS Summer School 2025 Automated Data Science.
2. Transferring the amount (participation fee only or participation fee and room reservation) to
Account holder: EuADS (European Association for Data Science)
Bank: Banque et Caisse d’Epargne de l’Etat, Luxembourg
IBAN: LU47 0019 4655 6967 1000
BIC: BCEELULL
Please mention in the payment details field:
“EuADS Summer School 2025_Participation Fee” or “EuADS Summer School 2025_Participation Fee and Room Reservation”
The participation fee must be transferred before the 1st of June 2025.
You will receive a confirmation email as soon as the payment has arrived in the EuADS account. If payment has not been received by the 1st of June 2025, your reservation will be cancelled and the room will be allocated to the next person on the waiting list.
Rooms and Catering
There will be rooms available at the premises:
8 single rooms: 255 EUR per person for 3 nights.
7 double rooms (2 single beds): 225 EUR per person for 3 nights.
In order to avoid misunderstandings, please clearly mention the name of the person with whom you want to share the double room.
The rooms are available from the 8th to the 11th of July 2025 and can be booked by email: denise.schroeder@ext.statec.etat.lu
All bookings must be made before the 20th of May 2025. The payments for the rooms must be transferred before the 1st of June 2025.
The catering (breakfast, lunch, dinner and cocktail) and participation in the social event will be offered by our sponsors.
Further Accommodation
The following hotels are within walking distance of the venue:
Hôtel Parc Belle-Vue – Goeres Hotels Luxembourg
Hôtel Parc Belair – Goeres Hotels Luxembourg – Groupe Hôtelier
Or, for smaller budgets:
Organisers:
Eyke Hüllermeier (LMU Munich, Germany; EuADS President)
Peter Flach (U of Bristol, UK; EuADS Vice-President)
Nils Hachmeister (Germany; EuADS Vice-President)
Berthold Lausen (U of Essex, UK; EuADS Vice-President)
Serge Allegrezza (STATEC, Luxembourg; EuADS Treasurer)
Tim Friede (U of Göttingen, Germany)
Andreas Geyer-Schulz (Germany)
Arthur Guja (Data 3.0 Ltd., UK)
Victoria López López (CUNEF Universidad, Spain)
José Raúl Romero (U of Córdoba, Spain)
Ian Sosa (NeuralWave Technologies, Luxembourg)
Mark Van Lokeren (Imperial College London, UK)
Katharina Weiß (Bielefeld University, Germany)
Denise Schroeder (STATEC, Luxembourg; EuADS administrative assistant)
Sponsors






