

The workshops will take place on June 22, 2023 at the JED in Zurich.

In order to attend a workshop, make sure you buy a 2-day pass and select the corresponding workshop. Workshops are limited in capacity and allocated on a first-come, first-served basis.


We have been overwhelmed this year by the number and quality of submitted workshops. After a first selection round, we still have 28 workshops that qualify for SDS 2023. Due to a limited number of available rooms, we can only accept 14. 

To make sure we select the most interesting workshops, the organizing committee has decided to launch a community vote to gauge participants' interest in the different workshops. Voting will be open until Jan 15, 2023. A final decision will then be made, taking the results of your vote into account.

So, if you plan to come to SDS and attend workshops, please let us know which ones you are most interested in:

AI-Powered Medical Image Analysis – from Imaging to Decision Support

Medical imaging constitutes a key source of information for answering clinical and scientific questions about human health. A wide variety of imaging methods exists, from radiography and tomography to magnetic resonance imaging, ultrasound imaging, and light sheet microscopy. These technologies enable the study of the human body and the diagnosis, monitoring, and treatment of medical conditions. Combining these imaging methods with Artificial Intelligence (AI) can greatly improve image acquisition, diagnostic precision, and the processing of the growing amount of imaging data. AI for biomedical imaging has found a wide range of applications and is beginning to transform medical diagnostics.

An Introduction to Neural Networks using Longitudinal Health Data

In this tutorial, Daniel and Michael – members of the Data Science working group of the Swiss Association of Actuaries – will introduce neural networks on a synthetic, longitudinal health dataset, using risk factors like BMI, blood pressure, and age to predict various health outcomes of individuals over time. The tutorial consists of 4 parts: creating the synthetic dataset, applying generalized linear models, transitioning to shallow and deep neural networks, and exploring model explainability and risk factor importance. All 4 parts come with ready-to-use Python scripts for the audience to explore during and after the tutorial.
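To give a flavour of the first two parts – creating a synthetic dataset and fitting a generalized linear model – here is a minimal numpy sketch; the risk factors, effect sizes, and training setup are illustrative assumptions, not the tutorial's actual scripts:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Illustrative risk factors (standardized): age, BMI, blood pressure
X = rng.normal(size=(n, 3))
true_w = np.array([1.0, 0.8, 1.2])          # assumed effect sizes
p = 1 / (1 + np.exp(-(X @ true_w - 0.5)))   # logistic link
y = rng.binomial(1, p)                      # binary health outcome

# Fit a logistic GLM by gradient descent on the mean log-loss
w, b = np.zeros(3), 0.0
for _ in range(500):
    z = 1 / (1 + np.exp(-(X @ w + b)))
    w -= 0.5 * (X.T @ (z - y) / n)
    b -= 0.5 * np.mean(z - y)

acc = np.mean((1 / (1 + np.exp(-(X @ w + b))) > 0.5) == y)
print(f"training accuracy: {acc:.2f}")
```

Transitioning to a shallow neural network then amounts to replacing the linear score `X @ w + b` with a hidden layer and a nonlinearity.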

Best practices in Tech Recruiting in Switzerland

Tech recruiting is an ongoing challenge for most companies and limits the speed and success rate of many IT projects. We will share and discuss best practices in tech recruitment.

Building an end-to-end MLOps pipeline using AWS SageMaker

In this end-to-end MLOps workshop, attendees will learn how to apply MLOps best practices with AWS SageMaker to build, train, and deploy machine learning models into a production environment. Participants will leverage the fully managed CI/CD environment inside AWS SageMaker. They will explore a real-world dataset and configure both the model training and the deployment pipeline. This workshop suits data scientists and IT professionals with basic machine learning and Python knowledge; no prior experience with AWS services is required. By the end, attendees will have gained a strong understanding of AWS SageMaker and will be able to independently implement MLOps in a CI/CD production environment.

Create value from data on Google BigQuery using dbt, Data Vault and the 99 bananas data set

Data warehousing is at a pivotal turning point, with infrastructure being moved to the cloud and data ownership being democratized. This shift allows data professionals to rethink modeling techniques and discover new ways of pipelining data.
In this hands-on session, we show the power of these new possibilities by combining Data Vault with dbt on BigQuery. As the modeling framework, Data Vault 2.0 provides a state-of-the-art approach to structuring data in a flexible manner, suiting fast development cycles. With dbt, we have a code-centric transformation tool that facilitates developing data pipelines. The engine, BigQuery, provides a performant and scalable data warehouse. This workshop serves as a jump start in data pipelining and business value creation on GCP with dbt.
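To give a flavour of the modeling framework: a core Data Vault 2.0 pattern is deriving a deterministic hash key for a hub from its business key. In dbt this is typically done with a SQL macro; the sketch below illustrates the same idea in Python (the normalization rules and delimiter are illustrative assumptions):

```python
import hashlib

def hub_hash_key(*business_keys: str) -> str:
    """Data Vault-style hash key: normalize the business key parts,
    concatenate them with a delimiter, and hash, so the same business
    key always maps to the same surrogate key (MD5 is a common choice
    in Data Vault 2.0 implementations)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

# The same customer key with different formatting yields the same hash key
print(hub_hash_key(" c-1001 ") == hub_hash_key("C-1001"))  # → True
```

Because the key is derived purely from the business key, hubs, links, and satellites can be loaded independently and in parallel — one of the properties that makes the pattern suit fast development cycles.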

Creating a Modern Data Lakehouse

Data Lakehouse architectures are on the rise and are supporting modern data needs through flexible and fluid structures, while ensuring performance for big data analytics. In this workshop, you will learn the concepts, differences and the purpose of Data Lakehouses as well as key technological advancements, such as Delta Lakes, which have enabled this solution. Additionally, you will gain firsthand experience to set up your own Data Lakehouse in the public cloud and use it as source for a BI report, putting theory into practice. Simultaneously, you will learn about how the cloud can help you gain access to data science, machine learning, and business analytics capabilities. In the end, we want you to leave the workshop with all the tools needed to create your very own big data solutions.

Databricks brick-by-brick: Data, Analytics and ML on one platform

Databricks is an industry-leading, cloud-based data platform used for processing and transforming massive quantities of data, deriving new insights using SQL endpoints, and enabling modern ML lifecycles. It facilitates turning raw data into actionable insights by combining ELT processes, data analytics, and machine learning together with a strong unified governance approach. In this workshop, we present how Databricks tools assist and enable fast development in all aspects of the current data product lifecycle, from ELT pipelines and data governance to Machine Learning experimentation and Model Serving (MLOps). In the practical part of the workshop we will discuss each of these steps in detail and guide the participants through the whole data lifecycle in Databricks.

DataMesh in Action – When and how to implement a DataMesh

DataMesh is a socio-technical paradigm on how to shape an organization to leverage the value of data even in the presence of frequent change and massive growth. Even though the theoretical concepts are well covered in literature, implementation details and answers to practical questions are scarce. In this workshop, we will leverage the wide experience of data practitioners to put more flesh on the bones of DataMesh in an interactive manner. After a short catch-up on DataMesh, we will identify the most common challenges in today’s data ecosystem based on our collective experience. In a second step, we will work in groups to identify how the DataMesh principles might help overcome the prevalent data challenges, but also what new questions might arise in a practical implementation.

Data Science Techniques for Data sets on Mental and Neurodegenerative Disorders

About two percent of the world’s population suffers from various types of mental and neurodegenerative disorders, making up 13% of the global burden of disease. The burden of mental health disorders, in terms of reduced health and productivity, was estimated at approximately $2.5 trillion USD globally in 2010 and projected to grow to $6 trillion by 2030 ($8.2 trillion in 2022 dollars). Artificial intelligence techniques have been used to detect and help treat mental and neurodegenerative disorders. However, developing and integrating trustworthy AI models into healthcare settings, especially with such vulnerable populations, requires many things, including: robust methods, careful assessment of model performance, and clean, unbiased data sets. Thus, data science techniques are critical for advancing our understanding and treatment of mental and neurodegenerative disorders.

Data Stories: making your insights truly stand out

Data professionals spend a huge amount of time exploring, processing, and analyzing their data. Managers and executives use these results to make or drive data-based decisions. However, even with highly impactful insights, it is challenging to make sure the message clearly gets across, especially when communicating with busy decision makers. In this workshop we will see how data storytelling can be leveraged to make sure the right message is efficiently conveyed, so that the results of the hard work stand out.

Deep Learning for Predictive Maintenance: Scalable Implementation in Operational Setups

Developing deep learning (DL) algorithms for predictive maintenance is a growing trend in various industrial fields. Whereas research methods have been rapidly advancing, implementations in commercial systems are still lagging behind. One reason for the delay is the common focus on the choice of algorithm, ignoring crucial aspects of scaling the algorithms to heterogeneous fleets of multi-component machines.

In this tutorial, we discuss approaches to address these challenges. We provide background to techniques for scalable deployment of DL in commercial machine fleets, focusing on anomaly detection, transfer learning, and uncertainty quantification. We explain the generic concepts on use cases from commercial fleets, including a code implementation on a publicly available data set.
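As a minimal illustration of the anomaly detection idea covered in the tutorial: fit statistics of normal behavior on data from healthy machines, then flag readings that deviate too far. The sensor data and threshold below are hypothetical, not from the tutorial's actual data set:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical vibration readings from healthy machines in a fleet
healthy = rng.normal(loc=0.0, scale=1.0, size=5000)

# Learn normal-behavior statistics from the healthy data
mu, sigma = healthy.mean(), healthy.std()

def is_anomaly(reading: float, k: float = 4.0) -> bool:
    """Flag a reading whose z-score exceeds k standard deviations."""
    return abs(reading - mu) / sigma > k

print(is_anomaly(0.3))   # typical reading → False
print(is_anomaly(7.5))   # e.g. a fault signature → True
```

Deep learning variants replace the z-score with the reconstruction error of an autoencoder trained on healthy data, but the thresholding logic stays the same — which is also where transfer learning and uncertainty quantification enter when scaling to heterogeneous fleets.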

Delivering Data Projects Successfully with DataOps

In our workshop, you will learn how DataOps can help you run data projects more successfully. Many data initiatives fail to deliver what they promise, often for reasons not directly related to technology. Common challenges are missing stakeholder buy-in, rigid project management, poor data quality, and unclear processes. With DataOps you can address them. We will discuss how to manage stakeholders, handle changing requirements, improve data governance, and apply DevOps principles. Using our DataOps Radar, you will uncover the strengths and weaknesses of your projects – and you will leave with actionable ideas to make your initiatives a success.

Designing and deploying a responsible AI solution on the Azure cloud

Many organizations are now transitioning their workloads to the cloud, as it offers an effective and compliant way to automate, deploy, and scale solutions, while ensuring lower maintenance, development effort and costs, and enhanced sustainability.

This also holds true for ML/AI solutions; however, there are several challenges. First, migrating to the cloud is often associated with a learning curve, as it involves adopting and adapting to a new technology stack. Second, ML models must adhere to the principles of responsible AI so that stakeholders can trust and explain them.

This workshop will address both challenges: participants will build and deploy an ML model on Azure Machine Learning, including a responsible AI dashboard for error analysis, feature importance and model explanations.

Forecasting & Meta-learning

Time series forecasting is a crucial tool in a variety of fields, but applying deep learning models in practice can be challenging. In this workshop, we will present techniques for data cleaning and modeling to train and apply (deep learning) forecasting models, using our open-source Python library Darts. We will also introduce the innovative concept of meta-learning, which can discover generic patterns from diverse time series data and provide zero-shot predictions on unrelated time series. Participants will have the chance to apply these techniques through hands-on exercises, including a time series forecasting competition.
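For intuition, the simplest forecasting baseline – a naive seasonal model that repeats the last observed cycle, which Darts also provides – can be sketched with numpy alone. This illustrates the concept, not the Darts API itself:

```python
import numpy as np

def naive_seasonal_forecast(series: np.ndarray, season: int, horizon: int) -> np.ndarray:
    """Repeat the last observed seasonal cycle for `horizon` steps --
    the classic baseline any learned forecasting model must beat."""
    last_cycle = series[-season:]
    reps = int(np.ceil(horizon / season))
    return np.tile(last_cycle, reps)[:horizon]

# Monthly data with a yearly cycle (illustrative)
t = np.arange(48)
series = 10 + 3 * np.sin(2 * np.pi * t / 12)

forecast = naive_seasonal_forecast(series, season=12, horizon=6)
print(np.allclose(forecast, series[36:42]))  # → True, cycle is repeated
```

Deep learning models and the meta-learning approach presented in the workshop aim to outperform exactly this kind of baseline, including in the zero-shot setting where the model has never seen the target series.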

Generative AI in Practice

Join us at the Generative AI Workshop for an interactive exploration of this exciting new field. We’ll present how Generative AI models work, which have made rapid progress in generating images, text, and more on demand. You’ll have the chance to try out natural language and computer vision models for yourself and collaborate with others to brainstorm practical applications for these technologies. By the end of the workshop, you’ll have a solid understanding of what generative AI can do, how it does it and how you can use it to your advantage.

Generative AI has the potential to disrupt markets and create new business opportunities. Don’t miss out on this chance to see what practical applications are possible today and get inspired for how you can use it in your company tomorrow.

GEOSpatial Business – from data to value added services

The Power of Where – this frequently used statement underscores the importance of geospatial data in a world that is becoming increasingly data-driven. But even though data quality and availability are increasing, the transition from an initial idea to commercialisation of data products and services remains a challenge. IBM Research and ZKB will show how geospatial data can help to understand and quantify environmental conditions and their changes, how this impacts supply chains, resource management and other operational processes, and how it finally generates incentives to improve, inter alia, data-processing infrastructure and spatial data analytics. Challenges in translating this into revenue gains or competitive advantage will be identified in a subsequent open Ideathon.

Geospatial Data Science: The Power of Knowing Where

60-80% of all information is geospatially referenced, yet few companies exploit the full potential of geospatial data. Companies that understand why something happens where can boost the effectiveness of marketing campaigns, optimize supply chains, and improve customer experience, among other benefits. In this workshop, we introduce participants to the basics of geospatial data science, including an overview of geographic coordinate systems, data types, and commonly used tools for storing, manipulating, and visualizing geodata. Using open-source data to explore natural hazards in Switzerland, participants learn how to prepare, visualize, and manipulate location-based data, how to use it in predictive modeling, and the state-of-the-art tools to do so in Python.
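As a small taste of working with geographic coordinate systems: latitude and longitude are angles on (approximately) a sphere, so distances require a great-circle formula rather than planar geometry. A minimal haversine sketch in Python, using approximate coordinates for Zurich and Bern:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Approximate coordinates for Zurich and Bern
d = haversine_km(47.3769, 8.5417, 46.9480, 7.4474)
print(f"{d:.0f} km")  # roughly 95 km
```

Dedicated geospatial libraries handle projections, geometries, and spatial joins on top of exactly this kind of primitive.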

Geospatial Innovation – from case studies to new ideas

The Power of Where – this frequently used statement underscores the importance of geospatial data and spatial data analytics in a world that is becoming increasingly data-driven. In this workshop, success stories of exciting project ideas based on geodata are presented by the Swiss Territorial Data Lab (STDL). Platforms and initiatives related to geodata discovery, analysis, and visualization as well as funding opportunities within and outside of Switzerland will be discussed. In interactive sessions, we will look beyond our own professional horizons to identify new fields of application and future challenges and to generate ideas around data-driven approaches. Via participant voting, the best ideas can be submitted directly to the Databooster with the prospect of further funding.

Going Data Driven with Data Literacy

To this day, a large part of data science projects still fail to reach the expected business impact and ROI, as many publications show. Many companies and organizations approach us to help them with their data-driven journey. They often have well-working data pipelines and brilliant data scientists in place, but struggle with the following recurring patterns: the potential of data, business ownership, and an academic mindset.

The main takeaways for our workshop attendees are understanding what Data Literacy means, why it matters, and how it can be measured and improved efficiently and effectively. They will also develop an understanding of where Data Literacy (or the lack of it) is holding data & AI projects back in their organization.

Introduction and ROI of Knowledge Graphs, based on three examples in watch industry, energy and insurance

Participants learn about three examples of Semantic Knowledge Graphs in operation (IT asset management and automation, master data alignment and data catalog). The presenter addresses and contrasts the initial situation, particular use cases and benefits, justifications for using KG vs. other approaches, tooling used, roles and skills needed, obstacles encountered, specific outcomes and Return on Investment (ROI). Inspired by the examples, participants identify potential use cases and benefits in their own organization. Participants anticipate business benefits, identify needed skills, write a simple road map and calculate a quick Return on Investment (ROI). Participants then present their use cases to their peers for immediate feedback.
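The quick ROI calculation mentioned above boils down to a one-line formula; the figures in the example below are purely illustrative, not from the presented case studies:

```python
def quick_roi(total_benefit: float, total_cost: float) -> float:
    """Return on Investment as a fraction: (benefit - cost) / cost."""
    return (total_benefit - total_cost) / total_cost

# e.g. an assumed CHF 500k benefit from a KG-backed data catalog
# against an assumed CHF 200k total cost
print(f"{quick_roi(500_000, 200_000):.0%}")  # → 150%
```

The hard part, which the workshop focuses on, is estimating the benefit and cost figures credibly for a concrete knowledge graph use case.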

How to implement and train a GAN

Generative Adversarial Networks (GANs) are a popular class of deep learning models that can be used to generate new, realistic data, practically indistinguishable from the training data. In this workshop, we will introduce the key concepts and underlying principles behind GANs and provide an overview of how they work. We will then dive into the technical details of implementing in Python the main components that make up a GAN, as well as the process of training and evaluating them. Finally, we will discuss common challenges encountered when building GANs, such as mode collapse, and review best practices for mitigating them. Overall, this workshop is meant to give attendees a clear understanding of GANs and a strong foundation for further exploring this exciting area of data science.
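At its core, the adversarial setup alternates between two loss functions. Below is a framework-free sketch of the standard discriminator loss and the non-saturating generator loss; real training would use a deep learning framework, and the numbers here are illustrative:

```python
import numpy as np

def d_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """Discriminator loss: push D(real) toward 1 and D(fake) toward 0."""
    return float(-np.mean(np.log(d_real) + np.log(1 - d_fake)))

def g_loss(d_fake: np.ndarray) -> float:
    """Non-saturating generator loss: push D(fake) toward 1."""
    return float(-np.mean(np.log(d_fake)))

# A discriminator that is fooled half the time sits at the theoretical
# equilibrium, where its loss equals log(4) ≈ 1.386
half = np.full(8, 0.5)
print(round(d_loss(half, half), 3))  # → 1.386
```

Training alternates gradient steps on these two losses; mode collapse shows up when the generator minimizes its loss by producing only a narrow slice of the data distribution.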

Model Explainability and Ethics in AI

In the field of artificial intelligence (AI), there have been growing concerns about the potential for biased outcomes, particularly in decision-making systems that affect users. One way to mitigate bias in AI is through the use of explainable models. As such, Model Explainability (ME) and Ethics in AI are becoming more and more relevant for companies striving for transparency and equitable outcomes.

We propose a day-long, hands-on tutorial to introduce ME concepts and a selection of tools to apply them to concrete situations. This tutorial is for anyone interested in creating ethical and trustworthy systems. It will cover: Importance of Explainable AI, Machine Learning ME, and Deep Learning ME. Each part starts with an intro, followed by an example implementation and an exercise.

Responsible AI – Transparency and Fairness of data-based applications in practice

The usage of AI will soon be regulated in the European Union with the upcoming AI Act. This will dramatically impact the data science activities of companies: the use of many data-based algorithms and applications will have to be rethought, and often adapted. In this workshop, we focus on two relevant requirements: (a) Explainability and Transparency, and (b) Fairness and Non-discrimination.

Distinguished speakers will explain these requirements and comment on the state-of-the-art of how to implement such requirements technically, in a concrete data-science application. In addition, participants will have the opportunity to discuss their specific challenges, open questions, etc. with the experts as well as with the other workshop participants, in a moderated exchange format.

Scaling Analytical Data Platforms: From Data to Data Products to Data Mesh

Scaling analytical data platforms is one of the challenges of this decade. By now, we know very well how to get the most out of data in a bounded context. But with the increasing adoption of data-driven solutions, the rising complexity of platforms forces us to think beyond technology. This workshop is designed to address these challenges. We will start with a short introduction to the concept of data mesh and then provide a structured approach to thinking of data as products. We will dive into an example of architecting an enterprise-scale data landscape, including organizational and governance aspects. At the end, participants will be ready to apply the techniques and tools in their very own real-life scenarios. Join us to learn how to create a data mesh organization.

The Internet of Things — Build a cloud IoT system using a RaspberryPi

The key takeaway is for participants to experience how easy and accessible the process of streaming Internet of Things (IoT) data into the cloud is. Additionally, they will be able to get a first experience with the technologies of the two biggest players in the cloud market. Last but not least, the workshop is not purely focused on software, but gives participants an opportunity to create a simple electric circuit with their hands – a true hands-on experience!

TinyML Workshop: Spiking Neural Networks for low-power real-time inference

Low-power machine learning is a fascinating solution to process information on resource-constrained devices, such as satellites, wearables, and autonomous cars. The TinyML Foundation and world-leading companies like SynSense provide hardware and software solutions to deploy energy-efficient applications easily. In this tutorial supported by TinyML, SynSense will introduce spiking neural networks (SNNs), a deep-learning paradigm to perform energy-efficient machine learning and signal processing tasks, and will present toolchains for training and deploying an SNN audio application based on PyTorch and JAX. All participants will be able to execute the tutorial on their own laptops and will have access to SynSense’s ultra-low-power development kits to test the deployment of their models.
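The building block behind SNNs is the leaky integrate-and-fire (LIF) neuron: its membrane potential leaks toward zero, accumulates input current, and emits a binary spike when it crosses a threshold. A minimal numpy simulation — the time constant and threshold are illustrative, not parameters of SynSense's actual toolchain:

```python
import numpy as np

def lif_spikes(inputs: np.ndarray, tau: float = 0.9, threshold: float = 1.0) -> np.ndarray:
    """Simulate a leaky integrate-and-fire neuron over a sequence of
    input currents, returning a binary spike train. The potential
    decays by factor tau each step and resets to zero after a spike."""
    v = 0.0
    spikes = np.zeros_like(inputs)
    for t, x in enumerate(inputs):
        v = tau * v + x
        if v >= threshold:
            spikes[t] = 1.0
            v = 0.0  # reset after spiking
    return spikes

weak = lif_spikes(np.full(20, 0.05))   # input too weak: never spikes
strong = lif_spikes(np.full(20, 0.6))  # strong input: spikes repeatedly
print(int(weak.sum()), int(strong.sum()))
```

Because information is carried in sparse binary spikes rather than dense activations, hardware like SynSense's can skip computation between spikes — the source of the energy savings.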

Trustworthy and Certifiable AI: From Myth to Reality!

Artificial Intelligence (AI) is making an impact in multiple domains. Organizations interested in using AI to improve efficiency, gain competitive advantage and increase added value to customers need to be prepared to comply with the upcoming regulatory frameworks at the national and international level.

These frameworks will demand that organizations certify the systems they develop according to technical and ethical requirements such as reliability, safety, transparency, and fairness. However, translating these requirements into organizational processes and practices is not an easy feat. Through presentations by invited experts and hands-on exercises, this workshop will present the current state of AI regulation and standardization, introduce technical approaches for improving the trustworthiness of AI systems and empower organizations to be prepared to fulfill regulatory and certification requirements.

Tutorial: Apache Superset Open-Source BI Dashboards

If you want to share insightful visualizations and dashboards with your team or the public, you need a powerful business intelligence (BI) platform. Originally developed by Airbnb, Superset was subsequently released as an open-source project and accepted into the Apache Incubator program. In this tutorial, we will demonstrate how to develop interactive dashboards with freely configurable charts, maps, tables, and text elements, including customized filters to slice and dice the data. For experienced users, we will also demonstrate the use of advanced features like in-chart calculations, time-series comparisons, or direct use of SQL. We will explore the possibilities of sharing and publishing dashboards within a team or with the public. For this, we will also discuss the extensive user, role, and permission management integrated into Superset. We will also show how Superset can be deployed and connected to existing data infrastructure.