Workshop Objective

This workshop will provide an introduction to the data engineering, data science and ML, and analytics capabilities of Databricks.

Abstract

Databricks is an industry-leading, cloud-based lakehouse platform used for processing and transforming massive quantities of data, deriving new insights using SQL endpoints, and enabling modern ML lifecycles. In this workshop, we present how Databricks tools enable fast development across all aspects of the data product lifecycle, from ELT pipelines, workflow orchestration, and data governance to Machine Learning experimentation and Model Serving (MLOps). In the practical part of the workshop, we will discuss each of these steps in detail and guide participants through the whole development lifecycle in Databricks.

Workshop Description

Databricks is an industry-leading, cloud-based data platform used for processing and transforming massive quantities of data, deriving new insights using SQL endpoints, and enabling modern ML lifecycles. It is integrated into the major cloud platforms Azure, AWS, and GCP, and it allows organizations to turn their raw data into actionable insights by combining ELT processes, data analytics, and machine learning under a strong, unified governance approach. Built on Apache Spark, Databricks runs a distributed system behind the scenes: workloads are automatically split across many machines, and compute scales up and down on demand. This efficiency translates into direct time and cost savings for large-scale tasks.
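
To give a flavour of what this looks like in practice, below is a minimal sketch of a distributed ELT step as it might appear in a Databricks notebook, where `spark` is the preconfigured SparkSession. The input path and the three-level table name are hypothetical placeholders, not part of the workshop material.

```python
from pyspark.sql import functions as F

# Read raw CSV files from cloud storage; Spark distributes the read and all
# downstream transformations across the cluster automatically.
raw = (
    spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("/Volumes/main/raw/sales/")  # hypothetical input location
)

# Light transformation: normalise a column name and record the load time.
cleaned = (
    raw.withColumnRenamed("Order ID", "order_id")
       .withColumn("ingested_at", F.current_timestamp())
)

# Persist the result as a Delta table governed by Unity Catalog.
cleaned.write.mode("overwrite").saveAsTable("main.bronze.sales")  # hypothetical table name
```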

In this workshop, we present how Databricks tools enable fast development across all aspects of the data product lifecycle, from ELT pipelines and data governance to Machine Learning experimentation and Model Serving (MLOps).

In the practical part of the workshop, we will discuss each of these steps in detail and guide the participants through the whole data lifecycle in Databricks.
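
As a preview of the ML experimentation and model registry steps, here is a minimal, self-contained sketch of tracking a training run with MLflow, which is built into Databricks. The experiment path and registered model name are hypothetical placeholders.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical workspace path for the experiment.
mlflow.set_experiment("/Shared/databricks-workshop-demo")

# Toy dataset so the example is self-contained.
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

with mlflow.start_run():
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Log parameters, metrics, and the model artifact to the MLflow tracking server.
    mlflow.log_param("n_estimators", 100)
    mlflow.log_metric("accuracy", accuracy_score(y_test, model.predict(X_test)))

    # Registering the model (hypothetical name) makes it available to the
    # Model Registry and, from there, to Model Serving endpoints.
    mlflow.sklearn.log_model(model, "model", registered_model_name="workshop_demo_model")
```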

Agenda

  1. Introduction
    • Databricks introduction
    • Lakehouse Overview
  2. Practical Hands-on
    • Databricks workspace navigation
    • Data Pipeline Creation using Workflows: Unity Catalog, Compute, Notebooks, Repos
    • ML in Databricks: experiments, model registry, model serving
    • Databricks Analytics: serverless querying, dashboards, Partner Connect (see the query sketch after the agenda)
  3. Closing
    • Q&A
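
To illustrate the analytics portion of the agenda, the sketch below runs an ad-hoc aggregation over a Unity Catalog table from a notebook; in the workshop, the equivalent query would typically be executed against a serverless SQL warehouse in Databricks SQL. The table and column names are hypothetical placeholders.

```python
# Ad-hoc analytics over a governed table; the three-level name and columns are hypothetical.
daily_revenue = spark.sql("""
    SELECT order_date,
           SUM(amount) AS total_revenue
    FROM main.bronze.sales
    GROUP BY order_date
    ORDER BY order_date
""")

# In a Databricks notebook, display() renders the result as an interactive table
# that can be switched to a chart visualization.
display(daily_revenue)
```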