How to build data pipelines in the Cloud

You have been asked to implement a new data pipeline architecture. The aim: to turn a wealth of raw data into friendly, off-the-shelf data inputs for the data science, analytics, and reporting teams.

Following best practices, you will learn how to establish a successful workflow, distinguish between ETL, ELT, and reverse ETL, avoid common pitfalls, and make data available across the company.

Prerequisites

  • The target group is entry-level data engineers, scientists, and analysts looking for conceptual ideas and best practices on how to ingest, transform, store, and serve their data

Learning objectives

  • 1. The Data's Lifecycle ~ 15 min
  • 2. Hands-on Example: Build a Reliable Pipeline ~ 15 min
  • 2.1. Fetch data from the API
  • 2.2. Store data in Google Cloud Storage
  • 2.3. Move data to Google BigQuery
  • 2.4. Orchestrate with Apache Airflow
  • 3. Avoid the Most Common Pitfalls ~ 5 min
  • 4. Deep-dive into a Data Science Application ~ 15 min
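The hands-on stages above can be sketched end to end. The snippet below is a minimal, self-contained illustration of the flow, not the talk's actual code: the function names are assumptions, and the GCS and BigQuery steps are in-memory stand-ins. In a real pipeline, steps 2.2 and 2.3 would use the google-cloud-storage and google-cloud-bigquery client libraries, and step 2.4 would chain the stages as Airflow tasks in a DAG.

```python
import json

def fetch_from_api(raw_response: str) -> list[dict]:
    """2.1 Fetch: parse the raw API payload into records."""
    return json.loads(raw_response)

def store_to_gcs(records: list[dict]) -> str:
    """2.2 Store: serialize to newline-delimited JSON, a format
    BigQuery can load natively (here kept as an in-memory string
    instead of a real GCS object)."""
    return "\n".join(json.dumps(record) for record in records)

def load_to_bigquery(ndjson: str) -> int:
    """2.3 Load: stand-in for a BigQuery load job; returns the
    number of rows that would be loaded."""
    return sum(1 for line in ndjson.splitlines() if line)

def run_pipeline(raw_response: str) -> int:
    """2.4 Orchestrate: run the stages in dependency order, as an
    Airflow DAG would (fetch >> store >> load)."""
    records = fetch_from_api(raw_response)
    ndjson = store_to_gcs(records)
    return load_to_bigquery(ndjson)

rows_loaded = run_pipeline('[{"id": 1}, {"id": 2}]')
```

The point of the sketch is the dependency ordering: each stage consumes the previous stage's output, which is exactly the structure an orchestrator like Airflow makes explicit and retryable.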

Speaker


Olivier Bénard
Olivier Bénard is a Data Engineer at Breuninger. In his job, Olivier oversees the Google Cloud Platform (ensuring 24/7 data availability), moves data from A to B, ingests, transforms, and stores data from different sources, and monitors the overall workflow. The objective is to serve data to the analytics, BI, and data science teams, and back to the software teams (reverse ETL).

Alec Sproten
Alec Sproten is responsible for Data Science at E. Breuninger GmbH & Co. and has over 15 years of experience in the field. He holds a Ph.D. in economics and has a background in psychology. Alec is a renowned conference speaker and podcast participant with expertise in data science and business strategy, and has a proven track record of leveraging data to drive business decisions and innovation. Alec is proud to be part of the Breuninger family and is passionate about achieving tangible results for the organization.
