Data Integration with Cloud Data Fusion (DICDF)

 

Course Overview

This 2-day course introduces learners to the data integration capability of Google Cloud using Cloud Data Fusion. We discuss the challenges of data integration and the need for a data integration platform (middleware), then examine how Cloud Data Fusion helps integrate data from a variety of sources and formats to generate insights. We look at the main components of Cloud Data Fusion and how they work, how to process batch data and real-time streaming data with visual pipeline design, how to use rich metadata and data lineage tracking, and how to deploy data pipelines on various runtime engines.

Who should attend

  • Data Engineers
  • Data Analysts

Prerequisites

Complete "Fundamentals of Big Data and Machine Learning."

Course Objectives

  • Identify the need for data integration
  • Understand the capabilities of Cloud Data Fusion as a data integration platform
  • Identify use cases for possible implementation with Cloud Data Fusion
  • List the major components of Cloud Data Fusion
  • Design and execute batch and real-time data processing pipelines
  • Work with Wrangler to build data transformations
  • Use connectors to integrate data from different sources and formats
  • Configure the runtime environment; monitor and troubleshoot pipeline execution
  • Understand the relationship between metadata and data lineage

Course Content

Module 00 - Introduction

(in English)

Module 01 - Introduction to Data Integration and Cloud Data Fusion
  • Data integration: what, why, challenges
  • Data integration tools used in the industry
  • User personas
  • Introduction to Cloud Data Fusion
  • Critical data integration capabilities
  • Cloud Data Fusion user interface components
Module 02 - Building Pipelines
  • Cloud Data Fusion architecture
  • Basic concepts
  • Data pipelines and directed acyclic graphs (DAG)
  • Pipeline life cycle
  • Designing pipelines in Pipeline Studio
Module 03 - Designing Complex Pipelines
  • Branches, merges and joins
  • Actions and notifications
  • Error handling and macros
  • Pipeline configurations, scheduling, import and export
Module 04 - Pipeline Execution Environment
  • Scheduling and triggers
  • Runtime environment: Compute profile and provisioners
  • Pipeline monitoring
Module 05 - Building transformations and preparing data with Wrangler
  • Wrangler
  • Directives
  • User-defined directives
Module 06 - Connectors and Streaming Pipelines
  • Understand the data integration architecture.
  • List the different connectors.
  • Use the Cloud Data Loss Prevention (DLP) API.
  • Understand the streaming pipeline reference architecture.
  • Build and run a streaming pipeline.
Module 07 - Metadata and Data Lineage
  • Metadata
  • Data lineage
Module 08 - Summary
  • Course summary
Online Training

Duration: 2 days

Price (excl. VAT)
  • 1.300,– €
Classroom Training

Duration: 2 days

Price (excl. VAT)
  • Italy: 1.300,– €

Schedule

English
Time zone: Central European Summer Time (CEST)   ±1 hour
Online Training This is a FLEX course.
Time zone: British Summer Time (BST)
Online Training This is a FLEX course.
Time zone: Greenwich Mean Time (GMT)
This is a FLEX course, delivered in the classroom and remotely at the same time.
United Kingdom
London, City FLEX course in English
Time zone: British Summer Time (BST)
Course language: English
London, City FLEX course in English
Time zone: Greenwich Mean Time (GMT)
Course language: English
This is a FLEX course, delivered in the classroom and remotely at the same time.