DP-203 Data Engineering on Microsoft Azure

In this course, the student will learn how to implement and manage data engineering workloads on Microsoft Azure, using Azure services such as Azure Synapse Analytics, Azure Data Lake Storage Gen2, Azure Stream Analytics, Azure Databricks, and others. The course focuses on common data engineering tasks such as orchestrating data transfer and transformation pipelines, working with data files in a data lake, creating and loading relational data warehouses, capturing and aggregating streams of real-time data, and tracking data assets and lineage.

Duration: 4 days
Location: In-company, online, or at our training location: De Loods in Rijswijk
Level: Intermediate

Target audience

The primary audience for this course is data professionals, data architects, and business intelligence professionals who want to learn about data engineering and building analytical solutions using data platform technologies that exist on Microsoft Azure. The secondary audience for this course includes data analysts and data scientists who work with analytical solutions built on Microsoft Azure.

Course content

Introduction to data engineering on Azure

This module describes how Microsoft Azure provides a comprehensive platform for data engineering.

In this module you will learn how to:

Identify common data engineering tasks
Describe common data engineering concepts
Identify Azure services for data engineering

Introduction to Azure Data Lake Storage Gen2

Discover how Azure Data Lake Storage Gen2 provides a repository where you can upload and store unstructured data, making big data analytics processing more efficient.

In this module you will learn how to:

Describe the key features and benefits of Azure Data Lake Storage Gen2
Enable Azure Data Lake Storage Gen2 in an Azure Storage account
Compare Azure Data Lake Storage Gen2 and Azure Blob storage
Describe where Azure Data Lake Storage Gen2 fits in the stages of analytical processing
Describe how Azure Data Lake Storage Gen2 is used in common analytical workloads

Introduction to Azure Synapse Analytics

In this module, you'll learn how to:

Identify the business problems that Azure Synapse Analytics addresses.
Describe core capabilities of Azure Synapse Analytics.
Determine when to use Azure Synapse Analytics.

Use Azure Synapse serverless SQL pool to query files in a data lake

After the completion of this module, you will be able to:

Identify capabilities and use cases for serverless SQL pools in Azure Synapse Analytics
Query CSV, JSON, and Parquet files using a serverless SQL pool
Create external database objects in a serverless SQL pool

Use Azure Synapse serverless SQL pools to transform data in a data lake

After completing this module, you'll be able to:

Use a CREATE EXTERNAL TABLE AS SELECT (CETAS) statement to transform data.
Encapsulate a CETAS statement in a stored procedure.
Include a data transformation stored procedure in a pipeline.

Create a lake database in Azure Synapse Analytics

After completing this module, you will be able to:

Understand lake database concepts and components
Describe database templates in Azure Synapse Analytics
Create a lake database

Analyze data with Apache Spark in Azure Synapse Analytics

After completing this module, you will be able to:
Identify core features and capabilities of Apache Spark.
Configure a Spark pool in Azure Synapse Analytics.
Run code to load, analyze, and visualize data in a Spark notebook.

Transform data with Spark in Azure Synapse Analytics
Learn how to use Apache Spark pools in Azure Synapse Analytics to transform data.
In this module, you will learn how to:
Use Apache Spark to modify and save dataframes
Partition data files for improved performance and scalability.
Transform data with SQL
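
The partitioning objective above can be illustrated without a Spark pool. The following is a minimal sketch in plain Python, not Spark code: it mimics the `column=value` folder layout that Spark produces when data is written with `partitionBy`, and shows why a reader that filters on the partition column only needs to open one folder. The function names, the file name `part-0000.csv`, and the sample orders are hypothetical and exist only for this illustration.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path


def write_partitioned(rows, partition_key, root):
    """Write rows into one CSV file per distinct value of partition_key.

    Mimics a key=value folder layout. The partition column is left out of
    the files because its value is already encoded in the folder name.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[partition_key]].append(row)

    written = []
    for value, group in sorted(groups.items()):
        folder = Path(root) / f"{partition_key}={value}"
        folder.mkdir(parents=True, exist_ok=True)
        columns = [c for c in group[0] if c != partition_key]
        path = folder / "part-0000.csv"  # hypothetical file name
        with path.open("w", newline="") as f:
            # extrasaction="ignore" drops the partition column from each row
            writer = csv.DictWriter(f, fieldnames=columns, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written


def read_partition(root, partition_key, value):
    """Read only the folder that holds one partition value."""
    path = Path(root) / f"{partition_key}={value}" / "part-0000.csv"
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": "1", "year": "2022", "amount": "10.0"},
    {"order_id": "2", "year": "2023", "amount": "25.5"},
    {"order_id": "3", "year": "2023", "amount": "7.25"},
]

output_root = tempfile.mkdtemp()
files = write_partitioned(orders, "year", output_root)
rows_2023 = read_partition(output_root, "year", "2023")
```

Reading the 2023 orders touches a single folder and never opens the 2022 file, which is the effect partitioning aims for at data lake scale.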

Use Delta Lake in Azure Synapse Analytics
Delta Lake is an open-source storage layer for Spark that you can use to implement a data lakehouse architecture in Azure Synapse Analytics.
In this module, you'll learn how to:
Describe core features and capabilities of Delta Lake.
Create and use Delta Lake tables in a Synapse Analytics Spark pool.
Create Spark catalog tables for Delta Lake data.
Use Delta Lake tables for streaming data.
Query Delta Lake tables from a Synapse Analytics SQL pool.

Analyze data in a relational data warehouse
Relational data warehouses are a core element of most enterprise Business Intelligence (BI) solutions, and are used as the basis for data models, reports, and analysis.
In this module, you'll learn how to:
Design a schema for a relational data warehouse.
Create fact, dimension, and staging tables.
Use SQL to load data into data warehouse tables.
Use SQL to query relational data warehouse tables.
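
As a rough illustration of the objectives above, the sketch below builds a tiny star schema with one fact table and two dimension tables, loads it, and runs a typical aggregation query. It assumes SQLite (via Python's built-in `sqlite3` module) as a stand-in for the warehouse engine, so options specific to a dedicated SQL pool are not shown. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables describe business entities; the fact table stores
# measures plus the keys that point to the dimensions.
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,   -- surrogate key
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key       INTEGER PRIMARY KEY, -- for example 20240115
    calendar_year  INTEGER NOT NULL,
    calendar_month INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity     INTEGER NOT NULL,
    sales_amount REAL NOT NULL
);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [
    (1, "Road bike", "Bikes"),
    (2, "Helmet", "Accessories"),
])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)", [
    (20240115, 2024, 1),
    (20240210, 2024, 2),
])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (20240115, 1, 2, 1800.0),
    (20240115, 2, 3, 90.0),
    (20240210, 2, 1, 30.0),
])

# Typical analytical query: join the fact table to its dimensions
# and aggregate a measure by dimension attributes.
sales_by_category = conn.execute("""
    SELECT d.calendar_year, p.category, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.calendar_year, p.category
    ORDER BY p.category
""").fetchall()
```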

Load data into a relational data warehouse
Learn how to load tables in a relational data warehouse that is hosted in a dedicated SQL pool in Azure Synapse Analytics.
In this module, you'll learn how to:
Load staging tables in a data warehouse
Load dimension tables in a data warehouse
Load time dimensions in a data warehouse
Load slowly changing dimensions in a data warehouse
Load fact tables in a data warehouse
Perform post-load optimizations in a data warehouse
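
Of the loading tasks listed above, slowly changing dimensions are often the least intuitive. The sketch below shows one common approach, usually called Type 2, in which a changed attribute closes the current row and adds a new version so that history is preserved. It is plain Python over in-memory rows rather than SQL against a dedicated SQL pool, and the function name, column names, and sample customers are assumptions made for this example.

```python
from datetime import date


def apply_scd_type2(dimension, incoming, load_date):
    """Apply Type 2 changes from incoming records to dimension, in place.

    dimension is a list of dict rows. A changed city closes the current
    row and appends a new version; unseen business keys are appended.
    """
    current = {row["customer_id"]: row for row in dimension if row["is_current"]}
    next_key = max((row["customer_key"] for row in dimension), default=0) + 1

    for record in incoming:
        existing = current.get(record["customer_id"])
        if existing is not None and existing["city"] == record["city"]:
            continue  # attribute unchanged, nothing to do
        if existing is not None:
            # Close the old version instead of overwriting it.
            existing["is_current"] = False
            existing["valid_to"] = load_date
        dimension.append({
            "customer_key": next_key,          # new surrogate key
            "customer_id": record["customer_id"],
            "city": record["city"],
            "valid_from": load_date,
            "valid_to": None,
            "is_current": True,
        })
        next_key += 1
    return dimension


# Hypothetical sample data, for illustration only.
dim_customer = [{
    "customer_key": 1, "customer_id": "C1", "city": "Amsterdam",
    "valid_from": date(2023, 1, 1), "valid_to": None, "is_current": True,
}]
staged = [
    {"customer_id": "C1", "city": "Rotterdam"},  # changed attribute
    {"customer_id": "C2", "city": "Utrecht"},    # new customer
]
apply_scd_type2(dim_customer, staged, load_date=date(2024, 6, 1))
```

After the load, customer C1 has two rows: the closed Amsterdam version and the current Rotterdam version, each with its own surrogate key.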

Build a data pipeline in Azure Synapse Analytics
Build pipelines using Azure Synapse Analytics.
In this module, you will learn how to:
Describe core concepts for Azure Synapse Analytics pipelines.
Create a pipeline in Azure Synapse Studio.
Implement a data flow activity in a pipeline.
Initiate and monitor pipeline runs.

Use Spark Notebooks in an Azure Synapse Pipeline
This module describes how Apache Spark notebooks can be integrated into an Azure Synapse Analytics pipeline.
In this module, you will learn how to:
Describe notebook and pipeline integration.
Use a Synapse notebook activity in a pipeline.
Use parameters with a notebook activity.

Plan hybrid transactional and analytical processing using Azure Synapse Analytics
After completing this module, you'll be able to:
Describe Hybrid Transactional / Analytical Processing patterns.
Identify Azure Synapse Link services for HTAP.

Implement Azure Synapse Link with Azure Cosmos DB
After completing this module, you'll be able to:
Configure an Azure Cosmos DB Account to use Azure Synapse Link.
Create an analytical store enabled container.
Create a linked service for Azure Cosmos DB.
Analyze linked data using Spark.
Analyze linked data using Synapse SQL.

Implement Azure Synapse Link for SQL
In this module, you'll learn how to:
Understand key concepts and capabilities of Azure Synapse Link for SQL.
Configure Azure Synapse Link for Azure SQL Database.
Configure Azure Synapse Link for Microsoft SQL Server.

Get started with Azure Stream Analytics
In this module, you'll learn how to:
Understand data streams.
Understand event processing.
Understand window functions.
Get started with Azure Stream Analytics.
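
To make the idea of a window function concrete, the sketch below counts events per device in fixed, non-overlapping (tumbling) windows. It is plain Python, not Stream Analytics query language, and it makes its own choices about window alignment and boundary handling: windows are aligned to the Unix epoch, and an event that falls exactly on a boundary belongs to the window that starts there. The function name and the sample events are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def tumbling_window_counts(events, window_seconds):
    """Count events per device in fixed, non-overlapping time windows.

    events is an iterable of (timestamp, device_id) tuples with
    timezone-aware timestamps. Returns a dict keyed by
    (window_start, device_id).
    """
    counts = Counter()
    size = timedelta(seconds=window_seconds)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    for timestamp, device_id in events:
        window_index = (timestamp - epoch) // size  # whole windows since epoch
        window_start = epoch + window_index * size
        counts[(window_start, device_id)] += 1
    return dict(counts)


# Hypothetical sample events, for illustration only.
base = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
events = [
    (base, "dev1"),
    (base + timedelta(seconds=20), "dev1"),
    (base + timedelta(seconds=59), "dev2"),
    (base + timedelta(seconds=60), "dev1"),
    (base + timedelta(seconds=125), "dev2"),
]
counts = tumbling_window_counts(events, window_seconds=60)
```

Each event lands in exactly one window, which is what distinguishes a tumbling window from overlapping window types.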

Ingest streaming data using Azure Stream Analytics and Azure Synapse Analytics
Azure Stream Analytics provides a real-time data processing engine that you can use to ingest streaming event data into Azure Synapse Analytics for further analysis and reporting.
After completing this module, you'll be able to:
Describe common stream ingestion scenarios for Azure Synapse Analytics.
Configure inputs and outputs for an Azure Stream Analytics job.
Define a query to ingest real-time data into Azure Synapse Analytics.
Run a job to ingest real-time data, and consume that data in Azure Synapse Analytics.

Visualize real-time data with Azure Stream Analytics and Power BI
By combining the stream processing capabilities of Azure Stream Analytics and the data visualization capabilities of Microsoft Power BI, you can create real-time data dashboards.
In this module, you'll learn how to:
Configure a Stream Analytics output for Power BI.
Use a Stream Analytics query to write data to Power BI.
Create a real-time data visualization in Power BI.

Introduction to Microsoft Purview
By the end of this module, you'll be able to:
Evaluate whether Microsoft Purview is appropriate for your data discovery and governance needs.
Describe how the features of Microsoft Purview work to provide data discovery and governance.

Integrate Microsoft Purview and Azure Synapse Analytics
Learn how to integrate Microsoft Purview with Azure Synapse Analytics to improve data discoverability and lineage tracking.
After completing this module, you'll be able to:
Catalog Azure Synapse Analytics database assets in Microsoft Purview.
Configure Microsoft Purview integration in Azure Synapse Analytics.
Search the Microsoft Purview catalog from Synapse Studio.
Track data lineage in Azure Synapse Analytics pipeline activities.

Explore Azure Databricks
In this module, you'll learn how to:
Provision an Azure Databricks workspace
Identify core workloads for Azure Databricks
Use the data governance tools Unity Catalog and Microsoft Purview
Describe key concepts of an Azure Databricks solution

Use Apache Spark in Azure Databricks
In this module, you'll learn how to:
Describe key elements of the Apache Spark architecture.
Create and configure a Spark cluster.
Describe use cases for Spark.
Use Spark to process and analyze data stored in files.
Use Spark to visualize data.

Run Azure Databricks Notebooks with Azure Data Factory
In this module, you'll learn how to:
Describe how Azure Databricks notebooks can be run in a pipeline.
Create an Azure Data Factory linked service for Azure Databricks.
Use a Notebook activity in a pipeline.
Pass parameters to a notebook.
