Course Outline
PREREQUISITES FOR AZURE DATA ENGINEER:
- Basics of any cloud platform
- Basic networking knowledge
DP-900 MICROSOFT AZURE DATA FUNDAMENTALS
DP-203 AZURE DATA ENGINEER
The objectives covered in this course are:
• Design and implement data storage (40-45%)
• Design and develop data processing (25-30%)
• Design and implement data security (10-15%)
• Monitor and optimize data storage and data processing (10-15%)
• What is the purpose of an Azure Data Lake Gen 2 storage account?
• Basics of Transact-SQL commands
• How to work with Azure Synapse. This includes building a data warehouse in a dedicated
SQL Pool.
• How to build an ETL pipeline with the help of Azure Data Factory. There will be various scenarios
on how to create mapping data flows.
• How to stream data using Azure Stream Analytics. You will see how SQL commands
can be used on your streaming data.
• Basics of the Scala programming language and Spark
• How to work with Spark and Scala in Azure Databricks. We will see how to work with Notebooks.
We will also see how to stream data into Azure Databricks.
• The different security measures and monitoring aspects to consider when working with Azure
services.
1. DATA ENGINEERING FUNDAMENTALS
The topics covered in this section are:
- 1.1. Course and Exam objectives
- 1.2. Azure Overview (Concepts, Azure AD, Azure Free Account Creation, Azure Portal tour)
- 1.3. Introduction to Different File Formats
- 1.3.1. CSV, TXT
- 1.3.2. JSON & JSON Data types
- 1.3.3. Parquet Format
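The file formats above differ mainly in structure and typing. A minimal sketch using only Python's standard library, round-tripping the same record through CSV and JSON (Parquet is a binary columnar format and needs a third-party library such as pyarrow, so it is omitted here; the record itself is illustrative):

```python
import csv
import io
import json

# The same record expressed in two of the formats covered above.
record = {"id": 1, "name": "Anita", "city": "Pune"}

# CSV: flat rows of text; data types are not preserved (everything is a string).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "city"])
writer.writeheader()
writer.writerow(record)
rows = list(csv.DictReader(io.StringIO(buf.getvalue())))
print(rows[0]["id"], type(rows[0]["id"]).__name__)  # "id" comes back as a string

# JSON: nested structures with native data types (numbers, booleans, null).
parsed = json.loads(json.dumps(record))
print(parsed["id"], type(parsed["id"]).__name__)    # "id" comes back as an int
```

This type-preservation difference is one reason the labs later prefer Parquet or JSON over CSV when loading typed data into Synapse.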
- 1.4. Legacy Engineering & Analytical Revision
- 1.4.1. Relational Databases & Structured Query Language (SQL)
- 1.4.2. Power BI
- 1.4.3. ETL & ELT Batch Processing, Data Analytics
- 1.5. Creating Azure Resources
- 1.5.1. About Azure Active Directory
- 1.5.2. Deploying Microsoft SQL Server on Azure
- 1.5.3. Lab – Creating a virtual machine, Installing Microsoft SQL Server, Connecting to
Microsoft SQL Server, SQL Server template, Creating an Azure SQL Database,
Deleting Azure resources
- 1.5.4. Different ways to deploy Azure resources – Overview
- 1.5.5. Lab – Creating an Azure SQL Database – PowerShell, Azure CLI, ARM Templates
- 1.5.6. Azure SQL Managed Instance (Creating and Connecting)
- 1.5.7. IaaS vs PaaS
- 1.5.8. Types of SQL statements
- 1.5.9. Lab – Azure SQL Database – Working with data, Primary Keys, Foreign keys,
Additional SQL statements
- 1.5.10. Lab – Creating another Azure SQL database
- 1.5.11. Lab – Table Views, Indexes in tables
- 1.5.12. Roles and tools
- 1.5.13. Lab – Azure Data Studio
- 1.5.14. Using Visual Studio Code
- 1.5.15. Lab – Using sqlcmd
- 1.5.16. Lab – Azure Database for MySQL, MariaDB, PostgreSQL
- 1.5.17. Commands for other databases
- 1.5.18. Azure SQL Encryption, Read Replica, Adding SQL users
- 1.5.19. Lab – Azure SQL Database – Firewall
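The SQL labs above (primary keys, foreign keys, views, indexes) run against Azure SQL Database, but the same statement types can be tried locally with Python's built-in sqlite3 module. The table and column names here are illustrative, not taken from the course labs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# DDL: tables with a primary key and a foreign key (hypothetical names).
conn.execute("""CREATE TABLE Course (
    CourseId INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL)""")
conn.execute("""CREATE TABLE Enrollment (
    EnrollmentId INTEGER PRIMARY KEY,
    CourseId     INTEGER REFERENCES Course(CourseId),
    Student      TEXT NOT NULL)""")

# DML: insert and query.
conn.execute("INSERT INTO Course VALUES (1, 'DP-203')")
conn.execute("INSERT INTO Enrollment VALUES (10, 1, 'Ravi')")

# A view over a join, plus an index to speed up lookups by student.
conn.execute("""CREATE VIEW StudentCourses AS
    SELECT e.Student, c.Title
    FROM Enrollment e JOIN Course c ON e.CourseId = c.CourseId""")
conn.execute("CREATE INDEX idx_student ON Enrollment(Student)")

result = conn.execute("SELECT * FROM StudentCourses").fetchall()
print(result)
```

T-SQL adds its own syntax on top of this (IDENTITY columns, TOP, stored procedures), but the DDL/DML/view/index distinction carried through the labs is the same.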
2. DESIGN AND IMPLEMENT DATA STORAGE
- 2.1. Storage Introduction
- 2.2. Understanding “Data” and relevant relational / non-relational storage systems
- 2.3. Azure Storage Accounts
- 2.3.1. Storage Account Access Tiers
- 2.3.2. Storage Account Redundancy
- 2.3.3. Storage Account Lifecycle
- 2.3.4. Azure Storage Accounts – Introduction
- 2.3.5. Lab – Creating a storage account, Blob service, Table service, File service
- 2.3.6. Azure Storage Explorer
- 2.3.7. Storage Accounts – Data Redundancy
- 2.3.8. Lab – Azure Blob storage – Service tiers, Firewall
- 2.4. Azure SQL Databases
- 2.5. Connect Applications with Storage and SQL Databases (Lab)
- 2.6. File Formats Overview
- 2.7. Lab – Azure Data Lake Storage
- 2.8. Azure Data Lake Gen-2
- 2.8.1. Gen-2 Storage Account
- 2.8.2. Gen-2 Storage creation (Lab)
- 2.8.3. Authorize Gen-2 Storage through Access keys and shared access signatures (Lab)
- 2.9. Azure Storage Costing and Budgeting
- 2.10. Power BI to Visualize Data
- 2.11. SQL Query / T-SQL Theory
- 2.11.1. SQL / T-SQL Practice (Lab)
- 2.12. Azure Synapse Analytics
- 2.12.1. OLTP and OLAP
- 2.12.2. Azure Synapse (Create, Compute, External Tables)
- 2.12.3. Azure Synapse Architecture
- 2.12.4. Azure Synapse External Tables (Lab)
- 2.12.5. SQL Pool & Data Cleansing, External Tables (Lab >> CSV, Parquet)
- 2.12.6. Azure Synapse Data Loading and Bulk Insert
- 2.12.7. Azure Synapse Data Loading (Lab >> CSV, Parquet)
- 2.12.8. Azure Synapse PolyBase and loading into a dedicated SQL Pool
- 2.13. Data Warehouse Design and Build to set up the database (Lab)
- 2.13.1. Fact and Dimension Tables
- 2.13.2. Slowly Changing Dimensions (SCD)
- 2.13.3. Schema Types (Star Schema and Snowflake Schema)
- 2.13.4. Using Power BI and Copy Commands with Azure Synapse
- 2.13.5. Synapse Table Types
- 2.13.5.1. Round Robin Tables and Hash Distributed Tables, Heap Tables
- 2.13.5.2. Table replication
- 2.13.5.3. Azure Synapse Tables (Lab >> Create, Partitions, Indexes, Window
Functions, JSON reading, Surrogate Keys, Case Statements)
- 2.13.5.4. Modern Data Warehouse
- 2.13.5.5. Spark Pool
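The fact/dimension design covered above can be sketched locally with sqlite3: a dimension table keyed by a surrogate key, a fact table holding measures and foreign keys, and a star-schema join query. All table and column names here are hypothetical, standing in for the course's DimProduct/fact examples:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension table: descriptive attributes, keyed by a surrogate key.
conn.execute("""CREATE TABLE DimProduct (
    ProductKey INTEGER PRIMARY KEY,  -- surrogate key
    ProductId  TEXT,                 -- business (natural) key
    Name       TEXT)""")

# Fact table: numeric measures plus foreign keys into the dimensions.
conn.execute("""CREATE TABLE FactSales (
    ProductKey INTEGER,
    Quantity   INTEGER,
    Amount     REAL)""")

conn.executemany("INSERT INTO DimProduct VALUES (?, ?, ?)",
                 [(1, "P-100", "Keyboard"), (2, "P-200", "Mouse")])
conn.executemany("INSERT INTO FactSales VALUES (?, ?, ?)",
                 [(1, 2, 40.0), (2, 5, 50.0), (1, 1, 20.0)])

# A typical star-schema query: join the fact to a dimension and aggregate.
rows = conn.execute("""
    SELECT d.Name, SUM(f.Amount) AS Total
    FROM FactSales f JOIN DimProduct d ON f.ProductKey = d.ProductKey
    GROUP BY d.Name ORDER BY Total DESC""").fetchall()
print(rows)
```

In a dedicated SQL Pool the same fact table would additionally carry a distribution choice (round robin, hash on a key, or replicated), which this local sketch cannot show.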
3. DESIGN AND DEVELOP DATA PROCESSING
The topics covered in this section are:
- 3.1. Azure Data Factory (ADF)
- 3.1.1. Introduction
- 3.1.2. Extract, Transform, Load
- 3.1.3. Starting with Azure Data Factory
- 3.1.4. Lab 1 – ADF Azure Data Lake to Azure Synapse (Generating Parquet files,
Log.csv file, Parquet files)
- 3.1.5. Lab 2 – ADF Using a query for data transfer
- 3.1.6. Deleting artifacts in Azure Data Factory & Mapping Data Flow
- 3.1.7. Lab – Mapping Data Flow
- 3.1.7.1. Fact Table
- 3.1.7.2. Dimension Table (DimCustomer, DimProduct)
- 3.1.7.3. Surrogate Keys – Dimension tables
- 3.1.8. Lab – Using Cache sink
- 3.1.9. Lab – Handling Duplicate rows
- 3.1.9.1. Note – What happens if we do not have any data in our DimProduct table
- 3.1.9.2. Changing connection details
- 3.1.9.3. Lab – Changing the Time column data in our Log.csv file
- 3.1.9.4. Lab – Convert Parquet to JSON
- 3.1.9.5. Lab – Loading JSON into SQL Pool
- 3.1.9.5.1. Self-Hosted Integration Runtime (Setting up nginx, Setting up the
runtime, Copy Activity, Mapping Data Flow, Using triggers)
- 3.1.9.6. Lab – Processing JSON (Arrays, Objects)
- 3.1.9.7. Lab – Conditional Split, Schema Drift, Metadata activity
- 3.1.9.8. Lab – Azure DevOps (Git configuration, Release configuration)
- 3.1.10. Lab Material Folder
- 3.2. Azure Event Hubs and Stream Analytics
- 3.2.1. What are Azure Event Hubs
- 3.2.2. Batch and Real-Time Processing
- 3.2.3. Lab – (Creating an instance of Event hub, Sending and Receiving Events)
- 3.2.4. What is Azure Stream Analytics
- 3.2.5. Lab – (Creating a Stream Analytics job, Azure Stream Analytics – Defining the job)
- 3.2.6. Lab – Reading database diagnostic data – Setup
- 3.2.7. Lab – Reading data from a JSON file (Setup, Implementation)
- 3.2.8. Lab – Reading data from the Event Hub – (Setup, Implementation)
- 3.2.9. Lab – (Timing windows, Adding multiple outputs, Reference data, OVER clause)
- 3.2.10. Lab – Power BI Output
- 3.2.11. Lab – Reading Network Security Group Logs – (Server Setup, Enabling NSG Flow
Logs, Processing the data)
- 3.2.12. Lab – User Defined Functions
- 3.2.13. Custom Serialization Formats
- 3.2.14. Lab – Azure Event Hubs – Capture Feature
- 3.2.15. Lab – Azure Data Factory – Incremental Data Copy
- 3.2.16. Lab Material Folder
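The timing windows covered above group streaming events by time. A tumbling window (fixed-size, non-overlapping intervals, where each event lands in exactly one window) can be sketched in plain Python; the event timestamps, values, and 10-second window size here are illustrative:

```python
from collections import defaultdict

# (timestamp_in_seconds, value) pairs, as events might arrive from an Event Hub.
events = [(1, 10), (4, 20), (11, 5), (13, 5), (25, 7)]

WINDOW = 10  # tumbling window size in seconds

# Each event belongs to exactly one window: [0,10), [10,20), [20,30), ...
totals = defaultdict(int)
for ts, value in events:
    window_start = (ts // WINDOW) * WINDOW
    totals[window_start] += value

result = sorted(totals.items())
print(result)
```

In Stream Analytics the same grouping is written in its SQL dialect with `GROUP BY TumblingWindow(second, 10)`; hopping and sliding windows differ in that one event can fall into several overlapping windows.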
3.3. SCALA, NOTEBOOKS AND SPARK
- 3.3.1. Introduction to Scala
- 3.3.2. Installing Scala
- 3.3.2.1. Playing with values, Installing IntelliJ IDE
- 3.3.2.2. If construct, for construct, while construct, case construct
- 3.3.2.3. Functions, List collection
- 3.3.3. Starting with Python
- 3.3.3.1. A simple program
- 3.3.3.2. If construct, while construct, List collection, Functions
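The Python constructs listed above (if, while, list collections, functions) combined into one minimal sketch; the function and its sample input are illustrative:

```python
def classify(numbers):
    """Split `numbers` into even and odd values."""
    evens, odds = [], []                 # list collections
    i = 0
    while i < len(numbers):              # while construct
        n = numbers[i]
        if n % 2 == 0:                   # if construct
            evens.append(n)
        else:
            odds.append(n)
        i += 1
    return evens, odds

evens, odds = classify([1, 2, 3, 4, 5])  # function call
print(evens, odds)
```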
- 3.3.4. Quick look at Jupyter Notebook
- 3.3.5. Lab – Azure Synapse – Creating a Spark pool
- 3.3.5.1. Starting out with Notebooks
- 3.3.5.2. Spark DataFrames, Sorting data, Load data, Removing NULL values, Using
SQL statements, Write data to Azure Synapse
- 3.3.5.3. Combined Power, Sharing tables, Creating tables
- 3.3.5.4. JSON files
- 3.3.6. Lab Materials
3.4. AZURE DATABRICKS
- 3.4.1. Introduction to Databricks / Azure Databricks
- 3.4.2. Clusters in Azure Databricks
- 3.4.3. Lab – Creating a workspace, Creating a cluster
- 3.4.4. Lab – Simple notebook, Using DataFrames, Lab – Reading a CSV file
- 3.4.5. Databricks File System
- 3.4.6. Lab – The SQL Data Frame, Visualizations
- 3.4.7. Lab – Few functions on dates, Filtering on NULL values
- 3.4.8. Lab – Parquet-based files, JSON-based files
- 3.4.9. Lab – Structured Streaming
- 3.4.9.1. Let’s first understand our data
- 3.4.9.2. Streaming from Azure Event Hubs – a) Setup b) Implementation
- 3.4.10. Lab – Getting data from Azure Data Lake – a) Setup b) Implementation
- 3.4.11. Lab – Azure Synapse SQL Dedicated Pool
- 3.4.11.1. Lab – Stream and write Data to Azure Synapse SQL Dedicated Pool
- 3.4.12. Lab – Azure Data Lake Storage Credential Passthrough
- 3.4.13. Lab – Running an automated job
- 3.4.14. Autoscaling a cluster
- 3.4.15. Lab – Removing duplicate rows, Using the PIVOT command, Azure Databricks
Table, Azure Data Factory – Running a notebook
- 3.4.16. Delta Lake Introduction
- 3.4.16.1. Lab – Creating a Delta Table, Streaming data into the table, Time Travel
- 3.4.17. Decision making to select Azure Synapse or Azure Databricks
- 3.4.18. Lab Materials Folder
4. DESIGN AND IMPLEMENT DATA SECURITY
- 4.1. Introduction
- 4.2. What is the Azure Key Vault Service
- 4.3. Azure Data Factory – Encryption
- 4.4. Azure Synapse – Customer Managed Keys
- 4.5. Azure Dedicated SQL Pool – Transparent Data Encryption
- 4.6. Lab – Azure Synapse – Data Masking, Auditing
- 4.7. Azure Synapse – Data Discovery and Classification
- 4.8. Azure Synapse – Azure AD Authentication
- 4.9. Lab – Azure Synapse – Azure AD Authentication
- 4.9.1. Setting the admin, creating a user, Row-Level Security, Column-Level Security,
Role Based Access Control (RBAC), Access Control Lists (ACL)
- 4.10. Lab – Azure Synapse – External Tables Authorization
- 4.10.1. via Managed Identity, via Azure AD Authentication
- 4.11. Lab – Azure Synapse – Firewall
- 4.12. Lab – Azure Data Lake – Virtual Network Service Endpoint
- 4.13. Lab – Azure Data Lake – Managed Identity – Data Factory
- 4.14. Lab Materials Folder
5. MONITOR AND OPTIMIZE DATA STORAGE AND DATA PROCESSING
- 5.1.1. Best practices for structuring files in your data lake
- 5.1.2. Azure Storage accounts – Query acceleration
- 5.1.3. View on Azure Monitor
- 5.1.4. Azure Monitor – Alerts
- 5.1.5. Azure Synapse – System Views
- 5.1.6. Azure Synapse – Result set caching
- 5.1.7. Azure Synapse – Workload Management
- 5.1.8. Azure Synapse – Retention points
- 5.1.9. Lab – Azure Data Factory – Monitoring
- 5.1.10. Azure Data Factory – Monitoring – Alerts and Metrics
- 5.1.11. Lab – Azure Data Factory – Annotations
- 5.1.12. Azure Data Factory – Integration Runtime – Note
- 5.1.13. Azure Data Factory – Pipeline Failures
- 5.1.14. Azure Key Vault – High Availability
- 5.1.15. Azure Stream Analytics – Metrics, Streaming Units, The importance of time,
More on the time aspect, Monitoring the Stream Analytics job – Example
- 5.1.16. Azure Event Hubs and Stream Analytics – Partitions
- 5.1.17. Azure Stream Analytics – An example on multiple partitions
- 5.1.18. Azure Stream Analytics – More on partitions
- 5.1.19. Azure Stream Analytics – An example on diagnosing errors
- 5.1.20. Azure Stream Analytics – Diagnostics setting
- 5.1.21. Azure Databricks – Monitoring
- 5.1.22. Azure Databricks – Sending logs to Azure Monitor
- 5.1.23. Azure Event Hubs – High Availability
6. WORKING WITH COSMOS DB
- 6.1.1. Introduction, Non-relational data
- 6.1.2. Different types of non-relational data stores
- 6.1.3. Lab – Azure Cosmos DB
- 6.1.4. Lab – Azure Cosmos DB – Working with data
- 6.1.5. Lab – Azure Cosmos DB – Working with data – Resource
- 6.1.6. NEW – Azure Cosmos DB – More on Partitions
- 6.1.7. The different APIs in Azure Cosmos DB
- 6.1.8. Lab – Azure Cosmos DB – Table API
- 6.1.9. NEW – Lab – Azure Cosmos DB – MongoDB API
- 6.1.10. NEW – Azure Cosmos DB – Gremlin API
- 6.1.11. NEW – Lab – Azure Cosmos DB – Gremlin API
- 6.1.12. NEW – Lab – Azure Cosmos DB – Gremlin API – Resources
- 6.1.13. Azure Cosmos DB – High Availability
- 6.1.14. Azure Storage Account – Azure CLI
- 6.1.15. Azure Cosmos DB – Azure CLI
- 6.1.16. Azure CLI – Commands
- 6.2. Practice Questions / Tests
- 6.2.1. Project 1
- 6.2.2. Project 2
- 6.3. Revision or repeat-section requests, if any.