Skip to content

Latest commit

 

History

History
100 lines (78 loc) · 9.04 KB

Challenge-01.md

File metadata and controls

100 lines (78 loc) · 9.04 KB

Challenge 01 - Data Warehouse Migration

< Previous Challenge - Home - Next Challenge >

Introduction

WWI wants to modernize their data warehouse in phases. The first stage will be to scale-out horizontally their existing data warehouse. The data warehouse migration will be from their on-premise WWI Data Warehouse to Azure Synapse Analytics. They like to reuse their existing ETL code and leave their source systems as-is (no migration). This will require a hybrid architecture for on-premise OLTP and Azure Synapse Analytics as the end state. This exercise will be showcasing how to migrate your traditional SQL Server (SMP) to Azure Synapse Analytics (MPP).

Description

The objective of this challenge is to migrate the WWI DW (OLAP) to Azure Synapse Analytics. Azure Synapse Analytics is a MPP (Massive Parallel Processing) platform that allows you to scale out your datawarehouse by adding new server nodes (compute) rather than adding more cores to the server.

There will be four different object types we'll migrate:

  • Database Schemas and Tables
  • Database code (Stored Procedure, Function, Triggers, etc)
  • SSIS code set refactor (Refactor has been done for you and not part of success criteria of this hack)
  • Data migration (with SSIS)

Here are the steps to migrate from SQL Server to Synapse Analytics.

  • Migrate all database schemas to Synapse
  • Create one table per schema in Synapse
    • Tables to create are; Dimension.City, Fact.Order & Integration.Order_Staging
  • Refactor one Stored Procedure per design pattern. Parathensis contains recommended objects
    • Dimension Tables (Integration.MigratedCityData)
    • Fact Table (Appends Only; Integration.MigratedStagedSaleData)
    • Fact Table (Merge; Integration.MigratedStagedMovementData)
  • Execute the T-SQL script, Master Create.sql in the /Challenge01/ folder of the student Resource.zip package
    • This will create all remaining fact, dimension, integration tables and stored procedures
    • This will ensure you have the full enviornment setup and configured to peform a bulk load
  • Run SSIS jobs based on new mappings
    • You can find the Daily ETLMDWLC.ispac SSIS package in the /Challenge01/ folder of the student Resource.zip package
    • Review data setup instructions before you execute the SSIS jobs
    • Load data into Synapse Analytics
  • Run Power BI Report (WWI_Sales.pbit)
    • You can find the WWI_Sales.pbit file in the /Challenge01/ folder of the student Resources.zip package

The Solution diagram is described in the text following this diagram.

Success Criteria

  • Compare your Power BI report results with the coach's screenshot to see if your results match. This will confirm you completed the migration successfully.

Learning Resources

Overall Migration

Database Schema Migration

Database code rewrite (T-SQL)

SSIS Job

Tips

  • Connect to SQL Server 2019 databases in the containers with Azure Data Studio
  • Determine your distribution column (HINT IDENTITY Column can not be your distribution key)
  • Run this query to identify which columns are not supported by Azure Synapse Analytics
     SELECT  t.[name], c.[name], c.[system_type_id], c.[user_type_id], y.[is_user_defined], y.[name]
     	FROM sys.tables  t
     	JOIN sys.columns c on t.[object_id]    = c.[object_id]
     	JOIN sys.types   y on c.[user_type_id] = y.[user_type_id]
     	WHERE y.[name] IN ('geography','geometry','hierarchyid','image','text','ntext','sql_variant','timestamp','xml')
     	OR  y.[is_user_defined] = 1;
    
  • Review the SSIS jobs that are at this Github repo (Daily.ETL.ispac) This job leverages stored procedures in the Source and Target databases extensively. This will require a refactoring of the Stored procedures for the OLAP database when you repoint the ETL target to Azure Synapse Analytics.
  • Current package in repo is setup for SQL Server 2019 which means you can run it thru Azure Data Factory. Here are instructions to deploy the SSIS pakcage on in the SSIS Catalog.
  • Create ADF pipeline with Execute SSIS Package activity. ADF Activity
  • Update connection settings in package.
  • For the first time setup only, you will need to execute the "Master Create.sql" script to populate all control tables before you execute the SSIS job. This is required and it is only done on the initial setup. After this is complete, you can run the SSIS job. For all subsequent runs after the initial setup, execute the Reseed ETL Stored Procedure only. This stored procedure will rollback the database to it's original state.**
  • Execute this package to load data into Azure Synapse Analytics. SSIS Execution

Advanced Challenges (Optional)

Too comfortable? Eager to do more? Try these additional challenges!