This project is an example of how to handle JSON data, with an implementation flow that uses Databricks as the tool.
The main factors of any project are:

- CI/CD - in this case, used to update the job on Databricks.
- Testing - validating transformation or calculation functions is very important (a sketch follows this list).
- Environment - what the cloud environment needs to run the project.
- Documentation - the more comments and information, the better, right?
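To make the testing point concrete, here is a minimal pytest-style sketch. The names (`calculate_balance`, `test_calculate_balance`) are illustrative placeholders, not functions from this repository:

```python
# Hypothetical example: a pure calculation function and its tests.
# Neither name comes from this project; they only illustrate the pattern.

def calculate_balance(deposits, withdrawals):
    """Toy calculation function standing in for the project's real transformations."""
    return sum(deposits) - sum(withdrawals)


def test_calculate_balance():
    assert calculate_balance([100.0, 50.0], [30.0]) == 120.0


def test_calculate_balance_empty():
    assert calculate_balance([], []) == 0.0
```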
This section of the `main.py` file focuses on merging data into specific tables and executing an Auto Loader process using Spark Structured Streaming.
The code snippet begins with merging data into the following tables:

- account
- transaction
- account_status_change
Similarly, if `df_account_status_change` is not empty, it merges the data into the "account_status_change" table, also using "id" as the key and "date_time" as the additional column.
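As a rough sketch of what such a merge could look like with the Delta Lake Python API (not the exact code in `main.py`), the helper below upserts on `id` and, as an assumption, uses `date_time` to keep only the newer row when a key already exists:

```python
from delta.tables import DeltaTable

def merge_into_table(spark, df, table_name, key_column="id", order_column="date_time"):
    """Hypothetical helper: upsert a DataFrame into a Delta table on a key column.
    Using order_column to overwrite only with newer rows is an assumption,
    not necessarily how main.py uses date_time."""
    target = DeltaTable.forName(spark, table_name)
    (
        target.alias("t")
        .merge(df.alias("s"), f"t.{key_column} = s.{key_column}")
        .whenMatchedUpdateAll(condition=f"s.{order_column} > t.{order_column}")
        .whenNotMatchedInsertAll()
        .execute()
    )

# Usage, following the pattern described above (spark and the DataFrame
# come from the surrounding main.py context):
if df_account_status_change.count() > 0:
    merge_into_table(spark, df_account_status_change, "account_status_change")
```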
The `execute_autoloader` function is defined to execute the Auto Loader process using Spark Structured Streaming. Here's a breakdown of its parameters and functionality:
- `load_directory`: The directory containing the data files.
- `schema_path`: The path to the schema file.
- `checkpoint_location`: The location where the checkpoint is saved.
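A minimal sketch of what `execute_autoloader` might look like with Databricks Auto Loader (the `cloudFiles` source) is shown below. The JSON file format, the extra `target_table` argument, the `availableNow` trigger, and using `schema_path` as the Auto Loader schema location are all assumptions for illustration, not taken from the repository:

```python
def execute_autoloader(load_directory, schema_path, checkpoint_location, target_table):
    """Sketch: ingest new files from load_directory with Auto Loader and append
    them to target_table, tracking streaming progress in checkpoint_location."""
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")               # assumed file format
        .option("cloudFiles.schemaLocation", schema_path)  # assumed use of schema_path
        .load(load_directory)
    )
    (
        stream.writeStream
        .option("checkpointLocation", checkpoint_location)
        .trigger(availableNow=True)   # process all available files, then stop
        .toTable(target_table)
        .awaitTermination()
    )
```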