Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
Signed-off-by: Ramin Mohammadi <26876927+raminmohammadi@users.noreply.github.com>
  • Loading branch information
raminmohammadi authored Sep 18, 2024
1 parent 71679b8 commit da14801
Showing 1 changed file with 62 additions and 0 deletions.
62 changes: 62 additions & 0 deletions Labs/Data_Storage_Warehouse_Labs/README.md
Original file line number Diff line number Diff line change
@@ -1 +1,63 @@
# Data Warehouse: Overview and Types
## What is a Data Warehouse?
A data warehouse, often referred to as an enterprise data warehouse (EDW), is a system used to analyze and report structured and semi-structured data from various sources like point-of-sale transactions, customer relationship management (CRM) systems, and marketing automation tools. Data warehouses are designed to consolidate and integrate vast amounts of both current and historical data, enabling long-term data analysis to support informed business decisions.

Data warehouses offer key analytical capabilities, including ad hoc analysis, custom reporting, and support for business applications. They form a critical component of business intelligence (BI) systems by allowing organizations to perform complex queries and generate reports that drive strategic decision-making.

### Key Features of Data Warehouses:
- Consolidates data from multiple sources
- Supports both historical and real-time data analysis
- Enables business intelligence and reporting
- Scalable for large data volumes
- Supports ad hoc queries and custom reporting
- Traditional vs. Cloud-Based Data Warehouses

## Traditional Data Warehouses
Traditional data warehouses are hosted on-premises, typically designed to handle relational data using rigid schemas and batch processing. They require significant upfront investment in hardware and software and are less flexible when it comes to scaling or handling real-time data.

### Key Characteristics:

- Hosted on-premises
- Higher costs due to hardware and maintenance
- Limited scalability
- Primarily supports batch processing and structured data


## Cloud-Based Data Warehouses
Cloud data warehouses extend the capabilities of traditional systems, offering scalability, flexibility, and cost-efficiency. They run on fully managed cloud services, eliminating the need for physical infrastructure management. Cloud data warehouses can handle diverse data types, process large data volumes, and support both real-time and batch processing.

### Key Advantages:

- Instant scalability to meet business needs
- Pay-as-you-go pricing models
- Lower upfront investment
- Supports real-time data processing and complex queries
- No need for physical infrastructure management

## How Cloud Data Warehousing Works
Cloud data warehouses follow the same principles as traditional ones but operate in a cloud environment. Data is extracted from source systems, transformed, and loaded (ETL) or loaded first and then transformed (ELT) into the warehouse. Cloud-based BI tools are then used to analyze and report on the data. These systems can also integrate with data lakes for unstructured data management.

### Key Capabilities of Cloud Data Warehouses:

- Massively parallel processing (MPP)
- Columnar data storage for efficiency
- Self-service ETL/ELT tools
- Automatic backups and disaster recovery
- Compliance and data governance features
- Seamless integration with BI, AI, and machine learning tools

### Types of Data Warehouses

**Enterprise Data Warehouse (EDW):** A centralized system that consolidates data across an entire organization. It is ideal for supporting large-scale analytics and business intelligence.

**Operational Data Store (ODS):** A system used for real-time operational reporting and quick query processing. Data in ODS is frequently updated, making it suitable for routine activities rather than complex analytics.

**Data Mart:** A subset of a data warehouse that focuses on a specific business area, such as sales or marketing. Data marts are smaller and more specialized for specific departments within an organization.

### Examples of Popular Data Warehouses
- Google BigQuery (Cloud)
- Amazon Redshift (Cloud)
- Snowflake (Cloud)
- Azure Synapse Analytics (Cloud)
- IBM Db2 Warehouse (Traditional/Cloud)
- Teradata (Traditional/Cloud)

0 comments on commit da14801

Please sign in to comment.