Data Warehouse
1. Overview
1.1. Definition:
- A data warehouse is a centralized repository that stores large volumes of structured data accumulated from various sources within an organization. It is designed to facilitate reporting and analysis.
1.2. Purpose:
- The primary purpose of a data warehouse is to offer a comprehensive view of an organization's data to support business decision-making and analytics. It does this by consolidating data from separate systems into a single, consistently managed store.
1.3. Architecture:
- A data warehouse typically follows a layered design, with components such as:
  - Data Sources: Input from operational databases, flat files, or external sources.
  - ETL (Extract, Transform, Load): Processes that extract data from sources, cleanse and transform it, and load it into the warehouse.
  - Data Storage: The central repository itself, often complemented by data marts, which are subject-specific subsets of the warehouse serving individual teams or departments.
  - Metadata: Information about data definitions, mappings, and transformations.
  - End-user access tools: Interfaces for analysis, reporting, and data mining.
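The ETL step above can be sketched in a few lines. This is a minimal illustration only, using Python's standard sqlite3 module as a stand-in for the warehouse; the source rows, the `sales` table, and the function names are all hypothetical and not taken from any particular product.

```python
import sqlite3

# Hypothetical raw rows from two source systems with inconsistent formatting.
CRM_ROWS = [
    {"customer": " Alice ", "amount": "120.50", "date": "2024-01-05"},
    {"customer": "BOB", "amount": "80", "date": "2024-01-06"},
]
SHOP_ROWS = [
    {"customer": "alice", "amount": "n/a", "date": "2024-01-07"},
    {"customer": "Carol", "amount": "42.00", "date": "2024-01-07"},
]


def extract():
    """Extract: read rows from every source, tagging each with its origin."""
    for source, rows in (("crm", CRM_ROWS), ("shop", SHOP_ROWS)):
        for row in rows:
            yield source, row


def transform(records):
    """Transform: normalize names, parse amounts, skip rows that fail to parse."""
    for source, row in records:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing rule assumed for this sketch: drop bad amounts
        yield (row["customer"].strip().title(), amount, row["date"], source)


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer TEXT, amount REAL, sale_date TEXT, source TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")  # in-memory database stands in for the warehouse
    load(conn, transform(extract()))
    return conn


conn = run_etl()
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions mirrors the layered design: each stage can be replaced (for example, a different source or a stricter cleansing rule) without touching the others.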
1.4. Characteristics:
- Subject-Oriented: Organized around key business subjects such as sales, inventory, or finance.
- Integrated: Combines data from various sources into a unified view.
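- Time-Variant: Retains historical data, typically timestamped, so that trends can be analyzed over time.
- Non-Volatile: Once loaded, data is not updated or deleted in normal operation; new data is appended and existing records are read for analysis.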
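One way to picture the time-variant and non-volatile characteristics together is an append-only history table queried "as of" a date. The sketch below assumes a hypothetical `product_price_history` table and helper names, and uses sqlite3 purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_price_history "
    "(product TEXT, price REAL, valid_from TEXT)"
)


def record_price(product, price, valid_from):
    """Non-volatile: append a new version rather than updating the old row."""
    conn.execute(
        "INSERT INTO product_price_history VALUES (?, ?, ?)",
        (product, price, valid_from),
    )


def price_as_of(product, date):
    """Time-variant: return the price that was in effect on the given date."""
    row = conn.execute(
        "SELECT price FROM product_price_history "
        "WHERE product = ? AND valid_from <= ? "
        "ORDER BY valid_from DESC LIMIT 1",
        (product, date),
    ).fetchone()
    return row[0] if row else None


record_price("widget", 9.99, "2023-01-01")
record_price("widget", 11.49, "2024-01-01")

print(price_as_of("widget", "2023-06-15"))
print(price_as_of("widget", "2024-03-01"))
```

Dates are stored as ISO 8601 strings so that plain string comparison orders them correctly; a real warehouse would more likely use a native date type.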
1.5. Connections:
- Data warehouses often work in tandem with OLAP (Online Analytical Processing) for multidimensional analysis and Business Intelligence (BI) tools that enable user-friendly reporting and dashboards.
- They are distinct from data lakes, which store raw, unprocessed data that can include structured, semi-structured, or unstructured formats.
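The multidimensional analysis that OLAP provides can be approximated with a roll-up over a small star schema: a fact table of measures joined to a dimension table of descriptive attributes. The table names (`fact_sales`, `dim_store`), the sample data, and the `rollup` helper are assumptions made for this sketch, not part of any specific OLAP engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE fact_sales (store_id INTEGER, sale_month TEXT, amount REAL);
    INSERT INTO dim_store VALUES (1, 'North'), (2, 'North'), (3, 'South');
    INSERT INTO fact_sales VALUES
        (1, '2024-01', 100.0), (2, '2024-01', 50.0),
        (3, '2024-01', 70.0),  (1, '2024-02', 30.0);
    """
)

# Whitelist of dimensions a caller may group by, mapped to qualified columns.
DIMENSIONS = {"region": "d.region", "sale_month": "f.sale_month"}


def rollup(*dimensions):
    """Aggregate total sales grouped by the requested dimensions."""
    if not dimensions:
        raise ValueError("at least one dimension is required")
    cols = ", ".join(DIMENSIONS[name] for name in dimensions)
    sql = (
        f"SELECT {cols}, SUM(f.amount) "
        "FROM fact_sales f JOIN dim_store d ON d.store_id = f.store_id "
        f"GROUP BY {cols} ORDER BY {cols}"
    )
    return conn.execute(sql).fetchall()


print(rollup("region"))                # coarse view: one total per region
print(rollup("region", "sale_month"))  # drill down: region by month
```

Grouping by fewer dimensions gives a roll-up and adding a dimension gives a drill-down; BI dashboards typically issue queries of this shape behind the scenes.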
Tags::data: