Data Warehouse

1. Overview

1.1. Definition:

  • A data warehouse is a centralized repository that stores large volumes of structured data accumulated from various sources within an organization. It is designed to facilitate reporting and analysis.

1.2. Purpose:

  • The primary purpose of a data warehouse is to give a consolidated view of an organization's data in support of business decision-making and analytics. Keeping that data in one place also makes it easier to manage consistently.

1.3. Architecture:

  • Typically, it follows a layered design, with components such as:
    • Data Sources: Input from various databases, flat files, or external sources.
    • ETL (Extract, Transform, Load): Processes that extract data from sources, cleanse and transform it, and load it into the warehouse.
    • Data Storage: A central repository, often complemented by data marts (smaller, subject-specific subsets of the warehouse).
    • Metadata: Information about data definitions, mappings, and transformations.
    • End-user access tools: Interfaces for analysis, reporting, and data mining.
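
The ETL layer described above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the source rows, the `fact_sales` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source records; in practice these would come from
# operational databases, flat files, or external feeds.
SOURCE_ROWS = [
    {"order_id": "1", "region": " north ", "amount": "120.50"},
    {"order_id": "2", "region": "SOUTH", "amount": "80.00"},
    {"order_id": "3", "region": "north", "amount": "not-a-number"},
]


def extract():
    """Extract: return raw rows from the (hypothetical) source."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: normalize region names, cast types, drop invalid rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        region = row["region"].strip().lower()
        clean.append((int(row["order_id"]), region, amount))
    return clean


def load(conn, rows):
    """Load: write cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite in memory stands in for the warehouse storage layer.
conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))

loaded = conn.execute(
    "SELECT order_id, region, amount FROM fact_sales ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping the three stages as separate functions mirrors the layered design: each stage can be tested or replaced without touching the others.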

1.4. Characteristics:

  • Subject-Oriented: Organized around key business subjects such as sales, inventory, or finance.
  • Integrated: Combines data from various sources into a unified view.
  • Time-Variant: Retains historical data, so values can be compared across time periods for trend analysis.
  • Non-Volatile: Once loaded, data is not updated or deleted in routine operation; new data is added through periodic loads.
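
One way to picture the time-variant and non-volatile properties together is an append-only snapshot table. The sketch below assumes a hypothetical `inventory_snapshot` table and made-up quantities. Each load adds a dated snapshot and never updates earlier rows, which is what makes trend queries possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory_snapshot ("
    " snapshot_date TEXT, product TEXT, quantity INTEGER,"
    " PRIMARY KEY (snapshot_date, product))"
)


def load_snapshot(conn, snapshot_date, quantities):
    """Append one dated snapshot; existing rows are never updated."""
    conn.executemany(
        "INSERT INTO inventory_snapshot VALUES (?, ?, ?)",
        [(snapshot_date, product, qty) for product, qty in quantities.items()],
    )
    conn.commit()


# Hypothetical month-end loads (illustrative values only).
load_snapshot(conn, "2024-01-31", {"widget": 100})
load_snapshot(conn, "2024-02-29", {"widget": 70})
load_snapshot(conn, "2024-03-31", {"widget": 90})

# Time-variant: the full history stays available for trend analysis.
history = conn.execute(
    "SELECT snapshot_date, quantity FROM inventory_snapshot "
    "WHERE product = ? ORDER BY snapshot_date",
    ("widget",),
).fetchall()

# Change in quantity between consecutive snapshots.
changes = [later[1] - earlier[1] for earlier, later in zip(history, history[1:])]
print(history, changes)
```

The composite primary key on date and product means a repeated load for the same date fails rather than silently overwriting history.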

1.5. Connections:

  • Data warehouses often work in tandem with OLAP (Online Analytical Processing) for multidimensional analysis and Business Intelligence (BI) tools that enable user-friendly reporting and dashboards.
  • They are distinct from data lakes, which store raw, unprocessed data that can include structured, semi-structured, or unstructured formats.
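
The multidimensional analysis that OLAP tools provide can be approximated, in very reduced form, by aggregating one fact table along different dimensions. The following sketch assumes a hypothetical `fact_sales` table with `region` and `quarter` dimensions; real OLAP engines add precomputed cubes, hierarchies, and drill-down on top of this idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_sales (region TEXT, quarter TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [
        ("north", "Q1", 100.0),
        ("north", "Q2", 150.0),
        ("south", "Q1", 80.0),
        ("south", "Q2", 120.0),
    ],
)

ALLOWED_DIMENSIONS = {"region", "quarter"}


def rollup(conn, dimension):
    """Sum sales along one dimension (a simple roll-up)."""
    # Column names cannot be bound as SQL parameters, so validate
    # against a whitelist before interpolating into the query.
    if dimension not in ALLOWED_DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension}")
    rows = conn.execute(
        f"SELECT {dimension}, SUM(amount) FROM fact_sales "
        f"GROUP BY {dimension} ORDER BY {dimension}"
    ).fetchall()
    return dict(rows)


by_region = rollup(conn, "region")
by_quarter = rollup(conn, "quarter")
print(by_region, by_quarter)
```

The same facts answer two different questions depending on the dimension chosen, which is the core of what BI dashboards expose to end users.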
Tags::data: