Storage

1. Overview

1.1. Types of Storage:

  • Primary Storage: Also known as main memory, typically volatile RAM; computers use it to temporarily hold data that is actively being used or processed.
  • Secondary Storage: Refers to non-volatile storage like hard drives (HDDs), solid-state drives (SSDs), and optical discs where data is stored for long-term retention.
  • Tertiary Storage: Involves storage systems used for archiving and backup such as tape drives or cloud-based cold storage solutions.
  • Quaternary Storage: A rarely used term that sometimes refers to off-site storage systems or less common media such as microforms.

1.2. Storage Technologies:

  • Magnetic Storage: Utilizes magnetic media to store data (e.g., HDDs, magnetic tapes).
  • Optical Storage: Uses lasers to read and write data (e.g., CDs, DVDs, Blu-rays).
  • Flash Storage: A non-volatile storage technology, a form of EEPROM, used in SSDs and USB flash drives.
  • Cloud Storage: Allows data to be stored and accessed over the internet, offered by providers like AWS, Google Cloud, and Azure.

1.3. Key Concepts:

  • Volatility: Determines whether storage retains data when power is lost.
  • Capacity: Amount of data a storage medium can hold.
  • Speed: Access time and data transfer rates of a storage medium.
  • Durability: Resistance to physical wear and data deterioration over time.
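
As a rough illustration of how capacity and speed interact, the time to read a contiguous object can be approximated as a fixed access latency plus the transfer time. The sketch below is a simplification; the function name and the device figures are assumptions chosen for illustration, not measurements of any real hardware.

```python
def estimated_read_seconds(size_bytes: float,
                           access_latency_s: float,
                           throughput_bytes_per_s: float) -> float:
    """Approximate read time: fixed access latency plus size / throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return access_latency_s + size_bytes / throughput_bytes_per_s


GB = 10**9  # decimal gigabyte

# Assumed, illustrative figures only (not benchmarks of specific devices).
hdd_like = estimated_read_seconds(1 * GB, access_latency_s=0.010,
                                  throughput_bytes_per_s=150 * 10**6)
ssd_like = estimated_read_seconds(1 * GB, access_latency_s=0.0001,
                                  throughput_bytes_per_s=500 * 10**6)

print(f"HDD-like medium: {hdd_like:.2f} s, SSD-like medium: {ssd_like:.2f} s")
```

For large sequential reads the throughput term dominates; for many small reads the access latency term dominates, which is why both figures matter when comparing media.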

2. Misc

2.1. Understanding Data Access Frequency

2.1.1. Temperatures

  1. Hot Data
    • accessed many times per day
    • possibly several times per second
  2. Cold Data
    • seldom queried
    • often retained for compliance purposes
    • kept as backups in case of catastrophic failure
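
The hot/cold distinction above can be sketched as a simple classification on observed access rate. This is a minimal sketch: the threshold of 10 accesses per day, the `Temperature` enum, and the `classify` function are all hypothetical names and values chosen for illustration, and a real cutoff would depend on the workload.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    COLD = "cold"


# Assumed cutoff: the notes only say "many times per day", so this number
# is an arbitrary placeholder, not a standard value.
HOT_THRESHOLD_PER_DAY = 10.0


def classify(accesses: int, window_days: float,
             hot_threshold_per_day: float = HOT_THRESHOLD_PER_DAY) -> Temperature:
    """Label a dataset hot or cold from its average accesses per day."""
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    rate_per_day = accesses / window_days
    if rate_per_day >= hot_threshold_per_day:
        return Temperature.HOT
    return Temperature.COLD


# One access per second over a day is hot; two accesses in a year is cold.
print(classify(accesses=86_400, window_days=1))
print(classify(accesses=2, window_days=365))
```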

2.2. Handy Questions to evaluate Storage systems

The following questions help evaluate storage system choices when architecting a data solution.

2.3. Questions

  • Is this storage solution compatible with the architecture's required read and write speeds?
  • Will storage create a bottleneck for downstream processes?
  • Do you understand how this storage technology works?
    • are you using the storage system optimally or committing unnatural acts?
    • for instance, are you applying a high rate of random-access updates to an object store (an antipattern)?
  • Will this storage system handle anticipated future scale?
    • you should consider all capacity limits on the storage system: total available storage, read operation rate, write volume, etc.
  • Will downstream users and processes be able to retrieve data within the required service level agreement (SLA)?
  • Are you capturing metadata about the schema evolution, data flows, data lineage and so forth?
    • Metadata has a significant impact on the utility of data
    • Metadata represents an investment in the future, dramatically enhancing discoverability and institutional knowledge to streamline future projects and architecture changes.
  • Is this a pure storage solution (object storage), or does it support complex query patterns (e.g., a cloud data warehouse)?
  • Is the storage system schema-agnostic (object storage), flexible-schema (Cassandra), or enforced-schema (a cloud data warehouse)?
  • How are you tracking master data, golden records, data quality, and data lineage for data governance?
  • How are you handling regulatory compliance and data sovereignty? For example, can you store your data in certain geographical locations but not others?
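
The future-scale question can be made concrete by comparing projected demand against each capacity limit mentioned above (total storage, read operation rate, write volume). The sketch below is one possible way to do that; `StorageProfile`, `exceeded_limits`, the 20% headroom, and all numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageProfile:
    total_bytes: float        # total available (or required) storage
    read_ops_per_s: float     # read operation rate
    write_bytes_per_s: float  # write volume per second


def exceeded_limits(limits: StorageProfile, projected: StorageProfile,
                    headroom: float = 0.2) -> list[str]:
    """Return the dimensions where projected demand exceeds the limit
    after reserving a safety margin (headroom, an assumed 20% by default)."""
    exceeded = []
    for name in ("total_bytes", "read_ops_per_s", "write_bytes_per_s"):
        usable = getattr(limits, name) * (1 - headroom)
        if getattr(projected, name) > usable:
            exceeded.append(name)
    return exceeded


# Made-up limits and projections, not taken from any vendor documentation.
limits = StorageProfile(total_bytes=100e12, read_ops_per_s=5_000,
                        write_bytes_per_s=200e6)
projected = StorageProfile(total_bytes=90e12, read_ops_per_s=1_000,
                           write_bytes_per_s=50e6)

print(exceeded_limits(limits, projected))
```

Checking every dimension separately matters because a system can have ample free space yet still bottleneck on read rate or write volume.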

3. Relevant Nodes

Tags::data:cs: