Storage
1. Overview
1.1. Types of Storage:
- Primary Storage: Also known as main memory, typically volatile RAM. Computers use it to temporarily hold data that is actively being used or processed.
- Secondary Storage: Refers to non-volatile storage like hard drives (HDDs), solid-state drives (SSDs), and optical discs where data is stored for long-term retention.
- Tertiary Storage: Involves storage systems used for archiving and backup such as tape drives or cloud-based cold storage solutions.
- Quaternary Storage: Rarely used term, sometimes refers to off-site storage systems or lesser-used forms like microforms.
1.2. Storage Technologies:
- Magnetic Storage: Utilizes magnetic media to store data (e.g., HDDs, magnetic tapes).
- Optical Storage: Uses lasers to read and write data (e.g., CDs, DVDs, Blu-rays).
- Flash Storage: A non-volatile storage technology, a form of EEPROM, used in SSDs and USB flash drives.
- Cloud Storage: Allows data to be stored and accessed over the internet, offered by providers like AWS, Google Cloud, and Azure.
1.3. Key Concepts:
- Volatility: Determines whether storage retains data when power is lost.
- Capacity: Amount of data a storage medium can hold.
- Speed: Access time and data transfer rates of a storage medium.
- Durability: Resistance to physical wear and data deterioration over time.
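The concepts above can be tied together in a small model. The following is a minimal sketch, not a benchmark: the `StorageMedium` class, the `read_time_seconds` helper, and every number in it are hypothetical placeholders chosen only to make the arithmetic easy to follow. It assumes read time is roughly per-request access latency plus transfer time at a sustained rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageMedium:
    """Hypothetical description of a storage medium using the key concepts."""
    name: str
    volatile: bool             # True if data is lost when power is removed
    capacity_bytes: int        # total data the medium can hold
    access_latency_s: float    # time to reach the first byte of a request
    throughput_bytes_s: float  # sustained transfer rate once a read starts


def read_time_seconds(medium: StorageMedium, total_bytes: int, requests: int = 1) -> float:
    """Estimate read time as per-request latency plus transfer time."""
    if total_bytes > medium.capacity_bytes:
        raise ValueError(f"{total_bytes} bytes exceeds the capacity of {medium.name}")
    return requests * medium.access_latency_s + total_bytes / medium.throughput_bytes_s


# Placeholder figures, not measurements of any real device.
example = StorageMedium(
    name="example-disk",
    volatile=False,
    capacity_bytes=1_000_000_000,
    access_latency_s=0.01,
    throughput_bytes_s=100_000_000,
)

# Same amount of data, read in one large request versus 1,000 small ones.
sequential = read_time_seconds(example, 500_000_000, requests=1)
scattered = read_time_seconds(example, 500_000_000, requests=1_000)
print(f"one request: {sequential:.2f} s, 1,000 requests: {scattered:.2f} s")
```

Under these assumptions the access pattern matters as much as the raw transfer rate: splitting the same read into many small requests adds the access latency once per request.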
2. Misc
2.1. Understanding Data Access Frequency
2.2. Handy Questions to evaluate Storage systems
The following questions help evaluate storage system choices when architecting a data solution:
2.3. Questions
- Is this storage solution compatible with the architecture's required read and write speeds?
- Will storage create a bottleneck for downstream processes?
- Do you understand how this storage technology works?
- Are you using the storage system optimally or committing unnatural acts?
- For instance, are you applying a high rate of random-access updates to an object store (an antipattern)?
- Will this storage system handle anticipated future scale?
- You should consider all capacity limits on the storage system: total available storage, read operation rate, write volume, etc.
- Will downstream users and processes be able to retrieve data within the required service-level agreement (SLA)?
- Are you capturing metadata about schema evolution, data flows, data lineage, and so forth?
- Metadata has a significant impact on the utility of data.
- Metadata represents an investment in the future, dramatically enhancing discoverability and institutional knowledge to streamline future projects and architecture changes.
- Is this a pure storage solution (object storage), or does it support complex query patterns (e.g., a cloud data warehouse)?
- Is the storage system schema-agnostic (object storage)? Flexible schema (Cassandra)? Enforced schema (a cloud data warehouse)?
- How are you tracking master data, golden-record data quality, and data lineage for data governance?
- How are you handling regulatory compliance and data sovereignty? For example, can you store your data in certain geographical locations but not others?
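The object-storage antipattern mentioned in the questions above can be illustrated with a toy model. This is a minimal sketch, assuming (as is typical of object stores) that an object can only be written whole, so changing one record means reading and rewriting the entire object. `ToyObjectStore`, `update_record`, and `RECORD_SIZE` are hypothetical names for illustration, not any provider's API.

```python
class ToyObjectStore:
    """In-memory stand-in for an object store: whole-object put and get only."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self.bytes_written = 0  # total bytes sent to the store so far

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
        self.bytes_written += len(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


RECORD_SIZE = 8  # fixed-width records keep the example simple


def update_record(store: ToyObjectStore, key: str, index: int, new_value: bytes) -> None:
    """Change one record by rewriting the entire object (read-modify-write)."""
    if len(new_value) != RECORD_SIZE:
        raise ValueError("new_value must be exactly RECORD_SIZE bytes")
    blob = bytearray(store.get(key))
    start = index * RECORD_SIZE
    if index < 0 or start + RECORD_SIZE > len(blob):
        raise IndexError("record index out of range")
    blob[start:start + RECORD_SIZE] = new_value
    store.put(key, bytes(blob))  # the whole object is uploaded again


store = ToyObjectStore()
store.put("events", bytes(RECORD_SIZE * 1000))  # 1,000 zeroed records
baseline = store.bytes_written

# Ten small random-access updates against one large object.
for i in range(10):
    update_record(store, "events", i, b"\x01" * RECORD_SIZE)

logical_bytes_changed = 10 * RECORD_SIZE
physical_bytes_written = store.bytes_written - baseline
write_amplification = physical_bytes_written / logical_bytes_changed
print(f"changed {logical_bytes_changed} bytes, wrote {physical_bytes_written} bytes")
```

In this model every small update costs a full-object write, which is why frequent random-access updates fit poorly with object storage and are better served by a system designed for in-place updates.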