DataBase
Towards managing data stores.
Formally, a database is another layer of abstraction over a computer system's filesystem. It provides convenient endpoints for frequently performed tasks (templated insertion, deletion, searches, etc.) and facilitates structured storage, with an emphasis on reliability and efficiency.
1. Standards: ACID Compliance
- Atomicity
- Consistency
- Isolation
- Durability
check out : https://docs.digitalocean.com/glossary/acid/
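To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names and the transfer helper are made up for illustration. The transfer deliberately credits first and debits second, so that a failure on the second statement has something to undo.

```python
import sqlite3

# In-memory database; table and account names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit; the credit was rolled back too.
        return False


ok = transfer(conn, "alice", "bob", 500)  # alice only has 100
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, both balances should be unchanged: the credit to bob is not kept without the matching debit from alice.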
2. Types (Need Not be Mutually Exclusive)
2.1. Relational Databases
2.1.1. Tools
- PostgreSQL
- RDBMS
- Open Source with all the bells and whistles
- the recommended choice
- MySQL
- RDBMS
- open source without all the bells and whistles
- batteries included though
- SQLite
- RDBMS : single file
- open source : spartanish
- has its moments : https://www.sqlite.org/whentouse.html
2.1.2. Extensions
- Object Relational Mappers
- Communicates with the database without SQL queries : language specific
- read up for the case of python
- some issues do arise:
- Impedance mismatch
- performance issues
- Personally, I'd rather write raw SQL : see stored procedures
- still, check out SQLAlchemy
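A rough idea of what an ORM does under the hood, sketched with only the standard library. This is a toy: the Mapper class, its add/get methods and the Person model are hypothetical and are not SQLAlchemy's API. The point is only that objects go in and come out while the SQL is generated for you.

```python
import sqlite3
from dataclasses import astuple, dataclass, fields


@dataclass
class Person:
    id: int
    name: str


class Mapper:
    """Toy mapper: one dataclass <-> one table named after the class."""

    def __init__(self, conn, model):
        self.conn = conn
        self.model = model
        self.table = model.__name__.lower()
        self.columns = [f.name for f in fields(model)]
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {self.table} ({', '.join(self.columns)})"
        )

    def add(self, obj):
        # Values are passed as parameters; only trusted names are interpolated.
        marks = ", ".join("?" for _ in self.columns)
        self.conn.execute(
            f"INSERT INTO {self.table} VALUES ({marks})", astuple(obj)
        )

    def get(self, **where):
        # Column names in `where` are assumed to come from trusted code.
        clause = " AND ".join(f"{column} = ?" for column in where)
        row = self.conn.execute(
            f"SELECT {', '.join(self.columns)} FROM {self.table} WHERE {clause}",
            tuple(where.values()),
        ).fetchone()
        return self.model(*row) if row else None


conn = sqlite3.connect(":memory:")
people = Mapper(conn, Person)
people.add(Person(1, "Ada"))
found = people.get(id=1)
```

The impedance mismatch shows up quickly even here: the dataclass has types and the table does not, and anything beyond flat rows (nesting, inheritance, relationships) needs extra mapping logic.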
2.2. Graph Databases
- three storage aspects
- Node
- Edge
- Property
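The three aspects above can be sketched as a small in-memory property graph. The class and method names are assumptions for illustration and do not follow any particular graph database's API.

```python
from collections import defaultdict


class PropertyGraph:
    def __init__(self):
        self.nodes = {}                 # node id -> properties dict
        self.edges = defaultdict(list)  # source id -> [(label, target id, properties)]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, label, dst, **props):
        self.edges[src].append((label, dst, props))

    def neighbours(self, src, label):
        """Targets reachable from `src` over edges carrying `label`."""
        return [dst for (lbl, dst, _) in self.edges.get(src, []) if lbl == label]


g = PropertyGraph()
g.add_node("ada", kind="person")
g.add_node("grace", kind="person")
g.add_node("python", kind="language")
g.add_edge("ada", "KNOWS", "grace", since=2020)
g.add_edge("ada", "USES", "python")
```

Nodes and edges both carry properties, and traversal is a lookup on the adjacency list rather than a join.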
2.3. Document Stores
2.4. Key-Value Stores
- the underlying data structure is the hash map
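A minimal sketch of a hash-map-backed key-value store with optional expiry, the kind of behaviour that makes such stores handy for caching and session data. The KeyValueStore class and its ttl parameter are hypothetical; this is a toy, not how any real store is implemented.

```python
import time


class KeyValueStore:
    """Dict-backed store with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expiry time or None)
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def set(self, key, value, ttl=None):
        expires = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires)

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        value, expires = item
        if expires is not None and self._clock() >= expires:
            del self._data[key]  # lazily evict expired entries on read
            return default
        return value


# Fake clock so the example is deterministic.
now = [0.0]
store = KeyValueStore(clock=lambda: now[0])
store.set("session:42", "alice", ttl=30)
fresh = store.get("session:42")
now[0] = 31.0
stale = store.get("session:42")
```

Lookups and inserts are average-case constant time, which is the main appeal of the model. The price is that you can only query by key.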
2.4.1. Tools
- Redis
- in memory
- good for caching, queuing, and storing session data and requests
- check out : https://realpython.com/python-redis/
- Memcached
2.5. Columnar Databases
- builds on key-value pairs
- each pair is a row in a store, while each column family is similar to a table in the relational model
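One way to picture the column-family model is as nested key-value maps: family, then row key, then column. The sketch below assumes that layout. The ColumnFamilyStore class is hypothetical and ignores on-disk layout entirely.

```python
from collections import defaultdict


class ColumnFamilyStore:
    def __init__(self):
        # family -> row key -> {column name: value}
        self._families = defaultdict(lambda: defaultdict(dict))

    def put(self, family, row_key, column, value):
        self._families[family][row_key][column] = value

    def get_row(self, family, row_key):
        """All columns stored for one row key (rows may be sparse)."""
        return dict(self._families[family].get(row_key, {}))

    def get_column(self, family, column):
        """One column across every row that defines it."""
        return {
            row_key: columns[column]
            for row_key, columns in self._families[family].items()
            if column in columns
        }


store = ColumnFamilyStore()
store.put("users", "u1", "name", "Ada")
store.put("users", "u1", "email", "ada@example.com")
store.put("users", "u2", "name", "Grace")  # no email column for this row
```

Unlike a relational table, rows in the same family need not share the same set of columns.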
2.6. In-Memory Databases
2.7. NewSQL Databases
3. Auxiliary features
3.1. Data Replication
- one master (primary) accepting writes, multiple read-only slaves (replicas) : different from sharding, since every replica holds the full data set
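A toy sketch of the single-writer, many-readers setup. The ReplicatedStore class is hypothetical, and replication here is synchronous and in-process. Real systems typically ship a log of changes over the network, often asynchronously, which is where replication lag comes from.

```python
import itertools


class ReplicatedStore:
    """One primary takes writes; reads are spread over read-only replicas."""

    def __init__(self, num_replicas):
        if num_replicas < 1:
            raise ValueError("need at least one replica")
        self.primary = {}
        self.replicas = [{} for _ in range(num_replicas)]
        self._reader = itertools.cycle(self.replicas)  # round-robin reads

    def write(self, key, value):
        self.primary[key] = value
        # Every replica receives every write: full copies, not partitions.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        return next(self._reader).get(key)


store = ReplicatedStore(num_replicas=3)
store.write("greeting", "hello")
reads = [store.read("greeting") for _ in range(3)]
```

Read capacity grows with the number of replicas, but write capacity does not, since every write still goes through the one primary.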
3.2. Semi-structured storage opportunities
- JSON type in RDBMS for instance
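As a small illustration, SQLite can query inside JSON stored in a text column. The sketch assumes the SQLite build behind Python's sqlite3 module includes the JSON functions (most do). The events table and the payload fields are made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [
        (json.dumps({"type": "login", "user": "ada"}),),
        (json.dumps({"type": "logout", "user": "ada"}),),
        # Payloads need not share a schema: this one has an extra field.
        (json.dumps({"type": "login", "user": "grace", "device": "phone"}),),
    ],
)

# Filter and project on fields inside the JSON document.
logins = [
    row[0]
    for row in conn.execute(
        "SELECT json_extract(payload, '$.user') FROM events "
        "WHERE json_extract(payload, '$.type') = 'login' ORDER BY id"
    )
]
```

The fixed columns keep relational guarantees while the payload column absorbs fields that vary from row to row.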
3.3. Sharding
- horizontal scaling for multiple read/write instances
- introduces delays, since data consistency has to be maintained across shards
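A minimal sketch of hash-based shard routing. The ShardedStore class is hypothetical and the shards are plain dicts standing in for separate database instances. Each key lives on exactly one shard, so both reads and writes are spread out.

```python
import hashlib


class ShardedStore:
    def __init__(self, num_shards):
        self._shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # Use a stable hash: Python's built-in hash() is salted per process
        # for strings, so it would route differently after a restart.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self._shards)

    def put(self, key, value):
        self._shards[self.shard_for(key)][key] = value

    def get(self, key):
        return self._shards[self.shard_for(key)].get(key)

    def shard_sizes(self):
        return [len(shard) for shard in self._shards]


store = ShardedStore(num_shards=4)
for i in range(100):
    store.put(f"user:{i}", i)
```

Simple modulo routing like this reshuffles most keys when the number of shards changes, and any operation touching keys on several shards needs cross-shard coordination, which is where the consistency delays come from.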
3.4. Monitoring
- Profiling processes, analysing how often certain queries run, etc., helps with better structuring of the templates (indexes, schema, etc.) to improve performance
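One cheap form of this is asking the database how it plans to run a frequent query, before and after adding an index. A sketch with SQLite's EXPLAIN QUERY PLAN follows. The users table, the index name and the query_plan helper are made up, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"user{i}") for i in range(1000)],
)

QUERY = "SELECT name FROM users WHERE email = ?"


def query_plan(conn, sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


plan_before = query_plan(conn, QUERY, ("user42@example.com",))

# Suppose profiling showed this lookup is frequent: index the filtered column.
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(conn, QUERY, ("user42@example.com",))

print(plan_before)
print(plan_after)
```

The first plan should report a full table scan and the second a search using idx_users_email, which is the kind of evidence that justifies (or rules out) a schema change.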