Querying Databases
Table of Contents
1. Overview
Queries abstract the underlying data structures in a database by providing a simplified, high-level interface to manipulate and retrieve data, masking the complexity of the physical data storage and retrieval processes.
2. Execution of a Query
The execution of a query involves multiple steps:
- Parsing: The query is checked for syntax errors. For example, SQL queries are parsed to ensure they follow proper syntax rules.
- Optimization: The query is optimized to determine the most efficient way to access the data. Tools like query planners and optimizers in databases like PostgreSQL play a crucial role here.
- Execution: The optimized query is executed using the chosen execution plan. The database engine performs operations such as searching, filtering, and joining tables.
- Result Retrieval: The data is fetched from the database and returned to the user or application that made the query.
3. Elaborated Types
3.0.1. 1. Relational Databases
- SELECT: Used to retrieve data from the tables (e.g.,
SELECT * FROM users WHERE age > 25;
). - INSERT: Used to add new rows to a table (e.g.,
INSERT INTO users (name, age) VALUES ('John', 30);
). - UPDATE: Used to modify existing data within a table (e.g.,
UPDATE users SET age = 31 WHERE name = 'John';
). - DELETE: Removes existing rows from a table (e.g.,
DELETE FROM users WHERE age < 18;
). - DDL (Data Definition Language): Includes queries to define or alter schema objects like tables and indexes (e.g.,
CREATE TABLE users (id INT, name VARCHAR(100));
). - DCL (Data Control Language): Manages access to the database, such as granting or revoking permissions (e.g.,
GRANT SELECT ON users TO read_only_user;
).
3.0.2. 2. Graph Databases
- Pattern Matching: Used to find specific patterns within the graph (e.g., in Cypher for Neo4j:
MATCH (p:Person)-[:KNOWS]->(f:Friend) RETURN p, f;
). - Path Queries: Queries that find paths between nodes (e.g.,
MATCH path = (start)-[*]->(end) RETURN path;
). - Graph Traversal: Navigates through the graph based on node relationships (e.g., in Gremlin for Apache TinkerPop:
g.V().has('name', 'John').out('KNOWS').values('name');
).
3.0.3. 3. Document Stores
- Find: Retrieves documents from a collection (e.g., in MongoDB:
db.users.find({ age: { $gt: 25 } });
). - Insert: Adds new documents to a collection (e.g.,
db.users.insert({ name: 'John', age: 30 });
). - Update: Modifies existing documents (e.g.,
db.users.update({ name: 'John' }, { $set: { age: 31 } });
). - Delete: Removes documents from a collection (e.g.,
db.users.remove({ age: { $lt: 18 } });
).
3.0.4. 4. Key-Value Stores
- Get: Retrieves the value for a given key (e.g., in Redis:
GET user:1000:name
). - Set: Adds or updates a value for a given key (e.g.,
SET user:1000:name "John"
). - Delete: Removes the value associated with a key (e.g.,
DEL user:1000:name
).
3.0.5. 5. Columnar Databases
- SELECT: Efficient retrieval of specific columns (e.g., in Cassandra:
SELECT name, age FROM users WHERE age > 25;
). - INSERT: Adds rows to a column family (e.g.,
INSERT INTO users (id, name, age) VALUES (1001, 'John', 30);
). - UPDATE: Modifies existing columns within a row (e.g.,
UPDATE users SET age = 31 WHERE id = 1001;
). - DELETE: Removes rows from a column family (e.g.,
DELETE FROM users WHERE age < 18;
).
3.0.6. 6. In-Memory Databases
- Get/Set: Similar to key-value stores, these rapidly access data stored in RAM (e.g., in Redis:
GET user:1000:name
andSET user:1000:name "John"
). - Atomic Operations: Execute operations like incrementing a counter (e.g.,
INCR page_view_count
). - Pub/Sub: Implement publish/subscribe messaging patterns (e.g.,
SUBSCRIBE channel1
).
3.0.7. 7. Time-Series Databases
- Select: Retrieves time-series data points (e.g., in InfluxDB:
SELECT mean(temperature) FROM weather WHERE time > now() - 1h GROUP BY time(10m);
). - Insert: Adds new data points (e.g.,
INSERT INTO weather (time, temperature) VALUES (now(), 22.5);
). - Aggregation: Perform time-based aggregations like averages, sums (e.g.,
SELECT SUM(usage) FROM electricity WHERE time > now() - 1d GROUP BY time(1h);
).
3.0.8. 8. NewSQL Databases
- SELECT: High-performance SQL queries (e.g., in CockroachDB:
SELECT * FROM users WHERE age > 25;
). - INSERT: Adding new rows with ACID compliance (e.g.,
INSERT INTO users (name, age) VALUES ('John', 30);
). - UPDATE: Modifying data with distributed transactions (e.g.,
UPDATE users SET age = 31 WHERE name = 'John';
). - DELETE: Removing rows consistently across nodes (e.g.,
DELETE FROM users WHERE age < 18;
).
4. Building a Query Language's Inference Engine
Building a query language's inference engine is a complex task involving several key components:
- Lexical Analysis (Tokenization):
- Converts query strings into tokens. Tools like ANTLR can assist with this process.
- Example: Breaking down the SQL statement
SELECT * FROM users;
into tokens likeSELECT
,*
,FROM
,users
,;
.
- Parsing:
- Parses tokens into a syntax tree using a grammar that defines valid query structures.
- Example: Using parser generators like Bison to create a parse tree from SQL queries.
- Semantic Analysis:
- Checks for semantic errors like type mismatches and schema validations.
- Example: Ensuring that the
users
table exists and that*
is a valid column selection.
- Optimization:
- Optimizes the query plan for efficient execution.
- Techniques include query re-writing, join reordering, and indexing strategies.
- Example: PostgreSQL’s query planner optimizes
SELECT * FROM users WHERE age > 25;
by using an index on theage
column.
- Execution Plan Generation:
- Translates the optimized query into an execution plan, detailing how to retrieve data.
- Example: Deciding whether to perform a full table scan or utilize an index.
- Query Execution:
- Executes the plan using the database engine, managing I/O and computation.
- Example: Reading data blocks from disk into memory, applying filters, and returning results.
- Result Formatting:
- Converts raw execution results into the desired output format.
- Example: Formatting query results as JSON or CSV.