Database Management Systems (DBMS)
Database Management Systems (DBMS)
A comprehensive guide covering DBMS fundamentals, database architecture, and core concepts.
Table of Contents
- Introduction to DBMS
- Database Architecture
- Data Models
- ER Model & ER Diagrams
- Relational Model
- Normalization
- Transactions & Concurrency Control
- Indexing
- NoSQL Overview
- SQL vs NoSQL
- Popular Databases
- Best Practices
Related Guides:
- SQL Guide - Complete SQL commands and queries
- MongoDB Guide - MongoDB in-depth
- Redis Guide - Redis data structures and commands
Introduction to DBMS
What is a Database?
A database is an organized collection of structured data stored electronically in a computer system.
What is DBMS?
A Database Management System (DBMS) is software that manages databases, providing an interface for users and applications to store, retrieve, update, and manage data.
Advantages of DBMS
| Advantage | Description |
|---|---|
| Data Abstraction | Hides complexity from users |
| Data Independence | Changes in data structure don't affect applications |
| Data Integrity | Ensures accuracy and consistency |
| Data Security | Access control and authentication |
| Concurrent Access | Multiple users can access simultaneously |
| Backup & Recovery | Protection against data loss |
| Reduced Redundancy | Minimizes data duplication |
Types of DBMS
- Hierarchical DBMS: Tree-like structure (e.g., IBM IMS)
- Network DBMS: Graph structure with multiple parents (e.g., IDMS)
- Relational DBMS (RDBMS): Table-based structure (e.g., MySQL, PostgreSQL)
- Object-Oriented DBMS: Object-based storage (e.g., db4o)
- NoSQL DBMS: Non-relational (e.g., MongoDB, Redis)
Database Architecture
Three-Schema Architecture
┌─────────────────────────────────────────┐
│ External Level │ ← User Views (View 1, View 2, ...)
│ (View Schema) │
├─────────────────────────────────────────┤
│ Conceptual Level │ ← Logical Structure
│ (Logical Schema) │
├─────────────────────────────────────────┤
│ Internal Level │ ← Physical Storage
│ (Physical Schema) │
└─────────────────────────────────────────┘Data Independence
- Logical Data Independence: Ability to change conceptual schema without affecting external schema
- Physical Data Independence: Ability to change internal schema without affecting conceptual schema
Database Users
- Database Administrator (DBA): Manages entire database system
- Database Designers: Design database structure
- Application Programmers: Write applications using database
- End Users: Query and use the database
DBMS Components
┌─────────────────────────────────────────────────────────────┐
│ DBMS COMPONENTS │
├─────────────────────────────────────────────────────────────┤
│ Query Processor │ Parses and optimizes queries │
│ Storage Manager │ Manages disk storage │
│ Transaction Manager │ Handles ACID properties │
│ Buffer Manager │ Manages memory buffers │
│ Lock Manager │ Controls concurrent access │
│ Recovery Manager │ Ensures durability & recovery │
└─────────────────────────────────────────────────────────────┘Data Models
1. Hierarchical Model
- Data organized in tree-like structure
- Parent-child relationships
- One-to-many relationships only
[Company]
/ | \
[Dept1] [Dept2] [Dept3]
/ \
[Emp1] [Emp2]Pros: Simple navigation, fast access
Cons: No many-to-many, complex updates
2. Network Model
- Extension of hierarchical model
- Allows many-to-many relationships
- Uses sets to represent relationships
[Student]─────┬─────[Student]
│ │ │
▼ ▼ ▼
[Course] [Course] [Course]Pros: More flexible than hierarchical
Cons: Complex implementation, difficult maintenance
3. Relational Model
- Data stored in tables (relations)
- Rows (tuples) and columns (attributes)
- Most widely used model
| EmpID | Name | Department |
|---|---|---|
| 1 | Alice | HR |
| 2 | Bob | IT |
| 3 | Charlie | Finance |
Pros: Simple, flexible, powerful query language (SQL)
Cons: Can be slower for large-scale distributed systems
4. Object-Oriented Model
- Data stored as objects
- Supports inheritance and encapsulation
- Complex data types
Pros: Natural fit for OOP languages
Cons: Less mature, complex queries
5. Document Model
- Data stored as documents (JSON/BSON)
- Flexible schema
- Used in NoSQL databases
{
"name": "Alice",
"department": "HR",
"skills": ["communication", "management"]
}Pros: Flexible, scales horizontally
Cons: Limited joins, eventual consistency
ER Model & ER Diagrams
Entity-Relationship Model
A conceptual data model used to describe the structure of a database.
Components
1. Entity
- Real-world object with independent existence
- Strong Entity: Has its own primary key
- Weak Entity: Depends on another entity for identification
┌───────────────┐ ╔═══════════════╗
│ Student │ ║ Dependent ║
│ (Strong) │ ║ (Weak) ║
└───────────────┘ ╚═══════════════╝
Rectangle Double Rectangle2. Attributes
| Type | Description | Notation |
|---|---|---|
| Simple | Cannot be divided (e.g., Age) | ○ |
| Composite | Can be divided (e.g., Full Name → First, Last) | ○─┬─○ |
| Single-valued | One value (e.g., SSN) | ○ |
| Multi-valued | Multiple values (e.g., Phone Numbers) | ◎ (double oval) |
| Derived | Calculated from others (e.g., Age from DOB) | ○ (dashed) |
| Key Attribute | Uniquely identifies entity | ○ (underlined) |
3. Relationships
- Association between entities
- Degree of Relationship:
- Unary (1 entity) - Employee manages Employee
- Binary (2 entities
) - Student enrolls in Course - Ternary (3 entities) - Supplier supplies Part to Project
4. Cardinality
| Cardinality | Description | Example |
|---|---|---|
| 1:1 | One entity relates to one other | Person ↔ Passport |
| 1:N | One entity relates to many others | Department → Employees |
| M:N | Many entities relate to many others | Students ↔ Courses |
ER Diagram Notation
┌─────────────┐ ◇─────────◇ ┌─────────────┐
│ Entity │─────────│Relationship│───────│ Entity │
└─────────────┘ ◇─────────◇ └─────────────┘
│
│
○─○ (Attribute)Extended ER (EER) Concepts
Specialization (Top-down)
Defining subclasses from a superclass
[Person]
│
┌─────┴─────┐
▼ ▼
[Employee] [Customer]Generalization (Bottom-up)
Combining subclasses into a superclass
Aggregation
Treating a relationship as an entity
┌─────────────────────────────┐
│ [Works_On] relationship │
│ (Employee works on Project)│
└──────────────┬──────────────┘
│
▼
[Monitors] ← Manager monitors Works_OnRelational Model
Key Terminology
| Term | Description |
|---|---|
| Relation | A table |
| Tuple | A row |
| Attribute | A column |
| Domain | Set of allowed values for an attribute |
| Degree | Number of attributes |
| Cardinality | Number of tuples |
| Schema | Structure definition (R(A₁, A₂, ..., Aₙ)) |
Keys in RDBMS
| Key Type | Description |
|---|---|
| Super Key | Set of attributes that uniquely identifies a tuple |
| Candidate Key | Minimal super key (no proper subset is a super key) |
| Primary Key | Chosen candidate key to identify tuples |
| Alternate Key | Candidate keys not chosen as primary key |
| Foreign Key | Attribute referencing primary key of another table |
| Composite Key | Key consisting of multiple attributes |
| Surrogate Key | System-generated artificial key |
Relational Algebra
Basic Operations
| Operation | Symbol | Description |
|---|---|---|
| Selection | σ | Select rows based on condition |
| Projection | π | Select specific columns |
| Union | ∪ | Combine tuples from two relations |
| Intersection | ∩ | Common tuples |
| Difference | − | Tuples in R but not in S |
| Cartesian Product | × | All combinations |
| Rename | ρ | Rename relation or attributes |
Examples:
σ(age > 25)(Employee) -- Select employees over 25
π(name, salary)(Employee) -- Project name and salary
R ∪ S -- Union of R and S
R ∩ S -- Intersection
R − S -- Difference
R × S -- Cartesian product
ρ(NewName)(R) -- RenameDerived Operations
| Operation | Symbol | Description |
|---|---|---|
| Natural Join | ⋈ | Join on common attributes |
| Theta Join | ⋈θ | Join with condition |
| Equi Join | ⋈= | Join with equality condition |
| Semi Join | ⋉ | Tuples in R that have match in S |
| Anti Join | ▷ | Tuples in R with no match in S |
| Division | ÷ | Tuples in R associated with all tuples in S |
Integrity Constraints
Domain Constraint: Values must be from defined domain
age INT CHECK (age >= 0 AND age <= 150)
Entity Integrity: Primary key cannot be NULL
- Every relation must have a primary key
- Primary key values must be unique
Referential Integrity: Foreign key must reference valid primary key
- Can be NULL (optional relationship)
- Must match existing PK value
Key Constraint: Key values must be unique
Semantic Constraints: Application-specific rules
salary >= minimum_wageend_date > start_date
Normalization
Purpose
- Eliminate data redundancy
- Prevent update anomalies
- Ensure data integrity
- Organize data logically
Anomalies
| Anomaly | Problem |
|---|---|
| Insertion | Cannot insert data without other data |
| Deletion | Deleting data causes unintended loss |
| Update | Updating requires multiple row changes |
Example of Unnormalized Data:
| StudentID | StudentName | CourseID | CourseName | InstructorID | InstructorName |
|---|---|---|---|---|---|
| 1 | Alice | C101 | Math | I01 | Dr. Smith |
| 1 | Alice | C102 | Physics | I02 | Dr. Jones |
| 2 | Bob | C101 | Math | I01 | Dr. Smith |
Problems:
- Insertion: Can't add instructor without student
- Deletion: Deleting Bob loses C101 info if he's the only student
- Update: Changing "Math" requires updating multiple rows
Functional Dependency
If X → Y, then each value of X is associated with exactly one value of Y.
Types:
- Trivial: X → Y where Y ⊆ X
- Non-trivial: X → Y where Y ⊄ X
- Full: Y depends on entire X, not a subset
- Partial: Y depends on part of X
- Transitive: X → Y and Y → Z, therefore X → Z
Armstrong's Axioms
- Reflexivity: If Y ⊆ X, then X → Y
- Augmentation: If X → Y, then XZ → YZ
- Transitivity: If X → Y and Y → Z, then X → Z
Derived Rules:
- Union: If X → Y and X → Z, then X → YZ
- Decomposition: If X → YZ, then X → Y and X → Z
- Pseudotransitivity: If X → Y and WY → Z, then WX → Z
Normal Forms
1NF (First Normal Form)
- All attributes contain atomic (indivisible) values
- No repeating groups
- Each row is unique
Before 1NF:
| ID | Name | Phone Numbers |
|---|---|---|
| 1 | Alice | 123-456, 789-012 |
After 1NF:
| ID | Name | Phone Number |
|---|---|---|
| 1 | Alice | 123-456 |
| 1 | Alice | 789-012 |
2NF (Second Normal Form)
- Must be in 1NF
- No partial dependency (non-prime attribute depending on part of candidate key)
- Applies to composite keys
Before 2NF: (Composite key: StudentID, CourseID)
| StudentID | CourseID | StudentName | Grade |
|---|---|---|---|
| StudentName depends only on StudentID (partial dependency) |
After 2NF:
- Students(StudentID, StudentName)
- Enrollments(StudentID, CourseID, Grade)
3NF (Third Normal Form)
- Must be in 2NF
- No transitive dependency (non-prime attribute depending on another non-prime attribute)
Rule: For X → Y, either:
- X is a superkey, OR
- Y is a prime attribute (part of candidate key)
Before 3NF:
| EmpID | DeptID | DeptName |
|---|---|---|
| DeptName depends on DeptID, not on EmpID (transitive) |
After 3NF:
- Employees(EmpID, DeptID)
- Departments(DeptID, DeptName)
BCNF (Boyce-Codd Normal Form)
- Must be in 3NF
- For every X → Y, X must be a superkey
- Stricter than 3NF: Even prime attributes cannot have dependency on non-superkeys
Difference from 3NF:
- 3NF allows X → Y if Y is prime attribute
- BCNF requires X to always be superkey
4NF (Fourth Normal Form)
- Must be in BCNF
- No multi-valued dependencies
- A multi-valued dependency X →→ Y means for each X, there's a set of Y values independent of other attributes
Before 4NF:
| Student | Course | Hobby |
|---|---|---|
| Alice | Math | Chess |
| Alice | Math | Music |
| Alice | Physics | Chess |
| Alice | Physics | Music |
Courses and Hobbies are independent of each other.
After 4NF:
- StudentCourses(Student, Course)
- StudentHobbies(Student, Hobby)
5NF (Fifth Normal Form)
- Must be in 4NF
- No join dependencies
- Can't be decomposed further without losing data
Normalization Summary
| Normal Form | Requirement |
|---|---|
| 1NF | Atomic values, no repeating groups |
| 2NF | 1NF + No partial dependencies |
| 3NF | 2NF + No transitive dependencies |
| BCNF | 3NF + Every determinant is a superkey |
| 4NF | BCNF + No multi-valued dependencies |
| 5NF | 4NF + No join dependencies |
Denormalization
Sometimes we intentionally denormalize for:
- Performance: Reduce joins
- Reporting: Pre-computed aggregates
- Read-heavy workloads: Trade write complexity for read speed
Transactions & Concurrency Control
What is a Transaction?
A transaction is a logical unit of work that contains one or more database operations.
ACID Properties
| Property | Description | Example |
|---|---|---|
| Atomicity | All operations complete or none do | Bank transfer: both debit and credit must succeed |
| Consistency | Database remains in valid state | Constraints are maintained |
| Isolation | Concurrent transactions don't interfere | Each transaction sees consistent snapshot |
| Durability | Committed changes are permanent | Data survives system crash |
Transaction States
┌──────────┐
│ Active │ ← Transaction executing
└────┬─────┘
│
┌────────┴────────┐
▼ ▼
┌─────────┐ ┌──────────┐
│Partially│ │ Failed │ ← Error occurred
│Committed│ └────┬─────┘
└────┬────┘ │
│ ▼
▼ ┌──────────┐
┌──────────┐ │ Aborted │ ← Rolled back
│Committed │ └──────────┘
└──────────┘Concurrency Problems
| Problem | Description |
|---|---|
| Dirty Read | Reading uncommitted data from another transaction |
| Non-Repeatable Read | Same query returns different results within transaction |
| Phantom Read | New rows appear in repeated query within transaction |
| Lost Update | Two transactions overwrite each other's changes |
Isolation Levels
| Level | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|
| READ UNCOMMITTED | ✓ | ✓ | ✓ |
| READ COMMITTED | ✗ | ✓ | ✓ |
| REPEATABLE READ | ✗ | ✗ | ✓ |
| SERIALIZABLE | ✗ | ✗ | ✗ |
Locking Mechanisms
Types of Locks
- Shared Lock (S): For reading, multiple transactions can hold
- Exclusive Lock (X): For writing, only one transaction can hold
Lock Compatibility Matrix
| S | X | |
|---|---|---|
| S | ✓ | ✗ |
| X | ✗ | ✗ |
Lock Granularity
| Level | Description | Concurrency | Overhead |
|---|---|---|---|
| Database | Entire database locked | Low | Low |
| Table | Entire table locked | Low-Medium | Low |
| Page | Data page locked | Medium | Medium |
| Row | Individual row locked | High | High |
Two-Phase Locking (2PL)
- Growing Phase: Acquire locks, no releases
- Shrinking Phase: Release locks, no acquisitions
Variants:
- Strict 2PL: Hold all exclusive locks until commit
- Rigorous 2PL: Hold all locks until end of transaction
Deadlock
When two or more transactions wait for each other's locks indefinitely.
Transaction T1: Holds Lock A, Wants Lock B
Transaction T2: Holds Lock B, Wants Lock ADeadlock Handling:
| Method | Description |
|---|---|
| Prevention | Ordering of locks, timeout |
| Detection | Wait-for graph cycle detection |
| Recovery | Rollback one transaction (victim selection) |
Timestamp-Based Protocols
Each transaction gets a timestamp, operations are ordered by timestamps.
- Thomas Write Rule: Ignore outdated writes
- Optimistic Concurrency Control: Validate at commit time
MVCC (Multi-Version Concurrency Control)
- Maintains multiple versions of data
- Readers don't block writers, writers don't block readers
- Used by PostgreSQL, MySQL InnoDB, Oracle
Indexing
What is an Index?
A data structure that improves data retrieval speed at the cost of additional storage and slower writes.
Analogy: Like an index in a book that helps you find topics without reading every page.
Types of Indexes
| Index Type | Description |
|---|---|
| Primary Index | On primary key, one per table, usually clustered |
| Secondary Index | On non-primary key columns, multiple allowed |
| Clustered Index | Physical order matches index order, only one per table |
| Non-Clustered Index | Separate structure from data, contains pointers |
| Dense Index | Index entry for every record |
| Sparse Index | Index entry for some records (requires sorted data) |
| Composite Index | Index on multiple columns |
Index Data Structures
B-Tree Index
- Balanced tree structure
- Good for range queries and equality
- Most common index type
- O(log n) search complexity
[50]
/ \
[30] [70]
/ \ / \
[10,20] [40] [60] [80,90]B+ Tree Index
- All data in leaf nodes
- Leaf nodes linked for range scans
- Used by most RDBMS
- Better for range queries than B-Tree
Hash Index
- Good for equality searches only
- O(1) lookup time
- Not suitable for range queries
- Used in memory databases
Bitmap Index
- Efficient for low-cardinality columns
- Good for data warehousing
- Efficient for AND/OR operations
- Example: Gender (M/F), Status (Active/Inactive)
Index Trade-offs
| Benefit | Cost |
|---|---|
| Faster reads | Slower writes |
| Quick lookups | Additional storage |
| Efficient sorting | Index maintenance overhead |
When to Use Indexes
✅ Good candidates:
- Primary keys and foreign keys
- Columns frequently used in WHERE clauses
- Columns used in JOIN conditions
- Columns used in ORDER BY
❌ Avoid indexing:
- Small tables
- Columns with low cardinality
- Frequently updated columns
- Columns rarely used in queries
NoSQL Overview
What is NoSQL?
"Not Only SQL" - databases that provide mechanisms for storage and retrieval other than tabular relations.
Characteristics
- Schema-less: Flexible data structure
- Horizontally Scalable: Add more servers easily
- Distributed: Data spread across nodes
- High Availability: Built-in replication
- Eventual Consistency: Trade consistency for availability
CAP Theorem
A distributed system can only guarantee two of these three properties:
Consistency
/\
/ \
/ \
/ \
/________\
Availability Partition Tolerance| Property | Description |
|---|---|
| Consistency | All nodes see same data at same time |
| Availability | Every request gets a response |
| Partition Tolerance | System works despite network failures |
| Database Type | CAP Focus |
|---|---|
| MongoDB | CP (with replica sets) |
| Cassandra | AP |
| Redis | CP |
| CouchDB | AP |
| Traditional RDBMS | CA |
Types of NoSQL Databases
1. Document Stores
Store data as JSON/BSON documents with flexible schema.
Examples: MongoDB, CouchDB, Firebase
Best for: Content management, catalogs, user profiles
2. Key-Value Stores
Simple key-value pairs, extremely fast.
Examples: Redis, DynamoDB, Memcached
Best for: Caching, sessions, real-time data
3. Column-Family Stores
Data stored in columns instead of rows, good for analytics.
Examples: Cassandra, HBase, ScyllaDB
Best for: Time-series, IoT, large-scale analytics
4. Graph Databases
Store nodes and relationships, optimized for connected data.
Examples: Neo4j, Amazon Neptune, ArangoDB
Best for: Social networks, recommendations, fraud detection
For detailed NoSQL database guides, see:
SQL vs NoSQL
Comparison Table
| Aspect | SQL | NoSQL |
|---|---|---|
| Data Model | Relational (Tables) | Various (Document, Key-Value, etc.) |
| Schema | Fixed, predefined | Dynamic, flexible |
| Scalability | Vertical (scale up) | Horizontal (scale out) |
| ACID | Strong support | Varies (often eventual consistency) |
| Joins | Built-in support | Limited or no support |
| Query Language | Standardized SQL | Database-specific |
| Best For | Complex queries, transactions | Large-scale, flexible data |
| Examples | MySQL, PostgreSQL, Oracle | MongoDB, Redis, Cassandra |
When to Use SQL
✅ Use SQL when:
- Complex queries with multiple joins
- ACID compliance is critical (banking, financial)
- Well-defined, structured data
- Strong consistency requirements
- Reporting and analytics
When to Use NoSQL
✅ Use NoSQL when:
- Large volumes of un
structured data - Rapid development with changing requirements
- High scalability needs
- Real-time big data applications
- Caching and session storage
- Content management systems
Hybrid Approach
Many modern applications use both:
- SQL for transactional data (orders, payments)
- NoSQL for caching, sessions, real-time features
Popular Databases
Relational Databases (SQL)
| Database | Key Features | Use Cases |
|---|---|---|
| MySQL | Open-source, widely used, good performance | Web applications, CMS |
| PostgreSQL | Advanced features, extensible, ACID compliant | Complex applications, GIS |
| Oracle | Enterprise-grade, robust, comprehensive | Large enterprises, critical systems |
| SQL Server | Microsoft ecosystem, BI integration | Windows environments, enterprise |
| SQLite | Embedded, serverless, zero-config | Mobile apps, embedded systems |
| MariaDB | MySQL fork, community-driven | Drop-in MySQL replacement |
NoSQL Databases
| Database | Type | Key Features | Use Cases |
|---|---|---|---|
| MongoDB | Document | Flexible schema, horizontal scaling | Content management, catalogs |
| Redis | Key-Value | In-memory, fast, data structures | Caching, sessions, real-time |
| Cassandra | Column-Family | High availability, no SPOF | Time-series, IoT |
| DynamoDB | Key-Value/Document | AWS managed, serverless | Serverless applications |
| Neo4j | Graph | Cypher query language, ACID | Social networks, recommendations |
| Elasticsearch | Search Engine | Full-text search, analytics | Log analysis, search |
NewSQL Databases
Combine SQL with NoSQL scalability:
| Database | Description |
|---|---|
| CockroachDB | Distributed SQL, PostgreSQL compatible |
| TiDB | MySQL compatible, horizontal scaling |
| Google Spanner | Globally distributed, strongly consistent |
| VoltDB | In-memory, ACID compliant |
Best Practices
Database Design
- Normalize appropriately (usually to 3NF/BCNF)
- Denormalize when performance requires (read-heavy workloads)
- Use appropriate data types (smallest type that fits)
- Define proper constraints (PK, FK, NOT NULL, CHECK)
- Plan for scalability from the start
Indexing
- Index columns used in WHERE, JOIN, ORDER BY
- Avoid over-indexing (slows writes)
- Use composite indexes for multi-column queries
- Monitor and remove unused indexes
- Consider covering indexes for read-heavy queries
Query Optimization
- Use EXPLAIN to analyze query plans
- Avoid SELECT * in production
- Use appropriate joins (not always INNER)
- Limit result sets with pagination
- Batch operations when possible
Security
- Use parameterized queries (prevent SQL injection)
- Implement least privilege access
- Encrypt sensitive data (at rest and in transit)
- Regular backups and testing
- Keep database software updated
Backup & Recovery
- Implement regular backup schedules
- Test recovery procedures regularly
- Use point-in-time recovery when available
- Store backups in multiple locations
- Document recovery procedures
Quick Reference
Normalization Cheat Sheet
1NF: Atomic values, no repeating groups
2NF: 1NF + No partial dependencies
3NF: 2NF + No transitive dependencies
BCNF: 3NF + Every determinant is a superkey
4NF: BCNF + No multi-valued dependencies
5NF: 4NF + No join dependenciesACID Properties
Atomicity - All or nothing
Consistency - Valid state to valid state
Isolation - Transactions don't interfere
Durability - Committed = PermanentIsolation Levels (Increasing Strictness)
READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
(fastest) (safest)Key Types
Super Key ⊇ Candidate Key ⊇ Primary Key
⊇ Alternate Key
Foreign Key → References Primary Key
Composite Key = Multiple AttributesFurther Reading
- SQL Guide - Complete SQL reference
- MongoDB Guide - Document database in-depth
- Redis Guide - In-memory data store
- Database Design - Stanford Course
- Use The Index, Luke
- CMU Database Systems