Database Management Systems (DBMS)

A comprehensive guide covering DBMS fundamentals, database architecture, and core concepts.

Introduction to DBMS
Database Architecture
Data Models
ER Model & ER Diagrams
Relational Model
Normalization
Transactions & Concurrency Control
Indexing
NoSQL Overview
SQL vs NoSQL
Popular Databases
Best Practices

Related Guides:

SQL Guide - Complete SQL commands and queries

MongoDB Guide - MongoDB in-depth

Redis Guide - Redis data structures and commands

Introduction to DBMS

What is a Database?

A database is an organized collection of structured data stored electronically in a computer system.

What is DBMS?

A Database Management System (DBMS) is software that manages databases, providing an interface for users and applications to store, retrieve, update, and manage data.

Advantages of DBMS

Advantage	Description
Data Abstraction	Hides complexity from users
Data Independence	Changes in data structure don't affect applications
Data Integrity	Ensures accuracy and consistency
Data Security	Access control and authentication
Concurrent Access	Multiple users can access simultaneously
Backup & Recovery	Protection against data loss
Reduced Redundancy	Minimizes data duplication

Types of DBMS

Hierarchical DBMS: Tree-like structure (e.g., IBM IMS)
Network DBMS: Graph structure with multiple parents (e.g., IDMS)
Relational DBMS (RDBMS): Table-based structure (e.g., MySQL, PostgreSQL)
Object-Oriented DBMS: Object-based storage (e.g., db4o)
NoSQL DBMS: Non-relational (e.g., MongoDB, Redis)

Database Architecture

Three-Schema Architecture

┌─────────────────────────────────────────┐
│           External Level                │  ← User Views (View 1, View 2, ...)
│         (View Schema)                   │
├─────────────────────────────────────────┤
│          Conceptual Level               │  ← Logical Structure
│       (Logical Schema)                  │
├─────────────────────────────────────────┤
│           Internal Level                │  ← Physical Storage
│        (Physical Schema)                │
└─────────────────────────────────────────┘

Data Independence

Logical Data Independence: Ability to change conceptual schema without affecting external schema
Physical Data Independence: Ability to change internal schema without affecting conceptual schema

Database Users

Database Administrator (DBA): Manages entire database system
Database Designers: Design database structure
Application Programmers: Write applications using database
End Users: Query and use the database

DBMS Components

┌─────────────────────────────────────────────────────────────┐
│                        DBMS COMPONENTS                       │
├─────────────────────────────────────────────────────────────┤
│  Query Processor         │  Parses and optimizes queries    │
│  Storage Manager         │  Manages disk storage            │
│  Transaction Manager     │  Handles ACID properties         │
│  Buffer Manager          │  Manages memory buffers          │
│  Lock Manager            │  Controls concurrent access      │
│  Recovery Manager        │  Ensures durability & recovery   │
└─────────────────────────────────────────────────────────────┘

Data Models

1. Hierarchical Model

Data organized in tree-like structure
Parent-child relationships
One-to-many relationships only

         [Company]
        /    |    \
   [Dept1] [Dept2] [Dept3]
    /   \
[Emp1] [Emp2]

Pros: Simple navigation, fast access
Cons: No many-to-many, complex updates

2. Network Model

Extension of hierarchical model
Allows many-to-many relationships
Uses sets to represent relationships

    [Student]─────┬─────[Student]
        │         │         │
        ▼         ▼         ▼
    [Course]   [Course]  [Course]

Pros: More flexible than hierarchical
Cons: Complex implementation, difficult maintenance

3. Relational Model

Data stored in tables (relations)
Rows (tuples) and columns (attributes)
Most widely used model

EmpID	Name	Department
1	Alice	HR
2	Bob	IT
3	Charlie	Finance

Pros: Simple, flexible, powerful query language (SQL)
Cons: Can be slower for large-scale distributed systems

4. Object-Oriented Model

Data stored as objects
Supports inheritance and encapsulation
Complex data types

Pros: Natural fit for OOP languages
Cons: Less mature, complex queries

5. Document Model

Data stored as documents (JSON/BSON)
Flexible schema
Used in NoSQL databases

{
  "name": "Alice",
  "department": "HR",
  "skills": ["communication", "management"]
}

Pros: Flexible, scales horizontally
Cons: Limited joins, eventual consistency

ER Model & ER Diagrams

Entity-Relationship Model

A conceptual data model used to describe the structure of a database.

Components

1. Entity

Real-world object with independent existence
Strong Entity: Has its own primary key
Weak Entity: Depends on another entity for identification

┌───────────────┐        ╔═══════════════╗
│   Student     │        ║   Dependent   ║
│  (Strong)     │        ║    (Weak)     ║
└───────────────┘        ╚═══════════════╝
     Rectangle          Double Rectangle

2. Attributes

Type	Description	Notation
Simple	Cannot be divided (e.g., Age)	○
Composite	Can be divided (e.g., Full Name → First, Last)	○─┬─○
Single-valued	One value (e.g., SSN)	○
Multi-valued	Multiple values (e.g., Phone Numbers)	◎ (double oval)
Derived	Calculated from others (e.g., Age from DOB)	○ (dashed)
Key Attribute	Uniquely identifies entity	○ (underlined)

3. Relationships

Association between entities
Degree of Relationship:
- Unary (1 entity) - Employee manages Employee
- Binary (2 entities
  ) - Student enrolls in Course
- Ternary (3 entities) - Supplier supplies Part to Project

4. Cardinality

Cardinality	Description	Example
1:1	One entity relates to one other	Person ↔ Passport
1:N	One entity relates to many others	Department → Employees
M:N	Many entities relate to many others	Students ↔ Courses

ER Diagram Notation

┌─────────────┐         ◇─────────◇         ┌─────────────┐
│   Entity    │─────────│Relationship│───────│   Entity    │
└─────────────┘         ◇─────────◇         └─────────────┘
      │
      │
    ○─○ (Attribute)

Extended ER (EER) Concepts

Specialization (Top-down)

Defining subclasses from a superclass

        [Person]
           │
     ┌─────┴─────┐
     ▼           ▼
[Employee]   [Customer]

Generalization (Bottom-up)

Combining subclasses into a superclass

Aggregation

Treating a relationship as an entity

┌─────────────────────────────┐
│    [Works_On] relationship  │
│   (Employee works on Project)│
└──────────────┬──────────────┘
               │
               ▼
        [Monitors] ← Manager monitors Works_On

Relational Model

Key Terminology

Term	Description
Relation	A table
Tuple	A row
Attribute	A column
Domain	Set of allowed values for an attribute
Degree	Number of attributes
Cardinality	Number of tuples
Schema	Structure definition (R(A₁, A₂, ..., Aₙ))

Keys in RDBMS

Key Type	Description
Super Key	Set of attributes that uniquely identifies a tuple
Candidate Key	Minimal super key (no proper subset is a super key)
Primary Key	Chosen candidate key to identify tuples
Alternate Key	Candidate keys not chosen as primary key
Foreign Key	Attribute referencing primary key of another table
Composite Key	Key consisting of multiple attributes
Surrogate Key	System-generated artificial key

Relational Algebra

Basic Operations

Operation	Symbol	Description
Selection	σ	Select rows based on condition
Projection	π	Select specific columns
Union	∪	Combine tuples from two relations
Intersection	∩	Common tuples
Difference	−	Tuples in R but not in S
Cartesian Product	×	All combinations
Rename	ρ	Rename relation or attributes

Examples:

σ(age > 25)(Employee)        -- Select employees over 25
π(name, salary)(Employee)    -- Project name and salary
R ∪ S                        -- Union of R and S
R ∩ S                        -- Intersection
R − S                        -- Difference
R × S                        -- Cartesian product
ρ(NewName)(R)                -- Rename

Derived Operations

Operation	Symbol	Description
Natural Join	⋈	Join on common attributes
Theta Join	⋈θ	Join with condition
Equi Join	⋈=	Join with equality condition
Semi Join	⋉	Tuples in R that have match in S
Anti Join	▷	Tuples in R with no match in S
Division	÷	Tuples in R associated with all tuples in S

Integrity Constraints

Domain Constraint: Values must be from defined domain
- age INT CHECK (age >= 0 AND age <= 150)
Entity Integrity: Primary key cannot be NULL
- Every relation must have a primary key
- Primary key values must be unique
Referential Integrity: Foreign key must reference valid primary key
- Can be NULL (optional relationship)
- Must match existing PK value
Key Constraint: Key values must be unique
Semantic Constraints: Application-specific rules
- salary >= minimum_wage
- end_date > start_date

Normalization

Purpose

Eliminate data redundancy
Prevent update anomalies
Ensure data integrity
Organize data logically

Anomalies

Anomaly	Problem
Insertion	Cannot insert data without other data
Deletion	Deleting data causes unintended loss
Update	Updating requires multiple row changes

Example of Unnormalized Data:

StudentID	StudentName	CourseID	CourseName	InstructorID	InstructorName
1	Alice	C101	Math	I01	Dr. Smith
1	Alice	C102	Physics	I02	Dr. Jones
2	Bob	C101	Math	I01	Dr. Smith

Problems:

Insertion: Can't add instructor without student
Deletion: Deleting Bob loses C101 info if he's the only student
Update: Changing "Math" requires updating multiple rows

Functional Dependency

If X → Y, then each value of X is associated with exactly one value of Y.

Types:

Trivial: X → Y where Y ⊆ X
Non-trivial: X → Y where Y ⊄ X
Full: Y depends on entire X, not a subset
Partial: Y depends on part of X
Transitive: X → Y and Y → Z, therefore X → Z

Armstrong's Axioms

Reflexivity: If Y ⊆ X, then X → Y
Augmentation: If X → Y, then XZ → YZ
Transitivity: If X → Y and Y → Z, then X → Z

Derived Rules:

Union: If X → Y and X → Z, then X → YZ
Decomposition: If X → YZ, then X → Y and X → Z
Pseudotransitivity: If X → Y and WY → Z, then WX → Z

Normal Forms

1NF (First Normal Form)

All attributes contain atomic (indivisible) values
No repeating groups
Each row is unique

Before 1NF:

ID	Name	Phone Numbers
1	Alice	123-456, 789-012

After 1NF:

ID	Name	Phone Number
1	Alice	123-456
1	Alice	789-012

2NF (Second Normal Form)

Must be in 1NF
No partial dependency (non-prime attribute depending on part of candidate key)
Applies to composite keys

Before 2NF: (Composite key: StudentID, CourseID)

StudentID	CourseID	StudentName	Grade
StudentName depends only on StudentID (partial dependency)

After 2NF:

Students(StudentID, StudentName)
Enrollments(StudentID, CourseID, Grade)

3NF (Third Normal Form)

Must be in 2NF
No transitive dependency (non-prime attribute depending on another non-prime attribute)

Rule: For X → Y, either:

X is a superkey, OR
Y is a prime attribute (part of candidate key)

Before 3NF:

EmpID	DeptID	DeptName
DeptName depends on DeptID, not on EmpID (transitive)

After 3NF:

Employees(EmpID, DeptID)
Departments(DeptID, DeptName)

BCNF (Boyce-Codd Normal Form)

Must be in 3NF
For every X → Y, X must be a superkey
Stricter than 3NF: Even prime attributes cannot have dependency on non-superkeys

Difference from 3NF:

3NF allows X → Y if Y is prime attribute
BCNF requires X to always be superkey

4NF (Fourth Normal Form)

Must be in BCNF
No multi-valued dependencies
A multi-valued dependency X →→ Y means for each X, there's a set of Y values independent of other attributes

Before 4NF:

Student	Course	Hobby
Alice	Math	Chess
Alice	Math	Music
Alice	Physics	Chess
Alice	Physics	Music

Courses and Hobbies are independent of each other.

After 4NF:

StudentCourses(Student, Course)
StudentHobbies(Student, Hobby)

5NF (Fifth Normal Form)

Must be in 4NF
No join dependencies
Can't be decomposed further without losing data

Normalization Summary

Normal Form	Requirement
1NF	Atomic values, no repeating groups
2NF	1NF + No partial dependencies
3NF	2NF + No transitive dependencies
BCNF	3NF + Every determinant is a superkey
4NF	BCNF + No multi-valued dependencies
5NF	4NF + No join dependencies

Denormalization

Sometimes we intentionally denormalize for:

Performance: Reduce joins
Reporting: Pre-computed aggregates
Read-heavy workloads: Trade write complexity for read speed

Transactions & Concurrency Control

What is a Transaction?

A transaction is a logical unit of work that contains one or more database operations.

ACID Properties

Property	Description	Example
Atomicity	All operations complete or none do	Bank transfer: both debit and credit must succeed
Consistency	Database remains in valid state	Constraints are maintained
Isolation	Concurrent transactions don't interfere	Each transaction sees consistent snapshot
Durability	Committed changes are permanent	Data survives system crash

Transaction States

        ┌──────────┐
        │  Active  │ ← Transaction executing
        └────┬─────┘
             │
    ┌────────┴────────┐
    ▼                 ▼
┌─────────┐     ┌──────────┐
│Partially│     │  Failed  │ ← Error occurred
│Committed│     └────┬─────┘
└────┬────┘          │
     │               ▼
     ▼          ┌──────────┐
┌──────────┐    │ Aborted  │ ← Rolled back
│Committed │    └──────────┘
└──────────┘

Concurrency Problems

Problem	Description
Dirty Read	Reading uncommitted data from another transaction
Non-Repeatable Read	Same query returns different results within transaction
Phantom Read	New rows appear in repeated query within transaction
Lost Update	Two transactions overwrite each other's changes

Isolation Levels

Level	Dirty Read	Non-Repeatable Read	Phantom Read
READ UNCOMMITTED	✓	✓	✓
READ COMMITTED	✗	✓	✓
REPEATABLE READ	✗	✗	✓
SERIALIZABLE	✗	✗	✗

Locking Mechanisms

Types of Locks

Shared Lock (S): For reading, multiple transactions can hold
Exclusive Lock (X): For writing, only one transaction can hold

Lock Compatibility Matrix

	S	X
S	✓	✗
X	✗	✗

Lock Granularity

Level	Description	Concurrency	Overhead
Database	Entire database locked	Low	Low
Table	Entire table locked	Low-Medium	Low
Page	Data page locked	Medium	Medium
Row	Individual row locked	High	High

Two-Phase Locking (2PL)

Growing Phase: Acquire locks, no releases
Shrinking Phase: Release locks, no acquisitions

Variants:

Strict 2PL: Hold all exclusive locks until commit
Rigorous 2PL: Hold all locks until end of transaction

Deadlock

When two or more transactions wait for each other's locks indefinitely.

Transaction T1: Holds Lock A, Wants Lock B
Transaction T2: Holds Lock B, Wants Lock A

Deadlock Handling:

Method	Description
Prevention	Ordering of locks, timeout
Detection	Wait-for graph cycle detection
Recovery	Rollback one transaction (victim selection)

Timestamp-Based Protocols

Each transaction gets a timestamp, operations are ordered by timestamps.

Thomas Write Rule: Ignore outdated writes
Optimistic Concurrency Control: Validate at commit time

MVCC (Multi-Version Concurrency Control)

Maintains multiple versions of data
Readers don't block writers, writers don't block readers
Used by PostgreSQL, MySQL InnoDB, Oracle

Indexing

What is an Index?

A data structure that improves data retrieval speed at the cost of additional storage and slower writes.

Analogy: Like an index in a book that helps you find topics without reading every page.

Types of Indexes

Index Type	Description
Primary Index	On primary key, one per table, usually clustered
Secondary Index	On non-primary key columns, multiple allowed
Clustered Index	Physical order matches index order, only one per table
Non-Clustered Index	Separate structure from data, contains pointers
Dense Index	Index entry for every record
Sparse Index	Index entry for some records (requires sorted data)
Composite Index	Index on multiple columns

Index Data Structures

B-Tree Index

Balanced tree structure
Good for range queries and equality
Most common index type
O(log n) search complexity

            [50]
           /    \
      [30]        [70]
     /    \      /    \
  [10,20] [40] [60] [80,90]

B+ Tree Index

All data in leaf nodes
Leaf nodes linked for range scans
Used by most RDBMS
Better for range queries than B-Tree

Hash Index

Good for equality searches only
O(1) lookup time
Not suitable for range queries
Used in memory databases

Bitmap Index

Efficient for low-cardinality columns
Good for data warehousing
Efficient for AND/OR operations
Example: Gender (M/F), Status (Active/Inactive)

Index Trade-offs

Benefit	Cost
Faster reads	Slower writes
Quick lookups	Additional storage
Efficient sorting	Index maintenance overhead

When to Use Indexes

✅ Good candidates:

Primary keys and foreign keys
Columns frequently used in WHERE clauses
Columns used in JOIN conditions
Columns used in ORDER BY

❌ Avoid indexing:

Small tables
Columns with low cardinality
Frequently updated columns
Columns rarely used in queries

NoSQL Overview

What is NoSQL?

"Not Only SQL" - databases that provide mechanisms for storage and retrieval other than tabular relations.

Characteristics

Schema-less: Flexible data structure
Horizontally Scalable: Add more servers easily
Distributed: Data spread across nodes
High Availability: Built-in replication
Eventual Consistency: Trade consistency for availability

CAP Theorem

A distributed system can only guarantee two of these three properties:

        Consistency
           /\
          /  \
         /    \
        /      \
       /________\
Availability  Partition Tolerance

Property	Description
Consistency	All nodes see same data at same time
Availability	Every request gets a response
Partition Tolerance	System works despite network failures

Database Type	CAP Focus
MongoDB	CP (with replica sets)
Cassandra	AP
Redis	CP
CouchDB	AP
Traditional RDBMS	CA

Types of NoSQL Databases

1. Document Stores

Store data as JSON/BSON documents with flexible schema.

Examples: MongoDB, CouchDB, Firebase

Best for: Content management, catalogs, user profiles

2. Key-Value Stores

Simple key-value pairs, extremely fast.

Examples: Redis, DynamoDB, Memcached

Best for: Caching, sessions, real-time data

3. Column-Family Stores

Data stored in columns instead of rows, good for analytics.

Examples: Cassandra, HBase, ScyllaDB

Best for: Time-series, IoT, large-scale analytics

4. Graph Databases

Store nodes and relationships, optimized for connected data.

Examples: Neo4j, Amazon Neptune, ArangoDB

Best for: Social networks, recommendations, fraud detection

For detailed NoSQL database guides, see:

MongoDB Guide

Redis Guide

SQL vs NoSQL

Comparison Table

Aspect	SQL	NoSQL
Data Model	Relational (Tables)	Various (Document, Key-Value, etc.)
Schema	Fixed, predefined	Dynamic, flexible
Scalability	Vertical (scale up)	Horizontal (scale out)
ACID	Strong support	Varies (often eventual consistency)
Joins	Built-in support	Limited or no support
Query Language	Standardized SQL	Database-specific
Best For	Complex queries, transactions	Large-scale, flexible data
Examples	MySQL, PostgreSQL, Oracle	MongoDB, Redis, Cassandra

When to Use SQL

✅ Use SQL when:

Complex queries with multiple joins
ACID compliance is critical (banking, financial)
Well-defined, structured data
Strong consistency requirements
Reporting and analytics

When to Use NoSQL

✅ Use NoSQL when:

Large volumes of un
structured data
Rapid development with changing requirements
High scalability needs
Real-time big data applications
Caching and session storage
Content management systems

Hybrid Approach

Many modern applications use both:

SQL for transactional data (orders, payments)
NoSQL for caching, sessions, real-time features

Popular Databases

Relational Databases (SQL)

Database	Key Features	Use Cases
MySQL	Open-source, widely used, good performance	Web applications, CMS
PostgreSQL	Advanced features, extensible, ACID compliant	Complex applications, GIS
Oracle	Enterprise-grade, robust, comprehensive	Large enterprises, critical systems
SQL Server	Microsoft ecosystem, BI integration	Windows environments, enterprise
SQLite	Embedded, serverless, zero-config	Mobile apps, embedded systems
MariaDB	MySQL fork, community-driven	Drop-in MySQL replacement

NoSQL Databases

Database	Type	Key Features	Use Cases
MongoDB	Document	Flexible schema, horizontal scaling	Content management, catalogs
Redis	Key-Value	In-memory, fast, data structures	Caching, sessions, real-time
Cassandra	Column-Family	High availability, no SPOF	Time-series, IoT
DynamoDB	Key-Value/Document	AWS managed, serverless	Serverless applications
Neo4j	Graph	Cypher query language, ACID	Social networks, recommendations
Elasticsearch	Search Engine	Full-text search, analytics	Log analysis, search

NewSQL Databases

Combine SQL with NoSQL scalability:

Database	Description
CockroachDB	Distributed SQL, PostgreSQL compatible
TiDB	MySQL compatible, horizontal scaling
Google Spanner	Globally distributed, strongly consistent
VoltDB	In-memory, ACID compliant

Best Practices

Database Design

Normalize appropriately (usually to 3NF/BCNF)
Denormalize when performance requires (read-heavy workloads)
Use appropriate data types (smallest type that fits)
Define proper constraints (PK, FK, NOT NULL, CHECK)
Plan for scalability from the start

Indexing

Index columns used in WHERE, JOIN, ORDER BY
Avoid over-indexing (slows writes)
Use composite indexes for multi-column queries
Monitor and remove unused indexes
Consider covering indexes for read-heavy queries

Query Optimization

Use EXPLAIN to analyze query plans
Avoid SELECT * in production
Use appropriate joins (not always INNER)
Limit result sets with pagination
Batch operations when possible

Security

Use parameterized queries (prevent SQL injection)
Implement least privilege access
Encrypt sensitive data (at rest and in transit)
Regular backups and testing
Keep database software updated

Backup & Recovery

Implement regular backup schedules
Test recovery procedures regularly
Use point-in-time recovery when available
Store backups in multiple locations
Document recovery procedures

Quick Reference

Normalization Cheat Sheet

1NF: Atomic values, no repeating groups
2NF: 1NF + No partial dependencies
3NF: 2NF + No transitive dependencies
BCNF: 3NF + Every determinant is a superkey
4NF: BCNF + No multi-valued dependencies
5NF: 4NF + No join dependencies

ACID Properties

Atomicity   - All or nothing
Consistency - Valid state to valid state
Isolation   - Transactions don't interfere
Durability  - Committed = Permanent

Isolation Levels (Increasing Strictness)

READ UNCOMMITTED → READ COMMITTED → REPEATABLE READ → SERIALIZABLE
      (fastest)                                         (safest)

Key Types

Super Key     ⊇ Candidate Key ⊇ Primary Key
                              ⊇ Alternate Key
Foreign Key   → References Primary Key
Composite Key = Multiple Attributes

Database Management Systems (DBMS)

Database Management Systems (DBMS)

Table of Contents

Introduction to DBMS

What is a Database?

What is DBMS?

Advantages of DBMS

Types of DBMS

Database Architecture

Three-Schema Architecture

Data Independence

Database Users

DBMS Components

Data Models

1. Hierarchical Model

2. Network Model

3. Relational Model

4. Object-Oriented Model

5. Document Model

ER Model & ER Diagrams

Entity-Relationship Model

Components

1. Entity

2. Attributes

3. Relationships

4. Cardinality

ER Diagram Notation

Extended ER (EER) Concepts

Specialization (Top-down)

Generalization (Bottom-up)

Aggregation

Relational Model

Key Terminology

Keys in RDBMS

Relational Algebra

Basic Operations

Derived Operations

Integrity Constraints

Normalization

Purpose

Anomalies

Functional Dependency

Armstrong's Axioms

Normal Forms

1NF (First Normal Form)

2NF (Second Normal Form)

3NF (Third Normal Form)

BCNF (Boyce-Codd Normal Form)

4NF (Fourth Normal Form)

5NF (Fifth Normal Form)

Normalization Summary

Denormalization

Transactions & Concurrency Control

What is a Transaction?

ACID Properties

Transaction States

Concurrency Problems

Isolation Levels

Locking Mechanisms

Types of Locks

Lock Compatibility Matrix

Lock Granularity

Two-Phase Locking (2PL)

Deadlock

Timestamp-Based Protocols

MVCC (Multi-Version Concurrency Control)

Indexing

What is an Index?

Types of Indexes

Index Data Structures

B-Tree Index

B+ Tree Index

Hash Index

Bitmap Index

Index Trade-offs

When to Use Indexes

NoSQL Overview

What is NoSQL?

Characteristics

CAP Theorem