Database Interview Questions for 2026
Databases are the backbone of every software system. Whether you are interviewing for a backend engineer, data engineer, or full-stack developer position, you will face questions about SQL queries, NoSQL data models, database design, indexing strategies, and performance optimization. This guide covers 20 essential questions that interviewers ask at companies like Google, Amazon, Meta, Stripe, and fast-growing startups.
Why Database Knowledge Matters in Interviews
Every application reads and writes data. Poor database design leads to slow queries, data inconsistencies, and systems that cannot scale. Interviewers test database knowledge to assess whether you can design data models that support your application's access patterns, write efficient queries, and make informed decisions about trade-offs between consistency and availability.
Database questions appear in multiple interview rounds. In coding interviews, you may write SQL queries or design data models. In system design rounds, you decide between SQL and NoSQL, choose sharding strategies, and design for high availability. Even behavioral rounds may include questions about database incidents you have handled. A strong foundation in databases demonstrates engineering maturity that interviewers value highly.
Jump to Topic
SQL Fundamentals Questions
Explain the different types of SQL JOINs.
INNER JOIN returns rows with matching values in both tables. LEFT JOIN returns all rows from the left table and matching rows from the right (NULLs where no match). RIGHT JOIN is the reverse. FULL OUTER JOIN returns all rows from both tables with NULLs where no match exists. CROSS JOIN returns the Cartesian product of both tables. Self-joins join a table with itself.
What are indexes and how do they improve query performance?
Indexes are data structures (typically B-trees or hash tables) that allow the database to find rows without scanning the entire table. They speed up SELECT queries and WHERE clauses but slow down INSERT, UPDATE, and DELETE operations because the index must also be updated. Common types include single-column, composite, unique, partial, and covering indexes.
Explain ACID properties in database transactions.
Atomicity: all operations in a transaction succeed or all fail. Consistency: a transaction moves the database from one valid state to another. Isolation: concurrent transactions do not interfere with each other. Durability: committed transactions persist even after system failures. ACID guarantees are implemented through write-ahead logging, locking mechanisms, and multi-version concurrency control (MVCC).
What are the different transaction isolation levels?
Read Uncommitted: allows dirty reads. Read Committed: prevents dirty reads but allows non-repeatable reads. Repeatable Read: prevents dirty and non-repeatable reads but allows phantom reads. Serializable: prevents all anomalies but has the lowest concurrency. Higher isolation levels provide stronger guarantees but reduce throughput. Most databases default to Read Committed or Repeatable Read.
What is the difference between a clustered and a non-clustered index?
A clustered index determines the physical order of data in the table. There can be only one per table (usually the primary key). A non-clustered index is a separate structure with pointers to the actual data rows. A table can have multiple non-clustered indexes. Clustered indexes are faster for range queries; non-clustered indexes are better for point lookups on non-primary columns.
NoSQL Types Questions
What are the main types of NoSQL databases?
Document stores (MongoDB, CouchDB) store semi-structured JSON/BSON documents. Key-value stores (Redis, DynamoDB) store data as key-value pairs with fast lookups. Wide-column stores (Cassandra, HBase) store data in column families, ideal for time-series data. Graph databases (Neo4j, Amazon Neptune) store nodes and relationships, optimized for traversal queries.
When would you choose a document database over a relational database?
Choose document databases when: your data has a variable or evolving schema, you need to store nested/hierarchical data naturally, you want to avoid complex joins, you need horizontal scalability, or your read patterns align with document boundaries. Avoid document databases when you need complex transactions across documents, strong referential integrity, or heavy ad-hoc querying with joins.
How does a key-value store like Redis achieve high performance?
Redis stores all data in memory (RAM), providing sub-millisecond latency. It uses a single-threaded event loop (avoiding lock overhead), efficient data structures (hash tables, skip lists, compressed lists), and optional persistence via RDB snapshots or AOF logging. Redis Cluster provides horizontal scaling through hash-slot-based sharding across multiple nodes.
What is a graph database and when should you use one?
Graph databases store data as nodes (entities) and edges (relationships) with properties on both. They excel at traversing relationships: social networks, recommendation engines, fraud detection, and knowledge graphs. Unlike relational databases where joins become expensive at depth, graph databases maintain constant-time traversal regardless of dataset size.
Database Design Questions
Explain database normalization and its normal forms.
Normalization reduces data redundancy and improves integrity. 1NF: atomic values, no repeating groups. 2NF: 1NF plus no partial dependencies on composite keys. 3NF: 2NF plus no transitive dependencies. BCNF: every determinant is a candidate key. Higher forms (4NF, 5NF) address multi-valued and join dependencies. In practice, most systems normalize to 3NF or BCNF.
When and why would you denormalize a database?
Denormalization introduces controlled redundancy to improve read performance. Use it when: read-heavy workloads need faster queries, complex joins are too expensive, you are building data warehouses or analytics systems, or caching layers are insufficient. Common techniques include adding computed columns, materialized views, and duplicating data across tables. The trade-off is increased storage and more complex write logic.
What is database sharding and what are the common strategies?
Sharding horizontally partitions data across multiple database servers. Strategies include: range-based (partition by date or ID range), hash-based (hash the shard key to distribute evenly), directory-based (lookup table maps keys to shards), and geographic (partition by region). Challenges include cross-shard queries, rebalancing, and maintaining referential integrity. Choose shard keys carefully to avoid hotspots.
How do you design a schema for a many-to-many relationship?
In relational databases, use a junction (bridge) table with foreign keys to both related tables. The junction table can also hold relationship attributes (e.g., enrollment date in a student-course relationship). Add composite or individual indexes based on query patterns. In document databases, you can embed references or use denormalized arrays, depending on cardinality and access patterns.
Performance Questions
How do you optimize a slow SQL query?
Start with EXPLAIN/EXPLAIN ANALYZE to understand the query plan. Common optimizations: add missing indexes, avoid SELECT *, rewrite subqueries as JOINs, use covering indexes, partition large tables, avoid functions on indexed columns in WHERE clauses, use query hints when needed, and consider materialized views for complex aggregations. Always measure before and after.
What is a query execution plan and how do you read it?
A query execution plan shows how the database engine processes a query: which indexes it uses, join algorithms (nested loop, hash join, merge join), table scan vs. index scan, estimated vs. actual row counts, and cost estimates. Look for sequential scans on large tables (missing indexes), high row estimates vs. actual (stale statistics), and nested loops with large tables.
What are the different indexing strategies and when do you use each?
B-tree indexes: general-purpose, good for range and equality queries (default in most RDBMS). Hash indexes: fast equality lookups but no range support. GIN indexes: full-text search and array/JSON columns. GiST indexes: geometric and spatial data. Partial indexes: index a subset of rows. Composite indexes: multiple columns (column order matters for query matching). Covering indexes: include all query columns to avoid table lookups.
How does connection pooling work and why is it important?
Connection pooling maintains a cache of database connections that are reused across requests instead of creating new connections for each query. Creating a connection is expensive (TCP handshake, authentication, memory allocation). Pools like PgBouncer (PostgreSQL) or HikariCP (Java) manage connection lifecycle, limiting max connections and queuing excess requests. This reduces latency and prevents overwhelming the database.
SQL vs NoSQL Questions
When should you use SQL vs. NoSQL?
Use SQL when: you need ACID transactions, complex queries with joins, strong data consistency, or a well-defined schema. Use NoSQL when: you need horizontal scalability, flexible schemas, high write throughput, or your data model fits naturally (documents, key-value, graph). Many modern systems use both: SQL for transactional data and NoSQL for caching, session storage, or analytics.
What is the CAP theorem and how does it apply to databases?
The CAP theorem states that a distributed system can guarantee at most two of three properties: Consistency (every read returns the latest write), Availability (every request gets a response), and Partition tolerance (the system works despite network failures). Since network partitions are inevitable, the real choice is between CP (strong consistency, e.g., PostgreSQL, MongoDB) and AP (high availability, e.g., Cassandra, DynamoDB).
What is eventual consistency and when is it acceptable?
Eventual consistency means all replicas will converge to the same value given enough time without new writes. It is acceptable for: social media feeds, product view counts, recommendation systems, DNS, and caching layers — anywhere stale reads are tolerable for seconds. It is not acceptable for: financial transactions, inventory counts during checkout, or any scenario where stale data causes incorrect decisions.
How to Prepare for Database Interviews
1. Master SQL Query Writing
Practice writing complex queries with JOINs, subqueries, window functions, CTEs, and GROUP BY clauses. Use a real database (PostgreSQL or MySQL) and work with sample datasets. Many interviews include a live SQL coding round where you write queries on a whiteboard or shared editor. Speed and accuracy matter.
2. Understand Database Internals
Know how B-tree indexes work, how the query optimizer chooses execution plans, and how MVCC enables concurrent transactions. Understanding internals helps you explain why a query is slow and propose the right fix, rather than guessing. Read the documentation for your database of choice (PostgreSQL docs are particularly excellent).
3. Practice Data Modeling
Take real applications (e-commerce, social media, ride-sharing) and design their database schemas. Start with an entity-relationship diagram, then translate it to tables with appropriate data types, constraints, and indexes. Consider access patterns: what queries will run most frequently? Design your schema to make those queries efficient.
4. Learn When to Use NoSQL
Do not default to "always use PostgreSQL" or "always use MongoDB." Understand the strengths and weaknesses of each database type. Practice designing systems that use multiple databases: PostgreSQL for transactional data, Redis for caching and sessions, Elasticsearch for search, and Cassandra for time-series data. This polyglot persistence approach is what real systems use.
5. Study Real-World Scaling Stories
Read engineering blog posts from companies like Uber, Instagram, Slack, and Pinterest about how they scaled their databases. Common themes include migrating from single-node to sharded clusters, switching between database technologies, and optimizing critical queries. These stories provide concrete examples you can reference in interviews.
Related Topics on Guru Sishya
RDBMS & SQL
Relational databases, SQL queries, and joins
NoSQL Databases
Document, key-value, graph, and column stores
System Design Interview
Scalable architecture and design patterns
DSA Interview Questions
50 essential coding interview questions
Backend Engineering
APIs, microservices, and architecture
Cloud & DevOps
AWS, Docker, Kubernetes, and CI/CD
Ready to Ace Your Database Interview?
Practice with interactive lessons, quizzes, and a Feynman practice mode to explain concepts out loud — completely free, no signup required.