Our trip starts with a brief historical background of the ACID properties. The underlying transaction concepts were formulated in the 1970s by computer scientist Jim Gray (the acronym ACID itself was coined by Theo Härder and Andreas Reuter in 1983), and they emerged from the need for robust transaction management as databases evolved to handle increasingly complex and critical data operations. At the time, databases were transitioning from simple file systems to sophisticated relational databases, driven by the demands of industries like banking, finance, and telecommunications, where data integrity was paramount. As databases became integral to these high-stakes applications, the ACID properties provided a framework for ensuring that transactions would maintain data correctness and reliability, even in the face of system failures or concurrent access.
Today, ACID remains essential in applications where data accuracy and consistency are non-negotiable, setting the standard for transactional integrity in relational databases and influencing even modern distributed systems.
ACID in detail
ACID is an acronym that stands for Atomicity, Consistency, Isolation, and Durability — the four key properties that ensure reliable transaction processing in databases. These properties were developed to guarantee that database transactions, which may involve multiple steps, complete correctly and that data remains consistent and stable, even in cases of system failures, concurrency issues, or power outages. These properties are especially critical for applications where data integrity and reliability are paramount. Below is a comprehensive look at each property, its significance and how it shapes data reliability in databases.
Atomicity
This diagram illustrates a detailed sequence of interactions among a Transaction, a Database, and a Transaction Log, demonstrating the Atomicity property with multiple steps within a single transaction.
Transaction initiates by sending a Start Transaction command to the Database, marking the beginning of the transaction.
Operation 1 is broken down into two steps:
Step A: The Transaction executes Operation 1 - Step A on the Database.
The Database writes Operation 1 - Step A to the Transaction Log.
Transaction Log acknowledges that Step A has been logged.
Step B: The Transaction executes Operation 1 - Step B on the Database.
The Database writes Operation 1 - Step B to the Transaction Log.
Transaction Log acknowledges that Step B has been logged.
Operation 2 also consists of two steps:
Step A: The Transaction executes Operation 2 - Step A on the Database.
The Database writes Operation 2 - Step A to the Transaction Log.
Transaction Log acknowledges that Step A has been logged.
Step B: The Transaction executes Operation 2 - Step B on the Database.
The Database writes Operation 2 - Step B to the Transaction Log.
Transaction Log acknowledges that Step B has been logged.
Decision Point:
If All Steps are Successful:
The Database marks the transaction as Committed in the Transaction Log.
Transaction Log confirms the commit.
The Database sends a Commit Transaction message back to the Transaction, indicating that all steps have been successfully applied.
If Any Step Fails:
The Database marks the transaction as Rollback in the Transaction Log.
Transaction Log confirms the rollback.
The Database sends a Rollback Transaction message back to the Transaction, indicating that all changes have been reverted.
This flow demonstrates the atomicity property, ensuring that if any part of the transaction fails, the entire transaction is rolled back to maintain data integrity. Conversely, if all steps are successful, the transaction is committed, making all changes permanent. Atomicity guarantees that databases remain in a stable state by preventing incomplete operations. This is crucial for preserving data integrity, particularly in multi-step transactions where each part must succeed to achieve the intended result.
Consider a bank transfer where money is deducted from Account A and added to Account B. If the debit is successful but the credit fails, atomicity ensures that the entire transaction is rolled back, so no money is lost or added incorrectly.
Databases achieve atomicity through mechanisms like transaction logs (e.g., write-ahead logging, or WAL). The log records each operation before it is applied, so if a failure occurs the database can roll back any incomplete steps and return to a stable state.
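To make this concrete, here is a minimal sketch of such a transfer using Python's built-in sqlite3 module; the accounts table and amounts are illustrative, but the commit/rollback pattern is the same in any ACID-compliant database.

```python
# Minimal sketch of an atomic transfer using Python's built-in sqlite3 module.
# The accounts table and values are illustrative.
import sqlite3

conn = sqlite3.connect("bank.db")
conn.execute("CREATE TABLE IF NOT EXISTS accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT OR IGNORE INTO accounts VALUES ('A', 500), ('B', 100)")
conn.commit()

try:
    # Both updates belong to one transaction: either both apply or neither does.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 'A'")
    conn.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 'B'")
    conn.commit()      # make both changes permanent together
except sqlite3.Error:
    conn.rollback()    # any failure reverts every step of the transaction
finally:
    conn.close()
```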
Consistency
Consistency ensures that a transaction brings the database from one valid state to another, adhering to all defined rules, constraints, and relationships. Every transaction must leave the database in a valid state, which preserves the logical correctness of the data throughout the database.
In a banking database, a rule might state that account balances cannot be negative. If a transaction attempts to withdraw more than the available balance, consistency ensures this transaction is rejected, maintaining data integrity.
This sequence diagram illustrates the consistency property in a database transaction process, specifically focusing on how constraints are enforced to ensure that data remains in a valid state.
Start Transaction: The Client initiates a transaction with the Database.
Insert Data: The Client sends a request to Insert Data (e.g., a new order) into the Database.
Validate Constraints: The Database communicates with the Constraints system (e.g., foreign keys, rules) to validate the data. This step ensures that the new data meets all predefined rules, maintaining consistency within the database.
Conditional Paths: The diagram then shows two conditional outcomes based on whether the data meets all constraints.
If Valid Data:
If the data satisfies all constraints, the Constraints system responds with “All constraints met.”
The Database then sends a Commit Transaction confirmation to the Client, finalizing the transaction and ensuring that valid data is added to the database.
Else Invalid Data:
If any constraint is violated (e.g., a missing foreign key), the Constraints system sends a message indicating the Constraint violation.
The Database then responds to the Client with a Rollback Transaction message, canceling the transaction to prevent invalid data from being saved.
Consistency is typically enforced by database constraints and triggers that check each transaction against pre-defined rules. Relational databases use constraints like foreign keys, primary keys, and check constraints to ensure that all data adheres to the set rules.
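As a small illustration of the banking rule above, the following sketch (again using Python's sqlite3 module, with an illustrative accounts table) shows a CHECK constraint rejecting a withdrawal that would leave a negative balance.

```python
# Sketch of a CHECK constraint enforcing the "no negative balance" rule.
# Table name and values are illustrative; other relational databases behave similarly.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id      TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES ('A', 50)")
conn.commit()

try:
    # Withdrawing 100 would leave -50, which violates the CHECK constraint.
    conn.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 'A'")
    conn.commit()
except sqlite3.IntegrityError:
    conn.rollback()    # the transaction is rejected, so the database stays in a valid state
```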
Isolation
Isolation ensures that transactions occur independently of each other. In a multi-user environment, multiple transactions may be occurring simultaneously, potentially affecting the same data. Isolation ensures that concurrent transactions do not interfere with each other, giving each transaction the impression that it is the only one accessing the database.
Consider an e-commerce platform where two users attempt to purchase the last available item. Isolation ensures that only one transaction completes, preventing both users from accidentally purchasing the same item.
Different levels of isolation provide varying levels of protection against concurrency issues.
Read Uncommitted
This diagram illustrates a sequence of interactions between two transactions (Transaction 1 and Transaction 2) and a database, highlighting a Read Uncommitted isolation level.
Transaction 1 initiates and sends a Start Transaction command to the database, beginning its transaction.
Transaction 1 performs a Write (Uncommitted) operation on the database. At this point, the changes are not yet committed, meaning they are not finalized in the database.
Transaction 2 attempts to Read data from the database. Due to the Read Uncommitted isolation level, Transaction 2 is able to read the uncommitted data written by Transaction 1. This action is labeled as a Dirty Read because Transaction 2 reads data that has not yet been finalized.
Transaction 1 then sends a Commit command to the database, making its changes permanent.
The flow demonstrates the risk of dirty reads at the Read Uncommitted isolation level, where other transactions can read uncommitted changes, potentially leading to inconsistencies if Transaction 1 were to roll back instead of committing.
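The same interleaving can be sketched in code. This is a hypothetical example assuming a MySQL/InnoDB server, an illustrative accounts table, and the mysql-connector-python driver; connection details are placeholders. (Note that PostgreSQL never allows dirty reads, even when READ UNCOMMITTED is requested.)

```python
# Hypothetical sketch of a dirty read under READ UNCOMMITTED.
# Assumes a MySQL/InnoDB server, an accounts(id, balance) table, and
# mysql-connector-python; connection parameters are placeholders.
import mysql.connector

conn1 = mysql.connector.connect(database="demo", user="app", password="secret")
conn2 = mysql.connector.connect(database="demo", user="app", password="secret")
cur1, cur2 = conn1.cursor(), conn2.cursor()

# Transaction 1 writes but has not committed yet (autocommit is off by default).
cur1.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")

# Transaction 2 opts into READ UNCOMMITTED and sees the unfinished change.
cur2.execute("SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED")
cur2.execute("SELECT balance FROM accounts WHERE id = 1")
print(cur2.fetchone())   # dirty read: reflects Transaction 1's uncommitted write

conn1.rollback()         # Transaction 2 has now read data that never officially existed
conn2.rollback()
```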
Read Committed
The diagram illustrates a sequence of interactions between two transactions (Transaction 1 and Transaction 2) and a database, highlighting a Read Committed isolation level.
Transaction 1 initiates and sends a Start Transaction command to the database, beginning its transaction.
Transaction 1 performs a Read Value operation on the database, retrieving a specific value.
Transaction 2 starts its own transaction by sending a Start Transaction command to the database.
Transaction 2 performs an Update Value operation, modifying the value that Transaction 1 previously read.
Transaction 2 then sends a Commit command to the database, making its changes permanent.
Transaction 1 attempts to Read Value again. However, since the value was updated by Transaction 2, Transaction 1 now retrieves a different value than it initially read, resulting in a Non-Repeatable Read.
The flow demonstrates the risk of non-repeatable reads at the Read Committed isolation level. Although dirty reads are prevented (Transaction 1 cannot read uncommitted data from Transaction 2), non-repeatable reads are still possible because the value may be updated by other transactions between reads.
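A rough sketch of this interleaving, assuming a running PostgreSQL server (where Read Committed is the default level), an illustrative accounts table, and the psycopg2 driver:

```python
# Hypothetical sketch of a non-repeatable read under READ COMMITTED.
# Assumes a PostgreSQL server, an accounts(id, balance) table, and psycopg2;
# the connection string is a placeholder.
import psycopg2

conn1 = psycopg2.connect("dbname=demo")   # session for Transaction 1
conn2 = psycopg2.connect("dbname=demo")   # session for Transaction 2
cur1, cur2 = conn1.cursor(), conn2.cursor()

cur1.execute("SELECT balance FROM accounts WHERE id = 1")
first_read = cur1.fetchone()[0]           # Transaction 1's initial read

cur2.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 1")
conn2.commit()                            # Transaction 2 commits its update

cur1.execute("SELECT balance FROM accounts WHERE id = 1")
second_read = cur1.fetchone()[0]          # differs from first_read: a non-repeatable read

print(first_read, second_read)
conn1.rollback()
```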
Repeatable Read
This diagram illustrates a sequence of interactions between two transactions (Transaction 1 and Transaction 2) and a database, demonstrating a Repeatable Read isolation level.
Transaction 1 initiates and sends a Start Transaction command to the database, beginning its transaction.
Transaction 1 performs a Read All Rows operation, retrieving all rows that match a certain condition.
Transaction 2 begins its own transaction by sending a Start Transaction command to the database.
Transaction 2 performs an Insert New Row operation, adding a new row to the database that matches the condition Transaction 1 initially queried. This new row creates a potential phantom read.
Transaction 2 then sends a Commit command to the database, making its changes permanent.
Transaction 1 attempts to Read All Rows again. This time, it retrieves the new row inserted by Transaction 2, resulting in a Phantom Read, as the set of rows retrieved has changed since the initial read.
The flow demonstrates the potential for phantom reads at the Repeatable Read isolation level. Although this level prevents dirty reads and non-repeatable reads (ensuring data read once remains consistent if accessed again), new rows inserted by other transactions may still appear, causing the set of rows to change unexpectedly.
Serializable
This diagram illustrates a sequence of interactions between two transactions (Transaction 1 and Transaction 2) and a database, showcasing the Serializable isolation level.
Transaction 1 begins by sending a Start Transaction command to the Database, marking the initiation of its operations.
Transaction 1 performs a Read All Rows operation, retrieving all rows that meet a specified condition.
Transaction 2 then begins by also sending a Start Transaction command to the Database.
Transaction 2 attempts to Insert a New Row that would potentially interfere with the data set Transaction 1 is working with.
Because the Serializable isolation level is enforced, Transaction 2’s attempt to insert a row is either Blocked or Rolled Back to avoid any conflict. Serializable isolation ensures that Transaction 1’s view of the data remains consistent, as though the transactions are executed one at a time.
Transaction 1 completes its operations and sends a Commit command to the Database, finalizing its changes and releasing any locks or resources.
With Transaction 1 now completed, Transaction 2 is free to Try Again, meaning it can attempt its operations again without interfering with Transaction 1.
This flow showcases how the Serializable isolation level handles concurrent transactions to maintain strict data integrity by isolating transactions and managing conflicts through blocking or rolling back.
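In application code, the "blocked or rolled back, then try again" behaviour usually surfaces as a retry loop around the whole transaction. Below is a minimal sketch assuming PostgreSQL, the psycopg2 driver, and a hypothetical inventory(item_id, stock) table.

```python
# Hypothetical sketch: retrying a transaction that the database rolls back
# under SERIALIZABLE isolation. Assumes PostgreSQL, the psycopg2 driver, and
# an illustrative inventory(item_id, stock) table.
import psycopg2
from psycopg2 import errors, extensions

def buy_last_item(conn, item_id):
    conn.set_isolation_level(extensions.ISOLATION_LEVEL_SERIALIZABLE)
    for _ in range(3):                                 # bounded retry loop
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT stock FROM inventory WHERE item_id = %s", (item_id,))
                stock = cur.fetchone()[0]
                if stock <= 0:
                    conn.rollback()                    # nothing left to buy
                    return False
                cur.execute(
                    "UPDATE inventory SET stock = stock - 1 WHERE item_id = %s",
                    (item_id,),
                )
            conn.commit()                              # success: the purchase is final
            return True
        except errors.SerializationFailure:
            conn.rollback()                            # conflict with a concurrent purchase: try again
    return False
```

Retrying the whole transaction, rather than a single statement, is the usual pattern here, because a serialization failure invalidates everything the transaction has read so far.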
Isolation is often implemented through locking mechanisms (e.g., row-level locks or table locks) and multiversion concurrency control (MVCC). These methods allow databases to handle concurrent transactions while preventing data inconsistencies.
Isolation is essential for ensuring predictable outcomes in multi-user environments, preventing concurrency issues like lost updates, dirty reads, and phantom reads.
Durability
Durability guarantees that once a transaction has been committed, its changes are permanent and survive subsequent failures such as crashes or power outages. This gives users and systems confidence that completed transactions won't be lost and that they can rely on the persistence of their data.
After a customer’s payment is recorded in an e-commerce database, durability ensures that this transaction remains intact, even if the system crashes immediately afterward.
Durability is typically achieved through write-ahead logging and non-volatile storage. Data is recorded to disk or other durable storage before a transaction is confirmed as committed, allowing the database to recover data from logs in the event of a failure.
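As a small, concrete illustration of these knobs, SQLite exposes similar trade-offs through pragmas; the sketch below (Python's built-in sqlite3 module, illustrative table) enables write-ahead logging and full fsync-on-commit so that acknowledged commits survive a crash.

```python
# Sketch of durability settings in SQLite via Python's sqlite3 module.
import sqlite3

conn = sqlite3.connect("orders.db")
conn.execute("PRAGMA journal_mode=WAL")    # write-ahead logging for crash recovery
conn.execute("PRAGMA synchronous=FULL")    # fsync before a commit is acknowledged

conn.execute("CREATE TABLE IF NOT EXISTS payments (order_id INTEGER, amount REAL)")
conn.execute("INSERT INTO payments VALUES (42, 19.99)")
conn.commit()   # once this returns, the payment record survives a crash or power loss
conn.close()
```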
Why ACID Matters
The ACID properties are crucial for data integrity and reliability in any database, especially in high-stakes applications where errors can have serious consequences. By enforcing ACID properties, databases ensure that transactions are processed in a predictable, recoverable way that maintains data consistency. Applications where ACID properties are essential include:
Banking and Finance: Financial systems depend on accurate and consistent data, where even minor errors could lead to significant financial discrepancies.
Healthcare: Patient records must be accurate, accessible, and up-to-date across all hospital departments to ensure proper care.
E-commerce: Inventory and order management systems require ACID transactions to prevent issues like double-ordering or incorrect stock updates.
Supply Chain Management: Real-time inventory and logistics data are critical in supply chains, where timing and accuracy are essential.
Modern databases versus ACID
Here is a brief overview of how some modern relational databases implement the ACID properties.
PostgreSQL
Atomicity: PostgreSQL uses Write-Ahead Logging (WAL) to ensure atomicity. Changes are recorded in the WAL before they are committed, so incomplete transactions can be rolled back after a failure.
Consistency: PostgreSQL enforces consistency through constraints (e.g., foreign keys, unique constraints) and data type checks. Additionally, triggers and rules allow further control over data consistency.
Isolation: PostgreSQL’s default isolation level is Read Committed. It also supports Multi-Version Concurrency Control (MVCC) by default, allowing for concurrent transactions without using locks for read operations.
Durability: By default, PostgreSQL uses WAL and writes each transaction to disk before committing, ensuring durability. Configurable settings like synchronous_commit control the balance between performance and durability.
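For example, a session that can tolerate losing its most recent writes after a crash might relax synchronous_commit. A hedged sketch, assuming the psycopg2 driver and a hypothetical audit_log table:

```python
# Hypothetical sketch: trading a little durability for commit latency in
# PostgreSQL. Assumes psycopg2 and an illustrative audit_log table; the
# connection string is a placeholder.
import psycopg2

conn = psycopg2.connect("dbname=demo")
cur = conn.cursor()
# With synchronous_commit = off, COMMIT returns before the WAL record is
# flushed to disk, so a crash can lose the very latest transactions
# (but never corrupt the database).
cur.execute("SET synchronous_commit = off")
cur.execute("INSERT INTO audit_log (event) VALUES (%s)", ("user_login",))
conn.commit()
```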
MySQL
Atomicity: The InnoDB storage engine (default in MySQL) ensures atomicity by using a transaction log to roll back incomplete transactions in case of failure.
Consistency: MySQL maintains consistency with constraints (e.g., primary keys, foreign keys) and triggers for additional business logic validation.
Isolation: Repeatable Read is the default isolation level in MySQL's InnoDB engine. InnoDB uses MVCC to handle concurrent transactions, reducing locking.
Durability: InnoDB ensures durability by writing to a redo log before committing transactions. The innodb_flush_log_at_trx_commit setting controls the frequency of log flushing to balance durability and performance.
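For instance, the setting can be changed at runtime; this is a hedged sketch assuming the mysql-connector-python driver and an account with the required administrative privileges.

```python
# Hypothetical sketch: relaxing InnoDB's flush-on-every-commit behaviour.
# Assumes a MySQL server, mysql-connector-python, and an account allowed to
# run SET GLOBAL; connection parameters are placeholders.
import mysql.connector

conn = mysql.connector.connect(database="demo", user="admin", password="secret")
cur = conn.cursor()
# 1 (default): flush the redo log to disk at every commit -> full durability
# 2: write at commit, flush about once per second         -> faster, small loss window
# 0: write and flush about once per second                -> fastest, largest loss window
cur.execute("SET GLOBAL innodb_flush_log_at_trx_commit = 2")
```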
MariaDB
Atomicity: MariaDB also uses InnoDB as its default engine, implementing atomicity with a transaction log. This ensures that any failed transaction can roll back all changes.
Consistency: Enforced by constraints (foreign keys, unique constraints) and supported by triggers and stored procedures for complex validations.
Isolation: MariaDB’s default isolation level is Repeatable Read (same as MySQL), and it also uses MVCC for concurrency, allowing reads without locking.
Durability: Redo logs in InnoDB provide durability, with configurable settings for log flushing to optimize between performance and durability.
Oracle Database
Atomicity: Oracle uses Redo and Undo logs to ensure atomicity. Undo logs help roll back uncommitted changes if a transaction fails, and redo logs record changes to disk.
Consistency: Oracle enforces consistency with constraints, triggers, and advanced features like integrity constraints and PL/SQL for custom rules.
Isolation: Read Committed is the default isolation level, though Oracle supports Serializable as well. It uses MVCC for high concurrency, reducing contention.
Durability: Oracle ensures durability by writing to redo logs and automatically archiving logs to guarantee recoverability.
Amazon Aurora
Atomicity: Aurora ensures atomicity by using a distributed write-ahead log across multiple storage nodes. Failed transactions are automatically rolled back.
Consistency: Aurora enforces consistency using constraints and checks. Data is validated to ensure transactions maintain a consistent state.
Isolation: Aurora supports configurable isolation levels, with Read Committed as the default. It uses MVCC to allow concurrent reads and reduce locking.
Durability: By default, Aurora writes changes to six copies across three Availability Zones before confirming a commit, ensuring high durability and fault tolerance.
CockroachDB
Atomicity: CockroachDB implements atomicity with a distributed transaction log and uses the Raft consensus protocol to ensure that all parts of a transaction are either committed or rolled back.
Consistency: Consistency is achieved using constraints and Raft consensus for each write, ensuring that all replicas reflect the same state.
Isolation: CockroachDB’s default isolation level is Serializable, the strictest isolation level, managed using distributed MVCC to avoid conflicts across nodes.
Durability: Durability is achieved by storing data on multiple nodes and using Raft replication to ensure that changes are confirmed by multiple replicas before being committed.
Snowflake
Atomicity: Snowflake uses a multi-cluster architecture with a distributed transaction log to ensure atomicity, rolling back transactions if they are incomplete.
Consistency: Snowflake maintains consistency by enforcing rules and validating each transaction across clusters, ensuring that all transactions are valid and complete.
Isolation: Snowflake provides a form of Serializable isolation, allowing transactions to operate without interference in the cloud environment.
Durability: Snowflake writes data to durable cloud storage (e.g., Amazon S3), ensuring that data remains intact even if individual clusters fail.
Google Cloud Spanner
Atomicity: Spanner uses a two-phase commit protocol across nodes to maintain atomicity for distributed transactions.
Consistency: Spanner enforces strong consistency using TrueTime API to synchronize timestamps globally, ensuring a consistent view of data across nodes.
Isolation: Spanner’s default isolation level is Serializable, providing strict isolation even across regions. It leverages MVCC to support concurrent transactions.
Durability: Spanner writes data across multiple data centers and uses the Paxos consensus algorithm to ensure durability and fault tolerance.
Limitations and Challenges
With the advent of large-scale web applications, traditional ACID-compliant relational databases struggled to handle the demands of distributed environments where data is spread across multiple nodes. ACID properties, particularly strict consistency and isolation, require locking mechanisms and global coordination, which become bottlenecks in distributed systems, reducing both performance and availability.
To address these challenges, some distributed systems adopt the BASE (Basically Available, Soft state, Eventual consistency) model, which sacrifices immediate consistency for greater availability and partition tolerance, suitable for applications where some data inconsistency is acceptable.
Summary
The ACID properties form the foundation for reliable and predictable data management. They ensure that transactions in a database are processed accurately, consistently, and independently, even in the event of failures or concurrent access. These properties are essential in applications where data integrity is crucial, such as banking, healthcare, or e-commerce. While ACID properties introduce challenges in distributed systems, they remain invaluable for scenarios that require strict data consistency and durability.
PS: I hope you’ve enjoyed your very own database trip :)