数据库系统概论:事务 Transaction

事务 Transaction

A transaction is a unit of program execution that accesses and possibly updates (访问并可能更新) various data items.

Two main issues to deal with:

  1. Failures of various kinds:

    1. Hardware failures;
    2. System crashes;
  2. Concurrent(并发的) execution of multiple transaction;

    concurrent

    adj. 并存的,同时发生的;同意的,一致的;(两个或两个以上徒刑判决)同时执行的;(三条或三条以上线)共点的,会合的

    n. 共点;同时发生的事件

ACID Properties

To preserve the integrity of data the database system must ensure:

Atomicity requirement(原子性要求)

系统应确保没有完整执行的事务不会反映在数据中

The system should ensure that updates of a partially(不完全的) executed transaction are not reflected in the database.

要么事务中的所有操作都成功,要么都不成功

Either all operations of the transaction are properly reflected in the database or none are.

Consistency requirement(一致性要求)

事务隔离地执行可以保证数据库的一致性

Execution of a transaction in isolation preserves the consistency of the database.

  • Explicitly specified integrity constraints(完整性约束) such as primary keys and foreign keys.
  • Implicit(隐含的) integrity constraints.
  • A transaction must see a consistent database.
  • During a transaction execution the database may be temporarily(暂时的) inconsistent.(所以需要隔离,需要中间事务结果对其他事务隐藏)
  • When the transaction completes successfully the database must be consistent.

Isolation requirement(隔离性要求)

多个事务并发执行,各个事务之间应该是互不可见的。中间事务的结果必须对其他并发执行的事务隐藏。

Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executed transactions.

  • Running transactions serially(串行) can ensure isolation.
  • Executing multiple transactions concurrently has significant benefits.

Durability requirement(持久性要求)

系统应确保已经执行完成的事务,在出现故障的情况下仍然能够在数据库中保存

once the user has been notified that the transaction has completed , the updates to the database by the transaction must persist(继续存在) even if there are software or hardware failures.

Transaction State

  1. Active: the initial state; the transaction stays in this state while it is executing.

  2. Partially committed: after the final statement has been executed;

  3. Failed: after the discovery that normal execution can no longer proceed.

  4. Aborted(终止): 当事务已经回滚,且数据库回到了执行事务之前的状态。在被中止后有两种操作可以执行:

    after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. Two options after it has been aborted:

    1. Restart the transaction: Can be done only if no internal logical error;
    2. Kill the transaction.
  5. Committed: after successful completion.

Concurrent Executions

Multiple transactions are allowed to run concurrently in the system. Advantages are:

  • Increased processor and disk utilization 提高处理器和磁盘的利用率;
  • Reduced average response time for transactions

Concurrency control schemes: mechanisms(机制) to achieve isolation

Schedules

调度

一种指令顺序,指定了并发事务指令按时间执行的顺序

A sequences of instructions that specify the chronological order in which instructions of concurrent transactions are executed

  • A schedule for a set of transactions must consist of all instructions of those transaction. 由所有事务的所有指令组成

  • 成功执行的事务最后会有一条提交指令

    A transaction that successfully completes its execution will have a commit instructions as the last statement.

  • 没有成功完成的事务最后会有一条中止指令

    A transaction that fails to successfully complete its execution will have an abort instruction as last statement.

Serializability

可串行化的

前提(基本假设):每一个事务都保持了数据库的一致性。(preserves database consistency)

所以,一组串行执行的事务同样保持了数据库的一致性。

如果一个调度等价于一种串行调度,则其实可串行话的

A schedule is serializable if it is equivalent to(等价于) a serial schedule.

Different forms(形式) of schedule equivalence(等价) give rise to the notion(概念) of:

  1. Conflict serializability
  2. View serializability

Simplified view of transactions

  • Ignore operations other than read and write instruction;

  • Assume that transactions may perform arbitrary(任意的) computations on data in local buffers in between reads and writes;

  • Schedules consist of only read and write instructions.

Conflicting Instructions

指令冲突。当且仅当有两个以上的事务中的指令访问了同一条数据,且至少一条指令是写这条数据。

Instructions lil_i and ljl_j of transactions TiT_i and TjT_j respectively, conflict if and only if there exists some item Q accessed by both lil_i and ljl_j and at least one of these instructions wrote Q.

Conflict Serializability

如果一个调度 S 可以通过一系列非冲突指令的交换转换为调度S ',我们就说 S 和 S‘ 是冲突等价( Conflict equivalent )的。

If a schedule S can be transformed into a schedule S’ by a series of swaps of non-conflicting instructions, we say that S and S’ are conflict equivalent.

Schedule S is conflict serializable if it is conflict equivalent to a serial schedule.

Example:

T1T_1 T2T_2 T1T_1 T2T_2
Schedule1 Schedule2
read(A) read(A)
write(A) write(A)
read(A) read(B)
write(A) write(B)
read(B) read(A)
write(B) write(A)
read(B) read(B)
write(B) write(B)

Schedule1 can be transformed into Schedule6, a serial schedule where T2T_2 follows T1T_1 by series of swaps of non-confliction instructions. Therefore Schedule1 is confict serializable.

T1 T2
read(A)
write(A)
write(A)

This schedule is not conflict serializable.

Unabe to swap instructions in the above schedule to obtain(获得) either the serial schedule < T1T_1, T2T_2 >, or the serial schedule < T2T_2, T1T_1 >

直观理解:两个 write 的顺序不能调换,调换后这个调度执行后 A 的值和原调度不一样。

理论理解:T1 T2 同时对 Q 进行操作,且都存在 write, 属于冲突指令,这两个指令顺序不能调换。

View Serializability

视图可串行化

S and S’ are view equivalent if the following three conditions are met, for each data item Q

  1. 如果在 S 中某个事务读取了数据 Q 的初始值,那在 S’ 中它同样需要读取这个初始值

    If in schedule S, transaction Ti reads the initial value of Q, then in
    schedule S’ also transaction Ti must read the initial value of Q.

  2. 如果 S 中某个事物 T1 读取了 T2 写后的数据,那再 S’ 中 T1 同样需要读取 T2 写后的条目

    If in schedule S transaction Ti executes read(Q), and that value was
    produced by transaction Tj (if any), then in schedule S’ also
    transaction Ti must read the value of Q that was produced by the
    same write(Q) operation of transaction Tj .

  3. 如果 S 中一个事务是最后一个对指定数据执行写操作的,那在 S’ 中它同样需要对该数据执行最后的写操作。

    If in schedule S transaction Ti executes read(Q), and that value was
    produced by transaction Tj (if any), then in schedule S’ also
    transaction Ti must read the value of Q that was produced by the
    same write(Q) operation of transaction Tj .

View equivalence is also based purely on reads and writes alone.

A schedule S is view serializable if it is view equivalent to a serial schedule.

Every conflict serializable schedule is also view serializable.

有盲写的视图可串行化都不是冲突可串行化。

Every view serializable schedule that is not conflict serializable has blind writes.

Example:

T1 T2 T3
read(Q)
write(Q)
write(Q)
write(Q)

T1 在读之后和 T2 再也没有读操作,所以他们之间的 write 顺序可以任意调整,仍满足调度的视图等价:

  • T1 仍然读取了 Q 的初始值
  • T3 仍然是最后一个写 Q 的
  • 在这期间没有事务在别的事务写后读,所以只需考虑以上两个条件即可。

但是这个调度不是冲突可串行化的,因为 T1 的 write(Q) 和 T2 的 write(Q) 属于冲突指令,无法交换位置。

Precedence graph

A direct graph(有向图) where the vertices(节点) are the transactions.

Test for Conflict Serializability

一个调度是冲突可串行化的当且仅当他的有限图是无环的

A schedule is confict serializable if and only if its precedence graph is acyclic.

Cycle-detection algorithms exist which take order n2n^2 time, where n is the number of vertices in the graph.(Or take order n + e with a better algorithm. e is the number of edges)

If precedence graph is acyclic, the serializability order can be obtained by a topological sorting(拓扑排序) of the graph.

拓扑排序:在一个有向图中,对所有的节点进行排序,要求没有一个节点指向它前面的节点。

Test for View Serializability

The problem of checking if a schedule is view serializable falls in the class of NP-complete problems(NP 完全问题). Thus, existence of an efficient algorithm is extremely unlikely.

However practical algorithms that just check some sufficient conditions for view serializability can still be used.

Recoverable Schedules

需要解决并发事务失败带来的影响。

Need to address(处理) the effect of transaction failures on concurrently running transactions.

可恢复的调度就是指如果一个事务 T1 使用了另一个事务 T2 写入的数据,那 T2 一定要比 T1 先提交( commit ),否则若 T2 发生 Abort 则 T1 会读到非一致性的数据库信息。( T2 已经回滚,而 T1 读了回滚前的数据 )

Recoverable schedule — if a transaction TjT_j reads a data item previously written by a transaction TiT_i , then the commit operation of TiT_i appears before the commit operation of TjT_j.

Cascadeless Schedules

**Cascading rollback **

级联回滚

A single transaction failure leads to a series of transaction rollbacks.

Can lead to the undoing of a significant(相当数量的) amount of work;

significant adj. 显著的,相当数量的;重要的,意义重大的; 别有含义的,意味深长的

**Cascadeless Schedules ** cascading rollbacks cannot occur.

对每一对事务,如果其中一个事务读取了另外一个事务写入了的数据,那后者的提交一定在前者读取这个数据之前。

For each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.

Every Cascadeless schedule is also recoverable

Concurrency Control

A database must provide a mechanism that will ensure that all possible schedules are:

  • either conflict or view serializable
  • recoverable and preferably cascadeless

测试一个调度可串行化与否必须在其开始执行之前。

Testing a schedule for serializability after it has executed is a little too late!

并发控制不进行可串行化测试反而规定了一系列规则去避免不可串行化的调度。

Concurrency control protocols (generally(通常)) do not examine the precedence graph as it is being created. Instead a protocol imposes a discipline that avoids non-serializable schedules.

Goal – to develop concurrency control protocols that will assure serializability.

Weak Levels of Consistency

  • Some applications are willing to live with weak levels of consistency, allowing schedules that are not serializable.

  • Tradeoff accuracy for performance 在性能和准确率上的权衡

Levels of Consistency in SQL-92

  • Serializable — default
  • Repeatable read — **only committed records to be read. **
    • Repeated reads of same record must return same value.
    • However, a transaction may not be serializable – it may find some records inserted by a transaction but not find others.
  • Read committed — only committed records can be read.
    Successive(连续地) reads of record may return different (but committed) values.
  • Read uncommitted — even uncommitted records may be read.

Lower degrees of consistency useful for gathering approximate information about the database

Warning: some database systems do not ensure serializable schedules by default
E.g., Oracle (and PostgreSQL prior to version 9) by default support a level of consistency called snapshot is olation (not part of the SQL standard)

Transaction Definition in SQL

  • Transaction begins implicitly.

  • A transaction in SQL ends by:

    • Commit work commits current transaction and begins a new one.
    • Rollback work causes current transaction to abort.
  • In almost all database systems, by default, every SQL statement also commits implicitly if it executes successfully.
    Implicit commit can be turned off by a database directive(隐式提交可以通过数据库指令关闭)

  • Isolation level can be set at database level

  • Isolation level can be changed at start of transaction