I was doing some research over the weekend on how transactions work “under the hood” in MySQL. I thought it might be enlightening if I wrote about what I found.
The database system must resolve the situation where, for whatever reason, the database server stops working in the middle of a transaction. Perhaps the power goes out, the hard drive crashes, or the CPU disappears in a cloud of smoke. Maybe the MySQL Server itself crashes for some reason. What does the MySQL Server do when operation resumes?
Potentially, there are transactions in an inconsistent state. How are inconsistent transactions resolved? They are resolved through a recovery process involving log files that are called transactional logs. There are two kinds of transactional logs: redo logs and undo logs.
Redo logs are used to apply changes that were made in memory but not flushed to the permanent table records (for example, InnoDB’s permanent table records are the ibdata files.) This situation would typically happen after a MySQL Server crash. Logging is done this way because it allows faster database operation.
This might be seem counter-intuitive at first look. Instead of writing to one file (the permanent table records file) you are writing to two different files (the redo log file and the permanent table records file). The difference is that the writes to the database table file(s) are random in nature while the writes to the redo log file will be sequential, which is usually going to be much faster. You perform the much-faster writes to the redo log as needed and then perform the slower writes to the table files periodically when there is time. Thus, the system actually operates faster writing to both files rather than only one file. This applying of the redo logs occurs automatically on startup of the MySQL server after a crash. Very similar processes are used by other database platforms.
In addition to the redo log there must also be undo logs. When a database user starts a transaction and executes some commands, the database does not know if the user will end the transaction with a COMMIT
or a ROLLBACK
command. Ending with a COMMIT
means all the changes made in the course of the transaction have to be preserved (fulfilling the Durable aspect of ACID). If the transaction gets interrupted for some reason, such as the MySQL daemon crashing, the client disconnecting before sending a COMMIT
, or the user issuing a ROLLBACK
command, then all changes made by the transaction need to be undone.
If the server crashed, the redo log files are applied first, on start up. This puts the database in a consistent state. Now the database server will need to roll back the transactions which were not committed but had already made changes to the database. Undo logs are used for this. As an example, if you are running a transaction that adds a million rows and the server crashes after eight hundred thousand inserts are performed, the server will first use the redo log to get the database server into a consistent stand than then will perform a rollback of the eight hundred thousand inserts using the undo logs. For InnoDB, this undo information is stored in the ibdata file(s). For Falcon the undo log information is stored in the serial log file.
Transactional logs have some common characteristics. The log records each contain Log Sequence Numbers (LSN) which are unique IDs for each record. There is a also a Previous Log Sequence Number (Prev LSN), which is a link to the last log record. A Transaction ID number is used to reference the database transaction generating the log record. Finally, checkpoints are entries in the transactional logs that are used to speed up the processing of those logs when they are needed. Checkpoints contain a list of transactions that are open when the checkpoint was created. This is used by the server to determine how far back into the log it must go to begin processing.
(A small side note. Some people get confused thinking the binary log files are used in this process. That is not true. The binary logs are used only for replication purposes and for manual data recover situations, such as when a backup is restored and you need to “catch the server up” using the binary logs. The server does not use these files in the transactional recovery process.)
That completes this general overview of the transactional logs in MySQL (or any modern RDBMS really). I hope you find it useful.
6 Comments. Leave new
From my understanding, Falcon’s serial log file contains redo and undo log information both. Falcon writes committed data and records into both serail log file and data file. Falcon writes uncommitted data only to its serial log file. If the transaction is rolled back, uncommitted records are not transferred, just truncated. So the cost of rollback is very cheap for Falcon.
The binary logs are used in the process starting with MySQL 5.0. This is what broke group commit for InnoDB. Two-phase commit is used internally so that the binlog and InnoDB can stay on sync. On crash recovery, InnoDB is given the status of the most recent transaction(s) to determine whether it should rollback or commit transactions in state PREPARE within InnoDB.
Mark is correct:
From the binary log manual page https://dev.mysql.com/doc/refman/5.0/en/binary-log.html
Updates to non-transactional tables are stored in the binary log immediately after execution….
Within an uncommitted transaction, all updates (UPDATE, DELETE, or INSERT) that change transactional tables such as BDB or InnoDB tables are cached until a COMMIT statement is received by the server. At that point, mysqld writes the entire transaction to the binary log before the COMMIT is executed. When the thread that handles the transaction starts, it allocates a buffer of binlog_cache_size to buffer statements. If a statement is bigger than this, the thread opens a temporary file to store the transaction. The temporary file is deleted when the thread ends.
Modifications to non-transactional tables cannot be rolled back. If a transaction that is rolled back includes modifications to non-transactional tables, the entire transaction is logged with a ROLLBACK statement at the end to ensure that the modifications to those tables are replicated.
Also, note that transactions are completely separate from whether or not the table(s) involved are InnoDB — I can do a long transaction using MyISAM tables and the redo and undo logs are never touched. Conversely, even when autocommit is on, the redo logs are written to upon [implicit] transaction commit…..
One more thing worth mentioning for Innodb is yet another short term logging which is happening – double write buffer.
This is really yet another log used to deal with partial page writes :)
Another thought – take a look at Maria as well. It also has logging (done in its own way) and with pace Falcon was moving for last years it may become production ready at about same time.
Mark,
Can I get a bit more clarification please? From what I can tell InnoDB does use the binary logs (for transactional logging), but only when running XA transactions correct? So if you are running a more “normal” setup than MySQL does not use the binary logs for transaction logging. The only time it does is when performing XA transactions involving multiple servers?
thanks,
Keith
This is great to know that MySQL also has transaction log. Thanks