Saturday, October 11, 2014

Linux Delayed Writeback

Note that write() returns once the kernel has copied the data into a kernel buffer (the page cache). The data may not have reached the disk yet. Dirty buffers are batched and written out at a later time (writeback).
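A minimal sketch of the difference, using Python's thin wrappers over the system calls. The helper name `durable_write` is hypothetical. `os.write()` returns once the bytes are in the page cache, and `os.fsync()` is what blocks until the kernel reports that the file's dirty pages have been written back.

```python
import os


def durable_write(path: str, data: bytes) -> int:
    """Write data, then wait until the kernel reports it is on stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # Returns as soon as the bytes are copied into the page cache;
            # the pages are now dirty but not necessarily on disk.
            written += os.write(fd, data[written:])
        # Blocks until this file's dirty pages (and metadata) are written back.
        os.fsync(fd)
        return written
    finally:
        os.close(fd)
```

Without the `os.fsync()` call the function would still return the same byte count, which is exactly why a successful write() says nothing about durability.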

Delayed writeback does not affect a subsequent read(), which returns the updated data from the dirty buffers rather than the copy on disk.
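A small sketch of this behaviour, assuming a local filesystem. The helper `write_then_read` is hypothetical. It never calls fsync(), yet a read through a separate file descriptor sees the new bytes because both are served from the same page cache.

```python
import os


def write_then_read(path: str, data: bytes) -> bytes:
    """Write without fsync(), then read the data back through a new descriptor."""
    wfd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Assumes a small buffer, so a single write() copies all of it.
        os.write(wfd, data)
    finally:
        # On local filesystems close() does not force writeback either,
        # so the pages may still be dirty at this point.
        os.close(wfd)
    rfd = os.open(path, os.O_RDONLY)
    try:
        # Served from the (possibly dirty) page cache, not from the disk.
        return os.read(rfd, len(data))
    finally:
        os.close(rfd)
```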

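If the system crashes, data in dirty buffers is lost. Another problem with delayed writeback is that it does not preserve the order in which writes were issued. For databases, this can cause data-integrity problems: for example, a data page may reach the disk before the log record that describes the change.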
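An application that needs ordering has to impose it itself. One common approach, sketched here under the assumption of a simple write-ahead scheme, is to fsync() the log before touching the data. The function `ordered_update` and its journal/data file layout are hypothetical.

```python
import os


def ordered_update(journal_path: str, data_path: str,
                   record: bytes, payload: bytes) -> None:
    """Make the journal record durable before the data file is modified."""
    jfd = os.open(journal_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(jfd, record)
        # Barrier: without this fsync(), writeback could flush the data file
        # first, and a crash in between would leave an unlogged change.
        os.fsync(jfd)
    finally:
        os.close(jfd)

    # Only now modify the data file, and make it durable as well.
    dfd = os.open(data_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(dfd, payload)
        os.fsync(dfd)
    finally:
        os.close(dfd)
```

For a newly created file, the containing directory generally also needs an fsync() for the new directory entry to be durable; that step is left out of this sketch.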

Also, if an I/O error (e.g. a disk failure) occurs later, when the data is written out to disk, it may not be possible to report the error back to the originating process, which may already have terminated. In fact, a dirty buffer may contain updates from multiple processes.
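The usual way for a process that is still running to learn about such a failure is the return value of fsync(), which reports errors from deferred writeback (for example EIO or ENOSPC) rather than from the earlier write(). The sketch below assumes a reasonably recent Linux kernel. `WritebackError` and `write_checked` are hypothetical names.

```python
import errno
import os


class WritebackError(Exception):
    """Hypothetical error: data may not have reached the disk."""


def write_checked(path: str, data: bytes) -> None:
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # Assumes a small buffer; succeeds once the data is in the page cache.
        os.write(fd, data)
        try:
            # A failed writeback of this file's dirty pages surfaces here.
            os.fsync(fd)
        except OSError as e:
            # Do not assume a retried fsync() would succeed: the kernel may
            # already have discarded or marked clean the pages that failed.
            name = errno.errorcode.get(e.errno, str(e.errno))
            raise WritebackError(f"{path}: writeback failed ({name})") from e
    finally:
        os.close(fd)
```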

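To limit the risk, the kernel writes out dirty buffers periodically. Dirty data older than /proc/sys/vm/dirty_expire_centisecs becomes eligible for writeback, and the flusher threads wake up to look for such data every /proc/sys/vm/dirty_writeback_centisecs. Both values are in hundredths of a second.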
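A small sketch for inspecting these tunables. The helper `read_centisecs` is hypothetical. It converts a value in centiseconds to seconds, and its `base` parameter exists only so the function can be pointed at a directory other than /proc/sys/vm. On many systems the values are 3000 and 500, i.e. 30 s and 5 s.

```python
from pathlib import Path

VM = Path("/proc/sys/vm")


def read_centisecs(name: str, base: Path = VM) -> float:
    """Read a /proc/sys/vm tunable given in centiseconds and return seconds."""
    return int((base / name).read_text().strip()) / 100
```

On a Linux machine, `read_centisecs("dirty_expire_centisecs")` and `read_centisecs("dirty_writeback_centisecs")` show how long data may sit dirty and how often the flushers check for it.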

Page writeback is carried out by a set of kernel threads called flusher threads. Each backing device has its own flusher, so writeback to different devices proceeds in parallel. This fixes a deficiency of older Linux kernels (pdflush, and bdflush before it), whose threads could end up working on one device at a time and spent much time waiting, which caused dirty pages to build up in high-volume environments.
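One rough way to watch for such a build-up is to sample the `Dirty` and `Writeback` counters (in kB) in /proc/meminfo. If `Dirty` keeps growing, writeback is not keeping up. The parser below is a hypothetical helper. It takes the file's text as a string so it can be tested without a live system.

```python
def dirty_stats(meminfo: str) -> dict[str, int]:
    """Extract the Dirty and Writeback counters (kB) from /proc/meminfo text."""
    stats: dict[str, int] = {}
    for line in meminfo.splitlines():
        key, _, rest = line.partition(":")
        # Exact match, so related keys such as "WritebackTmp" are skipped.
        if key in ("Dirty", "Writeback"):
            stats[key] = int(rest.split()[0])
    return stats
```

For example, `dirty_stats(open("/proc/meminfo").read())` can be sampled every few seconds while a heavy write workload runs.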
