Record: May 2012

Tuesday, May 29, 2012

Chord and Harmony

Chord is a sound made by 3 or more notes playing at the same time. Harmony is a succession of chords. Harmony is usually form a backdrop to melody. Some chord is pleasantly sound and some are not. A sequence of anxious chords build up tension before it is relieved by harmonious combination.

Rhythm guitar plays chords to accompany melody. Simple chords contain don't involve notes right from each other as such combination is harsh sounding. It is common though to build a harmoniously chord and add a crashing note.

Playing a chord one note at a time is called arpeggio. The most complex type of harmony is called counterpoint of which a melody is accompanied by another melody. Playing the same melody with delay is called canon. A slightly clever form of canon is to start the second run in higher or lower note.

Fugue is a piece which relies on the interplay between counterpoints as its main content.

Sunday, May 27, 2012

Join Strategies

Join strategies depends on indexes, table sizes and selectivity (measure of distinct values in an index).

Nested-loop Join

for each row in outer table {
    for each row in the inner table {
        if outer table join column matches inner table join column then add to temp table
        else fail
    }
}

The smaller size table should be the inner table so that it can be cached and reduce the number of page read. If cache is too small, the DBMS needs to look for index to avoid scanning every row in the inner table. Thus the table with good index should be the inner table. Having good indexes in both tables is not necessary useful as the outer table only goes through once. If there are multiple join columns, a compund index (c1, c2) is needed.

If there is a restrictive expression in the SQL, for example

select * from table1, table2
    where table1.c1 = table2.c1
    and table1.c2 = "abc"

the table with the restrictive expression should be the outer table. If table1 is the inner table, the restrictive expression will need to be applied every time through the inner loop.

To speed up the inner loop, IBM hybrid join sort data so outer table access will be in the order of the join column on the inner table. Scan the inner table and apply the join column, build a temporary table by appending the RID of the inner table to the outer table row Soprt the temporary table by RID order (in other words, the outer table is now in RID order of the inner table). Retrieve the inner row based on the RID in the temporary table and form the result set. Because the retrieval is in RID order, performance increase with prefetch. Hybrid join is essentailly a nested join that requires the outer table be in sequence.

Sort-merge Join

/*sort phase*/
sort table1
sort table2

get first row of table1
get first row of table2

/*merge phase*/
for (;; until no more rows in tables) {
    if (join coumn in table1 < join column in table2) then get next row in table1
else if (join column in table1 > join column in table2) then get next row in table2
else if (join column in table1 = join column in table2) then {
        pass
        get next row in table1
        get next row in table2
    }
}

This method is always going forward. Compares to nested loop, sort-merge only scan the row once. The down side is the cost of sorting. If both tables are large, merge sort join is faster generally. However, DBMS prefers nested loop because it requires less RAM and no start up overhead. Sort merge join is favourable especially if both table are already in the right sequence (cluster index sequence).

Hash Join

for each row in inner, compute a hash
for each row in outer, compute a hash
    if hash value is in inner table hash bucket then pass
    else fail

RAM is required to hold the temporary table of hash. Therefore enough RAM is a pre-requisite condition for this join. The join must be a equal join. Also the inner table join column must be unique and can be uniquely hashed. Hash join is useful when the outer table is very large which means the inner table will be searched many time.

When more than 2 tables are joined, each join is processed separatedly. There is no such thing as nested loop join within a nested loop. One reason is that the joins could be processed in parallel. Another reason is that nested loop within nested loop could keep the inner table out of cache.

Sunday, May 20, 2012

Sorting Characters

Binary sort are sorts of the cods used to store the character. Binary string comparison are twice as fast as dictionary-order string comparison on average but the order of rows may not be desired.

Dictionary sorts require a conversion step before comparison as no code page stores characters in the desired order. Dictionary sorts are reasonably right (in desired order) with some compromise for speed.

Dictionary sorts with tie breakers return the desired order especially when the data contains accented characters and is case sensitive. There is up to 3 passes:

Primary sort orders by letters as mapped to some cononical form (i.e. A comes before B comes before C etc)

Secondary sort looks for accent distinction if primary sort values are equal

Tertiary sort looks for case distinction if primary and secondary sort values are equal (e.g. Smith is followed by smith followed by Soho)

Dictionary sorts without tie breaker are same as primary sorts.

Database SORT

Three factors affecting sort speed are

The number of rows selected
The number of columns in the ORDER clause
The defined length of the columns (not the actual length in case of VCHAR)

If the physical length of the record is long, DBMS may perform a tag sort in dynamic memory with only the necessary part of record.

Position of row when sorting column with NULL various from DBMS to DBMS.

Saturday, May 19, 2012

Sargable

The ideal SQL search condition is in the form of . The left side is a column name and the right side is a easy to lookup value. IBM called these kinds of condition "sargable predicates". SARG is short for "Searchable ARGument". Microsoft and Sybase later redefined the term as "can be looked up via index".

Each component of a search condition has a point count. The higher the count, the faster the component (i.e. imvolves fewer row or easier to compare). For operator, the relative sequence is as follow:

=
>, >=, <, <=
LIKE
<>

For operands, the relative sequence is as follow:

literal alone
column alone, parameter alone
multi-operand expression
exact numeric data type
other numeric data type, temporal data type
character data type, NULL

Optimization technique include
(1) Put multiple conditions in order by their relative point count

(2) Constant propagation (transitive rule) is to replace column name by literal

e.g. WHERE column1 < column2 and column1 = 5

to WHERE 5 < column2 and column1 = 5

(3) Evaluate expression and replace by literal

(4) Avoid function on column. Use LOWER instead of UPPER if no choice as UPPER will lead to lost information in some languages.

(5) For AND conditions, put the least complex first so that DBMS can eliminate the rest when evaluated to be false. For OR, put the most likely condition first.

(6) (Distrubtive rule) A and (B or C) is faster than (A and B) OR (A and C) as there are fewer evaluation steps.

(7) Avoid NOT

e.g. NOT (column1 > 5)

to column1 <= 5

(8) Use IN over OR

e.g. column1 IN (5,6) instead of column1 = 5 or column1 = 6

(9) DBMS will use index if LIKE pattern starts with a real character but avoid index if it starts with a wild card.

Note that LIKE 'ABC' is not same as = 'ABC' as LIKE takes trailing blanks into account and not =.

(10) Transform UNION to OR

Bottom Halves

The purpose of bottom halves is to perform any interrupt-related work not performed by the interrupt handler. The reason to limit the amount of work in interrupt handler because the handler runs with the current interrupt line disabled on all processors. Some interrupt handler that register with IRQF_DISABLED run with all interrupt lines disabled on all processors. Minimizing the time spent while intrrupt lines are disable improve system performance and responsiveness. Another reason is that intrrupt processing was triggered asynchornously. It is desirable to defer works as much as possible and return to the work being disrupted.

Defer work "later" means "no now". In other words, defer work to any point in the future when the system is less busy. The importance is to run these works when the interrupt lines are enabled.

The original infrastructure provided in Linux for implementing bottom halves is called BH. BH interface was simple. It provided a statically created list of 32 bottom halves for the entire system. The top half sleect the specific bottom half to run by setting a but in the 32-bit register. Each BH is globally synchornized which means no two BH could run at the same time, even on different processors. This simple way is inflexible and also a bottleneck.

Later on, task queue was introcduced as a replacement for BH. The kernel defines a family of queues. Each queue contained a linked list of functions to call. The functions in the queue are run at specific time depending on the queue they were in. Drivers registered their bottom halves in the appropriate queue. This is still inflexible to replace BH entirely and also not a light-weight enough mechanism to replace performance critical susbsystems such as metworking.

During Linux 2.3 kernel development, softirq and tasklets were introduced which can replace the BH interface. It is non-trivial to convert BH to softirq and tasklets becasue BHs are globally synchronized. Softirq are a list of statically defined bottom halves that can run simultaneously on any processor, even 2 of same types concurrently. Tasklets are dynamically created bottom halves based on softirq. However, 2 of same types cannot run together on the same processor. Therefore, tasklets are good trade-off between flexibility and performance.

Softirq are used when performance is critical. However, care must be taken as 2 of same softirqs can run concurrently. Also softirq must be statically registered at compile time. Ont the other hands, tasklet code can be registered dynamically.

In 2.5 kernel, work queue was intoduced. Work queu is a simple method of queuing work to be later performed in process context. Currently 3 methods exist for deferring work - softirq, tasklet and work queue. BH and taskqueu interface have been removed.

UNIX History

UNIX grew out of Multic, a failed multiuser OS in Bell Labs. In 1969, Bell Labs programmer sketched out a filesystem design that evolved to become UNIX. Thompson implement the system to an idle PDP-7. It was ported to PDP-11 in 1971. In 1973, UNIX was rewritten in C which pave the way of portability to various platforms. The first UNIX widely used outside Bell Lab was UNIX System, 6th edison (V6). Other companies ported UNIX to new machines. In 1977, Bell Labs released a combination of these variants into a single system called UNIX system III. In 1982, AT&T released System V (system IV was an internal development version).

One of the main contributor of development was University of Berkeley. Variants of UNIX from Berkeley was called BSD (Berkeley Software Distributions). 1BSD, released in 1977, was a collection of patches and additional software on the top of Bell Labs UNIX. 2BSD in 1978 continued this trend and include csh and vi. the first standalone BSD release was 3BSD in 1979 which added virtual memory (VM). A serious of 4BSD releases (4.0 to 4.3) added job control, demand paging and TCPIP. The final release is 4.4 in 1979 which featured a rewritten VM subsystrem. Development of BSD continues with the Darwin, FreeBSD, NetBSD and OpenBSD systems.

In early 1980 and 1990s, various companies introduced their own commercial versions of UNIX. These variants were based on either AT&T or BSD releases and features high-end features developed for their hardware architecture. Examples are Digital's Tru64, HP's HP-UX, IBM's AIX, Sequent's DYNIX/ptx, SGI's IRIX and SUN's Solaris and SunOS.

UNIX is relative simple with 100s of system calls comparing to other OS which have 1000s. Everything in UNIX is abstracted as a file (with some exception like socket) which simplifies manipulation with a handful of commands (open, close, read, write and lseek). UNIX also features quick process creation time and the unique fork() system call.

Oracle tablespace space management

Prior to Oracle 8.1.5, there was only dictionary managed tablespaces. Space in tablespaces is managed using a dictionary. Space to be allocated causes row to be deleted from 1 data dictionary table and added to another, and de-allocated (reverse). SQL executed by background process to get space is called recursive SQL. An INSERT can trigger other recursive SQL. Recursive SQL is expensive if frequent. Updates to the data dictionary must also be serialized.

Oracle 8.1.5 intoduce locally managed tablespaces. Instead of using data dictionary, it uses a bitmap stored in the data file to manage extent. It speed up operation as manipulating bit has lower overhead.

Oracle Storage Hierarchy

Segments are database objects (table, index, rollback, temproary, cluster segments etc) that consume storage. Segment contains one or more extents. An extent is a logically contiguous allocatiom of space in a file. Some segments start with 1 extent. Some segments start with 2 extents (e.g. rollback segment). Extent contains blocks, which are the smallest unit of allocation in Oracle. Block is also the unit of I/O in Oracle.

Database may have more than 1 block size. This is mainly used for the purpose of data transportation than anything else. Tablespaces have only 1 consistent block size although the segments inside can have mutluple size.

Block contains a header which indicates the block type (table, index etc), transaction information regarding active or past transactions on the block and address on the disk. After that is Table Directory whcih contains information about the tables that store rows in this block. After that is Row Directory describing the rows stored in the block. After that is an array containing pointers to the actual rows in the block. The rest of the block contains data and free space.

Database comprises tablespaces. Tablespace is a container that holds segements. Segments never cross tablespace boundary. Tablespaces comprises one or more files. Files can be raw or cooked file systems. An extent within a segment will be contained entirely within one file. A segment can have extents from different files.

In summary:

Database contains tablespaces
Tablespace contains segements
Tablespace contains files
Segment contains extents
File contains extents
Extent contains blocks

Record