Sunday, May 20, 2012

Sorting Characters

Binary sorts order strings by the codes used to store each character.  Binary string comparisons are, on average, about twice as fast as dictionary-order string comparisons, but the resulting row order may not be the desired one.
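As a minimal sketch of what a binary sort does, the Python snippet below orders a few sample words by their encoded bytes. The word list is made up for illustration, and UTF-8 is an assumed encoding; any code page would show a similar effect.

```python
# Sample data (hypothetical, for illustration only).
words = ["apple", "Zebra", "banana", "Apple"]

# A binary sort compares the stored codes directly.
# Here each string is compared by its UTF-8 bytes.
binary_order = sorted(words, key=lambda s: s.encode("utf-8"))

print(binary_order)
```

Every uppercase letter has a lower code than every lowercase letter in ASCII, so "Zebra" lands ahead of "apple". That is fast to compute but rarely what a reader expects.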

Dictionary sorts require a conversion step before comparison because no code page stores characters in the desired order.  They return a reasonably correct result (close to the desired order), giving up some accuracy for speed.
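The conversion step can be sketched as a key function that maps each string to a comparable form before sorting. The helper name primary_key is hypothetical, and stripping accents through Unicode decomposition plus case folding is only one simple way to approximate the mapping; real collations use language-specific tables.

```python
import unicodedata

def primary_key(text):
    """Map text to an accent-free, case-free form (hypothetical helper)."""
    # Split accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping only base characters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that "A" and "a" compare equal.
    return base.casefold()

# Sample data (hypothetical, for illustration only).
words = ["Zebra", "apple", "\u00c9clair", "banana"]

dictionary_order = sorted(words, key=primary_key)
print(dictionary_order)
```

The extra pass over each string is the cost that makes a dictionary comparison slower than a binary one.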

Dictionary sorts with tie breakers return the desired order, especially when the data contains accented characters and the comparison is case sensitive.  There are up to three passes:

Primary sort orders by letters as mapped to some canonical form (e.g. A comes before B, which comes before C, and so on)

Secondary sort looks for accent distinctions if the primary sort values are equal

Tertiary sort looks for case distinctions if the primary and secondary sort values are equal (e.g. Smith is followed by smith, which is followed by Soho)

Dictionary sorts without tie breakers are the same as primary sorts.
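The three passes can be approximated in Python with a tuple key, since tuples compare element by element and only look at a later element when the earlier ones are equal. This is a rough sketch rather than a real collation: the function name collation_key is hypothetical, and it assumes that unaccented letters sort before accented ones and uppercase before lowercase, which matches the Smith/smith example but is not true of every collation.

```python
import unicodedata

def collation_key(text):
    """Build a (primary, secondary, tertiary) key (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    primary = base.casefold()          # letters only: no accents, no case
    secondary = decomposed.casefold()  # accents kept, case ignored
    tertiary = decomposed              # accents and case both kept
    return (primary, secondary, tertiary)

# Sample data (hypothetical, for illustration only).
words = ["Soho", "smith", "Smith", "r\u00e9sum\u00e9", "resume", "Resume"]

# With tie breakers: all three levels are consulted.
with_tie_breakers = sorted(words, key=collation_key)

# Without tie breakers: only the primary level is used, so strings that
# differ only by accent or case stay in their input order.
primary_only = sorted(words, key=lambda w: collation_key(w)[0])

print(with_tie_breakers)
print(primary_only)
```

With tie breakers, Smith comes before smith and both come before Soho. With the primary key alone, smith and Smith are treated as equal and simply keep the order they had in the input.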
