Tuesday, July 7, 2015

Windows System Calls

User mode program access system services via the API exposed in ntdll.  ntdll places the number representing the target system service defined in SSDT (System Service Dispatch Table). into eax.  It then call a common stub to transit to kernel mode

mov eax,18Ch ; target services
mov edx, offset ..... ; common stub address
call [edx]

The common stub resides in the ntdll!FastSystemCall module.  It executes the sysenter (or syscall for x64) to switch to kernel mode.  It will also save the esp (user mode stack pointer) to edx.  sysenter will load the cs, eip and esp (kernel mode stack) from the corresponding msr (model specific register).

Eventually the call will end up in the corresponding kernel mode modul in nt executive.



Sunday, May 31, 2015

Finding Dialog Box in Process Dump

If the application is a 32-bits application running in 64-bits windows, switch to 32-bits mode to see the stack.  Otherwise, it will display the 64-bit stack when in windbg

!wow64exts.sw

Firstly display the stack of all threads to check for calls to any dialog box function

~*kb

Dump the memory address (length set to 100 bytes) to see the text in the dialog box

db 07b435c8 L100 or dc 07b435c8 L100

The !peb shows many useful information including number of processor, computer name, the name of the application executable file name etc


Saturday, May 16, 2015

Servlet Container

Web server accept a request in form of URL and returns the resouce request.  The resouce could be static content which will not change no matter how many times it is request.  Examples of static content are image, icon, text etc.  When wen server received a request for dynamic content, it hands off to another component such as a servlet.

Servlet is basically a java class which can be specified as a resource in the URL.  Web server will interface to a servlet container which manages (initializes, dispatches, destroys) the life cycle of a servlet, according to the servlet specification.  The servlet container also supports JSP file that commonly used by servlet.

Most of the time, the servlet generates HTML codes as part of the response.  In fact, most of the response comprises mainly static HTML codes with a few dynamic content.  Generating the response page via program writing out HTML code in string form is tedious and the code is hard to maintain.  A JSP file is a template comprises mostly the static HTML codes with a few funciton calls to incorporate in the dynamic content.  The JSP file will be compiled by the serlvet container transparently into a servlet to run.

The servlet container has the following components:
(1) Connector allows client to connect to the container in various protocol such as HTTP.  Container interacts with a pool of threads which is assigned to each client connection.  Servlet is expected to be thread-safe and runs in a multithreading environment.

(2) The main container contains level and level of subcontainers - Servlet Engine, Virtual Hosts, Application Context and Servlet Wrappers, used to process requests

(3) Supporting functions such as  security, JSP compiler, JMX, clustering etc.

Tomcat History

Tomcat is a reference implementation of JSP and servlet specification.

Tomcat was conceived by James Davidson in late 1998 as the core of the JavaServer Web Development Kit.  Tomcat was donated by then SUN Microsystems to Apache.  Apache discontinued the upgrading of JServ, a free servlet container, to support the new servlet 2.1 standard and adopted Tomcat.  Tomcat was released as 3.0 a sa successor to JSWDK 2.1.

Oracle Locally Managed TS

It uses bit map to manage free space in each of the tablespace, instead of linked list as in dictionary managed TS.  This avoid concurrency problem with dictionary and more efficient.

Oracle Managed File
This feature simply storage management for simple database.  Space will be removed when the TS is dropped.  The drawback is that it cannot stip across multiple disks (but can be used if the SAN provide transparent striping).

XML vs JSON

XML is heavily formatted.  The tag to data ratio can be quite high.  Data can be encoded between tag or as attribute of a tag.  Parsing the data from XML required knowledge of the layer and quite an amount of coding.

JSON on the other head encode the data a Javascript array format in a string representation.  JSON file can be converted to a Javascript array object easily using eval() call in Javascript.

Communicating to server side from browser

The first method is using XHR (XMLHttpResquest) call.  XHR allows the script to open up a http connection to the server and send in a GET or POST request.  One limitation is that the request could only go to the same domain as the script.

The second method is to create a script element in DOM.  This causes the browser to download the js file which could be formulated to contain data.  This method allow to access data in another domain. However, there may pose a security risk as the js file could be downloaded by others.

Browser Reflow and Repaint

Once all elements are downloaded, a browser will create a DOM tree and a rendering tree.  The rendering tree is based on the DOM tree structure.  Some DOM elemenet may not have a corresponding rednering tree nodes but those that have equivalence will have one or more nodes in the rendering tree.

When the DOM changes that affected the display geometry of element, some branches of the rendering tree will be invalidated.  A reflow will be done to rebuild that part of the rendering tree.  A repaint will then be followed to correct the display to match the changes.

Reflow happens in situation such as visible elements are added, position change, size change, content changes (e.g. text changes and window resizing etc.)

Document Object Model (DOM)

DOM is a language independent API for working with XML and HTML.  In browser, DOM is usually implemented in JavaSCript.  Major browser implement the Javascript engine and DOM separately.  For example, IE's JavaScript engine is call JScript (jscrirpt.dll).  The DOM API is called Trident (mshtml.dll).  In Webkit (Safari), the JavaScript engine is called SquirrelFish and the DOM engine is called Webcore. For chrome, the JavaScript engine is called V8.  Chrome uses Webcore for rendering as well.  For Firefox, JavaScript engine is called SpiderMonkey (or TraceMonkey in latter version) and the DOM API is implemented in Gecko.

Because of the separation of the script engine and the DOM/rendering engine, there are overhead in manipulating the DOM tree as every move need to be accomplished by procedure calls.

JavaScript Prototype Chain

A prototype is a template to construct an object.  All objects in Javascript is has a base prototype which is the JavaScropt Object.  Object inherit the methods (e.g. ToString) and properties of the prototype object.

Each object has an internal property called _proto_ which point to the prototype object, which in turn points to the Object which is the protoype for all other objects in Javascript

For example, the following defines a constructor for object car and a method

function car(vendor, model, year) {
    this.vendor = vendor;
    this.model = model;
    this.make = year;
}

car.prototype.checkcurmodel = function() {
    if this.make = "2015" {alert("current year");
}

The following defines 2 cars

var car1 = new car("toyota", " corolla", "1995");
var car2 = new car("vw", "passat", "2014"_;

car1 and car2's _proto_ field points to car.prototype object.  The constructor car also contains a _proto_ pointing to the car.prototype object.  The car.prototype object contains a constructor field pointing to the car constructor.  It also has a _proto_ field point to the Object prototype which contains base methods such as toString etc.

Accessing field and calling method require a search of the prototype chain.

Javascript Closure

Closure is a feature in Javascipt that allows an inner function access variables of its parent function.  The variables will continue to exist even if the parent function has ended.  In other words, the variable act like a global variable but only accessible to the inner function but unlike a global variable, the variable could not got messing up by other sibling functions.

For example, f is a closure.

var f = ( fn() {
   var a = 0
   return fn() { return a++;}) ();


f();

The variable a will increase by 1 each time f is called and a is protected from other functions.

When the function is called, it create an activation record.  The closure will also push this activation record into its code chain.  When the function returns, the activation record will remain as it is still referenced by the closure.  (The closure will have 2 variable objects - the top is the activation record of the function and the bottom is the global object.)

JavaScript Scope Chain

Every function in Javascipt is represented by an object.  Objects are just name-value pairs.  When the value is a function, the name represents a method.  When the function is executed, it is considered as an instance of the function.

The function object contains a [[Scope]] attribute which point to the scope chain used to resolve identifier (variable, properties, function members) when the function executes.  The [[Scope]] points to a scope chain table which in turn points to a variable object.  The variable object contain a list of variables.

When a function is declared, the scope chain is set up pointing to one variable object - the global object which contains global variables such as "window" (the window object representing the browser window and frames within), navigator (represent the browser such as codename, version properties) "document" etc.  When the function is later executed, a variable object (called activation record) is pushed on the top of the scope chain.  The activation object contains the local variable of the function. To resolve a variable, a search was done from the top of the chain down.  Therefore, a local variable of the same name as a global variable will always be used instead in the function.  When the function ends, the activation record will be removed from the scope chain.

To simplify coding, function and variable name can be shorthanded using the with statement.  For example,

function fn() {
    var doc = document;
    link = doc.getElementByTagName("a");

}

Can be shorthanded to

    with (document) {
        link = doc.getElementByTagName("a");
    :

Internally, the with statement will add a variable record on the top of the scope chain with the doc object to quicken resolution.  However, the activation record containing the local variable will be pushed down the scope chain and as a result the resolution overhead will increase.

Similar effect is also achieve with using try..catch block in the code.  The catch statement, when executed, will push an exception object onto the scope chain.

Blocking nature of JavaScript

When a HTML script tag is encountered in the HTML document, the browser will block all other activities until the script is downloaded (if it is an external link) and executed.  The reason is the script may change the html document (DOM) and thus changes the rendering of the change and behaviour of the page.

Therefore, it is desirable to place Javascript at the latter part of the tree to avoid long wait for user.  If the Javascipt is place at the front of the HTML document (such at the section), user may see a blank white page delay when the page is being loaded.

When there are successive Javascript files specified in the page, the second script file will be loaded after the first one has completed loading and execution.  If an inline script is placed after tag, the browser will also be blocked until the css file referred to by the link is download.  This is to ensure the inline script has the latest layout information to work with if required.

Saturday, April 11, 2015

C++ Dynamic Storage


Local variables are allocated in stack.  The new operator allocate storage in heap.  Unlike stack, heap storage persist beyond the current scope.  On the other hands, heap storage must be freed via the delete operator.

int* p = new(int);  // new return a reference of the newly allocated storage
*p = 100;

When an object contains pointer to heap storage, the default destructor will lead to memory lead as the pointer will be released but not the area it points to.  Therefore, a customized destructor must be written and it should use delete to remove the dynamic area allocated.

A copy constructor has the same name of the class and accept a object of the same class.  For example,

Class C1 {
...

string p*

C1(const C1 &Ca) {
    p = new string(*(ca.p));
    }
...
}

The copy constructor is use to copy a object to another object of the same class.  The default copy constructor does a shallow copy.  In other words, if the class contains a pointer to a dynamically allocated area, the copy will be copy to the target object but the dynamic area will not be duplicated.  Both objects will point to the same area.  When one of the object is released, the destructor may release the dynamic area.  The remaining object will then end up containing a dangling pointer.

C++ friend functions


Friend functions are defined within a class that enable these functions outside the class to access the function and data within the object.  The function can be a C++ function or a operator-overload function

Class C {
    friend void fn (...)  // a normal function
    friend a& operator<<(..._  // operator overloading of <<
    .... // normal class definition
}

Friend functions are declared outside of the class  

C Array


The name of an array is a constant pointer to the first element of the array.  We can pass an array to the function of the equivalent signature

int fn(int* const x)

int a[5] = [100, 200, 300, 400, 500];
int b
b = fn(a);

Because array name is a pointer, it can be returned by a function as such (beware of out-of-scope issue).

Pass By Pointer in C++


Passing argument to a function can be more efficient when using the pass-by-reference method, especially when the argument size is huge.

int fn(int& x, int& y)

The same level of efficiency can also be achieved using constant pointers.  You can change the value pointed to but not the pointer itself, similar to using references

int fn(int* const p, int* const q)

Similar to returning a reference in a function, be careful not to returning a pointer to an out-of-scope object. Commonly, the return pointer points to an area in heap allocated by the new operator.  Likewise, using a pointer to receive a returning pointer is efficient as it will not involve any copying of the object content.  Using a variable to receive a returning pointer will end up making a copy of the object the pointer points to.

C++ Pointers


Pointer contains address of another variable or object.  To point to an existing variable, assign the point using the address operator (&)

int a
int* b = &a

To access the variable pointed by a pointer, use the dereference operator (*)

int c = *b

A constant pointer cannot change to point to another variable

int* const b = &a
b = &c // generates error

A pointer to a constant cannot change the value of the variable the pointer points to

const int* b
*b = c // generates error

a constant pointer to a constant is most restrictive
const int* const b

C++ Reference


A reference is an alias for another variable.  It is declared using the & operator

int a = 100;
int& b = a

A reference must be initialized when it is declare or results in compilation error.

Reference can be used in place of the variable (as it is just like have another name for the same variable)

a += 1 or b +=1 yields the same result

To pass by reference in a function fn is efficient if the parameter size is large

int fn(int& x, int& y)
int i, j, k

i = fn(j, k)

fn can change the value of j and k in its code.  To prevent fn changing the arguments, declare the funciton using constant reference

int fn(const int& x, const int& y)

When a function returns a reference (e.g. int& fn(...)), it must make sure that the deference is not on a local variable as the memory storage will disappear when the fn goes out of scope.  It must return either a static variable or a reference to a parameter.  The calling program can assign the return reference value to another reference variable.  In this case, the return object is not copied

int& a = fn(...)  // no copy of content

If the calling program uses a variable to receive the return value, the content of the object will be copied

int a = fn (..) // content copy

Sunday, March 8, 2015

Kernel Address Space

Translating a virtual address to the physical address requires looking up of page tables.  To speed up translation, server will usually equipped with a TLB (translation lookaside buffer).  When a process is switched out to another process, the TLB will need to be flushed.  Context switching also happens in system call.  Some hardware architecture (e.g. SPARC) will have assist to preserve the TLB and some have not (e,g x86).  For the former, the kernel address space could be a separate address space from the user process.  For the latter, the entire virtual address space will partition into a kernel part and a user part.  Mapping the kernel space in the same address space as the user help to minimize the flushing of TLB during system call.

Sunday, February 22, 2015

Subjective and Objective Claims

Objective claim (empirical knowledge) is things out in the physical world that can be sensed or measured

Subjective claim is things that based on personal opinion, value, judgement and preferences.  These are claim that gives one purpose and meaning of life.

Knowledge

Knowledge is justified true belief:
(1) one believe it is the case
(2) with a basis or reason
(3) and it is in fact true

Belief + desire -> action

Knowledge does not equal to fact!

Bifurcation

Given a starting population with a certain reproduction rate and a death rate, we can predict (calculate) the population for specific generations.  If we plot the population by generation with a given birth rate, the trajectory will be linear in a straight line.

If we add an additional rule to confine the population by a carrying capacity, the trajectory changes to a point until it become impossible to predict.  In other words, a small variation in birth or death rate, the population for a specific generation will not be close together but may attain a very diverse value

The above can be modeled bu a logistic map formula.  We combine the birth rate with the death rate into a variable called R.  The carrying capacity is replaced by a variable called fraction of carrying capacity x. Logistic map is represented by the following formula: Xi+1    = R*Xi *(1-Xi )

(1) For a small value or R, e.g. 2, x will converge to a single value after a few generations.
(2) When R increases to a value of 3.1, the trajectory evolves to an oscillation of 2 values.  \
(3) As R increases further, the oscillation changes to among 4, 8 values etc.
(4) The spread of oscillation increases rapider and rapider as R picks up gradually.
(5) When R reaches about 3.569946, the trajectory becomes chaotic.  Even a small difference in the value of x (e.g. at the 10 decimal point) will produce a completely different trajectory.  In other words, the trend becomes unpredictable.

Because of this property, the logistic map is used as an algorithm to generate pseudo-random number.

Among the chaos, there are 2 universal characteristics observed:
(1) The trajectories follow a period-doubling (bifurcation) route to chaos. From a single value to 2, to 4, to 8 etc.
(2) As R increases, their bifurcation points move closer and closer. The rate of convergence is found to be a constant value of 4.6692016. This is called the Feigenbaum constant.

JavaScript functions and methods

Both do similar things.  Method is a different way to organize code.

JavaScript Anonymous Functions

Functions that are declared as they run and with no name.  For example,

(function () { ... });

One purpose of using anonymous function is to create a scope for local variables.

JSON (JavaScript Object Notation)

JSON is a format for representing data.

{
"plane" : [
"economy",
"business",
"first"
]
}

Rendering Engine in Browsers

The rendering engine creates the page appearance in browser.  There are a few rendering engines currently used in the market:

Firefox - Gecko
IE - Trident
Chrome, Safari, IOS and Android - Webkit
Opera - Presto

Types of JavaScript

(!) Inline JavaScript

about

(2) Embedded JavaScript



(3) External JavaScript

Specifying CSS in HTML

(1) Inline CSS refers to the use of style attribute in the HTML.  Inline CSS clauses are commonly generated by program logic such as JavaScript.  An example of inline CSS as follow,


">
(2) CSS can also be link to using a special HTML statement

abc

Saturday, February 21, 2015

HMTL5 Sections

Older HTML uses division
to structure the web pages into different sections.  HTML5 introduces new element such as
,

Friday, February 20, 2015

JavaScript Event Handler

Event handler allows specified Javascript function to be called when an event such as mouse click, happens.  Event handler must be hung onto a DOM tree.  For example,

var button = document.geElementById("submitbutton");
button.onclick = function() {...};

Event handler always starts with "on-something".  Each element can only has 1 event handler attached to it.  If you need to have 2 event handlers, for instance to validate the form and submit to server, you need to use event listener instead.

Changing structure of HTML

JavaScript provide calls to add and remove node from a HTML DOM.

createElement()
createTextNode()
appendChild()
removeChild()

For example,

var tagid = document.getElementById("about");  /*position to the tag*/
var p = document.createElement("p"); /*create a new element but not hang to the DOM tree yet*/
var textblk = p.creatTextNode("This is a newly created text block"); /*wrap it under a text node*/
tagid.appendChild(textblk); /*hang to the DOM tree*/

Once you have position to a node, you can use these calls to move around the position:

parentNode
previousSibling
nextSibling
firstChild
lastChild

For example,

document.getElementById("about").parentNode.setAttribute("class")

DOM Text Node

Text node refers to the text content under a tag.  JavaScript provide calls to manipulate the text block in HTML

document.getElementById("text-serction-id"),innerHTML = "new text here"

DOM Attribute Node

Attributes exists in HTML tag.  For example, class attribute in below tag

/local/small.jpg/' id=

JavaScript has the following call to manipulate the attribute in a specific tag.

hasAttribute() tests if the attribute exists in the tag
getAttribute() retrieve the value of the attribute such as "show" above
setAttribute() set a value to the attribute
removeAttribute() remove the attribute from the tag

For example,

document.getElementById("pic").getAttribute("class")

Prisoner's Dilemma

Two prisons accused on a crime are kept in separate confinement.  If A testifies against B and B keep silent, B would get a life sentence and vice versa.  If both testify against each other, both would get 10 years of imprisonment.  If both keep quiet silent, both would be charged with a lesser crime and get 5 years of imprisonment.  What would you do?

The prisoner's dilemma was postulated in 1950s during cold war to study how to achieve international cooperation.  Traditionally, most people believe one can only achieve cooperation unless there is a central effective governing bodies.  That is the backdrop of UN.  The prisoner's dilemma is to study how to cultivate cooperation without that.

Most people tend to choose defection (testify against).  However, if the game is play multiple rounds. the behavior will be affected by past result.  Different algorithms are used to simulate the game play for the best result.  And the best algorithm is called TIT for TAT.  In other words, the player will go for cooperation but if the other party defects, the player will defect to punish the other parties until the other parties cooperate again.  A good algorithm generally is nice and forgiving but also retaliatory - punish an opponent immediately.

One extension to the game is to add social norm.  Each player has a specific value of defect (boldness) and probability to punish a defective (vengefulness).  When a player defect and is witnessed by some other players, he may be punished.  If the vengefulness value is low (no social norm), defection will dominate tje community.  However, vengefulness is not sufficient to maintain cooperation reliably.  Adding punisher to punish player that does not punish defector will on the other hand promote cooperation.  The punisher is playing the role of metanorm.  For example, people giving leering look to parents that does not control their children.

Another extension is to add spatial consideration in the game.  Each player (in a grid) only interact with players around it.  Simulation found that cooperation and defect exist indefinitely without the need for norm or metanorm.

Thursday, February 19, 2015

DOM Element Nodes

By tagging an element node with an id, we can search directly to the specific node in the DOM tree via JavaScript functions:

document.getElementById("header") will position the access to


To access a particular type of element, use documnet.getElementByByTagName.  For example, to access the first paragraph in the document,

document.getElementByTagName("p")

The function returns a Nodelist.  To find out how many elements in the Nodelist:

document.getElementByTagName("p").length

Jump access over the list elements:

document.getElementByTagName.item(0) or document.getElementByTagName[0] for the first element item in the list.

Element can also be access via the CSS class name using getElementByClassName()

For example


DOM

HTML document is organized into a DOM (Document Object Model) tree.  This allows the HTML document to be manipulated programmatically.

A DOM tree consists of the following node types:

document node - the root of the tree
element nodes - HTML elements such as , , etc</p> <p> text nodes - the textual content of an element, for example the text between the <p> tags of <p> text node</p> </p> <p> attribute node - non-displayable attributes</p> <p> <br></p> <p> Both text nodes and attribute nodes are end or leaf nodes which no children.  They are also accessed using different methods in JavaScript.</p> <p> <br></p> <p> <br></p> <p> <br></p> <p> <br></p>

Sunday, February 8, 2015

Perfect Fifth and Forth

A perfect fifth is 7 semitones from the root note.   A perfect forth is 5 semitones from the root note.  If we inverted the perfect forth (i.e. lower it one octave and drops it below the root note), the perfect forth and the root note will form a perfect fifth interval.  Likewise if we invert the perfect fifth, we will get a perfect forth interval with the root note.

Therefore, the 3 notes have very strong relationship.

Progressive Enhancement and Graceful Degradation for Web Frontend Design

Web frontend refer to the webpages delivered to web browser.  Web frontend is typically rendered with a mixture of HTML, CSS and Javascript.  Progress enhancement is a concept to functionally separating the frontend into 3 layers.

Layer 1 defines the structure of the web pages .  This is achieved by using HTML.  The web page content is organized into different divisions or frames.  The areas can be tagged with an id.

Layer 2 defines the presentation layer.  Using CSS, the content of the web pages can be rendered according to the design.

Layer 3 defines the behaviour layer.  User interaction is enhanced using Javascipt.

A key point on the progressive enhancement design is to allow graceful degradation.  This means the user experience can degrades gracefully from higher layer to lower layer if the user platform does not support the upper layer feature used such as Javascript is disabled.  A web frontend designed with graceful degradation in mind will ensure the web pages can still be used by user meaningfully without the upper layer.

Friday, January 23, 2015

MLGPO

Multiple Local GPO has 3 layers:

Layer 1 computer node policy will be applied to all computer.  Layer 1 user node could be overridden by Layer 2

Application of Layer 2 policy depends on if the user is an administrator or normal user.

Application of Layer 3 policy depends on specific user.

The resultant policy is the sum of all three layers except those lower layer settings that are overridden by the upper layer.

Group Policy Nodes

Group policy has 2 branches or nodes - computer and user.  The computer node contains settings that are applied to computer (e.g. start up/shut down script, firewall etc).  The user node contains settings that are applied to users (e.g. profile, logon script etc)

Monday, January 5, 2015

Uses of spin locks

Shared memory typically protected by spinlock because the wait period is typically small compare to wait that result in context switching.

Linux fork()

Pending signals are cleared and files locks are released for the child process.  In older version of Linux, the page table is duplicated and the pages are copied page by page from the parent address space to the child's.
Nowadays, the copying is replaced by using copy-on-write feature which shorten the process creation time and minimize wastage of copying pages that are not required by the child.

vfork() was introduced in 3.0 BSD release.  vfork() stipulates that the caller must immediately call exec() or _exit() after vfork().  vfork() will also suspend the parent and so the child can share the parent memory without incurring the memory copy.  Consequently the child must not modify any memory.

Linux exec()

A family of exec calls build upon a single system call - execl().  The call accepts a string containing the path of the executable and 1 or more optional arguments to be passed to the program.  There is at least 1 argument which is the name of the program.  The execl() is varidic which means it accepts a variable number of arguments, to be terminated by a NULL char.

e.g. ret = execl("/bin/vi", "vi", "abc.txt", NULL);

In general, execl does not return as the program image will be replaced by a new one specified in the call.  A successful execl call also have the following effects:

- pending signals are lost (s original program is not there to handle it)
- the signal handler setting is reverted back to default
- memory locks are all freed, mapped file is freed
- thread attribute are returned to default
- process stat is reset
- atexit is reset

Some attributes will be retained across exec
- pid
- process priority
- owning user and group
- opened files which means the new program can access all files of its parent if it know the fd number.  More commonly, files are closed before the exec call.

Other exec calls are built upon execl():
- execlp - only specifies the file name and uses the PATH to resolve the full path name
- execle - also passes over the environment variables
- execv - uses vector (array) for argument passing
- execvp - uses file name and vector for argument passing
- execve - uses file name, vector and environment variables

e.g. const char *args[] = {"vi", "/abc.txt", NULL};
ret = execvp("vi",args);

Use of path has security risk: if the attacker manipulates the PATH variable, it could trick the application to execute a rogue replacement of a program resided in the head of the PATH chain.

Process Group

Each process is owned by a user and a user group defined in /etc/passwd ad /etc/group respectively.  Each process also belongs to one process group.  The child process will belong to the same process group as the parent.  All commands in a pipeline belongs to the same process group.

The process group is just a construct to make it easier to send signal or get information from a group of related processes.  A process group is related to a job from user's perspective.

Linux 2.6 IO Schedulers

2.6 kernel have 4 IO schedulers to choose from:

Deadline IO Scheduler - In addition to the standard sorted IO queue sorted by block number, the scheduler also maintains 2 additional queue - read queue and write queue.  A new request will insert into the standard queue and to the end of the read or write queue.  Each read or write queue item has a expiry time set and when it goes off, the schedule will schedule the item at the top of the queue (which is the oldest as the insertion is by submission time).  In other word, the schedule imposes a soft limit on the service time for each IO and at the same time minimizes seek using the sorted queue for most of time.

Anticipatory IO Scheduler - it's a common program behaviour to issue successive read calls (note that write is not relevant as write is not synchronized as read is).  Therefore, after the scheduler serviced a read from the read queue and goes back to the sorted queue, another read may come by in a short time.  The result is a constant shifting of disk arm to services between the sorted request and the periodic read requests.  As a result, IO throughput is constantly throttled by this behaviour.  The anticipatory scheduler starts off operating like the deadline scheduler.  However, it will wait up to 6ms after a read IO to see if there is another one coming in.  If yes, it will service the next read.  If no, it will return to the deadline scheudle routine and continue.

CFQ Scheduler - a queue is set up for each process in the system.  The schedule process each queue in turn for a timeslice.  When the timeslice end, the schedule moves on to the next process and thus make it "fair" for all processes in the system. If a queue is clear and the timeslice has not ended, the schedule will wait for 10ms to anticipate another read coming in.  The CFQ scheduler also favour read request over write to avoid the starvation problem.  The CFQ scheduler is a good choice for most of the workloads.

Noop IO Scheduler - no sorting is performed and just merging only.  This is for device that does not require sorting of request (such as SSD that has no seeking penalty).

The default scheduler is chosen using the boot option iosched.  The scheduler can also be selected at runtime for each block device by modifying the file /sys/block/"block device name e.g. hda"/queue/scheduler

Read Starvation

As IO schedule insert requests in ascending sequence as they come in, read to latter block may be kept waiting in the queue for a long time.  The problem can be further deteriorated by writes which batch up several requests and flushes to the disk periodically.  This is called writes-starving-read problem.

IO scheduler is further optimized to address this issue.  In Linux 2.4 kernel, the Linux Elevator scheduler will stop inserting new requests when the list have sufficient number of old request to process.  The algorithm is simple and it could help in some situations.

Synchronous and Synchronized Operations

Synchronous means the IO operation does not return control to the caller.  Synchronized operation means the operation ensures data are up to date (on disk).

When synchronous write is synchronized, the write will not return until data is flushed to disk.  This is equivalent of O_SYNC flag effect when opening file.

When synchronous write is not synchronized, the write will return when the data is copied to the kernel buffer.  So the disk copy is different from the memory copy.  This is the default effect of Linux write operation.

When asynchronous write is synchronized, the control return to caller as soon as the request is queued.  When the write is later executed, the data is guaranteed to be written to disk.

When asynchronous write is not synchronized, the control is returned immediately after the request has been queued.  The data is guaranteed to store in kernel buffer only and so the disk copy could be different from the memory copy.

Read is always synchronized (i.e. the data return is always up to date).  Synchronous or asynchronous read call determine when control is returned.

IO Scheduler

Modern processor speed is much higher than disk access.  A disk seek can take 8 msec and is equivalent to 25M processor cycle.  Therefore, the function of IO scheduler is to optimize seek optimization.

There are 2 basic tasks of IO schedule - sorting and merging.  Sorting is to arrange pending IO by block order to minimize head movement.  Merging is to coalesce several IO requests into one, thus reducing the number of requests.  For example, one IO to read block 6 and one to read block 7 can be merged into one IO to read block 6 to 7.

Linux File Advisory

Similar to madvise, the posix_fadvise() call hints the kernel to optimize file access. The options available are similar to madvise - POSIX_FADV_NORMAL/RANDOM/SEQUENTIAL/WILLNEED.

POSIX_FADV_NOREUSE indicates that the data are only used once.

POSIX_FADV_DONTNEED evicts pages in the range from the file cache.

In general, file advise can benefit application performance.  Before reading, application an give FADV_WILLNEED and kernel will read the data block in asynchronously.  When the application reach the point to read the data, blocking can be avoided.  The FADV_DONTNEED hint can return pages to the file cache for other purpose.  An application that intends to read in the whole files can use FADV_SEQUENTIAL hint to speed up the processing.

Linux File Mapping Management Functions

mprotect() can be used to change the protection setings (NONE, READ, WRITE, EXEC) for a specific region in the mapping.  The address passed in must be on page boundary.

The function of msync is equivalent to fsync in normal IO.  The data in the file or the specified region of the file associated with a mapping is written back to the disk.  The mode of flushing is controlled by a flag - MS_SYNC, MS_ASYNC and MS_INVALIDATE.  The last option invalidates all copies of the file mapping and future access will see the current file content on disk.

madvice() allows caller to hint kernel on the intended usage of the file memory mapping so that kernel can make intelligent optimization of the operation:

MADV_NORMAL - perform a moderate amount of read ahead
MADV_RANDOM - disable read ahead and read a minimal amount of data for each physcial read
MADV_SEQUENTIAL - perform read ahead aggressively
MADV_WILLNEED - initiate read ahead
MADV_DONTNEED - free the pages and discard all dirty ones.  Subsequent access to the pages will caused the page to be read in from backing store or zero-filled page (anonymous) again.
MADV_DONTFORK - do not copy the pages across fork. Used mainly for managing DMA pages

for example, madvise(addr, len, MADV_SEQUENTIAL) informs the kernel that the program intend to access the memory regional sequentially.

Linux mmap()

The call maps content of file, started from an offset passed over from the caller, into a memory location. The unit of mapping is page.  The call will round the size up to page boundary.  The caller can specify if the mapping are PRIVATE or or SHARED.

The advantage of mmap over normal read/write call is that mmap transfers data directly to user space without keeping a copy in the kernel buffer.  Once the file is map, no system calls are required to manipulate the data except potential page fault and context switching overhead.  The mapped file can be shared by multiple processes.

mmap() is typically used for large file to avoid wasting memory.  The mapping is terminated using the unmmap() call.