Wednesday, November 23, 2011

Thread

A thread is a basic unit of CPU utilization. Thread comprises a thread ID, a program counter, a register set and a stack. It shares with other threads belonging to the same process its code section, data section and other operating resoruces such as open files and signals. A traditional process contains a single thread of control.

Support for thread may be provided either at user level (for user thread) or kernel level (kernel thread). User threads are supported above the kernel and are managed without kernel support, whereas kernel threads are supported and managed directly by the OS. Ultimately, a relationship must exist between user threads and kernel threads:

(1) Many-to-one model maps many user-level threads to one kernel thread. Thread management is done by the thread libraray in user soace and so it is efficient. However, the entire process will be suspended if it make a blocking system call. Also only 1 thread can access the kernel at a time and so this model is unable to run in parallel on a multicore system. The GREEN threads (avaliable in Solaris) and GNU portable threads use this model.

(2) One-to-one model maps each user thread to a kernel thread. It provides higher concurrency by allow other threads to run when one gets blocked by system call. The disadvantage of this is the overhead of creating a kernel thread for every user thread and affects performance. Most implementaion of this model restrict the number of threads supported by the system. Linux and Windows implements this model.

(3) Many-to-many model multiplexes many user level threads to a smaller or equal number of kernel threads. This model provdes higher concurrency comparing to the many-to-one model at the same time limit the number of threads as compaaring to the one-to-one model.

(4) Two-level model is a variation of the many-to-many model by also allowing a user-level thread to be bound to a kernel thread (one-to-one). This model is supported in IRIX, HPUX and Tru64 UNIX. Solaris supports this model in pre-Solaris 9 version. From Solaris 9 onwards, it uses the one-to-one model.

Many systems implementing the many-to-many or two-level model place an intermediate data structure between the user and kernel threads. This structure, typically known as lightwight process (LWP), appears to the user-thread libraray as virtual processor on which the application can schedule a user thread to run. Each LWP is attached to a kernel thread of which the OS scheduler will dispatch on physical processors. When a kernel thread is blocked, it blocks the LWP and in turn blocks the user-thread.

Sometime the kernel needs to inform the application about certain event. It uses upcall procedure. At the beginning, the kernel allocates a number of LWP for the application. One upcall event is when an application thread is about to block. In this case, the kernel makes an upcall to the application identifying the specific thread. The upcall is hanlded by the thread library. The upcall handler is allocated with a new LWP to run. It saves the state of the blocking thread and relinquishes the virtual processor. The upcall handler then schedules another thread to the new virutal processor. When the blocking thread is eligible to run, another upcall will be performed.

Tuesday, November 22, 2011

Operating System Structure

(1) Monolithic
Frequently such system started from a small, simple and limited systems and grew beyond the original scope. For example, MSDOS and traditional UNIX.

(2) Layered
With proper hardware supoport, OS can be broken into modules. The main advantage of the layered approach is simplicity of construction and debugging. Once a layer is debugged, the upper layer can be tested easily. The major difficulty with the layered approach involves appropriately defining the various layers. sometimes it is not so obvious. For example, the backing store (paging/swapping) driver would normally be above the CPU schedululer because the driver may need to wait for I/O and CPU can be rescheduled during that time. however, on a large system, the CPU scheduler may have mor einformation about all the active processes than can fit in memory. Therefore, this information may need to be swapped in an dout of memory, requiring the back-store driver routine to be below the CPU scheduler. Another problem with layered approach is that it tends to be less efficient than other types. Each layer adds overhead to the system call. This resulted in the recent trend in OS design to have less number of layers and each with more functionality.

(3) Microkernel
In mid-1980s, researchers in Carnegie Mellon University developed an OS called Mach using microkernel approach. The method removes all non-essential components from the kernel and implementing them as user or kernel processes. This results in a smaller kernel. There is little consensus on what should be included or excluded. Typically, micokernel provides minimal process and memory management, in addition to a communication facility. The main purpose of microkernel is to provide a communication facility between the client program and various services that are also running in user space. Communication is provided via message passing through the microkernel.

One benefit is ease of extending the OS. All new services are added as user services and do not require modifying the kernel. This make the OS easier to port to different hardware platform. The microkernel also provides better security and reliability. If a services fails, the rest of the OS remains intact. Several contemporary OS using this approach. Tur64 UNIX provides a UNIX interface to user but was built with a Mach kernel. The Mach kernel maps UNIX system calls into messages to the appropriate user-level services. The Mac OS X (Darwin) was also based on Mach microkernel.

Unfortuntely, microkernel can suffer from performance decreases due to increased system function overhead. For example, the first release of Windows NT was based on a layered microkernel organization. Windows NT 4.0 partially address the performance problem by moving layers from user space to jernel space and integrating them more closely. by the time windows XP was designed, its architecture was more monolithic than microkernel.

(4) Modules
The best current methodology for operating system design involves using object-oriented programming techniques to create a modular kernel. The jernel has a set of core components and links in additional services either during boot time of run time. It uses dynamically loadable modules. For example, Solaris is organized around a core kernel with seven types of loadable kernel modules - scheduling classes, file systems, loadable system calls, executable formats, STREAMS modules, Miscellaneous, device and bus drivers. such design allows the kernel to provide core services yet also allow certain features to be implemented dynamically. For example, device and bus drive for specific hardware can be added.

The result resemble a layered approach in which each section has a defined, protected interface. But it is more flexible than a layered approach as the sections can called one another. The modular approach is also like the microkernel approach in that the primary module has only core functions and knowledge to load and communicate with other modules. but it is more efficient because module can communicate directly with one another rahter than message passing.

Sunday, November 13, 2011

Procedure and Process

Procedure is a series of steps whereas process is far richer in governance and oversight. Process has feedback mechanism for improvement and is more adaptable to change conditions. A process also interface to other processes.

Tuesday, November 8, 2011

AT&T Mnemonic

When gcc compiles a c program, it actually generate an assembly soruce code in AT&T mnemonics. The output .s file is then assembled using the gas assembler into object module.

General rules for AT&T mnemonics:
(1) All mnemonics and registers are in lower case.
(2) Register names are prefixed by "%"
(3) Every AT&T instruction mnemonics that has operands has a single character suffix indicating the length of the operands. The suffix are b, w, l(long 32-bit). For example, movw %ax,%bx.
(4) Source and destination operands are placed in opposite order from Intel syntax.
(5) Immediate value are prefix by "$". For example, pushl $32
(6) gcc does not generate all AT&T mnemonics.
(7) Displacements in memory references are signed quantities placed outside parantheses containing the base, index and scale values. For example, movl -4(%ebx,%edi) is equivalent to Intel mov byte easx,[ebx+edi-4]

Linking to the Standard C Library

C runtime library dictates a structure to any programs that link to it. This include a block of code that runs before your program begins and another block of code after your program ends. Your program is called by the startup code as though it is a procedure with the CALL instruction. Your program returns to C library using the RET instruction.

The C calling convention is as follow:
(1) A procedure must preserve the values of the EBX, ESP, EBP, ESI and EDI. The other GPR can be altered.
(2) A procedure's return value in EAX for 32-bit or less. 64-bit return value is stored in EDX and EAX.
(3) Parameters passed to procedures are pushed onto the stack in reverse order. In other words, given MyFunc(p1, p2, p3), p3 is pushed onto stack first, then followed by p2 and p1.
(4) Parameters are removed by the caller after the procedure returns.
(5) The label name of the the assembly program (or the one generated by gcc) must be called "main:" and must be declared as GLOBAL.

Monday, November 7, 2011

x86 Jump

There are 2 categories of conditional jump. The problem arises because NASM can generate a binary opcode in different way. A jump target that lies within 127 bytes of the conditional jump instruction is called short jump. A jump that is farther than 127 bytes but within the code segment is called near jump. A near jump can be as far as 2GB away. A far jump go out of a segment which mean specifying both segment and offset. This is quite uncommon.

A short jump generates a short and compact opcode which is always 2 bytes in size. Near jump generates a opcode of either 4 bytes or 6 bytes. NASM generates short jump by default as this is most commonly use in programs. To generate near jump, one must explicitly specify

jne loop ;short jump
jne near loop ;near jump

The above does not apply to unconditional jump.

Linker

There is no reason why assembler cannot generate an executable program as object code file but it is almost never done. The object module cannot be run by itself but requires another step called linking. This model allows breaking big program into smaller parts and each gets written on its own separately.

To process each modules into an executable, linker first builds an index called symbol table, with an entry for every named item in every object module. Once the symbol table is complete, the linker builds an image of how the executable will be arranged in memory when the OS loads it. The image is written to the disk as the executable.

Object modules are allowed to refer to symbols in other object modules using the EXTERN keyword. In the declaring module, the label must be defined using the GLOBAL keyword.

During assembly, these external references are left as holes in the calling module to be filled later. As the linker builds the executable image, it learns where these symbols are in the final image and update them in the reference holes.

Debugging information is a step backward: portions of the soruce code, which was all stripped out early in the assembly process, are put back into the object module by the assembler. These portions of the source code are mostly names of data items and procedures.

Linker also makes the executable relocatable (all address are referenced relative to the start of the executable. This enable the executable to be loaded in any physcial address.

x86 Memory Models

Flat Memory Model
The memory is abstracted as a list of consecutive bytes accessed via a linear address.  This resemble to the physical address but not necessary when the linear address is used in the translation process under a memory protection model

Segmented Memory Model
Memory is organized in term of distinct region called segment.  A byte in segment is referenced by a logical address (also called a far pointer).  A logical address comprises 2 parts - a segment selector and an effective address.  The logical address can resemble the physical address or may be not depending on the translation process.

IA-32 operates in 3 modes - real mode, protected mode and system management mode (SMM).

SMM
It is used to execute code in firmware (e.g. emergency shutdown, power management)

Real Mode
Real Mode implements a 16-bit execution environment of the old 8086/88.  It is an instance of the segmented memory model.  The logical address contains 16-bit segment selector and 16-bit effective address.

The selector stores the base address of a 64KB segment.  To form a 20-bit address (used by 8086/88), the segment selector is appended with a zero nibble before adding on with the effective address.  For example, the logical address 02001010 is translated to 0200[0] + [0]1010 = 03010.  The segment address is always at paragraph boundary (because of the appended zero) and the segment is at most 64KB (because of the 16-bit effective address).  There is no memory protection under Real Mode.

There are 5 combination of segment registers and GPR:
  • SS:SP
  • SS:BP
  • ES:DI
  • DS:SI
  • CS:IP

MSDOS is a real mode OS.  The first 640KB is called Conventional Memory.  The remaining memory area up to 1MB is called Upper Memory Area, which is reserved for used by ROM and peripheral.  Within UMA, there are slot of free memory not used by hardware and accessible by DOS are called UMB (Upper Memory Block).  Memory above the real mode 1MB boundary (when 386 was released) are called extended memory.

In a Real Mode Flat Model, program starts at 100h which is a holdover from CP/M-80 (PSP).

Protected Mode
Similar to real mode, protected mode is an instance of segmented memory. The difference is that operating system is now must collaborate with the CPU to resolve addresses.  The segment registers are owned by OS and program cannot read or write them.

x86 general purpose registers

There are 8 GPR. Four of them can be referenced either in full 32-bit or 16-bit.

The 16-bit names are SI, DI, BP and SP. The 32-bit names are prefixed by "E" - ESI. EDI, EBP and ESP.

The other four GPR can be referenced either in 8-bit, 16-bit or 32-bit

32-bit: EAX, EBX, ECX amd EDX
16-bit: AX, BX, CX and DX
8-bit: AH, BH, CH and DH (for the upper 8-bits) and AL, BL, CL and DL (for the lower 8bits).