Sunday, April 29, 2012

x86-64

Intel created IA-64 (the Itanium architecture) and AMD created AMD64.  AMD64 is backward compatible with x86 and still allows running 32-bit programs.  Intel subsequently created its own compatible implementation called EM64T/IA-32e (later renamed Intel 64).  The differences between the two are minimal.

AMD64 has 2 modes.  Legacy mode makes the CPU behave like a 32-bit CPU, with all 64-bit enhancements turned off.  Long mode is the native 64-bit mode.  32-bit programs can still run in compatibility mode, which the CPU can switch into and out of quickly.  Mac OS X uses this mechanism to run a 32-bit kernel with 64-bit applications.

AMD64 enhancements include:
(1) The 32-bit registers are extended to 64-bit (e.g. EAX -> RAX).
(2) 8 new 64-bit registers, R8 to R15.
(3) A no-execute (NX) bit to mark pages as non-executable.  (The NX bit was already available in some x86 processors where PAE was enabled.)
(4) Since a full 64-bit virtual address space would require an enormous amount of memory just to store page tables, only a 48-bit subset is used.  The remaining upper 16 bits must be copies of bit 47 of the address (a "canonical" address); see the sketch after this list.
(5) Pages can be 4K, 2M or 1G in size.
(6) Segmentation has been crippled.  The GDT still exists but most of its entries are ignored.
(7) The calling convention has changed.  IA-32 passes parameters on the stack; x86-64 passes the majority of parameters in registers.
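
As a small aside on point (4), the canonical-address rule can be checked in a couple of lines of C (the function name here is made up purely for illustration):

#include <stdint.h>
#include <stdbool.h>

/* A 48-bit virtual address is canonical when bits 63..48 are copies of bit 47. */
static bool is_canonical(uint64_t vaddr)
{
    uint64_t top = vaddr >> 47;          /* bit 47 plus the upper 16 bits */
    return top == 0 || top == 0x1ffff;   /* all zeros or all ones */
}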

TLB

Virtual memory translation is an expensive operation.  The Translation Lookaside Buffer (TLB) improves performance by caching recently translated addresses, so the MMU (Memory Management Unit) does not have to walk the page tables stored in RAM on every access.  The TLB exploits both temporal and spatial locality.

When there is a context switch, the TLB must be flushed to avoid wrong translations.  All CPUs provide a means to flush the entire TLB or specific entries.  There is also a mechanism to keep entries that do not change across context switches, marked as global entries.  Flushing the TLB is a performance penalty.
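
On x86, for example, rewriting CR3 flushes all non-global entries, while the INVLPG instruction flushes the entry for a single address.  A rough sketch of the two operations (GCC inline assembly; these are privileged instructions, so this only illustrates the mechanism):

/* Flush the TLB entry for one virtual address (ring 0 only). */
static inline void flush_one(void *addr)
{
    __asm__ volatile ("invlpg (%0)" : : "r" (addr) : "memory");
}

/* Flush all non-global entries by rewriting CR3 with its current value. */
static inline void flush_all(void)
{
    unsigned long cr3;
    __asm__ volatile ("mov %%cr3, %0" : "=r" (cr3));
    __asm__ volatile ("mov %0, %%cr3" : : "r" (cr3) : "memory");
}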

From the MMU's perspective, the operating system accesses its own page tables like any user process.  Therefore, not only would the TLB have to be flushed on every context switch, it would also have to be flushed at every entry to and exit from the kernel.  Moreover, the kernel needs to access user memory (e.g. to bring in the arguments of a call or return results to user space).  On architectures such as x86/x86-64 that provide no hardware support for accessing the context of another process, this would result in TLB flushes at each kernel entry/exit and a page-table walk each time a reference to another context is needed.  This has a huge performance impact.

To reduce this performance impact, operating systems implement a combined user/kernel address space scheme (a single address space, e.g. 4G on 32-bit systems, divided into a kernel portion and a user portion).  Translation entries in the kernel area are marked as accessible by kernel code only, and these entries do not need to be flushed.
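
On x86 this shows up in the page table entry flags: the User/Supervisor bit restricts a mapping to kernel mode, and the Global bit keeps its TLB entry across CR3 reloads.  A minimal sketch of composing such an entry (the bit positions are the architectural x86 ones; the helper itself is illustrative only):

#include <stdint.h>

#define PTE_PRESENT  (1ull << 0)   /* mapping is valid */
#define PTE_WRITE    (1ull << 1)   /* writable */
#define PTE_USER     (1ull << 2)   /* accessible from user mode when set */
#define PTE_GLOBAL   (1ull << 8)   /* kept in the TLB across CR3 reloads (needs CR4.PGE) */

/* Build a PTE for a kernel-only page: supervisor access (PTE_USER cleared)
 * and global, so a context switch does not evict it from the TLB. */
static uint64_t make_kernel_pte(uint64_t phys_frame)
{
    return (phys_frame & ~0xfffull) | PTE_PRESENT | PTE_WRITE | PTE_GLOBAL;
}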

Some other architectures such as SPARC v9 provide support for accessing one context from inside another and for associating TLB entries with a specific context.  In that case it is possible to keep the kernel address space separate from the user address space.

Saturday, April 28, 2012

Heaps and Stacks

Each program has at least 2 stacks - a user stack and a kernel stack.  The kernel stack is used when the program switches into kernel mode (for example when making a system call).  The kernel stack operates the same way as the user stack (growth direction, stack pointer usage, local variable usage).  There are slight differences: for example, the kernel stack is usually limited in size (e.g. 4K or 8K on x86), so kernel programming uses as few local variables as possible.  Also, the kernel stacks of all processes reside in the same kernel address space, but at different virtual addresses.

The basic unit of memory that the kernel manages is a physical page frame, which is never smaller than 4K.  Using the physical page allocator directly is inefficient for allocating small objects and buffers.  In addition, these small objects usually have a short lifetime, so going through the page allocator for each of them would hurt system performance.  Modern operating systems therefore use a separate kernel-level memory allocator that communicates with the physical page allocator and is optimized for fast, continuous allocation and de-allocation of small objects.  This allocator is a consumer of the physical page allocator, in that it asks the page allocator for pages and later returns them.  Each page is divided into a number of fixed-size chunks; a page managed this way is called a slab (a term from the slab allocator in SunOS).  Pages containing objects of the same size are grouped together and called a cache.
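
A toy user-space sketch of this idea, with malloc() standing in for the physical page allocator (names like cache_alloc are invented for illustration and do not correspond to any real kernel API; slabs are never returned in this toy version):

#include <stdio.h>
#include <stdlib.h>

#define SLAB_SIZE 4096                  /* pretend one slab is one 4K page frame */

struct slab {
    struct slab *next;                  /* all slabs obtained for the same cache */
    unsigned char *mem;                 /* the page holding the chunks */
};

struct cache {
    size_t obj_size;                    /* fixed chunk size served by this cache */
    struct slab *slabs;
    void **free_list;                   /* LIFO free list threaded through the free chunks */
};

static void cache_init(struct cache *c, size_t obj_size)
{
    /* round up so every chunk can hold the free-list pointer and stays aligned */
    size_t a = sizeof(void *);
    c->obj_size = (obj_size + a - 1) / a * a;
    if (c->obj_size == 0)
        c->obj_size = a;
    c->slabs = NULL;
    c->free_list = NULL;
}

/* Ask the "page allocator" (plain malloc here) for one more slab and
 * carve it into fixed-size chunks pushed onto the cache's free list. */
static int cache_grow(struct cache *c)
{
    struct slab *s = malloc(sizeof(*s));
    if (!s)
        return -1;
    s->mem = malloc(SLAB_SIZE);
    if (!s->mem) {
        free(s);
        return -1;
    }
    s->next = c->slabs;
    c->slabs = s;
    for (size_t i = 0; i < SLAB_SIZE / c->obj_size; i++) {
        void **chunk = (void **)(s->mem + i * c->obj_size);
        *chunk = c->free_list;
        c->free_list = chunk;
    }
    return 0;
}

static void *cache_alloc(struct cache *c)
{
    if (!c->free_list && cache_grow(c) < 0)
        return NULL;
    void **chunk = c->free_list;
    c->free_list = (void **)*chunk;     /* pop a chunk off the free list */
    return chunk;
}

static void cache_free(struct cache *c, void *obj)
{
    void **chunk = obj;
    *chunk = c->free_list;              /* push the chunk back onto the free list */
    c->free_list = chunk;
}

int main(void)
{
    struct cache c;
    cache_init(&c, 64);                 /* a cache of 64-byte objects */
    void *a = cache_alloc(&c);
    void *b = cache_alloc(&c);
    printf("two 64-byte objects from the same slab: %p %p\n", a, b);
    cache_free(&c, b);
    cache_free(&c, a);
    return 0;
}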

The slab allocator must also keep track of the state of the objects in each cache so that space can be reclaimed.  Reclamation is done by a dedicated function.

Object allocators usually contain a mechanism to detect overflow corruption called redzoning.  A known value is written at the end of each chunk and checked at release time.  However, this degrades performance and is therefore turned off by default.
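
A minimal sketch of the redzoning idea on top of plain malloc() (the sentinel value and the helper names are arbitrary choices for this example):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define REDZONE_MAGIC 0xDEADBEEFCAFEBABEull   /* arbitrary sentinel value */

/* Allocate size bytes plus a trailing red zone holding the sentinel. */
static void *rz_alloc(size_t size)
{
    unsigned char *p = malloc(size + sizeof(uint64_t));
    if (!p)
        return NULL;
    uint64_t magic = REDZONE_MAGIC;
    memcpy(p + size, &magic, sizeof(magic));  /* place the red zone right after the object */
    return p;
}

/* Check the red zone before freeing; a changed sentinel means the caller
 * wrote past the end of its buffer. */
static void rz_free(void *obj, size_t size)
{
    uint64_t magic;
    memcpy(&magic, (unsigned char *)obj + size, sizeof(magic));
    if (magic != REDZONE_MAGIC)
        fprintf(stderr, "redzone corrupted: buffer overflow detected\n");
    free(obj);
}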

Data Model of a System

A data model is usually expressed using the sizes of integer, long and pointer (ILP).  For example, ILP32 means integer, long and pointer are all 32 bits.  LP64 means long and pointer are 64 bits while integer (not mentioned in the name) stays 32 bits.  LLP64 means integer and long are 32 bits but long long and pointer are 64 bits.  Note that char and short are 8 and 16 bits in all models.

Most compilers use ILP32 for 32-bit code and LP64 for 64-bit code.  Most major UNIX systems, including Mac OS X, use the LP64 model, while Windows uses LLP64.  A C program that runs well on a 32-bit system (ILP32) may not work under LP64.
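
A small C program that prints the relevant sizes makes the differences visible.  The classic ILP32 habit of storing a pointer in a long is exactly what breaks under LLP64, where long is still 32 bits:

#include <stdio.h>

int main(void)
{
    printf("int:       %zu bytes\n", sizeof(int));
    printf("long:      %zu bytes\n", sizeof(long));
    printf("long long: %zu bytes\n", sizeof(long long));
    printf("void *:    %zu bytes\n", sizeof(void *));

    /* On ILP32 a pointer fits in a long, and on LP64 it still does.
     * On LLP64 (64-bit Windows) a long is only 32 bits, so storing a
     * pointer in a long silently truncates the address. */
    return 0;
}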

Sunday, April 15, 2012

Connecting to Oracle

A user makes a request using Oracle client software to connect to the database.  For example, the client software can be SQL*Plus.

sqlplus user/password@oracle10g.localdomain

The user supplies the username and password.  The target database is represented by a TNS (Transparent Network Substrate, the Oracle software layer that handles remote connections) service name.  The TNS name is translated into a hostname and port number using a mapping file such as tnsnames.ora, or Oracle Internet Directory (OID, an LDAP directory).
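
An entry in tnsnames.ora for the service name used above might look roughly like this (the host, port and service name are placeholders, not taken from any real configuration):

oracle10g.localdomain =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = dbhost.localdomain)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = oracle10g.localdomain)
    )
  )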

A TNS listener on the database server listens on that port for incoming connection requests.  For a dedicated server connection, the listener will fork() and exec() a new UNIX process to communicate with the user.  This dedicated server process executes SQL requests, reads data from files or finds data in the SGA cache.

For a shared server (previously called MTS, Multi-Threaded Server) connection, Oracle uses a pool of shared processes to handle user requests.  However, the user does not talk directly to a shared process the way it does with a dedicated server connection.  The user talks to a process called a dispatcher.  The dispatcher puts the user request into a request queue in the SGA.  The first shared process that is not busy picks the request up from the queue, executes it and puts the result into a response queue in the SGA.  The dispatcher picks up the result and sends it back to the user over the network.  When the user connects to the database, the listener chooses a dispatcher from the dispatcher pool and redirects the client to it by sending back the port that dispatcher listens on.  The client software then talks to the dispatcher directly.

Oracle Instance

When Oracle starts up on UNIX, processes with many different names are created; they are referred to as the Oracle background processes.  They are persistent and exist from the time Oracle is started until it is shut down.  They are all started from a single Oracle executable but take on different personalities depending on the functions they perform.  On Windows, these processes become individual threads within a single process.

The Oracle SGA is a shared memory segment that the background processes attach to.  On UNIX, the processes use the shmget() and shmat() calls.  Under Windows, the threads use the C call malloc() to allocate the memory and share it within the one address space.
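
A bare-bones illustration of the UNIX calls mentioned above (the key and the segment size here are arbitrary; attaching a real SGA is of course far more involved):

#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int main(void)
{
    /* Create (or look up) a shared memory segment identified by a key. */
    int shmid = shmget((key_t)0x5161, 1024 * 1024, IPC_CREAT | 0600);
    if (shmid == -1) { perror("shmget"); return 1; }

    /* Attach the segment into this process's address space. */
    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *)-1) { perror("shmat"); return 1; }

    printf("segment %d attached at %p\n", shmid, addr);

    shmdt(addr);                        /* detach when done */
    shmctl(shmid, IPC_RMID, NULL);      /* remove the segment */
    return 0;
}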

Sunday, April 1, 2012

ACL

A user's SID is the account identifier that Windows assigns to the user during login.  The access token that holds the SID also contains structures that identify the groups the user belongs to and the privileges the user has.  Each group entry also has a SID, which points to structures that describe the group's rights.

The privileges section of the access token begins with a count of the number of privileges the user has, followed by an array of privilege entries.  Each privilege entry contains a Locally Unique ID (LUID), essentially a pointer to the entry object, and an attribute mask that tells what rights the user has to the object.  Group SID entries are essentially the same - a privilege count and an array of privilege entries.
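
For the curious, a short Win32 sketch that opens the current process token and prints the user SID it carries (error handling is trimmed, and it must be linked against advapi32; TokenUser is just one of several TOKEN_INFORMATION_CLASS values that expose the structures described above):

#include <windows.h>
#include <sddl.h>
#include <stdio.h>

int main(void)
{
    HANDLE token;
    union { TOKEN_USER tu; BYTE raw[256]; } info;   /* buffer for the TOKEN_USER data */
    DWORD len = 0;
    LPSTR sid_str = NULL;

    /* Open the access token of the current process for querying. */
    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &token))
        return 1;

    /* Retrieve the TOKEN_USER structure, which holds the user's SID. */
    if (!GetTokenInformation(token, TokenUser, &info, sizeof(info), &len))
        return 1;

    /* Convert the binary SID into the familiar S-1-5-... string form. */
    if (ConvertSidToStringSidA(info.tu.User.Sid, &sid_str)) {
        printf("user SID: %s\n", sid_str);
        LocalFree(sid_str);
    }

    CloseHandle(token);
    return 0;
}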

Object rights flow down to the lowest possible node unless overridden by another SID.  For example, if a user has read and write privileges on \temp, those rights apply to all of its sub-folders.  This also applies to containers such as a Word document that may contain other files.

NT Security

The protection mechanism has been around since Windows NT and is therefore called NT security.  It uses a lock and key concept: the lock is a kind of access control and the keys are rights.  There are multiple levels of permission associated with most resources to provide fine-grained control.

A user has personal rights that are assigned by the administrator.  A user can also belong to groups, whose members all share the same rights.  The user's access is limited to the combination of the group and individual rights that the administrator assigned.

Rights can be assigned by the administrator.  Likewise, a developer can write code that sets Windows security for particular objects, calls and portions of an application.  Changes by the administrator or developer affect the rights required to perform specific tasks on resources such as a file; the right to write to a file is separate from the right to read from it.

User-level access depends on the Security Identifier (SID).  When a user logs in, Windows creates an access token and places the user's SID (stored in the domain controller) in it.  The SIDs in the access token are checked against each resource's DACL (Discretionary Access Control List) and SACL (System Access Control List); this combination of SIDs and ACLs determines which resources the user can access.  Because the access token is session based, the user needs to log off and log in again to gain additional rights assigned by the administrator during the session.

The lock is called a security descriptor and is attached to resources.  The security descriptor tells what rights a user needs to access the resource.  If the rights carried in the user's access token meet or exceed the rights demanded by the security descriptor, the lock opens.