Saturday, November 22, 2014

Linux epoll

Both POSIX poll() and select() accept a list of fds on every call and walk the whole list, so a long list hurts efficiency and performance. Linux epoll() addresses this by separating the registration of fds from the monitoring, using different calls.

epoll works by first creating a polling context with the epoll_create1() call. The call returns a file descriptor for the context, which must be released with close().

The fds to be monitored are then added to the context with epoll_ctl(), which also specifies the events to monitor for each fd.

The program then waits for events with the epoll_wait() call, which returns only the fds that are ready.
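The three steps above can be sketched in C as follows. This is a minimal illustration assuming a Linux system: a pipe stands in for a real fd such as a socket, and the helper name epoll_pipe_demo() is hypothetical. The fd is registered once with epoll_ctl(); epoll_wait() then reports only the fds that are ready.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe with an epoll context, makes it
   readable, then waits for it. Returns the number of ready fds
   (expected 1), or -1 on error. */
int epoll_pipe_demo(void)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    /* 1. Create the polling context; it is itself a file descriptor. */
    int epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    /* 2. Register the fd once, together with the events to watch. */
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = pfd[0] };
    int ready = -1;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == 0 &&
        write(pfd[1], "x", 1) == 1) {
        /* 3. Wait up to 1000 ms; only ready fds are returned in out[]. */
        struct epoll_event out[4];
        ready = epoll_wait(epfd, out, 4, 1000);
        if (ready == 1 && (out[0].data.fd != pfd[0] || !(out[0].events & EPOLLIN)))
            ready = -1;
    }

    close(epfd);                 /* the context must be released with close() */
    close(pfd[0]);
    close(pfd[1]);
    return ready;
}
```

Calling epoll_pipe_demo() should return 1, since the read end of the pipe becomes readable after the one-byte write.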

Scatter/Gather IO

A single system call reads or writes data into or from a series of non-contiguous buffers. The advantage is performance: it avoids the overhead of a series of individual read or write system calls. In addition, the call is atomic, so its I/O will not interleave with I/O from other threads.

readv() and writev() take 3 parameters: an fd, an array of iovec structures and the number of buffers in the array.

struct iovec {
    void *iov_base;  /* pointer to the start of the buffer */
    size_t iov_len;  /* length of the buffer in bytes */
};

The buffers are processed sequentially.
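A minimal sketch of gather and scatter IO with writev() and readv(), assuming a pipe as the fd; the function name scatter_gather_demo() is hypothetical. Two buffers are written with one call, then the 8 bytes are split back across two buffers, filled in array order.

```c
#include <string.h>
#include <sys/uio.h>
#include <unistd.h>

/* Writes "abc" and "defgh" with one writev(), then reads the 8 bytes back
   into out_a (3 bytes) and out_b (5 bytes) with one readv().
   Both outputs are NUL-terminated. Returns bytes read, or -1 on error. */
ssize_t scatter_gather_demo(char out_a[4], char out_b[6])
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    char a[] = "abc", b[] = "defgh";
    struct iovec wv[2] = {
        { .iov_base = a, .iov_len = strlen(a) },   /* written first */
        { .iov_base = b, .iov_len = strlen(b) },   /* then this one */
    };
    ssize_t n = -1;
    if (writev(pfd[1], wv, 2) == 8) {              /* gather: one system call */
        struct iovec rv[2] = {
            { .iov_base = out_a, .iov_len = 3 },   /* filled first */
            { .iov_base = out_b, .iov_len = 5 },   /* then this one */
        };
        n = readv(pfd[0], rv, 2);                  /* scatter: one system call */
    }
    out_a[3] = '\0';
    out_b[5] = '\0';
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```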

Linux Process Priority

Each process has a static priority in its task_struct, which is its nice value (from -20 to 19; a lower number means a higher priority). It is static because the scheduler never changes it; only the user can, with nice() or setpriority(). The scheduler picks the next task to run based on the dynamic priority.

The dynamic priority is a function of the static priority and the interactivity of the process. It is computed by effective_prio(), which applies a bonus or penalty value (in the range -5 to +5) based on the interactivity of the task. This value is added to the static priority to form the dynamic priority.

The kernel tracks how interactive a process is via the sleep_avg field in the task_struct. When a task is created, it receives a high sleep_avg value. The time a task spends sleeping is added to sleep_avg (up to MAX_SLEEP_AVG = 10 msec), and each time a task runs, the time it ran is subtracted from sleep_avg. A task with a high sleep_avg is I/O-bound; a sleep_avg of zero means the task is processor-bound.
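The interactivity adjustment can be illustrated with a simplified model. This is an assumption-laden sketch, not the kernel's effective_prio(): it works on the nice scale, maps sleep_avg linearly onto an adjustment of -5..+5 using the 10 msec MAX_SLEEP_AVG quoted above, and clamps the result to -20..19. A lower number means a higher priority, so an I/O-bound task (full sleep_avg) moves 5 levels up and a processor-bound task (zero sleep_avg) moves 5 levels down.

```c
/* Simplified model (an assumption, not kernel source). */
#define MODEL_MAX_SLEEP_AVG 10   /* msec, as quoted in the text */
#define MODEL_MAX_BONUS     10   /* bonus spans -5 .. +5 */

int model_dynamic_prio(int nice, int sleep_avg_ms)
{
    if (sleep_avg_ms < 0)
        sleep_avg_ms = 0;
    if (sleep_avg_ms > MODEL_MAX_SLEEP_AVG)
        sleep_avg_ms = MODEL_MAX_SLEEP_AVG;

    /* 0 ms asleep -> -5 (penalty); MODEL_MAX_SLEEP_AVG -> +5 (bonus). */
    int bonus = sleep_avg_ms * MODEL_MAX_BONUS / MODEL_MAX_SLEEP_AVG
                - MODEL_MAX_BONUS / 2;

    /* A bonus lowers the number, i.e. raises the priority. */
    int prio = nice - bonus;
    if (prio < -20)
        prio = -20;
    if (prio > 19)
        prio = 19;
    return prio;
}
```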

The size of the timeslice is proportional to the task's priority: higher-priority tasks receive longer timeslices. When a task spawns a child, the parent's remaining timeslice is split between parent and child. This prevents a task from repeatedly spawning children to get an unlimited supply of processor time.

Double Buffering in Standard IO

Standard IO improves performance by maintaining buffers in user space, thus avoiding successive system calls. The downside is that data must also be copied between the standard IO buffer and the buffer supplied by the user.

A call like getc() triggers a read() when the standard IO buffer is empty: data is read from the disk into the kernel buffer, copied to the standard IO buffer and finally copied to the caller.

putc() copies the data into the standard IO buffer. When the buffer is flushed, the data is copied out to the kernel buffer via write().

Standard IO File Locking

Standard IO is thread-safe: when several threads make concurrent calls on the same stream, the calls are applied to the stream one at a time. If an application needs several calls to be performed together without being interleaved with calls from other threads, it must lock the stream with flockfile(). After the series of calls has been issued, it unlocks the stream with funlockfile().
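A minimal sketch of grouping several stdio calls under one stream lock; write_record_locked() is a hypothetical helper. The stdio lock is recursive, so fputs() and fprintf() can still be called while it is held, and putc_unlocked() skips the redundant locking altogether.

```c
#include <stdio.h>

/* Writes "key=value\n" as several stdio calls; holding the stream lock
   keeps other threads from writing into the middle of the record. */
void write_record_locked(FILE *fp, const char *key, int value)
{
    flockfile(fp);                 /* acquire the stream's lock */
    fputs(key, fp);
    putc_unlocked('=', fp);        /* lock already held: unlocked variant is safe */
    fprintf(fp, "%d", value);
    putc_unlocked('\n', fp);
    funlockfile(fp);               /* let other threads proceed */
}
```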

Controlling Buffering in Standard IO

There are 3 types of buffering:

Unbuffered - no user buffering is performed; data is submitted directly to the kernel. This option is rarely used, except for stderr, which is unbuffered by default.

Line-buffered - the buffer is submitted to the kernel at every newline character in the stream. This is the default for streams connected to a terminal, such as stdout.

Block-buffered (full buffering) - data is buffered in fixed-size blocks. This is the default for files.

The type of buffering is set with the setvbuf() call. The 3 buffering types are specified as _IONBF, _IOLBF and _IOFBF respectively. The call must be issued after opening the stream and before performing any other operation on it. The caller may supply the buffer to be managed by Standard IO, in which case it must remain valid until the stream is closed; passing NULL lets the library allocate the buffer itself.
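A sketch of switching a stream to full buffering with setvbuf(), assuming a temporary file from tmpfile(); open_fully_buffered() and the 8192-byte size are arbitrary choices for illustration. Because a caller-supplied buffer must stay valid until the stream is closed, it is declared static here.

```c
#include <stdio.h>

#define BUF_SIZE 8192                     /* arbitrary size for illustration */
static char stream_buf[BUF_SIZE];         /* must outlive the stream; one stream at a time */

FILE *open_fully_buffered(void)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return NULL;
    /* Called after opening and before any other operation on fp. */
    if (setvbuf(fp, stream_buf, _IOFBF, sizeof stream_buf) != 0) {
        fclose(fp);
        return NULL;
    }
    return fp;
}
```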

Applications seldom need to change the buffering mode or the default buffer size.

C Standard IO Library

stdio provides a platform-independent user buffering solution. Files are referred to by a file pointer instead of the system-level fd. The file pointer type FILE is in capitals because stdio was originally implemented as macros, and it follows the naming convention for macros.

  • fopen - open a file and return a pointer to FILE. The opened file is called a stream.
  • fdopen - create a stream for an already open fd
  • fclose - close a stream
  • fcloseall - close all streams (a GNU extension)
  • fgetc - read a character from a stream
  • fgets - read characters until a newline or EOF is reached (or the buffer is full). The \n is stored as part of the line read, and a null character ('\0') is added to the end of the string
  • ungetc - push a character (converted to unsigned char) back into the stream. You can issue more than one call, and the characters are pushed back like a stack, so the next read returns the last character pushed. The standard only guarantees one push back; Linux allows multiple push backs as long as memory is available. If ungetc is followed by a seek, the pushed-back characters are lost (a successful seek discards them).
  • fread - read binary data (in the form of a structure or record) from the stream. The caller passes the size of the structure and the number of structures to read. The function returns the number of structures read. If the number is less than requested, either EOF was reached or an error occurred; use ferror() or feof() to tell which. Note that the program must assume the file was created (possibly by another program on another system) with the same variable sizes, alignment, padding and byte order.
  • fputc - write a byte
  • fputs - write a string
  • fwrite - write binary data
  • fseek - set the position in a stream. The whence parameter indicates whether the offset is an absolute position (SEEK_SET), relative to the current file position (SEEK_CUR) or relative to the end of the file (SEEK_END)
  • fsetpos - like fseek with whence = SEEK_SET (absolute position). This API is provided for non-Linux systems that use a complex type to store the file position.
  • rewind - set the position to the start of the stream
  • fgetpos - fseek does not return the current position the way lseek does; fgetpos (or ftell) retrieves it.
  • fflush - write out the standard IO buffer to the kernel buffer.
  • fileno - return the file descriptor of a stream. Caution - do not intermix standard IO and system IO calls on the same file
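The following sketch exercises several of the calls above: fwrite() and fread() on a hypothetical struct rec, ferror() to distinguish a short read caused by an error from one caused by EOF, and ungetc() to peek at a character. The names roundtrip_records() and peek_char() are illustrative only, and the records are readable only because the same program writes and reads them (same sizes, padding and byte order).

```c
#include <stdio.h>

/* Hypothetical record type for illustration. */
struct rec {
    int id;
    double score;
};

/* Writes n records to a temporary file, rewinds and reads them back.
   Returns the number of records read, or -1 on a stdio error. */
int roundtrip_records(const struct rec *in, struct rec *out, size_t n)
{
    FILE *fp = tmpfile();
    if (fp == NULL)
        return -1;

    if (fwrite(in, sizeof *in, n, fp) != n) {
        fclose(fp);
        return -1;
    }
    rewind(fp);

    size_t got = fread(out, sizeof *out, n, fp);
    /* A short count means EOF (feof) or an error (ferror): tell them apart. */
    int result = (got < n && ferror(fp)) ? -1 : (int)got;
    fclose(fp);
    return result;
}

/* Returns the next character without consuming it (EOF at end of stream). */
int peek_char(FILE *fp)
{
    int c = getc(fp);
    if (c != EOF)
        ungetc(c, fp);             /* the next read returns c again */
    return c;
}
```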



Linux Run Queue

There is one runqueue per processor. Each runqueue has 2 priority arrays: one active and one expired. Each array contains a bitmap in which each bit shows whether there is any task in the list for the corresponding priority. The scheduler schedules tasks from the active array. A task that has used up its timeslice is moved to the expired array, and its timeslice is recalculated at that point. When all tasks have exhausted their timeslices, the scheduler switches the active and expired arrays. Because timeslices are recalculated as tasks expire, this approach avoids a long delay when the arrays are switched.
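The priority arrays and the active/expired switch can be modelled in a few lines. This is a toy sketch, not kernel code: tasks are reduced to per-priority counters, the 140 priority levels mirror 2.6-era kernels, and __builtin_ctzll() is a GCC/Clang builtin used to find the lowest set bit. Finding the highest-priority list costs the same regardless of how many tasks are queued, and the switch only exchanges two pointers.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of O(1) scheduler priority arrays (not kernel code). */
#define MODEL_MAX_PRIO 140                      /* 2.6-era number of levels */
#define MODEL_WORDS ((MODEL_MAX_PRIO + 63) / 64)

struct prio_array {
    int nr_active;                       /* tasks in this array */
    uint64_t bitmap[MODEL_WORDS];        /* bit p set <=> list p non-empty */
    int queue_len[MODEL_MAX_PRIO];       /* stand-in for the task lists */
};

struct runqueue {                        /* one per processor */
    struct prio_array arrays[2];
    struct prio_array *active, *expired;
};

void rq_init(struct runqueue *rq)
{
    memset(rq, 0, sizeof *rq);
    rq->active = &rq->arrays[0];
    rq->expired = &rq->arrays[1];
}

void enqueue(struct prio_array *a, int prio)
{
    a->queue_len[prio]++;
    a->nr_active++;
    a->bitmap[prio / 64] |= UINT64_C(1) << (prio % 64);
}

void dequeue(struct prio_array *a, int prio)
{
    if (--a->queue_len[prio] == 0)       /* list now empty: clear its bit */
        a->bitmap[prio / 64] &= ~(UINT64_C(1) << (prio % 64));
    a->nr_active--;
}

/* Highest priority = lowest set bit; cost does not depend on task count. */
int first_prio(const struct prio_array *a)
{
    for (int w = 0; w < MODEL_WORDS; w++)
        if (a->bitmap[w] != 0)
            return w * 64 + __builtin_ctzll(a->bitmap[w]);
    return -1;                           /* no runnable task */
}

/* When the active array is empty, switch arrays by swapping pointers. */
int pick_next_prio(struct runqueue *rq)
{
    if (rq->active->nr_active == 0) {
        struct prio_array *tmp = rq->active;
        rq->active = rq->expired;
        rq->expired = tmp;
    }
    return first_prio(rq->active);
}
```

In this model, a task whose timeslice runs out is represented by a dequeue() from the active array followed by an enqueue() on the expired array.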

Friday, November 7, 2014

Linux Process Termination

exit(status), typically exit(EXIT_SUCCESS) or exit(EXIT_FAILURE), terminates a process. Before that, the C library performs the following shutdown steps:

- call the functions registered with atexit() or on_exit(), in reverse order of registration
- flush the IO streams
- remove temporary files created by the tmpfile() call

atexit() is a standard C (and POSIX) call that maintains a list of cleanup functions registered by the caller for normal process termination. The cleanup functions take no parameters and return no value. If the process is terminated by a signal (e.g. kill), the cleanup functions are not called. A child created by fork() inherits copies of the registrations, but a successful exec() removes them.

on_exit() is the equivalent of atexit() in older versions of SunOS (SunOS 4), and glibc also supports it; its handlers additionally receive the exit status and an argument. Later versions of SunOS use atexit().
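A sketch of the reverse-order rule, assuming a POSIX system with fork(). A child registers two handlers with atexit() and calls exit(); each handler writes one character to a pipe, and the parent reads them and then reaps the child with waitpid() so no zombie is left behind. atexit_order_demo() and the handler names are hypothetical. The handler output should be "21".

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;               /* pipe write end used by the handlers */

static void first_registered(void)
{
    if (write(report_fd, "1", 1) != 1) {
        /* nothing useful can be done about a failure in an exit handler */
    }
}

static void second_registered(void)
{
    if (write(report_fd, "2", 1) != 1) {
        /* as above */
    }
}

/* Stores the handlers' output in out (NUL-terminated, at most len-1 bytes).
   Returns the child's exit status, or -1 on error. */
int atexit_order_demo(char *out, size_t len)
{
    int pfd[2];
    if (len == 0 || pipe(pfd) == -1)
        return -1;

    fflush(NULL);                        /* so the child does not re-flush our buffers */
    pid_t pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (pid == 0) {                      /* child */
        close(pfd[0]);
        report_fd = pfd[1];
        atexit(first_registered);
        atexit(second_registered);
        exit(EXIT_SUCCESS);              /* runs second_registered, then first_registered */
    }

    close(pfd[1]);                       /* parent: read until the child exits */
    size_t total = 0;
    ssize_t n;
    while (total < len - 1 &&
           (n = read(pfd[0], out + total, len - 1 - total)) > 0)
        total += (size_t)n;
    out[total] = '\0';
    close(pfd[0]);

    int status;                          /* reap the child: no zombie left */
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```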

exit() finally calls the _exit() system call, which cleans up the kernel resources allocated to the process. At the kernel level, do_exit() performs the following work:

- set the PF_EXITING flag in the task_struct
- remove any kernel timers with del_timer_sync()
- if BSD process accounting is enabled, call acct_process() to write out the accounting information
- release and remove resources, including:
   - __exit_mm() to release the mm_struct
   - sem_exit() to release IPC semaphores
   - __exit_files()/fs()/namespaces()/sighand() to release and remove the remaining resources
- set the exit code
- call exit_notify() to send SIGCHLD to the parent and reparent the task's children to another thread in their thread group or to the init process; it also sets the task state to TASK_ZOMBIE
- call schedule()

What remains at this point is the task_struct, the kernel stack and the thread_info structure stored in it.

Once the parent (or init) calls wait(), these are released via the put_task_struct() function. The user's process count is decremented and the task_struct is unlinked from the task list.

Sunday, November 2, 2014

Virtual File System (or Switch)

VFS is an abstraction layer that provides a common file model. Using function pointers, the VFS framework provides hooks to support reading, creating links, synchronization, etc. Each file system supplies functions to handle the relevant operations.

VFS works in terms of inodes, superblocks and directory entries. A filesystem that does not follow the UNIX model has to provide equivalent abstractions.
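A toy sketch of the function-pointer idea, assuming nothing about the real kernel structures: toy_file_ops, memfs and nullfs are hypothetical stand-ins for the kernel's operation tables and real filesystems. The generic toy_vfs_read() is the same for every filesystem; each filesystem only fills in its own table.

```c
#include <stddef.h>
#include <string.h>

/* Toy operations table (not the kernel's struct file_operations). */
struct toy_file_ops {
    const char *fs_name;
    long (*read)(char *buf, size_t len);
};

static long memfs_read(char *buf, size_t len)   /* hypothetical fs #1 */
{
    const char data[] = "from memfs";
    size_t n = sizeof data - 1 < len ? sizeof data - 1 : len;
    memcpy(buf, data, n);
    return (long)n;
}

static long nullfs_read(char *buf, size_t len)  /* hypothetical fs #2: always EOF */
{
    (void)buf;
    (void)len;
    return 0;
}

static const struct toy_file_ops memfs_ops  = { "memfs",  memfs_read  };
static const struct toy_file_ops nullfs_ops = { "nullfs", nullfs_read };

/* The generic layer: identical for every filesystem. */
long toy_vfs_read(const struct toy_file_ops *ops, char *buf, size_t len)
{
    if (ops == NULL || ops->read == NULL)
        return -1;                     /* operation not supported by this fs */
    return ops->read(buf, len);        /* dispatch through the function pointer */
}
```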

poll() and select() Comparison

(1) poll() does not require the user to calculate and pass in the highest fd + 1 as a parameter.
(2) select() works on fd bitmasks whose length depends on the largest fd number (e.g. 1000), and the bitmask may be sparse if only a few fds are watched. poll() is more efficient here, because its array size depends on the number of fds watched, not on the fd numbers.
(3) select() modifies the fd sets on return, so they must be rebuilt before each call. poll() uses separate input (events) and output (revents) fields.
(4) select() is more portable; some systems do not support poll().
(5) select() has a finer timeout resolution (microseconds) than poll() (milliseconds).



Portable Sleep using select()

Pass NULL for all three fd sets (and 0 as the highest fd + 1) and set the timeval parameter; this makes the process sleep for the specified period. As select() is implemented on most systems, this is a portable way to sleep.

Multiplexed IO

Programs typically service several file descriptors concurrently (e.g. a terminal, a file, and a pipe to communicate with other programs). Without threads, a process can block on only one fd at a time. One workaround is non-blocking IO, but repeatedly polling each fd this way is inefficient.

Multiplexed IO allows a process to block on multiple fds at once. The process sleeps and is woken when any of them can be read from or written to without blocking.

select()
Introduced in 4.2BSD, select() uses 3 fd sets: readfds, writefds and exceptfds. The fds in readfds are watched to see whether data can be read without blocking; likewise, writefds is watched for writes that will not block. exceptfds watches for exceptional conditions, such as out-of-band data being available (applicable to sockets only). On return, each set is modified to contain only the fds that are ready for that type of IO. The fd sets are really bitmasks, so select() takes a parameter giving the highest fd in any set plus one (so it knows how much of each bitmask to scan).

select() also takes a timeout structure specifying when select() should return even if no fd is ready by then. A timeout of zero causes select() to return immediately, reporting whichever fds are ready; a NULL timeout makes it block until an fd is ready.

fd sets are built and tested with macros:

fd_set readfds;
FD_ZERO(&readfds) - remove all fds from the set
FD_SET(fd, &readfds) - add an fd, e.g. FD_SET(STDIN_FILENO, &readfds)
FD_CLR(fd, &readfds) - remove an fd
FD_ISSET(fd, &readfds) - test whether an fd is in the set, e.g. if (FD_ISSET(STDIN_FILENO, &readfds)) {...}
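A minimal sketch of waiting on one fd with select(), assuming the fd is below FD_SETSIZE; wait_readable() is a hypothetical helper. It rebuilds the set and the timeout on every call, since select() may modify both.

```c
#include <sys/select.h>
#include <unistd.h>

/* Waits up to timeout_ms for rfd to become readable.
   Returns 1 if readable, 0 on timeout, -1 on error. */
int wait_readable(int rfd, long timeout_ms)
{
    fd_set readfds;
    FD_ZERO(&readfds);
    FD_SET(rfd, &readfds);

    /* Re-initialise the timeout before every call. */
    struct timeval tv = { .tv_sec = timeout_ms / 1000,
                          .tv_usec = (timeout_ms % 1000) * 1000 };

    int ret = select(rfd + 1, &readfds, NULL, NULL, &tv);   /* highest fd + 1 */
    if (ret == -1)
        return -1;
    return FD_ISSET(rfd, &readfds) ? 1 : 0;   /* the set now holds only ready fds */
}
```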

POSIX defines its own variant of select(), called pselect(). Its signature is slightly different, but it uses the same macros to build and test the fd sets.

Differences between pselect() and select()
(1) pselect() uses a different timeout structure, timespec. timespec holds seconds and nanoseconds, while timeval (used by select()) holds seconds and microseconds. In practice, neither call is accurate even at the microsecond level.

(2) When select() returns, the contents of timeval are unspecified (Linux updates it with the time remaining), so it must be reinitialized before the next select() call. pselect() does not modify timespec, so there is no need to reinitialize it between calls.

(3) pselect() has an additional parameter, sigmask, which addresses a race condition between waiting on file descriptors and signals. For example, a process checks a flag set by a signal handler before issuing the select() call. If the signal arrives (and the handler sets the flag) after the check but before the select() call, the process may block indefinitely and never respond to the flag. The fix is to block the signal while checking the flag and pass pselect() a sigmask in which the signal is unblocked: pselect() atomically installs this mask for the duration of the wait and restores the original mask on return. A signal that arrives in the window stays pending until pselect() starts waiting, and is then delivered immediately, interrupting the call. In effect, the mask serializes the flag check and the signal handler.

poll()
System V introduced poll(), which addresses some deficiencies of select(). Instead of 3 fd sets, poll() uses an array of pollfd structures.

struct pollfd {
    int fd; /* fd */
    short events; /* events to watch */
    short revents; /* return the events observed */
};

Events to watch are:
POLLIN - there is data to read
POLLOUT - writing will not block

POLLRDNORM - there is normal data to read (same as POLLIN)
POLLRDBAND - there is priority data to read
POLLPRI - there is urgent data to read

POLLWRNORM - writing normal data will not block (same as POLLOUT)
POLLWRBAND - writing priority data will not block

POLLMSG - a SIGPOLL message is available

Returned events include the above and also the following:
POLLERR - an error occurred on the fd
POLLHUP - a hang-up occurred on the fd
POLLNVAL - the fd is invalid

In comparison,

POLLIN | POLLPRI = select() read event
POLLOUT | POLLWRBAND = select() write event


poll() takes a timeout in milliseconds. A zero value makes poll() return immediately; a negative value makes poll() wait indefinitely until an event occurs. An example of using poll():

#include <poll.h>
#include <stdio.h>
#include <unistd.h>

struct pollfd fds[2];
fds[0].fd = STDIN_FILENO;
fds[0].events = POLLIN;           /* watch stdin for input */
fds[1].fd = STDOUT_FILENO;
fds[1].events = POLLOUT;          /* watch stdout for writability */

int ret = poll(fds, 2, 5 * 1000); /* wait up to 5 seconds */
if (ret == -1) {
    perror("poll");
} else if (ret > 0 && (fds[0].revents & POLLIN)) {
    /* stdin can be read without blocking */
}

Linux also offers ppoll(), which, like pselect(), takes a timespec and a sigmask. ppoll() is not POSIX but a Linux-specific call.

Truncating Files

ftruncate() and truncate() trim the file to the length specified.  ftruncate() works on file descriptor and truncate() works on a pathname.

If the length given is larger than the current file size, the file is extended in a fashion similar to lseek+write.  The extended length will be filled with zeros.
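A sketch of both directions, assuming a scratch file created with mkstemp() under /tmp (the template name and truncate_demo() are arbitrary): ftruncate() grows the file and the new bytes read back as zeros, then truncate() shrinks it by pathname.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 0 if the file grew with zero fill and then shrank, -1 otherwise. */
int truncate_demo(void)
{
    char path[] = "/tmp/truncdemoXXXXXX";   /* arbitrary scratch name */
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;

    char buf[6] = { 'x', 'x', 'x', 'x', 'x', 'x' };
    struct stat st;
    int ok = write(fd, "abc", 3) == 3
          && ftruncate(fd, 6) == 0            /* extend: bytes 3..5 are zero-filled */
          && pread(fd, buf, 6, 0) == 6
          && buf[0] == 'a' && buf[3] == '\0' && buf[4] == '\0' && buf[5] == '\0'
          && truncate(path, 2) == 0           /* shrink by pathname: keeps "ab" */
          && fstat(fd, &st) == 0 && st.st_size == 2;

    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```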

Combine READ/WRITE with SEEK

pread() and pwrite() are like the normal read()/write() calls with an additional position parameter. The p calls ignore the current file position, perform the IO at the position passed, and do not update the current file position. Thus, mixing read()/write() with pread()/pwrite() on the same fd may cause data corruption.

The advantage of the p calls is that they eliminate a race condition in a multithreaded environment where threads share the file table. lseek() followed by read()/write() is not atomic: after one thread calls lseek() and before it calls read()/write(), another thread can change the current file position. Using pread()/pwrite() avoids this race.

Linux lseek()

This system call sets the file position for the next I/O. The position can be specified as follows:

SEEK_CUR = current position + offset
SEEK_END = end of file + offset
SEEK_SET = beginning of file + offset

Note that offset can be 0, positive or negative.

When seeking beyond the end of the file, read() returns EOF. A write at that position creates a hole between the previous EOF and the new data. The hole reads back as zeros but does not occupy any space on disk; the file becomes a sparse file.

The maximum file position that lseek() can reach is bounded by the off_t type. This is typically implemented as a C long (i.e. the word size, the size of a general-purpose register). In the kernel, the position is kept as a long long.

File Closing in Linux

On Linux, closing a file does not automatically flush its data to disk. When the last user of the file closes it, the in-memory inode is released. If the file has been marked for deletion (unlinked while still open), the last close causes the file to be finally removed from disk.

Linux Direct IO

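The O_DIRECT flag to open() makes IO bypass the kernel page cache. Data is transferred directly between the user-space buffer and the device, and all IO is synchronous. The buffer, the length and the file offset must be suitably aligned (typically to the block size).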

Sunday, October 26, 2014

AIX maxclient

This is a tunable that determines the amount of RAM used to cache non-computational client pages (e.g. JFS2 and NFS). Its value should be less than maxperm, as client pages are a subset of all cached permanent pages. When numclient (reported by vmstat -v) reaches maxclient, lrud replaces only client pages.

AIX VMM Page Classification

Working storage pages are pages that are not preserved across a system reboot.  Examples are process data, heap, stack and shared memory etc.  They are also called anonymous pages.  VMM will release the working storage when a process ends.

Permanent pages are pages that have a backup store (file) and so it persists across system reboot. There are 2 subtypes:


  • Non-client pages are those in a JFS file system.  They are also called persistent pages.
  • Client pages are pages from other file systems like NFS and JFS2.


Computational pages are working storage pages; program text pages are also computational. Pages from other types of files are designated non-computational. Non-computational pages are retained in the file cache for performance, in case they are needed again later.

A file can start off as non-computational, but when a page fault occurs on the file because of an instruction fetch (i.e. the system is fetching an instruction from the file), the file is marked computational, and all of its pages in RAM are marked computational too.

The tunables minperm and maxperm control the amount of permanent storage pages cached in the system. When the number of permanent pages (numperm) is above maxperm, lrud steals only from permanent pages (the file cache) to replenish the free list. When numperm is below minperm, lrud steals from both computational and non-computational pages.

When numperm is between minperm and maxperm, AIX consults the lru_file_repage tunable. When it is set to 1, AIX maintains an LRU repage table to track pages (computational and non-computational) that were paged out and then paged back in shortly afterwards, which indicates they are needed frequently and should not have been paged out. If computational pages have a higher repaging rate than non-computational pages, AIX steals from non-computational pages; otherwise, it steals from both. When the tunable is set to 0, AIX always steals from non-computational pages, which is generally the better choice.

Saturday, October 11, 2014

Synchronous I/O in Linux

int fsync(int fd) flushes the file's data to disk synchronously. It flushes both the data and the metadata, such as the timestamps and other attributes in the inode.

int fdatasync(int fd) flushes the data and only the metadata (e.g. the file size) that is required to access the file in the future.

sync() flushes the buffers of all files on all filesystems.

Alternatively, passing O_SYNC to open() makes all IO on the fd synchronized IO. It is like calling fsync() after each write(), but Linux implements this more efficiently.

Specifying O_SYNC increases the CPU time spent in write() and the elapsed time of the process, as IO wait time is included. fsync() and fdatasync() have comparatively less overhead, as the program can call them at specific logical points instead of after every IO.
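A sketch of the "sync at a logical point" approach: write a batch of records, then flush once with fdatasync() (or fsync() when all inode metadata must also be durable). write_batch_then_sync() and sync_demo() are hypothetical helpers, and the scratch file under /tmp is an assumption.

```c
#include <stdlib.h>
#include <unistd.h>

/* Writes n records, then flushes once. metadata_too selects fsync()
   (data + all metadata) over fdatasync() (data + needed metadata only).
   Returns 0 on success, -1 on error. */
int write_batch_then_sync(int fd, const char *const *recs,
                          const size_t *lens, size_t n, int metadata_too)
{
    for (size_t i = 0; i < n; i++)
        if (write(fd, recs[i], lens[i]) != (ssize_t)lens[i])
            return -1;              /* short write treated as failure here */

    return metadata_too ? fsync(fd) : fdatasync(fd);   /* one flush per batch */
}

/* Demo on a scratch file (the path template is arbitrary). */
int sync_demo(int metadata_too)
{
    char path[] = "/tmp/syncdemoXXXXXX";
    int fd = mkstemp(path);
    if (fd == -1)
        return -1;
    unlink(path);

    const char *recs[] = { "rec1\n", "rec2\n" };
    const size_t lens[] = { 5, 5 };
    int ret = write_batch_then_sync(fd, recs, lens, 2, metadata_too);
    close(fd);
    return ret;
}
```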

POSIX also defines the O_DSYNC and O_RSYNC flags for open(). By POSIX definition, O_DSYNC behaves as if fdatasync() were called after each write. Older Linux kernels defined both flags as O_SYNC; since Linux 2.6.33, O_DSYNC has its own, weaker implementation.

O_RSYNC requests that reads be synchronized as well as writes. read() is already synchronized (it does not return until some data is available for the caller), but O_RSYNC also stipulates that the metadata associated with the read (the file access time) be updated on disk before read() returns. Although this behaviour does not match O_SYNC, Linux defines O_RSYNC as O_SYNC.

Linux Delayed Writeback

Note that write() returns after the kernel has copied the data into the kernel buffer. The data may not yet have reached the disk. Dirty buffers are batched and written out later (writeback).

Delayed writeback does not affect a subsequent read(), which returns the updated data from the dirty buffers instead of the copy on disk.

If the system crashes, data in dirty buffers is lost. Another problem with delayed writes is that they do not enforce I/O ordering; for databases, this can cause data integrity problems.

Also, if an I/O error (e.g. a disk failure) is encountered later, when the data is written out, it may not be possible to report the error to the originating process, which may already have terminated. In fact, a dirty buffer may contain updated data from multiple processes.

To minimize the risk, the kernel writes out dirty buffers once they are older than the age set in /proc/sys/vm/dirty_expire_centisecs (in hundredths of a second).

Page writeback is carried out by a set of kernel threads called flusher threads. Multiple flushers work on different devices. This fixes a deficiency of older Linux kernels (pdflush and bdflush), which worked on one device at a time and could spend much time waiting, causing a build-up of dirty pages in high-volume environments.

Linux write()

Unlike read(), a blocking write() to a regular file normally writes out the whole buffer before returning, unless it encounters an error, so a write loop is rarely needed for regular files. For other file types, such as pipes and sockets, write() can return a partial count, and a loop is needed.

EBADF, EFAULT, EINVAL and EIO have meanings similar to those for read().

EFBIG - the write would exceed the file size limit
ENOSPC - the filesystem has run out of space
EPIPE - the reading end of the pipe (or socket) associated with the fd has been closed. Normally the process also receives a SIGPIPE signal, whose default action terminates the process. If the signal is handled or ignored, write() returns -1 with errno set to EPIPE.
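For pipes and sockets, a loop like the following sketch is the usual way to make sure the whole buffer is written; write_all() is a hypothetical helper name. It retries after EINTR and advances past partial writes.

```c
#include <errno.h>
#include <unistd.h>

/* Returns 0 once all len bytes are written, -1 on error (errno from write()). */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;        /* interrupted before writing anything: retry */
            return -1;           /* EBADF, EPIPE, ENOSPC, ... */
        }
        p += n;                  /* partial write: advance and continue */
        len -= (size_t)n;
    }
    return 0;
}
```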

Linux read()

A read() call can return in several ways:

If len is returned, the read succeeded as expected.
If a value greater than 0 but less than len is returned, the read was interrupted by a signal, EOF was reached, or less data was available. Reissue the read for the remaining bytes.
If 0 is returned, EOF has been reached.
If -1 is returned and errno is EINTR, a signal interrupted the call before any data was read. Reissue the read.
If -1 is returned and errno is EAGAIN, the fd is non-blocking and no data is currently available. Reissue the call later.
If -1 is returned with errno set to a value other than EINTR or EAGAIN, an error has occurred, and reissuing the call will probably not succeed.

EBADF - a bad fd was passed
EFAULT - the buffer to hold the data is not in the process address space
EINVAL - the fd refers to an object that does not allow reading
EIO - an IO error occurred

Note that read() can return a partial result (fewer bytes than requested), so reads should be done in a loop that reissues the call under the conditions above.
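A sketch of such a loop, following the cases listed above; read_full() is a hypothetical helper. It stops early only at EOF, and for simplicity treats EAGAIN on a non-blocking fd as an error (real code would wait, e.g. with poll(), and retry).

```c
#include <errno.h>
#include <unistd.h>

/* Returns the number of bytes read (less than len only at EOF),
   or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t total = 0;
    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);
        if (n == 0)
            break;                      /* EOF */
        if (n == -1) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: reissue */
            return -1;                  /* EAGAIN, EBADF, EFAULT, EINVAL, EIO ... */
        }
        total += (size_t)n;             /* partial read: keep going */
    }
    return (ssize_t)total;
}
```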

Linux creat()

Opening a file with O_WRONLY | O_CREAT | O_TRUNC is very common, and creat() does exactly that: creat(path, mode) is equivalent to open(path, O_WRONLY | O_CREAT | O_TRUNC, mode).

Sunday, October 5, 2014

Linux file ownership

The owner of a new file is the effective user ID of the process.

Determining the owner group is more complicated. With System V behaviour (the default on Linux), the group is the effective GID of the process. With BSD behaviour, the group is the GID of the parent directory. BSD behaviour can be enabled with a mount-time option.

Linux also uses the BSD behaviour if the set-group-ID bit (setgid) is set on the parent directory.

Fortunately, the group owner is usually not important.

Flags in Linux OPEN call

O_APPEND
Before each write, the file position is updated to point to the end of the file. This happens even if a second program has written to the same file since the first program's last write: the first program's next write starts from the new end of file.

O_ASYNC
A signal (SIGIO by default) is generated when the file becomes readable or writable. This flag is used for pipes, sockets and terminals, not for regular files.

O_CLOEXEC (close on exec)
When the process executes a new program (exec), the file is automatically closed. This saves a call to fcntl() and eliminates a possible race condition (another thread could fork and exec between open() and fcntl()).

O_CREAT
Create the file if it does not exist. If the file exists, this flag has no effect unless O_EXCL is also given.

O_EXCL
When used with O_CREAT, the open call will fail if the file already exists.  This is to prevent race condition. This flag has no meaning if not used with O_CREAT

O_DIRECT
Open the file for direct I/O (i.e. bypassing the kernel page cache).

O_DIRECTORY
If the file is not a directory, open will fail.  This flag is used internally by the opendir() call

O_LARGEFILE
Use a 64-bit offset for the file.  This is to break the 32-bit (2G) size barrier.

O_NOATIME
The file's last access time is not updated when the file is read. This is mainly used for performance by backup and indexing programs that read a large number of files.

O_NOCTTY
Rarely used. If the file refers to a terminal device, it will not become the process's controlling terminal, even if the process currently has none.

O_NOFOLLOW
If the file is a symbolic link, fail the open call.  For example, opening /a/b/c, the path entries (a and b) can contain symbolic link.  Only the last file name (c)  must not be a symbolic link

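O_NONBLOCK
Open the file in non-blocking mode: neither open() nor subsequent IO on the fd will block. The non-blocking behaviour of open() itself is defined only for FIFOs.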

O_SYNC
Synchronous I/O - write() returns only after the data has reached the disk. As read() is always synchronous, this flag has no effect on reads.

O_TRUNC
If the file exists, is a regular file and is opened for writing, its length is truncated to zero. This flag is ignored for FIFOs and terminal devices; its use with other file types is undefined. Its use with O_RDONLY is also undefined.

File Table

The Linux kernel maintains a per-process list of open files, called the file table. The table is indexed by file descriptor. Each entry points to an open file, which holds information such as a pointer to the in-memory copy of the inode and metadata such as the access mode and the file position.

A child process receives a copy of its parent's file table. Later changes to the child's table (e.g. closing a descriptor) do not affect the parent, but each inherited descriptor refers to the same open file, so the file position is shared.

errno

A return value (commonly -1) indicates that a call has failed. The specific failure condition is reported via the variable errno, declared in <errno.h>:

extern int errno;

In thread-safe C libraries, errno is actually a macro for a per-thread value, so include <errno.h> rather than declaring it yourself.

perror() prints a textual description of the error indicated by errno:

void perror (const char *str);

It prints str, followed by a colon and a space, then the error description, to stderr.

C also provides strerror(), which returns a pointer to the description message. It is not thread-safe, because the returned buffer may be modified by a subsequent strerror() call. strerror_r() is a thread-safe version that accepts a caller-allocated buffer in which to place the description string.

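errno is only meaningful immediately after a call reports failure; a successful call may leave any value in it. For functions that have no distinct error return value, set errno to 0 before the call and check it afterwards.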
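strtol() is a typical case where errno must be cleared beforehand: it may legitimately return LONG_MAX, so only errno tells an overflow apart from a real value. A minimal sketch, with parse_long() as a hypothetical helper:

```c
#include <errno.h>
#include <stdlib.h>

/* Parses a base-10 long. Returns 0 and stores the value on success,
   ERANGE on overflow, EINVAL if s has no digits or trailing junk. */
int parse_long(const char *s, long *out)
{
    char *end;
    errno = 0;                      /* strtol() never clears errno itself */
    long v = strtol(s, &end, 10);
    if (errno != 0)
        return errno;               /* ERANGE: v is LONG_MIN or LONG_MAX */
    if (end == s || *end != '\0')
        return EINVAL;              /* no digits, or trailing characters */
    *out = v;
    return 0;
}
```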

Process Reparenting

The process tree is rooted at the init process. Every process in the hierarchy has a parent except init. When a parent exits before its child, the kernel reparents the child to the init process.

The init process routinely waits on its children to clean up zombies.

UNIX Domain Socket

A UNIX domain socket is a form of socket used for communication within the local system. It uses a special file in a filesystem as its address.

Named Pipes

A named pipe, also called a FIFO, is a special file for interprocess communication. A regular pipe, used to transfer data from one application to another, exists purely in memory and not on disk. Named pipes are like regular pipes but are accessed via a file.

Linux files and directories

A file, or regular file, is a stream of bytes. Unlike some other operating systems, Linux imposes no structure on files. The length of a file is bounded by the C type used to store the file position (offset). The length of a file can be changed by truncating it; truncation can cut the file short or make it longer. In the latter case, zero bytes are filled in from the original EOF to the new end point.

A file can be opened multiple times, by different processes or even by the same process. Linux does not regulate concurrent access by different processes; it is up to the processes to coordinate among themselves.

A file is referenced by its inode (information node). An inode is identified by a number that is unique only within a file system. An inode contains metadata about the file, such as timestamps, owner, type, length and location on the disk. The filename is not stored in the inode. The inode is stored physically on disk.

Directories provide the names users use to access files: a directory maps file names to inode numbers. A name-inode pair is called a link. Physically, a directory is implemented on disk as a table of links; conceptually, a directory is also a file.

When a file is referenced by pathname, the kernel walks the path one component at a time, finding the inode for each directory entry (dentry). The kernel caches dentries in the dentry cache to speed up future lookups.

Although a directory is conceptually a file, the Linux kernel does not allow it to be manipulated with the usual file operations (e.g. read() and write()). Directories are processed with their own set of system calls.

CICS External Authentication Module (EAM)


To enable EAM

  • Set the EAMLoad attribute to yes in the /var/cics_regions/region_name/RD/RD.stanza file.
  • Set the EAMModule attribute to the path and name of the compiled EAM module in the /var/cics_regions/region_name/RD/RD.stanza file.


To enable the LDAP connection through EAM, set the following values in the CICS® region's environment file:

  • CICS_LDAP_HOST specifies the name of the host where the LDAP server is configured and running, for example: CICS_LDAP_HOST=myldap.aetna.com
  • CICS_LDAP_PORT specifies the port where the LDAP server listens for client connections, for example: CICS_LDAP_PORT=4000. If the CICS_LDAP_PORT environment variable is not specified in the region's environment file, the EAM uses 389 as the default port.


This EAM module is called whenever:

  • A user ID and password combination needs authentication 
  • A password needs changing in the external user ID and password repository 
  • A user definition that is in UD.stanza is not present for the user who is trying to log on 
  • After a successful password validation of an EAM user, EAM is called to install the user definition at CICS runtime. 


By default, CICS uses internal authentication that uses UD stanza. To use an External Authentication Manager instead of CICS, you must:


  • Install the EAM module 
  • Change the Region Definitions (RD) EAMLoad attribute to yes 
  • Use the RD EAMModule attribute to specify the EAM program path and name 
When the CICS region comes up, the EAMModule that the CICS Administrator specified is loaded into each cicsas process. When a CICS user tries to login with a user ID and password, CICS checks whether EAM is loaded. If the EAM is loaded, it passes that user ID and password to the EAM program for authentication.

Entanglement


When 2 particles interact and become correlated (mathematically, their wavefunctions become intertwined into a single wavefunction in superposition), any change to one of the particles (e.g. collapse of the wavefunction by a measurement, or one part of the wavefunction encountering a double slit) has a non-local effect on the other instantaneously, no matter how far apart the 2 particles are.

Eclipse Equinox


The original plug-in architecture in Eclipse was not dynamic: once loaded, a plug-in stayed in memory. The OSGi framework enables dynamic behavior, and the merger of these 2 technologies created Equinox.

The Eclipse runtime that underpins WAS is now implemented as OSGi services. WAS also implements its own components as OSGi services, which enables WAS to add and change features dynamically.

WAS as an Eclipse Application


WAS is packaged as an Eclipse plug-in (equivalent to an OSGi bundle). WAS extends the extension point org.eclipse.core.runtime.applications in its Eclipse plugin.xml. Eclipse provides startup.jar to start any Eclipse application.

For WAS startup, IBM repackages the startup.jar function in its own code. The startup Java program is com.ibm.wsspi.bootstrap.WSPreLauncher in the bootstrap.jar file.

This executes the Eclipse framework and passes it the name of the Eclipse application, com.ibm.ws.bootstrap.WSLauncher (similar to org.eclipse.core.launcher.Main). The launcher reads the plugin.xml file and finds the extension for org.eclipse.core.runtime.applications, in this case com.ibm.ws.runtime.eclipse.WSStartServer, which starts WAS.

- - - - -

WebSphere Garbage Collection


The mark and sweep algorithm favors application throughput, but the application pauses each time the GC runs. Generational GC suits applications that create large numbers of objects, use them and destroy them within a short interval. Young objects are kept in the nursery, where a minor GC takes place regularly. Older objects are migrated to the old generation space, where a mark and sweep GC is performed. This method improves performance and reduces fragmentation.

The mark and sweep method needs exclusive access to the JVM, which means all thread activity is stopped (STW = stop the world).

In the mark phase, all live objects are marked; all unreachable objects are considered garbage. The process of marking all reachable objects is called tracing. Tracing starts from the roots: thread stacks, static objects, and local and global JNI references.

Parallel mark uses N-1 helper threads to trace in parallel, where N is the number of processors. One application thread acts as the master coordinating agent. Parallel marking is on by default and is controlled by the Xgcthreads parameter; to turn it off, set Xgcthreads to 1.

Concurrent mark performs tracing concurrently with application activity. It asks each application thread to scan its own stack. Tracing is done by a low-priority background thread, and by application threads when they perform a heap lock allocation (i.e. an allocation that must acquire an exclusive lock on the heap to serialize access). Concurrent mark reduces GC pause time and makes pauses more consistent by spreading tracing across normal application activity. Because application threads do some of the tracing, they run slightly longer and throughput drops slightly. Concurrent mark is controlled by the Xgcpolicy parameter: "optthruput" disables it and "optavgpause" enables it.

When the mark phase completes, the mark bit vector identifies the location of all live objects in the heap. One bit in the mark bit vector represents 8 bytes of heap. To avoid filling the free pool with many small chunks, only free chunks of 512 bytes or more are reclaimed; the minimum chunk size on 64-bit platforms is 768 bytes. Free chunks too small to be reclaimed are called "dark matter"; they are recovered together with adjacent blocks when those blocks become free.
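A toy C version of the sweep step makes the bit-vector arithmetic concrete. For simplicity it assumes that every 8-byte grain of a live object has its mark bit set (the real collector marks object start addresses and uses object sizes), and the function name sweep() is illustrative only. Free runs of at least 512 bytes are counted as reclaimed; shorter runs are counted as dark matter.

```c
#include <stddef.h>
#include <stdint.h>

#define GRAIN_BYTES 8        /* one mark bit covers 8 heap bytes */
#define MIN_FREE_BYTES 512   /* smaller free runs are left as dark matter */

static int bit_is_set(const uint8_t *bits, size_t i)
{
    return (bits[i / 8] >> (i % 8)) & 1;
}

/* Sweep a mark bit vector of nbits bits (covering nbits * 8 heap bytes).
 * Returns the reclaimed bytes; *dark_bytes receives the unreclaimed
 * free bytes. */
size_t sweep(const uint8_t *bits, size_t nbits, size_t *dark_bytes)
{
    size_t reclaimed = 0, dark = 0, run = 0;
    for (size_t i = 0; i <= nbits; i++) {
        if (i < nbits && !bit_is_set(bits, i)) {
            run++;                           /* extend the current free run */
            continue;
        }
        size_t bytes = run * GRAIN_BYTES;    /* run ended: classify it */
        if (bytes >= MIN_FREE_BYTES)
            reclaimed += bytes;              /* would go onto the free list */
        else
            dark += bytes;                   /* too small: dark matter */
        run = 0;
    }
    *dark_bytes = dark;
    return reclaimed;
}
```

For example, in a 2048-byte heap with live objects at bytes 0-63 and 128-191, the 64-byte gap between them becomes dark matter and the trailing 1856 bytes are reclaimed.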

Parallel bitwise sweep speeds up the sweep by using the same set of helper threads used for parallel mark. Each helper thread sweeps a 256KB area of the heap at a time.

Concurrent sweep, like concurrent mark, reduces average pause time. It shares the same mark map with concurrent mark, so the 2 activities are mutually exclusive.

Oracle Shared Pool Free Space



Compared with the shared pool, the buffer cache uses a single chunk size, whereas requests to the shared pool vary in size. Management of the buffer cache is therefore relatively simple: to satisfy a request, it just supplies the first item on the free list. For the shared pool, the objective is to find a chunk of the appropriate size quickly.

Oracle reserves about 5% of the space of each granule (the unit of allocation that makes up each pool in the SGA). This is the reserved pool for large objects (larger than about 4KB; the default threshold is 4,400 bytes). Separating large objects from smaller ones reduces fragmentation. The reserved pool chunk is flanked by 2 24-byte chunks called reserved stoppers, which ensure the free reserved pool chunk is never merged with an adjacent free block.

In an extent dump, chunks labelled recreate and freeable can both be freed when space is needed. The heap manager (which manages the shared pool) issues a call to destroy a recreatable chunk when it needs space. The call goes to the specific SGA pool manager (e.g. the library cache manager), which actually carries out the destroy request. Freeable chunks are linked to a recreatable chunk; when the recreatable chunk is freed, its associated freeable chunks are freed at the same time. Note that there is no direct call to the library cache manager to destroy a freeable chunk; only destroying the recreatable chunk is available.

There are a large number of free lists for the shared pool because request sizes vary. The first 176 lists hold chunks in increments of 4 bytes, the next few in increments of 12 bytes, then the next few in increments of 64 bytes, and so on. If Oracle needs space of a certain size and the best-fit list has no free chunk, it looks at the list for the next bigger size. When the chunk used is larger than the request (because there is no exact size match), the remainder is either left as part of the allocation or returned to a free list of smaller size, depending on how large it is.

The LRU list in the shared pool contains recreatable chunks only. It is divided into 2 sub-lists, called recurrent and transient. The recurrent list holds chunks that have been used repeatedly and recently (hot); the transient list holds chunks that have not been used recently (cold). When a chunk is inserted, it is placed at the head of the transient list. When the chunk is reused, it is moved to the head of the recurrent list.
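The recurrent/transient behaviour, together with the eviction step described next (freeing unpinned chunks from the transient list), can be sketched as a two-list LRU in C. The structures and function names are hypothetical; Oracle's internal shared pool structures are not exposed in this form.

```c
#include <stddef.h>

/* Hypothetical chunk descriptor (Oracle's real structures differ). */
struct chunk {
    int id;
    int recurrent;              /* 1 while on the recurrent (hot) list */
    int pinned;                 /* pinned chunks cannot be freed */
    struct chunk *prev, *next;
};

struct list { struct chunk *head, *tail; };
struct lru  { struct list recurrent, transient; };

static void list_remove(struct list *l, struct chunk *c)
{
    if (c->prev) c->prev->next = c->next; else l->head = c->next;
    if (c->next) c->next->prev = c->prev; else l->tail = c->prev;
    c->prev = c->next = NULL;
}

static void list_push_head(struct list *l, struct chunk *c)
{
    c->prev = NULL;
    c->next = l->head;
    if (l->head) l->head->prev = c; else l->tail = c;
    l->head = c;
}

/* A newly inserted chunk starts at the head of the transient list. */
void lru_insert(struct lru *lru, struct chunk *c)
{
    c->recurrent = 0;
    list_push_head(&lru->transient, c);
}

/* A reused chunk moves to the head of the recurrent list. */
void lru_touch(struct lru *lru, struct chunk *c)
{
    list_remove(c->recurrent ? &lru->recurrent : &lru->transient, c);
    c->recurrent = 1;
    list_push_head(&lru->recurrent, c);
}

/* Next chunk to free: the least recently used unpinned chunk on the
 * transient list, or NULL if there is none. */
struct chunk *lru_victim(struct lru *lru)
{
    for (struct chunk *c = lru->transient.tail; c != NULL; c = c->prev)
        if (!c->pinned)
            return c;
    return NULL;
}
```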

When the free list has no chunk large enough for the request, Oracle turns to the LRU list. It frees some unpinned chunks from the transient list, moves them to the free list and checks again whether there is now enough space for the request. If not, it repeats this process a few times. If contiguous free space is still not available after these attempts, Oracle raises the ORA-04031 error.

Oracle Cursors Sharing


Sys-recursive SQL statements are generated by Oracle to query the data dictionary (system catalog) for the information on objects and relations needed to interpret a SQL statement. Oracle keeps some bootstrap objects in the shared pool (marked as fixed objects) so that it can start query processing.

When a SQL statement is passed to Oracle, Oracle hashes its text and checks whether the library cache has an entry with the same hash value. If there is a match, Oracle then compares the SQL text with the cached one to make sure they are indeed the same. This is called cursor authentication.
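Cursor lookup by hash followed by a full text comparison can be sketched in C as follows. FNV-1a stands in for Oracle's own internal hash function, and the fixed 64-bucket table and function names are assumptions for illustration. The comparison is exact, so statements that differ only in case or whitespace do not match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NBUCKETS 64

/* Hypothetical library-cache entry; Oracle's real structures differ. */
struct cursor {
    uint32_t hash;
    const char *sql_text;
    struct cursor *next;       /* bucket chain */
};

static struct cursor *buckets[NBUCKETS];

/* FNV-1a, used here only as an example hash function. */
static uint32_t sql_hash(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Add a parsed cursor to the cache. */
void cache_cursor(struct cursor *c, const char *sql)
{
    c->hash = sql_hash(sql);
    c->sql_text = sql;
    c->next = buckets[c->hash % NBUCKETS];
    buckets[c->hash % NBUCKETS] = c;
}

/* Return the cached cursor for sql, or NULL. A hash match alone is not
 * trusted: the full text is compared ("cursor authentication"). */
struct cursor *lookup_cursor(const char *sql)
{
    uint32_t h = sql_hash(sql);
    for (struct cursor *c = buckets[h % NBUCKETS]; c != NULL; c = c->next)
        if (c->hash == h && strcmp(c->sql_text, sql) == 0)
            return c;
    return NULL;
}
```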

Session cursor caching keeps frequently used cursors in session memory so that they do not need to be searched for in the library cache. Session cursor caching happens after cursor authentication.

call 1 - Oracle optimizes the SQL
call 2 - cursor authentication; the cursor is picked up from the library cache
call 3 - the cursor is cached in session memory after the call completes
call 4 - the cursor in session memory is reused

If someone else has already run a particular query, it is already optimized in the library cache. When you run the same query, you go straight to the call 2 scenario, and the cursor is cached in your session memory after call 2.

Connection State in Socket Calls


Client
- when a socket is created, the connection is in the CLOSED state
- when connect() is called, TCP initiates the 3-way handshake and the state changes to CONNECTING (SYN_SENT in TCP terms)
- when the 3-way handshake completes, the state changes to ESTABLISHED
- 2 possible errors:
  - ETIMEDOUT = TCP did not receive a response from the server to the handshake, even after retransmission
  - ECONNREFUSED = the server host replied with a RST (reset) packet, which usually means no socket is listening on that port

Server
- when the socket is created, the connection is in CLOSED state
- when bind() is called, the local IP address and port are filled in the socket structure; the state is still CLOSED
- when listen() is called, the state changes to LISTENING (LISTEN in TCP terms)
- when a client request comes in, a new socket structure is allocated with the local IP address (remember the request can arrive on any of the server's interfaces) and the remote IP/port. The state changes to CONNECTING (SYN_RCVD in TCP terms).
- when the 3-way handshake completes, the state changes to ESTABLISHED
- when accept() is called, the new socket descriptor is returned to the caller.
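These transitions can be exercised on the loopback interface with a short C sketch; make_listener() and connect_to() are illustrative names. Because the kernel completes the handshake before accept() is called, connect() returns as soon as the connection is queued, and accept() then hands back the new descriptor. Once nothing is listening, a new connect() gets a RST and fails with ECONNREFUSED.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket() -> bind() -> listen() on 127.0.0.1. Port 0 asks
 * the kernel for any free port, which is written to *port.
 * Returns the listening fd, or -1 on error. */
int make_listener(unsigned short *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);            /* CLOSED */
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);
    socklen_t len = sizeof addr;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||  /* still CLOSED */
        listen(fd, 8) < 0 ||                                    /* LISTEN */
        getsockname(fd, (struct sockaddr *)&addr, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return fd;
}

/* Client side: connect() drives the 3-way handshake. On failure errno holds
 * e.g. ECONNREFUSED (RST received) or ETIMEDOUT. Returns fd or -1. */
int connect_to(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        int saved = errno;           /* keep connect()'s errno across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;                       /* ESTABLISHED */
}
```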

Thursday, September 18, 2014

Circle of Fifths

If you enumerate the key signatures in order, you go around the circle of fifths in the clockwise direction. For example, starting with C, the next key is G, because G is the fifth of C. The next key is D, and so forth.

If you move counter-clockwise instead, you follow the circle of fourths. For example, C is a perfect fourth above G.
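One step clockwise adds a perfect fifth (7 semitones) and one step counter-clockwise adds a perfect fourth (5 semitones), both taken modulo the 12-note octave. A small C sketch, using sharp spellings only (F# rather than Gb) for simplicity; circle_of_fifths() is an illustrative name:

```c
/* Pitch-class names, sharps only (enharmonic spellings simplified). */
static const char *const NOTE_NAMES[12] = {
    "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"
};

/* Key reached after `steps` moves around the circle, starting from C.
 * Positive steps go clockwise (+7 semitones each); negative steps go
 * counter-clockwise, which is the same as adding a fourth (+5 semitones). */
const char *circle_of_fifths(int steps)
{
    int pc = (7 * steps) % 12;
    if (pc < 0)
        pc += 12;            /* C's % can be negative for negative steps */
    return NOTE_NAMES[pc];
}
```

Six steps from C lands on F#, which is the tritone discussed next.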

tritone

Also called the devil's interval. It is composed of the tonic and the diminished (flat) fifth, or equivalently the augmented (sharp) fourth.

Saturday, September 6, 2014

Scanning

TCP connect scan - completes the 3-way handshake and then tears down the connection immediately. This is the most stable scan method and will not flood or crash the target server.
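A connect scan needs nothing more than an ordinary connect() call, which is part of why it is so well behaved. The sketch below is a minimal C version for a single IPv4 port; tcp_connect_scan() is an illustrative name. It has no timeout handling, so a filtered port may block until the system's connect timeout.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect scan of one TCP port on an IPv4 address given as a dotted string.
 * Returns 1 if the full handshake succeeded (open), 0 if refused, timed out
 * or unreachable, -1 on a local error or bad address. */
int tcp_connect_scan(const char *ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    int err = errno;         /* save before close() can change it */
    close(fd);               /* tear the connection down immediately */
    if (rc == 0)
        return 1;
    return (err == ECONNREFUSED || err == ETIMEDOUT || err == EHOSTUNREACH) ? 0 : -1;
}
```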

SYN scan - instead of completing the 3-way handshake, it performs only the first 2 steps and then sends a RST packet. It is faster than the TCP connect scan. The scan is also "stealthy": because the 3-way handshake never completes, the target host is less likely to log the connection.

UDP scan - if a UDP response comes back from the port, the port is positively identified as open. Because a service listening on a UDP port does not always respond to an incoming packet, no response may mean either that the port is open or that the packet was silently dropped by a firewall.

XMAS Tree scan - the FIN, PSH and URG flags are set in the scan packet. Because the packet contains no SYN, ACK or RST, an open port ignores it (i.e. no response). If the port is closed, the system responds with a RST packet, as the TCP RFC specifies. The XMAS Tree scan is effective against UNIX and Linux, but not against Windows.

NULL scan - used in the same way as the XMAS Tree scan, but the scan packet has no flags set at all.

4 Phases of Hacking

1. Reconnaissance - to create a list of potential target IP Address
2. Scanning - to map services exposed to network
3. Exploitation - to gain access to remote services
4. Maintain Access - with backdoor and rootkits

Friday, September 5, 2014

Linux Kernel Thread


These are standard processes that exist solely in kernel space. Kernel threads do not have an address space (their mm pointer is NULL). They operate only in kernel space and never switch into user space. Kernel threads are scheduled together with user processes.

A kernel thread can only be created by another kernel thread using

int kernel_thread(int (*fn) (void *), void * arg, unsigned long flags)

which returns the PID of the new thread (negative on error). The child thread executes fn with the argument arg.

The child kernel thread is created through the usual clone() mechanism, typically with the CLONE_KERNEL flags (shorthand for CLONE_FS | CLONE_FILES | CLONE_SIGHAND).

Linux Thread


Threads are implemented in Linux as processes that share resources. Threads are created using clone(CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND, 0).

Normal process creation with fork() translates to clone(SIGCHLD, 0), and vfork() calls clone(CLONE_VFORK | CLONE_VM | SIGCHLD, 0).
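The effect of CLONE_VM can be demonstrated from user space with glibc's clone() wrapper. In this sketch (run_clone() and child_fn() are illustrative names), the child writes to a global variable. With CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND the parent sees the write, as it would for a thread; with SIGCHLD alone (fork-like) the child only changes its own copy. CLONE_THREAD is deliberately left out so that waitpid() can still reap the child.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

static int shared_value = 0;

/* Child entry point: store the argument in a global variable. */
static int child_fn(void *arg)
{
    shared_value = *(int *)arg;
    return 0;
}

/* Run child_fn in a new task and return shared_value as the parent sees it
 * afterwards (-1 on error). share_vm selects thread-like sharing. */
int run_clone(int share_vm, int value)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    shared_value = 0;
    int flags = SIGCHLD;                     /* fork-like: clone(SIGCHLD, 0) */
    if (share_vm)                            /* thread-like sharing */
        flags |= CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND;

    /* Stacks grow downward on mainstream architectures: pass the top. */
    pid_t pid = clone(child_fn, stack + stack_size, flags, &value);
    if (pid < 0) {
        free(stack);
        return -1;
    }
    waitpid(pid, NULL, 0);                   /* reap the child */
    free(stack);
    return shared_value;
}
```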

Process Creation


Linux implements fork() using the clone() system call. The call is passed flags that specify what is to be shared between the parent and the child process. The vfork() and __clone() library calls also invoke clone(). clone() invokes do_fork(), which does the bulk of the work. do_fork() calls copy_process(), which calls:

  • dup_task_struct() to create a new kernel stack, thread_info and task_struct for the new process. The parent and child process descriptors are initially identical; various fields are then set or cleared. The task state is set to TASK_UNINTERRUPTIBLE to make sure the child does not run yet.
  • copy_flags() to copy the flags member of the task_struct. It clears PF_SUPERPRIV, which marks a task that has used superuser privileges, and sets PF_FORKNOEXEC to indicate that the process has not called exec() yet.
  • get_pid() to get the next available PID.

Depending on the flags passed to clone(), copy_process() then either duplicates or shares the open files, filesystem information, signal handlers, process address space and namespace. The remaining timeslice is split between the parent and the child. copy_process() returns a pointer to the child's task_struct to do_fork(). The new child process is then woken up and runs ahead of the parent.

vfork() has the same effect as fork(), except that the parent's page table entries are not copied. The child runs in the parent's address space as its sole thread and is not allowed to write to it. The parent is blocked until the child calls exec() or exits. In do_fork(), the vfork_done pointer is set to a specific address. When the child exits or calls exec(), vfork_done is checked and, if it is not NULL, the parent is signaled to wake up.

vfork() was introduced as an optimization in 3BSD, at a time when copy-on-write was not available.
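A minimal vfork() example in C. The child does nothing except call _exit(), which (along with an exec function) is the only safe thing to do while it borrows the parent's address space, and the parent resumes only afterwards. vfork_exit_status() is an illustrative name.

```c
#define _DEFAULT_SOURCE   /* vfork() is no longer part of POSIX */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start a child with vfork() that exits immediately with `code`.
 * Returns the child's exit status, or -1 on error. */
int vfork_exit_status(int code)
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);          /* only _exit() or an exec call is safe here */

    /* The parent continues only after the child has exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```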

Process Context

When a process executes a system call, it transfers control to the kernel. At this point, the kernel is said to be executing on behalf of the process and is in process context, so the current macro is valid.

Linux Process States



  • TASK_RUNNING - the process is runnable (on the run queue) but not necessarily running.
  • TASK_INTERRUPTIBLE - the process is sleeping, waiting for some condition to occur. It can also be woken up by a signal.
  • TASK_UNINTERRUPTIBLE - like TASK_INTERRUPTIBLE, except that it is not woken up by signals. This is used when the task must wait without interruption or expects the event to occur quickly. A process in this state cannot be killed.
  • TASK_ZOMBIE - the process has exited, but its task_struct is kept until the parent process calls wait4().
  • TASK_STOPPED - the process has stopped because it received SIGSTOP, SIGTSTP, SIGTTIN or SIGTTOU, or because it received any signal while being debugged.


The process state is set using the set_task_state(task, state) function, which also places a memory barrier to force ordering on other processors.
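From user space, a process's current state can be observed in /proc/&lt;pid&gt;/stat, where it appears as a single letter such as R (running or runnable), S (interruptible sleep), D (uninterruptible sleep), Z (zombie) or T (stopped). The following C sketch reads that letter; proc_state() and self_state() are illustrative names.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the one-letter state of a process from /proc/<pid>/stat,
 * or '?' on error. */
char proc_state(pid_t pid)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';
    char buf[512];
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Format is "pid (comm) state ...". comm may itself contain spaces or
     * ')', so the state follows the LAST ')'. */
    char *p = strrchr(buf, ')');
    if (p == NULL || p[1] != ' ' || p[2] == '\0')
        return '?';
    return p[2];
}

/* State of the calling process: normally 'R', since it is running
 * while it reads the file. */
char self_state(void)
{
    return proc_state(getpid());
}
```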

Linux Process Descriptor


Process and task are used interchangeably. The process descriptor is implemented as struct task_struct. The task list is a circular doubly linked list of task_struct.

Prior to the 2.6 kernel, the task_struct was stored at the end of the kernel stack, which allowed it to be located quickly using the stack pointer register. Since 2.6, the task_struct is allocated dynamically by the slab allocator, and a new struct thread_info sits at the end of the kernel stack instead. The first member of thread_info (task) points to the task_struct. thread_info exists mainly so that register-impaired architectures such as x86 can still locate the task quickly from the stack pointer.

To look up the task_struct of the currently running task, the kernel uses the current macro, whose implementation is architecture dependent. On x86 with 8KB stacks, masking out the lowest 13 bits of the esp stack pointer gives the address of thread_info, and current is then thread_info->task. On PowerPC, the current pointer is kept in a register.
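The x86 masking trick is plain integer arithmetic, shown here as a user-space C illustration with made-up addresses. It assumes an 8KB (2-page) kernel stack aligned on an 8KB boundary, so clearing the low 13 bits of any address inside the stack gives the stack base, where thread_info lives; the kernel then follows thread_info->task to reach the task_struct. stack_base() is an illustrative name.

```c
#include <stdint.h>

#define THREAD_SIZE 8192u   /* assumed 8KB kernel stack: 2 pages of 4KB */

/* Clear the low 13 bits (8192 = 2^13) of an address inside the stack to
 * get the THREAD_SIZE-aligned stack base, where thread_info sits. */
uintptr_t stack_base(uintptr_t sp)
{
    return sp & ~(uintptr_t)(THREAD_SIZE - 1);
}
```

For example, a stack pointer of 0xC1234ABC gives a base of 0xC1234000.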

Linux Kernel Stack

The user-mode stack can grow dynamically. The kernel stack, by contrast, is small and fixed in size, and its size depends on the hardware architecture. Historically, the Linux kernel stack is 2 pages (8K on 32-bit and 16K on 64-bit architectures), fixed at kernel compile time.

Using Floating Point

When an application needs to use floating point, the kernel has to assist in switching modes. What exactly is done depends on the hardware architecture, but typically the kernel catches the request as a trap and initiates the sequence.

GNU C branch annotation


Two macros, likely() and unlikely(), can be used to help the compiler optimize branches.

For example: if (unlikely(i > 10)) ...

Use these directives only when one branch direction is overwhelmingly more common than the other.
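The Linux kernel defines these macros on top of GCC's __builtin_expect(), essentially as below. sum_non_negative() is an illustrative example of a loop whose rare case is marked unlikely.

```c
#include <stddef.h>

/* !!(x) turns any non-zero value into exactly 1 so it can match the
 * expected value passed to __builtin_expect(). */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Sum an array, skipping negative entries that are expected to be rare. */
long sum_non_negative(const int *v, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (unlikely(v[i] < 0))
            continue;        /* cold path: laid out away from the hot path */
        total += v[i];
    }
    return total;
}
```

The hint only affects code layout and static branch prediction; the result is the same with or without it.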

Linux PRINTK


The Linux kernel does not link with libc, for reasons of size and speed. Many libc functions are implemented directly in the kernel instead.

printf() is not available in the kernel. The kernel uses printk(), which has a syntax similar to printf() plus an optional priority (log level) flag. printk() copies the formatted string into a kernel log buffer, which is normally read by syslogd (via klogd).

Timeslices

If the timeslice is too long, responsiveness suffers. If it is too short, throughput suffers, because the system spends a larger share of its time switching tasks and has less time left for actual processing.
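The trade-off can be put into numbers with a deliberately simplified model: plain round-robin, a fixed context-switch cost, and no priorities (Linux's scheduler is more sophisticated than this). The function names and sample figures are assumptions for illustration.

```c
/* Fraction of CPU time left for real work when every timeslice of
 * slice_us microseconds is followed by a switch costing switch_us. */
double useful_fraction(double slice_us, double switch_us)
{
    return slice_us / (slice_us + switch_us);
}

/* Worst-case wait before a task runs again under plain round-robin with
 * `runnable` tasks: every other task uses a full slice plus a switch. */
double worst_wait_us(int runnable, double slice_us, double switch_us)
{
    return (runnable - 1) * (slice_us + switch_us);
}
```

With a 5 microsecond switch cost, a 100 microsecond slice leaves about 95% of CPU time for work, while a 10 microsecond slice leaves only about 67%. On the other hand, with 5 runnable tasks the 100 microsecond slice means a task may wait up to 420 microseconds before it runs again.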

Saturday, July 12, 2014

Amplifier

An amplifier generates a new voltage with the same shape as the input voltage. The output is usually larger, so the input is said to be amplified.

The gain of an amplifier is the ratio of its output to its input:

Gain (A) = Vout/Vin

An amplifier essentially takes a DC source and converts it into an AC output that follows the shape of a smaller AC input. The efficiency of an amplifier indicates how well it performs this DC-to-AC conversion:

%Eff = (Power-out/Power-in) * 100%

Lost power is usually converted to heat. Efficiency is sometimes traded off for better linearity, fidelity and lower signal distortion.
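A quick worked example of the two formulas, written as C helpers (the names and sample values are illustrative). An amplifier that turns a 0.5V input into a 5V output has a gain of 10. One that draws 40W from its DC supply while delivering 30W to the load is 75% efficient, with the other 10W lost mostly as heat.

```c
/* Voltage gain A = Vout / Vin (dimensionless). */
double voltage_gain(double vout, double vin)
{
    return vout / vin;
}

/* Efficiency in percent: power delivered to the load divided by the
 * DC power drawn from the supply, times 100. */
double efficiency_pct(double p_out, double p_in)
{
    return p_out / p_in * 100.0;
}
```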

Another property of an amplifier is its frequency response, i.e. the range of frequencies it covers. A DC amplifier covers frequencies from 0 Hz (DC) up to a few hundred Hz, while an audio amplifier covers frequencies up to about 20kHz.

Diodes

A diode is a semiconductor device that allows current to flow in one direction only. It can therefore convert AC to DC, a process called rectification; a capacitor is usually added to smooth the result.

Inductor

An inductor is a coil of wire. Current flowing through it generates a magnetic field, and when the current changes, that field induces a voltage in the coil itself. This voltage has the opposite polarity, opposing the change in current, so the inductor behaves somewhat like a resistor. This opposition to current flow is called inductive reactance. Inductors are often used to control current flow in AC circuits.

Capacitor

A capacitor stores charge. It is composed of 2 metal plates separated by an insulating gap (ceramic, plastic, air, etc.). When a voltage is applied to the capacitor, it takes a finite time to charge up; the charging time depends on the capacitance (and on the resistance in the circuit). When discharging, the capacitor supplies a DC voltage to the load, and again the discharge takes a period of time.

With AC, a capacitor behaves like a resistor, and its opposition to current flow is called capacitive reactance. Unlike a resistor's resistance, a capacitor's reactance depends on the AC frequency and on the capacitance: it falls as the frequency rises. A capacitor therefore passes higher frequencies more easily and attenuates lower ones, which is why capacitors are used in frequency filters.

A capacitor can also be used to block DC while allowing AC to pass through.
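The frequency dependence can be made concrete with the standard reactance formulas, Xc = 1/(2πfC) for a capacitor and XL = 2πfL for an inductor (these formulas are not stated in the text above). A C sketch with illustrative function names:

```c
static const double TWO_PI = 6.283185307179586;

/* Capacitive reactance in ohms: Xc = 1 / (2*pi*f*C). Falls as f rises;
 * at f = 0 (DC) the division yields +infinity, i.e. DC is blocked. */
double capacitive_reactance(double freq_hz, double farads)
{
    return 1.0 / (TWO_PI * freq_hz * farads);
}

/* Inductive reactance in ohms: XL = 2*pi*f*L. Rises with f. */
double inductive_reactance(double freq_hz, double henries)
{
    return TWO_PI * freq_hz * henries;
}
```

At 60Hz a 10µF capacitor has a reactance of about 265 ohms and a 0.1H inductor about 37.7 ohms, while at 0Hz the capacitor's reactance is infinite.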

Sunday, July 6, 2014

Electrical Signals

Analog signals are voltages that vary smoothly and continuously over time. An analog signal can be either AC or DC.

Digital signals are pulses (on/off).

Analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) convert signals from one form to the other.

Fourier theory says that any periodic waveform (such as a square-wave digital signal) can be approximated by adding a base sine wave and its harmonics. Usually the approximation is good enough with 5 to 7 harmonics.
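As a worked example, the standard Fourier series of a unit square wave of period T contains only odd harmonics. Evaluating it in the middle of the high half-cycle shows the sum converging to the square wave's value of 1 (the series in parentheses is the Leibniz series for π/4).

```latex
% Unit square wave of period T: +1 on the first half-cycle, -1 on the second
x(t) = \frac{4}{\pi}\sum_{k=1,3,5,\dots}\frac{1}{k}\sin(k\omega t)
     = \frac{4}{\pi}\left(\sin\omega t + \tfrac{1}{3}\sin 3\omega t + \tfrac{1}{5}\sin 5\omega t + \cdots\right),
\qquad \omega = \frac{2\pi}{T}

% At t = T/4, \sin(k\omega t) = \sin(k\pi/2) = +1, -1, +1, \dots for k = 1, 3, 5, \dots
x\!\left(\tfrac{T}{4}\right) = \frac{4}{\pi}\left(1 - \tfrac{1}{3} + \tfrac{1}{5} - \tfrac{1}{7} + \cdots\right)
 = \frac{4}{\pi}\cdot\frac{\pi}{4} = 1
```

With the first seven odd harmonics (1st through 13th), the value at t = T/4 is about 1.045, already close to 1.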

Generators

A generator at a power plant is driven by a mechanical source such as a steam turbine powered by oil, coal or nuclear energy. The turbine turns the generator, which consists of metal coils in a strong magnetic field. The speed of rotation determines the frequency (60Hz in North America; 50Hz in many other countries).
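The link between rotation speed and frequency follows the standard relation f = (poles × rpm) / 120, which is not stated above: rpm / 60 gives revolutions per second, and each pair of magnetic poles produces one full cycle per revolution. A C helper with an illustrative name:

```c
/* Output frequency of an AC generator in Hz: f = (poles * rpm) / 120,
 * i.e. (poles / 2) cycles per revolution times (rpm / 60) revolutions
 * per second. */
double generator_frequency_hz(int poles, double rpm)
{
    return poles * rpm / 120.0;
}
```

A 2-pole generator must turn at 3600 rpm for 60Hz; a 4-pole generator at 1500 rpm gives 50Hz.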

A wind generator produces AC, but at a varying frequency. The electricity is first converted to DC and then back to AC at the correct frequency before being fed into the grid.

A car alternator is actually an AC generator, turned by the engine via a belt. Its AC output is converted to DC by diodes to charge the car battery.

An inverter produces AC from a DC source.

Batteries

A battery is a collection of cells. A cell generates electricity by chemical reaction. It consists of 2 metal conductors called electrodes immersed in a solution called the electrolyte. The chemical interaction between the electrodes and the electrolyte separates charge: one electrode becomes positively charged and the other negatively charged. When a conductor connects the terminals of the cell, electrons flow from the negative terminal to the positive one. The voltage generated depends on the types of electrodes and electrolyte chosen.

A cell is represented schematically by 2 lines: the longer one represents the positive terminal and the shorter one the negative terminal.

Primary cells cannot be recharged. When the chemical reaction stops, no more charge is produced and the battery must be replaced.

Alkaline - 1.5v (AA, AAA, C, D cells)
Mercuric Oxide - 1.35v (watches and calculators)
Silver Oxide - 1.6v (hearing aids, watches)

Secondary cells can be recharged: the chemical reaction can be reversed by connecting the battery to an external DC voltage, which rejuvenates the electrodes and electrolyte.

Nickel-cadmium - 1.2v (tools)
Nickel metal hydride - 1.2v (laptop, cell phone)
Lead acid - 2.1v (cars)
Lithium ion - about 3.6v nominal, 4.1v to 4.2v when fully charged (cell phone, laptop, iPod)

The amount of current a cell can deliver depends on its size and the quantity of materials used. Larger electrodes and more electrolyte produce more current, but the voltage stays the same: the type of material, not its volume, determines the voltage.

For example, AA and D cells made of the same material both produce 1.5v. The D cell lasts longer because of its larger size.

When batteries are connected in series, the overall voltage increases proportionally. When they are connected in parallel, the available current (and capacity) increases but the voltage stays the same.
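An idealized C sketch of combining identical cells. It ignores internal resistance and mismatch between cells, and it tracks capacity in amp-hours, a unit the text above does not use; struct pack and make_pack() are illustrative names.

```c
/* Idealized pack of identical cells: `series` cells per string and
 * `parallel` strings side by side. */
struct pack {
    double volts;
    double amp_hours;
};

struct pack make_pack(double cell_volts, double cell_ah, int series, int parallel)
{
    struct pack p;
    p.volts = cell_volts * series;        /* voltages add in series */
    p.amp_hours = cell_ah * parallel;     /* capacity adds in parallel */
    return p;
}
```

Four 1.5v alkaline cells in series give 6v; three 1.2v NiMH cells in parallel stay at 1.2v but triple the capacity.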

A solar cell (photovoltaic or PV cell) converts light into electrical energy. On satellites, solar cells are often used to charge secondary cells (nickel-cadmium) rather than to operate the equipment directly.

A fuel cell combines oxygen and hydrogen to produce voltage and current, with heat and water as byproducts. A fuel cell generates only about 0.5v to 0.9v, so many cells are needed to do useful work.

Voltage Sources

For current to flow, there must be a voltage.

Direct Current (DC) is a fixed, steady current flowing in one direction. The voltage can be positive or negative, measured relative to a reference point called ground.

Alternating Current (AC) flows in both directions along the conductor: electrons flow in one direction for a moment, then reverse and flow in the opposite direction, and so on. The AC supplied to homes has the shape of a sine wave. The frequency is the number of complete cycles per second; household AC is generally 60Hz (50Hz in many countries).

AC is represented by the symbol of a circle with a sine wave in the middle.

Electromagnetic Induction

When a conductor moves through a magnetic field, a voltage is induced in the conductor, which causes current to flow if the circuit is closed. Current is also induced if the conductor is held still and the magnetic field moves relative to it.

The direction of the current depends on the direction of the magnetic field and of the relative movement. The induced voltage depends on the flux density (how strong the magnetic field is) and the speed of the movement. The induced voltage is highest when the conductor cuts the field lines at a right angle. If the wire moves parallel to the field lines, no voltage is induced.
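These dependencies are captured by the standard formula for the voltage induced in a straight conductor of length l moving at speed v through a uniform field of flux density B, where θ is the angle between the direction of motion and the field lines (the formula is not stated in the text above). A worked example with illustrative values:

```latex
% Motional EMF of a straight conductor moving through a uniform field
\varepsilon = B\,l\,v\,\sin\theta

% Example: B = 0.5\ \text{T},\ l = 0.2\ \text{m},\ v = 10\ \text{m/s}
\theta = 90^\circ:\quad \varepsilon = 0.5 \times 0.2 \times 10 \times 1 = 1\ \text{V}
\theta = 0^\circ:\quad \varepsilon = 0.5 \times 0.2 \times 10 \times 0 = 0\ \text{V}
```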


Static and Dynamic Electricity

These are 2 types of electricity.

Static electricity involves two charged objects, one with an excess of electrons and one with a shortage. An electric (force) field exists between the 2 objects, and the charge stored on them is called static electricity.

If the 2 objects come too close or the charge builds up too much, the attractive force causes electrons to jump across the gap from one object to the other. This is called electrostatic discharge (ESD). ESD is generally not useful, because the charge dissipates quickly in a spark or a brief burst of current.

Dynamic electricity is the electricity we use to power appliances: a continuous flow of electrons. A voltage source is needed to make the electrons flow from the negative terminal to the positive terminal. Current is measured as the amount of charge passing a given point per unit of time. One ampere of current equals 1 coulomb of charge per second, and one coulomb equals about 6.242 x 10^18 electrons.
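The charge arithmetic as a C sketch (function names are illustrative). The electron count per coulomb is the reciprocal of the elementary charge, 1.602176634 x 10^-19 C.

```c
/* Electrons per coulomb: 1 / 1.602176634e-19, about 6.242e18. */
static const double ELECTRONS_PER_COULOMB = 6.241509074e18;

/* Charge in coulombs moved by a steady current: Q = I * t. */
double charge_coulombs(double amps, double seconds)
{
    return amps * seconds;
}

/* Number of electrons passing a point for that current and duration. */
double electrons_passed(double amps, double seconds)
{
    return charge_coulombs(amps, seconds) * ELECTRONS_PER_COULOMB;
}
```

A 2A current flowing for 3 seconds moves 6 coulombs, or roughly 3.7 x 10^19 electrons.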

Sunday, June 22, 2014

Virtualization and CPU Rings

The operating system runs in ring 0. Modern processors add a ring -1 for the hypervisor to control access to the CPU and resources.

A type-2 hypervisor sits on top of a host OS. The hypervisor's core module, the VMM (virtual machine monitor), runs as a kernel driver. The main hypervisor and the guest OS run in ring 1 or 3. This is called ring compression, because the architecture ends up using fewer rings (3 instead of 4). When a guest OS needs to do I/O, the request is trapped by the hypervisor and passed to the host OS via the VMM, which causes some performance penalty.

dwm

The Desktop Window Manager runs as a service, the dwm session manager (internal name UxSMS, User Experience SMS). dwm composes the desktop. In earlier versions, including Windows Vista when not running Aero, each application wrote directly to the frame buffer. When an application window was overlaid by another application's window, Windows sent a WM_PAINT message to the affected application asking it to redraw its window on screen. If the application was busy and did not handle the WM_PAINT message, a trail of the moving window's image could be left behind; this situation is called "tearing".

With dwm, each application is allocated its own off-screen buffer. dwm takes these images and composes them into the final image that is displayed. dwm is also responsible for drawing a white shade over an unresponsive window, which is called ghosting.

Sunday, June 8, 2014

Entanglement

When 2 particles interact and become correlated (mathematically, their wavefunctions become intertwined into a single wavefunction in superposition), any change to one of the particles (e.g. collapse of the wavefunction by a measurement, or one part of the wavefunction encountering a double slit) has a nonlocal effect on the other instantaneously, no matter how far apart the 2 particles are.

*A wavefunction contains a lot of information. One piece of it is the probability of finding the particle at a specific location: not a definite position, but a "wave" spread over space.

Sunday, May 25, 2014

Beta Decay

Some nuclei are unstable because they contain too many protons or too many neutrons. The lightest elements prefer equal numbers of protons and neutrons; heavier elements prefer more neutrons than protons. To redress the balance, a proton can transform into a neutron or vice versa, releasing a positron or an electron respectively so that charge is conserved. This emission of electrons or positrons is called beta decay. The process also emits a neutrino (or antineutrino), which carries away part of the energy so that energy is conserved.


Alpha Decay

An alpha particle contains 2 protons and 2 neutrons. Radioactive material emits alpha particles in a process called alpha decay. To break free of the strong nuclear force that holds the nucleons together, the alpha particle would classically need enough energy to get over the Coulomb barrier, and it does not have that much energy. Instead, it escapes from the nucleus by quantum tunneling: its wavefunction gives a small but non-zero probability of finding it outside the nucleus, so given enough time it eventually escapes. This picture matches the observed half-lives of radioactive substances.

Friday, April 18, 2014

SQLServer Protocols

SQL Server and its SQL Native Client communicate through a protocol layer called SNI (SQL Server Network Interface). The data packets use a Microsoft proprietary format called TDS (Tabular Data Stream).

SQL Server supports 4 communication protocols:
(1) Shared Memory - used within the same server
(2) Named Pipes - developed for LAN environments; used within a server or between 2 servers across a LAN.
(3) TCPIP
(4) VIA (Virtual Interface Adapter) - used with compliant hardware

Thursday, April 17, 2014

Piano

The piano is a percussion instrument: notes are produced by hammers striking the strings with force applied by the player, and the amount of force determines the loudness of the notes. The piano was invented in Italy in 1709. Its full name is pianoforte, which means soft-loud. Its predecessor, the harpsichord, produces notes by plucking strings instead of hammering them. A harpsichord player cannot alter the loudness, because the plucking distance is constant.

Musical Notes and Noise

Musical notes are sounds with a repeating pattern. The waveform need not be a smooth sinusoid; it can be jagged. In fact, almost every musical instrument produces notes with its own characteristic repeating waveform (its timbre). Noise, on the other hand, has a waveform that does not repeat.

When a string is plucked and released, it produces its base frequency and multiples of it. For example, the A string on a guitar produces a base frequency of 110 Hz plus multiples such as 220 Hz, 330 Hz, etc. If the string is plucked in the middle, the frequencies are 110/330/550/770 Hz, etc., with no even multiples. The reason is that the midpoint is a point of maximum movement for the odd multiples, whereas every even multiple has a node (a point of no movement) at the midpoint, so plucking there cannot excite it. If the string is plucked at 1/3 of its length, the even multiples are present, but the multiples of 3 (330 Hz, 660 Hz, ...) are missing.
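For an idealized string plucked at a fraction p of its length, the amplitude of harmonic n is proportional to sin(nπp)/n², so a harmonic disappears exactly when the pluck point is one of its nodes, i.e. when n·p is a whole number. With p written as a fraction a/b, that test needs only integer arithmetic, as in this C sketch (function names are illustrative):

```c
/* 1 if harmonic n is excited when the string is plucked at fraction a/b
 * of its length, 0 if the pluck point is a node of that harmonic
 * (n * a / b is a whole number, so sin(n * pi * a / b) == 0). */
int harmonic_present(int n, int a, int b)
{
    return (n * a) % b != 0;
}

/* Frequency of harmonic n for a given fundamental. */
double harmonic_hz(double fundamental_hz, int n)
{
    return fundamental_hz * n;
}
```

Plucking the 110 Hz A string in the middle (a/b = 1/2) removes 220 Hz and 440 Hz but keeps 330 Hz; plucking at 1/3 keeps 220 Hz but removes 330 Hz and 660 Hz.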

These multiples are called harmonics. Whatever the mix of harmonics, the combined waveform of an A note repeats at the base frequency of 110 Hz. That is why all A notes sound alike in pitch.

Sunday, April 13, 2014

Chaos

Chaos is a phenomenon. A chaotic system exhibits nonlinear dynamics, meaning that cause and effect are not related in a linear fashion. In other words, a small cause can have a large, complex and unexpected effect, which is why it is called nonlinear.

Order and determinism can eventually lead to randomness. The randomness comes from the unpredictability inherent in such systems.

Determining the Future

The French mathematician Pierre-Simon Laplace suggested an intellectual being (demon) that knows the position and motion state of every particle in the Universe, and every forces that acts on them, that being is able to predict the future and make change to it.

While theoretically possible, this cannot be done in practice, because it is impossible for anyone to know with absolute precision the properties of every interacting atom.

In 1886, the King of Sweden offered a prize of 2500 kroner to anyone who could prove or disprove the stability of the solar system. In other words, would the planets stay in their orbits, or might one of them eventually crash into the Sun or drift out of the system? The French mathematician Henri Poincaré tried to analyze the problem with just 3 bodies: the Sun, the Earth and the Moon. Poincaré found that even with only 3 objects, the equations cannot be solved exactly because of their sensitivity to the initial conditions. The calculations showed completely irregular and unpredictable results. Poincaré nonetheless won the prize.

Billiard simulation software is widely available these days. While it is possible to simulate the interaction of balls on the table, it is impossible to predict the outcome of a real shot. To predict it, one must know the position and movement of the white ball and of all the other balls, the friction of every strand of fiber in the table cloth, and the size and shape of every speck of dust on the table that could steer a ball off course by a minute, immeasurable amount, and so on.

Edward Lorenz was an American mathematician and meteorologist. In the early 1960s, Lorenz used an LGP-30 desktop computer to run weather simulations. At one point, he wanted to restart a simulation halfway through by re-entering intermediate numbers from a printout. Unexpectedly, the result differed from the last run. The reason was that the computer calculated with 6 decimal places, but the printout only showed 3 decimal places. Lorenz noticed that such a minute difference can have a big impact on the simulation result. He coined the term butterfly effect to describe this rippling effect, a term often associated with Ray Bradbury's 1952 short story A Sound of Thunder.

Tachyon

It is a hypothetical type of subatomic particle, permitted by the mathematics of Einstein's relativity, that travels faster than light. The word's root is tachys, which means "fast" in Greek.

GPS time correction

According to relativity theory, gravity causes clocks to run slower than they would in outer space. A satellite orbiting at 400 km altitude still feels about 90% of the gravity at the surface; GPS satellites orbit much higher (about 20,200 km), where gravity is much weaker, so the clocks aboard run slightly faster. At the same time, the satellites move sideways at high speed, which keeps them from falling down. This high-speed movement slows the clocks down.

The reduced gravitational pull makes the satellite clocks run faster than clocks on Earth by about 45 microseconds a day. The movement of the satellites, on the other hand, slows their clocks down by about 7 microseconds a day. The net effect is that clocks in the satellites run about 38 microseconds a day faster.

GPS positioning works by measuring how long the signals broadcast by several satellites take to reach the receiver on the ground. Since light travels about 300 meters per microsecond, each microsecond of timing error translates to about 300 meters. If not corrected, the 38-microsecond daily time difference would cause a positioning error of about 10 km each day.

Lights in Relativity

In 1905, Einstein's theory was based on 2 postulates. First, there is no way to tell whether one is moving (at constant velocity) or standing still. Imagine two rockets moving towards each other. If there is no background object to show their relative positions, an observer on either rocket cannot tell whether he is moving, or the other rocket is, or both.

Second, light behaves like a wave but, unlike sound, it does not require a medium to travel through. Light reaching an observer from a source, moving or not, travels at the same speed. In this respect it is like other waves such as sound, whose speed also does not depend on the motion of the source: the Doppler effect changes the frequency, not the speed.

When an observer shines a light from one of the rockets, he sees the light leave the rocket at the usual speed, as if he were stationary. The light reaches the other rocket, and the observer on board there also sees it travelling at the usual speed. In other words, both measure the speed of light and find it to be the same.

The only way light can travel at the same speed for all observers regardless of how fast they are moving with respect to each other is if they measure distances and times differently.

Muons are particles travelling at over 99% of light speed. They are created when cosmic rays collide with air molecules high in the atmosphere. The lifetime of a muon is about 2 microseconds, and in 2 microseconds even light travels only about 600 meters, so muons created at high altitude should not reach the surface of the Earth. Yet they do. From the Earth's point of view, the muon's time is dilated, so it lives longer. From the muon's point of view, since its speed and its lifetime are fixed, the distance to the surface must be shorter. This is called length contraction.

Relativity theory indicates that the closer one's speed is to light speed, the larger the length contraction. For a distance of 100 light-years, a traveller at 99% of light speed could make the journey in about 14 years of his own time. At 99.99% of light speed, it takes only about 1.4 years. At 99.9999999% of light speed, the journey takes less than 2 days. However, to an observer on Earth, the trip takes a little over 100 years.

Laws of Thermodynamics

First Law - energy can be converted from one form to another but cannot be created or destroyed

Second Law - the entropy of an isolated system never decreases; it stays the same or goes up

Entropy can be viewed as a measure of the (in)ability to do useful work with energy: lower entropy means higher ability. For example, a charged battery has a lower entropy than a flat one. The second law also defines the direction in which time flows, from lower-entropy states to higher-entropy states.

Third Law - the entropy of a perfect crystal is zero at absolute zero temperature

Zeroth Law - if 2 bodies are each in thermal equilibrium with a third body, they are also in thermal equilibrium with each other

Maxwell's Demon

An insulated box contains only air. The box is divided into two halves with a trap door in the middle. When the door is open, molecules can move from one half of the box to the other. The 2 halves start off with the same air pressure and no temperature difference between them.

The movement of individual molecules is random, and some molecules move faster than others. When the trap door is open, on average the same number of molecules move in each direction.

Maxwell's demon controls the trap door and only lets molecules go from the left compartment to the right. In time, there will be more molecules in the right compartment, which means higher pressure; if the demon instead lets only fast molecules through to the right and only slow ones to the left, the right side becomes hotter. This seems to violate the Second Law of Thermodynamics, since all the demon needs is information. However, obtaining the information and making the decisions also requires energy from the system. Thus, the law is not broken.

Sunday, March 23, 2014

AIX iptrace and tcpdump

Both tools are used to analyze network-related problems. tcpdump captures header information, while iptrace captures whole packets from an interface. Unlike tcpdump, iptrace copies packets from the kernel to user space for further filtering unless the -B option is used. iptrace can monitor more than one interface; if the number of monitored interfaces is high, packets may be dropped. The interfaces can be specified with the -i option.


The iptrace command uses either the network trace kernel extension (net_xmit_trace kernel service), which is the default method, or the Berkeley Packet Filter (BPF) packet capture library to capture packets (-u flag). The iptrace command can either run as a daemon or under the System Resource Controller (SRC).

Tuesday, March 4, 2014

OSGi Service

A service in OSGi is a Java object provided by a bundle. The service is listed in the OSGi service registry for other bundles to use. The service registry allows searching for services and sends notifications when the state of a service that others depend on changes.

The recommended practice is for an OSGi service to consist of a Java interface and its accompanying implementing class.

When a bundle enters the ACTIVE state, its service is registered by calling registerService() in the start() method of the bundle's activator. The bundle also looks up the services it depends on. Services can rely on the OSGi framework's listener mechanism (ServiceListener) to be notified when a service they depend on changes state.

The strength of OSGi is that a service can be upgraded without restarting the JVM or the other services it interacts with. The trick is to package the interface and the implementing classes as 2 separate bundles. The consuming classes work with the interface bundle, which never changes. The OSGi registry ensures that all consuming bundles get a reference to the updated service the next time it is accessed.

If the interface and the implementing classes are packaged in the same bundle, updating the bundle requires uninstalling and reinstalling it and all other bundles in the dependency hierarchy.


Monday, March 3, 2014

OSGi Bundles

A bundle is a group of Java classes and a manifest file packaged in JAR format. The MANIFEST.MF file is a text file that describes the bundle: its version, the external packages it depends on, the packages it exports for other bundles to use, and the activator class that works with the OSGi framework's life-cycle layer to manage the bundle. The OSGi runtime assigns a class loader to each bundle.

A bundle has 6 states in its life cycle:
(1) Installed - the bundle is installed and validated
(2) Resolved - the bundle's dependencies are resolved
(3) Starting - the bundle is in transition to the active state
(4) Active - the bundle is loaded and running
(5) Stopping - the bundle is being stopped
(6) Uninstalled - the bundle is removed from the framework

Events are generated when a bundle moves from one state to another. Other bundles can catch these events and react (e.g. stop when a bundle they depend on is stopped).

The strength of an OSGi bundle is that it can be updated (replaced with a new version) dynamically, without stopping the other bundles that depend on it.

Particles

In an atom, electrons are attracted to the nucleus by the electromagnetic force, which is about 10^39 times stronger than the gravitational attraction between them. The electromagnetic repulsion between the protons in the nucleus is overcome by the strong nuclear force. But when the nucleus becomes too big, the electromagnetic repulsion gains the upper hand and the nucleus splits into smaller nuclei.

The nucleus of an atom is composed of particles called nucleons: protons and neutrons. A nucleon is a type of hadron (from the Greek for "thick" or "stout"). Hadrons are particles that feel the strong nuclear force. There are 2 subgroups. The first group, called baryons, comprises the proton, neutron, Lambda, Sigma, Xi and Omega. The other group, called mesons, comprises the pion and more massive particles such as the eta and the kaon. Hadrons are built from quarks.

There are 6 types of quarks, each with a different mass. Three of the quarks (down/strange/bottom) carry a negative charge of 1/3 of the electron's charge. The other 3 (up/charm/top) carry a positive charge of 2/3 of the proton's charge. Each quark also carries one of 3 color charges, called red, green and blue.

A proton is made up of 3 quarks: 2 up (each +2/3) and 1 down (-1/3). A neutron is made up of 2 down quarks and 1 up quark.
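Adding up the quark charges, with e the charge of the proton, gives the familiar charges of the nucleons:

```latex
Q_p = 2\left(+\tfrac{2}{3}e\right) + \left(-\tfrac{1}{3}e\right) = +e,
\qquad
Q_n = \left(+\tfrac{2}{3}e\right) + 2\left(-\tfrac{1}{3}e\right) = 0
```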

The other 4 flavors of quarks are "strange", "charm", "top" and "bottom".

Quarks have quantum properties: mass, electrical charge and spin. Quarks also carry a color charge, which is used to explain the formation of nucleons, pions and related mesons. Color charge is related to the strong nuclear force.

Spin is not actual rotation. It means that particles interact with others as if they were rotating in a certain way. Particles with an integer spin (0, 1, 2 etc.) are called bosons. Particles with a half-integer spin (1/2, 3/2 etc.) are called fermions.

Bosons are carriers of forces. The fundamental bosons are:
  • photon (electromagnetism), spin = 1
  • gluon (strong nuclear force), spin = 1
  • Z boson (weak nuclear force), spin = 1
  • W boson (weak nuclear force), spin = 1
  • Higgs boson, spin = 0
  • graviton (gravity, hypothetical), spin = 2
Fermions form matter. Quarks are fermions. The only other elementary fermions are leptons (from the Greek for "small"). Leptons are particles that do not feel the strong nuclear force, basically every elementary fermion that is not a quark. Leptons comprise the electron and its heavier relatives, the muon and the tau, plus 3 types of chargeless and much lighter neutrinos: the electron neutrino, muon neutrino and tau neutrino. Leptons are elementary particles that cannot be broken down into smaller parts.

Leptons are classified into 3 generations (families), ordered by the increasing mass of the charged lepton. The electron and the electron neutrino form the first generation. The muon and the muon neutrino form the second generation. The tau and the tau neutrino form the third generation.

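When a neutron decays into a proton and an electron, the mass of the products is smaller than that of the original neutron. The difference appears as kinetic energy, shared with a third, very light particle that is also emitted: an antineutrino (neutrino is Italian for "the little neutral one").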

Tachyons are hypothetical particles that travel faster than the speed of light.

Sunday, February 23, 2014

Socket Address Structure

The socket API specifies a generic data type called sockaddr for use by API calls.

struct sockaddr {
    sa_family_t sa_family;  // address family e.g. AF_INET or AF_INET6
    char sa_data[14];    // address info - A blob of bits to handle diff OS and network
};

Note that this sockaddr structure is not large enough to hold an IPv6 address, which is 16 bytes long (sa_data has only 14 bytes).

The actual data structures used in socket calls are sockaddr_in (for IPv4) and sockaddr_in6 (for IPv6). They are cast to (struct sockaddr *) when passed to the API.

struct in_addr { uint32_t s_addr; };  // 4-byte IPv4 address, network byte order
struct sockaddr_in {
    sa_family_t sin_family;   // address family: AF_INET
    in_port_t sin_port;       // 16-bit port, network byte order
    struct in_addr sin_addr;  // IPv4 address
    char sin_zero[8];         // padding to the size of struct sockaddr
};

Socket

It is a general abstraction through which programs send and receive data. Different types of sockets correspond to different underlying protocol suites and different protocol stacks within a suite.

The main types of TCP/IP socket are stream sockets and datagram sockets. A stream socket represents one end of a TCP connection. It consists of an IP address, a port number and the end-to-end protocol (TCP).

A socket is created by a socket call which returns a handle to the socket:

int socket(int domain, int type, int protocol);

"Domain" refers to the communication domain; recall that the socket API is a generic interface for a large number of communication domains (e.g. AF_INET for IPv4 and AF_INET6 for IPv6).

int hSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);

"Type" determines the semantics of data transmission with the socket, for example whether transmission is reliable or message boundaries are preserved. Common values are SOCK_STREAM and SOCK_DGRAM.

"Protocol" refers to the end-to-end protocol to be used, e.g. IPPROTO_TCP or IPPROTO_UDP. A value of 0 means to use the default protocol for the given "Type".

The close() call closes the socket.

Special Network Addresses

(1) Loopback address is assigned to a loopback interface which is a virtual device that echoes transmitted packets back to the sender.  For IPV4, it is 127.0.0.1 and for IPV6, it is ::1.

(2) Private addresses
This group of addresses is for use by sites that connect to the Internet via NAT. These addresses cannot be reached from the global Internet. For IPv4, they start with 10, 192.168 or 172.16 to 172.31. IPv6 has no exact counterpart, although unique local addresses (fc00::/7) serve a similar purpose.

(3) Link Local or Autoconfiguration addresses
These addresses can only be used to communicate with hosts on the same network; routers do not forward them. For IPv4, they start with 169.254. For IPv6, they are in fe80::/10, i.e. the first 16 bits are FE80 through FEBF.

(4) Multicast addresses
For IPv4, they range from 224.0.0.0 to 239.255.255.255 (first byte 224 to 239). For IPv6, they start with FF.

JVM Architecture

The JVM has a stack-based architecture without general-purpose registers. This allows it to run the same code regardless of the underlying hardware, since real machines differ in the number and size of their registers and in how those registers relate to memory. The only register-like structure is the program counter. The result of a method call is returned on the stack.

Mutex

Inside the kernel, mutexes are referred to as mutants. Mutexes are global objects for synchronizing execution. Mutex names are usually hard-coded, because the name must be consistent when the mutex is shared by 2 processes or threads. Only one thread can own a mutex at any one time. A thread gains access to a mutex using WaitForSingleObject, and the ReleaseMutex call releases it after use. The CreateMutex function creates a mutex; another process uses OpenMutex to obtain a handle to the same named mutex before using it.

First and Second Chance Exceptions

Debuggers are given 2 chances to handle an exception raised by the program being debugged. When an exception occurs, execution of the program stops and the debugger is given the first chance to handle it. The debugger can handle it or pass it on to the program. In the latter case, the program's registered exception handler is given control.

If the program does not handle the exception, the debugger is given a second chance to handle it. If no debugger is attached, the program usually crashes at this point. The debugger must resolve the exception for the program to continue running.

Break Points

Software breakpoints are implemented by overwriting the first byte of the instruction at the break location with 0xCC, which is the INT 3 instruction. This passes control to the debugger when execution reaches that point. The debugger displays the original instruction, but if one inspects the memory directly, the byte has changed to 0xCC.

Software breakpoints may not work when code is self-modifying (e.g. malware). In this case, the patch may be overwritten and the breakpoint will not be effective.

Hardware breakpoints are assisted by the hardware. For each instruction executed, the processor compares the address with special debug registers to determine whether a breakpoint has been reached. One major drawback is that x86 has only 4 breakpoint address registers: DR0 to DR3 store the addresses of breakpoints. DR7 is the control register, which indicates which of DR0-DR3 are active and whether each address represents an execute, write or read/write breakpoint. Read/write breakpoints allow the program to break when an address is accessed.

To protect the debug registers from being modified by malware, set the General Detect flag in DR7. The processor will then break before any MOV instruction that accesses a debug register.

A conditional breakpoint breaks only when a predefined condition is met, for example when the second parameter of a function has a particular value. This lets one stop at a frequently executed point only under the conditions of interest. Conditional breakpoints are implemented as software breakpoints: the debugger evaluates the condition each time the breakpoint is hit and silently resumes if it is false.

Stack Layout

ESP points to the top of the stack. EBP usually does not change during a function call, so it provides a reference point for accessing local variables (and arguments) using offsets.

(1) The arguments are pushed onto the stack first
(2) Next, the return address is pushed automatically by the CALL instruction
(3) The old EBP is pushed next
(4) Lastly, space for the local variables is allocated

pusha and pushad push all 8 general-purpose registers (the 16-bit and 32-bit versions respectively) onto the stack, in this order: EAX, ECX, EDX, EBX, ESP (its value before the instruction), EBP, ESI and EDI.

ESP always points to the top element in the stack.

NOP (Intel)

NOP is actually an XCHG EAX,EAX instruction, with opcode 0x90. NOPs are commonly seen in buffer overflow exploits, where the exact code address can only be approximated. Placing a series of NOPs (a NOP sled) in front of the code means a jump landing anywhere in the sled slides through to the code.

Windows Thread

Threads share the address space of their process. Each thread has its own stack and registers. When the OS switches threads, the CPU context is saved in a structure called the thread context.

The CreateThread function creates a new thread. The call specifies the start address of the code to be executed. If the start address is LoadLibrary (with a DLL name as the parameter), the DLL's DllMain will be executed after the DLL is loaded.

Windows Network API

Berkeley-compatible sockets work much as they do in UNIX. They are implemented in the Winsock libraries, primarily ws2_32.dll. Common socket functions:

  • socket - create a socket
  • bind - attach a socket to a port
  • listen - put a socket into listening mode, waiting for incoming connections
  • accept - accept a pending connection from a remote socket
  • connect - open a connection to a remote socket that is waiting for a connection
  • recv - receive data
  • send - send data

Before using these functions, the WSAStartup function must be called to load the network library and allocate resources.

WinINet is a higher-level API that implements the HTTP and FTP protocols. It is implemented in Wininet.dll.
  • InternetOpen - initialize a connection to the Internet
  • InternetOpenUrl - open a connection to an HTTP or FTP URL
  • InternetReadFile - read data (e.g. a file) from the opened URL

reg File

A file with a .reg suffix is a readable text file. When a user double-clicks a .reg file, its content is merged into the registry. For example, the following adds a program to run automatically when Windows starts:

Windows Registry Editor Version x.xx

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Run]
"abcvalue"="C:\\abc.exe"

Alternate Data Stream (ADS)

It is an NTFS feature that allows additional data to be attached to an existing file, essentially hiding one file inside another. The extra data does not show up in a DIR listing, and it is not visible when the file is browsed or edited. Programs can access the stream via the name file.txt:Stream:$DATA.

Long Pointer (LP)

String variables are usually named with an lp prefix (e.g. lpStr1) because they really point to the memory location where the string starts. In 32-bit Windows, LP is 32-bit, and P (pointer) is the same as LP; the distinction only made a difference in 16-bit systems.

Windows Handles

Like pointers, handles refer to objects or memory locations. However, handles cannot be used in arithmetic operations, and they do not always represent memory addresses. They can only be passed to function calls to refer to the same objects.

Oracle Network Architecture

The application layer is implemented by OCI (Oracle Call Interface) on the client side and OPI (Oracle Program Interface) on the server side.

The presentation layer protocol is called Two-Task Common (TTC) and is responsible for character set and data type conversion between client and server.

The session and network layers are implemented by Net8/Net9 (SQL*Net before that). The Net8 protocol has 2 components: Net Foundation and Protocol Support. Protocol Support further breaks down into 2 layers: Routing/Naming/Authentication and TNS (Transparent Network Substrate).

The role of TNS is to select the Oracle Protocol Adapter, which wraps one of the supported transport protocols: TCP/IP, named pipes, and SDP (Sockets Direct Protocol) for InfiniBand networks.

TNS Data Packet Structure

Bytes 0-7 are the header.

Bytes 8-9 are the data flags. 0x0040 indicates a disconnect packet; 0x0000 indicates normal data.

Byte 10 determines what the data packet contains:

Type - Description
0x01 - Protocol negotiation. The client sends the server the protocol versions it accepts (e.g. 6, 5, 4, 3, 2, 1, 0). The server responds with the common version and other information such as character set, version string and server flags.
0x02 - Data type representation exchange.
0x03 - Two-Task Interface (TTI) function call. Function codes:
  0x02 Open
  0x03 Query
  0x04 Execute
  0x05 Fetch
  0x08 Close
  0x09 Disconnect/logoff
  0x0C Autocommit ON
  0x0D Autocommit OFF
  0x0E Commit
  0x0F Rollback
  0x14 Cancel
  0x2B Describe
  0x30 Start up
  0x31 Shutdown
  0x3B Version (called before authentication)
  0x43 K2 Transactions
  0x47 Query
  0x4A OSQL7
  0x51 Logon (present password)
  0x52 Logon (present userid)
  0x5C OKOD
  0x5E Query
  0x60 LOB Operations
  0x62 ODNY
  0x67 Transaction end
  0x68 Transaction begin
  0x69 OCCA
  0x6D Start up
  0x73 Logon (present password - send AUTH_PASSWORD)
  0x76 Logon (present username - request AUTH_SESSKEY)
  0x77 Describe
  0x7F OOTCM
  0x8B OKPPC
0x08 - OK indication, sent from the server.
0x11 - Extended TTI functions:
  0x6B Switch or detach session
  0x78 Close
  0x87 OSCID
  0x9A OKEYVAL
0x20 - Used when calling external procedures with service registration.
0x44 - Same as 0x20.