Monday, April 29, 2013

Microphone Characteristics

Dynamic range is the range of sound intensity a mic can deliver to the recording device.  A small dynamic range means a limited range of amplitude levels above the noise floor.  For an empty concert hall, the noise floor is around 50 dB SPL.  The noise floor is the level at which the softest sound can be registered as a usable signal.  Any sound below the noise floor cannot be heard.

Frequency response measures how the mic translates SPL (Sound Pressure Level) into an audio signal at different frequencies.  An ideal frequency response is flat, meaning the mic converts sounds of different frequencies into signals of equal amplitude.  Some mics are deliberately designed to emphasize certain frequencies to suit their intended use.

An omnidirectional mic responds to sound pressure from all angles.  Condenser mics are typically omnidirectional.  A directional mic responds to sound pressure from a particular angle.  Cardioid is the most common response pattern.  It is so named because the pattern is heart shaped.  Both dynamic and condenser mics exhibit this pattern.  Hypercardioid is more directional.  It is also called mini-shotgun.  It is used when the mic must be kept at a distance from the source.  Supercardioid, or shotgun, is highly directional.


Sunday, April 28, 2013

Microphones

Any device that converts one form of energy into another is called a transducer.  A microphone is a transducer, and so is a loudspeaker.

According to the theory of electromagnetic induction, a metal conductor moving within the flux field of a magnet has a current of a certain direction and magnitude induced in it.

The most commonly used microphone is the dynamic microphone.  Dynamic mics are extremely durable and less expensive, and are commonly used in live performances and concerts.  The mic is built around a diaphragm connected to a coil of wire floating in the flux field of a magnet.  When the diaphragm vibrates in response to sound pressure, the coil moves, inducing an electrical current in the coil, which is connected to an output line.

A dynamic mic contains rather heavy magnets, which makes it durable.  However, the weight of the moving components also limits its frequency response.  High frequencies require the diaphragm to move very quickly, but the heavy assembly responds more slowly, thus attenuating higher frequencies.

A condenser mic, on the other hand, is not based on magnetism but still generates a voltage.  The voltage, however, has no power behind it.  The design is based on the movement of electrons in an open-air capacitor.  Behind the diaphragm there is a conductive back plate, separated from it by a small pocket of air.  Together they form a capacitor, which is kept charged.  When the diaphragm vibrates, closing and opening the gap between the diaphragm and the plate, the capacitance changes and the voltage across the capacitor varies with it, thus generating a signal.  A condenser mic requires an external power source, known as phantom power (48 volts).  The power can also be supplied by a battery.  A condenser mic is delicate and can be damaged if dropped.

Ribbon mics are the least used, except in radio broadcast.  They use the same principle as the dynamic mic: a thin ribbon of corrugated aluminium is suspended between 2 strong magnets.  The ribbon generates a current, but typically not a strong enough one.  Instead of using phantom power, a ribbon mic contains a built-in transformer to boost the level.  It is like a mic with a signal booster (pre-amp) built in.  Ribbon mics are famous for their warm sound, which lends itself well to voice.  They are fragile and heavy.  They are also very expensive.

Wave

Sound moves in the form of a longitudinal wave.  Analysis of this waveform is complex.  A simpler visualization is to use a transverse wave.  Throwing a rock into water creates ripples.  Looked at from above, the ripples propagate outwards like a longitudinal wave.  Looked at from the side (like a cross section), we see the transverse waveform.  The highest point of a transverse wave represents the greatest compression, while the lowest point represents the greatest rarefaction.  The mid point, the position of a molecule when it is not vibrating, is called the standard reference level.

A sinusoidal wave represents simple harmonic motion (SHM).  A sine waveform is the simplest and most economical form of vibration because it contains only a single frequency and has no harmonic content.  Any other repeating wave is called a complex periodic waveform.  A waveform without a repeating pattern is called a random waveform.

Frequency = speed of sound/wave length

The range of frequencies humans can hear is from 20 Hz to 20 kHz.  Sound below the lower limit of hearing is called infrasonic (often loosely called subsonic), whereas sound above the upper limit is called ultrasonic.  Cats can hear between 45 Hz and 85 kHz.  Bats and dolphins can hear up to 120 kHz.

Music occupies about 1/4 of the range of hearing.  The fundamental tone in music is the one you hear most prominently when an instrument is played.  It occupies about 50% of the total sound heard.  Some examples of the frequency ranges of musical instruments:

Violin = 200Hz to 3.5kHz
Viola = 124Hz to 1kHz
Cello = 63Hz to 630Hz
Double Bass = 40Hz to 200Hz
Guitar = 80Hz to 630Hz
Piano = 28Hz to 4.1kHz

When hearing a periodic wave, we are actually hearing a complex averaging of the waveform's peak to peak values.  The root mean square (RMS) is a measure of the effective average level of a waveform over time.  For a sine wave, RMS = 0.707 * peak value.

Unlike frequency, amplitude cannot be measured without a reference value.  The decibel is a logarithmic unit representing a ratio.  The intensity of a sound is measured as the energy its wave transmits per unit time and per unit area.  The greater the amplitude of a vibration, the greater the energy transmitted.

I = P/S, where P = power (energy per unit time) and S = area covered

The loudest sound one can hear is about 1 W/m^2, which carries a trillion times more energy than the softest sound (1 x 10^-12 W/m^2).  These values are very awkward to use, so decibels are used instead.  Another reason is that we hear sound intensity logarithmically.

A decibel is one tenth of a Bel (named after Alexander Graham Bell).  A Bel is a ratio of 10 to 1 between 2 quantities.  The difference in energy between 1 Bel and 2 Bel is a factor of 10.  The standard reference for hearing is 0 dB SPL (Sound Pressure Level).  10 dB SPL is 10 times as intense as 0 dB SPL.  20 dB is 100 times as intense as 0 dB.
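
As a worked example of these ratios, the sketch below converts the loudest intensity quoted earlier (1 W/m^2) into decibels.  It assumes the softest sound (1 x 10^-12 W/m^2) is taken as the reference intensity I_0, which is the usual convention but is not stated explicitly in these notes.

```latex
\[
L = 10 \log_{10}\!\left(\frac{I}{I_0}\right)\ \text{dB},
\qquad I_0 = 10^{-12}\ \text{W/m}^2
\]
\[
I = 1\ \text{W/m}^2
\;\Rightarrow\;
L = 10 \log_{10}\!\left(\frac{1}{10^{-12}}\right)
  = 10 \times 12
  = 120\ \text{dB}
\]
```

So the trillion-to-one range of intensities collapses into a scale of roughly 0 to 120 dB.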

When we walk away from a sound source, the intensity decreases following the inverse square law: doubling the distance reduces the intensity to one quarter.

When 2 sounds of different frequencies (e.g. 100 Hz and 105 Hz) are produced at the same time, they produce a pulsation effect called beats.  The number of beats per second = the difference between f1 and f2 (5 in this example).  When the difference between the 2 frequencies is greater than about 30 Hz to 40 Hz, the beat phenomenon ceases to exist.  In its place we hear the 2 frequencies sounding simultaneously, known in music as an interval.

Sound

Sound is an aural perception of vibration.  There are 2 types of sound.  Noise is sound that is not organized or harmonized.  Music is organized and intentional.

A sound is produced when an object is set in motion, converting mechanical energy into acoustic energy.  The acoustic energy takes the form of pressure waves in the medium (e.g. the surrounding air).  The disturbances in the air are known as compressions and rarefactions.  These compressions and rarefactions occur around the source and move away in all directions.  As a result, the wave propagates outwards.  The air molecules do not travel with the wave; they are just displaced briefly from their current locations.  This form of acoustic energy transmission is represented by a longitudinal waveform.

The scientific study of sound perception is called psychoacoustics.  It is not concerned with how sounds produce a particular emotional or cognitive response, which is in the area of psychology.  Psychological perception of sound falls into 2 categories - pitch and loudness.  These correspond to 2 physical properties of sound - frequency and amplitude.  Frequency measures the rate of repetition and amplitude measures the strength of the air pressure produced.

The psychological measurement of the magnitude of a sound includes its frequency, pressure, harmonics, duration and the surface properties within the sound space.

Saturday, April 13, 2013

Signals

SIGABRT - sent by abort() to its calling process.  The process terminates and generates a core dump.  assert() calls abort() when the condition fails.

SIGALRM - sent to the process that called alarm() or setitimer() when the timer period has elapsed.

SIGBUS - raised by the kernel when the process incurs a hardware fault other than a memory protection violation, usually an irrecoverable error such as an unaligned memory access.

SIGCHLD - sent to the parent process when a child process ends.  The parent process typically responds by issuing a wait().

SIGCONT - sent to a process when it is resumed after being stopped.  Usually caught by terminals or editors, which use it to refresh the screen.

SIGFPE - covers not just floating point exceptions but all arithmetic exceptions

SIGHUP - the kernel sends this to the session leader when the terminal disconnects.  The kernel also sends it to all foreground processes when the session leader terminates.  The default action is to terminate.  This signal means the user has logged out.  Daemons overload this signal as an instruction to reload their configuration.  As a daemon has no controlling terminal, it would not otherwise receive this signal.

SIGILL - sent when a process executes an illegal instruction.  A process can catch this signal, but its behaviour afterwards is undefined.

SIGINT - sent to all foreground processes when the user presses the interrupt key (Ctrl-C).  This allows the processes to clean up before terminating.

SIGIO - BSD style asynchronous I/O event

SIGKILL - sent from the kill() system call.  It cannot be caught or ignored.

SIGPIPE - if a process writes to a pipe whose reader has terminated, the kernel raises this signal.

SIGPROF - raised by setitimer() with the ITIMER_PROF flag when the profile timer expires.

SIGPWR - system dependent.  A UPS monitoring process sends this signal to init when the battery level is low, to allow the system to shut down in an orderly way.

SIGQUIT - sent to all foreground processes when the user presses the quit key (Ctrl-\)

SIGSEGV - sent when a process accesses an invalid memory address (segmentation violation)

SIGSTOP - sent by kill() system call.  This cannot be caught or ignored.  The process is unconditionally stopped.

SIGSYS - sent when a process executes an illegal system call.  For example, code compiled against a newer version of the OS is run on an older version.

SIGTERM - sent by kill().  A process can catch it to initiate an orderly termination.

SIGTRAP - sent when a process hits a breakpoint; generally caught by debuggers and ignored by most other processes.

SIGTSTP - sent by the kernel to foreground processes when the user presses the suspend key (Ctrl-Z)

SIGTTIN/SIGTTOU - sent to a background process when it attempts to read from/write to its controlling terminal.

SIGURG - the kernel sends this to a process when out-of-band data arrives at a socket

SIGUSR1/2 - used solely by user processes.  A common use is to instruct a daemon to change its behaviour

SIGVTALRM - raised by setitimer() when timer created with ITIMER_VIRTUAL flag expires

SIGWINCH - sent by kernel to all foreground processes when the terminal window size changes

SIGXCPU/SIGXFSZ - raised by the kernel when the CPU time limit or the file size limit, respectively, is reached.


Signal Handling

Ignore

No action is taken.  Two signals cannot be ignored - SIGKILL and SIGSTOP - so that the system administrator is always able to kill or stop any process.  Otherwise, there could be processes that are unstoppable.

Catch and handle

The kernel suspends execution of the process's current code path and jumps to the registered signal handler.  Execution continues from where it left off once the handler returns.  SIGINT and SIGTERM are 2 commonly caught signals.  Catching SIGINT allows a shell process to return to its prompt.  Catching SIGTERM allows a process to clean up for an orderly termination.

Perform default action

Taking the default action usually means terminating the process.

Anonymous Memory Mapping

Large memory allocation requests are typically not satisfied from the heap.  Instead, the kernel provides an anonymous memory mapping for this type of request.  An anonymous memory mapping is like a file-based memory mapping except that it is not backed by any file, thus the name.  It is just a large zero-filled memory area (a multiple of the page size) ready for use.

Anonymous memory mapping uses the mmap() call with the special flag MAP_ANONYMOUS.  The fd parameter is ignored.  On BSD systems without the flag, anonymous memory mapping is implemented by mapping /dev/zero with copy-on-write pages.

brk

Older UNIX systems kept the stack and heap in the same data segment.  The heap grew upward from the bottom of the segment and the stack grew downward from the top.  The line demarcating the two was called the break or break point.  In modern UNIX systems, where the data segment lives in its own memory mapping, the end address of the mapping continues to be called the break.

A call to brk() sets the end address of the data segment.  sbrk() moves the end of the data segment by an increment, which can be positive or negative.

Device Node

Device nodes are special files that allow applications to interact with device drivers.  The kernel hands I/O calls (e.g. read) made on a device node to the driver instead of to a file.  The driver handles the request and returns the result to the caller.  This abstraction lets users interact with drivers through familiar I/O calls.

Each device node is assigned a major number and a minor number.  The major and minor numbers identify the device driver loaded in memory.  If the numbers cannot be matched, system returns ENODEV as the device cannot be found.

Special device nodes are:
/dev/null (1,3) - read returns EOF, writes are discarded
/dev/zero (1,5) - read returns \0, writes are discarded
/dev/full (1,7) - read returns \0, write returns ENOSPC indicating the device is full
/dev/random (1,8) - random number generator.  An entropy pool is built by hashing noise collected from drivers and other sources.  Reads return data from the entropy pool.  The result is suitable for seeding processes like key generation as it is cryptographically strong.  The kernel monitors the amount of entropy in the pool.  If it reaches zero, reads block.  This could happen on a diskless station with little or no I/O activity.
/dev/urandom (1,9) - a lower grade version of /dev/random.  Reads succeed even if the entropy pool is depleted.
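
Because device nodes are ordinary files as far as the caller is concerned, the usual open/read/close sequence works on them.  The helper read_from_node below is a made-up name for illustration, and the paths assume a Linux-style /dev.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a device node (or any file) and read up to len bytes from it.
 * Returns the byte count reported by read(), or -1 if open() or
 * read() fails. */
ssize_t read_from_node(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY);
    ssize_t n;

    if (fd == -1)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return n;
}
```

Reading 4 bytes from /dev/zero with this helper fills the buffer with \0 and returns 4, while the same call on /dev/null returns 0 (EOF) and leaves the buffer untouched.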

Normal I/O calls cannot represent every function of a device, e.g. setting the baud rate.  ioctl (I/O control) is used for such out-of-band communication with the device.

int ioctl (int fd, int request, ...)

The request is a code, known to the kernel, that identifies the command for the driver.  Any further arguments depend on the request.
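
As an illustration, the sketch below issues one ioctl request, FIONREAD, which on Linux and the BSDs asks how many bytes are waiting to be read.  The request is not part of POSIX, so its availability is an assumption about the platform, and bytes_pending is a hypothetical helper name.

```c
#include <sys/ioctl.h>

/* Ask the driver behind fd how many bytes are currently readable.
 * Returns the count, or -1 if the request fails or is unsupported. */
int bytes_pending(int fd)
{
    int n = 0;

    /* The third argument is request-specific; FIONREAD expects a
     * pointer to an int that receives the byte count. */
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```

The request has no equivalent among read(), write() and lseek(), which is exactly the kind of gap ioctl fills.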

Saturday, April 6, 2013

Standard I/O Locking

stdio is inherently thread-safe.  Each open stream is associated with a lock, a lock count and an owning thread.  A thread must acquire the lock and become the owning thread before issuing any I/O call.

Still, a thread may need to hold the lock across several I/O calls so that they complete as a unit.  flockfile() waits until the stream is no longer locked by another thread, then acquires the lock, increments the lock count and makes the caller the owning thread.

funlockfile() decrements the lock count, releasing the lock once the I/O calls are finished.

ftrylockfile() is a non-blocking version of flockfile().

Using these calls, the programmer controls the locking explicitly and can then use the set of standard library I/O calls that do not check for locks and thus perform better (e.g. fgetc_unlocked, fgets_unlocked, fwrite_unlocked).
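
A sketch of that pattern: take the stream lock once, emit a whole "key=value" line with the unlocked character call, then release.  put_line_atomic is a made-up name.  putc_unlocked is used because it is in POSIX, whereas several other *_unlocked variants are GNU extensions.

```c
#include <stdio.h>

/* Write "key=value\n" to fp as one unit, so that output from other
 * threads using the same stream cannot be interleaved with it.
 * Returns 0 on success, -1 if the stream is in an error state. */
int put_line_atomic(FILE *fp, const char *key, const char *value)
{
    const char *p;

    flockfile(fp);                  /* become the owning thread */
    for (p = key; *p != '\0'; p++)
        putc_unlocked(*p, fp);      /* no per-call locking */
    putc_unlocked('=', fp);
    for (p = value; *p != '\0'; p++)
        putc_unlocked(*p, fp);
    putc_unlocked('\n', fp);
    funlockfile(fp);                /* release the lock */

    return ferror(fp) ? -1 : 0;
}
```

Without the explicit lock, two threads calling fputs() three times each could have their pieces mixed together even though every individual call is thread-safe.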

Standard I/O Buffering

Standard I/O implements 3 types of user buffering for different situations.  They are set with the setvbuf call:

(1) unbuffered (_IONBF) - no user buffering.  Data is submitted directly to the kernel.  This option is seldom used.  An example is stderr.

(2) line-buffered (_IOLBF) - buffering is performed on a per-line basis.  Data is submitted to the kernel each time a \n is reached.  This type is suitable for line-oriented streams like the terminal (stdout).

(3) block-buffered (_IOFBF) - buffering is performed on a per-block basis, which is ideal for files.  Standard I/O uses the term full buffering.

C Standard I/O Library

User-buffered I/O refers to buffering in user space performed by the application or the standard library.  The C language itself does not provide any advanced I/O functions.  Instead, the standard C library (stdio) provides a platform-independent user-buffering solution.  Because the buffering is maintained in user space, fewer system calls are needed, which improves performance.  Standard I/O calls are not system calls.

The standard I/O routines use a file pointer instead of a file descriptor.  Inside the C library, the file pointer is mapped to a file descriptor.  A file pointer points to the FILE typedef.

e.g. FILE * fopen(const char *path, const char *mode)

Mode includes
r = read
w = write
a = append
r+ = read and write, position at the start of file
w+ = read and write, truncate the file to size 0, position at start of file
a+ = read and write, create file if does not exist, position at end of file

Other stdio routines include

fdopen - open using fd

fgetc/fputc - read/write a character from stream

ungetc - push a character that was read back onto the stream.  If multiple characters are pushed back, they are returned in reverse order.  In other words, the last character pushed back is returned first.  POSIX guarantees only 1 push back.  If a seek is performed before the next read, all pushed-back characters are lost.

fgets/fputs - read/write a string.  For read, a \0 character is placed after the last character stored in the buffer.  Reading stops when EOF or a newline character is reached.  The newline \n is stored in the provided buffer.

fread/fwrite - read/write specified number of elements (structures) from file.  This is reading the file as binary data.

fseek - seek to a particular position in the file
fsetpos - similar to fseek.  This function is provided mainly for non-UNIX platforms which have a complex type representing the stream position.

rewind - reset the stream position to the start of the file

ftell - return the current stream position
fgetpos - pair with fsetpos above

fflush - write data from the user buffer to kernel space.  There is no guarantee that the data are flushed to disk.  Issue fsync() on the stream's file descriptor after the flush to ensure the data are written to disk.

fileno - obtain fd of a stream
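
Several of the routines above can be combined into a short sketch: write a binary record with fwrite, push it to the kernel with fflush, push it to disk with fsync on the descriptor from fileno, then rewind and read it back with fread.  The struct point type and both function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>

struct point {
    int x;
    int y;
};

/* Write one record at the current position and make it durable.
 * Returns 0 on success, -1 on failure. */
int save_point(FILE *fp, const struct point *p)
{
    if (fwrite(p, sizeof *p, 1, fp) != 1)
        return -1;
    if (fflush(fp) == EOF)          /* user buffer -> kernel */
        return -1;
    return fsync(fileno(fp));       /* kernel -> disk */
}

/* Rewind to the start of the file and read the first record.
 * Returns 0 on success, -1 on failure. */
int load_point(FILE *fp, struct point *p)
{
    rewind(fp);
    return fread(p, sizeof *p, 1, fp) == 1 ? 0 : -1;
}
```

Records written this way are only readable by programs that share the same struct layout, which is the usual caveat with binary fread/fwrite.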

Page Cache

The page cache exploits temporal locality, which means that something accessed recently is highly likely to be accessed again.  When free pages run out, the least used pages are pruned from the cache.  Sometimes it is more effective to swap out a chunk of seldom used data instead of pruning the cache.  The heuristic that balances swapping against pruning the cache is controlled via /proc/sys/vm/swappiness.

Write back of dirty pages to disk is carried out by a group of kernel threads called pdflush.  They are woken up when the number of free pages falls below a threshold or the age of dirty pages reaches a threshold.  Multiple pdflush threads run concurrently to take advantage of multiple processors and also for congestion avoidance, which prevents writes from backing up behind a single busy device.

Multiplex I/O

Multiplex I/O allows application to block on multiple file descriptors and be notified when any one of them is ready for read or write.

The select() system call implements synchronous multiplexed I/O.  The call is passed 3 sets of watched file descriptors: one for read, one for write and one for exceptions.  A set is ignored if NULL is passed.

On return, the sets are modified to contain only the fds that are ready for I/O.  For example, if fds x and y are placed in readfds when calling select(), and only x remains in readfds on return, it means x is ready for reading without blocking and y is not.

select() accepts a parameter to indicate the amount of time to block before returning even if no fd is ready for I/O.

The watched fd sets are manipulated with the macros FD_SET and FD_CLR.  FD_ISSET tests whether a particular fd is in a set and is used after the select() call.

Because select() has historically been more widely available across UNIX systems than other mechanisms for sub-second sleeping, it is used as a portable way to sleep by providing a non-NULL timeout but NULL for all 3 watched sets.

select() was first introduced in 4.2BSD; pselect() is the variant defined by POSIX.  There are 3 differences between pselect and select:

(1) pselect() uses the timespec structure instead of the timeval structure for its timeout parameter.  timespec uses seconds and nanoseconds rather than seconds and microseconds.  In practice, neither call provides reliable nanosecond resolution.

(2) a call to pselect() does not modify the timespec parameter, so it does not need to be reinitialized before each call the way a timeval does.

(3) pselect() has an additional parameter, sigmask.  pselect() with NULL as sigmask behaves the same as select().  This parameter is intended to solve a race condition between waiting for an fd to become ready and waiting for signals.

poll() is the System V solution, which addresses a few deficiencies of select().  The call takes an array of pollfd structures, each describing a file descriptor and a bitmask of events to watch for.  The revents field in the structure returns the events that have fired (e.g. POLLIN for data to read, POLLPRI for urgent data to read, POLLOUT for writing, POLLWRBAND for writing priority data, POLLMSG when a SIGPOLL message is available).

POLLIN | POLLPRI is equivalent to select()'s read event.  POLLOUT | POLLWRBAND is equivalent to select()'s write event.  POLLIN is equivalent to POLLRDNORM | POLLRDBAND.  Linux provides a ppoll interface similar to pselect, but ppoll is not a POSIX standard.

Comparing poll to select:
(1) poll does not need the programmer to specify the number of fd contained in the watched list
(2) poll is more efficient when monitoring a long list of fd because it passes in individual fd structure rather than a possibly sparse bitmask.
(3) select is more portable as it is available in most systems
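
For comparison with select(), a sketch of watching one fd with poll().  poll_for_input is a hypothetical name.  Note that no "highest fd plus one" argument is needed, only the array and its length.

```c
#include <poll.h>

/* Wait up to timeout_ms milliseconds for input on fd.
 * Returns the revents bitmask (0 on timeout), or -1 on error. */
int poll_for_input(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int ready;

    pfd.fd = fd;
    pfd.events = POLLIN;    /* events we are interested in */
    pfd.revents = 0;        /* filled in by the kernel */

    ready = poll(&pfd, 1, timeout_ms);
    if (ready == -1)
        return -1;
    return pfd.revents;
}
```

Because events and revents are separate fields, the same pollfd can be reused across calls without being rebuilt.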

Seeking in File

lseek() is used to set the file position of a file.  Seeking past the end of the file is allowed.  A read there returns EOF, and a write causes data to be written at that position.  The range between the old EOF and the new position is filled with zeros logically but not physically.  In other words, the space the file actually occupies on disk is smaller than its recorded size.  Performance is enhanced as the hole does not initiate any real I/O.  Such a file is called a sparse file.

In lieu of lseek(), Linux also provides the pread() and pwrite() system calls.  p stands for positional.  Semantically, a p-call is similar to an lseek() followed by a read/write.  The differences are (1) they do not change the file position upon completion and (2) they avoid the potential race condition of using lseek(), as threads share the same file position.
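
A sketch of a positional read that keeps going until the requested length or EOF.  read_at is a hypothetical helper.  The file position reported by lseek(fd, 0, SEEK_CUR) is the same before and after the call.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes starting at offset, without moving the file
 * position.  Returns the number of bytes read (short only at EOF),
 * or -1 on error. */
ssize_t read_at(int fd, void *buf, size_t len, off_t offset)
{
    char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = pread(fd, p + done, len - done, offset + (off_t)done);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* EOF */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Two threads can call this on the same fd at different offsets without any locking, which is not true of an lseek() plus read() pair.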

Closing Files

close() unmaps the file descriptor from the associated file.  Closing a file does not mean the data has been flushed to disk.  Always check the return value of close() (and errno on failure), because errors from deferred operations may only be reported when close() is called.

Direct I/O

UNIX implements a layer of buffering or caching between the application and the device.  A high-performance application (e.g. a database) may want to bypass this layer of complexity and manage its own I/O.  O_DIRECT specifies that I/O is done directly between user space buffers and the device, bypassing the page cache.  All I/O is synchronous.  The request length, buffer alignment and file offsets must all be integer multiples of the underlying device's sector size.

Read and Write System Calls

read() can return a few possible scenarios:

(1) call returns a value equal to len and the data is stored in buf
(2) call returns a value less than len but greater than 0.  This happens because the read was interrupted by a signal midway, an error occurred during the read, fewer than len bytes were available, or EOF was reached.  Issuing read() again with the remaining length either completes the request or reveals the cause of the problem.
(3) call returns 0, indicating EOF
(4) call blocks because there is no data available to read
(5) call returns -1 with errno set to EINTR, meaning the call was interrupted before any byte was read.  If errno is EAGAIN, there is no data to read and the descriptor is in non-blocking mode.  In both cases, issue the call again.
(6) call returns -1 with any other errno value, indicating that a more severe problem has occurred
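
The cases above translate into a read loop along the following lines.  This is a sketch for a blocking descriptor; read_full is an invented name.  For simplicity, an error after a partial read reports -1 and the bytes already read are not counted.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until len bytes have been read or EOF is reached.
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t total = 0;

    while (total < len) {
        ssize_t n = read(fd, p + total, len - total);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* case 5: interrupted, try again */
            return -1;          /* case 6 (or EAGAIN): caller decides */
        }
        if (n == 0)
            break;              /* case 3: EOF */
        total += (size_t)n;     /* cases 1 and 2: keep what we got */
    }
    return (ssize_t)total;
}
```

A return value smaller than len therefore always means EOF was reached, never merely a short read.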

write() is less likely to return a partial write than read() is to return a partial read.  For regular files, write() is guaranteed to perform the entire requested write unless an error occurs.  For other types (e.g. sockets), partial writes are possible, and the call can be re-issued for the remaining data when the write is incomplete.

Using O_APPEND mode avoids the file corruption caused by 2 processes racing to write to the same file.  If the file is not opened with O_APPEND, each write occurs at the file position of the process that issued it, so one process can overwrite the other's data.  O_APPEND ensures the write always occurs at the end of the file.  This mode is useful for log files but less sensible for other types.

EPIPE indicates that the reading end of a pipe has been closed.  The process also receives a SIGPIPE signal, whose default action is to terminate the process.  A process that intends to handle this errno must therefore ignore, block or handle the signal.

When a write() returns, the kernel has copied the data from the supplied buffer into a kernel buffer.  There is no guarantee that the data has been sent to the disk.  The kernel batches the dirty buffers and writes them to disk later.  This delayed write behaviour also means the write order is not preserved.  Another problem is that a write error may not be reported immediately, as the actual write occurs later and asynchronously with respect to the system call.  To mitigate the risk of deferred writes, the kernel institutes a maximum buffer age and writes out all dirty pages that reach it.  This is configured via /proc/sys/vm/dirty_expire_centisecs.

fsync() ensures all dirty data associated with a file (mapped by fd) is written to disk.  The call writes back both data and metadata (e.g. creation timestamps and other attributes in the inode).  It returns only when the disk acknowledges that the data has been written.

fdatasync() writes data only.  Neither call guarantees that any updated directory entries containing the file are synchronized to disk.  To ensure this, fsync() must also be called on an fd representing the directory itself.

sync() writes out all buffers to disk.  Both data and metadata are written out.  The standard only requires sync() to initiate the write-out, so it may return before all buffers are written.  For that reason, processes on some systems invoke the call multiple times to make sure all buffers are committed to disk.  On Linux, sync() returns only after all buffers have been written out.  sync() may take some time on a busy system.

Open System Call

open() maps a filename to a file descriptor.  Some flags for open:

O_ASYNC - SIGIO signal will be generated when the file is ready for read or write.  This flag is available only for terminal and socket

O_DIRECT - file to be opened for Direct I/O (i.e. no kernel buffering)

O_DIRECTORY - If the file to be opened is not a directory, the call will fail

O_LARGEFILE - use 64-bit offsets

O_NOCTTY - if the file is a terminal, it will not become the controlling terminal of the process

O_NOFOLLOW - if the file is a symbolic link, the call will fail

O_NONBLOCK - open using non-blocking mode

O_CREAT - create the file if it does not exist.  When used with O_EXCL, the call fails if the file already exists.  This combination is used to prevent a race condition between checking for a file and creating it.

O_SYNC - open the file in synchronous mode.  write() will not return until data and metadata are written to disk.  read() is always synchronous, so this flag has no effect on it.  The full cost of each write's I/O is borne by the process, and the increase in cost is huge.

O_DSYNC (POSIX) - specifies that only data is synchronized and not metadata.

O_RSYNC (POSIX) - specifies the synchronization of read requests as well as writes.  It must be used with either O_SYNC or O_DSYNC.  As reads are always synchronized, this flag ensures that the side effects of a read are also synchronized (for example, the file access time is updated before read returns).

creat() is equivalent to open() with flags O_WRONLY | O_CREAT | O_TRUNC