Saturday, August 22, 2009

SLIP and PPP

SLIP (Serial Line IP) is a link-level protocol for carrying IP over a serial line (e.g. a modem or an RS-232 interface). Each IP datagram is terminated by a special END character. If the END character appears inside the datagram, it is escaped with an ESC sequence so that it is not mistaken for the terminator (the ESC character itself is escaped the same way). SLIP has no checksum; error detection and handling are left to the upper protocol layers.
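The framing rule can be sketched in a few lines of Python. The byte values (END = 0xC0, ESC = 0xDB and the two escape codes 0xDC and 0xDD) are the ones defined in RFC 1055; the function names slip_encode and slip_decode are made up for this illustration, and the decoder handles only a single frame.

```python
END, ESC = 0xC0, 0xDB          # special bytes defined in RFC 1055
ESC_END, ESC_ESC = 0xDC, 0xDD  # codes that follow ESC for an escaped END / ESC


def slip_encode(datagram: bytes) -> bytes:
    """Escape END/ESC bytes in the datagram and append the END terminator."""
    out = bytearray()
    for b in datagram:
        if b == END:
            out += bytes([ESC, ESC_END])
        elif b == ESC:
            out += bytes([ESC, ESC_ESC])
        else:
            out.append(b)
    out.append(END)
    return bytes(out)


def slip_decode(frame: bytes) -> bytes:
    """Reverse slip_encode for one frame; stops at the first unescaped END."""
    out = bytearray()
    escaped = False
    for b in frame:
        if escaped:
            # Unknown codes after ESC are passed through unchanged (a simplification).
            out.append({ESC_END: END, ESC_ESC: ESC}.get(b, b))
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == END:
            break
        else:
            out.append(b)
    return bytes(out)


payload = bytes([0x01, 0xC0, 0xDB, 0x02])
frame = slip_encode(payload)
print(frame.hex(), slip_decode(frame) == payload)
```

Note that nothing here detects corruption: a flipped bit in the payload decodes without complaint, which is why error handling has to live in the upper layers.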

Because serial links are typically slow, CSLIP (Compressed SLIP) improves throughput by reducing the 40 bytes of IP and TCP headers (20 + 20) to 3 or 5 bytes. It does this by keeping state for each connection (the header fields that rarely change) at both ends of the link, so that this information no longer has to be sent in every TCP/IP header. CSLIP can maintain state for up to 16 connections. The smaller header improves response time for interactive sessions over a serial line.
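To get a feel for why the header size matters for interactive traffic, here is a rough back-of-the-envelope calculation. The numbers are assumptions chosen for illustration: a 1-byte payload (one keystroke), a 9600 bps line and 10 bits on the wire per byte (start and stop bits). SLIP framing bytes are ignored, and transmit_ms is a made-up helper name.

```python
def transmit_ms(header_bytes, payload_bytes=1, line_bps=9600, bits_per_byte=10):
    """Time in milliseconds to push one packet onto the serial line."""
    total_bits = (header_bytes + payload_bytes) * bits_per_byte
    return 1000.0 * total_bits / line_bps


# Full TCP/IP header versus the two compressed header sizes.
for header in (40, 5, 3):
    print(f"{header:2d}-byte header: {transmit_ms(header):.2f} ms per keystroke")
```

Under these assumptions the full 40-byte header costs several times as much line time per keystroke as the compressed one, and the echo of each keystroke pays the same cost again on the way back.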

PPP (Point-to-Point Protocol) improves on SLIP by adding LCP (Link Control Protocol) and NCPs (Network Control Protocols). LCP lets the two ends negotiate link options. The NCPs let PPP carry more than one network protocol (i.e. not just IP) on a single serial line and configure each of them (e.g. IP address negotiation for both ends). Finally, PPP includes a checksum in every frame.
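The checksum PPP adds is a frame check sequence (FCS). Below is a minimal bit-by-bit sketch, assuming the default 16-bit FCS described in RFC 1662 (reflected polynomial 0x8408, initial value 0xFFFF, result complemented). The function name fcs16 is made up, and real implementations normally use a lookup table instead of the inner loop.

```python
def fcs16(data: bytes) -> int:
    """Compute a 16-bit frame check sequence over data, one bit at a time."""
    fcs = 0xFFFF                       # initial value
    for byte in data:
        fcs ^= byte
        for _ in range(8):
            if fcs & 1:
                fcs = (fcs >> 1) ^ 0x8408   # reflected form of x^16 + x^12 + x^5 + 1
            else:
                fcs >>= 1
    return fcs ^ 0xFFFF                # final complement


good = fcs16(b"hello")
corrupted = fcs16(b"hellp")
print(hex(good), hex(corrupted), good != corrupted)
```

The receiver recomputes the value over the received bytes and drops the frame if it does not match, which is exactly the check SLIP lacks.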

Sunday, August 9, 2009

Challenges to Pipelining and Superscalar

A data hazard arises when two instructions use related data, which prevents them from executing simultaneously. For example, the output of one instruction is used as an input to the next. Pipelined processors use "forwarding" to resolve this: the output of the ALU is fed directly back into its input, bypassing the register-file write stage. Superscalar processors use "register renaming" to decouple instructions that reuse the same register without actually depending on each other's results. For example, the following two instructions can execute simultaneously with register renaming, even though the second overwrites a register (A) that the first reads.

Add A, B, C ; add A and B and store the result in C
Add D, B, A ; add D and B and store the result in A
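Here is a small sketch of how renaming removes the conflict on A in these two instructions. It is a toy model, not how any particular CPU does it: instructions are (source, source, destination) tuples, the physical register names P0, P1, ... are hypothetical, and the pool of physical registers is assumed to be unlimited.

```python
from itertools import count


def rename(instructions):
    """Map architectural registers to physical ones, giving every write a fresh register."""
    fresh = count()
    mapping = {}  # architectural register -> physical register holding its latest value

    def phys(reg):
        # A register that is read before it is ever written gets a physical register on first use.
        if reg not in mapping:
            mapping[reg] = f"P{next(fresh)}"
        return mapping[reg]

    renamed = []
    for src1, src2, dest in instructions:
        s1, s2 = phys(src1), phys(src2)
        mapping[dest] = f"P{next(fresh)}"  # a write never reuses an existing register
        renamed.append((s1, s2, mapping[dest]))
    return renamed


program = [("A", "B", "C"), ("D", "B", "A")]
for before, after in zip(program, rename(program)):
    print(before, "->", after)
```

After renaming, the first instruction reads the old value of A from one physical register while the second writes the new value of A to a different one, so neither has to wait for the other.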

A structural hazard is a shortage of resources for executing multiple instructions simultaneously. In a superscalar design, it would take a large number of wires to connect every ALU to every register. Hence, CPU registers are grouped into a special unit called the register file. A register file is like a memory array with a data bus and two kinds of ports: read ports and write ports. For example, an ALU accesses a read port of the register file and requests that the data be placed on the bus. A single read port lets the ALU access a single register at a time. Therefore, a 3-operand instruction like the ones above requires 2 read ports and 1 write port. Modern CPUs also use separate register files for integer, floating-point and vector values, since each type is handled by separate execution units. Another reason for this separation is to keep each register file small: the larger the register file, the slower the access.

A control (branch) hazard arises when the processor reaches a conditional branch instruction and does not yet know which instruction to fetch next. Branch prediction is used to avoid this type of stall. An instruction cache is used to speed up loading the next instruction after a branch.
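The note above does not say which prediction scheme is used, so as an illustration here is one common textbook scheme, a 2-bit saturating counter. The class and function names are made up, and the branch history (a loop branch taken 9 times and then not taken, repeated 10 times) is an invented example.

```python
class TwoBitPredictor:
    """Counter values 0 and 1 predict 'not taken'; 2 and 3 predict 'taken'."""

    def __init__(self):
        self.counter = 2  # start at 'weakly taken' (an arbitrary choice)

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    """Fraction of branches predicted correctly for a sequence of outcomes."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


loop_branch = ([True] * 9 + [False]) * 10
print(f"accuracy: {accuracy(loop_branch):.0%}")
```

The point of using two bits is that a single surprise (the loop exit) does not flip the prediction, so the predictor is only wrong once per pass through the loop.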

ISA

In 1964, the IBM System/360 introduced the concept of the ISA (instruction set architecture) as a layer of abstraction over the underlying CPU hardware microarchitecture. Programs written for an ISA are guaranteed to run on any CPU that implements that ISA. The ISA provides a standardized way to expose the features of a system's hardware, which allows manufacturers to enhance the implementation without breaking programs. The ISA is implemented using a microcode engine, which consists of some storage, a microcode ROM that holds the microcode programs, and an execution unit that translates the standard instructions into ones specific to the hardware implementation.
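As a loose analogy (not a model of real microcode), the idea can be sketched as one program running unchanged on two different "implementations". Everything here is hypothetical: the two ROM tables, the ADD and MUL opcodes and the run function are invented for illustration, with one implementation multiplying directly and the other by repeated addition.

```python
def mul_native(a, b):
    """Implementation with a hardware multiplier."""
    return a * b


def mul_by_addition(a, b):
    """Cheaper implementation: multiply by repeated addition (assumes b >= 0)."""
    total = 0
    for _ in range(b):
        total += a
    return total


# Each hypothetical "ROM" maps the same ISA opcodes to implementation-specific routines.
ROM_FAST = {"ADD": lambda a, b: a + b, "MUL": mul_native}
ROM_SMALL = {"ADD": lambda a, b: a + b, "MUL": mul_by_addition}


def run(program, rom):
    """Execute (opcode, a, b) tuples by looking each opcode up in the given ROM."""
    return [rom[opcode](a, b) for opcode, a, b in program]


program = [("ADD", 2, 3), ("MUL", 4, 5)]
print(run(program, ROM_FAST), run(program, ROM_SMALL))
```

The program only ever names ISA opcodes, so the implementation behind them can change without the program noticing, which is the guarantee described above.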

The drawback of a microcode engine is that it is slower than direct decoding. (Modern microcode engines approach 99% of the speed of direct decoding.) However, the benefit of the abstraction is significant enough to outweigh this slight penalty.

Instruction Flow

Executing an instruction takes multiple stages. Generally, there are 4 basic stages: (1) fetch (from memory), (2) decode, (3) execute and (4) write (back the result). Contemporary CPUs break these stages down further and enhance them to improve performance. Each stage is implemented in its own discrete logic, and together the stages form a pipeline. This allows multiple instructions to be processed simultaneously, each in a different stage.
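The overlap can be made concrete with a small sketch of an ideal 4-stage pipeline. It assumes every stage takes exactly one cycle and that there are no stalls or hazards, which real pipelines do not guarantee; the function names are made up for this illustration.

```python
STAGES = ["fetch", "decode", "execute", "write"]


def pipeline_schedule(n_instructions, stages=STAGES):
    """Return {cycle: [(instruction index, stage), ...]} for an ideal pipeline."""
    schedule = {}
    for i in range(n_instructions):
        for s, stage in enumerate(stages):
            # Instruction i enters the pipeline at cycle i and advances one stage per cycle.
            schedule.setdefault(i + s, []).append((i, stage))
    return schedule


def cycles_sequential(n, k=len(STAGES)):
    """Cycles needed when each instruction finishes all k stages before the next starts."""
    return k * n


def cycles_pipelined(n, k=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    return k + n - 1


for cycle, work in sorted(pipeline_schedule(5).items()):
    print(cycle, work)
print(cycles_sequential(5), "cycles sequential vs", cycles_pipelined(5), "pipelined")
```

Once the pipeline is full, every stage is busy with a different instruction in each cycle, so one instruction completes per cycle even though each one still takes 4 cycles from start to finish.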

Superscalar

As the number of transistors increased, chip designers could afford to put more than one ALU on a single chip. Because such a design can perform more than one scalar operation at a time, it is called superscalar. The IBM RS/6000, released in 1990, was one of the first superscalar CPUs. Intel's first superscalar x86 CPU was the Pentium, released in 1993.