Sunday, April 28, 2024

Addressing mode

Bit 31 of the PSW is the extended addressing mode bit and bit 32 is the basic addressing mode bit.
When both bits are zero, the CPU uses 24-bit addressing. When bit 31 is zero and bit 32 is one, it uses 31-bit addressing. When both bits are one, it uses 64-bit addressing.
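
The bit combinations can be written down as a small lookup. This is only a sketch of the rule stated above; the function name and the decision to reject the remaining combination (bit 31 on, bit 32 off) are my own assumptions.

```python
def addressing_mode(ea_bit: int, ba_bit: int) -> int:
    """Return the address width selected by PSW bit 31 (EA) and bit 32 (BA)."""
    if ea_bit == 0 and ba_bit == 0:
        return 24
    if ea_bit == 0 and ba_bit == 1:
        return 31
    if ea_bit == 1 and ba_bit == 1:
        return 64
    # Assumption: the notes do not describe EA=1 with BA=0, so treat it as invalid.
    raise ValueError("EA=1 with BA=0 is not described as a valid mode")
```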

Addresses used in cross memory operations are always 31 bits.

An address in a CCW is either 24 or 31 bits.

Locks in MVS

Creating a separate lock for each resource incurs high overhead to maintain the locks. Creating too few locks inhibits concurrency. A balanced approach is to group related programs that share resources and create a separate lock for each group. Each group of programs can then run without being serialized against the other groups, which gives a balanced level of concurrency.

Lock word

A lock is represented by a memory location, the lock word. A CPU attempts to obtain the lock by using the compare and swap instruction to store its CPU ID into the lock word. The CPU can try repeatedly, looping on the compare and swap instruction until it succeeds. This is called a spin lock.
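
A rough simulation of a spin lock in Python, with threads standing in for CPUs. The class and function names are hypothetical, and a `threading.Lock` stands in for the hardware interlock that makes the real compare and swap atomic. The `time.sleep(0)` in the loop only keeps the Python demo responsive; a real spin loop would just retry.

```python
import threading
import time


class LockWord:
    """Lock word: 0 means free, any other value is the owning CPU's id."""

    def __init__(self):
        self.value = 0
        self._interlock = threading.Lock()  # stands in for the hardware interlock

    def compare_and_swap(self, expected, new):
        """Store `new` only if the word still holds `expected`."""
        with self._interlock:
            if self.value == expected:
                self.value = new
                return True
            return False


def spin_obtain(lock_word, cpu_id):
    # Loop on compare and swap until the lock word changes from free to our id.
    while not lock_word.compare_and_swap(0, cpu_id):
        time.sleep(0)  # demo only: let other Python threads run


def release(lock_word, cpu_id):
    if not lock_word.compare_and_swap(cpu_id, 0):
        raise RuntimeError("lock is not held by this CPU")


def run_demo(cpus=4, rounds=100):
    """Each simulated CPU increments a shared counter under the spin lock."""
    lock_word = LockWord()
    shared = {"count": 0}

    def worker(cpu_id):
        for _ in range(rounds):
            spin_obtain(lock_word, cpu_id)
            shared["count"] += 1
            release(lock_word, cpu_id)

    threads = [threading.Thread(target=worker, args=(n + 1,)) for n in range(cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

Calling `run_demo(4, 100)` should return 400, since every increment happens while the lock word holds the caller's id.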

Sunday, April 21, 2024

SRB execution

An SRB is scheduled for execution via the SCHEDULE macro, which links the SRB to either the global or the local chain in the CVT.

The dispatcher dispatches global SRBs before local ones. When control is given to the SRB routine, the routine first frees the SRB storage, as the dispatcher is not going to do so. An SRB runs with interrupts enabled, but the dispatcher returns control to the SRB routine once the interrupt has been handled. The dispatcher does not pre-empt the SRB unit of work; it runs until it gives up control voluntarily. This avoids saving and restoring state for what is presumed to be a short piece of work.

An SRB can be suspended when it hits a page fault or asks for a lock that is not available. In this case, SRB execution cannot continue. The page fault handler or lock manager saves the state of execution in a special SRB (SSRB). Once the requested resource is available, the SSRB is chained to the local SRB list with a special priority called "non-quiesceable".

When an address space is quiesced, the dispatcher runs the SSRB to completion before the address space is stopped.

Tuesday, April 16, 2024

Task vs Service

To execute a program in MVS, one can call the ATTACH macro, which creates a TCB. The ATTACH macro expands to an SVC call (SVC 42), which triggers interrupt handling. If the task performs only a very short procedure, the overhead of creating a task is too expensive.

MVS provides the SRB mechanism to allow a subsystem or address space to perform a procedure with less overhead than creating a task. An SRB is invoked via the SCHEDULE macro, which does not expand to an SVC instruction. The macro puts the SRB on the appropriate queue, where it awaits execution until the dispatcher picks the address space as the next highest priority to run.

Wednesday, April 10, 2024

Compare and swap

In a multiprocessor system where several CPUs share the same memory, it is important to serialize access to a specific memory cell to prevent different CPUs from overwriting each other's updates.

For example, one CPU reads the cell content into a register, adds 1 and stores it back to the cell. If the execution is interrupted and the value of the cell is changed by another CPU, that new value will be overwritten when the first CPU resumes execution later, not realising the cell value has changed.

Compare and swap (CS) is a hardware interlocking mechanism that prevents this scenario. To use it, a CPU reads the current value into a register (the first operand) and puts the updated value in another register (the second operand). The CS instruction then compares the first register with the memory location. If they are equal, CS stores the second register into the memory location. If they are not equal, CS loads the value at the memory location into the first register. In the latter case, the program needs to handle the fact that the memory value has changed and retry the CS instruction until it succeeds.
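
The compare, store-or-reload and retry sequence can be modelled in a few lines. This is a single-threaded sketch, not real CS: `memory` is a plain dict, the register names follow the description above, and the `other_cpu_updates` parameter is a hypothetical hook that simulates another CPU storing into the cell between our read and our CS.

```python
def cs(memory, addr, r1, r3):
    """Model of CS: returns (condition_code, new_r1).

    Equal:     memory[addr] is replaced by r3, condition code 0.
    Not equal: r1 is reloaded from memory[addr], condition code 1.
    """
    if memory[addr] == r1:
        memory[addr] = r3
        return 0, r1
    return 1, memory[addr]


def increment_with_cs(memory, addr, other_cpu_updates=()):
    """Add 1 to memory[addr], retrying CS until it succeeds.

    Returns the number of CS attempts that were needed.
    """
    pending = list(other_cpu_updates)
    r1 = memory[addr]  # read the current value into the first register
    attempts = 0
    while True:
        if pending:
            memory[addr] = pending.pop(0)  # simulated store by another CPU
        attempts += 1
        cc, r1 = cs(memory, addr, r1, r1 + 1)
        if cc == 0:
            return attempts
```

With `memory = {0: 10}` and one interfering store of 20, the first CS fails and reloads 20, and the second CS stores 21, so the other CPU's update is not lost.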

IPL

When the IPL device address is dialed and the operator presses the load button, the system reads 24 bytes from the device.

The first 8 bytes are a PSW. The second 8 bytes are a CCW that reads in a channel program to fetch the system start-up code. The third 8 bytes are a TIC CCW that transfers control to the new channel program read in by the previous CCW.

Once the system start-up code has been read in, the system loads the PSW from the first 8 bytes to start the bootstrapping.

Channel Programming

Before IOS issues an SIO instruction, it must first store the address of the channel program, a series of channel command words, in a special memory location called the channel address word (CAW). If the channel responds positively to the SIO, the channel starts fetching and executing the channel program and frees the CPU to do other work.

A channel command word (CCW) contains the command code, such as read or write, the data address in memory from or to which the data is to be transferred, and several flags that modify the execution of the command word.

The chain data (CD) flag causes the channel to continue executing the same command with the data address in the next command word. The effect is scatter/gather I/O that reads into or writes from several buffers.

The chain command (CC) flag causes the channel to execute the next command on the same device.

The skip flag causes the channel to read the data but not transfer it to memory. This is used to check data that was just written (write check).

The SLI (suppress length indication) flag tells the channel not to report an error if the I/O byte count differs from the one specified in the command word. This is used to handle variable length records.

TIC (transfer in channel) is the branch instruction of a channel program. Chaining set up by the CD and CC flags continues across a TIC. If a TIC transfers to another TIC, the channel program abends. If a command word has neither CD nor CC set, it marks the end of the channel program.
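
To make the command word layout concrete, here is a sketch that packs and unpacks an 8-byte CCW. The layout (command code, 24-bit data address, flag byte, unused byte, 16-bit count) and the flag bit values are my recollection of the S/370 format-0 CCW, not something stated in these notes, so treat them as assumptions. The helper names are made up for illustration.

```python
import struct

# Assumed flag bit values in the CCW flag byte.
CD = 0x80    # chain data
CC = 0x40    # chain command
SLI = 0x20   # suppress length indication
SKIP = 0x10  # skip transfer of data to memory


def build_ccw(command, data_address, flags, count):
    """Pack one command word: command, 24-bit address, flags, zero byte, count."""
    if not 0 <= data_address < (1 << 24):
        raise ValueError("data address must fit in 24 bits")
    return struct.pack(">B3sBBH", command, data_address.to_bytes(3, "big"),
                       flags, 0, count)


def parse_ccw(ccw):
    """Unpack an 8-byte command word into its fields."""
    command, address, flags, _unused, count = struct.unpack(">B3sBBH", ccw)
    return {
        "command": command,
        "data_address": int.from_bytes(address, "big"),
        "flags": flags,
        "count": count,
    }


def ends_program(ccw):
    """A command word with neither CD nor CC set is the last one in the program."""
    return parse_ccw(ccw)["flags"] & (CD | CC) == 0
```

For example, `build_ccw(0x02, 0x012345, CC | SLI, 80)` describes an 80-byte transfer at address 0x012345 that chains to the next command and tolerates a short record; 0x02 is used here as an illustrative read command code.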

Sunday, April 7, 2024

MVS program control

In a job, each step runs a program. Control is passed to the program via the ATTACH macro. The macro expands to an SVC that asks the supervisor to find and load the program and create a Request Block (RB) to represent a level of control. The supervisor then passes control to the program via BALR 14,15, where R14 contains the return address and R15 contains the load address of the program. ATTACH also creates a TCB.

The program can issue the LINK macro, which asks the supervisor to load another program and pass control to it. The OS creates another RB for the called program to represent another level of control. The called program returns control to the previous level when it issues the RETURN macro.

The XCTL macro is similar to LINK, except that the called program returns control not to its immediate caller but one level higher, to the caller of the calling program. XCTL issues an SVC that replaces the caller's RB with the called program's RB in the call chain.

The LOAD macro loads a program and returns its load address. No RB is created for the loaded program, so no new level of control is established. A caller can branch to the loaded program via the CALL macro, and the loaded program returns to the caller via the RETURN macro.

RBs are chained off the TCB. The TCB points to the RB of the most recently called program. The RB of the first program called for the step is the last element in the list and points back to the TCB.

Saturday, April 6, 2024

ASCB

The communication vector table (CVT) is kept in SQA. It contains a pointer to the ASVT (address space vector table), which is the table that keeps track of all address spaces in the system. Each ASVT entry contains the ASID and a pointer to the ASCB.

The size of the ASVT is fixed at system generation. All address spaces are kept in the ASVT except the master address space. The master ASID is 1 and it is the first address space created. Its ASCB is hand-crafted and not stored in SQA.

There are also head and tail pointers that thread through the list of swapped-in ASCBs in dispatching priority order. This list is used by the dispatcher to find the next address space to run.

ASCBs are kept in SQA and are not swappable. An ASCB contains information about the address space such as the address space ID, a sequence number representing its position in the dispatcher queue, a pointer to the next ASCB in the dispatcher queue, whether the address space is swapped in or out, dispatching priority, EPS allocated, real storage frames allocated, number of ready TCBs, and number of CPUs active in this address space.

The ASCB points to the ASXB, which contains information that is of interest only to the individual address space. It contains information such as the number of TCBs in the address space, the interrupt handler save area (IHSA) and the SRB queue.

The ASXB is stored in LSQA and is swappable.

Wednesday, April 3, 2024

Prefixing

MVS uses the first 4K page as a save area for interrupt PSWs. This is called the PSA (prefixed save area). Each interrupt type has a slot for the old PSW of the current process and the new PSW used by the first level interrupt handler. In a multi-CPU installation, multiple 4K pages are needed, one per CPU. Prefixing is used to achieve that.

Each CPU has a PVR (prefix value register) which is 12 bits long. DAT translates a virtual address referenced by the CPU into a real address. The top 12 bits of the real address (assuming 24-bit addressing) are then compared with the PVR value of the CPU.

If the top 12 bits are zero (i.e. the address refers to the first 4K), the zeros are replaced by the PVR value to form the absolute address. This is equivalent to relocating the first 4K of real addresses to another block of real storage. This is called forward prefixing.

If the top 12 bits are not zero and do not match the PVR value, the real address remains unchanged and becomes the absolute address.

If the top 12 bits match the PVR value, the prefixing hardware replaces the top 12 bits with zeros, effectively pointing to the first 4K block of absolute storage. This block of memory is used by the system to store hardware information which can be inspected by all CPUs in the system. This is called reverse prefixing.
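
The three cases above fit in one small function. This is a sketch for the 24-bit case described here (12-bit prefix, 12-bit offset within the 4K block); the function and parameter names are my own.

```python
def real_to_absolute(real_address, prefix):
    """Apply prefixing to a 24-bit real address using a 12-bit prefix value."""
    block = real_address >> 12      # top 12 bits: which 4K block
    offset = real_address & 0xFFF   # low 12 bits: byte within the block
    if block == 0:
        block = prefix              # forward prefixing
    elif block == prefix:
        block = 0                   # reverse prefixing
    # Any other block is left unchanged.
    return (block << 12) | offset
```

With a prefix of 0x005, real address 0x000100 becomes absolute 0x005100, real 0x005100 becomes absolute 0x000100, and real 0x123456 stays as it is.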

Tuesday, April 2, 2024

V=R Region

MVS reserves an amount of real storage set by a value in the IPL parameters. A region requested as V=R has virtual addresses that are the same as its real addresses. Two V=R regions are mapped to different ranges in real memory so that they can share the reserved space. A V=R region is not subject to page faults and its storage is fixed in memory.

Monday, April 1, 2024

MVS AS Layout

The System Area occupies the low address range. It contains the nucleus load modules and the nucleus extension. The nucleus load modules contain the dispatcher, interrupt handlers and recovery support code. The nucleus also contains the CVT and the page frame table. The nucleus extension contains the fixed BLDL table (device addresses of program entries), the fixed link pack area and other system-wide information. The nucleus starts at address 0 and the virtual addresses of the System Area are the same as its real addresses.

The Private Area is above the System Area. It contains the user program in a region specified on the JOB card. To keep track of work and storage in the region, the Private Area also contains the LSQA and SWA. LSQA contains control blocks and tables related to the address space, such as the segment table and page tables. Subpools 253, 254 and 255 are in LSQA.

SWA (scheduler work area) is a work area for the scheduler. It contains subpools 236 and 237, which hold the job queue of the address space.

The Private Area is pageable except for the LSQA. When the address space is swapped in, LSQA is fixed in real memory until the address space is swapped out.

Above the Private Area is the Common Area, which is common to all address spaces. It contains the SQA, the pageable LPA (PLPA) and the Common System Area (CSA). SQA contains tables and queues for the entire system and information related to all private address spaces. This information cannot be placed in LSQA, which can be swapped out with its address space.

PLPA contains SVC routines, access methods and other selected programs. The routines are re-entrant. PLPA is pageable.

CSA is used for communication between address spaces. 

The System Area and Common Area are shared by all address spaces. Their page tables are kept in SQA. The segment table and the private page tables are kept in LSQA.

Virtual Address Translation

1. The CPU extended control (EC) mode bit must be turned on in the PSW to enable DAT.

2. Load the segment table address of the program into the STOR.

3. Issue LPSW to load the program PSW with the translation bit (bit 5) on.

4. DAT verifies that the segment number is within the length of the segment table (the length is part of the STOR). If it is out of range, the program is terminated with 0C4.

5. DAT checks the invalid bit of the segment table entry. If it is 1, DAT returns a segment translation error to the OS.

6. DAT checks the PTE. If the GETMAIN bit is off, DAT returns program check error 0C4.

7. If the PTE invalid bit is off, the page is in main memory.

8. If the PTE invalid bit is on, a page fault is triggered and the OS brings the page in. During this time, the program loses control of the CPU.

24-bit and 31-bit addresses are padded with zeros on the left to 64 bits before DAT or prefixing translates them.
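
Steps 4 to 8 can be sketched as a table walk. This is a simplified model with several assumptions of mine: a 24-bit virtual address split into an 8-bit segment index, a 4-bit page index and a 12-bit byte offset (64K segments, 4K pages), tables held as Python lists of dicts, and made-up field and exception names.

```python
class ProgramCheck(Exception):
    """Stands in for the 0C4 program check."""


class SegmentTranslationError(Exception):
    """Segment table entry is marked invalid."""


class PageFault(Exception):
    """Page is allocated but not in main memory."""


def translate(virtual_address, segment_table):
    """Translate a 24-bit virtual address to a real address."""
    segment = (virtual_address >> 16) & 0xFF
    page = (virtual_address >> 12) & 0xF
    offset = virtual_address & 0xFFF

    # Step 4: the segment number must be within the segment table length.
    if segment >= len(segment_table):
        raise ProgramCheck("0C4: segment number beyond segment table")
    # Step 5: segment invalid bit.
    entry = segment_table[segment]
    if entry["invalid"]:
        raise SegmentTranslationError(segment)
    # Step 6: the page must have been GETMAINed.
    pte = entry["page_table"][page]
    if not pte["getmained"]:
        raise ProgramCheck("0C4: page not allocated")
    # Steps 7 and 8: the invalid bit tells whether the page is in main memory.
    if pte["invalid"]:
        raise PageFault(virtual_address)
    return (pte["frame"] << 12) | offset
```

In the real system the page fault is handled by the OS, which brings the page in and redrives the instruction; here it is simply raised to the caller.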


Sunday, March 31, 2024

Segment Table Origin Register

Control register 1 is the STOR, which contains the segment table origin of the currently running address space. CR1 contains only 18 bits. To find the segment table, DAT appends 6 zero bits to form the real address of the segment table.

Page Table Entry

A page in virtual memory is allocated when it is GETMAINed. The first page of a segment requested will cause a page table to be allocated. Each PTE is either allocated or not allocated. The GETMAIN bit indicates whether the page is allocated. The invalid bit indicates whether the page is in real memory or in the external page store. The corresponding XPTE stores the cylinder, track and record address of the page.

Mainframe Storage Evolution

Primary Control Program (PCP) had no virtual storage; it managed real storage. A program was loaded into storage and used the overlay technique if the storage was not sufficient to hold the entire program. One program was loaded and run at a time.

The next step was MFT, which divided real storage into predefined partitions of fixed size. Programs were assigned to a particular partition based on their size. Use of storage was not optimal, as a small program wasted storage in its partition. A loaded program had its base address relocated by the loader to the real address of the partition.

MVT is similar to MFT but the partition size is set dynamically to reduce storage wastage for small programs. The downside is the possibility of memory fragmentation. If the memory freed when a program ends is not large enough for the next program, the new program has to wait.

PCP, MFT and MVT addressed real memory directly. A program is compiled with origin 0. The loader loads the starting real memory address into the base register. This is called static relocation.

MVT supports roll-out and roll-in operations to swap a whole program out to external storage during its run. Roll-in loads the program back at the address it occupied before roll-out, so it is not very effective memory management.

The next step was to develop virtual memory. Firstly, a program's address space is divided into segments of blocks. Secondly, the address of each block is translated at run time, which means the load address of a block can change while the program runs.

OS/VS1 and OS/VS2 release 1 (SVS) implemented virtual memory. However, there was only one virtual address space in the whole system. So they correspond to MFT and MVT using a larger virtual memory instead of real memory.

MVS gave each user its own virtual address space.

Saturday, March 16, 2024

Domain Linkage

A domain gate is called via a macro for the desired gate. The macro sets R1 to point to the TPL, sets R0 to contain the row and column numbers of the gate in the Domain Gate Table, and then calls the module DFHKEDCL to link to the gate. The RPL contains a fixed length header and a variable length parameter list.

CICS uses a standard register convention when entering a gate. R1 points to the parameter list. R4 points to the stack storage of the current user (similar to a stack frame pointer, which points to the base of the current frame). R13 points to the top of stack of the current frame. R14 is the return address and R15 is the address to branch to.

CICS Task and Transaction

A task is a unit of work represented by a DTA (Dispatcher Task Area) control block in the Dispatcher domain. A task is created via the ATTACH call to the dispatcher and is registered with the Kernel domain, so that a KE task is assigned with associated KSS storage.

A transaction refers to a unit of work originating from the AP domain, with an associated TCA and EIS control block there. A TQE (task queue element) in AP maps to a DTA in DS, which in turn maps to a KE task.

A system task has only a DTA and no TCA.


Kernel Anchor Block

The KCB contains some fields previously found in the CSA. It contains the address of the KE task table, which maps work units to TCBs (QR, CO, RO resource owning, etc). It also contains pointers to the KSS (kernel stack segments), which are two stacks (24-bit and 31-bit) used as save areas for each KE task when modules call each other. The KSS is in MVS storage separate from the application for better protection.

Domain Gate Table

This table resides in the kernel domain and contains a pointer to the domain anchor block and the entry points of the specific and generic domain gates for each domain. The index into the table forms the domain token.

Tuesday, February 13, 2024

OS/MVT

The multiprogramming with a variable number of tasks (MVT) OS loads programs one after another until all memory is used. When a program ends, its memory range is released. However, a new program can only be loaded if a contiguous space large enough for it is available. With programs ending at different times, fragmentation may leave no contiguous block large enough for a new program.

OS/MFT

The multiprogramming with a fixed number of tasks (MFT) OS divided real memory into partitions. Each job ran in one partition. When the loader loads a program into a partition, it relocates the code to run in the memory address range of the partition.

OS/PCP

The Primary Control Program system runs one job at a time. A program is loaded at a fixed location in memory. PCP automated many of the operator intervention tasks of that time. A program larger than memory needs to be broken down into overlays. Input/output spooling was available but not officially supported.

Saturday, February 3, 2024

Mainframe Main Memory access

Main memory is divided into blocks. Each block is associated with a storage key. To fetch or update data in a block, the program's access key is checked against the block's key. This is for compatibility with the old machine architecture from before virtual storage was available. Each block also has a reference bit and a change bit.

Main storage blocks are 2K in size, though most manipulation is done in 4K units. The reason is to be compatible with DOS.

The MCU controls access to memory by a CPU or channel. The address is stored in the memory address register (MAR), and the data to store or the result of a fetch is stored in the memory data register (MDR).

To store, the CPU (or channel) alerts the MCU with a MEMORIZE signal to signify a store request. It then updates the MAR and MDR, and issues the NOW signal to start the store operation.

To fetch from memory, the requester issues a RECALL signal and then updates the MAR. It then issues a NOW signal to the MCU to fetch the data.

MVS channel and control unit

An I/O address is a combination of channel number, control unit number and device number. 

The control unit handles the assembly of bits sent from the device into bytes and also validates the data using CRC. The purpose of the CU is to centralize this logic to make devices cheaper to manufacture.

The channel offloads from the CPU the work of moving data between device and main memory (DMA).

VIO

VIO uses virtual storage and the paging datasets to simulate temporary datasets for programs. VIO provides better performance as it eliminates VTOC processing and gives better I/O load balancing.

IOS

The I/O Supervisor is responsible for starting I/O operations and monitoring events from channels, control units and devices.

To start an I/O, IOS stores the address of a channel program in the CAW (channel address word) and then issues the start I/O instruction. When the I/O is done, IOS performs termination processing. IOS also responds to events, purging or restoring an I/O operation.

Programs and access methods interact with IOS via drivers. Most access methods use the EXCP driver. The EXCP and EXCPVR macros invoke the EXCP driver. The driver converts the virtual addresses in the channel command words into real addresses. It issues the STARTIO macro to start the I/O. IOS then takes over and either issues the start I/O (SIO) instruction or queues the request for later execution.

DSCB

The DSCB resides in the VTOC. It is the dataset label that contains the characteristics of the dataset and the physical tracks on which it resides.

DADSM routines manipulate these DSCBs and include ALLOCATE, SCRATCH, PARTIAL RELEASE and EXTEND. They also include VTOC-related routines such as RENAME, OBTAIN, LSPACE and PROTECT.

MVS OPEN, CLOSE and EOV processing

The OPEN macro verifies the volume and the dataset password. For tape, it writes the volume label. It then passes control to the access method.

The CLOSE macro updates the DSCB in the VTOC. For tape processing, it writes the tape mark and repositions the tape.

EOV handles the situation when a write reaches the end of the volume, transparently to the application. It extends the dataset to another disk or tape volume. For disk, the VTOC is updated. For tape, it asks for another volume to be mounted and continues writing.

JES

JES reads in a job and spools it on DASD. The Converter translates the JCL into internal text. When an initiator asks for a job to run, JES selects a job based on the PRIORITY specified in the JCL. The Interpreter allocates the control blocks through which the system manages the job execution. The Initiator attaches the job task. While the job step is running, JES collects the job output and spools it. When the job ends, the Terminator releases the job resources.

MVS lineage

Early generation (1950s) mainframes did not have an operating system. They ran batch jobs one at a time. The operator fed the program to the card reader together with the subroutine decks the program used. The operator also allocated the devices required. When the program ended, the operator deallocated the devices and prepared for the next step (program) to run. The operation was manual.

The first rudimentary operating systems in the early 1960s were Primary Control Program (PCP) and Disk Operating System (DOS). These mechanized job transition. Device allocation was still manual. The OS searched the subroutine library to retrieve what the program required to run. The system still ran one job at a time.

MFT (multiprogramming with a fixed number of tasks) came out in 1967 for OS/360. It supported running a fixed number of jobs concurrently. JCL was used to separate the program from operations (devices, dispatching, etc). Output was managed by HASP. Multiprogramming led to TSO and RJE.

MFT gave rise to MVT (variable number of tasks). More programming languages were supported, including PL/1, ALGOL, APL and BASIC.

In 1973, three new operating systems came out, SVS, VS1 and VM/370, all supporting virtual storage. This eliminated the overlay technique. Demand to support more users gave birth to MVS in 1974.

The evolution of mainframe operating systems improved the productivity of users and of application and system programmers. It automated computer operations, workload management, and data and resource management.

Saturday, January 20, 2024

APPLE II BASIC

The BASIC ROM contains the BASIC interpreter. The user enters the program line by line, and each line is parsed and stored in a linked list in memory. The line number specified for each statement tells the parser where to insert the parsed structure in the linked list. The parsed statements are then interpreted to run the program.

Sunday, January 14, 2024

CICS PCP

Program Control Program (PCP) supports the program call functions in CICS. CICS supports the LINK API, which allows one program to call another program. The callee returns to the caller when finished. XCTL passes control to another program at the same level and there is no return from the callee. LOAD allows a program to load another load module (table, map, etc) into memory. PCP also supports an abend handler (HANDLE ABEND), which percolates control up the program levels when a program encounters an abend.

CICS PCP functions allow related programs to call each other dynamically. Related programs do not need to be linked into one big module. PCP also allocates separate working storage for each transaction that shares the same program. PCP also saves and restores program related fields in the TCA.

Saturday, January 13, 2024

CICS Table Management

The table management program (TMP) was introduced to support RDO in CICS. As RDO is dynamic in nature, TMP keeps table entries in a chained list so that the size can change without restarting CICS. Each control table chain in CICS starts from a scatter table, which contains a hash table of pointers. Table entries are hashed for fast search, starting from the pointer in the corresponding scatter table slot. Entries that hash to the same value are chained together. The scatter table also contains a pointer that chains all entries for a sequential scan through the whole table. Programs like CEMT can access tables via a call to TMP.

TCA LIFO

The LIFO area is a stack area used by CICS modules to save registers. Additional LIFO storage can be allocated in DSA. 

Transaction Work Area

A TWA can be requested via the PCT. The TWA is mainly used by macro level programs because the working storage is not duplicated for each task using the same macro level program. In a macro level program, the working storage is assumed to be read only.

This is different from a command level program, which uses EIP and whose working storage is automatically allocated separately for each transaction using the same program. Therefore, command level programs seldom use the TWA.

The TWA is deleted when the transaction ends. The issue with using the TWA is that it is allocated behind the UTCA in the control block area. A storage violation would likely affect CICS availability.

CICS Dispatcher list

The suspend list contains tasks that are in a long wait. An I/O wait is not considered long, so a task waiting for I/O is queued in the active list instead. Long waits are those that may not have a target deadline, like terminal I/O or interval control waits.

Tasks in the active list are ordered by priority, which is calculated as the sum of the transaction, terminal and user priorities, capped at 255 (it is stored in a byte).

The TCP (Terminal Control) task, which is used to attach new tasks, has a priority of 255 and is placed at the top of the list. Tasks are queued in the active chain in priority order. New tasks are placed behind tasks with the same priority (FIFO).
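
The priority rule and the FIFO ordering within a priority can be sketched as below. This only illustrates the ordering described above, not how the CICS dispatcher is coded; the function names and the list-of-tuples representation are hypothetical.

```python
def task_priority(transaction, terminal, user):
    """Sum of the three priorities, capped at 255 so it fits in one byte."""
    return min(transaction + terminal + user, 255)


def enqueue(active_list, task_name, priority):
    """Insert a task in priority order, behind tasks of the same priority."""
    position = 0
    # Skip every task whose priority is higher than or equal to the new one.
    while position < len(active_list) and active_list[position][0] >= priority:
        position += 1
    active_list.insert(position, (priority, task_name))


active = []
enqueue(active, "TCP", 255)
enqueue(active, "A", task_priority(50, 30, 20))
enqueue(active, "B", task_priority(100, 0, 0))
enqueue(active, "C", task_priority(200, 100, 50))
```

After these calls the list holds TCP first, then C (capped at 255 and queued behind TCP), then A and B in arrival order at priority 100.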

Saturday, January 6, 2024

X86 floating point

The 8087 was the first floating point co-processor for x86. The floating point registers were organized as a stack. Floating point instructions were encoded with the ESC opcode prefix. Performance lagged behind other CPUs. Intel subsequently introduced SSE, implemented with a traditional (non-stack) floating point register architecture. SSE also allows loading 4 single precision floating point values into one register, which gives a further speed-up.

Floating point Arithmetic

As there is an infinite number of values between 0 and 1, a floating point number stored in a fixed-length bit string (like a 32-bit register) is an approximation of the actual real number. Therefore, floating point arithmetic may not be precise. Floating point instructions provide multiple options for the programmer to treat the computed result: round up, round down, etc.

Floating point bias

To facilitate comparison, a floating point number stores the sign in the first bit, followed by the exponent bits. The exponent is a signed quantity, so the obvious encoding for it would be two's complement.

An unsigned integer of 8 bits covers the values 0 to 255. Two's complement encoding divides the 256 values into (almost) half for positive and half for negative. The advantage of two's complement is that we can add positive and negative numbers without heeding the sign and still get the correct result.

If the exponent were stored as two's complement, a negative exponent would have a higher binary value than a positive exponent. To allow correct comparison, IEEE 754 instead uses a bias of 127 (for single precision). The bias is added to the exponent before it is stored, so an unsigned comparison of the stored exponents gives the right order.

When the floating point number is used for computation, the bias is subtracted from the stored exponent before use.
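
The effect of the bias can be checked with Python's `struct` module, which can produce the single precision bit pattern of a number. The helper names below are my own; the sketch only looks at normalised positive values.

```python
import struct


def float32_bits(x):
    """Return the 32-bit pattern of x encoded as single precision."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def stored_exponent(x):
    """The 8 exponent bits as stored, i.e. with the bias of 127 added."""
    return (float32_bits(x) >> 23) & 0xFF


def true_exponent(x):
    """Subtract the bias to recover the exponent used in computation."""
    return stored_exponent(x) - 127


def twos_complement_byte(n):
    """8-bit two's complement pattern of n, viewed as an unsigned value."""
    return n & 0xFF
```

For 0.5 the true exponent is -1 and the stored exponent is 126, which sorts below the stored exponent 128 of 2.0. In two's complement the pattern for -1 would be 255, which would wrongly sort above the pattern for +1.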

Friday, January 5, 2024

Floating point

Scientific notation refers to a number with a single digit to the left of the decimal point and an exponential figure to the right. For example 1.23x10^2 is the scientific notation for 123. 

A normalised number is a scientific notation without a leading zero. For example, 0.123x10^4 is not a normalised scientific notation.

Floating point is an encoding of the normalised scientific notation in binary in a word. The fraction part represents the precision and the exponent part represents the range. The fraction is assumed to have a leading one, which is not included in the encoding.

For single precision, the first bit is the sign bit, followed by 8 bits for the exponent and 23 bits for the fraction. So the precision is 24 bits with the implicit leading 1. For double precision, the fraction is 52 (+1) bits long and the exponent is 11 bits long.

The value represented is equal to (-1)^sign x 1.fraction x 2^(exponent - bias)
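
As a worked example, the three fields of a single precision number can be pulled apart and put back together. This sketch assumes a normalised number (it ignores zero, denormals, infinity and NaN), and the function names are hypothetical.

```python
import struct


def decode_float32(x):
    """Split x into (sign, stored exponent, fraction) as single precision fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits, leading 1 not stored
    return sign, exponent, fraction


def rebuild(sign, exponent, fraction):
    """Value of a normalised number: (-1)^sign x 1.fraction x 2^(exponent - 127)."""
    significand = 1 + fraction / 2**23   # restore the implicit leading 1
    return (-1) ** sign * significand * 2.0 ** (exponent - 127)
```

For -6.5, which is -1.625 x 2^2, the fields are sign 1, stored exponent 129 and fraction 0x500000, and `rebuild` gives back -6.5.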

Multiplication and division logic

Multiplication is implemented in hardware as a series of shift and add operations. The implementation can be accelerated by using multiple adders to perform the additions concurrently.
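
The shift and add idea looks like this in software, for unsigned integers. It is a sequential sketch; hardware can form the partial products in parallel, which is where the speed-up comes from.

```python
def shift_add_multiply(multiplicand, multiplier):
    """Multiply two non-negative integers using only shifts and adds."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles non-negative integers only")
    product = 0
    while multiplier:
        if multiplier & 1:         # low bit set: add the shifted multiplicand
            product += multiplicand
        multiplicand <<= 1         # next bit is worth twice as much
        multiplier >>= 1
    return product
```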

The division algorithm is serial in nature. Each step depends on the result of the previous step. Therefore, it cannot be accelerated as easily as multiplication. In other words, the performance of division will be lower than that of multiplication.
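
A restoring division loop shows the dependency: each quotient bit is decided from the remainder left by the previous step. This is one textbook scheme chosen for illustration, not necessarily what any particular CPU implements, and the `bits` parameter is an assumption of the sketch.

```python
def restoring_divide(dividend, divisor, bits=32):
    """Unsigned division, one quotient bit per step. Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not 0 <= dividend < (1 << bits) or divisor < 0:
        raise ValueError("operands must be unsigned and fit in `bits` bits")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # This comparison needs the remainder from the previous step,
        # so the steps cannot run in parallel.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```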

Tuesday, January 2, 2024

X86 Instruction Prefix

X86 instructions support data sizes of 8, 16, 32 and 64 bits. The default size (either 16 or 32, whichever is considered more commonly used) is set by a bit in the code segment descriptor. To override the default, we can use an instruction prefix.


There are 3 other prefixes originating from the 8086 that modify the behaviour of instructions. They are used to:

1. Override the default segment register

2. Lock the bus to support synchronization

3. Repeat the instruction until ECX counts down to 0. This prefix is commonly used to move a number of bytes. Ironically, this method is slower than a software routine using load (to register) and store (to memory) instructions. If we use floating point registers, the performance of such a routine is even higher.

X86 register

X86 has 8 GPRs, which is far fewer than a RISC CPU. The low number of GPRs also influenced the instruction set format: one of the registers is both a source and the destination of the instruction.