Friday, September 27, 2013

Intel Assembly Addressing Mode

Global variables are defined in a .DATA section. DB, DW and DD declare variables of 1, 2 and 4 bytes respectively.

.DATA
var1 DB 64 ; initialize variable with value 64
var2 DB ? ; uninitialized variable
var3 DD 1, 2, 3 ; declare 3 doubleword variables initialized to 1, 2 and 3

arr1 DD 100 DUP (0) ; declare an array of 100 doublewords, each initialized to 0
str1 DB 'hello',0 ; declare a null-terminated string, 6 bytes long

mov eax, [ebx] ; load the 4 bytes at the address held in ebx into eax
mov [eax], ebx ; store the content of ebx into the 4 bytes at the address held in eax

Sometimes the size of the data being manipulated is ambiguous, e.g. when an immediate value is written to memory. A size directive resolves the ambiguity:

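mov BYTE PTR [ebx], 2 ; move 2 into the single byte at the address stored in ebx
mov WORD PTR [ebx], 2 ; move 2 into the 2-byte word at the address stored in ebx
mov DWORD PTR [ebx], 2 ; move 2 into the 4-byte doubleword at the address stored in ebx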
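
The effect of the three size directives can be modelled outside the assembler. The Python sketch below is only an illustration: the store helper and the 4-byte bytearray standing in for the memory at [ebx] are assumptions made for the example, not part of any assembler or API. It shows how many bytes each form overwrites, keeping in mind that x86 stores are little-endian.

```python
import struct

def store(memory: bytearray, address: int, value: int, size: int) -> None:
    """Write value as a little-endian integer of size bytes (1, 2 or 4)."""
    fmt = {1: "<B", 2: "<H", 4: "<I"}[size]
    struct.pack_into(fmt, memory, address, value)

# Start from memory filled with 0xFF so the overwritten bytes are visible.
byte_case = bytearray(b"\xff" * 4)
word_case = bytearray(b"\xff" * 4)
dword_case = bytearray(b"\xff" * 4)

store(byte_case, 0, 2, 1)    # models: mov BYTE PTR [ebx], 2
store(word_case, 0, 2, 2)    # models: mov WORD PTR [ebx], 2
store(dword_case, 0, 2, 4)   # models: mov DWORD PTR [ebx], 2

print(byte_case.hex())   # 02ffffff
print(word_case.hex())   # 0200ffff
print(dword_case.hex())  # 02000000
```

Without the directive the assembler cannot tell which of these three stores is meant, which is why it rejects the instruction.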

Windows System Call Flow

(1) A user mode program calls BOOL WINAPI WriteFile()
(2) Control transfers to the WriteFile() routine implemented by kernel32.dll
(3) kernel32.dll calls ZwWriteFile() in ntdll.dll (user mode)
(4) ZwWriteFile() calls KiFastSystemCall() in ntdll.dll, which executes the SYSENTER instruction to transition to kernel mode
(5) SYSENTER transfers control to KiFastCallEntry() in ntoskrnl.exe (Executive) via the IA32_SYSENTER_CS and IA32_SYSENTER_EIP MSR settings
(6) KiFastCallEntry() calls KiSystemService() in ntoskrnl.exe
(7) KiSystemService() dispatches service number 0x163, which is NtWriteFile() in ntoskrnl.exe

Windows API

With the exception of NtGetTickCount() and NtCurrentTeb(), each Nt* function has a matching Zw* function. For a user mode program, calling an Nt* function ends up in the same code as calling the matching Zw* function. In kernel mode, calling a Zw* routine follows the formal system call transition path via the KiSystemService() routine, whereas calling the Nt* routine directly does not.

Windows user mode components

Environment subsystems provide the APIs that specific kinds of applications need in order to run. NT4 supports 5 environment subsystems:

Win32 (later called the Windows subsystem)
Windows on Windows (WOW) for 16-bit Windows applications, e.g. Win 3.1
NT Virtual DOS Machine (NTVDM) for DOS applications
OS/2
POSIX, later Services for UNIX (SFU) and Subsystem for UNIX-based Applications (SUA)

The Windows subsystem consists of 3 basic components:
(1) csrss.exe - Client Server Runtime Subsystem (user mode). It plays a role in managing processes and threads, and it supports the command line interface.
(2) win32k.sys - kernel mode device driver
(3) User mode DLLs that implement the subsystem's API, e.g. kernel32.dll, gdi32.dll, shell32.dll, rpcrt4.dll, advapi32.dll, user32.dll etc.

When a Windows API routine needs to access services in the executive, it goes through ntdll.dll, which reroutes the call to ntoskrnl.exe.

The Service Control Manager (SCM) is implemented by services.exe in the system32 directory. The SCM launches and manages user mode services, which are simply user mode applications that run in the background.

Windows kernel mode components

The core is implemented in ntoskrnl.exe. This executable implements its functionality in 2 layers - the executive and the kernel.

The executive implements the system call interface and major OS components (such as the I/O manager, memory manager, and process and thread manager). Kernel mode device drivers sit in a layer between the executive's I/O manager and the HAL. The kernel implements low level routines (e.g. synchronization, thread scheduling, interrupt handling) that the executive uses to provide high level services.

There are several versions of the kernel executable:

  • ntoskrnl.exe - uniprocessor without PAE 
  • ntkrnlpa.exe - uniprocessor with PAE 
  • ntkrnlmp.exe - multiprocessor without PAE 
  • ntkrpamp.exe - multiprocessor with PAE 


win32k.sys is a kernel mode driver that implements both the user and graphics device interface (GDI) services. GDI was moved into kernel mode for speed.

User to Kernel Mode Switching

In real mode, MS-DOS uses the Interrupt Vector Table (IVT) to expose its system services. Applications call INT 0x21 with a function code placed in AH.
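
As a rough illustration of how the IVT is indexed: in real mode each vector occupies 4 bytes (a 16-bit offset followed by a 16-bit segment) starting at address 0, so the entry for INT 0x21 sits at 0x21 * 4. The Python sketch below uses hypothetical helper names and a made-up handler at 0070:1234 purely for demonstration.

```python
import struct

IVT_ENTRY_SIZE = 4  # 16-bit offset followed by 16-bit segment

def ivt_entry_address(vector: int) -> int:
    """Physical address of the IVT slot for an interrupt vector."""
    return vector * IVT_ENTRY_SIZE

def handler_address(ivt: bytes, vector: int) -> int:
    """Real-mode linear address of the handler: segment * 16 + offset."""
    offset, segment = struct.unpack_from("<HH", ivt, ivt_entry_address(vector))
    return segment * 16 + offset

# Empty 256-entry table with one made-up handler installed for INT 0x21.
ivt = bytearray(256 * IVT_ENTRY_SIZE)
struct.pack_into("<HH", ivt, ivt_entry_address(0x21), 0x1234, 0x0070)

print(hex(ivt_entry_address(0x21)))     # 0x84
print(hex(handler_address(ivt, 0x21)))  # 0x1934
```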

Windows uses the IDT (Interrupt Descriptor Table). In a multiprocessor environment, each processor has its own IDTR register. Windows checks the processor it is running on during startup to determine which system call invocation mechanism to use.

For processors that predate the Pentium II, the INT 0x2E instruction and the IDT are used to implement the system call mechanism. For later IA32 processors, Windows uses the SYSENTER instruction to jump to kernel space, and the IDT is only used to handle hardware exceptions.

The IDT contains up to 256 8-byte descriptors. To dump the content of the descriptor registers, use the rM 0x100 command in the debugger: idtr shows the base address and idtl shows the limit (length). To format the IDT content, use the debugger command !idt -a
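
To make the 8-byte descriptor format more concrete, here is a small Python sketch that decodes one IA-32 gate descriptor and computes the size of a full table. The descriptor bytes are made up for the example (a present interrupt gate with DPL 3 pointing at a hypothetical handler address), and decode_gate is an illustrative helper, not a debugger command.

```python
import struct

IDT_ENTRIES = 256
DESCRIPTOR_SIZE = 8
IDT_SIZE = IDT_ENTRIES * DESCRIPTOR_SIZE  # 2048 bytes for a full table
IDT_LIMIT = IDT_SIZE - 1                  # limit value of a full table

def decode_gate(descriptor: bytes) -> dict:
    """Split one 8-byte IA-32 gate descriptor into its fields."""
    offset_low, selector, _reserved, flags, offset_high = struct.unpack(
        "<HHBBH", descriptor)
    return {
        "offset": (offset_high << 16) | offset_low,  # handler entry point
        "selector": selector,                        # code segment selector
        "present": bool(flags & 0x80),               # P bit
        "dpl": (flags >> 5) & 0x3,                   # privilege needed to call
        "gate_type": flags & 0xF,                    # 0xE = 32-bit interrupt gate
    }

# Made-up descriptor: handler 0x8053D4A1, selector 0x0008, flags 0xEE.
raw = struct.pack("<HHBBH", 0xD4A1, 0x0008, 0x00, 0xEE, 0x8053)
gate = decode_gate(raw)
print(hex(IDT_LIMIT), hex(gate["offset"]), gate["dpl"], hex(gate["gate_type"]))
```

A DPL of 3 is what lets user mode code raise the vector with an INT instruction, which is why it is used for the example gate.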

In Windows, most of the entries point to KiUnexpectedInterrupt routines, which in turn jump to the nt!KiEndUnexpectedRange routine. Even though later processors use SYSENTER, the IDT entry at 0x2E still implements the same functionality by pointing to nt!KiSystemService (the System Service Dispatcher), which uses information passed in from the application to invoke the native API routine.

Nowadays, switching from user to kernel mode is done via the SYSENTER instruction. Three 64-bit machine specific registers (MSRs) identify the target to jump to and the location of the kernel-level stack (in case the user mode stack needs to be copied over):

  • IA32_SYSENTER_CS (register address 0x174) - kernel mode code and stack segment 
  • IA32_SYSENTER_ESP (0x175) - stack pointer in the stack segment 
  • IA32_SYSENTER_EIP (0x176) - first instruction to execute 

These registers are manipulated using the RDMSR and WRMSR instructions.

SYSENTER_CS usually points to a Ring 0 code segment that spans the entire address range. Thus SYSENTER_EIP holds a full 32-bit linear address, that of a kernel routine called KiFastCallEntry. This routine eventually jumps to KiSystemService.

As with INT 0x2E, the service number must be stored in EAX before executing SYSENTER. KiFastCallEntry invokes KiSystemService to dispatch the target Nt* function. The dispatch uses the service number as an index into a lookup table.

The system service number is 32 bits wide. Bits 0 to 11 hold the index of the service to be invoked. Bits 12 to 13 select 1 of 4 possible service descriptor tables, of which only 2 are actually used. If the table number is 0x00, KeServiceDescriptorTable is used. If the table number is 0x01, KeServiceDescriptorTableShadow is used. KeServiceDescriptorTable is exported by ntoskrnl.exe, whereas KeServiceDescriptorTableShadow is not exported and is used internally by the executive.
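
The bit layout described above can be checked with a few lines of Python. This is only a sketch of the decoding step: decode_service_number is a hypothetical helper, and 0x1005 is a made-up value chosen to show a number that selects the shadow table.

```python
TABLES = {
    0: "KeServiceDescriptorTable",
    1: "KeServiceDescriptorTableShadow",
}

def decode_service_number(eax: int):
    """Return (table_number, service_index) from the value placed in EAX."""
    index = eax & 0xFFF          # bits 0 to 11: index into the SSDT
    table = (eax >> 12) & 0x3    # bits 12 to 13: which descriptor table
    return table, index

for value in (0x163, 0x1005):    # 0x1005 is a made-up shadow-table example
    table, index = decode_service_number(value)
    print(hex(value), TABLES.get(table, "unused"), hex(index))
```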

Each of the 2 descriptor tables contains a structure called the System Service Table (SST):

  • serviceTable points to an array of linear addresses which are the entry points of routines. The array is called the SSDT (System Service Dispatch Table) and contains 391 elements. The SSDT is similar to the IVT. 
  • nEntries specifies the number of elements in the SSDT 
  • argumentTable is a pointer to an array of bytes called the SSPT (System Service Parameter Table). Each byte gives the number of bytes allocated for function arguments for the corresponding SSDT routine. 

KeServiceDescriptorTable contains one SST. KeServiceDescriptorTableShadow contains 2 SSTs. The first is the same as the one contained in KeServiceDescriptorTable. The second points to the SSDT for the GDI routines implemented by win32k.sys and contains 772 entries.
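
A toy model may help picture how the three SST fields work together during dispatch. The Python sketch below is an assumption-laden illustration, not the kernel's actual structure layout: the class, the three-entry table and all addresses and argument sizes are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class SystemServiceTable:
    service_table: list     # models the SSDT: one entry point per service
    argument_table: bytes   # models the SSPT: argument bytes per service

    @property
    def n_entries(self) -> int:
        return len(self.service_table)

def lookup(sst: SystemServiceTable, index: int):
    """Return (entry_point, argument_bytes) for a service index, with a bounds check."""
    if not 0 <= index < sst.n_entries:
        raise IndexError(f"service index {index:#x} is outside the SSDT")
    return sst.service_table[index], sst.argument_table[index]

# Hypothetical three-entry table; real addresses and sizes differ per build.
sst = SystemServiceTable(
    service_table=[0x80501000, 0x80502000, 0x80503000],
    argument_table=bytes([0x08, 0x24, 0x10]),
)
entry, arg_bytes = lookup(sst, 1)
print(hex(entry), arg_bytes, arg_bytes // 4)  # 0x80502000 36 9
```

Dividing the SSPT byte by 4 gives the number of 32-bit arguments that would be copied from the user stack for that routine.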

HAL and bootvid

Hardware Abstraction Layer (HAL) insulates the OS from hardware by wrapping machine-specific details with an API that is implemented by HAL.DLL. Kernel mode device drivers invoke HAL routines rather than interface to hardware directly.

The HAL implementation depends on the hardware on which Windows runs. The HAL binaries are located in the system32 directory:

  • hal.dll - standard PC 
  • halacpi.dll - hardware with advanced configuration and power interface (ACPI) 
  • halmacpi.dll - hardware with multiple processors 

Sitting alongside the HAL, bootvid.dll offers primitive VGA graphics support during the boot phase. It can be controlled via the /noguiboot option in boot.ini.

ASLR (Address Space Layout Randomization)

The Memory Manager in early versions of Windows tried to load binaries at the same location in the linear address space each time they were loaded. The /BASE linker option allows the developer to specify a preferred address for a DLL or executable. The preferred address is stored in the header of the binary. If a preferred address is not specified, the default load address is 0x400000 for an executable and 0x10000000 for a DLL. If the address is in use, the system relocates the binary to another region. The /FIXED linker option prevents relocation and causes an error message to be issued instead.
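
The preferred-address rule can be sketched as a small decision function. The Python model below is a simplification under stated assumptions: the fallback search (start address and step) is invented for the example and does not describe how the Memory Manager really picks a new region; only the two default bases and the /FIXED behaviour come from the text above.

```python
DEFAULT_EXE_BASE = 0x00400000
DEFAULT_DLL_BASE = 0x10000000

def overlaps(base, size, used_regions):
    """True if [base, base + size) intersects any (start, length) region."""
    return any(base < start + length and start < base + size
               for start, length in used_regions)

def choose_load_address(preferred, size, used_regions, fixed=False,
                        search_start=0x20000000, step=0x10000):
    """Pick the preferred base if free, else relocate (or fail when fixed)."""
    if not overlaps(preferred, size, used_regions):
        return preferred
    if fixed:
        raise RuntimeError("preferred base in use and image linked with /FIXED")
    candidate = search_start          # hypothetical fallback search
    while overlaps(candidate, size, used_regions):
        candidate += step
    return candidate

used = [(DEFAULT_DLL_BASE, 0x20000)]  # another DLL already sits at the default base
first = choose_load_address(DEFAULT_EXE_BASE, 0x10000, used)
second = choose_load_address(DEFAULT_DLL_BASE, 0x10000, used)
print(hex(first), hex(second))        # 0x400000 0x20000000
```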

ASLR allows binaries to be loaded at random addresses. It is enabled with the /DYNAMICBASE linker option. Common DLLs are still shared by the multiple address spaces that use them.

I/O Techniques

(1) Programmed I/O. When a processor encounters an I/O instruction, it issues a command to the appropriate I/O module. The I/O module sets the appropriate bits in the I/O status register but does not alert the processor. The processor must check periodically for I/O completion after the I/O instruction is executed. The processor is also responsible for transferring the data from the hardware buffer to memory. The processor has various I/O instructions to control the device (e.g. rewind a tape drive), test status and transfer data.

(2) Interrupt driven I/O. The I/O module interrupts the processor when the I/O completes. However, the processor is still required to transfer the data to memory. There are 2 drawbacks: the I/O transfer rate is limited by the speed at which the processor can test and service a device, and the processor is tied up in managing the I/O transfer.

(3) Direct Memory Access. Interrupt driven I/O is not efficient when a large amount of data is to be transferred. DMA is performed by a separate module on the system bus. The processor issues a command to the DMA module with information such as the operation required (READ/WRITE), the address of the I/O device, the starting address of the memory location and the number of words to be moved. The DMA module transfers the data directly, one word at a time. When the transfer is completed, the DMA module alerts the processor. As the DMA module needs to take control of the bus to transfer data, it may contend with the processor for its use. The processor waits for one bus cycle while DMA is using the bus. However, no context switch is incurred. Overall, DMA is more efficient for multi-word I/O transfers.

Thread Models

In a User-Level Thread (ULT) environment, all thread management is done by the application. The kernel is unaware of the existence of threads. The application uses a thread library for thread management (creating and destroying threads, passing data, scheduling and storing thread context). The application begins as a process with a single thread and spawns further threads in the same process. The context of a thread consists of registers, program counter and stack pointer. The kernel schedules execution at the level of the process.

Advantages of user level threads are:
(1) Thread switching happens completely in user mode, with no switch to kernel mode
(2) Different applications can use different scheduling algorithms
(3) The threading model can run on any OS, as no kernel support is needed

Disadvantages of user level threads are:
(1) When a ULT executes a blocking system call, the process and all its threads are blocked. A technique called jacketing converts the blocking call into a non-blocking call. The jacket routine checks whether the device is busy. If it is, the routine blocks the thread and passes control to another thread.
(2) As the kernel schedules a process onto only 1 processor at a time, ULTs cannot take advantage of a multiprocessor environment.
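
The idea of a thread library that schedules entirely in user space can be sketched with Python generators. This is a toy model only: each generator plays the role of a user-level thread, the yield statement stands in for a voluntary switch, and the round-robin run function is a hypothetical scheduler, not any real threading package.

```python
from collections import deque

def worker(name, steps, log):
    """A cooperative 'thread': do one unit of work, then yield to the scheduler."""
    for i in range(steps):
        log.append(f"{name}:{i}")
        yield  # hand control back without involving the kernel

def run(threads):
    """Round-robin scheduler that lives entirely inside the application."""
    ready = deque(threads)
    while ready:
        thread = ready.popleft()
        try:
            next(thread)          # resume the thread until its next yield
        except StopIteration:
            continue              # thread finished, drop it
        ready.append(thread)      # still runnable, requeue it

log = []
run([worker("A", 2, log), worker("B", 3, log)])
print(log)  # ['A:0', 'B:0', 'A:1', 'B:1', 'B:2']
```

Because the scheduler is ordinary application code, swapping round-robin for another policy only means changing run, which is the flexibility listed under advantage (2).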

In a Kernel-Level Thread (KLT) environment, thread management is done by the kernel. The application creates threads using a kernel API. The disadvantage is that transferring control from one thread to another requires a switch from user to kernel mode. In a benchmark, kernel-level thread switching can be 30 times slower than user-level thread switching.

Windows Boot Process

(1) The machine starts with the POST (Power On Self Test), which detects the amount of memory and enumerates the attached storage devices.

(2) The BIOS searches the bootable devices for a boot sector. If the bootable device is a hard disk, the boot sector is an MBR (Master Boot Record) written by Windows setup. The MBR contains code and a partition table used to identify the active partition. The active partition is also called the bootable partition or the system volume.

(3) The MBR loads the partition boot sector (called the VBR or volume boot record) into memory

(4) If the boot device is not a hard disk (e.g. DVD or floppy), the BIOS loads the device's VBR into memory

(5) The VBR boot code reads the partition's file system just well enough to locate and load the 16-bit boot manager program. The boot manager is actually 2 executables concatenated together. The first module is 16-bit and executes in real mode. It sets up the necessary data structures, switches to protected mode and loads the protected mode boot manager (32-bit or 64-bit) into memory.

(6) For an EFI (Extensible Firmware Interface) machine, the boot code is in the firmware and there is no need for an MBR or VBR. The boot manager path is provided to EFI via a variable setting. The EFI firmware switches to protected mode with a flat memory model and paging disabled, and runs bootmgr.efi (the boot manager)

(7) On both BIOS and EFI machines, the boot manager is now loaded. The boot manager uses configuration data stored in a registry hive (BCD or boot configuration data). The BCD has 2 kinds of elements. A Windows boot manager object controls the character-based boot menu (locale, default timeout etc). The boot loader objects represent different boot configurations (e.g. normal, debugging etc). If there is only 1 boot loader object, the boot manager does not display the character UI.

(8) The boot manager loads the Windows boot loader (winload.exe), whose location is specified in the boot loader object.

(9) winload.exe is the successor of NTLDR. winload starts by loading the SYSTEM registry hive (c:\Windows\system32\config). The SYSTEM hive is mounted under HKLM\SYSTEM.

(10) winload loads nt5.cat, which contains the digital signature catalog, and performs an integrity test of its own memory image. If the signature does not match, winload halts.

(11) winload then loads ntoskrnl.exe and hal.dll. If a debugger is attached, winload also loads the kernel mode driver for the debugger (kdcom.dll for null modem, kd1394.dll for FireWire and kdusb.dll for a USB debug cable). winload checks the integrity of the loaded modules against nt5.cat.

(12) winload then continues to load the DLLs imported by ntoskrnl.exe and checks their images against nt5.cat. The DLLs loaded are pshed.dll, bootvid.dll, clfs.sys and ci.dll.

(13) winload scans HKLM\SYSTEM\CurrentControlSet\Services for device drivers that belong to the boot class category (i.e. a Start value equal to 0x00000000 or SERVICE_BOOT_START). If integrity checking is enabled, winload checks the signatures of these drivers against nt5.cat. Again, winload halts if the integrity check fails.

(14) winload enables paging, saves the boot log and transfers control to ntoskrnl.exe via its exported function KiSystemStartup().

(15) ntoskrnl builds its data structures (e.g. page tables) and loads ntdll.dll. The executive searches HKLM\SYSTEM\CurrentControlSet\Services for system class drivers and services (subkeys with a Start value equal to 0x00000001). If integrity checking is enabled, the executive checks them using ci.dll. Any driver that fails the test is not loaded.

(16) The executive initiates the session manager (smss.exe). smss starts the Windows subsystem that supports the Windows API, which means smss itself can use only the native API.

(17) The Windows subsystem consists of 2 parts: win32k.sys is the kernel mode driver and csrss.exe is the user mode component. smss locates the kernel mode driver in the registry under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems\Kmode. win32k.sys switches from the default boot VGA mode to the target display mode.

(18) smss also loads the user mode component specified in HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems\Required. The entry points to two other subkeys - Debug and Windows. Normally Debug is empty and Windows points to csrss.exe.

(19) csrss.exe enables sessions to support user mode applications that make calls to the Windows API

(20) smss.exe continues by loading the "known" DLLs specified under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs\

(21) smss.exe creates 2 sessions (0 and 1). Session 0 hosts the init process. Session 1 hosts the logon process.

(22) The Session 0 instance of smss.exe launches wininit.exe

(23) The Session 1 instance of smss.exe launches winlogon.exe

(24) The original smss.exe then waits in a loop and listens for LPC requests to spawn other subsystems, create new sessions or shut down the system.

(25) wininit creates 3 child processes. The Local Security Authority Subsystem (lsass.exe) sits in a loop listening for security related LPC requests. The Service Control Manager (services.exe) loads drivers and services marked as SERVICE_AUTO_START in the registry (0x00000002). The Local Session Manager (lsm.exe) handles connections to the machine made via terminal services.

(26) winlogon.exe handles user logons. It runs logonui.exe to display the logon prompt. logonui.exe passes the credentials to lsass.exe. If the logon succeeds, winlogon.exe launches the applications specified by the UserInit and Shell values under HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon. By default, UserInit specifies userinit.exe and Shell specifies explorer.exe

(27) userinit.exe processes the group policy objects. It also cycles through several registry subkeys and directories to launch startup programs and scripts:


  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce\ 
  • HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Run\  
  • HKCU\Software\Microsoft\Windows\CurrentVersion\RunOnce\ 
  • HKCU\Software\Microsoft\Windows\CurrentVersion\Run\ 
  • C:\ProgramData\Microsoft\Windows\Start Menu\Programs\Startup\
  • C:\Users\%USERNAME%\AppData\Roaming\Microsoft\Windows\Start Menu\
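
Steps (13), (15) and (25) above each key off the Start value of a driver or service. As a rough summary, the Python sketch below groups a registry snapshot by the component that would load each entry. The three constants match the values quoted in those steps; the snapshot contents and the group_by_loader helper are invented for illustration.

```python
# Start values as stored under HKLM\SYSTEM\CurrentControlSet\Services\<name>
SERVICE_BOOT_START = 0x00000000
SERVICE_SYSTEM_START = 0x00000001
SERVICE_AUTO_START = 0x00000002

LOADED_BY = {
    SERVICE_BOOT_START: "winload.exe",
    SERVICE_SYSTEM_START: "ntoskrnl.exe (executive)",
    SERVICE_AUTO_START: "services.exe (SCM)",
}

def group_by_loader(start_values: dict) -> dict:
    """Map each boot-phase loader to the sorted names it would load."""
    groups = {loader: [] for loader in LOADED_BY.values()}
    for name, start in start_values.items():
        loader = LOADED_BY.get(start)
        if loader is not None:        # other Start values are skipped during boot
            groups[loader].append(name)
    return {loader: sorted(names) for loader, names in groups.items()}

# Hypothetical snapshot: service name -> Start value
snapshot = {"bootdisk": 0, "sysfilter": 1, "autosvc": 2, "manualsvc": 3}
for loader, names in group_by_loader(snapshot).items():
    print(loader, names)
```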

Kernel Mode Driver

A KMD sits in the layer between the I/O manager (Io* routines) and hal.dll. KMDs use the API exposed by hal.dll to interact with the hardware.

KMDs process IRPs (I/O Request Packets) handed down from the I/O manager on behalf of user applications. Microsoft introduced driver frameworks to ease the development of KMDs. WDM (Windows Driver Model) was released to support Win98 and W2K. WDF (Windows Driver Framework) encapsulates WDM with another layer of abstraction.

The DriverEntry() routine is executed when the KMD is first loaded into kernel space. DriverEntry() returns its status as an NTSTATUS value and takes 2 parameters. The first IN parameter is a pointer to a DRIVER_OBJECT, which contains information about the driver, including a list of function pointers:

DriverInit - by default, the I/O manager sets this to the address of DriverEntry()

DriverUnload - set by the KMD to the routine to execute when the KMD is about to be unloaded

DriverDispatch - the MajorFunction array, which defines the routines to be executed in response to the major function codes (e.g. IRP_MJ_READ, IRP_MJ_WRITE, IRP_MJ_DEVICE_CONTROL etc) in the IRP passed down

Dispatch routines carry the following signature: NTSTATUS DispatchRoutine(IN PDEVICE_OBJECT DeviceObject, IN PIRP Irp);

For device control, the IRP contains a 32-bit field, IoControlCode, which provides further information on the IRP. IoControlCode comprises four sub-fields:

(1) DeviceType - Microsoft reserves type values 0x0000 to 0x7FFF, e.g. FILE_DEVICE_DISK, FILE_DEVICE_KEYBOARD. Users can define their own types using 0x8000 to 0xFFFF (32K)

(2) Function - program specific integer value that defines the action to be performed. The field is 12 bits wide: MS reserves 0x000 to 0x7FF, and user defined functions span 0x800 to 0xFFF

(3) Method - defines how data is passed between user and kernel mode code, e.g. METHOD_BUFFERED means the OS creates a non-paged system buffer

(4) Access - READ or WRITE access to be declared before opening the file object representing the device.

To use the KMD, it must first be registered with the OS via a helper routine such as RegisterDriverDeviceName(). A second helper such as RegisterDriverDeviceLink() then creates a symbolic link that user mode programs use to communicate with the KMD. A user mode program first calls CreateFile() to open the device. It can then use the Windows API DeviceIoControl() to communicate with the KMD
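
The four IoControlCode sub-fields are packed the way the CTL_CODE macro in the Windows headers packs them: DeviceType in bits 16 to 31, Access in bits 14 to 15, Function in bits 2 to 13 and Method in bits 0 to 1. The Python sketch below mirrors that arithmetic so the layout can be inspected; the device type 0x8000 and function 0x801 are made-up user-range values, and split_ctl_code is an illustrative helper.

```python
METHOD_BUFFERED = 0
FILE_READ_ACCESS = 0x1
FILE_WRITE_ACCESS = 0x2

def ctl_code(device_type: int, function: int, method: int, access: int) -> int:
    """Pack the four sub-fields into a 32-bit I/O control code."""
    return (device_type << 16) | (access << 14) | (function << 2) | method

def split_ctl_code(code: int) -> dict:
    """Recover the four sub-fields from a 32-bit I/O control code."""
    return {
        "device_type": (code >> 16) & 0xFFFF,
        "access": (code >> 14) & 0x3,
        "function": (code >> 2) & 0xFFF,
        "method": code & 0x3,
    }

# Made-up user-defined code: device type 0x8000, function 0x801.
code = ctl_code(0x8000, 0x801, METHOD_BUFFERED,
                FILE_READ_ACCESS | FILE_WRITE_ACCESS)
print(hex(code))             # 0x8000e004
print(split_ctl_code(code))
```

The same 32-bit value is what a user mode program would pass as the control code argument of DeviceIoControl() and what the driver would find in IoControlCode.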

Kernel Patch Protection (KPP) or PatchGuard

PatchGuard was originally deployed in 2005 and has had 2 later upgrades (v2 and v3) to counter bypass techniques. It periodically (every 5 to 10 min) checks several vital system components (SSDT, IDT, GDT, MSRs, ntoskrnl.exe, hal.dll and ndis.sys) against known signatures. It issues a bug check with stop code 0x00000109 (CRITICAL_STRUCTURE_CORRUPTION) when it detects a change in any component.

Kernel Mode Code Signing (KMCS)

KMDs are required to be digitally signed in order to be loaded. Boot drivers are loaded early by winload.exe. Any boot driver that fails the integrity check prevents Windows from starting up. ntoskrnl.exe uses routines exported from ci.dll to check the rest of the drivers.

Service Control Manager

The SCM is used to load and start drivers (KMDs) in kernel space and to start services.

sc.exe is a utility to define, start, stop and delete services and drivers.

The corresponding programmatic calls are

  • OpenSCManager() - opens the SCM database and obtains a handle to it. The handle is required for subsequent SCM calls. 
  • CreateService() - defines the service using the supplied information, including name, binary path, start type etc. The handle returned is required for the StartService() call 
  • StartService() - loads the driver or service 
  • ControlService() - used to stop the service 
  • DeleteService() - removes the driver information from the SCM database

Thursday, September 19, 2013

DEP (Data Execution Prevention)

DEP is a Windows feature that prohibits execution in designated pages in order to protect data, stack or heap pages. Hardware enforced DEP is applicable to both the OS and applications. Software enforced DEP is applicable to applications. Enabling DEP will also enable PAE. DEP is enabled using the /NXCOMPAT linker option.

PAE and AWE

The amount of RAM that can be accessed by Windows depends on the OS version and the underlying hardware.
For IA32 hardware, Windows can access more than 4G of RAM using Intel PAE (Physical Address Extension, available since the Pentium Pro). PAE is an extension to the system level bookkeeping that allows a machine (via the paging mechanism) to increase the number of address lines from 32 to 36. PAE is enabled in Windows via the /PAE boot option.
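
The gain from the extra address lines is simple arithmetic, sketched here in Python (addressable_gib is just an illustrative helper name): each additional line doubles the reachable physical memory, so going from 32 to 36 lines multiplies it by 16.

```python
GIB = 2 ** 30

def addressable_gib(address_lines: int) -> int:
    """Physical memory reachable with the given number of address lines, in GiB."""
    return (2 ** address_lines) // GIB

print(addressable_gib(32))  # 4
print(addressable_gib(36))  # 64
```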

AWE (Address Windowing Extensions) is a Microsoft specific feature that allows an application to access RAM beyond the 4G linear address space limit. AWE is an API (declared in winbase.h). AWE uses a set of fixed size regions (windows) in an application's linear address space and maps them to a larger set of fixed size windows in physical memory.
AWE can operate with or without PAE. An application needs the "Lock Pages in Memory" privilege to use AWE.

VirtualAlloc() or VirtualAllocEx() - reserve a region in linear address space

AllocateUserPhysicalPages() - allocate pages of physical memory to be mapped to linear memory

MapUserPhysicalPages() or MapUserPhysicalPagesScatter() - map allocated pages of physical memory to linear memory

FreeUserPhysicalPages() - release the allocated pages

Tuesday, September 17, 2013

Debugger

A machine debugger (e.g. the debug command in DOS) views a program as a stream of bytes. It can examine the content stored in registers and memory locations. It has no concept of variables or routines.

A symbolic debugger is a source level debugger. To perform debugging at source level, it uses the target program's debug symbol table. The table contains a collection of variable length records generated by the compiler. The records contain information about variables (name, type, address) and functions (name, start address, end address, and the start and end address range of each statement).

This information allows the debugger to step through the source code by running the machine instructions within the defined ranges.
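
The statement address ranges are what let a symbolic debugger translate an instruction address back to a source line. The Python sketch below models that lookup with a binary search; the record layout and the three line records are invented for the example and do not correspond to any real symbol table format.

```python
from bisect import bisect_right

# Hypothetical line records: (start_address, end_address_exclusive, source_line)
LINE_RECORDS = [
    (0x1000, 0x1006, 10),
    (0x1006, 0x1010, 11),
    (0x1010, 0x1014, 12),
]
STARTS = [record[0] for record in LINE_RECORDS]

def line_for_address(address: int):
    """Return the source line whose address range contains address, else None."""
    i = bisect_right(STARTS, address) - 1   # last record starting at or before address
    if i < 0:
        return None
    start, end, line = LINE_RECORDS[i]
    return line if address < end else None

print(line_for_address(0x1008))  # 11
print(line_for_address(0x1014))  # None
```

A source level step then amounts to running machine instructions until the address leaves the range of the current record.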

All operating systems provide hooks for debuggers. Under DOS, a debugger is driven by 2 ISRs:

INT 0x3 - signals a breakpoint.

INT 0x1 - allows single stepping

When the TF (Trap Flag) is set in FLAGS/EFLAGS, the processor executes a single instruction and then automatically executes an INT 0x1 instruction. This causes the ISR for 0x1 to execute. The processor clears TF automatically whenever it invokes an ISR, so that the ISR itself does not run in single step mode.
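
TF is bit 8 of FLAGS/EFLAGS, so enabling and disabling single stepping comes down to setting or clearing the mask 0x100 in the saved flags image. A minimal Python sketch of that bit manipulation follows; the helper names and the starting value 0x202 (interrupt flag plus the always-set reserved bit 1) are chosen for the example.

```python
TRAP_FLAG = 1 << 8  # TF is bit 8 of FLAGS/EFLAGS

def set_tf(eflags: int) -> int:
    """Return the flags image with single stepping enabled."""
    return eflags | TRAP_FLAG

def clear_tf(eflags: int) -> int:
    """Return the flags image with single stepping disabled."""
    return eflags & ~TRAP_FLAG

def is_single_stepping(eflags: int) -> bool:
    return bool(eflags & TRAP_FLAG)

eflags = 0x202                     # example flags image
stepping = set_tf(eflags)
print(hex(stepping))               # 0x302
print(hex(clear_tf(stepping)))     # 0x202
print(is_single_stepping(stepping))
```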