Wednesday, July 20, 2011

Resoruce Virtualization

(1) Processor
The key aspect of virtualization a processor lies in the execution of the guest instruction, including both system-level and user-level instructions. There are 2 ways. One is via emulation (interpretation or binary translation). Emulation is the only processor virutalization mechanism available when the ISA of the guest is different from the host. Emulation may also be required even if host and guest ISA are the same, such as for instructions that interact with hardware resources need to operate differently on virtualized processor than on a real one.

The second method use direct native execution on the host machine. This is possible only when the host and guest ISA are the same. To minimize performance degratdation, one basic design goal for system VM is to execute significant fraction of the instructions directly on native hardware. the overhead for emulating the remaining instructions depends on the number of such instructions, the complexity to discover this instructions, and the data structure and algorithm used for emulation.

(2) Memory
In a system VM, each guest VM has its own set of virutal memory tables. A guest's real address must undergo a further mapping to derive the physical address. Memory resource virtualization is done differently depending on if the page table or TLB is architected. If page table is architected, its structure is defined by the ISA, and the OS and hardware cooperate in maintaining and using it. TLB is maintained by hardware not visible to OS. When TLB misses, hardware walk the page table to find the new entry to load. If the mapping is not found, a page fault result and OS take over. If TLB is architected, its structure and special instruction used to manipulate it is defined in the ISA. Hardware is unaware of the page table. When there is a TLB miss, a trap is sent to OS to be handle instead.

Most older ISA uses architected page table. Some of the more recent RISC ISA use architected TLB.

Vitualizing architected page tables - Each guest OS maintains its owne page table. Virtual-to-physical address mapping is maintained by VMM in shadow page tables., one for each VM. these tables are actually used by hardware to do the address translation and to keep TLB up-to-date. the shadow page tables eliminate the addition translation required from real to physical address. VMM control the real page table pointer. When VMM activates a guest, it loads the page table pointer to use the correct shadow page table for this guest. Guest OS attempts to read/write to the page table pointer will result in a trap to VMM. The trap is generated either because these instructions are privilege or via the wrapper code around these instructions. For read instruction, VMM return the value of the guest's virutal page table pointer (in memory block). For write instruction, VMM updates both virtual and real pointer. Where there is a page fault, the page may or may not have been mapped in the guest's virutal page tables. If it is mapped, the fault is handled entirely by VMM. The guest is not notified of it. If it is not mapped, VMM transfer control to the page faulter handler of the guest OS. However, many of the instruction issued by the page fault handler must be intercepted and handled by VMM because these instructions are privilege or the memory where the virtual page tables reside are write protected by VMM. Doing I/O to real address is also tricky as the physcial address may not be contiguous as the real address is. In this case, the I/O instruction may need to be broken up to multiple I/O instructions to perform I/O on discontinuous blocks of memory. These operation can degrade the performance substantially.

virtualizing TLB - When ISA provides a software managed TLB, the TLB must be virtualized. VMM must mainains a copy of TLB for each guest and intercept instruction so that VMM can keep these TLB up-to-date. One approach is for VMM to load the TLB whenevem a guest VM is activated. This TLB rewrite incur quite high overhead especially for large TLB. An alternate approach is to leverage the address space ID (ASID) that are part of the architect TLB. This allows the TLB to contains entries for various guest simultaneously. A special regiester containing the current ASID is use to indicate the entries in TLB to be used for translation.

(3) I/O
This is one of the more difficult parts of implementing a system VM because there is a large number of device types and a large number of devices in each types.

dedicated devices - e.g. display, keyboard, mouse and speaker. The device itself does not necessarily have to be virtualized. Requests to and from the device could theoretically bypass the VMM and go directly to the guest. However, this is not the case as guest usually runs in user mode. VMM will capture the interrupt and pass to the guest VM when it is activated.

partitioned devices - the VMM translate the parameters into corresponding parameters for the underlying physical device, using a map and reissue the request. Similarly, the result is translated on its way back.

shared devices - e.g. network adaptor. Each guest has its virtual stage of the device e.g. network address, port. VMM translate the parameters and result similar to partitioned devices.

spooled devices - e.g. printer. Printer is solely under control of one program until the printing is complete. This is true even if the program is swapped out. VMM maintain a spool table which consolidate the spool entries in each guest. The virtualization of printer of this kind was more appropriate for older line printers which were expensive and attached to the machine. Nowaday, network printers usually have its own buffer and spool jobs from machines across network.

Generally, VMM can intercept a guest's I/O action and convert it from a vertial device action toa real device action at 3 interfaces

virtualizing at I/O operation level - Memory mapped I/O, commonly in many RISC platform, perform I/O by reading/writing to a special memory location. These locations are protected by OS which make them inaccessible by user mode programs. System like S/360 and IA-32 performed I/O using special command (e.g. SIO). These privilege nature make them easy to be trapped by VMM. However, a I/O request from application typically results in a few such I/O request. Reverse engineering these requets to deduce the I/O action is extremely difficult in practice.

virtualizing at Device Driver level - this is straightforward and allow virtualization at a natural points. However, it requires the VMM developer have knowledge of the guest OS and its device interface. Special virtual device drivers can be developed for each guest. These drivers are bundled and installed as part of VMM. This can be simplified by "borrowing" the drivers (and interface) from an existing OS. In a hosted VM environment, the drivers in the host OS are used.
virtualizing at system call level - this is done by intecepting the initail I/O request at the OS inerface, the ABI. Then the entire I/O the be done by VMM. To do this, VMM must shadow (emulate) the ABI routines. This is a daunting task.

No comments: