Wednesday, July 20, 2011

Incremental Predecoding and Translation

To tackle the code discovery problem, a general solution is to translate the binary while the program is operating on actual input data, i.e. dynamically, and to predecode or tranate new sections of code incrementally as the program reaches them.

There are 4 components:
(1) The emulation manager (EM) controls the overall flow.
(2) The interpretor translates the source code to intermediate code
(3) The translator translates to the target binary code
(4) The map table associates the SPC, for a block of source code with TPC for the corresponding block of translated code. The map table is typically implemented using hash table.

The system translates one block of code at a time. The unit for translation is called dynamic basic block. This is different from the static basic block which determined by the static structure of the program. A static baisc block starts a a branch label and end at a branch instruction. For example,

- - - - - - - - - start of block 1
add...
load....
store...
- - - - - - - - - end of block 1/start of block 2
loop: load...
add...
store
brcond skip
- - - - - - - - - end of block 2/start of block 3
load...
sub...
- - - - - - - - - end of block 3/start of block 4
skip: add...
store
brcond loop
- - - - - - - - - end of blcock 4/start of block 5
add...
load...
store...
jump indirect
- - - - - - - - - end of block 5


A dynamic basic block is determined b the acutal flow of a programas it is executed. It begins at the instruction executed immediatelhy after a branch or jump, follow the sequential stream, and end with the next branch or jump. For example,

- - - - - - - - - start of block 1
add...
load....
store...
loop: load...
add...
store
brcond skip
- - - - - - - - - end of block 1/start of block 2
load...
sub...
skip: add...
store
brcond loop
- - - - - - - - - end of blcock 2/start of block 3
loop: load...
add...
store
brcond skip
- - - - - - - - - end of block 3/start of block 4
skip: add...
store
brcond loop
- - - - - - - - - end of block 4

Incremental (staged) translation works as follow. After the source code binary is loaded into memory, the EM begins interpreting the binary using a simple decode-and-dispatch loop or indirect threaded method. When the EM reaches a bracnh or jump, the SPc-TPC mapping table is consulted.

If it is a miss, the profile table is checked to see if the next block is hot (i.e. it has been used a few times above a predefined threshold). If it is not, update the profile table and control is passed back to the interpreter. If it is hot,the next block is translated and placed into the code cache, and SPC-TPC mapping table is updated.

If it is a hit, the EM transfer to the code cache. Exection start in the code block and follows the linked block until the forward link end and control passes back to EM.

When the interpreter or translated code hit an OS call, control is passed back to the EM. The instruction is processed by the OS emulator. When the Os emulator returns, the control passes back to EM which do a SPC-TPC look up and continue. Similarly, when the code genreates an exception, control is passed to the exception emulator.

EM follows the path of the soruce program and either directly executes the next block or begin translating the next dynamic basic block. Incrementally, more of the program is discovered and translated, until eventually only translated code is executed.

Further optimization includes:
(1) Translation chaining - similar to threading, instead of branching back to EM at end of the block, the jump is target directly to the next block
(2) Software Indirect Jump Prediction - implementing indirect jumps by map table lookup is expensive in execution time. Instead, use a series of compare statement for the source address value to determine the jump address reduce the overhead. For example, Rx is the register holding the indirect jump target PC value,

if (Rx == addr1) go to target1
else if (Rx == addr2) go to target2
else table_lookup(Rx)

(3) Shadow Stack - when a translated code block contains a procedure call via an indirect jump to a target binary routine, the SPC value must be saved by emulation code as part of the source architected state, either in register or in memory stack. when the called procedure completes, it can restore this SPC value, access the map table and jump to the translated address. As map table resolution is expensive, the overhead can be avoided if the target return PC value can be made available directly. In this case, the return value of the target code is pushed onto a shadow stack maintained by the emulation manager. In case the return SPC may have changed during the call, the SPC is pushed onto the shadow stack too. upon return, the SPC on the stack is checked against the SPC for the shadow stack before the link address is used.

No comments: