Everybody knows how useful stack unwinding is.
There are several ways to implement it.
1. By using the frame pointer (FP). (At runtime, without risk, but limited.)
If a frame pointer is available, unwinding the stack is very easy, and there is little to explain. (For example, MSVC7 uses a frame pointer in debug builds on Intel x86 PCs.)
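For illustration, here is a minimal sketch of frame-pointer walking in C. It assumes a hypothetical frame layout in which FP points at a two-word record {caller's FP, saved LR}; the real layout depends on the ABI and compiler, and `frame_record` and `unwind_fp` are illustrative names only. The walk refuses to follow a frame pointer that leaves the known stack range or does not move toward older frames.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: FP points at {caller's FP, saved return address}.
 * The actual layout depends on the ABI and compiler. */
typedef struct frame_record {
    const struct frame_record *prev_fp;
    uintptr_t saved_lr;
} frame_record;

/* Follow the FP chain and store up to max return addresses in out.
 * [stack_lo, stack_hi) is the valid stack range of the current task. */
size_t unwind_fp(const frame_record *fp, uintptr_t stack_lo, uintptr_t stack_hi,
                 uintptr_t *out, size_t max)
{
    size_t n = 0;
    while (fp != NULL && n < max) {
        uintptr_t addr = (uintptr_t)fp;
        /* Never dereference a misaligned FP or one that lies outside the stack. */
        if (addr < stack_lo || addr + sizeof *fp > stack_hi ||
            addr % sizeof(uintptr_t) != 0)
            break;
        out[n++] = fp->saved_lr;
        /* On a descending stack, callers' frames are at higher addresses;
         * anything else indicates a corrupt chain, so stop instead of looping. */
        if ((uintptr_t)fp->prev_fp <= addr)
            break;
        fp = fp->prev_fp;
    }
    return n;
}
```

For example, three records laid out at increasing addresses and chained together yield their three saved LR values in order, while a starting FP below `stack_lo` yields nothing instead of a fault.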
2. By pseudo-execution. (At runtime, with risk.)
If a frame pointer is not available, we can recover the link register (LR) values by repeatedly pseudo-executing instructions and analyzing each opcode.
The prerequisite is knowing the stack, the registers, and the contents of the code area (the instructions). So, after dumping the stack and registers, we can do this outside the process by reading the ELF file and the dumped data. Alternatively, it can be done inside the process at runtime, because all of this is available at runtime. During pseudo-execution we should skip function calls; we are not building a perfect emulator! Because of this, we cannot trust any register value that was computed or loaded from memory, except values loaded from the stack: we cannot know what the skipped function did or what it returned. Only values read from the stack can be trusted. And what we are most interested in is the LR value, which is usually saved on the stack.
In general, a function (subroutine) consists of a prologue, a main body, and an epilogue. In the prologue, the function saves the caller's state; in the epilogue, it restores that state and returns control to the caller, at which point PC takes the return address that was saved from LR. So finding the epilogue is the key. As mentioned above, any register value that was not read from the stack is unreliable, so operations on such values are meaningless. However, tracking PC, LR, and SP is enough to unwind the stack, so these three values are what we follow during pseudo-execution. We can therefore restrict pseudo-execution to the instructions that affect these three registers; executing only those is efficient and reasonable, even though it is not 100% accurate.
In ARM Thumb mode, we can consider handling the following instructions:
unsigned short instr;
(instr & 0xfc00) == 0x4400 : Hi register operations / branch exchange (ADD, CMP, MOV on r8-r15, and BX).
(instr & 0xff00) == 0xb000 : SP adjustment (ADD/SUB SP, #imm).
(instr & 0xf600) == 0xb400 : PUSH and POP.
(instr & 0xf800) == 0xe000 : B label (unconditional branch).
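The instruction handling listed above can be sketched in C. This is a minimal, hypothetical pseudo-executor for 16-bit Thumb (Thumb-1) code only; `target_view`, `regs_t` and `unwind_frame` are illustrative names. It trusts only registers that come from the register dump or were popped from the stack, treats POP {..., PC}, BX Rs and MOV PC, Rs as the epilogue, and reads code and stack only through bounds-checked helpers. Conditional branches, 32-bit Thumb-2 instructions, and values written by PUSH are not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the target: code image and dumped stack. */
typedef struct {
    const uint16_t *code;  uint32_t code_base;  size_t code_len;  /* halfwords */
    const uint32_t *stack; uint32_t stack_base; size_t stack_len; /* words */
} target_view;

enum { SP = 13, LR = 14, PC = 15 };

typedef struct {
    uint32_t r[16];
    uint16_t known;  /* bit n set: r[n] is trusted (from the register dump or the stack) */
} regs_t;

/* Bounds-checked reads: an invalid pseudo PC or SP is rejected, never dereferenced. */
static bool fetch16(const target_view *t, uint32_t a, uint16_t *out) {
    if (a < t->code_base || (a & 1u) || (a - t->code_base) / 2 >= t->code_len) return false;
    *out = t->code[(a - t->code_base) / 2];
    return true;
}

static bool read32(const target_view *t, uint32_t a, uint32_t *out) {
    if (a < t->stack_base || (a & 3u) || (a - t->stack_base) / 4 >= t->stack_len) return false;
    *out = t->stack[(a - t->stack_base) / 4];
    return true;
}

/* Pseudo-execute from s->r[PC] until the function returns.
 * On success, s->r[PC] is the caller's resume address and s->r[SP] its SP. */
bool unwind_frame(const target_view *t, regs_t *s, int max_steps) {
    for (int step = 0; step < max_steps; step++) {
        uint16_t in;
        if (!fetch16(t, s->r[PC], &in)) return false;
        uint32_t next = s->r[PC] + 2;

        if ((in & 0xff00) == 0xb000) {                    /* ADD/SUB SP, #imm7*4 */
            uint32_t off = (uint32_t)(in & 0x7fu) * 4;
            s->r[SP] = (in & 0x80u) ? s->r[SP] - off : s->r[SP] + off;
        } else if ((in & 0xf600) == 0xb400) {             /* PUSH / POP */
            bool pop = (in & 0x800u) != 0;
            /* Bit 8 adds LR to a PUSH or PC to a POP. */
            uint32_t mask = (in & 0xffu) | ((in & 0x100u) ? 1u << (pop ? PC : LR) : 0u);
            for (unsigned reg = 0; reg < 16; reg++) {
                if (!(mask & (1u << reg))) continue;
                if (pop) {                                /* stack values are trusted */
                    if (!read32(t, s->r[SP], &s->r[reg])) return false;
                    s->known |= (uint16_t)(1u << reg);
                    s->r[SP] += 4;
                } else {
                    s->r[SP] -= 4;                        /* PUSH: only SP is tracked */
                }
            }
            if (pop && (mask & (1u << PC))) { s->r[PC] &= ~1u; return true; }
        } else if ((in & 0xfc00) == 0x4400) {             /* hi-register ops / BX */
            unsigned op = (in >> 8) & 3u;
            unsigned rs = (in >> 3) & 0xfu;               /* H2:Rs */
            unsigned rd = (in & 7u) | ((in >> 4) & 8u);   /* H1:Rd */
            bool ok = ((s->known >> rs) & 1u) != 0;
            uint32_t val = (rs == PC) ? s->r[PC] + 4 : s->r[rs];
            if (op == 3 || (op == 2 && rd == PC)) {       /* BX Rs / MOV PC, Rs: return */
                if (!ok) return false;
                s->r[PC] = val & ~1u;
                return true;
            }
            if (op == 2) {                                /* MOV Rd, Rs */
                if (rd == SP && !ok) return false;
                s->r[rd] = val;
                s->known = (uint16_t)(ok ? s->known | (1u << rd) : s->known & ~(1u << rd));
            } else if (op == 0) {                         /* ADD Rd, Rs */
                if (rd == SP || rd == PC) return false;   /* not followed in this sketch */
                s->known &= (uint16_t)~(1u << rd);
            }                                             /* op 1 (CMP): no effect */
        } else if ((in & 0xf800) == 0xe000) {             /* B label */
            int32_t off = (int32_t)((in & 0x7ffu) ^ 0x400u) - 0x400; /* sign-extend 11 bits */
            next = s->r[PC] + 4 + (uint32_t)(off * 2);
        } else if ((in & 0xf000) == 0xf000) {             /* BL halves: skip the call */
            s->known &= (uint16_t)~(0x100fu | (1u << LR)); /* r0-r3, r12, LR clobbered */
        } else {                                          /* anything else: assume it */
            s->known &= (uint16_t)~0xffu;                 /* may write a low register */
        }
        s->r[PC] = next;
    }
    return false;                                         /* no epilogue within budget */
}
```

Calling `unwind_frame` again with the returned PC and SP steps through the next frame. `max_steps` bounds the walk, so a wrong guess (for example, a loop whose conditional exit is ignored) cannot spin forever.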
There is one important thing to be careful about. During pseudo-execution, instructions are fetched from the code area, but because the unwinder does not emulate perfectly, we may sometimes hit an unexpected case in which the pseudo PC holds an invalid value. The unwinding routine then tries to read from the address in the pseudo PC, which results in a Data Abort exception on ARM. Yes, this is very dangerous. So we must carefully check whether the pseudo PC value is valid before fetching from it. Alternatively, instead of unwinding in-process, we can just dump the stack and register values and unwind from this data outside the process. Dumping the stack and registers is not dangerous at all.
Unwinding by pseudo-execution can give quite reliable results, and it works even without debugging information, so it is very powerful. Example code for unwinding the stack this way on ARM can be found here.
3. By filtering values on the stack. (At runtime, without risk.)
The critical disadvantage of method 2 is the possibility of a crash during pseudo-execution.
Some people may need a safer method, and that is stack filtering. It is very safe but less accurate; still, it gives enough information to infer the call stack.
The basic concept is as follows.
In a function prologue, LR is usually saved on the stack, so most of the LR values that reveal the call stack are somewhere in the stack. In other words, a stack dump already contains the call stack. The issue is how to extract the valid information from it. A simple and efficient way is to filter the stack dump, keeping only the values that fall within the code area. The result is a superset of the real call stack.
Even though the result is a superset, developers can usually tell from experience which entries are spurious, so this is still useful. Above all, it is very safe, because the only operation that can go wrong is reading stack values, and the software can verify whether SP is valid by comparing it with the stack bounds recorded in the TCB (task control block). If SP is not valid, we skip the scan to avoid crashing. As with pseudo-execution, we can also do this outside the process using dumped data.
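A minimal sketch of the filter in C, assuming a full-descending stack, word-aligned addresses, and stack bounds taken from the task's TCB; `code_region` and `filter_stack` are hypothetical names. At runtime `mem` would be the live stack memory, and outside the process it would be the dumped copy. SP is validated against the bounds before any word is read.

```c
#include <stddef.h>
#include <stdint.h>

/* One code area [base, limit), e.g. from linker symbols, the ELF file or a map file. */
typedef struct { uint32_t base, limit; } code_region;

static int in_code(uint32_t addr, const code_region *r, size_t nr) {
    for (size_t i = 0; i < nr; i++)
        if (addr >= r[i].base && addr < r[i].limit) return 1;
    return 0;
}

/* Collect stack words that look like code addresses, from SP up to the stack top.
 * mem holds the stack words starting at stack_lo; [stack_lo, stack_hi) are the
 * task's stack bounds (e.g. from its TCB). The result is a superset of the
 * real call stack. */
size_t filter_stack(const uint32_t *mem, uint32_t stack_lo, uint32_t stack_hi,
                    uint32_t sp, const code_region *r, size_t nr,
                    uint32_t *out, size_t max)
{
    /* Validate SP against the TCB bounds before reading anything. */
    if (((stack_lo | stack_hi | sp) & 3u) != 0 || sp < stack_lo || sp > stack_hi)
        return 0;
    size_t n = 0;
    for (uint32_t a = sp; a < stack_hi && n < max; a += 4) {
        uint32_t v = mem[(a - stack_lo) / 4] & ~1u;  /* drop the Thumb bit of a saved LR */
        if (in_code(v, r, nr))
            out[n++] = v;
    }
    return n;
}
```

Clearing bit 0 lets saved Thumb LR values match the same code ranges as ARM addresses. Words that merely happen to point into the code area, such as addresses of literal data, also pass the filter, which is why the result is a superset.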
Then, how can we tell whether a value lies in the code range?
Inside the process (unwinding at runtime), we can use linker-generated symbols that mark the boundaries of memory regions. With ADS or RVCT, we can refer to the scatter-load file.
Outside the process, we can use the information in the ELF file, or a memory map file.
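For the outside-process case, here is a minimal sketch in C that derives code ranges from a little-endian ELF32 file by collecting executable PT_LOAD segments (program headers with the PF_X flag). `elf32_code_regions` and `code_region` are hypothetical names; section headers, big-endian files and ELF64 are not handled. Using segments rather than sections is a simplification: an executable segment may also hold read-only data, which only makes the filtered superset slightly larger.

```c
#include <stddef.h>
#include <stdint.h>

/* One code area [base, limit) in the target's address space. */
typedef struct { uint32_t base, limit; } code_region;

static uint32_t rd32le(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd16le(const uint8_t *p) {
    return (uint16_t)(p[0] | p[1] << 8);
}

/* Return the executable PT_LOAD segments of a little-endian ELF32 file as code regions. */
size_t elf32_code_regions(const uint8_t *elf, size_t size, code_region *out, size_t max)
{
    if (size < 52 || elf[0] != 0x7f || elf[1] != 'E' || elf[2] != 'L' || elf[3] != 'F'
        || elf[4] != 1 /* ELFCLASS32 */ || elf[5] != 1 /* ELFDATA2LSB */)
        return 0;
    uint32_t phoff = rd32le(elf + 28);                        /* e_phoff */
    uint16_t phentsize = rd16le(elf + 42);                    /* e_phentsize */
    uint16_t phnum = rd16le(elf + 44);                        /* e_phnum */
    if (phentsize < 32) return 0;
    size_t n = 0;
    for (uint32_t i = 0; i < phnum && n < max; i++) {
        uint64_t off = (uint64_t)phoff + (uint64_t)i * phentsize;
        if (off + 32 > size) break;                           /* truncated file */
        const uint8_t *ph = elf + off;
        uint32_t type = rd32le(ph), vaddr = rd32le(ph + 8);   /* p_type, p_vaddr */
        uint32_t memsz = rd32le(ph + 20), flags = rd32le(ph + 24); /* p_memsz, p_flags */
        if (type == 1 /* PT_LOAD */ && (flags & 1u) /* PF_X */ && memsz != 0) {
            out[n].base = vaddr;                              /* execution address */
            out[n].limit = vaddr + memsz;
            n++;
        }
    }
    return n;
}
```

The regions returned here can feed a stack filter directly. p_vaddr is used because return addresses on the stack are execution addresses, which may differ from load addresses in a scatter-loaded image.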
4. By analyzing debugging information (e.g., DWARF) together with the dumped stack. (Impossible at runtime.)
This method gives the best information. How much information we get depends entirely on how much was dumped: if we dumped the entire RAM, we know everything about that moment, just as if the program were stopped at a breakpoint in an interactive debugger. However, it is not easy to implement.
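A minimal sketch of one unwinding step driven by DWARF call frame information, assuming the CFI row for the current PC has already been found and decoded (parsing `.debug_frame` is omitted). `cfi_row` and `apply_cfi_row` are hypothetical names, and only the CFA rule and the saved return address are modeled. DWARF defines the CFA as the caller's SP at the call site, so it becomes the new SP.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-decoded row of a DWARF call frame information table. */
typedef struct {
    unsigned cfa_reg;    /* CFA = regs[cfa_reg] + cfa_offset (13 = SP on ARM) */
    int32_t  cfa_offset;
    int32_t  ra_offset;  /* return address saved at CFA + ra_offset ("offset(N)" rule) */
} cfi_row;

/* Step one frame using a CFI row and a dumped stack [mem_base, mem_base + 4*nwords). */
bool apply_cfi_row(const cfi_row *row, uint32_t regs[16],
                   const uint32_t *mem, uint32_t mem_base, size_t nwords)
{
    if (row->cfa_reg > 15) return false;
    uint32_t cfa = regs[row->cfa_reg] + (uint32_t)row->cfa_offset;
    uint32_t slot = cfa + (uint32_t)row->ra_offset;
    /* Read only from the dump; a slot outside it means not enough was dumped. */
    if (slot < mem_base || (slot & 3u) || (slot - mem_base) / 4 >= nwords) return false;
    regs[15] = mem[(slot - mem_base) / 4] & ~1u;  /* caller's resume PC */
    regs[13] = cfa;                                /* caller's SP is the CFA */
    return true;
}
```

For a function whose prologue is `push {r4, lr}`, the row would typically be CFA = SP + 8 with LR saved at CFA - 4. When the slot falls outside the dumped range, the step fails, reflecting that the result depends on how much memory was dumped.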
One thing to keep in mind: for assembly code, the toolchain usually generates less debugging information than for C, and sometimes this prevents us from walking the call stack. So we had better combine this method with method 2 (pseudo-execution) to handle such exceptional cases. That is, if debugging information is available, use it; if not, unwind by pseudo-execution.
Here is a more detailed example.