With the release of Windows 11, Microsoft is introducing another iteration of control flow integrity mitigation called “eXtended Flow Guard” or XFG. In short, it further restricts targets of indirect calls to not only valid function entry points, but to a subset of functions that have a particular signature consisting of return value type, number and types of parameters and other function properties.
Surely, this added metadata can somehow aid us in our reverse engineering process. To see how, we’ll need to understand the implementation details.
NOTE: The following article first appeared in print in issue 0x22 of the International Journal of “Proof Of Concept || GTFO”. It is accompanied by a Binary Ninja plugin.
Control flow integrity protections, with its various implementations, have been the latest iteration of compiler mitigations for memory corruption exploits. They hope to make code reuse attacks more difficult or impossible. Implementation details vary, but all boil down to restricting possible valid targets of indirect calls. LLVM’s is called “Control Flow Integrity,” Grsecurity has “Reuse Attack Protector” and Microsoft’s is called “Control Flow Guard” (CFG).
The core idea behind Microsoft’s CFG is ensuring that function pointers can only point to valid function entry points before being used to perform a function call. The compiler inserts checks that, during runtime, inspect every indirect call instruction and terminate the process if the target isn’t a valid and known function start.
Putting aside the completeness or effectiveness of this mitigation, let’s ask whether we can glean some extra information about the code itself by the presence of these checks. As Deroko points out in Control Flow Guard Instrumentation,1 CFG mechanisms can serve as a way to hook all indirect calls in a binary without specifically looking for them in advance. They can also precisely identify function entry points, which is not always a trivial task.
Extended Flow Guard is introduced as a compiler extension that can be
enabled via /guard:xfg switch that is available in MS’s C and C++
compilers since at least the 19.27.29112 version of Visual Studio 2019.
It hasn’t seen full support or much public use until release of Windows
11. Consider an example:
int test( ){
return 0;
}
int (*cfgTest[1])() = {test};
int main(){
cfgTest[0]();
}
This code has a simple function pointer array cfgTest and makes a call
to test using that function pointer. If compiled with
cl /Zi /guard:xfg simple.c its assembly looks a little odd.
sub rsp, 38h
mov eax, 8
imul rax, 0
lea rcx, cfgTest
mov rax, [rcx+rax]
mov [rsp+38h+var_18], rax
mov r10, 0D30527475E523070h
mov rax, [rsp+38h+var_18]
call cs:__guard_xfg_dispatch_icall_fptr
xor eax, eax
add rsp, 38h
retn
This is some peculiar code. There is no indirect call to function
test, rather there’s a call to __guard_xfg_dispatch_icall_fptr with
certain arguments. The function pointer is actually saved in rax and
an odd-looking constant is moved into r10 before
__guard_xfg_dispatch_icall_fptr is called. This odd-looking constant
is what we will call an XFG hash. Interestingly, if we take a look at
test function’s prologue on page , we’ll see (almost) the same data.
.text:0000000140001008 dq 0D30527475E523071h
.text:0000000140001010
.text:0000000140001010 ; ============= S U B R O U T I N E =================================
.text:0000000140001010
.text:0000000140001010
.text:0000000140001010 ; int test(...)
.text:0000000140001010 test proc near ; DATA XREF: .rdata:__guard_fids_table
.text:0000000140001010 ; .data:cfgTest
.text:0000000140001010 xor eax, eax
.text:0000000140001012 retn
.text:0000000140001012 test endp
Long story short, before invoking the target function,
__guard_xfg_dispatch_icall_fptr will check that the hash in r10
matches the hash located right before the function. If they don’t match,
process is terminated.2
This ensures that only legal target functions can be executed at this particular indirect function call. The next obvious question is: how is this function hash derived? That brings us to the core idea behind XFG.
If we think about it, no matter how an indirect call instruction happens to be generated by the compiler, several things are true for all the possible, valid, target functions in a valid program. All possible target functions must have the same number of arguments, the same calling convention, same argument types, same return value type and so on. Even if the compiler doesn’t know of all possible target functions in advance, it must know all of these facts about those targets. It can, then, generate a unique representation of those facts when it encounters an indirect function call. On the other hand, for every function that could be a possible target for indirect call, the same unique representation can be calculated and those two can be compared during runtime.
This unique representation of function prototype information is what constitutes an XFG hash.
Francisco Falcon over at Quarkslab has already done the hard work of
reverse engineering most of XFG internals. Their extended writeup
provides a number of examples.3 XFG hash generation happens in the
cl.exe compiler’s frontend c1.dll and revolves around gathering
function prototype information and using the SHA256 hashing algorithm on
it while following certain rules. A list of function properties that
figure into the XFG hash is (as far as C code is concerned at least) as
follows:
number of arguments
the types of individual arguments
type of return value
whether the function is variadic or not
the calling convention
When preparing to calculate the hash, each of these is represented in a specific way. Some are simple constants, while others have more structure and are often recursively defined. For example, the number of arguments is just represented as a 32-bit integer, the calling convention appears to be a 16-bit constant, and variadic is one byte boolean. Return value and argument types, on the other hand, are more complicated.
Those consist of values specifying type qualifiers (const,
volatile), type groups (primitives, pointers, structs/unions/enums),
and values according to the type group. Calculating values for primitive
types are the simplest and are just a table lookup:
"void" :0xe,
"char" :0x1,
"signed char" :0x1,
"unsigned char" :0x1,
"__int8" :0x1,
"char8_t" :0x1,
"__int16" :0x6,
"short int" :0x6,
"unsigned short int" :0x86,
"float" :0x11,
"int" :0x7,
"__int32" :0x7,
"unsigned int" :0x87,
"long int" :0x10,
"unsigned long int" :0x8a,
"double" :0x12,
"__int64" :0x8,
"long double" :0x12,
"long long int" :0x8,
"unsigned long long int" :0x88,
"unsigned long long" :0x88,
Notice that there are several distinct primitive types that have the same value. Structs, unions, and enums are treated the same, and their actual (verbatim text) names are included as part of a hash calculation.
Pointers of any kind are the most complicated, as their value is the hash of the type they point to, requiring recursive evaluation.
This can look a bit confusing and — although it’s covered in great
detail in the referenced Quarkslab article — we’ll illustrate the
process with the simplest example. We’ll add a void pointer as an
argument to test from before:
int test(void *arg);
First, there’s only a single argument to this function, so we will
append “\x01\x00\x00\x00” to our data to be hashed (data0). Second,
we need to consider function arguments, calculate their hashes, and
append them to data to be hashed. There is only one argument and it’s a
pointer without qualifiers. Starting a new hash (data1), we append
“\x00” for qualifiers, “\x03” for type group but then we need to
consider the type of pointer and calculate that hash separately.
Starting yet another hash calculation (data2), we append “\x00” for
qualifiers, “\x01” for type group and finally “\x0e” for primitive
type. Calculate the SHA256 of data2 and append its first 8 bytes to
data1 that completes necessary data for calculating first argument
hash. Hash data1 and append the first 8 bytes to data0. That
concludes the argument part of the hash. Next is whether the function is
variadic, so we append “\x00” and what the calling convention is,
which defaults to just “\x01”. The last segment is the return value
type which is an integer primitive, so it’s simply “\x00” for
qualifiers, “\x01” for type group and finally 0x7 for a primitive
type. The hash of that is appended to data0.
Putting that together gives us the following, with all SHA256 results truncated to the first eight bytes.
sha256("\x01\x00\x00\x00"
+sha256("\x00\x03"+sha256("\x00\x01\x0e"))
+"\x00"+"\x01"+sha256("\x00\x01\x07"))
After some final transformations, the result of the operation is the
“719a5e10e1b2” value that appears before the test function in the
binary.
An implementation of this algorithm, in Python, that parses a given C function prototype and generates its corresponding hash can be found as an attachment.4
Now that we know how XFG works, we can consider how it can be of use as a reverse engineering aid.
The first, and most obvious idea is that it can reduce the uncertainty
of analyzing indirect calls. Since all indirect calls in an
XFG-protected binary will inevitably be dispatched through
__guard_xfg_dispatch_icall_fptr that must match callsite’s hash and
target function’s hash, it should be possible to enumerate all possible
targets completely statically (assuming all possible linked code is
known/available for analysis).
Let’s illustrate this with an example. Throughout the rest of the
article, we’ll use ntdll.dll binary from Windows 11 for illustrations
and testing. If we go to function LdrQueryProcessModuleInformationEx
and take a look at the following piece of assembly:
18000174e 488d04bf lea rax, [rdi+rdi*4]
180001752 49ba7048da56963e... mov r10, 0x85f13e9656da4870
18000175c 498b44c118 mov rax, qword [r9+rax*8+0x18]
180001761 ff15a9181900 call qword [rel __guard_xfg_dispatch_icall_fptr] {j_sub_1800aa130}
180001767 4c8d0df2b71200 lea r9, [rel data_18012cf60]
While we don’t know without debugging what possible target this XFG
dispatch call might have, we can see that its hash must be
0x85f13e9656da (the $1$ is added at the end of the supplied hash by
dispatcher). If we search the binary for functions that have this XFG
hash, we’ll find many results: LdrQueryModuleInfoLocalLoaderUnlock,
LdrShutdownThread, LdrShutdownProcess, RtlDetectHeapLeaks,
TpTrimPools, RtlCleanUpTEBLangLists,
RtlFreeThreadActivationContextStack,
LdrProcessInitializationComplete, RtlFlushHeaps,
RtlReleasePebLock, RtlAcquirePebLock,
LdrFastFailInLoaderCallout, …
Obviously, from the function names, not all of these make sense as
possible targets for this indirect call because of their differing
semantics, but there’s a good chance that all with Ldr prefix are
actual possible targets.
Why are there so many hash hits that are unlikely to be real targets?
It’s probable that the target function prototype in this case is very
simple, and matches many other functions. In fact, hash
0x85f13e9656da represents the simplest possible case of ‘void
fname()’. As another example, the TppCallbackEpilog function has the
following indirect call:
18001766e 488b8eb8000000 mov rcx, qword [rsi+0xb8]
180017675 4c89aeb8000000 mov qword [rsi+0xb8], r13 {0x0}
18001767c 488b4108 mov rax, qword [rcx+0x8]
180017680 49ba70125178f527...mov r10, 0xa6d127f578511270
18001768a 488b4008 mov rax, qword [rax+0x8]
18001768e ff157cb91700 call qword [rel __guard_xfg_dispatch_icall_fptr] {j_sub_1800aa130}
Looking up the target hash, 0xa6d127f5, in the binary yields:
TppSimplepFree, TppWorkpFree, TppAlpcpCallbackEpilog,
TppJobpCallbackEpilog, TppFreeWait, TppTimerpFree, TppIopFree,
TppAlpcpFree, TppJobpFree, TppWorkCancelPendingCallbacks,
TppIopCancelPendingCallbacks.
All of these look like possible real targets given their context.
So while not completely precise, this simple static analysis that relies on XFG hashes definitely sheds some light on indirect calls that might otherwise remain completely unresolved.
Attached is a Binary Ninja plugin that annotates indirect calls with information gained by XFG analysis.5
Another, more involved, idea stems from the fact that XFG hashes aren’t random and actually encode function prototypes. Surely, there would be a way to recover at least some of that information and make use of it.
While it is not possible to reverse the hash back to function prototype
directly, it is perfectly feasible to precompute a lookup table for all
possible function prototypes (up to certain number of arguments). If we
ignore structs, unions and enums for a second, there are only a fairly
small number of primitive types. In fact, if we remove the duplicates,
there’s a total of only 12 primitive types (with distinct values as far
as XFG generation is concerned). Adding in type qualifiers (const,
volatile) and pointers, a bit of simple combinatorics tells us that
total number of all possible function prototypes is roughly
$(12 * 3)^{num_args}+1$ .
This gets big very fast as we increase the number of arguments, but the whole list is precomputed in minutes for functions up to three arguments.
import sys
import itertools
from jinja2 import Template
types = ["void", "char", "short int",
"unsigned short int", "float",
"int", "unsigned int", "long int",
"unsigned long int", "double",
"long long int", "unsigned long long"]
# add all types as pointers
types += [x + " *" for x in types]
# and as consts
types += ["const " + x for x in types]
# and as volatiles
types += ["volatile " + x for x in types]
j2_template = Template("""
{{ret_type}} fname( {%- for param_type in param_types -%} {{param_type}} arg{{loop.index}}{{ "," if not loop.last }} {%- endfor -%});
""")
max_func_params = 3
f = open(sys.argv[1], "w")
i = 0
for ret_type in types:
for pn in range(4, max_func_params+1):
for c in itertools.product(types,
repeat=pn):
f.write(j2_template.render({"ret_type": ret_type, "param_types": c}))
i+=1
f.close()
This code uses a jinja2 template to generate an exhaustive list of all possible function prototypes starting with given primitive types. These generated prototypes can then be fed into the hash generation algorithm to compile a lookup table.
So, does this work? We’ll test this on ntdll.dll again. This
particular version of the DLL has a total of 1564 functions that have an
XFG hash associated with them. Out of those 1564, there are a total of
995 unique XFG hashes. After lookups, this simple matching has
identified function prototypes for 131 unique hashes, corresponding to a
total of 294 functions!
By simply precomputing all possible function prototypes up to three
parameters (using nothing target specific, only primitive types) we were
able to recover precise function prototypes for about 13% of unique
hashes in ntdll.dll.
Figure has some examples:
char RtlGetSecurityDescriptorRMControl(void *arg1, char *arg2);
unsigned long int RtlNumberOfSetBitsUlongPtr(unsigned long long int arg1);
char RtlEqualWnfChangeStamps(unsigned long int arg1, unsigned long int arg2);
unsigned long int RtlSetProxiedProcessId(unsigned long int arg1);
void RtlWnfDllUnloadCallback(void *arg1);
void *memchr(const void *arg1, int arg2, unsigned long long arg3);
char *strchr(const char *arg1, int arg2);
unsigned long long strcspn(const char *arg1, const char *arg2);
unsigned long long strnlen(const char *arg1, unsigned long long arg2);
char *strpbrk(const char *arg1, const char *arg2);
char *strrchr(const char *arg1, int arg2);
unsigned long long strspn(const char *arg1, const char *arg2);
char *strstr(const char *arg1, const char *arg2);
int tolower(int arg1);
int WinSqmCommonDatapointSetDWORD64(unsigned long int arg1, unsigned long long arg2, unsigned long int arg3);
int WinSqmCommonDatapointSetString(unsigned long int arg1, const unsigned short int *arg2, unsigned long int arg3);
int WinSqmGetInstrumentationProperty(const unsigned short int *arg1, const unsigned short int *arg2, unsigned short int *arg3, unsigned long int *arg4);
int WinSqmIsOptedInEx(unsigned long int arg1);
void AlpcGetCompletionListLastMessageInformation(void *arg1, unsigned long int *arg2, unsigned long int *arg3);
unsigned long int DbgPrompt(const char *arg1, char *arg2, unsigned long int arg3);
char RtlQueryProcessPlaceholderCompatibilityMode();
char RtlSetProcessPlaceholderCompatibilityMode(char arg1);
char RtlIsNonEmptyDirectoryReparsePointAllowed(unsigned long int arg1);
char RtlIsZeroMemory(void *arg1, unsigned long long arg2);
unsigned short int RtlLogStackBackTrace();
void *RtlLogStackTrace(unsigned long int arg1);
void RtlReleaseStackTrace(void *arg1);
The proof-of-concept works, but there are a couple of reasons why we
didn’t get a higher hit rate. First and most obvious is that many
functions simply have more than three arguments, but even bigger factor
is the fact that ntdll.dll code heavily relies on use of structures,
enums, and structure pointers. Since hashes for struct, union, and enum
types include their names directly, straight up brute forcing isn’t
practical, but seeding certain (domain specific) names would greatly
increase the hit rate. XFG hash calculation implementation supports
structs in function prototypes, and since structs, enums, and unions are
treated the same, all we need to do to add struct names is to expand the
list of primitive types. Adding struct in_addr to list of primitive
types leads to following result:
7139d252a1b76de8 char *func( const struct in_addr *arg1, char *s)
This calculated hash matches the XFG hash for RtlIpv4AddressToStringA. By adding target specific, commonly used, structs to prototype generation we can greatly increase the number of found hashes at the expense of a larger lookup table. Since structures and other type information are sometimes publicly available even if function prototypes are not, this allows for very precise function prototype recovery.
How do we know that these results are actually correct? Let’s take another look at an example where we do know the function prototype. Function ‘RtlSetUserValueHeap’ has four arguments. Binary Ninja guesses its prototype to be:
void* const* RtlSetUserValueHeap(int64_t arg1, int32_t arg2, int64_t arg3, int64_t arg4);
Similarly, IDA guesses:
char __fastcall RtlSetUserValueHeap(__int64 a1, unsigned int a2, __int64 a3, __int64 a4)
This function’s XFG hash is 0xc76c585af171 and a lookup reveals the
following function prototype:
char RtLSetUserValueHeap(void *arg1,unsigned long int arg2,void *arg3, void *arg4);
Notice how both Binary Ninja and IDA cannot know that some of the arguments are pointers. This simple fact adds a lot of information that greatly aids further function analysis and decompilation. And what about correctness? While source for ‘RtLSetUserValueHeap’ isn’t available, it is reimplemented in ReactOS where its function prototype is:
BOOLEAN
NTAPI
RtlSetUserValueHeap(
_In_ PVOID HeapHandle,
_In_ ULONG Flags,
_In_ PVOID BaseAddress,
_In_ PVOID UserValue
);
While the prototype gathered from XFG analysis lacks some extra annotations, the types themselves match precisely!
Even though mitigations like XFG pose a real challenge when it comes to exploitation, it sometimes pays off to take a step back and consider the possible side effects that can be handy in other ways. We’ve shown that a very simple lookup table can recover a treasure trove of information that can be helpful when reverse engineering an XFG-protected binary. As XFG adoption spreads to code other than Microsoft’s, this can definitely lead to some interesting discoveries.
unzip pocorgtfo22.pdf cfghook.zip ↩︎
A great in-depth description from Connor McGar is available as Exploit Development: Between a Rock and a (Xtended Flow) Guard Place: Examining XFG. ↩︎
See How the MSVC Compiler Generates XFG Function Prototype Hashes by Francisco Falcon. ↩︎
unzip -p pocorgtfo22.pdf xfg-scripts-args.tgz tar -xzvf- gen_hash_from_ast.py ↩︎
unzip -p pocorgtfo22.pdf xfg-scripts-args.tgz tar -xzvf- xfg_analyzer.py ↩︎