- Development Basics
- Adding a Command
- Adding a Configuration Option
- Porting public tools
- Random developer notes
- Annotations
After installing pwndbg by running `setup.sh`, you additionally need to run `./setup-dev.sh` to install the necessary development dependencies.
If you would like to use Docker, you can create a Docker image with everything already installed for you. To build and run the container, run the following commands:
```bash
docker build -t pwndbg .
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -v `pwd`:/pwndbg pwndbg bash
```
If you'd like to use `docker compose`, you can run:
```bash
docker compose run -i main
```
There is a development shell defined in the flake that should install all of the development requirements. To enter the environment, run `nix develop`, or enter it automatically using `direnv`.
When testing changes, run `nix build .#pwndbg-dev` and use the copy of the files in the `results/` folder.
It's highly recommended that you write a new test or update an existing test whenever adding new functionality to pwndbg.
We have four types of tests: `gdb-tests`, `qemu-tests`, `unit-tests`, and Linux kernel tests, which are all located in subdirectories of `tests`.
`gdb-tests` refers to our x86 tests, which are located in `tests/gdb-tests`.
To run these tests, run `./tests.sh`. You can filter the tests to run by providing an argument to the script, such as `./tests.sh heap`, which will only run tests that contain "heap" in the name. You can also drop into the PDB debugger when a test fails with `./tests.sh --pdb`.
To invoke cross-architecture tests, use `./qemu-tests.sh`, and to run unit tests, use `./unit-tests.sh`.
To run the tests in the same environment as the testing CI/CD, you can use the following Docker command.
```bash
# General test suite
docker compose run --rm --build ubuntu24.04-mount ./tests.sh
# Cross-architecture tests
docker compose run --rm --build ubuntu24.04-mount ./qemu-tests.sh
```
This comes in handy particularly for cross-architecture tests, because the Docker environment has all the cross-compilers installed. The active pwndbg directory is mounted, preventing the need for a full rebuild whenever you update the codebase.
Remove the `-mount` suffix if you want the tests to run from a clean slate (no files are mounted, meaning all binaries are recompiled each time).
Each test is a Python function that runs inside of an isolated GDB session. Using a pytest fixture at the beginning of each test, GDB will attach to a binary or connect to a QEMU instance. Each test runs some commands and uses Python `assert` statements to verify correctness. We can access pwndbg library code like `pwndbg.aglib.regs.rsp` as well as execute GDB commands with `gdb.execute()`.
We can take a look at `tests/gdb-tests/tests/test_symbol.py` for an example of a simple test. Looking at a simplified version of the top-level code, we have this:
```python
import gdb
import pwndbg
import tests

BINARY = tests.binaries.get("symbol_1600_and_752.out")
```
Since these tests run inside GDB, we can import the `gdb` Python library. We also import the `tests` module, which makes it easy to get the path to the test binaries located in `tests/gdb-tests/tests/binaries`. You should be able to reuse the binaries in this folder for most tests, but if not, feel free to add a new one.
Here's a small snippet of the actual test:
```python
def test_hexdump(start_binary):
    start_binary(BINARY)
    pwndbg.config.hexdump_group_width.value = -1
    gdb.execute("set hexdump-byte-separator")
    stack_addr = pwndbg.aglib.regs.rsp - 0x100
```
`pytest` will run any function that starts with `test_` as a new test, so there is no need to register your new test anywhere. The `start_binary` argument is a function that will run the binary you give it, and it will set some common options before starting the binary. Using `start_binary` is recommended if you don't need any additional customization of GDB settings before starting the binary, but if you do, it's fine not to use it.
Our `gdb-tests` run in x86. To debug other architectures, we use QEMU for emulation and attach to its debug port. These tests are located in `tests/qemu-tests/tests/user`. Test creation is identical to our x86 tests - create a Python function with a pytest fixture name as the parameter (it matches based on the name), and call the argument to start debugging a binary. The `qemu_assembly_run` fixture takes in a Python string of assembly code, compiles it for the appropriate architecture, and runs it - no need to create an external file or edit a Makefile.
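As an illustration of that pattern, a cross-architecture test might look roughly like this. Note that the assembly snippet, the architecture argument, and the exact `qemu_assembly_run` signature are assumptions for illustration - check the existing tests in `tests/qemu-tests/tests/user` for the real usage:

```python
import pwndbg.aglib.regs

# Hypothetical AArch64 snippet; qemu_assembly_run compiles and runs it.
AARCH64_ASSEMBLY = """
mov x0, #42
nop
"""


def test_aarch64_mov(qemu_assembly_run):
    qemu_assembly_run(AARCH64_ASSEMBLY, "aarch64")
    # Once attached to QEMU's debug port, pwndbg library code works as usual.
    assert pwndbg.aglib.regs.pc is not None
```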
We use `qemu-system` for full system-level emulation for our Linux kernel tests. These are located in `tests/qemu-tests/tests/system`. The tests run against a variety of kernel configurations and architectures.
You will need to build a nix-compatible `gdbinit.py` file, which you can do with `nix build .#pwndbg-dev`. Then simply run the tests by adding the `--nix` flag:
```bash
./tests.sh --nix [filter]
```
The `lint.sh` script runs `isort`, `ruff`, `shfmt`, and `vermin`. `isort` and `ruff` are able to automatically fix any issues they detect, and you can enable this by running `./lint.sh -f`. You can find the configuration files for these tools in `pyproject.toml` or by checking the arguments passed inside `lint.sh`.
When submitting a PR, the CI job defined in `.github/workflows/lint.yml` will verify that running `./lint.sh` succeeds; otherwise the job will fail and we won't be able to merge your PR.
If you would like `lint.sh` to be run automatically before every push, you can optionally set the contents of `.git/hooks/pre-push` to the following:
```bash
#!/usr/bin/env bash
./lint.sh || exit 1
```
Our goal is to fully support all Ubuntu LTS releases that have not reached end-of-life, with support for other platforms on a best-effort basis. Currently that means all code should work on Ubuntu 22.04 and 24.04 with GDB 12.1 and later. This means that the minimum supported Python version is 3.10, and we cannot use any newer Python features unless those features are backported to this minimum version.
Note that while all code should run without errors on these supported LTS versions, it's fine if older versions don't support all of the features of newer versions, as long as this is handled correctly and this information is shown to the user. For example, we may make use of some GDB APIs in newer versions that we aren't able to provide alternative implementations for in older versions, and so in these cases we should inform the user that the functionality can't be provided due to the version of GDB.
The `lint.sh` script described in the previous section runs `vermin` to ensure that our code does not use any features that aren't supported on Python 3.10.
Create a new Python file in `pwndbg/commands/my_command.py`, replacing `my_command` with a reasonable name for the command. The most basic command looks like this:
```python
import argparse

import pwndbg.commands

parser = argparse.ArgumentParser(description="Command description.")
parser.add_argument("arg", type=str, help="An example argument.")


@pwndbg.commands.ArgparsedCommand(parser)
def my_command(arg: str) -> None:
    """Print the argument"""
    print(f"Argument is {arg}")
```
In addition, you need to import this file in the `load_commands` function in `pwndbg/commands/__init__.py`. After this, running `my_command foo` in GDB or LLDB will print out "Argument is foo".
```python
import pwndbg

pwndbg.config.add_param("config-name", False, "example configuration option")
```
`pwndbg.config.config_name` will now refer to the value of the configuration option, and it will default to `False` if not set.
TODO: There are many places GDB shows docstrings, and they show up slightly differently in each place; we should give examples of this.
- When using `pwndbg.config.add_param` to add a new config, there are a few things to keep in mind:
    - For the `set_show_doc` parameter, it is best to use a noun phrase like "the value of something" to ensure that the output is grammatically correct.
    - For the `help_docstring` parameter, you can use the output of `help set follow-fork-mode` as a guide for formatting the documentation string if the config is an enum type.
    - For the `param_class` parameter:
        - See the documentation for more information.
        - If you use `gdb.PARAM_ENUM` as `param_class`, you must pass a list of strings to the `enum_sequence` parameter.
TODO: If we want to do something when the user changes a config/theme, we can do it by defining a function and decorating it with `pwndbg.config.Trigger`.
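A sketch of that pattern follows; the decorator name comes from the note above, but whether it takes a list of parameters and what the callback may do are assumptions - check existing usages of `Trigger` in the codebase:

```python
import pwndbg

example = pwndbg.config.add_param("example-opt", True, "example configuration option")


# Hypothetical callback: invoked whenever the parameter's value changes.
@pwndbg.config.Trigger([example])
def on_example_changed() -> None:
    print(f"example-opt is now {example.value}")
```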
If porting a public tool to pwndbg, please make a point of crediting the original author. This can be added to
CREDITS.md noting the original author/inspiration, and linking to the original tool/article. Also please
be sure that the license of the original tool is suitable to porting into pwndbg, such as MIT.
Feel free to update the list below!
- If you want to play with pwndbg functions under GDB, you can always use GDB's `pi`, which launches a Python interpreter, or just `py <some python line>`.
- If you want to do the same in LLDB, you should type `lldb`, followed by `script`, which brings up an interactive Python REPL. Don't forget to `import pwndbg`!
- Do not access debugger-specific functionality - e.g. anything that uses the `gdb`, `lldb`, or `gdblib` modules - from outside the proper module in `pwndbg.dbg`.
- Use `aglib` instead of `gdblib`, as the latter is in the process of being removed. Both modules should have nearly identical interfaces, so doing this should be a matter of typing `pwndbg.aglib.X` instead of `pwndbg.gdblib.X`. Ideally, an issue should be opened if there is any functionality present in `gdblib` that's missing from `aglib`.
- We have our own `pwndbg.config.Parameter` - all of our parameters can be seen using the `config` or `theme` commands.
- The dashboard/display/context we are displaying is done by `pwndbg/commands/context.py`, which is invoked through GDB's and LLDB's prompt hooks. These are defined, respectively, in `pwndbg/gdblib/prompt.py` as `prompt_hook_on_stop`, and in `pwndbg/dbg/lldb/hooks.py` as `prompt_hook`.
- We change GDB settings a bit - this can be seen in `pwndbg/dbg/gdb.py` under `GDB.setup` - there are also imports for all pwndbg submodules.
- Pwndbg has its own event system, and thanks to it we can set up code to be invoked in response to events. The event types and the conditions in which they occur are defined and documented in the `EventType` enum, and functions are registered to be called on events with the `@pwndbg.dbg.event_handler` decorator. Both the enum and the decorator are documented in `pwndbg/dbg/__init__.py`.
- We have a caching mechanism ("memoization") which we use through Python's decorators - those are defined in `pwndbg/lib/cache.py` - just check its usages.
- To block a function before the first prompt is displayed, use the `pwndbg.decorators.only_after_first_prompt` decorator.
- Memory accesses should be done through `pwndbg/aglib/memory.py` functions.
- Process properties can be retrieved thanks to `pwndbg/aglib/proc.py` - e.g. using `pwndbg.aglib.proc.pid` will give us the current process pid.
- We have a wrapper for handling exceptions that are thrown by commands - defined in `pwndbg/exception.py` - and the current approach seems to work fine: by using `set exception-verbose on` we get a stacktrace. If we want to debug stuff we can always do `set exception-debugger on`.
- Some of pwndbg's functionality requires us to have an instance of `pwndbg.dbg.Value` - the problem with that is that there is no way to define our own types in either GDB or LLDB - we have to ask the debugger if it detected a particular type in this particular binary (that sucks). We do that in `pwndbg/aglib/typeinfo.py` and it works most of the time. The known bug is that it might not work properly for Golang binaries compiled with debugging symbols.
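For instance, registering a function to run whenever the inferior stops might look like the sketch below. The `EventType` member name used here is an assumption - see `pwndbg/dbg/__init__.py` for the real enum members and decorator semantics:

```python
import pwndbg
from pwndbg.dbg import EventType


# Hypothetical handler: invoked every time the inferior stops
# (breakpoint hit, single step, signal, ...).
@pwndbg.dbg.event_handler(EventType.STOP)
def on_stop() -> None:
    print("inferior stopped")
```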
Pwndbg is a tool that supports multiple debuggers, so using debugger-specific functionality outside of `pwndbg.dbg.X` is generally discouraged, with one important caveat that we will get into later. When adding code to Pwndbg, one must be careful with the functionality being used.
Our support for multiple debuggers is primarily achieved through use of the Debugger API, found under `pwndbg/dbg/`, which defines a terse set of debugging primitives that can then be built upon by the rest of Pwndbg. It comprises two parts: the interface and the implementations. The interface contains the abstract classes and the types that lay out the "shape" of the functionality that may be used by the rest of Pwndbg, and the implementations, well, implement the interface on top of each supported debugger.
As a matter of clarity, it makes sense to think of the Debugger API as a debugger-agnostic version of the `lldb` and `gdb` Python modules. Compared to both modules, it is much closer in spirit to `lldb` than to `gdb`.
It is important to note that a lot of care must be exercised when adding things to the Debugger API, as one must always add implementations of any new functionality for all supported debuggers, even if only to properly gate off debuggers in which the functionality is not supported. Additionally, it is important to keep the Debugger API interfaces as terse as possible in order to reduce code duplication. As a rule of thumb, if all the implementations of an interface are expected to share code, that interface is probably better suited for `aglib`, and it should be further broken down into its primitives, which can then be added to the Debugger API.
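As a purely illustrative sketch of that interface/implementation split - the names below are made up for the example and are not the real Debugger API - the interface declares the "shape" and each implementation fills it in:

```python
from abc import ABC, abstractmethod


class Process(ABC):
    """Debugger-agnostic interface: the 'shape' the rest of the code sees."""

    @abstractmethod
    def read_memory(self, address: int, size: int) -> bytes: ...


class FakeProcess(Process):
    """Toy implementation; a real one would wrap gdb or lldb calls."""

    def __init__(self, memory: dict) -> None:
        self.memory = memory

    def read_memory(self, address: int, size: int) -> bytes:
        # Unmapped bytes read as zero in this toy model.
        return bytes(self.memory.get(address + i, 0) for i in range(size))


proc: Process = FakeProcess({0x1000: 0x7F, 0x1001: 0x45})
print(proc.read_memory(0x1000, 4).hex())  # 7f450000
```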
Some examples of debugging primitives are memory reads, memory writes, memory map acquisition, symbol lookup, register reads and writes, and execution frames. These are all things that one can find in both the GDB and LLDB APIs.
The entry point for the Debugger API is `pwndbg.dbg`, though most process-related methods are accessed through a `Process` object. Unless you really know what you're doing, you're going to want to use the object yielded by `pwndbg.dbg.selected_inferior()` for this.
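For example, setting a breakpoint through the debugger-agnostic API might look like the sketch below. The `break_at`/`BreakpointLocation` names come from the comparison table later in this document, but the import path and address are assumptions:

```python
import pwndbg
from pwndbg.dbg import BreakpointLocation  # import path is an assumption

# Process-related methods hang off the selected inferior.
inf = pwndbg.dbg.selected_inferior()
bp = inf.break_at(BreakpointLocation(0x401000))  # hypothetical address
```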
Along with the Debugger API, there is also `aglib`, found under `pwndbg/aglib/`, which holds functionality that is both too broad for a single command and shareable between multiple debuggers. Things like QEMU handling, ELF and dynamic section parsing, operating system functionality, disassembly with Capstone, heap analysis, and more all belong in `aglib`.
In order to facilitate the process of porting Pwndbg to the debugger-agnostic interfaces, and also because of its historical roots, `aglib` is intended to export the exact same functionality provided by `gdblib`, but on top of a debugger-agnostic foundation.
If it helps, one may think of `aglib` as a `pwndbglib`. It takes the debugging primitives provided by the Debugger API and builds the more complex and interesting bits of functionality found in Pwndbg on top of them.
Here are some things one may want to do, along with how they can be achieved in the LLDB, GDB, and Pwndbg Debugger APIs.
| Action | GDB | LLDB | Debugger API |
|---|---|---|---|
| Setting a breakpoint at an address | `gdb.Breakpoint("*<address>")` | `lldb.target.BreakpointCreateByAddress(<address>)` | `inf.break_at(BreakpointLocation(<address>))` |
| Querying for the address of a symbol | `int(gdb.lookup_symbol(<name>).value().address)` | `lldb.target.FindSymbols(<name>).GetContextAtIndex(0).symbol.GetStartAddress().GetLoadAddress(lldb.target)` | `inf.lookup_symbol(<name>)` |
| Setting a watchpoint at an address | `gdb.Breakpoint(f"(char[{<size>}])*{<address>}", gdb.BP_WATCHPOINT)` | `lldb.target.WatchAddress(<address>, <size>, ...)` | `inf.break_at(WatchpointLocation(<address>, <size>))` |
Some commands might not make any sense outside the context of a single debugger. For these commands, it is generally okay to talk to the debugger directly. However, they must be properly marked as debugger-specific and their loading must be properly gated off behind the correct debugger. They should ideally be placed in a separate location from the rest of the commands in `pwndbg/commands/`.
Alongside the disassembled instructions in the dashboard, Pwndbg also has the ability to display annotations - text that contains relevant information regarding the execution of the instruction. For example, on the x86 `MOV` instruction, we can display the concrete value that gets placed into the destination register. Likewise, we can indicate the results of mathematical operations and memory accesses. The annotation in question always depends on the exact instruction being annotated - we handle it on a case-by-case basis.
The main hurdle in providing annotations is determining what each instruction does, getting the relevant CPU registers and memory that are accessed, and then resolving the concrete values of the operands. We call the process of determining this information "enhancement", as we enhance the information provided natively by GDB.
The Capstone Engine disassembly framework is used to statically determine information about instructions and their operands. Take the x86 instruction `sub rax, rdx`. Given the raw bytes of the machine instruction, Capstone creates an object whose API, among many things, exposes the names of the operands and the fact that they are both 8-byte wide registers. It provides all the information necessary to describe each operand. It also tells us the general "group" that an instruction belongs to, such as whether it is a JUMP-like instruction, a RET, or a CALL. These groups are architecture-agnostic.
However, the Capstone Engine doesn't fill in the concrete values that those registers take on. It has no way of knowing the value in `rdx`, nor can it actually read from memory.
To determine the actual values that the operands take on, and to determine the results of executing an instruction, we use the Unicorn Engine, a CPU emulator framework. The emulator has its own internal CPU register set and memory pages that mirror that of the host process, and it can execute instructions to mutate its internal state. Note that the Unicorn Engine cannot execute syscalls - it doesn't have knowledge of a kernel.
We have the ability to single-step the emulator - tell it to execute the instruction at the program counter inside the emulator. After doing so, we can inspect the state of the emulator - read from its registers and memory. The Unicorn Engine itself doesn't expose information regarding what each instruction is doing - what the instruction is (is it an `add`, `mov`, `push`?) and which registers/memory locations it reads from and writes to - which is why we use the Capstone Engine to statically determine this information.
Using what we know about the instruction from the Capstone Engine - such as that it was a `sub` instruction and that `rax` was written to - we query the emulator after stepping to determine the results of the instruction.
We also read the program counter from the emulator to determine jumps and so we can display the instructions that will actually be executed, as opposed to displaying the instructions that follow consecutively in memory.
Every time the inferior process stops (and when the `disasm` context section is displayed), we display the next handful of assembly instructions in the dashboard so the user can understand where the process is headed. The exact amount is determined by the `context-disasm-lines` setting.
We will be enhancing the instruction at the current program counter, as well as all the future instructions that are displayed. The end result of enhancement is a list of `PwndbgInstruction` objects, each encapsulating relevant information regarding the instruction's execution.
When the process stops, we instantiate the emulator from scratch. We copy all the registers from the host process into the emulator. For performance purposes, we register a handler to the Unicorn Engine to lazily map memory pages from the host to the emulator when they are accessed (a page fault from within the emulator), instead of immediately copying all the memory from the host to the emulator.
The enhancement is broken into a couple of steps:
- First, we resolve the values of all the operands of the instruction before stepping the emulator. This means we read values from registers and dereference memory, depending on the operand type. This gives us the values of the operands before the instruction executes.
- Then, we step the emulator, executing a single instruction.
- We resolve the values of all operands again, giving us the `after_value` of each operand.
- Then, we enhance the "condition" field of PwndbgInstructions, where we determine whether the instruction is conditional (conditional branches and conditional moves are common) and whether the action is taken.
- We then determine the `next` and `target` fields of PwndbgInstructions. `next` is the address that the program counter will take on after using the GDB command `nexti`, and `target` indicates the target address of branch/jump/PC-changing instructions.
- With all this information determined, we now effectively have a big switch statement, matching on the instruction type, where we set the `annotation` string value, which is the text that will be printed alongside the instruction in question.
We go through the enhancement process for the instruction at the program counter and then ensuing handful of instructions that are shown in the dashboard.
When possible, our code aims to use emulation as little as possible. If there is information that can be determined statically or without the emulator, then we try to avoid emulation. This is so we can display annotations even when the Unicorn Engine is disabled. For example, say we come to a stop and are faced with enhancing the following three instructions in the dashboard:
```
1.   lea rax, [rip + 0xd55]
2. > mov rsi, rax    # The host process program counter is here
3.   mov rax, rsi
```
Instruction #1, the `lea` instruction, is already in the past - we pull its enhanced `PwndbgInstruction` from a cache.
Instruction #2, the first `mov` instruction, is where the host process program counter is. If we ran `stepi` in GDB, this instruction would be executed. In this case, there are two ways we can determine the value that gets written to `rsi`:
- After stepping the emulator, read from the emulator's `rsi` register.
- Given the context of the instruction, we know the value in `rsi` will come from `rax`. We can just read the `rax` register from the host. This avoids emulation.
The decision on which option to take is implemented in the annotation handler for the specific instruction. When possible, we prefer the second option, because it makes the annotations work even when emulation is off.
The reason we could take the second option in this case is that we can reason about the process state at the time this instruction will execute. This instruction is about to be executed (program PC == `instruction.address`). We can safely read `rax` from the host, knowing that the value we get is the true value it takes on when the instruction executes. It must be - there are no instructions in between that could have mutated `rax`.
However, this will not be the case while enhancing instruction #3 while we are paused at instruction #2. This instruction is in the future, and without emulation we cannot safely reason about the operands in question. It reads from `rsi`, which might be mutated from the current value that `rsi` has in the stopped process (and in this case, we happen to know that it will be mutated). We must use emulation to determine the `before_value` of `rsi` here, and can't just read from the host process's register set. This principle applies in general - future instructions must be emulated to be fully annotated. When emulation is disabled, the annotations are not as detailed, since we can't fully reason about process state for future instructions.
It is possible for the emulator to fail to execute an instruction - either due to a restriction in the engine itself, or because the instruction segfaults and cannot continue. If the Unicorn Engine fails, there is no real way for us to recover. When this happens, we simply stop emulating for the current step and try again the next time the process stops, when we instantiate the emulator from scratch again.
When we are stepping through the emulator, we want to remember the annotations of the past couple of instructions. We don't want to `nexti` and suddenly have the annotation of the previously executed instruction deleted. At the same time, we also never want stale annotations that might result from coming back to a point in the program we have stepped through before, such as the middle of a loop reached via a breakpoint.
New annotations are only created when the process stops, and we create annotations for the next handful of instructions to be executed. If we `continue` in GDB and stop at a breakpoint, we don't want annotations to appear behind the PC that are from a previous time we were near the location in question. To avoid stale annotations while still remembering them when stepping, we have a simple caching method:
While we are doing our enhancement, we create a list containing the addresses of the future instructions that are displayed.
For example, say we have the following instructions, with the first number being the memory address:
```
  0x555555556259 <main+553>    lea    rax, [rsp + 0x90]
  0x555555556261 <main+561>    mov    edi, 1                    EDI => 1
  0x555555556266 <main+566>    mov    rsi, rax
  0x555555556269 <main+569>    mov    qword ptr [rsp + 0x78], rax
  0x55555555626e <main+574>    call   qword ptr [rip + 0x6d6c]  <fstat64>
► 0x555555556274 <main+580>    mov    edx, 5                    EDX => 5
  0x555555556279 <main+585>    lea    rsi, [rip + 0x3f30]       RSI => 0x55555555a1b0 ◂— 'standard output'
  0x555555556280 <main+592>    test   eax, eax
  0x555555556282 <main+594>    js     main+3784                 <main+3784>
  0x555555556288 <main+600>    mov    rsi, qword ptr [rsp + 0xc8]
  0x555555556290 <main+608>    mov    edi, dword ptr [rsp + 0xa8]
```
In this case, our `next_addresses_cache` would be `[0x555555556279, 0x555555556280, 0x555555556282, 0x555555556288, 0x555555556290]`.
Then, the next time our program comes to a stop (after using `si`, `n`, or any GDB command that continues the process), we immediately check if the current program counter is in this list. If it is, we can infer that the annotations are still valid, as the program has only executed a couple of instructions. In all other cases, we delete our cache of annotated instructions.
We might think: "why not just check if it's the next address - 0x555555556279 in this case? Why a list of the next couple of addresses?". This is because when source code is available, `step` and `next` often skip a couple of instructions. It would be jarring to remove the annotations in this case. Likewise, this method has the added benefit that if we stop somewhere and there happens to be a breakpoint only a couple of instructions ahead of us that we `continue` to, the previous couple of annotations won't be wiped.
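The caching rule above can be sketched as follows. This is a simplified illustration - the names are hypothetical stand-ins, not pwndbg's actual code:

```python
# Hypothetical sketch of the stale-annotation check described above.
# `next_addresses_cache` holds the addresses of the few not-yet-executed
# instructions we annotated the last time the process stopped.
next_addresses_cache: set = set()
annotated_instructions: dict = {}  # address -> cached annotated instruction

def on_process_stop(pc: int) -> None:
    # If we stopped at one of the addresses we annotated last time, the
    # program only advanced a few instructions: keep the old annotations.
    if pc not in next_addresses_cache:
        # We arrived somewhere unexpected (e.g. `continue` to a distant
        # breakpoint), so drop everything to avoid stale annotations.
        annotated_instructions.clear()
```

With the example listing above, stopping at `0x555555556279` would keep the cached annotations, while stopping at an unrelated address would clear them.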
- We don't emulate through CALL instructions. This is because the function might be very long.
- We resolve symbols during the enhancement stage for operand values.
- The folder `pwndbg/aglib/disasm` contains the code for enhancement. It follows an object-oriented model, with `arch.py` implementing the parent class with shared functionality, and the per-architecture implementations are subclasses in their own files.
- `pwndbg/aglib/nearpc.py` is responsible for getting the list of enhanced PwndbgInstruction objects and converting them to the output seen in the 'disasm' view of the dashboard.
We annotate on an instruction-by-instruction basis. Effectively, imagine a giant `switch` statement that selects the correct handler to create an annotation based on the specific instruction. Many instruction types can be grouped and annotated using the same logic, such as `load`, `store`, and `arithmetic` instructions.
See `pwndbg/aglib/disasm/aarch64.py` as an example. We define sets that group instructions using the unique Capstone ID for each instruction, and inside the constructor of `DisassemblyAssistant` we have a mapping of instructions to a specific handler. The `_set_annotation_string` function will match the instruction to the correct handler, which sets the `instruction.annotation` field.
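In outline, the dispatch looks something like this. This is a hedged sketch - the Capstone IDs and handler bodies are placeholders, not the real values from pwndbg:

```python
# Stand-in Capstone instruction IDs; the real code imports them from capstone.
ARM64_INS_LDR, ARM64_INS_STR, ARM64_INS_ADD = 1, 2, 3

class Instruction:
    """Minimal stand-in for an enhanced PwndbgInstruction."""
    def __init__(self, id: int):
        self.id = id
        self.annotation = None

class DisassemblyAssistant:
    def __init__(self):
        # Maps a Capstone instruction ID to its annotation handler.
        # Many instructions share a handler (loads, stores, arithmetic).
        self.annotation_handlers = {
            ARM64_INS_LDR: self._common_load_annotator,
            ARM64_INS_STR: self._common_store_annotator,
            ARM64_INS_ADD: self._common_arithmetic_annotator,
        }

    def _set_annotation_string(self, instruction: Instruction) -> None:
        # The "giant switch statement": pick the handler by instruction ID.
        handler = self.annotation_handlers.get(instruction.id)
        if handler is not None:
            instruction.annotation = handler(instruction)

    def _common_load_annotator(self, insn):
        return "load"        # placeholder annotation string

    def _common_store_annotator(self, insn):
        return "store"       # placeholder annotation string

    def _common_arithmetic_annotator(self, insn):
        return "arithmetic"  # placeholder annotation string
```

Instructions without a registered handler simply get no annotation, which mirrors the fact that not every instruction type is annotated.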
If there is a bug in an annotation, the first order of business is finding its annotation handler. To track down where we are handling the instruction, you can search for its Capstone constant. For example, the RISC-V store byte instruction, `sb`, is represented as the Capstone constant `RISCV_INS_SB`. Or, if you are looking for the handler for the AArch64 instruction SUB, you can search the disasm code for `_INS_SUB` to find where we reference the appropriate Capstone constant for the instruction, then follow the code to the function that ultimately sets the annotation.
If an annotation is causing a crash, it is most likely due to a handler making an incorrect assumption about the number of operands, leading to a `list index out of range` error. One possible source of this is that a given instruction has multiple different disassembly representations. Take the RISC-V `JALR` instruction. It can be represented in 3 ways:
```
jalr rs1           # return register is implied as ra, and imm is implied as 0
jalr rs1, imm      # return register is implied as ra
jalr rd, rs1, imm
```
Capstone will expose the most "simplified" form possible, and the underlying list of register operands will change accordingly. If the handler doesn't take these different options into account, and instead assumes that `jalr` always has 3 operands, an index error can occur when the handler accesses `instruction.operands[2]`.
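A defensive handler can normalize the operand list before indexing into it. This sketch (illustrative only, not pwndbg's actual RISC-V handler) fills in the implied operands for each `jalr` form:

```python
# Normalize the three disassembly forms of RISC-V `jalr` into (rd, rs1, imm).
# Operand entries are plain strings/ints here for illustration; Capstone
# actually exposes richer operand objects.
def normalize_jalr(operands):
    if len(operands) == 1:    # jalr rs1          -> rd=ra, imm=0 implied
        return ("ra", operands[0], 0)
    if len(operands) == 2:    # jalr rs1, imm     -> rd=ra implied
        return ("ra", operands[0], operands[1])
    # jalr rd, rs1, imm - the fully explicit form
    return (operands[0], operands[1], operands[2])
```

Only after this normalization is it safe to reason about a "third operand" in the annotation logic.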
When encountering an instruction that is behaving strangely (an incorrect annotation, a jump target where none should exist, or an incorrect target), there are a couple of routine things to check.
- Use the `dev-dump-instruction` command to print all the enhancement information. With no arguments, it will dump the info from the instruction at the current address. If given an address, it will pull from the instruction cache at the corresponding location.
If the issue is not related to branches, check the operands and the resolved values for registers and memory accesses. Verify that the values are correct - are the resolved memory locations correct? Step past the instruction and use commands like `telescope` and `regs` to read memory and verify whether the claim the annotation is making is correct. For things like memory operands, you can look around the resolved memory location to see the actual value that the instruction dereferenced, and check whether the resolved memory location is simply off by a couple of bytes.
Example output of dumping a `mov` instruction:
```
mov qword ptr [rsp], rsi at 0x55555555706c (size=4) (arch: x86)
    ID: 460, mov
    Raw asm: mov qword ptr [rsp], rsi
    New asm: mov qword ptr [rsp], rsi
    Next: 0x555555557070
    Target: 0x555555557070, Target string=, const=None
    Condition: UNDETERMINED
    Groups: []
    Annotation: [0x7fffffffe000] => 0x7fffffffe248 —▸ 0x7fffffffe618 ◂— '/usr/bin/ls'
    Operands: [['[0x7fffffffe000]': Symbol: None, Before: 0x7fffffffe000, After: 0x7fffffffe000, type=CS_OP_MEM, size=8, access=CS_AC_WRITE], ['RSI': Symbol: None, Before: 0x7fffffffe248, After: 0x7fffffffe248, type=CS_OP_REG, size=8, access=CS_AC_READ]]
    Conditional jump: False. Taken: False
    Unconditional jump: False
    Declare unconditional: None
    Can change PC: False
    Syscall: N/A
    Causes delay slot: False
    Split: NO_SPLIT
    Call-like: False
```
- Use the Capstone disassembler to verify the number of operands and the instruction groups.
Take the raw instruction bytes and pass them to `cstool` to see the information that we are working with:
```
cstool -d mips 0x0400000c
```
The number of operands may not match the visual appearance. You might also check the instruction groups, and verify that an instruction we would consider a `call` has the Capstone `call` group. Capstone is not 100% correct in every single case across all architectures, so it's good to verify. Report a bug to Capstone if there appears to be an error; in the meantime, we can create a fix in Pwndbg to work around the current behavior.
- Check the state of the emulator.
Go to `pwndbg/emu/emulator.py` and uncomment the `DEBUG = -1` line. This will enable verbose debug printing. The emulator will print its current `pc` at every step and indicate important events, like memory mappings. Likewise, in `pwndbg/aglib/disasm/arch.py` you can set `DEBUG_ENHANCEMENT = True` to print register accesses to verify they are sane values.
Potential bugs:
- A register is 0 (this may also be the source of a Unicorn segfault if it is used as a memory operand) - this often means we are not copying the host process's register into the emulator. By default, we map registers by name - if in pwndbg it's called `rax`, then we find the UC constant named `U.x86_const.UC_X86_REG_RAX`. Sometimes this default mapping doesn't work, for instance due to differences in underscores (`FSBASE` vs `FS_BASE`). In these cases, we have to manually add the mapping.
- Unexpected crash - the instruction at hand might require a 'coprocessor', or some information that is unavailable to Unicorn (it's QEMU under the hood).
- Instructions are just not executing - we've seen this in the case of Arm Thumb instructions. There might be some specific API/way to invoke the emulator that is required for a certain processor state.
If you encounter strange behavior with a certain instruction or scenario in a non-native-architecture program, you can use some great functions from `pwntools` to handle the compilation and debugging. This is a great way to create a small reproducible example to isolate an issue.
The following Python program, when run from inside a `tmux` session, will take some AArch64 assembly, compile it, and run it with GDB attached in a new `tmux` pane. It will search your system for the appropriate cross compiler for the architecture at hand and run the compiled binary with QEMU.
```python
from pwn import *

context.arch = "aarch64"

AARCH64_GRACEFUL_EXIT = """
mov x0, 0
mov x8, 93
svc 0
"""

# Illustrative assembly to debug; append the graceful exit so the program
# terminates cleanly after the instructions of interest.
STORE = """
mov x1, sp
str x1, [sp, -16]!
""" + AARCH64_GRACEFUL_EXIT

out = make_elf_from_assembly(STORE)
# Debug info
print(out)
gdb.debug(out)
pause()
```
Footnotes

- Many functions in the Debugger API are accessed through a `Process` object, which is usually obtained through `pwndbg.dbg.selected_inferior()`. These are abbreviated `inf` in the table.