How to Build an Automated Recon Pipeline with Python and Luigi - Part I (Setup and Scope)

Jan 22, 2020 | 13 minutes read

Tags: how to, bug bounty, hack the box, python, recon, luigi

Welcome to part one of a multi-part series demonstrating how to build an automated pipeline for target reconnaissance. The target in question could be the target of a pentest, bug bounty, or capture the flag challenge (shout out to my HTB peoples!). Each post in this series has an associated git tag in the repository for readers’ ease of use. By the end of the series, we’ll have built a functional recon pipeline that can be tailored to fit whatever needs you have.

Part I will:

  • Provide an overview of Luigi and why we’re using it
  • Set up our development environment (python virtual environment, git repository, luigi, etc…)
  • Build stage 0 of our pipeline (Target Scope)

Part I’s git tags:

  • pipenv-install
  • stage-0
As this is a ‘how-to’ series, don’t be concerned if you don’t know about a particular topic to be covered. All of the steps are clearly laid out. The roadmap below outlines topics covered in future posts.

Roadmap:

  • Target scope <– this post
  • Port scanning I
  • Port scanning II
  • Subdomain enumeration
  • Web scanning
    • Screenshots
    • Subdomain takeover
    • CORS misconfiguration
    • Forced browsing
    • Tech stack identification
  • Data storage
  • Visualization / reporting
  • Slack integration

All right, enough with the intro, let’s dive in!

Note to Readers: If you find yourself wanting to know more about classes and Object Oriented Programming (OOP) @0xghostwriter recommends this youtube series on the subject. Special thanks to ghostwriter for reaching out and sharing!


Luigi; An Overview

Luigi is a python library written by the folks at Spotify. Its purpose is to chain multiple tasks together and automate them. The tasks can be just about anything. According to the documentation:

Luigi is a Python (2.7, 3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.

Imagine you have a tool that needs to run to produce output. Another tool uses that output as its input (e.g., an nmap scan produces xml, and that xml is sent to the next tool as input). Consider the next logical step: a third tool uses the output from the second tool as its input. This is the type of scenario that Luigi was built to handle.

A naive approach to automating this sort of behavior is to write a wrapper script that executes each tool in turn, hoping that no tool in the chain runs into any errors. If one does, the script likely needs to be rerun from the beginning. Luigi, on the other hand, can recover from the last successful link in the chain. For instance, say that we’ve run masscan and nmap successfully but the pipeline breaks while running the third tool, nikto. On the next run of the pipeline, Luigi picks up from where it left off, skipping the two successful scans.
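The resume behavior described above can be sketched in plain Python. This is not how Luigi is implemented; it is a minimal illustration, assuming each stage writes a single output file and that an existing file means the stage already succeeded. The names run_pipeline, fail_once, and demo are hypothetical and exist only for this example.

```python
import tempfile
from pathlib import Path


def run_pipeline(stages, workdir, log):
    """Run stages in order, skipping any whose output file already exists.

    stages is a list of (name, function) pairs; each function returns the
    text to store as that stage's output.
    """
    workdir = Path(workdir)
    for name, func in stages:
        marker = workdir / f"{name}.out"
        if marker.exists():
            log.append(f"skip {name}")
            continue
        marker.write_text(func())  # nothing is written if func() raises
        log.append(f"run {name}")


def fail_once(result):
    """Build a hypothetical stage that raises on its first call, then succeeds."""
    calls = {"count": 0}

    def stage():
        calls["count"] += 1
        if calls["count"] == 1:
            raise RuntimeError("simulated tool failure")
        return result

    return stage


def demo():
    """Run the pipeline twice; the third stage fails on the first run only."""
    workdir = tempfile.mkdtemp()
    stages = [
        ("masscan", lambda: "masscan results"),
        ("nmap", lambda: "nmap results"),
        ("nikto", fail_once("nikto results")),
    ]
    first_log, second_log = [], []
    try:
        run_pipeline(stages, workdir, first_log)
    except RuntimeError:
        pass  # a plain wrapper script would have to start over here
    run_pipeline(stages, workdir, second_log)
    return first_log, second_log
```

On the second run, the two stages that already produced output are skipped and only the failed stage is retried, which is the behavior Luigi gives us through its Targets.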

Luigi also has a lot of pretty cool features, such as its task scheduler, dependency visualizer, process synchronization, error notifications, task status monitoring, admin web panel and a whole bunch of other stuff. We’ll be using some of these pieces naturally as they come up in development. In short, Luigi is pretty legit. Before we move past Luigi, we need to discuss a few fundamental ideas about how it works; let’s do that now.

Luigi; Core Concepts

There are two fundamental building blocks of Luigi; Tasks and Targets. Each Target corresponds to a file on disk or some observable checkpoint (row in a database, file in an S3 bucket, remote target responsiveness, etc). Targets are fairly straightforward.

Tasks are the more interesting of the two concepts. A Task is a single unit of work and defines what happens during that section of the pipeline. Tasks take Targets as input, and (usually) create Targets as output. Additionally, a Task can declare that it depends on other Tasks. Here is a visualization of a simple Task dependency and the related Targets.


In the image, the Dump Database Task expects a DB Target as input. After successful execution, it produces the dump.txt Target. The Compute Toplist Task uses the dump.txt Target as its input and creates the toplist.txt Target. Also, the Compute Toplist Task requires the Dump Database Task. We’ll see many of these relationships written out in code as we progress.

A simple idea to understand about Luigi is that one can specify what one wants to build, and then backtrack to find out what is required to fulfill the request. If we were executing our above example, we would tell Luigi that we want to run the Compute Toplist Task. Luigi would then walk that Task’s dependencies backward (including any other dependencies found along the way) until reaching the beginning of the pipeline. Once Luigi finds the beginning Task, execution begins. If this sounds similar to how GNU’s Make utility works, it should; Luigi’s creator based Luigi’s design on Make.
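The backward walk can be illustrated without Luigi at all. The sketch below is not Luigi's scheduler; it is a minimal depth-first walk over a hand-written dependency map, with hypothetical names (execution_order, requires) chosen for this example. The task names mirror the Dump Database / Compute Toplist example above.

```python
def execution_order(goal, requires):
    """Return task names in the order they must run to build goal.

    requires maps each task name to the list of tasks it depends on.
    Dependencies are walked backward from goal and emitted before the
    task that needs them.
    """
    order = []
    visiting = set()

    def visit(task):
        if task in order:
            return  # already planned
        if task in visiting:
            raise ValueError(f"dependency cycle at {task}")
        visiting.add(task)
        for dependency in requires.get(task, []):
            visit(dependency)
        visiting.discard(task)
        order.append(task)

    visit(goal)
    return order


# dependency map for the example in the text
requires = {
    "ComputeToplist": ["DumpDatabase"],
    "DumpDatabase": [],
}
plan = execution_order("ComputeToplist", requires)
```

We ask for the last Task, and the walk works out that the first Task has to run before it. Luigi additionally skips any Task whose output Target already exists, which this sketch leaves out.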

That’s enough background to get us started. We’ll be diving into code later that demonstrates some of what we’ve already discussed. Before we can get to the code, we need to set up our development environment; let’s begin!

Development Environment

Prerequisites (kind of)

This guide assumes a few things about your operating system/environment.

  • Linux (kali assumed)
  • Running systemd as its init system
  • Has python 3.6+ installed
  • Has git installed

We won’t cover how to install python (though on linux, it should just ‘be there’), and we won’t cover startup scripts for different init systems. If you don’t meet one or more of these requirements, that’s ok. Just understand that where you deviate from the requirements, you’re on your own (you can @me on twitter if you’re hard stuck and we’ll work it out).

Install Luigi

Our first step is to install luigi. We’ll do this inside of a python virtual environment. My virtual environment manager preference is pipenv. Let’s get pipenv installed.

apt install pipenv

After that, we’ll clone the git repository we’ll be working with throughout these posts. We’re going to be using git tags to track significant checkpoints within the code. As such, the command below is how we’ll grab the baseline repository.

git clone --branch pipenv-install https://github.com/epi052/recon-pipeline.git

git options used:

    clone
        Clone a repository into a new directory
    --branch
        checkout <branch> instead of the remote's HEAD (can be used for tags as well)

Now we have a place to work! Let’s use the Pipfiles included in our repository to install luigi.

cd recon-pipeline
pipenv install

If everything went well, we should see output similar to what’s below.

Creating a virtualenv for this project…
Pipfile: /opt/recon-pipeline/Pipfile
Using /usr/bin/python3.7m (3.7.3) to create virtualenv…
⠴ Creating virtual environment...Using base prefix '/usr'
New python executable in /home/epi/.local/share/virtualenvs/recon-pipeline-nDSyRWzr/bin/python3.7m
Also creating executable in /home/epi/.local/share/virtualenvs/recon-pipeline-nDSyRWzr/bin/python
Installing setuptools, pip, wheel...
Running virtualenv with interpreter /usr/bin/python3.7m

✔ Successfully created virtual environment! 
Virtualenv location: /home/epi/.local/share/virtualenvs/recon-pipeline-nDSyRWzr
Installing dependencies from Pipfile.lock (e32771)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 7/7 — 00:00:01
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.

To make use of our virtual environment, we use the command below (it’s also in the pipenv output).

pipenv shell

A simple test while in our virtual environment tells us if everything worked correctly.

python -c 'import luigi'

If there is no error output, we’ve successfully installed luigi!

Stage 0 - Target Scope

Now that we have luigi installed, we can create our first Task. We need a way to feed input into our pipeline. Specifically, we’ll want to define the scope of our target. According to hackerone, scope is defined as:

A collection of assets that hackers are to hack on.

For us, this boils down to either a list of ip addresses or a list of domains. Instead of trying to automate some method of pulling in scope files from bugcrowd, hackerone, synack, or some other platform, we can instead manually create the scope file and place it on disk for luigi to ingest. This approach allows us to use the pipeline for any of the bug bounty platforms, pentest targets, hack the box/CTFs, etc…

There is, however, a contract of sorts that we’ll place upon ourselves with the scope file. Later in the pipeline, we’ll take different actions based on whether the file contains ip addresses or domain names. That means that, for the sake of simplicity, the scope file should contain either ip addresses or domain names, not both.
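The contract is easy to check up front. The sketch below is not part of the recon-pipeline repository; it is a minimal illustration that assumes anything the standard library's ipaddress module cannot parse is a domain name. The helper names is_ip_entry and scope_kind are hypothetical.

```python
import ipaddress


def is_ip_entry(entry):
    """True if entry parses as an ip address or network (e.g. 10.0.0.0/24)."""
    try:
        ipaddress.ip_interface(entry)
    except ValueError:
        return False
    return True


def scope_kind(lines):
    """Return 'ips' or 'domains'; raise ValueError if entries are mixed or missing."""
    entries = [line.strip() for line in lines if line.strip()]
    if not entries:
        raise ValueError("scope file has no entries")
    # assumption: any entry that is not an ip address/network is a domain name
    kinds = {"ips" if is_ip_entry(entry) else "domains" for entry in entries}
    if len(kinds) > 1:
        raise ValueError("scope file mixes ip addresses and domain names")
    return kinds.pop()
```

Passing the lines of a scope file (for example, the result of readlines()) to scope_kind either names the kind of file we have or fails loudly before any scanning starts.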

Directory Hierarchy

To structure our directory layout, we’ll begin by creating a python module inside of our repository.

mkdir recon
touch recon/__init__.py

After that, inside of our recon module, we’ll create targets.py.

touch recon/targets.py

Now our directory structure should look like this:

├── Pipfile
├── Pipfile.lock
├── README.md
└── recon
    ├── __init__.py
    └── targets.py

Anatomy of a Task

Let’s spend a few minutes to look at the basics of the luigi Task class.

A Task is the basic unit of work in luigi. To create a luigi Task, we’ll need to create a class that inherits from luigi.Task. We’ll also need to override a few methods:

  • run() - contains the logic to be performed by this Task
  • output() - the output Target that this Task creates (e.g., a file, database entry, etc…)
  • requires() - the list of Tasks that this Task depends on

Each piece of functionality we add to the pipeline is some form of Task, so it’s essential to cover the basics before continuing.

targets.TargetList

Now that we have a file to work in, and we’ve covered the bare-bones essentials of the Task class, let’s start taking a look at some code!

targets.py holds our Task class that handles our scope file. Recall that this file is generated manually by the user. Typically, luigi Tasks get their input from some source, so ours is a special case for which the luigi creators planned. In luigi, when we need to say that a source outside of luigi generates the Task’s output, we use an ExternalTask. An ExternalTask is a subclass of luigi.Task discussed above, and doesn’t require overriding the run() method.

import shutil
import logging
import ipaddress

import luigi


class TargetList(luigi.ExternalTask):
    target_file = luigi.Parameter()
    -------------8<-------------

Each luigi Task can have Parameters. A Parameter handles creating the class’s constructor and a command-line parser option for that particular Task. We’ll see how to use Parameters from the command line shortly.

    def output(self):
        try:
            with open(self.target_file) as f:
                first_line = f.readline()
                ipaddress.ip_interface(first_line.strip())  # is it a valid ip/network?
        except OSError as e:
            # can't open file; log error / return nothing
            return logging.error(f"opening {self.target_file}: {e.strerror}")
        except ValueError as e:
            # exception thrown by ip_interface; domain name assumed
            logging.debug(e)
            with_suffix = f"{self.target_file}.domains"
        else:
            # no exception thrown; ip address found
            with_suffix = f"{self.target_file}.ips"

        shutil.copy(self.target_file, with_suffix)  # copy file with new extension
        return luigi.LocalTarget(with_suffix)

Parameters are how we’ll pass user-controlled input to our class. In this case, it is the path to our scope file. A LocalTarget represents a local file on the file system. The LocalTarget here is what this particular Task produced and what it passes to tasks further down the pipeline.

The high-level description of this Task is that it opens the file specified by the user in the --target-file command-line option (seen below). It reads the first line to determine whether the file contains ip addresses or domain names (remember our contract of only one or the other?). After making that determination, it copies the target_file with either .ips or .domains appended to the filename. That’s it. The LocalTarget returned from this Task is available to the next Task in the pipeline by calling self.input().

We can update our local source code to what’s seen above (with docstrings/comments) by running the following command.

git checkout stage-0

Luigi Command Execution

To run the pipeline, we’ll need to set our PYTHONPATH environment variable to the path of our project on disk. We can set the environment variable in a few ways; outlined below are two solutions.

  1. Prepend PYTHONPATH=/path/to/recon-pipeline to any luigi pipeline command being run.
  2. Add export PYTHONPATH=/path/to/recon-pipeline to your .bashrc
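Why the variable matters: luigi imports the module named by --module, so the project root has to be on Python's import path. The sketch below shows the effect with a throwaway package; demo_recon and can_import are hypothetical names used only here, and adding a directory to sys.path stands in for what PYTHONPATH does at interpreter startup.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name, extra_path=None):
    """Return True if module_name imports, optionally with extra_path on sys.path."""
    saved_path = list(sys.path)
    top_level = module_name.split(".")[0]
    try:
        if extra_path is not None:
            sys.path.insert(0, str(extra_path))
        importlib.invalidate_caches()
        importlib.import_module(module_name)
        return True
    except ModuleNotFoundError:
        return False
    finally:
        sys.path[:] = saved_path
        # forget the demo package so later calls start clean
        for name in [n for n in sys.modules if n.split(".")[0] == top_level]:
            del sys.modules[name]


# throwaway project mirroring the layout <root>/demo_recon/targets.py
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "demo_recon"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")
(package_dir / "targets.py").write_text("NAME = 'targets'\n")
```

Without the project root on the path the import fails; with it, the same module name resolves, which is what either of the two options above arranges for recon.targets.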

We also need to specify --local-scheduler on the command line. While the --local-scheduler flag is useful for development purposes, it’s not recommended for production usage. There is also a centralized scheduler that runs as a system service and serves two purposes:

  • Make sure two instances of the same task are not running simultaneously
  • Provide visualization of everything that’s going on.

For now, we’ll stick with --local-scheduler. As our pipeline becomes larger, we’ll swap over to the central scheduler.

With our PYTHONPATH set up, luigi commands take on the following structure (prepend PYTHONPATH if not exported from .bashrc):

luigi --module PACKAGENAME.MODULENAME CLASSNAME [OPTIONS]

We can get options for each module by running luigi --module PACKAGENAME.MODULENAME CLASSNAME --help

An example help statement:

luigi --module recon.targets TargetList --help

usage: luigi [--local-scheduler] [--module CORE_MODULE] [--help] [--help-all]
             [--TargetList-target-file TARGETLIST_TARGET_FILE]
             [--target-file TARGET_FILE]
             [Required root task]

positional arguments:
  Required root task    Task family to run. Is not optional.

optional arguments:
  --local-scheduler     Use an in-memory central scheduler. Useful for
  --module CORE_MODULE  Used for dynamic loading of modules
  --help                Show most common flags and all task-specific flags
  --help-all            Show all command line flags
  --TargetList-target-file TARGETLIST_TARGET_FILE
  --target-file TARGET_FILE

Notice the --target-file option that we specified as a Parameter in our code above. Putting it all together, we can see an example scope file command, where tesla is the name of the file, and it is located in the current directory (ensure you’re in your python virtual environment).

echo > tesla
PYTHONPATH=$(pwd) luigi --local-scheduler --module recon.targets TargetList --target-file tesla

DEBUG: Checking if TargetList(target_file=tesla) is complete
INFO: Informed scheduler that task   TargetList_tesla_591d3b1ff1   has status   DONE
INFO: Done scheduling tasks
INFO: Running Worker with 1 processes
DEBUG: Asking scheduler for work...
DEBUG: There are no more tasks to run at this time
INFO: Worker Worker(salt=092373507, workers=1, host=main, username=epi, pid=13645) was stopped. Shutting down Keep-Alive thread
===== Luigi Execution Summary =====

Scheduled 1 tasks of which:
* 1 complete ones were encountered:
    - 1 TargetList(target_file=tesla)

Did not run any tasks
This progress looks :) because there were no failed tasks or missing dependencies

===== Luigi Execution Summary =====

After running the command above, we see a new file in our current directory named tesla.ips. (That name assumes the first line of tesla is an ip address or network; a domain name on the first line produces tesla.domains instead.)

cat tesla.ips

Finalized Code

Here we have the finalized code with comments.

import shutil
import logging
import ipaddress

import luigi


class TargetList(luigi.ExternalTask):
    """ External task.  `TARGET_FILE` is generated manually by the user from target's scope. """

    target_file = luigi.Parameter()

    def output(self):
        """ Returns the target output for this task. target_file.ips || target_file.domains

        In this case, it expects a file to be present in the local filesystem.
        By convention, TARGET_NAME should be something like tesla or some other
        target identifier.  The returned target output will either be target_file.ips
        or target_file.domains, depending on what is found on the first line of the file.

        Example:  Given a TARGET_FILE of tesla where the first line is tesla.com; tesla.domains
        is written to disk.

        Returns:
            luigi.local_target.LocalTarget
        """
        try:
            with open(self.target_file) as f:
                first_line = f.readline()
                ipaddress.ip_interface(first_line.strip())  # is it a valid ip/network?
        except OSError as e:
            # can't open file; log error / return nothing
            return logging.error(f"opening {self.target_file}: {e.strerror}")
        except ValueError as e:
            # exception thrown by ip_interface; domain name assumed
            logging.debug(e)
            with_suffix = f"{self.target_file}.domains"
        else:
            # no exception thrown; ip address found
            with_suffix = f"{self.target_file}.ips"

        shutil.copy(self.target_file, with_suffix)  # copy file with new extension
        return luigi.LocalTarget(with_suffix)

That wraps things up for this post. In the next installment, we’ll add masscan into our pipeline!

Additional Resources

  1. Luigi - Building Workflows
  2. Luigi - External Tasks
  3. Luigi - Parameters
  4. Luigi - LocalTarget
  5. git tags