
Should We Chat, Too? Security Analysis of WeChat’s MMTLS Encryption Protocol


By Mona Wang, Pellaeon Lin, and Jeffrey Knockel, October 15, 2024


This report’s key findings have been translated into Chinese. The report also has an accompanying FAQ, which has been translated into Simplified and Traditional Chinese.

Key contributions

  • We performed the first public analysis of the security and privacy properties of MMTLS, the main network protocol used by WeChat, an app with over one billion monthly active users.
  • We found that MMTLS is a modified version of TLS 1.3, and that many of the modifications WeChat developers made to its cryptography introduce weaknesses.
  • Further analysis revealed that earlier versions of WeChat used a less secure, custom-designed protocol containing multiple vulnerabilities, which we call “Business-layer encryption”. Modern WeChat versions still use this layer of encryption in addition to MMTLS.
  • Although we were unable to develop an attack that completely defeats WeChat’s encryption, the implementation falls short of the level of cryptography you would expect in an app used by a billion users; for example, it uses deterministic IVs and lacks forward secrecy.
  • These findings add to a larger body of work suggesting that apps in the Chinese ecosystem fail to adopt cryptographic best practices, opting instead to invent their own, often problematic, systems.
  • We are releasing technical tools and further documentation of our technical methodologies in an accompanying GitHub repository. These tools and documents, along with this main report, will assist future researchers in studying WeChat’s inner workings.

Introduction

WeChat, with over 1.2 billion monthly active users, is the most popular messaging and social media platform in China and the third most popular globally. According to market research, WeChat’s network traffic accounted for 34% of Chinese mobile traffic in 2018. WeChat dominates messaging in China to the point of monopoly, making it increasingly unavoidable for people in China. With an ever-expanding array of features, WeChat has also grown well beyond its original purpose as a messaging app.

Despite WeChat’s ubiquity and importance, there has been little study of MMTLS, the proprietary network encryption protocol used by the WeChat application. This knowledge gap hampers further security and privacy research on such a critical application. In addition, home-rolled cryptography is unfortunately common in many extremely popular Chinese applications, and cryptosystems developed independently of well-tested standards such as TLS have historically had problems.

This work is a deep dive into the mechanisms behind MMTLS and the core workings of the WeChat program. We compare the security and performance of MMTLS to TLS 1.3 and discuss our overall findings. We also provide public documentation and tooling to decrypt WeChat network traffic. These tools and documents, along with our report, will assist future researchers to study WeChat’s privacy and security properties, as well as its other inner workings.

This report consists of a technical description of how WeChat launches a network request and of its encryption protocols, followed by a summary of weaknesses in WeChat’s protocol, and finally a high-level discussion of WeChat’s design choices and their impact. The report is intended for privacy, security, or other technical researchers interested in furthering the privacy and security study of WeChat. For non-technical audiences, we have summarized our findings in an accompanying FAQ.

Prior work on MMTLS and WeChat transport security

Code internal to the WeChat mobile app refers to its proprietary TLS stack as MMTLS (MM is short for MicroMessenger, which is a direct translation of 微信, the Chinese name for WeChat) and uses it to encrypt the bulk of its traffic.

There is limited public documentation of the MMTLS protocol. A technical document from WeChat developers describes the ways in which it is similar to and different from TLS 1.3, and attempts to justify various decisions they made to either simplify or change how the protocol is used. The document identifies several key differences between MMTLS and TLS 1.3, which help us understand the different modes in which MMTLS is used.

Wan et al. conducted the most comprehensive study of WeChat transport security in 2015 using standard security analysis techniques. However, this analysis was performed before the deployment of MMTLS, WeChat’s upgraded security protocol. In 2019, Chen et al. studied the WeChat login process, specifically examining packets that are encrypted with TLS rather than MMTLS.

As for MMTLS itself, in 2016 WeChat developers published a document that describes the design of the protocol at a high level and compares it with TLS 1.3. Other MMTLS publications focus on website fingerprinting-type attacks, but none specifically perform a security evaluation. A few GitHub repositories and blog posts look briefly into the wire format of MMTLS, though none are comprehensive. Although there has been little work studying MMTLS specifically, previous Citizen Lab reports have discovered security flaws in other cryptographic protocols designed and implemented by Tencent.

Methodology

We analyzed two versions of WeChat Android app:

  • Version 8.0.23 (APK “versionCode” 2160) released on May 26, 2022, downloaded from the WeChat website.
  • Version 8.0.21 (APK “versionCode” 2103) released on April 7, 2022, downloaded from Google Play Store.

All findings in this report apply to both of these versions.

We used an account registered to a U.S. phone number for the analysis, which changes the behavior of the application compared to a mainland Chinese number. Our setup may not be representative of all WeChat users, and the full limitations are discussed further below.

For dynamic analysis, we analyzed the application installed on a rooted Google Pixel 4 phone and on an emulated Android OS. We used Frida to hook the app’s functions and to manipulate and export application memory. We also analyzed WeChat’s network traffic using Wireshark. However, because WeChat uses nonstandard cryptographic protocols such as MMTLS, standard network traffic analysis tools that might work with HTTPS/TLS do not work for all of WeChat’s network activity. Frida was therefore essential for capturing the data and information flows we detail in this report. Our Frida scripts are designed to intercept WeChat’s request data immediately before WeChat passes it to its MMTLS encryption module. The Frida scripts we used are published in our GitHub repository.

For static analysis, we used Jadx, a popular Android decompiler, to decompile WeChat’s Android Dex files into Java code. We also used Ghidra and IDA Pro to decompile the native libraries (written in C++) bundled with WeChat.

Notation

In this report, we reference a lot of code from the WeChat app. When we reference any code (including file names and paths), we style the text in a monospace font to indicate that it is code. When we reference a function, we add empty parentheses after the function name, like this: somefunction(). The variable and function names we show come from one of the following three sources:

  1. The original decompiled name.
  2. In cases where the name cannot be decompiled into a meaningful string (e.g., the symbol name was not compiled into the code), we rename it according to how the nearby internal log messages reference it.
  3. In cases where there is not enough information for us to determine the original name, we name it according to our understanding of the code. In such cases, we note that the name was given by us.

In the cases where the decompiled name and log message name of functions are available, they are generally consistent. Bolded or italicized terms can refer to higher-level concepts or parameters we have named.

Utilization of open source components

We also identified open source components being used by the project, the two largest being OpenSSL and Tencent Mars. Based on our analysis of decompiled WeChat code, large parts of its code are identical to Mars. Mars is an “infrastructure component” for mobile applications, providing common features and abstractions that are needed by mobile applications, such as networking and logging.

By compiling these libraries separately with debug symbols, we were able to import function and class definitions into Ghidra for further analysis. This greatly helped our understanding of the non-open-source code in WeChat. For instance, when analyzing the network functions decompiled from WeChat, we found many of them to be highly similar to open source Mars, so we could simply read the source code and comments to understand what a function was doing. Encryption-related functions are not included in open source Mars, so we still needed to read decompiled code for them, but even there we were aided by functions and structures we already knew from open source Mars.

Matching decompiled code to its source

In WeChat’s internal logging messages, which contain source file paths, we noticed three top-level directories:

  • /home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars/
  • /home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars-wechat/
  • /home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars-private/

The source files under “mars” can all be found in the open source Mars repository, while source files in the other two top-level directories cannot. To illustrate, below is a small section of decompiled code from libwechatnetwork.so:

```cpp
// Ghidra output from libwechatnetwork.so (local names such as local_2c8 are decompiler-generated)
XLogger::XLogger((XLogger *)&local_2c8,5,"mars::stn",
                 "/home/android/devopsAgent/workspace/p-e118ef4209d745e1b9ea0b1daa0137ab/src/mars/mars/stn/src/longlink.cc",
                 "Send",0xb2,false,(FuncDef0 *)0x0);
XLogger::Assert((XLogger *)&local_2c8,"tracker_.get()");
XLogger::~XLogger((XLogger *)&local_2c8);
```

Given this similarity, it is highly likely that this section of code was compiled from the following line in the Send() function, defined in the longlink.cc file of the open source repository:

```cpp
xassert2(tracker_.get());
```

Building on this observation, whenever our decompiler is unable to determine the name of a function, we can use the logging messages within the compiled code to determine it. Moreover, if the source file is from open source Mars, we can read its source code as well.
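The renaming step above can be sketched as follows. This is a minimal illustration of our own, not code from WeChat or Mars. The helper recoverName() and the RecoveredName struct are hypothetical, and the sketch assumes that the source-path string passed to XLogger contains a "/src/" component, as the build paths shown above do.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical result of naming an unnamed decompiled function from its log call.
struct RecoveredName {
    std::string sourceFile;  // path relative to the build workspace's "src/" directory
    std::string function;    // function-name string passed to the logger

    // Readable label, e.g. "mars/mars/stn/src/longlink.cc:Send()".
    std::string label() const { return sourceFile + ":" + function + "()"; }
};

// Build a name from the path and function strings found in an XLogger call.
// Returns nullopt if the path does not look like a build-workspace path.
std::optional<RecoveredName> recoverName(const std::string& loggedPath,
                                         const std::string& loggedFunction) {
    const std::string marker = "/src/";  // first "/src/" follows the build job directory
    const auto pos = loggedPath.find(marker);
    if (pos == std::string::npos || loggedFunction.empty()) {
        return std::nullopt;
    }
    return RecoveredName{loggedPath.substr(pos + marker.size()), loggedFunction};
}
```

Applied to the decompiled snippet, the path and "Send" yield the label mars/mars/stn/src/longlink.cc:Send(). A reader can then look up that file in the open source repository.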

Three parts of Mars

In a few articles on the Mars wiki, Tencent developers described their motivations for developing Mars.

According to its developers, Mars and its STN module are comparable to networking libraries such as AFNetworking and OkHttp, which are widely used in other mobile apps.

One of the technical articles released by the WeChat development team describes the process of open-sourcing Mars. According to the article, the developers had to separate WeChat-specific code, which was kept private, from general-purpose code, which was open sourced. In the end, three parts were separated from each other:

  • mars-open: to be open sourced, independent repository.
  • mars-private: potentially open sourced, depends on mars-open.
  • mars-wechat: WeChat business logic code, depends on mars-open and mars-private.

These three names match the top level directories we found earlier if we take “mars-open” to be in the “mars” top-level directory. Using this knowledge, when reading decompiled WeChat code, we could easily know whether it was WeChat-specific or not. From our reading of the code, mars-open contains basic and generic structures and functions, for instance, buffer structures, config stores, thread management and, most importantly, the module named “STN” responsible for network transmission. (We were unable to determine what STN stands for.) On the other hand, mars-wechat contains the MMTLS implementation, and mars-private is not closely related to the features within our research scope.
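Each of the three top-level directories corresponds to one part of Mars, so the logged source path of a decompiled function tells us which part it came from. The sketch below is hypothetical: the enum and function names are ours. It assumes paths have already been made relative to the build's "src/" directory, so that the first path component is "mars", "mars-private", or "mars-wechat".

```cpp
#include <cassert>
#include <string>

enum class MarsPart { Open, Private, WeChat, Unknown };

// Classify a logged source path by its top-level directory, following the
// mapping above: "mars" holds mars-open, while "mars-private" and
// "mars-wechat" hold code that is not in the open source repository.
MarsPart classifyMarsPart(const std::string& relativePath) {
    const auto slash = relativePath.find('/');
    const std::string top = relativePath.substr(0, slash);  // whole string if no '/'
    if (top == "mars") return MarsPart::Open;
    if (top == "mars-private") return MarsPart::Private;
    if (top == "mars-wechat") return MarsPart::WeChat;
    return MarsPart::Unknown;
}
```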

As a technical side note, open source Mars compiles to a single shared object file named “libmarsstn.so”. In WeChat, however, multiple shared object files include code from open source Mars, including the following:

  • libwechatxlog.so
  • libwechatbase.so
  • libwechataccessory.so
  • libwechathttp.so
  • libandromeda.so
  • libwechatmm.so
  • libwechatnetwork.so

Our research focuses on the transport protocol and encryption of WeChat, which is implemented mainly in libwechatmm.so and libwechatnetwork.so. In addition, we inspected libMMProtocalJni.so, which is not part of Mars but contains functions for cryptographic calculations. We did not inspect the other shared object files.

Matching Mars versions

Although we were able to find open source code for parts of WeChat, at the beginning of our research we were unable to pinpoint the specific version of the mars-open source code that was used to build WeChat. Later, we found version strings contained in libwechatnetwork.so. For WeChat 8.0.21, searching for the string “MARS_” yielded the following:

```
MARS_BRANCH: HEAD
MARS_COMMITID: d92f1a94604402cf03939dc1e5d3af475692b551
MARS_PRIVATE_BRANCH: HEAD
MARS_PRIVATE_COMMITID: 193e2fb710d2bb42448358c98471cd773bbd0b16
MARS_URL:
MARS_PATH: HEAD
MARS_REVISION: d92f1a9
MARS_BUILD_TIME: 2022-03-28 21:52:49
MARS_BUILD_JOB: rb/2022-MAR-p-e118ef4209d745e1b9ea0b1daa0137ab-22.3_1040
```

The specific MARS_COMMITID (d92f1a…) exists in the open source Mars repository. This version of the source code also matches the decompiled code.
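The version strings above are simple "KEY: value" lines. They can be collected into a map, which makes it easy to check that the short MARS_REVISION is a prefix of the full MARS_COMMITID before looking up that commit. The sketch below assumes the strings have already been extracted from the binary (for instance with a strings-style tool) into newline-separated text. The function name parseMarsBuildInfo() is our own.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse "MARS_*: value" lines into a map. Keys without a value
// (such as "MARS_URL:") map to an empty string; other lines are ignored.
std::map<std::string, std::string> parseMarsBuildInfo(const std::string& dump) {
    std::map<std::string, std::string> info;
    std::istringstream lines(dump);
    std::string line;
    while (std::getline(lines, line)) {
        if (line.rfind("MARS_", 0) != 0) continue;  // keep only MARS_* lines
        const auto colon = line.find(':');          // first colon ends the key
        if (colon == std::string::npos) continue;
        std::string value = line.substr(colon + 1);
        value.erase(0, value.find_first_not_of(' '));  // trim leading spaces
        info[line.substr(0, colon)] = value;
    }
    return info;
}
```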

Pinpointing the specific source code version helped us tremendously with Ghidra’s decompilation. Since many of the core data structures used in WeChat come from Mars, importing these known data structures let us observe how the non-open-source code accesses structure fields and infer its purpose.

Limitations

This investigation only looks at client behavior and is therefore subject to the common limitations of privacy research that can only analyze the client. Much of the data that the client transmits to WeChat servers may be required for the application to function. For instance, WeChat servers can certainly see chat messages, since WeChat can censor them according to their content. We cannot always measure what Tencent does with the data it collects, but we can make inferences about what is possible. Previous work has made certain limited inferences about data sharing, such as that messages sent by non-mainland-Chinese users are used to train censorship algorithms for mainland Chinese users. In this report, we focus on the version of WeChat for non-mainland-Chinese users.

Our investigation was also limited due to legal and ethical constraints. It has become increasingly difficult to obtain Chinese phone numbers for investigation due to the strict phone number and associated government ID requirements. Therefore, we did not test on Chinese phone numbers, which causes WeChat to behave differently. In addition, without a mainland Chinese account, the types of interaction with certain features and Mini Programs were limited. For instance, we did not perform financial transactions on the application.

Our primary analysis was limited to analyzing only two versions of WeChat Android (8.0.21 and 8.0.23). However, we also re-confirmed our tooling works on WeChat 8.0.49 for Android (released April 2024) and that the MMTLS network format matches that used by WeChat 8.0.49 for iOS. Testing different versions of WeChat, the backwards-compatibility of the servers with older versions of the application, and testing on a variety of Android operating systems with variations in API version, are great avenues for future work.

Within the WeChat Android app, we focused on its networking components. Usually, within a mobile application (and in most other programs as well), all other components delegate the work of communicating over the network to the networking components. Our research is not a complete security and privacy audit of the WeChat app: even if network communication is properly protected, other parts of the app still need to be secure and private. For instance, an app would not be secure if the server accepted any password for an account login, even if the password were transmitted confidentially.

Tooling for studying WeChat and MMTLS

In the GitHub repository, we have released tooling that can log keys using Frida and decrypt network traffic captured during the same period of time, as well as samples of decrypted payloads. In addition, we have provided further documentation and our reverse-engineering notes from studying the protocol. We hope that these tools and documentation will further aid researchers in studying WeChat.

Launching a WeChat network request

As with any other app, WeChat is composed of various components. Components within WeChat can invoke the networking components to send or receive network transmissions. In this section, we provide a highly simplified description of the process and components involved in sending a network request in WeChat. The actual process is much more complex, and we explain it in more detail in a separate document. The specifics of data encryption are discussed in the next section, “WeChat network request encryption”.

In the WeChat source code, each API is referred to as a different “Scene”. For instance, during the registration process, one API, called NetSceneReg, submits all of the new account information provided by the user. We refer to NetSceneReg as a “Scene class”. Other components can start a network request to an API by calling the corresponding Scene class. NetSceneReg, for example, is usually invoked by a click event on a button UI component.

Upon invocation, the Scene class prepares the request data. The structure of the request data (as well as the response) is defined in “RR classes”. (We dub them RR classes because they tend to have “ReqResp” in their names.) Usually, one Scene class corresponds to one RR class. NetSceneReg, for example, corresponds to the RR class MMReqRespReg2, which contains fields such as the desired username and phone number. For each API, its RR class also defines a unique internal URI (usually starting with “/cgi-bin”) and a “request type” number (roughly a 2–4 digit integer). The internal URI and request type number are often used throughout the code to identify different APIs. Once the Scene class has prepared the data, it sends it to MMNativeNetTaskAdapter.
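The sketch below models a registry that can look up an API by either of its two identifiers: its internal URI or its request type number. It is a hypothetical illustration rather than WeChat's code. The RRInfo and ApiRegistry types are ours, and the URI and request type used in the usage check are placeholders, not values taken from WeChat.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical descriptor for the identifiers an RR class defines for its API.
struct RRInfo {
    std::string rrClass;  // RR class name
    std::string uri;      // internal URI, usually starting with "/cgi-bin"
    int requestType;      // "request type" number, roughly 2-4 digits
};

// Hypothetical registry: look up an API by either of its identifiers.
class ApiRegistry {
public:
    void add(RRInfo info) { entries_.push_back(std::move(info)); }

    std::optional<RRInfo> byUri(const std::string& uri) const {
        for (const auto& e : entries_)
            if (e.uri == uri) return e;
        return std::nullopt;
    }

    std::optional<RRInfo> byRequestType(int type) const {
        for (const auto& e : entries_)
            if (e.requestType == type) return e;
        return std::nullopt;
    }

private:
    std::vector<RRInfo> entries_;
};
```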

MMNativeNetTaskAdapter is a task queue manager that manages and monitors the progress of each network connection and API request. When a Scene class calls MMNativeNetTaskAdapter, it places the new request (a task) onto the task queue and calls the req2Buf() function. req2Buf() serializes the request Protobuf object prepared by the Scene class into bytes, then encrypts the bytes using Business-layer Encryption.

Finally, the resultant ciphertext from Business-layer encryption is sent to the “STN” module, which is part of Mars. STN then encrypts the data again using MMTLS Encryption. Then, STN establishes the network transport connection, and sends the MMTLS Encryption ciphertext over it. In STN, there are two types of transport connections: Shortlink and Longlink. Shortlink refers to an HTTP connection that carries MMTLS ciphertext. Shortlink connections are closed after one request-response cycle. Longlink refers to a long-lived TCP connection. A Longlink connection can carry multiple MMTLS encrypted requests and responses without being closed.
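The difference between the two STN transport types can be illustrated with a toy connection model. This sketch rests on simplifying assumptions: the Link class and connectionsOpened() are hypothetical, and network errors and reconnects are ignored. A Shortlink closes after one request-response cycle, so every request needs a new connection. A Longlink stays open and is reused.

```cpp
#include <cassert>
#include <cstddef>

enum class LinkType { Shortlink, Longlink };

// Toy connection: a Shortlink closes after one request-response cycle,
// a Longlink stays open for further cycles.
class Link {
public:
    explicit Link(LinkType type) : type_(type) {}
    bool isOpen() const { return open_; }

    // Perform one request-response cycle; returns false if already closed.
    bool roundTrip() {
        if (!open_) return false;
        if (type_ == LinkType::Shortlink) open_ = false;  // closed after one cycle
        return true;
    }

private:
    LinkType type_;
    bool open_ = true;
};

// Count the connections opened to send `requests` requests,
// opening a fresh link whenever the current one has been closed.
std::size_t connectionsOpened(LinkType type, std::size_t requests) {
    std::size_t opened = 0;
    Link link(type);
    bool started = false;
    for (std::size_t i = 0; i < requests; ++i) {
        if (!started || !link.isOpen()) {
            link = Link(type);  // open a new connection
            ++opened;
            started = true;
        }
        link.roundTrip();
    }
    return opened;
}
```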

WeChat network request encryption

WeChat network requests are encrypted twice, with different sets of keys. Serialized request data is first encrypted using what we call Business-layer Encryption, because a blog post describes this internal encryption as occurring at the “Business-layer”. Business-layer Encryption has two modes: Symmetric Mode and Asymmetric Mode. The resulting Business-layer-encrypted ciphertext is appended to metadata about the Business-layer request. Then the Business-layer request (i.e., the request metadata and the inner ciphertext) is encrypted again using MMTLS Encryption. The final ciphertext is serialized as an MMTLS Request and sent over the wire.
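The layering described above can be sketched as two nested encryptions. The code below is illustrative only and rests on several assumptions. toyCipher() is a repeating-key XOR that merely stands in for the real AES-CBC or AES-GCM layers listed in Table 1, and it is NOT secure. The keys and metadata are placeholders, and the real metadata layout is not modeled. The sketch also encodes the mode selection described below: Symmetric Mode when logged in, Asymmetric Mode when logged out. What it shows is that the outer MMTLS layer covers both the request metadata and the Business-layer ciphertext.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

enum class BusinessMode { Symmetric, Asymmetric };

// Logged-in clients use Symmetric Mode; logged-out clients use Asymmetric Mode.
BusinessMode selectBusinessMode(bool loggedIn) {
    return loggedIn ? BusinessMode::Symmetric : BusinessMode::Asymmetric;
}

// Placeholder cipher (repeating-key XOR): NOT real encryption, and its own inverse.
Bytes toyCipher(const Bytes& key, const Bytes& data) {
    assert(!key.empty());
    Bytes out(data.size());
    for (std::size_t i = 0; i < data.size(); ++i)
        out[i] = static_cast<std::uint8_t>(data[i] ^ key[i % key.size()]);
    return out;
}

// Inner layer: Business-layer encryption of the serialized request, appended
// to the request metadata. Outer layer: MMTLS encryption of everything.
Bytes wrapRequest(const Bytes& metadata, const Bytes& serialized,
                  const Bytes& businessKey, const Bytes& mmtlsKey) {
    Bytes businessRequest = metadata;
    const Bytes inner = toyCipher(businessKey, serialized);
    businessRequest.insert(businessRequest.end(), inner.begin(), inner.end());
    return toyCipher(mmtlsKey, businessRequest);
}

// Undo the layers in reverse order: MMTLS first, then the Business-layer.
// Assumes the receiver knows the metadata length.
Bytes unwrapRequest(const Bytes& wire, std::size_t metadataLen,
                    const Bytes& businessKey, const Bytes& mmtlsKey) {
    const Bytes businessRequest = toyCipher(mmtlsKey, wire);
    const Bytes inner(businessRequest.begin() + static_cast<std::ptrdiff_t>(metadataLen),
                      businessRequest.end());
    return toyCipher(businessKey, inner);
}
```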

WeChat’s network encryption system is disjointed and appears to be a combination of at least three different cryptosystems. The encryption process described in the Tencent documentation mostly matches our findings about MMTLS Encryption. However, the document does not seem to describe in detail the Business-layer Encryption, which operates differently depending on whether the client is logged in: logged-in clients use Symmetric Mode, while logged-out clients use Asymmetric Mode. We also observed WeChat using HTTP, HTTPS, and QUIC to transmit large, static resources such as translation strings or transmitted files. The endpoint hosts for these communications are different from the MMTLS server hosts, and their domain names suggest that they belong to CDNs. However, the endpoints of interest to us are those that serve dynamically generated, often confidential resources (i.e., generated by the server on every request), or those to which users transmit data, often confidential, to WeChat’s servers. These types of transmissions are made using MMTLS.

As a final implementation note, across all of these cryptosystems WeChat uses internal OpenSSL bindings that are compiled into the program. In particular, the libwechatmm.so library seems to have been compiled with OpenSSL version 1.1.1l. The other libraries that use OpenSSL bindings, namely libMMProtocalJni.so and libwechatnetwork.so, do not contain OpenSSL version strings. We note that OpenSSL internal APIs can be confusing and are often misused by well-intentioned developers. Our full notes about each of the OpenSSL APIs used can be found in the GitHub repository.

In Table 1, we summarize each of the relevant cryptosystems: how their keys are derived, how encryption and authentication are achieved, and which libraries contain the relevant encryption and authentication functions. We discuss each cryptosystem’s details in the following sections.

| Cryptosystem | Key derivation | Encryption | Authentication | Library | Functions that perform the symmetric encryption |
|---|---|---|---|---|---|
| MMTLS, Longlink | Diffie-Hellman (DH) | AES-GCM | AES-GCM tag | libwechatnetwork.so | Crypt() |
| MMTLS, Shortlink | DH with session resumption | AES-GCM | AES-GCM tag | libwechatnetwork.so | Crypt() |
| Business-layer, Asymmetric Mode | Static DH with fresh client keys | AES-GCM | AES-GCM tag | libwechatmm.so | HybridEcdhEncrypt(), AesGcmEncryptWithCompress() |
| Business-layer, Symmetric Mode | Fixed key from server | AES-CBC | Checksum + MD5 | libMMProtocalJni.so | pack(), EncryptPack(), genSignature() |

Table 1: Overview of different cryptosystems for WeChat network request encryption, how keys are derived, how encryption and authentication are performed, and which libraries perform them.

1. MMTLS Wire Format

Since MMTLS can run over various transports, we use the term MMTLS packet for a unit of correspondence within MMTLS. Over Longlink, an MMTLS packet can be split across multiple TCP packets. Over Shortlink, an MMTLS packet is generally contained within an HTTP POST request or response body.


Each MMTLS packet contains one or more MMTLS records (which are similar in structure and purpose to TLS records). Records are units of messages that carry handshake data, application data, or alert/error message data within each MMTLS packet.
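Over Longlink, an MMTLS packet may arrive split across several TCP packets, so a reader has to buffer bytes until a whole record is available. The sketch below splits a byte buffer into records. ASSUMPTION: the text above does not give the record framing. By analogy with TLS, we assume that each record's 3-byte header is followed by a 2-byte big-endian length field and then the record contents. The Record struct and splitRecords() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

struct Record {
    Bytes header;  // 3-byte record header, e.g. 16 f1 04
    Bytes body;    // record contents
};

// ASSUMPTION (modeled on TLS): header (3 bytes), big-endian length (2 bytes), contents.
// Returns the complete records at the start of `buf` and sets `consumed` to the
// bytes used. A Longlink reader would keep the rest and retry once more TCP data arrives.
std::vector<Record> splitRecords(const Bytes& buf, std::size_t& consumed) {
    constexpr std::size_t kHeaderLen = 3, kLengthLen = 2;
    auto at = [&buf](std::size_t i) { return buf.begin() + static_cast<std::ptrdiff_t>(i); };

    std::vector<Record> records;
    std::size_t pos = 0;
    while (buf.size() - pos >= kHeaderLen + kLengthLen) {
        const std::size_t len = (static_cast<std::size_t>(buf[pos + kHeaderLen]) << 8) |
                                buf[pos + kHeaderLen + 1];
        const std::size_t start = pos + kHeaderLen + kLengthLen;
        if (buf.size() - start < len) break;  // record not fully received yet
        Record r;
        r.header.assign(at(pos), at(pos + kHeaderLen));
        r.body.assign(at(start), at(start + len));
        records.push_back(std::move(r));
        pos = start + len;
    }
    consumed = pos;
    return records;
}
```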

1A. MMTLS Records

Records can be identified by their record header, a fixed 3-byte sequence preceding the record contents. We observed 4 different record types, with the following record headers:

Record type                 | Record header
----------------------------+--------------
Handshake-Resumption Record | 19 f1 04
Handshake Record            | 16 f1 04
Data Record                 | 17 f1 04
Alert Record                | 15 f1 04
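As an illustration, the following minimal Python sketch splits an MMTLS packet into records using the headers above. It assumes, based on the annotated Data record sample in Section 2B, that every 3-byte header is followed by a 2-byte big-endian record length and then the record body; `RECORD_TYPES` and `split_records` are hypothetical names of our own, not identifiers from WeChat.

```python
import struct

# Record headers observed in MMTLS traffic.
RECORD_TYPES = {
    b"\x19\xf1\x04": "Handshake-Resumption",
    b"\x16\xf1\x04": "Handshake",
    b"\x17\xf1\x04": "Data",
    b"\x15\xf1\x04": "Alert",
}

def split_records(packet: bytes) -> list[tuple[str, bytes]]:
    """Split one MMTLS packet into (record type, record body) pairs.

    Assumes each 3-byte header is followed by a 2-byte big-endian length.
    """
    records = []
    offset = 0
    while offset < len(packet):
        header = packet[offset:offset + 3]
        if header not in RECORD_TYPES:
            raise ValueError(f"unknown record header {header.hex()} at offset {offset}")
        if offset + 5 > len(packet):
            raise ValueError("truncated record length")
        (length,) = struct.unpack(">H", packet[offset + 3:offset + 5])
        body = packet[offset + 5:offset + 5 + length]
        if len(body) != length:
            raise ValueError("truncated record body")
        records.append((RECORD_TYPES[header], body))
        offset += 5 + length
    return records

# Usage: the Longlink heartbeat Data record from Section 2B (length 0x0020 = 32 bytes).
heartbeat = bytes.fromhex(
    "17f104" "0020"
    "e6557ad6821da7f42b83d4b7785618f3"  # encrypted data
    "1b9427e11ec301a6f6236abc94eb4739"  # 16-byte AES-GCM tag
)
print([(kind, len(body)) for kind, body in split_records(heartbeat)])  # [('Data', 32)]
```

Because a single MMTLS packet may carry several records (as in the ServerHello below), the loop keeps reading headers until the packet is exhausted.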

Handshake records contain metadata and the key establishment material needed for the other party to derive the same shared session key using Diffie-Hellman. Handshake-Resumption records contain sufficient metadata for “resuming” a previously established session by re-using previously established key material. Data records can contain encrypted ciphertext that carries meaningful WeChat request data; some Data records simply contain an encrypted no-op heartbeat. Alert records signal errors or signal that one party intends to end a connection. In MMTLS, all non-handshake records are encrypted, but the key material used differs based on which stage of the handshake has been completed.

Here is an annotated MMTLS packet from the server containing a Handshake record:


Here is an example of a Data record sent from the client to the server:

As an example of how these records interact: the client and server generally exchange Handshake records until the Diffie-Hellman handshake is complete and they have established shared key material. Afterwards, they exchange Data records, encrypted using the shared key material. When either side wants to close the connection, it sends an Alert record. The following section illustrates each record type's usage in more detail.

1B. MMTLS Extensions

As MMTLS' wire protocol is heavily modeled after TLS, it has also borrowed the wire format of “TLS Extensions” to exchange relevant encryption data during the handshake. Specifically, MMTLS uses the TLS Extensions format for the client to communicate its key share (i.e., the client's public key) for Diffie-Hellman, similar to TLS 1.3's key_share extension, and to communicate session data for session resumption, similar to TLS 1.3's pre_shared_key extension. In addition, MMTLS supports Encrypted Extensions, as TLS does, but they are currently unused in MMTLS (i.e., the Encrypted Extensions section is always empty).

2. MMTLS Encryption

This section describes the outer layer of encryption, that is, what keys and encryption functions are used to encrypt and decrypt the ciphertexts found in the “MMTLS Wire Format” section, and how the encryption keys are derived.

The encryption and decryption at this layer occur in the STN module, in a separate spawned “com.tencent.mm:push” process on Android. The spawned process ultimately transmits and receives data over the network. The code for all of the MMTLS Encryption and MMTLS serialization was analyzed from the library libwechatnetwork.so. In particular, we studied the Crypt() function, a central function used for all encryption and decryption whose name we derived from debug logging code. We also hooked all calls to HKDF_Extract() and HKDF_Expand(), the OpenSSL functions for HKDF, in order to understand how keys are derived.

When the “:push” process is spawned, it starts an event loop in HandshakeLoop(), which processes all outgoing and incoming MMTLS Records. We hooked all functions called by this event loop to understand how each MMTLS Record is processed. The code for this study, as well as the internal function addresses identified for the particular version of WeChat we studied, can be found in the Github repository.

Figure 1: Network requests: MMTLS encryption connection over longlink and over shortlink. Each box is an MMTLS Record, and each arrow represents an “MMTLS packet” sent over either Longlink (i.e., a single TCP packet) or shortlink (i.e., in the body of HTTP POST). Once both sides have received the DH keyshare, all further records are encrypted.

2A. Handshake and key establishment

In order for Business-layer Encryption to start sending messages and establish keys, it has to use the MMTLS Encryption tunnel. Since the key material for the MMTLS Encryption has to be established first, the handshakes in this section happen before any data can be sent or encrypted via Business-layer Encryption. The end goal of the MMTLS Encryption handshake discussed in this section is to establish a common secret value that is known only to the client and server.

On a fresh startup, WeChat tries to complete one MMTLS handshake over Shortlink and one over Longlink, resulting in two MMTLS encryption tunnels, each using a different set of encryption keys. For Longlink, after the handshake completes, the same Longlink (TCP) connection is kept open to transport future encrypted data. For Shortlink, the MMTLS handshake is completed in the first HTTP request-response cycle, then the first HTTP connection closes. The established keys are stored by the client and server, and when data needs to be sent over Shortlink, those established keys are used for encryption, then sent over a newly established Shortlink connection. In the remainder of this section, we describe details of the handshakes.

ClientHello

First, the client generates keypairs on the SECP256R1 elliptic curve. Note that these elliptic curve keys are entirely separate pairs from those generated in the Business-layer Encryption section. The client also reads some Resumption Ticket data from a file stored on local storage named psk.key, if it exists. The psk.key file is written to after the first ServerHello is received, so, on a fresh install of WeChat, the resumption ticket is omitted from the ClientHello.

The client then sends a ClientHello message (contained in a Handshake record) over both Shortlink and Longlink simultaneously. Whichever of these two handshakes completes successfully first is the one over which the initial Business-layer Encryption handshake occurs (details of Business-layer Encryption are discussed in Section 4). Both Shortlink and Longlink connections are used afterwards for sending other data.

In both the initial Shortlink and Longlink handshakes, each ClientHello packet contains the following data items:

  • ClientRandom (32 bytes of randomness)
  • Resumption Ticket data read from psk.key, if available
  • Client public key

An abbreviated version of the MMTLS ClientHello is shown below.

16 f1 04 (Handshake Record header) . . .
01 04 f1 (ClientHello) . . .
08 cd 1a 18 f9 1c . . . (ClientRandom) . . .
00 0c c2 78 00 e3 . . . (Resumption Ticket from psk.key) . . .
04 0f 1a 52 7b 55 . . . (Client public key) . . .

Note that the client generates a separate keypair for the Shortlink ClientHello and the Longlink ClientHello. The Resumption Ticket sent by the client is the same on both ClientHello packets because it is always read from the same psk.key file. On a fresh install of WeChat, the Resumption Ticket is omitted since there is no psk.key file.

ServerHello

The client receives a ServerHello packet in response to each ClientHello packet. Each contains:

  • A record containing ServerRandom and Server public key
  • Records containing encrypted server certificate, new resumption ticket, and a ServerFinished message.

An abbreviated version of the MMTLS ServerHello is shown below; a full packet sample with labels can be found in the annotated network capture.

16 f1 04 (Handshake Record header) . . .
02 04 f1 (ServerHello) . . .
2b a6 88 7e 61 5e 27 eb . . . (ServerRandom) . . .
04 fa e3 dc 03 4a 21 d9 . . . (Server public key) . . .
16 f1 04 (Handshake Record header) . . .
b8 79 a1 60 be 6c . . . (ENCRYPTED server certificate) . . .
16 f1 04 (Handshake Record header) . . .
1a 6d c9 dd 6e f1 . . . (ENCRYPTED NEW resumption ticket) . . .
16 f1 04 (Handshake Record header) . . .
b8 79 a1 60 be 6c . . . (ENCRYPTED ServerFinished) . . .

On receiving the server public key, the client generates

secret = ecdh(client_private_key, server_public_key).

Note that since each MMTLS encrypted tunnel uses a different client keypair, the shared secret and any derived keys and IVs differ between MMTLS tunnels. This also means that the Longlink and Shortlink handshakes each compute a different shared secret.
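To make the key agreement concrete, here is a deliberately minimal, pure-Python sketch of ECDH on SECP256R1 (NIST P-256) using only the standard library. It is not constant-time, is for illustration only, and is not WeChat's implementation. It shows that both sides arrive at the same secret, and that a fresh client keypair (as used for each MMTLS tunnel) yields a different secret even against the same server key. Encoding public keys as uncompressed points starting with 04 is our assumption, consistent with the leading 04 byte of the public keys in the ClientHello and ServerHello samples above; all function names are hypothetical.

```python
import secrets

# NIST P-256 (secp256r1) domain parameters: y^2 = x^3 - 3x + b (mod p).
P = 0xFFFFFFFF00000001000000000000000000000000FFFFFFFFFFFFFFFFFFFFFFFF
A = P - 3
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)

def point_add(p1, p2):
    """Add two affine points; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 is the negation of p1
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)

def scalar_mult(k, point):
    """Double-and-add scalar multiplication (NOT constant-time)."""
    result, addend = None, point
    while k:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)
        k >>= 1
    return result

def generate_keypair():
    priv = secrets.randbelow(N - 1) + 1  # private scalar in [1, N-1]
    return priv, scalar_mult(priv, G)

def encode_point(pt) -> bytes:
    """Uncompressed SEC1 encoding: 0x04 | X | Y (65 bytes)."""
    return b"\x04" + pt[0].to_bytes(32, "big") + pt[1].to_bytes(32, "big")

def ecdh(priv, peer_pub) -> bytes:
    """Shared secret = x-coordinate of priv * peer_pub, as 32 bytes."""
    return scalar_mult(priv, peer_pub)[0].to_bytes(32, "big")

# Usage: one server keypair, two fresh client keypairs (e.g., Longlink and Shortlink).
server_priv, server_pub = generate_keypair()
longlink_priv, longlink_pub = generate_keypair()
shortlink_priv, shortlink_pub = generate_keypair()
print(ecdh(longlink_priv, server_pub) == ecdh(server_priv, longlink_pub))   # True
print(ecdh(longlink_priv, server_pub) == ecdh(shortlink_priv, server_pub))  # False
```

A production client would use a vetted library (WeChat's code uses OpenSSL) rather than anything like this.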

Then, the shared secret is used to derive several sets of cryptographic parameters via HKDF, a standard key derivation function that turns a short secret value into a longer stream of key material. In this section, we focus on the handshake parameters. Alongside each set of keys, initialization vectors (IVs) are also generated. The IV is a value that is needed to initialize the AES-GCM encryption algorithm. IVs do not need to be kept secret; however, they need to be random and must never be reused with the same key.

The handshake parameters are generated using HKDF as follows (“handshake key expansion” is a constant string in the program, as are the other double-quoted strings in this section):

key_enc, key_dec, iv_enc, iv_dec = HKDF(secret, 56, “handshake key expansion”)

Using key_dec and iv_dec, the client can decrypt the remainder of the ServerHello records. Once decrypted, the client validates the server certificate. Then, the client also saves the new Resumption Ticket to the file psk.key.
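HKDF is specified in RFC 5869 as an extract step followed by an expand step, which matches the HKDF_Extract()/HKDF_Expand() pair we hooked. The sketch below implements RFC 5869 with Python's standard hmac module and splits the 56-byte output into the four handshake parameters. The hash function (SHA-256), the empty salt, passing the label as the expand-step info, and the 16/16/12/12-byte split (AES-128 keys plus 12-byte AES-GCM IVs, which add up to 56 bytes) are our assumptions, not confirmed details of WeChat's code; the function names are hypothetical.

```python
import hashlib
import hmac

HASH = "sha256"  # assumption: the hash used with HKDF

def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """RFC 5869 extract step: PRK = HMAC-Hash(salt, IKM)."""
    if not salt:
        salt = bytes(hashlib.new(HASH).digest_size)  # RFC 5869 default: HashLen zero bytes
    return hmac.new(salt, ikm, HASH).digest()

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 expand step: T(i) = HMAC-Hash(PRK, T(i-1) | info | i)."""
    hash_len = hashlib.new(HASH).digest_size
    if length > 255 * hash_len:
        raise ValueError("requested HKDF output is too long")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]

def handshake_params(secret: bytes) -> tuple[bytes, bytes, bytes, bytes]:
    """Derive (key_enc, key_dec, iv_enc, iv_dec) from the ECDH shared secret.

    Assumptions: empty salt, the label used as the expand-step info, and a
    16/16/12/12-byte split of the 56-byte output.
    """
    okm = hkdf_expand(hkdf_extract(b"", secret), b"handshake key expansion", 56)
    return okm[:16], okm[16:32], okm[32:44], okm[44:56]

# Usage with a placeholder secret; in practice this is the ECDH output.
key_enc, key_dec, iv_enc, iv_dec = handshake_params(bytes(32))
```

The HKDF functions can be checked against the test vectors in RFC 5869, Appendix A.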

At this point, since the shared secret has been established, the MMTLS Encryption Handshake is considered completed. To start encrypting and sending data, the client derives other sets of parameters via HKDF from the shared secret. The details of which keys are derived and used for which connections are fully specified in these notes where we annotate the keys and connections created on WeChat startup.

2B. Data encryption

After the handshake, MMTLS encrypts data using AES-GCM with a particular key and IV, both tied to the particular MMTLS tunnel. For each record, the IV is incremented by the number of records previously encrypted with this key. This is important because re-using an IV with the same key destroys the confidentiality provided by AES-GCM and can lead to a key recovery attack using the known tag.

ciphertext, tag = AES-GCM(input, key, iv+n)
ciphertext = ciphertext | tag

The 16-byte tag is appended to the end of the ciphertext. This tag is authentication data computed by AES-GCM; it functions as a MAC in that, when verified properly, it provides authentication and integrity. When the record being encrypted is a Data record, input often contains metadata and ciphertext that has already been encrypted as described in the Business-layer Encryption section.
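The per-record IV and the tag layout can be sketched in a few lines of Python. The sketch treats the 12-byte IV as a big-endian integer when computing iv+n; that byte order is our assumption, and the helper names are hypothetical. For comparison, TLS 1.3 (RFC 8446, Section 5.3) instead XORs the 64-bit record sequence number into the IV; both approaches give each record in a connection a distinct nonce as long as the counter never repeats.

```python
TAG_LEN = 16  # AES-GCM tag appended to each encrypted record

def record_iv(base_iv: bytes, n: int) -> bytes:
    """IV for a record after n records were already encrypted with this key.

    Treats the 12-byte IV as a big-endian integer (an assumption) and adds n,
    wrapping modulo 2**96.
    """
    if len(base_iv) != 12:
        raise ValueError("expected a 12-byte AES-GCM IV")
    value = (int.from_bytes(base_iv, "big") + n) % (1 << 96)
    return value.to_bytes(12, "big")

def split_ciphertext_and_tag(record_body: bytes) -> tuple[bytes, bytes]:
    """Undo `ciphertext = ciphertext | tag` by splitting off the last 16 bytes."""
    if len(record_body) < TAG_LEN:
        raise ValueError("record body is shorter than an AES-GCM tag")
    return record_body[:-TAG_LEN], record_body[-TAG_LEN:]

# Usage: the first three records of a connection get consecutive IVs.
base = bytes.fromhex("00112233445566778899aafe")
print([record_iv(base, n).hex()[-4:] for n in range(3)])  # ['aafe', 'aaff', 'ab00']
```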

We separately discuss data encryption in Longlink and Shortlink in the following subsections.

Client-side Encryption for Longlink packets is done using AES-GCM with key_enc and iv_enc derived earlier in the handshake. Client-side Decryption uses key_dec and iv_dec. Below is a sample Longlink (TCP) packet containing a single data record, which carries an encrypted heartbeat message from the server:

17 f1 04                                          RECORD HEADER (of type “DATA”)
00 20                                             RECORD LENGTH
e6 55 7a d6 82 1d a7 f4 2b 83 d4 b7 78 56 18 f3   ENCRYPTED DATA
1b 94 27 e1 1e c3 01 a6 f6 23 6a bc 94 eb 47 39   TAG (MAC)

Within a long-lived Longlink connection, the IV is incremented for each record encrypted. If a new Longlink connection is created, the handshake is restarted and new key material is generated.

Shortlink connections can only contain a single MMTLS packet request and a single MMTLS packet response (via HTTP POST request and response, respectively). After the initial Shortlink ClientHello sent on startup, WeChat sends subsequent ClientHellos in Handshake-Resumption records. These records have the header 19 f1 04 instead of the 16 f1 04 used by the regular ClientHello/ServerHello handshake records.

An abbreviated sample of a Shortlink request packet containing Handshake Resumption is shown below.

19 f1 04 (Handshake Resumption Record header) . . .
01 04 f1 (ClientHello) . . .
9b c5 3c 42 7a 5b 1a 3b . . . (ClientRandom) . . .
71 ae ce ff d8 3f 29 48 . . . (NEW Resumption Ticket) . . .
19 f1 04 (Handshake Resumption Record header) . . .
47 4c 34 03 71 9e . . . (ENCRYPTED Extensions) . . .
17 f1 04 (Data Record header) . . .
98 cd 6e a0 7c 6b . . . (ENCRYPTED EarlyData) . . .
15 f1 04 (Alert Record header) . . .
8a d1 c3 42 9a 30 . . . (ENCRYPTED Alert (ClientFinished)) . . .

Note that, based on our understanding of the MMTLS protocol, the ClientRandom sent in this packet is not used at all by the server, because there is no need to re-run Diffie-Hellman in a resumed session. The Resumption Ticket is used by the server to identify which prior-established shared secret should be used to decrypt the following packet content.

Encryption for Shortlink packets is done using AES-GCM with the handshake parameters key_enc and iv_enc. (Note that, despite their identical names, key_enc and iv_enc here are different from those of the Longlink, since Shortlink and Longlink each complete their own handshake using different elliptic curve client keypairs.) The iv_enc is incremented for each record encrypted. Usually, EarlyData records sent over Shortlink contain ciphertext that has been encrypted with Business-layer Encryption, as well as associated metadata. This metadata and ciphertext are then additionally encrypted at this layer.

WeChat likely calls this data EarlyData internally because the term is borrowed from TLS, where it typically refers to data that is encrypted with a key derived from a pre-shared key before a regular session key has been established via Diffie-Hellman. However, when using Shortlink, no data is ever sent “after the establishment of a regular session key”, so almost all Shortlink data is encrypted and sent in this EarlyData section.

Finally, ClientFinished indicates that the client has finished its side of the handshake. It is an encrypted Alert record with a fixed message that always follows the EarlyData Record. From our reverse-engineering, we found that the handlers for this message referred to it as ClientFinished.

3. Business-layer Request

MMTLS Data Records carry either a “Business-layer request” or heartbeat messages. In other words, if one decrypts the payload of an MMTLS Data Record, the result will often be one of the messages described below.

This Business-layer request contains several metadata parameters that describe the purpose of the request, including the internal URI and the request type number, which we briefly described in the “Launching a WeChat network request” section.

When the user is logged in, a Business-layer request has the following format:

00 00 00 7b                 (total data length)
00 24                       (URI length)
/cgi-bin/micromsg-bin/...   (URI)
00 12                       (hostname length)
sgshort.wechat.com          (hostname)
00 00 00 3D                 (length of rest of data)
BF B6 5F                    (request flags)
41 41 41 41                 (user ID)
42 42 42 42                 (device ID)
FC 03 48 02 00 00 00 00     (cookie)
1F 9C 4C 24 76 0E 00        (cookie)
D1 05 varint                (request_type)
0E 0E 00 02                 (4 more varints)
BD 95 80 BF 0D varint       (signature)
FE                          (flag)
80 D2 89 91
04 00 00                    (marks start of data)
08 A6 29 D1 A4 2A CA F1 ... (ciphertext)

Responses are formatted very similarly:

bf b6 5f                    (flags)
41 41 41 41                 (user ID)
42 42 42 42                 (device ID)
fc 03 48 02 00 00 00 00     (cookie)
1f 9c 4c 24 76 0e 00        (cookie)
fb 02 varint                (request_type)
35 35 00 02                 (4 more varints)
a9 ad 88 e3 08 varint       (signature)
fe                          (flag)
ba da e0 93
04 00 00                    (marks start of data)
b6 f8 e9 99 a1 f4 d1 20 ... (ciphertext)

This request then contains another encrypted ciphertext, which is encrypted by what we refer to as Business-layer Encryption. Business-layer Encryption is separate from the system we described in the MMTLS Encryption section. The signature mentioned above is the output of genSignature(), which is discussed in the “Integrity check” section. Pseudocode for the serialization schemes and more samples of WeChat’s encrypted request header can be found in our Github repository.
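The length-prefixed fields at the start of a logged-in request are straightforward to parse. The Python sketch below assumes that the length fields are big-endian (consistent with 00 12 preceding the 18-byte hostname sgshort.wechat.com) and that the fields marked varint use the same base-128 encoding as Protobuf; under that assumption, the request_type bytes D1 05 decode to 721. Both assumptions, the function names, and the sample URI in the usage lines are hypothetical.

```python
import struct

def decode_varint(data: bytes, offset: int = 0) -> tuple[int, int]:
    """Decode a base-128 (Protobuf-style) varint; return (value, new offset)."""
    value, shift = 0, 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        value |= (byte & 0x7F) << shift  # low 7 bits carry data
        if byte < 0x80:                  # high bit clear: last byte
            return value, offset
        shift += 7

def parse_request_prefix(data: bytes) -> dict:
    """Parse the length-prefixed fields at the start of a logged-in request."""
    (total_len,) = struct.unpack_from(">I", data, 0)
    (uri_len,) = struct.unpack_from(">H", data, 4)
    uri = data[6:6 + uri_len].decode("ascii")
    offset = 6 + uri_len
    (host_len,) = struct.unpack_from(">H", data, offset)
    host = data[offset + 2:offset + 2 + host_len].decode("ascii")
    offset += 2 + host_len
    (rest_len,) = struct.unpack_from(">I", data, offset)
    return {"total_len": total_len, "uri": uri, "host": host,
            "rest_len": rest_len, "rest_offset": offset + 4}

# Usage with a synthetic request; the URI is a hypothetical placeholder.
uri, host, rest = b"/cgi-bin/micromsg-bin/example", b"sgshort.wechat.com", b"\xbf\xb6\x5f"
request = (struct.pack(">I", 0)  # total data length field (not validated here)
           + struct.pack(">H", len(uri)) + uri
           + struct.pack(">H", len(host)) + host
           + struct.pack(">I", len(rest)) + rest)
print(parse_request_prefix(request)["host"])  # sgshort.wechat.com
print(decode_varint(bytes.fromhex("d105")))   # (721, 2)
```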

4. Business-layer Encryption

WeChat Crypto diagrams (inner layer)

This section describes how the Business-layer requests described in Section 3 are encrypted and decrypted, and how the keys are derived. We note that the set of keys and encryption processes introduced in this section are completely separate from those referred to in the MMTLS Encryption section. Generally, for Business-layer Encryption, much of the protocol logic is handled in the Java code, which calls out to the C++ libraries for encryption and decryption calculations. By contrast, MMTLS Encryption is handled entirely in C++ libraries and runs in a different process. There is very little interplay between these two layers of encryption.

The Business-layer Encryption has two modes using different cryptographic processes: Asymmetric Mode and Symmetric Mode. To transition into Symmetric Mode, WeChat needs to perform an Autoauth request. Upon startup, WeChat typically goes through the three following stages:

  1. Before the user logs in to their account, Business-layer Encryption first uses asymmetric cryptography to derive a shared secret via static Diffie-Hellman (static DH), then uses the shared secret as a key to AES-GCM encrypt the data. We name this Asymmetric Mode. In Asymmetric Mode, the client derives a new shared secret for each request.
  2. Using Asymmetric Mode, WeChat can send an Autoauth request, to which the server returns an Autoauth response containing a session_key.
  3. After the client obtains session_key, Business-layer Encryption uses it to AES-CBC encrypt the data. We name this Symmetric Mode since it only uses symmetric cryptography. Under Symmetric Mode, the same session_key can be used for multiple requests.

For Asymmetric Mode, we performed dynamic and static analysis of C++ functions in libwechatmm.so; in particular the HybridEcdhEncrypt() and HybridEcdhDecrypt() functions, which call AesGcmEncryptWithCompress() / AesGcmDecryptWithUncompress(), respectively.

For Symmetric Mode, the requests are handled in pack(), unpack(), and genSignature() functions in libMMProtocalJNI.so. Generally, pack() handles outgoing requests, and unpack() handles incoming responses to those requests. They also perform encryption/decryption. Finally, genSignature() computes a checksum over the full request. In the Github repository, we’ve uploaded pseudocode for pack, AES-CBC encryption, and the genSignature routine.

The Business-layer Encryption is also tightly integrated with WeChat's user authentication system. The user needs to log in to their account before the client is able to send an Autoauth request. Clients that have not logged in use Asymmetric Mode exclusively. For clients that have already logged in, the first Business-layer packet is most often an Autoauth request encrypted using Asymmetric Mode; however, the second and subsequent Business-layer packets are encrypted using Symmetric Mode.

Figure 2: Business-layer encryption, logged-out, logging-in, and logged-in: Swimlane diagrams showing at a high-level what Business-layer Encryption requests look like, including which secrets are used to generate the key material used for encryption. 🔑secret is generated via DH(static server public key, client private key), and 🔑new_secret is DH(server public key, client private key). 🔑session is decrypted from the first response when logged-in. Though it isn’t shown above, 🔑new_secret is also used in genSignature() when logged-in; this signature is sent with request and response metadata.

4A. Business-layer Encryption, Asymmetric Mode

Before the user logs in to their WeChat account, the Business-layer Encryption process uses a static server public key and generates a new client keypair to agree on a static Diffie-Hellman shared secret for every WeChat network request. The shared secret is run through the HKDF function, and any data is encrypted with AES-GCM and sent alongside the generated client public key so that the server can calculate the shared secret.

For each request, the client generates a public/private keypair for use with ECDH. The client also has a static server public key pinned in the application. The client then calculates an initial secret:

secret = ECDH(static_server_pub, client_priv)
hash = sha256(client_pub)
client_random = <32 randomly generated bytes>
derived_key = HKDF(secret)

derived_key is then used to AES-GCM encrypt the data, which we describe in detail in the next section.

4B. Business-layer Encryption, obtaining session_key

If the client is logged in (i.e., the user has logged in to a WeChat account on a previous app run), the first request will be a very large data packet authenticating the client to the server (referred to as Autoauth in WeChat internals), which also contains key material. We refer to this request as the Autoauth request. In addition, the client pulls a locally stored key, autoauth_key, whose provenance we did not trace, since it does not seem to be used other than in this instance. The key for encrypting this initial request (authrequest_data) is derived_key, calculated in the same way as in Section 4A. The encryption described below is Asymmetric Mode encryption, albeit a special case in which the data is authrequest_data.

Below is an abbreviated version of a serialized and encrypted Autoauth request:

    08 01 12 . . . [Header metadata]
    04 46 40 96 4d 3e 3e 7e [client_publickey] . . .
    fa 5a 7d a7 78 e1 ce 10 . . . [ClientRandom encrypted w secret]
    a1 fb 0c da . . .               [IV]
    9e bc 92 8a 5b 81 . . .         [tag]
    db 10 d3 0f f8 e9 a6 40 . . . [ClientRandom encrypted w autoauth_key]
    75 b4 55 30 . . .               [IV]
    d7 be 7e 33 a3 45 . . .         [tag]
    c1 98 87 13 eb 6f f3 20 . . . [authrequest_data encrypted w derived_key]
    4c ca 86 03 . .                 [IV]
    3c bc 27 4f 0e 7b . . .         [tag]

A full sample of the Autoauth request and response at each layer of encryption can be found in the Github repository. As noted above, autoauth_key does not seem to be actively used outside of encrypting this particular request; we suspect it is vestigial from a legacy encryption protocol used by WeChat.

The client encrypts here using AES-GCM with a randomly generated IV, and uses a SHA256 hash of the preceding message contents as AAD. At this stage, the messages (including the ClientRandom messages) are always ZLib compressed before encryption.

iv = <12 random bytes>
compressed = zlib_compress(plaintext)
ciphertext, tag = AESGCM_encrypt(compressed, aad = hash(previous), derived_key, iv)

In the above, previous is the header of the request (i.e. all header bytes preceding the 04 00 00 marker of data start). The client appends the 12-byte IV, then the 16-byte tag, onto the ciphertext. This tag can be used by the server to verify the integrity of the ciphertext, and essentially functions as a MAC.
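The steps around the AES-GCM call can be sketched in Python as follows. Python's standard library has no AES, so this sketch covers only compression, the SHA256-based AAD, the random 12-byte IV, and the ciphertext | IV | tag layout; the AES-GCM operation itself would be performed by a cryptographic library (for example, AESGCM in the third-party cryptography package, whose encrypt method returns the ciphertext with the 16-byte tag appended). The helper names are hypothetical.

```python
import hashlib
import os
import zlib

IV_LEN, TAG_LEN = 12, 16

def prepare(plaintext: bytes, previous: bytes) -> tuple[bytes, bytes, bytes]:
    """Return (compressed plaintext, AAD, fresh random IV) for one AES-GCM call."""
    compressed = zlib.compress(plaintext)    # always compressed before encryption
    aad = hashlib.sha256(previous).digest()  # SHA256 of the preceding header bytes
    iv = os.urandom(IV_LEN)                  # randomly generated 12-byte IV
    return compressed, aad, iv

def frame(ciphertext: bytes, iv: bytes, tag: bytes) -> bytes:
    """Serialize an encrypted field as ciphertext | IV | tag."""
    if len(iv) != IV_LEN or len(tag) != TAG_LEN:
        raise ValueError("unexpected IV or tag length")
    return ciphertext + iv + tag

def unframe(field: bytes) -> tuple[bytes, bytes, bytes]:
    """Split a serialized field back into (ciphertext, IV, tag)."""
    if len(field) < IV_LEN + TAG_LEN:
        raise ValueError("field too short to hold an IV and a tag")
    body, tag = field[:-TAG_LEN], field[-TAG_LEN:]
    return body[:-IV_LEN], body[-IV_LEN:], tag
```

Note that, unlike the MMTLS layer, this layer uses a fresh random IV per message, which is why the IV has to travel alongside the ciphertext.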

4B1. Obtaining session_key: Autoauth Response

The response to autoauth is serialized similarly to the request:

08 01 12 . . . [Header metadata]
04 46 40 96 4d 3e 3e 7e [new_server_pub] . . .
c1 98 87 13 eb 6f f3 20 . . . [authresponse_data encrypted w new_secret]
4c ca 86 03 . . [IV]
3c bc 27 4f 0e 7b . . . [tag]

With the newly received server public key (new_server_pub), which is different from the static_server_pub hardcoded in the app, the client then derives a new secret (new_secret). new_secret is then used as the key to AES-GCM decrypt authresponse_data. The client can also verify authresponse_data with the given tag.

new_secret = ECDH(new_server_pub, client_privatekey)
authresponse_data = AESGCM_decrypt(aad = hash(authrequest_data), new_secret, iv)

authresponse_data is a serialized Protobuf containing a lot of important data for WeChat to start, starting with a helpful Everything is ok status message. A full sample of this Protobuf can be found in the Github repository. Most importantly, authresponse_data contains session_key, which is the key used for future AES-CBC encryption under Symmetric Mode. From here on, new_secret is only used in genSignature(), which is discussed below in Section 4C1, “Integrity check”.

We measured the entropy of the session_key provided by the server, as it is used for future encryption. This key exclusively uses printable ASCII characters, and is thus limited to roughly 100 bits of entropy.
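A short worked calculation shows where a figure of roughly 100 bits comes from. It assumes (our assumption, since the key length is not stated above) a 16-byte key whose characters are drawn uniformly and independently from the 95 printable ASCII characters, giving 16 * log2(95), or about 105 bits, versus 128 bits for 16 uniformly random bytes. This is an upper bound; any non-uniformity in how the server picks characters lowers it further.

```python
import math

def key_entropy_bits(key_len: int, alphabet_size: int) -> float:
    """Entropy of a key whose characters are drawn uniformly and independently."""
    return key_len * math.log2(alphabet_size)

PRINTABLE_ASCII = 0x7E - 0x20 + 1  # 95 characters, space through tilde
KEY_LEN = 16                       # assumption: a 16-byte AES-128 key

print(f"{key_entropy_bits(KEY_LEN, PRINTABLE_ASCII):.1f}")  # 105.1
print(f"{key_entropy_bits(KEY_LEN, 256):.1f}")              # 128.0
```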

The WeChat code refers to three different keys: client_session, server_session, and single_session. Generally, client_session refers to the client_publickey, server_session refers to the shared secret key generated using ECDH i.e. new_secret, and single_session refers to the session_key provided by the server.

4C. Business-layer Encryption, Symmetric Mode

After the client receives session_key from the server, future data is encrypted using Symmetric Mode. Symmetric Mode encryption is mostly done using AES-CBC instead of AES-GCM, with the exception of some large files being encrypted with AesGcmEncryptWithCompress(). As AesGcmEncryptWithCompress() requests are the exception, we focus on the more common use of AES-CBC.

Specifically, the Symmetric Mode uses AES-CBC with PKCS-7 padding, with the session_key as a symmetric key:

ciphertext = AES-CBC(PKCS7_pad(plaintext), session_key, iv = session_key)

In other words, session_key is used both as the key and as the IV for encryption.
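PKCS#7 padding, which this mode uses with AES-CBC, is simple to state in code. The following Python sketch (hypothetical helper names) pads to the 16-byte AES block size and validates the padding on removal. If a receiver reveals, through distinct errors or timing, whether this validation failed on attacker-supplied ciphertexts, that signal is exactly what a padding oracle attack exploits (see the Security issues section).

```python
BLOCK = 16  # AES block size in bytes

def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Append n bytes of value n (1 <= n <= block) so the length is a multiple of block."""
    n = block - len(data) % block
    return data + bytes([n]) * n

def pkcs7_unpad(padded: bytes, block: int = BLOCK) -> bytes:
    """Strip PKCS#7 padding, raising ValueError if it is malformed."""
    if not padded or len(padded) % block:
        raise ValueError("length is not a positive multiple of the block size")
    n = padded[-1]
    if not 1 <= n <= block or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid padding")
    return padded[:-n]
```

A full block of padding is added when the input is already block-aligned, so unpadding is always unambiguous.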

4C1. Integrity check

In Symmetric Mode, a function called genSignature() calculates a pseudo-integrity code on the plaintext. This function first calculates the MD5 hash of WeChat’s assigned user ID for the logged-in user (uin), new_secret, and the plaintext length. Then, genSignature() uses Adler32, a checksumming function, on the MD5 hash concatenated with the plaintext.

signature = adler32(md5(uin | new_secret | plaintext_len) |
            plaintext)

The result from Adler32 is concatenated to the ciphertext as metadata (see Section 3 for how it is included in the request and response headers), and is referred to as a signature in WeChat's codebase. We note that although it is referred to as a signature, it does not provide any cryptographic properties; details can be found in the Security issues section. The full pseudocode for this function can also be found in the Github repository.
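The formula above can be mirrored in Python with the standard library's hashlib.md5 and zlib.adler32. How uin and the plaintext length are serialized before hashing is our assumption (4-byte big-endian integers), and gen_signature is a hypothetical name; the authoritative pseudocode is in the Github repository. The sketch makes one structural point visible: the only secret input, new_secret, enters through a fixed 16-byte MD5 prefix, and everything after that is an unkeyed 32-bit checksum over the plaintext.

```python
import hashlib
import zlib

def gen_signature(uin: int, new_secret: bytes, plaintext: bytes) -> int:
    """Sketch of signature = adler32(md5(uin | new_secret | plaintext_len) | plaintext).

    Assumption: uin and plaintext_len are each serialized as 4-byte big-endian integers.
    """
    digest = hashlib.md5(
        uin.to_bytes(4, "big") + new_secret + len(plaintext).to_bytes(4, "big")
    ).digest()                               # 16 bytes; the only keyed part
    return zlib.adler32(digest + plaintext)  # unkeyed checksum over prefix | plaintext

# Usage with placeholder values.
sig = gen_signature(1111111111, bytes(32), b"serialized protobuf bytes")
```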

5. Protobuf data payload

The input to Business-layer Encryption is generally a serialized Protobuf, optionally compressed with Zlib. When the user is logged in, many of the Protobufs sent to the server contain the following header data:

"1": {
    "1": "\u0000",
    "2": "1111111111", # User ID (assigned by WeChat)
    "3": "AAAAAAAAAAAAAAA\u0000", # Device ID (assigned by WeChat)
    "4": "671094583", # Client Version
    "5": "android-34", # Android Version
    "6": "0"
    },

The Protobuf structure is defined in each API’s corresponding RR class, as we previously mentioned in the “Launching a WeChat network request” section.

6. Putting it all together

In the diagram below, we show the network flow for the most common case of opening the WeChat application. To avoid further complicating the diagram, HKDF derivations are not shown; for instance, when “🔑mmtls” is used, HKDF is used to derive a key from “🔑mmtls”, and the derived key is used for encryption. The specifics of how keys are derived, and which derived keys are used to encrypt which data, can be found in these notes.

Figure 3: Swimlane diagram demonstrating the encryption setup and network flow of the most common case (user is logged in, opens WeChat application).

We note that other configurations are possible. For instance, we have observed that if the Longlink MMTLS handshake completes first, the Business-layer “Logging-in” request and response can occur over the Longlink connection instead of over several Shortlink connections. In addition, if the user is logged out, Business-layer requests are simply encrypted with 🔑secret (resembling Shortlink 2 requests).

Security issues

In this section, we outline potential security issues and privacy weaknesses we identified with the construction of the MMTLS encryption and Business-layer encryption layers. There could be other issues as well.

Issues with MMTLS encryption

Below we detail the issues we found with WeChat’s MMTLS encryption.

Deterministic IV

The MMTLS encryption process generates a single IV once per connection and then increments the IV for each subsequent record encrypted in that connection. Generally, NIST recommends not using a wholly deterministic derivation for IVs in AES-GCM since it is easy to accidentally re-use IVs. In the case of AES-GCM, reuse of the (key, IV) tuple is catastrophic as it allows key recovery from the AES-GCM authentication tags. Since these tags are appended to AES-GCM ciphertexts for authentication, this enables plaintext recovery from as few as 2 ciphertexts encrypted with the same key and IV pair.

In addition, Bellare and Tackmann have shown that the use of a deterministic IV can make it possible for a powerful adversary to brute-force a particular (key, IV) combination. This type of attack becomes relevant when a cryptosystem is deployed at a very large scale (i.e., the size of the Internet), so that a very large pool of (key, IV) combinations is in use. Since WeChat has over a billion users, this order of magnitude puts the attack within the realm of feasibility.

Lack of forward secrecy

Forward secrecy is generally expected of modern communications protocols, so that the later compromise of key material does not expose data from earlier sessions. TLS itself is generally forward-secret by design, except in the case of the first packet of a “resumed” session. This first packet is encrypted with a “pre-shared key” (PSK) established during a previous handshake.

MMTLS makes heavy use of PSKs by design. Since the Shortlink transport format only supports a single round-trip of communication (via a single HTTP POST request and response), any encrypted data sent via the transport format is encrypted with a pre-shared key. Since leaking the shared `PSK_ACCESS` secret would enable a third-party to decrypt any EarlyData sent across multiple MMTLS connections, data encrypted with the pre-shared key is not forward secret. The vast majority of records encrypted via MMTLS are sent via the Shortlink transport, which means that the majority of network data sent by WeChat is not forward-secret between connections. In addition, when opening the application, WeChat creates a single long-lived Longlink connection. This long-lived Longlink connection is open for the duration of the WeChat application, and any encrypted data that needs to be sent is sent over the same connection. Since most WeChat requests are either encrypted using (A) a session-resuming PSK or (B) the application data key of the long-lived Longlink connection, WeChat’s network traffic often does not retain forward-secrecy between network requests.

Issues with Business-layer encryption

On its own, the business-layer encryption construction, in particular its Symmetric Mode AES-CBC construction, has many severe issues. Since the requests made by WeChat are double-encrypted, and these issues only affect the inner, business layer of encryption, we did not find an immediate way to exploit them. However, in older versions of WeChat, which exclusively used business-layer encryption, these issues would be exploitable.

Metadata leak

Business-layer encryption does not encrypt metadata such as the user ID and request URI, as shown in the “Business-layer request” section. The WeChat developers themselves acknowledge this issue as one of the motivations for developing MMTLS encryption.

Forgeable genSignature integrity check

While the purpose of the genSignature code is not entirely clear, if it is being used for authentication (since the ecdh_key is included in the MD5) or for integrity, it fails at both. A valid forgery can be calculated from any known plaintext without knowledge of the ecdh_key. Suppose the client generates the following for some known plaintext message plaintext:

sig = adler32(md5(uin | ecdh_key | plaintext_len) | plaintext)

We can do the following to forge the signature evil_sig for some evil_plaintext with length plaintext_len:

evil_sig = sig - adler32(plaintext) + adler32(evil_plaintext)

Subtracting and adding adler32 checksums is achievable by solving a system of equations when the message is short. Code for subtracting and adding adler32 checksums, thereby forging this integrity check, can be found in adler.py in our GitHub repository.
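
The forgery can be checked with Python's `zlib.adler32`. This sketch assumes the attacker knows `plaintext` and `sig`, and that `evil_plaintext` has the same length. Adler-32 consists of two 16-bit sums, A and B, each reduced modulo 65521; for a fixed prefix and a fixed message length, both sums shift by a constant that does not depend on the message contents. The subtraction and addition in the formula for evil_sig therefore work when applied to each sum separately modulo 65521, rather than to the packed 32-bit value. The values of `uin` and `ecdh_key`, and the way `plaintext_len` is encoded in `gen_signature`, are hypothetical placeholders, and this per-sum approach is an illustration, not necessarily how adler.py does it.

```python
import hashlib
import os
import zlib

MOD_ADLER = 65521  # Adler-32 modulus


def _sums(checksum: int):
    """Split an Adler-32 value into its (A, B) 16-bit sums."""
    return checksum & 0xFFFF, checksum >> 16


def forge_sig(sig: int, plaintext: bytes, evil_plaintext: bytes) -> int:
    """evil_sig = sig - adler32(plaintext) + adler32(evil_plaintext), per sum, mod 65521."""
    if len(plaintext) != len(evil_plaintext):
        raise ValueError("forgery assumes equal-length plaintexts")
    a_sig, b_sig = _sums(sig)
    a_old, b_old = _sums(zlib.adler32(plaintext))
    a_new, b_new = _sums(zlib.adler32(evil_plaintext))
    a = (a_sig - a_old + a_new) % MOD_ADLER
    b = (b_sig - b_old + b_new) % MOD_ADLER
    return (b << 16) | a


# Client side (hypothetical values): the attacker never sees ecdh_key.
uin = b"1234567"
ecdh_key = os.urandom(32)


def gen_signature(message: bytes) -> int:
    """sig = adler32(md5(uin | ecdh_key | plaintext_len) | plaintext)"""
    prefix = hashlib.md5(uin + ecdh_key + str(len(message)).encode()).digest()
    return zlib.adler32(prefix + message)


plaintext = b"hello from alice"
evil_plaintext = b"hello from mallo"  # same length (16 bytes)

sig = gen_signature(plaintext)
evil_sig = forge_sig(sig, plaintext, evil_plaintext)
print(evil_sig == gen_signature(evil_plaintext))  # True
```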

Possible AES-CBC padding oracle

Since AES-CBC is used alongside PKCS7 padding, it is possible that the use of this encryption on its own would be susceptible to an AES-CBC padding oracle, which can lead to recovery of the encrypted plaintext. Earlier this year, we found that another custom cryptography scheme developed by a Tencent company was susceptible to this exact attack.
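
To show why a padding oracle is dangerous, here is a generic sketch of the classic CBC padding-oracle attack on a single block. It assumes a hypothetical server that reveals only whether the decrypted PKCS7 padding is valid; it is not an attack on WeChat, and it does not claim that WeChat exposes such an oracle. A keyed HMAC stands in for the AES block decryption D, because the attack never looks inside D. It relies only on the CBC relation P = D(C) XOR IV, where IV is the previous ciphertext block. Since the stand-in is not invertible, the toy server picks the ciphertext block first and derives a matching IV; for a real block cipher the relation is the same. The attacker recovers D(C) one byte at a time from the end by forging IV bytes until the padding check passes.

```python
import hashlib
import hmac
import os

BLOCK = 16


def pkcs7_pad(data: bytes) -> bytes:
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def pkcs7_valid(data: bytes) -> bool:
    n = data[-1]
    return 1 <= n <= BLOCK and data[-n:] == bytes([n]) * n


class PaddingOracle:
    """Hypothetical server: CBC-decrypts one block, reveals only padding validity."""

    def __init__(self, key: bytes):
        self._key = key

    def _decrypt_block(self, block: bytes) -> bytes:
        # Stand-in for AES decryption D(C); the attack treats it as a black box.
        return hmac.new(self._key, block, hashlib.sha256).digest()[:BLOCK]

    def make_ciphertext(self, secret: bytes):
        # Choose C at random, then the IV satisfying pad(secret) = D(C) XOR IV.
        block = os.urandom(BLOCK)
        iv = bytes(a ^ b for a, b in zip(self._decrypt_block(block), pkcs7_pad(secret)))
        return iv, block

    def padding_ok(self, iv: bytes, block: bytes) -> bool:
        plain = bytes(a ^ b for a, b in zip(self._decrypt_block(block), iv))
        return pkcs7_valid(plain)


def recover_block(oracle: PaddingOracle, iv: bytes, block: bytes) -> bytes:
    inter = bytearray(BLOCK)  # D(block), learned one byte at a time from the end
    for pos in range(BLOCK - 1, -1, -1):
        pad = BLOCK - pos
        for guess in range(256):
            forged = bytearray(BLOCK)
            for k in range(pos + 1, BLOCK):
                forged[k] = inter[k] ^ pad  # make the already-known tail decrypt to `pad`
            forged[pos] = guess
            if not oracle.padding_ok(bytes(forged), block):
                continue
            if pad == 1:
                forged[pos - 1] ^= 0xFF  # rule out an accidental \x02\x02-style match
                if not oracle.padding_ok(bytes(forged), block):
                    continue
            inter[pos] = guess ^ pad
            break
        else:
            raise RuntimeError("no valid padding found")
    return bytes(a ^ b for a, b in zip(inter, iv))


oracle = PaddingOracle(os.urandom(16))
iv, ciphertext = oracle.make_ciphertext(b"top secret msg")
print(recover_block(oracle, iv, ciphertext))  # b'top secret msg\x02\x02'
```

Each byte costs at most 256 oracle queries, so a 16-byte block falls in at most about 4,096 queries, without the key ever being involved.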

Key, IV re-use in block cipher mode

Re-using the key as the IV for AES-CBC, and re-using the same key for all encryption in a given session (i.e., the length of time that the user has the application open), introduces privacy issues for encrypted plaintexts. For instance, since the key and the IV provide all the randomness, re-using both means that two identical plaintexts will encrypt to the same ciphertext. In addition, due to the use of CBC mode in particular, two plaintexts that share an identical prefix of N blocks will encrypt to the same first N ciphertext blocks.
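
The determinism described above can be demonstrated with a short sketch of CBC encryption in which the key doubles as the IV. An HMAC-SHA256 truncated to 16 bytes stands in for the AES-128 block function (it cannot decrypt, but encryption is all this demonstration needs), and the messages are hypothetical.

```python
import hashlib
import hmac
import os

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES-128 block encryption: a keyed PRF, not a real cipher.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def pkcs7_pad(data: bytes) -> bytes:
    n = BLOCK - len(data) % BLOCK
    return data + bytes([n]) * n


def cbc_encrypt_key_as_iv(key: bytes, plaintext: bytes) -> bytes:
    """CBC encryption where the session key is also used as the IV."""
    prev = key  # IV == key, so nothing random is added per message
    out = bytearray()
    padded = pkcs7_pad(plaintext)
    for i in range(0, len(padded), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(padded[i:i + BLOCK], prev))
        prev = toy_block_encrypt(key, mixed)
        out += prev
    return bytes(out)


session_key = os.urandom(16)  # one key for the whole session
m1 = b"A" * 32 + b"request one"  # first two blocks identical
m2 = b"A" * 32 + b"request two"
c1 = cbc_encrypt_key_as_iv(session_key, m1)
c2 = cbc_encrypt_key_as_iv(session_key, m2)

print(cbc_encrypt_key_as_iv(session_key, m1) == c1)  # True: same plaintext, same ciphertext
print(c1[:32] == c2[:32])  # True: shared 2-block prefix leaks
```

With a fresh random IV per message, an eavesdropper could not tell from the ciphertexts alone that these messages repeat or share a prefix.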

Encryption key issues

It is highly unconventional for the server to choose the encryption key used by the client. Moreover, we note that the encryption key generated by the server (the “session key”) exclusively uses printable ASCII characters. Thus, even though the key is 128 bits long, its entropy is at most 106 bits.
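
The entropy bound follows from counting characters. Assuming “printable ASCII” means the 95 characters from 0x20 (space) to 0x7E (~), a 16-byte key drawn from that set has at most 16 log2(95) bits of entropy:

```python
import math

PRINTABLE_ASCII = len(range(0x20, 0x7F))  # 95 characters, space through '~'
KEY_BYTES = 16  # 128-bit AES key

max_entropy_bits = KEY_BYTES * math.log2(PRINTABLE_ASCII)
print(f"{max_entropy_bits:.1f}")  # 105.1
```

This is consistent with the bound of at most 106 bits stated above, roughly 23 bits short of a uniformly random 128-bit key.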

No forward secrecy

As mentioned in the previous section, forward secrecy is a standard property of modern network encryption. When the user is logged in, all communication with WeChat at this encryption layer is done with the exact same key. The client does not receive a new key until the user closes and restarts WeChat.

Other versions of WeChat

To confirm our findings, we also tested our decryption code on WeChat 8.0.49 for Android (released April 2024) and found that the MMTLS network format matches that used by WeChat 8.0.49 for iOS.

Previous versions of WeChat network encryption

To understand how WeChat’s complex cryptosystems are tied together, we also briefly reverse-engineered an older version of WeChat that did not utilize MMTLS. The newest version of WeChat that did not utilize MMTLS was v6.3.16, released in 2016. Our full notes on this reverse-engineering can be found here.

While logged out, requests largely used the Business-layer Encryption cryptosystem, with RSA public-key encryption rather than static Diffie-Hellman plus symmetric encryption via AES-GCM. We observed requests to the internal URIs cgi-bin/micromsg-bin/encryptcheckresupdate and cgi-bin/micromsg-bin/getkvidkeystrategyrsa.

There was also another encryption mode used, DES with a static key. This mode was used for sending crash logs and memory stacks; POST requests to the URI /cgi-bin/mmsupport-bin/stackreport were encrypted using DES.

We were not able to log in to this version for dynamic analysis, but from our static analysis, we determined that the encryption behaves the same as Business-layer Encryption when logged in (i.e., using a session_key provided by the server for AES-CBC encryption).

Discussion

Why does Business-layer encryption matter?

Since Business-layer encryption is wrapped in MMTLS, why should it matter whether or not it is secure? First, from our study of previous versions of WeChat, Business-layer encryption was the sole layer of encryption for WeChat network requests until 2016. Second, given that Business-layer encryption exposes the internal request URI unencrypted, one possible architecture for WeChat would be to host different internal servers to handle different types of network requests (corresponding to different “requestType” values and different cgi-bin request URLs). It could be the case, for instance, that after MMTLS is terminated at the front WeChat servers (which handle MMTLS decryption), the inner WeChat request forwarded to the corresponding internal WeChat server is not re-encrypted, and is therefore protected solely by Business-layer encryption. A network eavesdropper, or network tap, placed within WeChat’s intranet could then attack the Business-layer encryption on these forwarded requests. However, this scenario is purely conjectural. Tencent’s response to our disclosure addresses issues in Business-layer encryption and indicates that they are gradually migrating from the more problematic AES-CBC to AES-GCM, which suggests that Tencent shares this concern.

Why not use TLS?

According to public documentation and confirmed by our own findings, MMTLS (the “Outer layer” of encryption) is based heavily on TLS 1.3. In fact, the document demonstrates that the architects of MMTLS have a decent understanding of asymmetric cryptography in general.

The document explains why TLS was not used. Because the majority of WeChat data transmissions need only one request-response cycle (i.e., Shortlink), the way WeChat uses network requests necessitates something like 0-RTT session resumption. With MMTLS, only the one round trip needed to establish the underlying TCP connection is required before any application data can be sent; according to this document, introducing another round trip for the TLS 1.2 handshake was a non-starter.

Fortunately, TLS1.3 proposes a 0-RTT (no additional network delay) method for the protocol handshake. In addition, the protocol itself provides extensibility through the version number, CipherSuite, and Extension mechanisms. However, TLS1.3 is still in draft phases, and its implementation may still be far away. TLS1.3 is also a general-purpose protocol for all apps, given the characteristics of WeChat, there is great room for optimization. Therefore, at the end, we chose to design and implement our own secure transport protocol, MMTLS, based on the TLS1.3 draft standard. [originally written in Chinese]

However, even when that document was written in 2016, TLS 1.2 did provide an option for session resumption. In addition, since WeChat controls both the servers and the clients, it does not seem unreasonable that they could have deployed the fully-fledged TLS 1.3 implementations that were being tested at the time, even though the IETF draft was incomplete.

Despite the best efforts of MMTLS’s architects, the security protocols used by WeChat generally seem both less performant and less secure than TLS 1.3. Designing a secure and performant transport protocol is no easy feat.

Performing an extra round trip for a handshake has been a perennial issue for application developers. The TCP and TLS handshakes each require a single round trip, meaning that two round trips must complete before data can be sent on a new connection. Today, TLS-over-QUIC combines the transport-layer and encryption-layer handshakes, requiring only a single handshake. QUIC provides the best of both worlds: strong, forward-secret encryption, and half the number of round trips needed for secure communication. Our recommendation would be for WeChat to migrate to a standard QUIC implementation.
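
A back-of-the-envelope sketch of the round-trip argument. It counts only the handshake round trips described in this section plus one round trip for the request and its response, and it ignores processing time, packet loss, and TCP Fast Open; the 100 ms RTT is an arbitrary example value. The Shortlink-style row models the case described earlier, in which PSK-encrypted application data is sent right after the TCP handshake.

```python
HANDSHAKE_RTTS = {
    "TCP + TLS handshake": 2,  # one round trip each, as described above
    "QUIC (combined handshake)": 1,
    "TCP + 0-RTT PSK data (Shortlink-style)": 1,
}


def time_to_response_ms(handshake_rtts: int, rtt_ms: float) -> float:
    """Handshake round trips plus one more for the request and its response."""
    return (handshake_rtts + 1) * rtt_ms


for name, rtts in HANDSHAKE_RTTS.items():
    print(f"{name}: {time_to_response_ms(rtts, 100):.0f} ms")
# TCP + TLS handshake: 300 ms
# QUIC (combined handshake): 200 ms
# TCP + 0-RTT PSK data (Shortlink-style): 200 ms
```

Under these simplified assumptions, a full QUIC handshake matches the latency of Shortlink's PSK-encrypted early data while still providing a forward-secret handshake.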

Finally, there is also the issue of client-side performance, in addition to network performance. Since WeChat’s encryption scheme performs two layers of encryption per request, the client does twice as much work to encrypt data as it would if WeChat used a single standardized cryptosystem.

The trend of home-rolled cryptography in Chinese applications

The findings here add to our prior research suggesting that home-grown cryptography is popular in Chinese applications. In general, the avoidance of TLS and the preference for proprietary and non-standard cryptography is a departure from cryptographic best practices. While there may have been many legitimate reasons to distrust TLS in 2011 (like EFF and Access Now’s concerns over the certificate authority ecosystem), the TLS ecosystem has largely stabilized since then, and is more auditable and transparent. Like MMTLS, all the proprietary protocols we have researched in the past contain weaknesses relative to TLS, and, in some cases, could even be trivially decrypted by a network adversary. This is a growing, concerning trend unique to the Chinese security landscape as the global Internet progresses towards technologies like QUIC or TLS to protect data in transit.

Anti-DNS-hijacking mechanisms

Similar to how Tencent wrote their own cryptographic system, we found that in Mars they also wrote a proprietary domain lookup system. This system is part of STN and supports domain-name-to-IP-address lookups over HTTP. This feature is referred to as “NewDNS” in Mars. Based on our dynamic analysis, this feature is regularly used in WeChat. At first glance, NewDNS duplicates functions already provided by DNS (Domain Name System), which is built into nearly all internet-connected devices.
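
The general idea of a NewDNS-style lookup can be sketched as follows. This is not Mars's code: the HTTP request is abstracted as a caller-supplied `http_fetch` function, the semicolon-separated response format is an assumption, and falling back to the system resolver when the HTTP lookup fails is one plausible design rather than documented NewDNS behavior. Each returned address is validated with the standard `ipaddress` module before use.

```python
import ipaddress
import socket
from typing import Callable, List, Optional


def system_resolve(domain: str) -> List[str]:
    """Ordinary DNS lookup through the operating system's resolver."""
    return sorted({info[4][0] for info in socket.getaddrinfo(domain, None)})


def resolve(
    domain: str,
    http_fetch: Callable[[str], str],
    fallback: Optional[Callable[[str], List[str]]] = None,
) -> List[str]:
    """Try a (hypothetical) HTTP resolver first; fall back to system DNS on failure."""
    try:
        body = http_fetch(domain)  # assumed to return e.g. "ip1;ip2"
        ips = [str(ipaddress.ip_address(tok.strip())) for tok in body.split(";") if tok.strip()]
        if ips:
            return ips
    except (OSError, ValueError):
        pass  # network error or malformed answer
    return (fallback or system_resolve)(domain)
```

Because the answer arrives over an application-level channel rather than the ISP's resolver, an ISP that hijacks ordinary DNS responses does not directly control the result.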

WeChat is not the only app in China that uses such a system. Major cloud computing providers in China such as Alibaba Cloud and Tencent Cloud both offer their own DNS over HTTP service. A VirusTotal search for apps that try to contact Tencent Cloud’s DNS over HTTP service endpoint (119.29.29.98) yielded 3,865 unique results.

One likely reason for adopting such a system is that ISPs in China often implement DNS hijacking to insert ads and redirect web traffic to perform ad fraud. The problem was so serious that six Chinese internet giants issued a joint statement in 2015 urging ISPs to improve. According to the news article, about 1–2% of traffic to Meituan (an online shopping site) suffers from DNS hijacking. Ad fraud by Chinese ISPs seems to remain a widespread problem in recent years.

Similar to their MMTLS cryptographic system, Tencent’s NewDNS domain lookup system was motivated by the needs of the Chinese networking environment. DNS proper has proven over the years to have multiple security and privacy issues. Compared to TLS, we found that WeChat’s MMTLS has additional deficiencies. However, it remains an open question whether NewDNS is more or less problematic than DNS proper. We leave this question for future work.

Use of Mars STN outside WeChat

We speculate that Mars (mars-open) is widely adopted outside of WeChat, based on the following observations:

The adoption of Mars outside of WeChat is concerning because Mars by default does not provide any transport encryption. As we mentioned in the “Three Parts of Mars” section, the MMTLS encryption used in WeChat is part of mars-wechat, which is not open source. The Mars developers also have no plans to add support for TLS, and expect other developers using Mars to implement their own encryption in the upper layers. To make matters worse, implementing TLS within Mars seems to require a fair amount of architectural change. While it would not be unfair for Tencent to keep MMTLS proprietary, MMTLS is still the main encryption system that Mars was designed for, so keeping it proprietary means that other developers using Mars must either devote significant resources to integrating a different encryption system with Mars, or leave everything unencrypted.

Mars also lacks documentation. The official wiki only contains a few old articles on how to integrate with Mars, and developers using Mars often resort to asking questions on GitHub. The lack of documentation makes developers more prone to mistakes, ultimately reducing security.

Further research is needed in this area to analyze the security of apps that use Tencent’s Mars library.

“Tinker”, a dynamic code-loading module

In this section, we tentatively refer to the APK downloaded from the Google Play Store as the “WeChat APK”, and the APK downloaded from WeChat’s official website as the “Weixin APK”. The distinction between WeChat and Weixin seems blurry. The WeChat APK and Weixin APK contain partially different code, as we discuss later in this section. However, when both APKs are installed on an English-locale Android Emulator, both show their app name as “WeChat”. Their application ID, which the Android system and Google Play Store use to identify apps, is also “com.tencent.mm” for both. We were also able to log in to our US-number accounts using both APKs.

Unlike the WeChat APK, we found that the Weixin APK contains Tinker, “a hot-fix solution library”. Tinker allows the developer to update the app itself without calling Android’s system APK installer by using a technique called “dynamic code loading”. In an earlier report we found a similar distinction between TikTok and Douyin, where we found Douyin to have a similar dynamic code-loading feature that was not present in TikTok. This feature raises three concerns:

  1. If the process for downloading and loading the dynamic code does not sufficiently authenticate the downloaded code (e.g., that it is cryptographically signed with the correct public key, that it is not out of date, and that it is the code intended to be downloaded and not other cryptographically signed and up-to-date code), an attacker might be able to exploit this process to run malicious code on the device (e.g., by injecting arbitrary code, by performing a downgrade attack, or by performing a sidegrade attack). Back in 2016, we found such instances in other Chinese apps.
  2. Even if the code downloading and loading mechanism contains no weaknesses, the dynamic code loading feature still allows the application to load code without notifying the user, bypassing users’ consent to decide what program could run on their device. For example, the developer may push out an unwanted update, and the users do not have a choice to keep using the old version. Furthermore, a developer may selectively target a user with an update that compromises their security or privacy. In 2016, a Chinese security analyst accused Alibaba of pushing dynamically loaded code to Alipay to surreptitiously take photos and record audio on his device.
  3. Dynamically loading code prevents app store reviewers from reviewing all relevant behavior of an app’s execution. As such, the Google Play Developer Program Policy does not permit apps to use dynamic code loading.
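
The checks listed in the first concern can be sketched as follows. This is a hypothetical illustration, not Tinker’s actual update format or verification logic. To stay self-contained it uses HMAC-SHA256 from the Python standard library in place of a public-key signature; a real update system should verify an asymmetric signature (e.g., Ed25519) so that the device holds only a public key. The manifest fields `target`, `version`, and `sha256` are assumptions.

```python
import hashlib
import hmac
import json


def sign(key: bytes, manifest: bytes) -> bytes:
    # Stand-in for a public-key signature over the manifest (see above).
    return hmac.new(key, manifest, hashlib.sha256).digest()


def verify_update(key, manifest, signature, payload, expected_target, installed_version):
    """Return the parsed manifest, or raise ValueError if any check fails."""
    if not hmac.compare_digest(sign(key, manifest), signature):
        raise ValueError("bad signature")  # arbitrary injected code
    meta = json.loads(manifest)
    if meta["target"] != expected_target:
        raise ValueError("wrong target")  # sidegrade: validly signed, but meant for other code
    if meta["version"] <= installed_version:
        raise ValueError("not newer")  # downgrade: validly signed, but out of date
    if hashlib.sha256(payload).hexdigest() != meta["sha256"]:
        raise ValueError("payload mismatch")  # signed manifest does not cover this code
    return meta
```

Skipping any one of these checks reopens the corresponding attack even if the others are in place; for example, a signature check alone still accepts an older, validly signed patch.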

When analyzing the WeChat APK, we found that it retains some components of Tinker. The component that seems to handle the downloading of app updates is present; however, the core part of Tinker that handles loading and executing the downloaded app updates has been replaced with “no-op” functions, which perform no actions. We did not analyze the WeChat binaries available from other third-party app stores.

Further research is needed to analyze the security of Tinker’s app update process, whether WeChat APKs from other sources contain the dynamic code loading feature, as well as any further differences between the WeChat APK and Weixin APK.

Recommendations

In this section, we make recommendations based on our findings to relevant audiences.

To application developers

Implementing proprietary encryption is more expensive, less performant, and less secure than using well-scrutinized standard encryption suites. Given the sensitive nature of data that can be sent by applications, we encourage application developers to use tried-and-true encryption suites and protocols and to avoid rolling their own crypto. SSL/TLS has seen almost three decades of various improvements as a result of rigorous public and academic scrutiny. TLS configuration is now easier than ever before, and the advent of QUIC-based TLS has dramatically improved performance.

To Tencent and WeChat developers

Below is a copy of the recommendations we sent to WeChat and Tencent in our disclosure. The full disclosure correspondence can be found in the Appendix.

In this post from 2016, WeChat developers note that they wished to upgrade their encryption, but the addition of another round-trip for the TLS 1.2 handshake would significantly degrade WeChat network performance, as the application relies on many short bursts of communication. At that time, TLS 1.3 was not yet an RFC (though session resumption extensions were available for TLS 1.2), so they opted to “roll their own” and incorporate TLS 1.3’s session resumption model into MMTLS.

This issue of performing an extra round-trip for a handshake has been a perennial issue for application developers around the world. The TCP and TLS handshake each require a single round-trip, meaning each new data packet sent requires two round-trips. Today, TLS-over-QUIC combines the transport-layer and encryption-layer handshakes, requiring only a single handshake. QUIC was developed for this express purpose, and can provide both strong, forward-secret encryption, while halving the number of round-trips needed for secure communication. We also note that WeChat seems to already use QUIC for some large file downloads. Our recommendation would be for WeChat to migrate entirely to a standard TLS or QUIC+TLS implementation.

There is also the issue of client-side performance, in addition to network performance. Since WeChat’s encryption scheme performs two layers of encryption per request, the client is performing double the work to encrypt data than if WeChat used a single standardized cryptosystem.

To operating systems

On the web, client-side browser security warnings and the use of HTTPS as a ranking factor in search engines contributed to widespread TLS adoption. We can draw loose analogies to the mobile ecosystem’s operating systems and application stores.

Is there any platform or OS-level permission model that can indicate regular usage of standard encrypted network communications? As we mentioned in our prior work studying proprietary cryptography in Chinese IME keyboards, OS developers could consider device permission models that surface whether applications use lower-level system calls for network access.

To high-risk users with privacy concerns

Many WeChat users use it out of necessity rather than choice. For users with privacy concerns who are using WeChat out of necessity, our recommendations from the previous report still hold:

  • Avoid features delineated as “Weixin” services if possible. We note that many core “Weixin” services (such as Search, Channels, Mini Programs) as delineated by the Privacy Policy perform more tracking than core “WeChat” services.
  • When possible, prefer web or applications over Mini Programs or other such embedded functionality.
  • Use stricter device permissions and update your software and OS regularly for security features.

In addition, due to the risks introduced by dynamic code loading in WeChat downloaded from the official website, we recommend that users instead download WeChat from the Google Play Store whenever possible. For users who have already installed WeChat from the official website, removing it and re-installing the Google Play Store version would also mitigate the risk.

To security and privacy researchers

As WeChat has over one billion users, we posit that the number of MMTLS users worldwide is of a similar order of magnitude to the number of TLS users. Despite this, unlike TLS, MMTLS has received little to no third-party analysis or scrutiny. At this scale of influence, MMTLS deserves scrutiny similar to that of TLS. We implore future security and privacy researchers to build on this work and continue studying the MMTLS protocol, as, from our correspondence, Tencent insists on continuing to use and develop MMTLS for WeChat connections.

Acknowledgments

We would like to thank Jedidiah Crandall, Jakub Dalek, Prateek Mittal, and Jonathan Mayer for their guidance and feedback on this report. Research for this project was supervised by Ron Deibert.

Appendix

In this appendix, we detail our disclosure to Tencent concerning our findings and their response.

April 24, 2024 — Our disclosure

To Whom It May Concern:

The Citizen Lab is an academic research group based at the Munk School of Global Affairs & Public Policy at the University of Toronto in Toronto, Canada.

We analyzed WeChat v8.0.23 on Android and iOS as part of our ongoing work analyzing popular mobile and desktop apps for security and privacy issues. We found that WeChat’s proprietary network encryption protocol, MMTLS, contains weaknesses compared to modern network encryption protocols, such as TLS or QUIC+TLS. For instance, the protocol is not forward-secret and may be susceptible to replay attacks. We plan on publishing a documentation of the MMTLS network encryption protocol and strongly suggest that WeChat, which is responsible for the network security of over 1 billion users, switch to a strong and performant encryption protocol like TLS or QUIC+TLS.

For further details, please see the attached document.

Timeline to Public Disclosure

The Citizen Lab is committed to research transparency and will publish details regarding the security vulnerabilities it discovers in the context of its research activities, absent exceptional circumstances, on its website: https://citizenlab.ca/.

The Citizen Lab will publish the details of our analysis no sooner than 45 calendar days from the date of this communication.

Should you have any questions about our findings please let us know. We can be reached at this email address: disclosure@citlab.utoronto.ca.

Sincerely,

The Citizen Lab

May 17, 2024 — Tencent’s response

Thank you for your report.Since receiving your report on April 25th, 2024, we have conducted a careful evaluation.The core of WeChat’s security protocol is outer layer mmtls encryption, currently ensuring that outer layer mmtls encryption is secure. On the other hand, the encryption issues in the inner layer are handled as follows: the core data traffic has been switched to AES-GCM encryption, while other traffic is gradually switching from AES-CBC to AES-GCM.If you have any other questions, please let us know.thanks.


  1. The terms “shortlink” and “longlink” do not seem to be specific to WeChat, since it was also mentioned in other technical blogs.
  2. On Android, the main process is named after the app ID, “com.tencent.mm”. (The process name can be seen using the ps command in adb shell.) When an app starts a new process, it assigns a name. The assigned name will be added to the app ID to form the full name of the new process. So the “:push” process’s full name is “com.tencent.mm:push”.
  3. This server heartbeat is a reply to a prior client-sent heartbeat.