Zura Kakushadze
Quantigic Solutions LLC, High Ridge Road, #135, Stamford, CT 06905
Free University of Tbilisi, Business School & School of Physics, 240, David Agmashenebeli Alley, Tbilisi, 0159, Georgia
December 9, 2015
"There are two kinds of people in this world:
Those seeking happiness, and bullfighters."
(Zura Kakushadze, ca. early '90s)
Abstract
We present explicit formulas - that are also computer code - for 101 real-life quantitative trading alphas. Their average holding period ranges approximately from 0.6 to 6.4 days. The average pair-wise correlation of these alphas is low, 15.9%. The returns are strongly correlated with volatility, but have no significant dependence on turnover, directly confirming an earlier result based on a more indirect empirical analysis. We further find empirically that turnover has poor explanatory power for alpha correlations.
1. Introduction
There are two complementary - and in some sense even competing - trends in modern quantitative trading. On the one hand, more and more market participants (e.g., quantitative traders, inter alia) employ sophisticated quantitative techniques to mine alphas. This results in ever fainter and more ephemeral alphas. On the other hand, technological advances make it possible to essentially automate (much of) the alpha harvesting process. This yields an ever increasing number of alphas, whose count can be in hundreds of thousands and even millions, and with the exponentially increasing progress in this field will likely be in billions before we know it...
This proliferation of alphas - albeit mostly faint and ephemeral - allows combining them in a sophisticated fashion to arrive at a unified "mega-alpha". It is then this "mega-alpha" that is actually traded - as opposed to trading individual alphas - with a bonus of automatic internal crossing of trades (and thereby crucial-for-profitability savings on trading costs, etc.), alpha portfolio diversification (which hedges against any subset of alphas going bust in any given time period), and so on. One of the challenges in combining alphas is the usual "too many variables, too few observations" dilemma. Thus, the alpha sample covariance matrix is badly singular.
Also, naturally, quantitative trading is a secretive field, and data and other information from practitioners are not readily available. This inadvertently creates an enigma around modern quant trading. E.g., with such a large number of alphas, are they not highly correlated with each other? What do these alphas look like? Are they mostly based on price and volume data, mean-reversion, momentum, etc.? How do alpha returns depend on volatility, turnover, etc.?
A previous paper [Kakushadze and Tulchinsky, 2015] took a step in demystifying the realm of modern quantitative trading by studying some empirical properties of 4,000 real-life alphas. In this paper we take another step and present explicit formulas - that are also computer code - for 101 real-life quant trading alphas. Our formulaic alphas - albeit most are not necessarily all that "simple" - serve the purpose of giving the reader a glimpse into what some of the simpler real-life alphas look like. They also enable the reader to replicate and test these alphas on historical data and do new research and other empirical analyses. Hopefully, this further inspires (young) researchers to come up with new ideas and create their own alphas.
We discuss some general features of our formulaic alphas in Section 2. These alphas are mostly "price-volume" (daily close-to-close returns, open, close, high, low, volume and vwap) based, albeit "fundamental" input is used in some of the alphas, including one alpha utilizing market cap, and a number of alphas employing some kind of a binary industry classification such as GICS, BICS, NAICS, SIC, etc., which are used to industry-neutralize various quantities.
We discuss empirical properties of our alphas in Section 3 based on data for individual alpha Sharpe ratio, turnover and cents-per-share, and also on a sample covariance matrix. The average holding period approximately ranges from 0.6 to 6.4 days. The average (median) pair-wise correlation of these alphas is low, 15.9% (14.3%). The returns $R$ are strongly correlated with the volatility $V$, and as in [Kakushadze and Tulchinsky, 2015] we find an empirical scaling

$$R \sim V^X \qquad (1)$$

with $X \approx 0.76$ for our 101 alphas. Furthermore, we find that the returns have no significant dependence on the turnover $T$. This is a direct confirmation of an earlier result by [Kakushadze and Tulchinsky, 2015], which is based on a more indirect empirical analysis.
We further find empirically that the turnover per se has poor explanatory power for alpha correlations. This is not to say that the turnover does not add value in, e.g., modeling the covariance matrix via a factor model. A more precise statement is that pair-wise correlations $\Psi_{ij}$ of the alphas ($i, j = 1, \dots, N$ label the $N$ alphas, $i \neq j$) are not highly correlated with the product $\ln(\tau_i) \ln(\tau_j)$, where $\tau_i = T_i / \mu$, and $\mu$ is an a priori arbitrary normalization constant.
We briefly conclude in Section 4. Appendix A contains our formulaic alphas with definitions of the functions, operators and input data used therein. Appendix B contains some legalese.
2. Formulaic Alphas
In this section we describe some general features of our 101 formulaic alphas. The alphas are proprietary to WorldQuant LLC and are used here with its express permission. We provide as many details as we possibly can within the constraints imposed by the proprietary nature of the alphas. The formulaic expressions - that are also computer code - are given in Appendix A.
Very coarsely, one can think of alpha signals as based on mean-reversion or momentum. A mean-reversion alpha has a sign opposite to the return on which it is based. E.g., a simple mean-reversion alpha is given by

$$-\ln(\text{today's open} / \text{yesterday's close}) \qquad (2)$$
Here yesterday's close is adjusted for any splits and dividends if the ex-date is today. The idea (or hope) here is that the stock will mean-revert and give back part of the gains (if today's open is higher than yesterday's close) or recoup part of the losses (if today's open is lower than yesterday's close). This is a so-called "delay-0" alpha. Generally, "delay-0" means that the time of some data (e.g., a price) used in the alpha coincides with the time during which the alpha is intended to be traded. E.g., the alpha (2) would ideally be traded at or, more realistically, as close as possible to today's open. More broadly, this can be some other time, e.g., the close.
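As a minimal sketch of the alpha (2), the overnight log return can be computed and its sign flipped. The function name and the prices below are hypothetical and serve only as an illustration; the previous close is assumed to be already adjusted for splits and dividends.

```python
import math


def mean_reversion_alpha(today_open: float, prev_close_adj: float) -> float:
    """Delay-0 mean-reversion signal: minus the overnight log return.

    prev_close_adj is assumed to be yesterday's close, already adjusted
    for any split or dividend with today's ex-date.
    """
    return -math.log(today_open / prev_close_adj)


# Hypothetical prices: a stock that gapped up gets a negative (short) signal,
# while a stock that gapped down gets a positive (long) signal.
gap_up = mean_reversion_alpha(today_open=102.0, prev_close_adj=100.0)
gap_down = mean_reversion_alpha(today_open=98.0, prev_close_adj=100.0)
```

Since the signal uses today's open, it would have to be traded at (or as close as possible to) that same open, which is what makes it delay-0.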
A simple example of a momentum alpha is given by

$$\ln(\text{yesterday's close} / \text{yesterday's open}) \qquad (3)$$
Here it makes no difference if the prices are adjusted or not. The idea (or hope) here is that if the stock ran up (slid down) yesterday, the trend will continue today and the gains (losses) will be further increased. This is a so-called "delay-1" alpha if the intent is to trade it today (e.g., starting at the open). Generally, "delay-1" means that the alpha is traded on the day subsequent to the date of the most recent data used in computing it. A "delay-$d$" alpha is defined similarly, with $d$ counting the number of days by which the data used is out-of-sample.
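The delay convention can be illustrated with a short sketch that shifts the intraday log return forward by `delay` days, so that the signal available on day $t$ only uses data from day $t - d$. The function name, the `delay` parameter and the prices are assumptions made for this illustration.

```python
import math


def momentum_alpha(opens, closes, delay=1):
    """Signal tradable on day t, built from the open and close of day t - delay.

    Returns a list aligned with the input days. The first `delay` entries
    are None because no sufficiently old data exists for them.
    """
    if delay < 1:
        raise ValueError("this sketch only covers delay >= 1")
    raw = [math.log(c / o) for o, c in zip(opens, closes)]
    n = len(raw)
    lag = min(delay, n)
    return [None] * lag + raw[: n - lag]


# Hypothetical daily opens and closes for a single stock.
opens = [100.0, 101.0, 99.0, 100.0]
closes = [101.0, 100.0, 100.5, 102.0]
signal = momentum_alpha(opens, closes, delay=1)
```

With `delay=1`, the value at index 1 is the log return realized on day 0, so a run-up on day 0 produces a long signal for day 1.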
In complex alphas, elements of mean-reversion and momentum can be mixed, making them less distinct in this regard. However, one can think of smaller building blocks of such alphas as being based on mean-reversion or momentum. For instance, Alpha#101 in Appendix A is a delay-1 momentum alpha: if the stock runs up intraday (i.e., close > open and high > low), the next day one takes a long position in the stock. On the other hand, Alpha#42 in Appendix A essentially is a delay-0 mean-reversion alpha: rank(vwap - close) is lower if a stock runs up in the second half of the day (close > vwap) as opposed to sliding down (close < vwap). The denominator weights down richer stocks. The "contrarian" position is taken close to the close.
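The two alphas just discussed can be sketched in a few lines. The expressions used here, (close - open) / ((high - low) + .001) for Alpha#101 and rank(vwap - close) / rank(vwap + close) for Alpha#42, are stated as assumptions, since Appendix A is not reproduced in this section. The `cross_sectional_rank` helper is a hypothetical stand-in for the rank operator (it scales ranks to (0, 1] and breaks ties by input order), and the prices are made up.

```python
def cross_sectional_rank(values):
    """Rank within the cross-section, scaled to (0, 1]; ties broken by input order."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    for position, k in enumerate(order, start=1):
        ranks[k] = position / len(values)
    return ranks


def alpha_101(open_, high, low, close):
    """Intraday move scaled by the daily range (assumed form of Alpha#101)."""
    return [(c - o) / ((h - l) + 0.001)
            for o, h, l, c in zip(open_, high, low, close)]


def alpha_42(vwap, close):
    """Ratio of cross-sectional ranks (assumed form of Alpha#42)."""
    num = cross_sectional_rank([v - c for v, c in zip(vwap, close)])
    den = cross_sectional_rank([v + c for v, c in zip(vwap, close)])
    return [n / d for n, d in zip(num, den)]


# Hypothetical one-day snapshot for three stocks.
open_ = [10.0, 20.0, 30.0]
high = [11.0, 20.5, 30.2]
low = [9.8, 19.0, 29.0]
close = [10.9, 19.2, 30.0]
vwap = [10.5, 19.6, 30.0]

a101 = alpha_101(open_, high, low, close)
a42 = alpha_42(vwap, close)
```

In this toy cross-section the first stock closes above its vwap, so its rank(vwap - close) is the lowest of the three, in line with the mean-reversion reading above.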
3. Data and Empirical Properties of Alphas
In this section we describe empirical properties of our formulaic alphas based on data proprietary to WorldQuant LLC, which is used here with its express permission. We provide as many details as possible within the constraints of the proprietary nature of this dataset.
For our alphas we take the annualized daily Sharpe ratio $S$, daily turnover $T$, and cents-per-share $C$. Let us label our alphas by the index $i$ ($i = 1, \dots, N$), where $N = 101$ is the number of alphas. For each alpha, $S$, $T$ and $C$ are defined via

$$S_i = \sqrt{252}\, \frac{P_i}{V_i}$$

$$T_i = \frac{D_i}{I_i}$$

$$C_i = 100\, \frac{P_i}{Q_i}$$
Here: $P_i$ is the average daily P&L (in dollars); $V_i$ is the daily portfolio volatility; $Q_i$ is the average daily shares traded (buys plus sells) by the $i$-th alpha; $D_i$ is the average daily dollar volume traded; and $I_i$ is the total dollar investment in said alpha (the actual long plus short positions, without leverage). More precisely, the principal of $I_i$ is constant; however, $I_i$ fluctuates due to the daily P&L. So, both $D_i$ and $I_i$ are adjusted accordingly (such that $I_i$ is constant) in Equation (4). The period of time over which this data is collected is Jan 4, 2010-Dec 31, 2013. For the same period we also take the sample covariance matrix $C_{ij}$ of the realized daily returns for our alphas. The number of observations in the time series is 1,006, and $C_{ij}$ is nonsingular. From $C_{ij}$ we read off the daily return volatility $\sigma_i$ via $\sigma_i^2 = C_{ii}$ and the correlation matrix $\Psi_{ij} = C_{ij} / (\sigma_i \sigma_j)$ (where $\Psi_{ii} = 1$). Note that $V_i = \sigma_i I_i$, and the average daily return is given by $R_i = P_i / I_i$.
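The per-alpha statistics can be sketched as follows, assuming the definitions $S = \sqrt{252}\, P / V$, $T = D / I$ and $C = 100\, P / Q$, and reading the volatilities and correlations off a covariance matrix. All function names and input numbers are hypothetical and are not the proprietary data.

```python
import math


def alpha_stats(pnl, vol, shares, dollars, investment):
    """Annualized Sharpe ratio, daily turnover and cents-per-share for one alpha."""
    sharpe = math.sqrt(252) * pnl / vol       # S = sqrt(252) * P / V
    turnover = dollars / investment           # T = D / I
    cents_per_share = 100.0 * pnl / shares    # C = 100 * P / Q
    return sharpe, turnover, cents_per_share


def correlation_from_covariance(cov):
    """Daily volatilities sigma_i and correlation matrix Psi_ij from C_ij."""
    n = len(cov)
    sigma = [math.sqrt(cov[i][i]) for i in range(n)]
    psi = [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(n)]
           for i in range(n)]
    return sigma, psi


# Hypothetical alpha: $1,000 average daily P&L, $10,000 daily volatility,
# 200,000 shares and $5M traded per day, on a $20M investment.
S, T, C = alpha_stats(pnl=1_000.0, vol=10_000.0, shares=200_000.0,
                      dollars=5_000_000.0, investment=20_000_000.0)
holding_period_days = 1.0 / T  # average holding period taken as the inverse turnover

# Hypothetical 2 x 2 covariance matrix of daily returns.
sigma, psi = correlation_from_covariance([[0.0004, 0.0001],
                                          [0.0001, 0.0009]])
```

In this example the turnover is 25% per day, which corresponds to an average holding period of 4 days.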
Table 1 and Figure 1 summarize the data for the annualized Sharpe ratio $S$, daily turnover $T$, average holding period $1/T$, cents-per-share $C$, daily return volatility $\sigma$, annualized average daily return $\widetilde{R}$, and $N(N-1)/2 = 5{,}050$ pair-wise correlations $\Psi_{ij}$ with $i > j$.
3.1. Return v. Volatility & Turnover
We run two cross-sectional regressions, both with the intercept, of $\ln(R)$ over i) $\ln(\sigma)$ as the sole explanatory variable, and ii) over $\ln(\sigma)$ and $\ln(T)$. The results are summarized in Tables 2 and 3. Consistently with [Kakushadze and Tulchinsky, 2015], we have no statistically significant dependence on the turnover $T$ here, while the average daily return $R$ is strongly correlated with the daily return volatility $\sigma$ and we have the scaling property (1) with $X \approx 0.76$.

(Footnote: Here the average is over the time series of the realized daily returns.)
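A scaling exponent of this kind can be estimated as the slope of a log-log regression. The sketch below fits $\ln(R)$ on $\ln(\sigma)$ with an intercept using ordinary least squares. The data are synthetic, generated from an exact power law with exponent 0.8 chosen only for illustration, and have nothing to do with the values in Tables 2 and 3.

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic cross-section: R = 0.5 * sigma ** 0.8 exactly (illustration only).
sigmas = [0.001, 0.002, 0.004, 0.008, 0.016]
returns = [0.5 * s ** 0.8 for s in sigmas]

exponent, intercept = ols_slope_intercept(
    [math.log(s) for s in sigmas],
    [math.log(r) for r in returns],
)
```

On real data the fit would not be exact, and the significance of each coefficient would be judged from its t-statistic, which this sketch does not compute.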
3.2. Does Turnover Explain Correlations?
If we draw a parallel between alphas and stocks, then alpha turnover is analogous to stock liquidity, which is typically measured via an average daily dollar volume (ADDV). Log of ADDV is routinely used as a style risk factor in multifactor risk models for approximating stock portfolio covariance matrix structure, whose chief goal is to model the off-diagonal elements of the covariance matrix, that is, the pair-wise correlation structure. Following this analogy, we can ask if the turnover - or more precisely its log - has explanatory power for modeling alpha correlations. It is evident that using the turnover directly (as opposed to its log) would get us nowhere due to the highly skewed (roughly log-normal) turnover distribution (see Figure 1).
To answer this question, recall that in a factor model the covariance matrix is modeled via

$$\Gamma_{ij} = \xi_i^2\, \delta_{ij} + \sum_{A,B=1}^{K} \Omega_{iA}\, \Phi_{AB}\, \Omega_{jB}$$
Here: $\xi_i$ is the specific risk; $\Omega_{iA}$ is an $N \times K$ factor loadings matrix corresponding to $K \ll N$ risk factors; and $\Phi_{AB}$ is a factor covariance matrix. In our case, we are interested in modeling the correlation matrix $\Psi_{ij}$ and ascertaining whether the turnover has explanatory power for pair-wise correlations. Whether the volatility and turnover are correlated is a separate issue.
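To make the structure concrete, here is a minimal sketch that assembles a model covariance matrix from specific risks, factor loadings and a factor covariance matrix, assuming the standard form $\Gamma_{ij} = \xi_i^2 \delta_{ij} + \sum_{A,B} \Omega_{iA} \Phi_{AB} \Omega_{jB}$. The function name and the numbers (three alphas, one factor) are hypothetical.

```python
def factor_model_covariance(xi, omega, phi):
    """Model covariance: specific variance on the diagonal plus the factor part.

    xi    : list of N specific risks
    omega : N x K factor loadings (list of rows)
    phi   : K x K factor covariance matrix
    """
    n, k = len(xi), len(phi)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            factor_part = sum(omega[i][a] * phi[a][b] * omega[j][b]
                              for a in range(k) for b in range(k))
            specific_part = xi[i] ** 2 if i == j else 0.0
            gamma[i][j] = factor_part + specific_part
    return gamma


# Hypothetical inputs: N = 3 alphas, K = 1 risk factor.
xi = [0.1, 0.2, 0.3]
omega = [[1.0], [0.5], [2.0]]
phi = [[0.04]]
gamma = factor_model_covariance(xi, omega, phi)
```

Only the factor part contributes to the off-diagonal elements, which is why the choice of factor loadings is what matters for modeling pair-wise correlations.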
So, our approach is to take one of the columns of the factor loadings matrix as $\ln(T_i)$. More precisely, a priori there is no reason why we should pick $\ln(T_i)$ as opposed to $\ln(\tau_i)$, where $\tau_i = T_i / \mu$, and $\mu$ is some normalization factor. To deal with this, let us normalize $\tau_i$ such that $\ln(\tau_i)$ has zero cross-sectional mean, and let $\nu_i \equiv 1$ be the unit $N$-vector (the intercept). Then we can construct three symmetric tensor combinations $x_{ij} = \nu_i \nu_j$, $y_{ij} = \nu_i \ln(\tau_j) + \nu_j \ln(\tau_i)$, and $z_{ij} = \ln(\tau_i) \ln(\tau_j)$. Let us now define a composite index $\{a\} = \{(i, j)\, |\, i > j\}$, which takes $M = N(N-1)/2$ values, i.e., we pull the off-diagonal lower-triangular elements of a general symmetric matrix $G_{ij}$ into a vector $G_a$. This way we can construct four $M$-vectors $\Psi_a$, $x_a$, $y_a$ and $z_a$. Now we can run a linear regression of $\Psi_a$ over $x_a$, $y_a$ and $z_a$. Note that $x_a \equiv 1$ is simply the intercept (the unit $M$-vector), so this is a regression of $\Psi_a$ over $y_a$ and $z_a$ with the intercept. The results are summarized in Table 4. It is evident that the linear and bilinear (in $\ln(\tau_i)$) variables $y_a$ and $z_a$ have poor explanatory power for pair-wise correlations $\Psi_a$, while $x_a$ (the intercept) simply models the average correlation Mean($\Psi_a$). Recall that by construction $y_a$ and $z_a$ are orthogonal to $x_a$, and these three explanatory variables are independent of each other.
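The construction above can be sketched end to end: demean the log-turnover, build the linear and bilinear combinations, pull the $i > j$ elements into vectors, and regress the pair-wise correlations on them with an intercept. All helper names are hypothetical, the regression is solved via the normal equations for brevity, and the correlations are synthetic (an exact linear combination of the regressors), so the example only checks the mechanics and says nothing about Table 4.

```python
import math


def lower_triangle(matrix):
    """Pull the off-diagonal lower-triangular elements (i > j) into a vector."""
    return [matrix[i][j] for i in range(len(matrix)) for j in range(i)]


def turnover_regressors(turnover):
    """Vectors y_a and z_a built from the demeaned log-turnover."""
    log_t = [math.log(t) for t in turnover]
    mean = sum(log_t) / len(log_t)
    log_tau = [v - mean for v in log_t]  # demeaning fixes the normalization
    n = len(log_tau)
    y = [[log_tau[i] + log_tau[j] for j in range(n)] for i in range(n)]
    z = [[log_tau[i] * log_tau[j] for j in range(n)] for i in range(n)]
    return lower_triangle(y), lower_triangle(z)


def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, target):
    """Least-squares coefficients, one per column, via the normal equations."""
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(columns[r], columns[c]))
            for c in range(k)] for r in range(k)]
    xty = [sum(p * q for p, q in zip(columns[r], target)) for r in range(k)]
    return solve(xtx, xty)


# Hypothetical daily turnovers for N = 4 alphas, giving M = 6 pairs.
turnover = [0.1, 0.2, 0.4, 0.8]
y_a, z_a = turnover_regressors(turnover)
x_a = [1.0] * len(y_a)  # the intercept

# Synthetic correlations, constructed so that the fit is exact.
psi_a = [0.15 + 0.02 * y + 0.05 * z for y, z in zip(y_a, z_a)]
coef = ols([x_a, y_a, z_a], psi_a)
```

With real correlations one would also inspect the t-statistics and the R-squared of the fit to judge explanatory power; the sketch stops at the coefficients.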
Let us emphasize that our conclusion does not necessarily mean the turnover adds no value in the factor model context; it only means that the turnover per se does not appear to help in modeling pair-wise alpha correlations. The above analysis does not address whether the turnover adds explanatory value to modeling variances, e.g., the specific risk. Thus, a linear regression of $\ln(\sigma)$ over $\ln(T)$ (with the intercept) shows nonzero correlation between these variables (see Table 5), albeit not very strong. To see if the turnover adds value via, e.g., the specific risk requires using certain proprietary methods outside of the scope of this paper.