Zura Kakushadze
Quantigic Solutions LLC, High Ridge Road, #135, Stamford, CT 06905
Free University of Tbilisi, Business School & School of Physics, 240, David Agmashenebeli Alley, Tbilisi, 0159, Georgia
December 9, 2015
"There are two kinds of people in this world:
Those seeking happiness, and bullfighters."
(Zura Kakushadze, ca. early '90s)
Abstract
We present explicit formulas - that are also computer code - for 101 real-life quantitative trading alphas. Their average holding period approximately ranges 0.6-6.4 days. The average pair-wise correlation of these alphas is low, 15.9%. The returns are strongly correlated with volatility, but have no significant dependence on turnover, directly confirming an earlier result based on a more indirect empirical analysis. We further find empirically that turnover has poor explanatory power for alpha correlations.
1. Introduction
There are two complementary - and in some sense even competing - trends in modern quantitative trading. On the one hand, more and more market participants (e.g., quantitative traders, inter alia) employ sophisticated quantitative techniques to mine alphas. This results in ever fainter and more ephemeral alphas. On the other hand, technological advances allow one to essentially automate (much of) the alpha harvesting process. This yields an ever increasing number of alphas, whose count can be in the hundreds of thousands and even millions, and, with the exponentially increasing progress in this field, will likely be in the billions before we know it...
This proliferation of alphas - albeit mostly faint and ephemeral - allows combining them in a sophisticated fashion to arrive at a unified "mega-alpha". It is then this "mega-alpha" that is actually traded - as opposed to trading individual alphas - with a bonus of automatic internal crossing of trades (and thereby crucial-for-profitability savings on trading costs, etc.), alpha portfolio diversification (which hedges against any subset of alphas going bust in any given time period), and so on. One of the challenges in combining alphas is the usual "too many variables, too few observations" dilemma. Thus, the alpha sample covariance matrix is badly singular.
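As a toy illustration of the "too many variables, too few observations" point (not taken from the paper), the Python sketch below builds the sample covariance matrix of N simulated alpha return series from T daily observations and computes its numerical rank by Gaussian elimination. After demeaning, the rank can be at most T - 1, so whenever there are fewer observations than alphas the matrix is necessarily singular. The simulated returns, the seed and the relative tolerance are assumptions chosen only for illustration.

```python
import random
from typing import List, Sequence


def sample_covariance(returns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance of N return series (rows), each with T daily observations."""
    n, t = len(returns), len(returns[0])
    means = [sum(r) / t for r in returns]
    dev = [[v - m for v in r] for r, m in zip(returns, means)]
    return [[sum(a * b for a, b in zip(dev[i], dev[j])) / (t - 1) for j in range(n)]
            for i in range(n)]


def matrix_rank(m: Sequence[Sequence[float]], rel_tol: float = 1e-9) -> int:
    """Numerical rank via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    rows, cols = len(a), len(a[0])
    tol = rel_tol * max(abs(v) for row in a for v in row)
    rank = 0
    for c in range(cols):
        if rank == rows:
            break
        pivot = max(range(rank, rows), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) <= tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, rows):
            f = a[r][c] / a[rank][c]
            for k in range(c, cols):
                a[r][k] -= f * a[rank][k]
        rank += 1
    return rank


random.seed(0)
n_alphas = 5
few_days = [[random.gauss(0.0, 0.01) for _ in range(3)] for _ in range(n_alphas)]
many_days = [[random.gauss(0.0, 0.01) for _ in range(20)] for _ in range(n_alphas)]
print(matrix_rank(sample_covariance(few_days)))   # at most T - 1 = 2 < 5: singular
print(matrix_rank(sample_covariance(many_days)))  # full rank 5 for generic data
```

With 101 alphas and the 1,006 daily observations used in Section 3 the matrix can be nonsingular, but with hundreds of thousands of alphas it cannot.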
Also, naturally, quantitative trading is a secretive field and data and other information from practitioners is not readily available. This inadvertently creates an enigma around modern quant trading. E.g., with such a large number of alphas, are they not highly correlated with each other? What do these alphas look like? Are they mostly based on price and volume data, mean-reversion, momentum, etc.? How do alpha returns depend on volatility, turnover, etc.?
A previous paper [Kakushadze and Tulchinsky, 2015] took a step in demystifying the realm of modern quantitative trading by studying some empirical properties of 4,000 real-life alphas. In this paper we take another step and present explicit formulas - that are also computer code - for 101 real-life quant trading alphas. Our formulaic alphas - albeit most are not necessarily all that "simple" - serve the purpose of giving the reader a glimpse into what some of the simpler real-life alphas look like. They also enable the reader to replicate and test these alphas on historical data and do new research and other empirical analyses. Hopefully, they further inspire (young) researchers to come up with new ideas and create their own alphas.
We discuss some general features of our formulaic alphas in Section 2. These alphas are mostly "price-volume" (daily close-to-close returns, open, close, high, low, volume and vwap) based, albeit "fundamental" input is used in some of the alphas, including one alpha utilizing market cap, and a number of alphas employing some kind of a binary industry classification such as GICS, BICS, NAICS, SIC, etc., which are used to industry-neutralize various quantities.
We discuss empirical properties of our alphas in Section 3 based on data for individual alpha Sharpe ratio, turnover and cents-per-share, and also on a sample covariance matrix. The average holding period approximately ranges from 0.6 to 6.4 days. The average (median) pair-wise correlation of these alphas is low, 15.9% (14.3%). The returns $R$ are strongly correlated with the volatility $\sigma$, and as in [Kakushadze and Tulchinsky, 2015] we find an empirical scaling
$$R \sim \sigma^X \tag{1}$$
with $X \approx 0.76$ for our 101 alphas. Furthermore, we find that the returns have no significant dependence on the turnover $T$. This is a direct confirmation of an earlier result by [Kakushadze and Tulchinsky, 2015], which is based on a more indirect empirical analysis.
We further find empirically that the turnover per se has poor explanatory power for alpha correlations. This is not to say that the turnover does not add value in, e.g., modeling the covariance matrix via a factor model. A more precise statement is that the pair-wise correlations $\Psi_{ij}$ of the alphas ($i, j = 1, \dots, N$ label the alphas, $i \neq j$) are not highly correlated with the product $\ln(T_i/\kappa)\ln(T_j/\kappa)$, where $T_i$ is the turnover of the $i$-th alpha, and $\kappa$ is an a priori arbitrary normalization constant.
We briefly conclude in Section 4. Appendix A contains our formulaic alphas with definitions of the functions, operators and input data used therein. Appendix B contains some legalese.
2. Formulaic Alphas
In this section we describe some general features of our 101 formulaic alphas. The alphas are proprietary to WorldQuant LLC and are used here with its express permission. We provide as many details as we possibly can within the constraints imposed by the proprietary nature of the alphas. The formulaic expressions - that are also computer code - are given in Appendix A.
Very coarsely, one can think of alpha signals as based on mean-reversion or momentum. A mean-reversion alpha has a sign opposite to the return on which it is based. E.g., a simple mean-reversion alpha is given by
$$-\ln\left(\frac{\text{today's open}}{\text{yesterday's close}}\right) \tag{2}$$
Here yesterday's close is adjusted for any splits and dividends if the ex-date is today. The idea (or hope) here is that the stock will mean-revert and give back part of the gains (if today's open is higher than yesterday's close) or recoup part of the losses (if today's open is lower than yesterday's close). This is a so-called "delay-0" alpha. Generally, "delay-0" means that the time of some data (e.g., a price) used in the alpha coincides with the time during which the alpha is intended to be traded. E.g., the alpha (2) would ideally be traded at or, more realistically, as close as possible to today's open. More broadly, this can be some other time, e.g., the close.
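A minimal Python sketch of the mean-reversion alpha (2) for a single stock. The `adjustment` argument is a hypothetical way of applying the split/dividend adjustment to yesterday's close when today is the ex-date; a real data pipeline would supply its own adjustment factors.

```python
import math


def mean_reversion_alpha(today_open: float, yesterday_close: float,
                         adjustment: float = 1.0) -> float:
    """Delay-0 alpha (2): -ln(today's open / yesterday's close).

    `adjustment` is a hypothetical multiplicative split/dividend factor applied
    to yesterday's close when today is the ex-date (1.0 on ordinary days).
    """
    adjusted_close = yesterday_close * adjustment
    return -math.log(today_open / adjusted_close)


# Gap up from 100 to 102: the signal is negative (short), betting on a partial give-back.
print(mean_reversion_alpha(102.0, 100.0))
# 2-for-1 split effective today: a raw close of 200 is adjusted to 100,
# so an open of 98 is a 2% gap down and the signal is positive (long).
print(mean_reversion_alpha(98.0, 200.0, adjustment=0.5))
```

Without the adjustment, the split in the second call would look like a 51% gap down and produce a spuriously large long signal.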
A simple example of a momentum alpha is given by
$$\ln\left(\frac{\text{yesterday's close}}{\text{yesterday's open}}\right) \tag{3}$$
Here it makes no difference if the prices are adjusted or not. The idea (or hope) here is that if the stock ran up (slid down) yesterday, the trend will continue today and the gains (losses) will be further increased. This is a so-called "delay-1" alpha if the intent is to trade it today (e.g., starting at the open). Generally, "delay-1" means that the alpha is traded on the day subsequent to the date of the most recent data used in computing it. A "delay-d" alpha is defined similarly, with $d$ counting the number of days by which the data used is out-of-sample.
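A minimal Python sketch of the momentum alpha (3) under a general delay d: the signal used on trading day t is computed from the bar of day t - d only. The list-based daily bars are hypothetical inputs; d = 1 reproduces the delay-1 reading above, and d = 0 corresponds to using the same day's bar, i.e., trading at that day's close.

```python
import math
from typing import List, Optional


def momentum_alpha(opens: List[float], closes: List[float], d: int = 1) -> List[Optional[float]]:
    """Alpha (3) evaluated with delay d.

    The signal for trading day t is ln(close / open) of day t - d; days
    without a bar d days back get None.
    """
    signals: List[Optional[float]] = []
    for t in range(len(closes)):
        if t - d < 0:
            signals.append(None)
        else:
            signals.append(math.log(closes[t - d] / opens[t - d]))
    return signals


opens = [10.0, 11.0, 10.5]
closes = [11.0, 10.5, 10.5]
print(momentum_alpha(opens, closes))       # day 0 has no prior bar, so its signal is None
print(momentum_alpha(opens, closes, d=0))  # delay-0 variant: same-day bars
```

Because both prices in the ratio come from the same day, a split or dividend adjustment rescales them by the same factor and cancels, which is why adjustment makes no difference for this alpha.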
In complex alphas, elements of mean-reversion and momentum can be mixed, making them less distinct in this regard. However, one can think of smaller building blocks of such alphas as being based on mean-reversion or momentum. For instance, Alpha#101 in Appendix A is a delay-1 momentum alpha: if the stock runs up intraday (i.e., close > open and high > low), the next day one takes a long position in the stock. On the other hand, Alpha#42 in Appendix A essentially is a delay-0 mean-reversion alpha: rank(vwap - close) is lower if a stock runs up in the second half of the day (close > vwap) as opposed to sliding down (close < vwap). The denominator weights down richer stocks. The "contrarian" position is taken close to the close.
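For concreteness, the published expressions of the two alphas discussed above are Alpha#101 = ((close - open) / ((high - low) + .001)) and Alpha#42 = (rank((vwap - close)) / rank((vwap + close))). The Python sketch below evaluates them on one day's cross-section of hypothetical stocks. It assumes that rank() returns ranks scaled to (0, 1] with ties averaged; Appendix A only says "cross-sectional rank", so this scaling is an assumption of the sketch.

```python
from typing import List, Sequence


def cs_rank(values: Sequence[float]) -> List[float]:
    """Cross-sectional rank scaled to (0, 1]; tied values share their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda k: values[k])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average position of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank / n
        i = j + 1
    return ranks


def alpha101(open_, close, high, low) -> List[float]:
    """Delay-1 momentum: positive when the stock ran up intraday (close > open)."""
    return [(c - o) / ((h - l) + 0.001) for o, c, h, l in zip(open_, close, high, low)]


def alpha42(vwap, close) -> List[float]:
    """Delay-0 mean-reversion: rank(vwap - close) / rank(vwap + close)."""
    num = cs_rank([v - c for v, c in zip(vwap, close)])
    den = cs_rank([v + c for v, c in zip(vwap, close)])
    return [a / b for a, b in zip(num, den)]


# Three hypothetical stocks: A ran up late in the day, B slid, C was flat and pricier.
vwap = [10.0, 10.0, 50.0]
close = [11.0, 9.0, 50.0]
print(alpha42(vwap, close))  # A gets the smallest value, B the largest
print(alpha101([10.0, 10.0], [11.0, 9.0], [11.5, 10.2], [9.8, 8.9]))
```

The small constant 0.001 in Alpha#101 keeps the denominator away from zero when high equals low.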
3. Data and Empirical Properties of Alphas
In this section we describe empirical properties of our formulaic alphas based on data proprietary to WorldQuant LLC, which is used here with its express permission. We provide as many details as possible within the constraints of the proprietary nature of this dataset.
For our alphas we take the annualized daily Sharpe ratio $S_i$, daily turnover $T_i$, and cents-per-share $C_i$. Let us label our alphas by the index $i = 1, \dots, N$, where $N = 101$ is the number of alphas. For each alpha, $S_i$, $T_i$ and $C_i$ are defined via
$$S_i = \sqrt{252}\,\frac{P_i}{V_i}, \qquad T_i = \frac{D_i}{I_i}, \qquad C_i = 100\,\frac{P_i}{Q_i} \tag{4}$$
Here: $P_i$ is the average daily P&L (in dollars); $V_i$ is the daily portfolio volatility; $Q_i$ is the average daily shares traded (buys plus sells) by the $i$-th alpha; $D_i$ is the average daily dollar volume traded; and $I_i$ is the total dollar investment in said alpha (the actual long plus short positions, without leverage). More precisely, the principal of $I_i$ is constant; however, $I_i$ fluctuates due to the daily P&L. So, both $D_i$ and $I_i$ are adjusted accordingly (such that $I_i$ is constant) in Equation (4). The period of time over which this data is collected is Jan 4, 2010 - Dec 31, 2013. For the same period we also take the sample covariance matrix $\Theta_{ij}$ of the realized daily returns $R_i(t)$ for our alphas. The number of observations in the time series is 1,006, and $\Theta_{ij}$ is nonsingular. From $\Theta_{ij}$ we read off the daily return volatility $\sigma_i = \sqrt{\Theta_{ii}}$ and the correlation matrix $\Psi_{ij} = \Theta_{ij}/(\sigma_i\sigma_j)$ (where $\Psi_{ii} = 1$). Note that $\sigma_i = V_i/I_i$, and the average daily return is given by $R_i = P_i/I_i$.
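A minimal sketch, assuming one alpha's daily P&L, dollars traded and shares traded are available as plain Python lists and the investment I is held constant, of how the quantities in Equation (4) and the derived volatility, average return and correlation matrix could be computed. Annualizing with √252 trading days and using the sample standard deviation for V are conventions of this sketch; the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence

TRADING_DAYS_PER_YEAR = 252


def alpha_stats(pnl: Sequence[float], dollars_traded: Sequence[float],
                shares_traded: Sequence[float], investment: float) -> Dict[str, float]:
    """Per-alpha quantities from daily series (buys plus sells for the traded amounts)."""
    p = statistics.mean(pnl)             # P: average daily P&L in dollars
    v = statistics.stdev(pnl)            # V: daily P&L volatility (sample std, an assumption)
    d = statistics.mean(dollars_traded)  # D: average daily dollar volume traded
    q = statistics.mean(shares_traded)   # Q: average daily shares traded
    return {
        "sharpe": math.sqrt(TRADING_DAYS_PER_YEAR) * p / v,  # S
        "turnover": d / investment,                          # T
        "holding_period_days": investment / d,               # 1/T
        "cents_per_share": 100.0 * p / q,                    # C
        "volatility": v / investment,                        # sigma = V / I
        "avg_return": p / investment,                        # R = P / I
    }


def correlation_matrix(returns: Sequence[Sequence[float]]) -> List[List[float]]:
    """Psi_ij = Theta_ij / (sigma_i sigma_j) from N daily return series."""
    n, t = len(returns), len(returns[0])
    means = [sum(r) / t for r in returns]
    dev = [[x - m for x in r] for r, m in zip(returns, means)]
    theta = [[sum(a * b for a, b in zip(dev[i], dev[j])) / (t - 1) for j in range(n)]
             for i in range(n)]
    sd = [math.sqrt(theta[i][i]) for i in range(n)]
    return [[theta[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


stats = alpha_stats([100.0, -50.0, 150.0, 0.0], [2e6] * 4, [1e5] * 4, investment=4e6)
print(stats)
```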
Table 1 and Figure 1 summarize the data for the annualized Sharpe ratio $S_i$, daily turnover $T_i$, average holding period $1/T_i$, cents-per-share $C_i$, daily return volatility $\sigma_i$, annualized average daily return $R_i$, and pair-wise correlations $\Psi_{ij}$ with $i > j$.
3.1. Return v. Volatility & Turnover
We run two cross-sectional regressions, both with the intercept, of $\ln(R_i)$ over i) $\ln(\sigma_i)$ as the sole explanatory variable, and ii) $\ln(\sigma_i)$ and $\ln(T_i)$. The results are summarized in Tables 2 and 3. Consistently with [Kakushadze and Tulchinsky, 2015], we have no statistically significant dependence on the turnover $T_i$ here, while the average daily return $R_i$ (averaged over the time series of the realized daily returns) is strongly correlated with the daily return volatility $\sigma_i$, and we have the scaling property (1) with an exponent of approximately 0.76 (the $\ln(\sigma_i)$ coefficient in Table 2).
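The two log-log regressions can be sketched in a few lines of plain Python via ordinary least squares with an intercept. The data below are synthetic, constructed to satisfy $R \propto \sigma^{0.76}$ exactly with no turnover effect, so the fit simply recovers those inputs; they are not the paper's data, and the helper names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def ols_one(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float]:
    """OLS of y on x with an intercept; returns (intercept, slope)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def ols_two(y: Sequence[float], x1: Sequence[float], x2: Sequence[float]) -> Tuple[float, float, float]:
    """OLS of y on x1 and x2 with an intercept; returns (intercept, b1, b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # Cramer's rule on the centered normal equations
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


# Hypothetical cross-section obeying R ~ sigma^0.76 exactly, with no turnover effect.
sigma = [0.001, 0.002, 0.004, 0.008, 0.003]
turnover = [0.3, 0.9, 0.2, 0.5, 1.2]
ret = [math.exp(-3.5) * s ** 0.76 for s in sigma]

ln_r = [math.log(r) for r in ret]
ln_s = [math.log(s) for s in sigma]
ln_t = [math.log(t) for t in turnover]
print(ols_one(ln_r, ln_s))        # slope recovers X = 0.76 up to rounding
print(ols_two(ln_r, ln_s, ln_t))  # turnover coefficient is zero up to rounding
```

With real data one would also want standard errors and t-statistics, which is what the R function summary() reports in Tables 2 and 3.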
3.2. Does Turnover Explain Correlations?
If we draw a parallel between alphas and stocks, then alpha turnover is analogous to stock liquidity, which is typically measured via an average daily dollar volume (ADDV). Log of ADDV is routinely used as a style risk factor in multifactor risk models for approximating stock portfolio covariance matrix structure, whose chief goal is to model the off-diagonal elements of the covariance matrix, that is, the pair-wise correlation structure. Following this analogy, we can ask if the turnover - or more precisely its log - has explanatory power for modeling alpha correlations. It is evident that using the turnover directly (as opposed to its log) would get us nowhere due to the highly skewed (roughly log-normal) turnover distribution (see Figure 1).
To answer this question, recall that in a factor model the covariance matrix $\Theta_{ij}$ is modeled via
$$\Theta_{ij} = \xi_i^2\,\delta_{ij} + \sum_{A,B=1}^{K} \Omega_{iA}\,\Phi_{AB}\,\Omega_{jB}$$
Here: $\xi_i$ is the specific risk; $\Omega_{iA}$ is an $N \times K$ factor loadings matrix corresponding to $K$ risk factors; and $\Phi_{AB}$ is a $K \times K$ factor covariance matrix. In our case, we are interested in modeling the correlation matrix $\Psi_{ij}$ and ascertaining whether the turnover has explanatory power for pair-wise correlations. Whether the volatility and turnover are correlated is a separate issue.
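The factor-model structure $\Theta_{ij} = \xi_i^2\delta_{ij} + \sum_{A,B}\Omega_{iA}\Phi_{AB}\Omega_{jB}$ and the induced correlations can be sketched in plain Python as follows. The loadings (an intercept-like column plus a demeaned log-turnover column), the factor covariance and the specific risks are all made-up numbers for illustration, not estimates from the paper.

```python
import math
from typing import List, Sequence


def factor_covariance(xi: Sequence[float], omega: Sequence[Sequence[float]],
                      phi: Sequence[Sequence[float]]) -> List[List[float]]:
    """Theta_ij = xi_i^2 delta_ij + sum_{A,B} Omega_iA Phi_AB Omega_jB."""
    n, k = len(omega), len(phi)
    theta = []
    for i in range(n):
        row = []
        for j in range(n):
            common = sum(omega[i][a] * phi[a][b] * omega[j][b]
                         for a in range(k) for b in range(k))
            row.append(common + (xi[i] ** 2 if i == j else 0.0))
        theta.append(row)
    return theta


def to_correlation(theta: Sequence[Sequence[float]]) -> List[List[float]]:
    """Psi_ij = Theta_ij / sqrt(Theta_ii Theta_jj)."""
    n = len(theta)
    sd = [math.sqrt(theta[i][i]) for i in range(n)]
    return [[theta[i][j] / (sd[i] * sd[j]) for j in range(n)] for i in range(n)]


# Hypothetical example: N = 3 alphas and K = 2 factors.
turnover = [0.2, 0.5, 1.0]
logs = [math.log(t) for t in turnover]
mean_log = sum(logs) / len(logs)
omega = [[1.0, v - mean_log] for v in logs]  # N x K loadings
phi = [[0.04, 0.0], [0.0, 0.01]]             # K x K factor covariance
xi = [0.3, 0.3, 0.3]                         # specific risks
psi = to_correlation(factor_covariance(xi, omega, phi))
for row in psi:
    print([round(v, 3) for v in row])
```

In such a model the specific risk only enters the diagonal, so the off-diagonal correlations are driven entirely by the factor loadings, which is what the turnover test in this subsection probes.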
So, our approach is to take one of the columns of the factor loadings matrix $\Omega_{iA}$ as $\ln(T_i)$. More precisely, a priori there is no reason why we should pick $\ln(T_i)$ as opposed to $\ln(T_i/\kappa)$, where $\kappa$ is some normalization factor. To deal with this, let us normalize $\kappa$ such that $u_i \equiv \ln(T_i/\kappa)$ has zero cross-sectional mean, and let $\nu_i \equiv 1$ be the unit $N$-vector (the intercept). Then we can construct three symmetric tensor combinations $\nu_i\nu_j$, $\nu_i u_j + u_i \nu_j$ and $u_i u_j$. Let us now define a composite index $a = \{(i, j) \mid i > j\}$, which takes $M = N(N-1)/2$ values, i.e., we pull the off-diagonal lower-triangular elements of a general symmetric $N \times N$ matrix $G_{ij}$ into a vector $G_a$. This way we can construct four $M$-vectors $\Psi_a$, $x_a \equiv [\nu_i\nu_j]_a$, $y_a \equiv [\nu_i u_j + u_i \nu_j]_a$ and $z_a \equiv [u_i u_j]_a$. Now we can run a linear regression of $\Psi_a$ over $x_a$, $y_a$ and $z_a$. Note that $x_a$ is simply the intercept (the unit $M$-vector), so this is a regression of $\Psi_a$ over $y_a$ and $z_a$ with the intercept. The results are summarized in Table 4. It is evident that the linear and bilinear (in $u_i$) variables $y_a$ and $z_a$ have poor explanatory power for pair-wise correlations $\Psi_a$, while $x_a$ (the intercept) simply models the average correlation $\mathrm{Mean}(\Psi_a)$. Recall that by construction $y_a$ is orthogonal to $x_a$ (since $u_i$ has zero cross-sectional mean), and these three explanatory variables are linearly independent of each other.
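A minimal Python sketch of this construction: it demeans the log-turnover (equivalent to choosing $\kappa$ as the geometric mean of the turnovers), enumerates the $M = N(N-1)/2$ pairs $a = (i, j)$ with $i > j$, and builds $x_a = 1$, $y_a = u_i + u_j$ and $z_a = u_i u_j$ with $u_i = \ln(T_i/\kappa)$, before regressing a vector of pair-wise correlations on $y_a$ and $z_a$ with an intercept. The turnovers and the correlations below are synthetic, so the regression merely recovers the coefficients used to generate them.

```python
import math
from typing import List, Sequence, Tuple


def pair_vectors(turnover: Sequence[float]):
    """Index a = (i, j) with i > j, and the M-vectors x_a, y_a, z_a."""
    logs = [math.log(t) for t in turnover]
    mean_log = sum(logs) / len(logs)   # kappa = exp(mean_log) ...
    u = [v - mean_log for v in logs]   # ... makes u_i = ln(T_i / kappa) zero-mean
    pairs: List[Tuple[int, int]] = [(i, j) for i in range(len(u)) for j in range(i)]
    x = [1.0] * len(pairs)                # nu_i nu_j: the intercept
    y = [u[i] + u[j] for i, j in pairs]   # nu_i u_j + u_i nu_j: linear in u
    z = [u[i] * u[j] for i, j in pairs]   # u_i u_j: bilinear in u
    return pairs, x, y, z


def ols_two(y, x1, x2) -> Tuple[float, float, float]:
    """OLS of y on x1 and x2 with an intercept; returns (intercept, b1, b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    c1 = [a - m1 for a in x1]
    c2 = [a - m2 for a in x2]
    cy = [a - my for a in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(a * a for a in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * b for a, b in zip(c1, cy))
    s2y = sum(a * b for a, b in zip(c2, cy))
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2


pairs, x, y, z = pair_vectors([0.2, 0.4, 0.5, 1.0, 0.3])
print(len(pairs), sum(y))  # M = 10 pairs; sum(y) vanishes up to rounding
psi_a = [0.15 + 0.01 * b + 0.05 * c for b, c in zip(y, z)]  # synthetic correlations
print(ols_two(psi_a, y, z))
```

Because $\sum_a y_a = (N-1)\sum_i u_i = 0$, the linear variable is orthogonal to the intercept, so the intercept estimate is driven by the average correlation.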
Let us emphasize that our conclusion does not necessarily mean that the turnover adds no value in the factor model context; it only means that the turnover per se does not appear to help in modeling pair-wise alpha correlations. The above analysis does not address whether the turnover adds explanatory value to modeling variances, e.g., the specific risk. Indeed, a linear regression of the log-volatility $\ln(\sigma_i)$ over the log-turnover $\ln(T_i)$ (with the intercept) shows a nonzero correlation between these variables (see Table 5), albeit not a very strong one. Determining whether the turnover adds value via, e.g., the specific risk requires certain proprietary methods outside of the scope of this paper.
4. Conclusions
We emphasize that the 101 alphas we present here are not "toy" alphas but real-life trading alphas used in production. In fact, 80 of these alphas are in production as of this writing. To our knowledge, this is the first time such a large number of real-life explicit formulaic alphas appear in the literature. This should come as no surprise: naturally, quant trading is highly proprietary and secretive. Our goal here is to provide a glimpse into the complex world of modern and ever-evolving quantitative trading and help demystify it, to any degree possible.
Technological advances nowadays allow automation of alpha mining. Quantitative trading alphas are by far the most numerous of available trading signals that can be turned into trading strategies/portfolios. There are myriad permutations of individual stock holdings in a (dollar-neutral) portfolio of, e.g., the 2,000 most liquid U.S. stocks that can result in a positive return on high- and mid-frequency time horizons. In addition, many of these alphas are ephemeral and their universe is very fluid. It takes quantitatively sophisticated, technologically well-endowed and ever-adapting trading operations to mine hundreds of thousands, millions and even billions of alphas and combine them into a unified "mega-alpha", which is then traded with an added bonus of sizeable savings on execution costs due to automatic internal crossing of trades.
In this spirit, we end this paper with an 1832 poem by the Russian poet Mikhail Lermontov (translated from Russian by Zura Kakushadze, ca. 1993):
The Sail
A lonely sail seeming white,
In misty haze mid blue sea,
Be foreign gale seeking might?
Why home bays did it flee?

The sail's bending mast is creaking,
The wind and waves blast ahead,
It isn't happiness it's seeking,
Nor is it happiness it's fled!

Beneath are running ázure streams,
Above are shining golden beams,
But wishing storms the sail seems,
As if in storms is peace it deems.
Appendix A: Formulaic Alphas
In this appendix, in Subsection A.1, we provide our 101 formulaic alphas. The formulas are also code once the functions and operators are defined. The functions and operators used in the alphas are defined in Subsection A.2. The input data is elaborated upon in Subsection A.3.
A.1. Formulaic Expressions for Alphas
A.2. Functions and Operators
(Below "{ }" stands for a placeholder. All expressions are case insensitive.)
abs(x), log(x), sign(x) = standard definitions; same for the operators "+", "-", "*", "/", ">", "<", "==", "||", "x ? y : z"
rank(x) = cross-sectional rank
delay(x, d) = value of x d days ago
correlation(x, y, d) = time-serial correlation of x and y for the past d days
covariance(x, y, d) = time-serial covariance of x and y for the past d days
scale(x, a) = rescaled x such that sum(abs(x)) = a (the default is a = 1)
delta(x, d) = today's value of x minus the value of x d days ago
signedpower(x, a) = x^a
decay_linear(x, d) = weighted moving average over the past d days with linearly decaying weights d, d - 1, ..., 1 (rescaled to sum up to 1)
indneutralize(x, g) = x cross-sectionally neutralized against groups g (subindustries, industries, sectors, etc.), i.e., x is cross-sectionally demeaned within each group g
ts_{O}(x, d) = operator O applied across the time-series for the past d days; non-integer number of days d is converted to floor(d)
ts_min(x, d) = time-series min over the past d days
ts_max(x, d) = time-series max over the past d days
ts_argmax(x, d) = which day ts_max(x, d) occurred on
ts_argmin(x, d) = which day ts_min(x, d) occurred on
ts_rank(x, d) = time-series rank in the past d days
min(x, d) = ts_min(x, d)
max(x, d) = ts_max(x, d)
sum(x, d) = time-series sum over the past d days
product(x, d) = time-series product over the past d days
stddev(x, d) = moving time-series standard deviation over the past d days
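The Python sketch below implements a few of the operators above for a single stock's daily series x (stored oldest first) evaluated on day index t, plus the cross-sectional scale(). Conventions that the definitions leave open are assumptions of this sketch: the "past d days" window includes today, the most recent day gets the largest decay_linear weight, ts_rank is scaled to (0, 1] with ties averaged, and None is returned when there is not enough history.

```python
import math
from typing import List, Optional, Sequence


def _window(x: Sequence[float], t: int, d: float) -> Optional[Sequence[float]]:
    """The past d days of x up to and including day index t."""
    d = math.floor(d)  # non-integer number of days d is converted to floor(d)
    if d < 1 or t - d + 1 < 0:
        return None  # not enough history
    return x[t - d + 1: t + 1]


def delay(x: Sequence[float], t: int, d: float) -> Optional[float]:
    """Value of x d days ago."""
    d = math.floor(d)
    return x[t - d] if t - d >= 0 else None


def delta(x: Sequence[float], t: int, d: float) -> Optional[float]:
    """Today's value of x minus the value of x d days ago."""
    past = delay(x, t, d)
    return None if past is None else x[t] - past


def ts_sum(x: Sequence[float], t: int, d: float) -> Optional[float]:
    w = _window(x, t, d)
    return None if w is None else sum(w)


def decay_linear(x: Sequence[float], t: int, d: float) -> Optional[float]:
    """Weights d, d - 1, ..., 1 from today backwards, rescaled to sum up to 1."""
    w = _window(x, t, d)
    if w is None:
        return None
    weights = range(1, len(w) + 1)  # oldest day gets 1, today gets d
    return sum(wt * v for wt, v in zip(weights, w)) / sum(weights)


def ts_rank(x: Sequence[float], t: int, d: float) -> Optional[float]:
    """Rank of today's value within the past d days, scaled to (0, 1]."""
    w = _window(x, t, d)
    if w is None:
        return None
    today = w[-1]
    below = sum(1 for v in w if v < today)
    ties = sum(1 for v in w if v == today)  # includes today itself
    return (below + (ties + 1) / 2) / len(w)


def scale(xs: Sequence[float], a: float = 1.0) -> List[float]:
    """Cross-sectional rescaling so that sum(abs(x)) = a."""
    total = sum(abs(v) for v in xs)
    return [v * a / total for v in xs]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(delay(x, 4, 2), delta(x, 4, 2), ts_sum(x, 4, 3))
print(decay_linear(x, 4, 3), ts_rank(x, 4, 2.7), scale([1.0, -3.0]))
```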
A.3. Input Data
returns = daily close-to-close returns
open, close, high, low, volume = standard definitions for daily price and volume data
vwap = daily volume-weighted average price
cap = market cap
adv{d} = average daily dollar volume for the past d days
IndClass = a generic placeholder for a binary industry classification such as GICS, BICS, NAICS, SIC, etc., in indneutralize(x, IndClass.level), where level = sector, industry, subindustry, etc. Multiple IndClass in the same alpha need not correspond to the same industry classification.
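A minimal sketch of two inputs/operators that depend on data beyond prices: indneutralize(x, g) as cross-sectional demeaning within groups, and adv{d} for one stock. The industry labels are hypothetical stand-ins for an IndClass level, and approximating daily dollar volume as vwap * volume is an assumption of this sketch, not a definition from the text.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def indneutralize(x: Sequence[float], groups: Sequence[Hashable]) -> List[float]:
    """Demean x cross-sectionally within each group (e.g., IndClass.industry labels)."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for v, g in zip(x, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(x, groups)]


def adv(vwap: Sequence[float], volume: Sequence[float], t: int, d: int) -> Optional[float]:
    """adv{d} for one stock: mean daily dollar volume over the past d days up to day t.

    Daily dollar volume is approximated here as vwap * volume (an assumption).
    """
    if d < 1 or t - d + 1 < 0:
        return None  # not enough history
    days = range(t - d + 1, t + 1)
    return sum(vwap[s] * volume[s] for s in days) / d


# Hypothetical industry labels standing in for IndClass.industry.
signal = [1.0, 3.0, 10.0, 20.0]
industry = ["software", "software", "banks", "banks"]
print(indneutralize(signal, industry))
print(adv([10.0, 20.0, 30.0], [100.0, 50.0, 10.0], t=2, d=2))  # adv2 on the last day
```

After neutralization each group's values sum to zero, so the signal carries no net exposure to any industry at the chosen level.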
Appendix B: Disclaimer
Wherever the context so requires, the masculine gender includes the feminine and/or neuter, and the singular form includes the plural and vice versa. The authors of this paper ("Authors") and their affiliates including without limitation Quantigic Solutions LLC ("Authors' Affiliates" or "their Affiliates") make no implied or express warranties or any other representations whatsoever, including without limitation implied warranties of merchantability and fitness for a particular purpose, in connection with or with regard to the content of this paper including without limitation any formulae, code or algorithms contained herein ("Content").
The reader may use the Content solely at his/her/its own risk and the reader shall have no claims whatsoever against the Authors or their Affiliates and the Authors and their Affiliates shall have no liability whatsoever to the reader or any third party whatsoever for any loss, expense, opportunity cost, damages or any other adverse effects whatsoever relating to or arising from the use of the Content by the reader including without any limitation whatsoever: any direct, indirect, incidental, special, consequential or any other damages incurred by the reader, however caused and under any theory of liability; any loss of profit (whether incurred directly or indirectly), any loss of goodwill or reputation, any loss of data suffered, cost of procurement of substitute goods or services, or any other tangible or intangible loss; any reliance placed by the reader on the completeness, accuracy or existence of the Content or any other effect of using the Content; and any and all other adversities or negative effects the reader might encounter in using the Content irrespective of whether the Authors or their Affiliates are or should have been aware of such adversities or negative effects.
The formulae and code included in Appendix A hereof are provided herein with the express permission of WorldQuant LLC. WorldQuant LLC retains all rights, title and interest in and to the formulae and code included in Appendix A hereof and any and all copyrights therefor.
References
Avellaneda, M. and Lee, J.H. "Statistical arbitrage in the U.S. equity market." Quantitative Finance 10(7) (2010), pp. 761-782.
Grinold, R.C. and Kahn, R.N. "Active Portfolio Management." New York, NY: McGraw-Hill, 2000.
Jegadeesh, N. and Titman, S. "Returns to buying winners and selling losers: Implications for stock market efficiency." Journal of Finance 48(1) (1993), pp. 65-91.
Kakushadze, Z. "Factor Models for Alpha Streams." The Journal of Investment Strategies 4(1) (2014), pp. 83-109.
Kakushadze, Z. and Tulchinsky, I. "Performance v. Turnover: A Story by 4,000 Alphas." Journal of Investment Strategies (forthcoming). Available online: http://ssrn.com/abstract=2657603 (September 7, 2015).
Pastor, L. and Stambaugh, R.F. "Liquidity Risk and Expected Stock Returns." The Journal of Political Economy 111(3) (2003), pp. 642-685.
Tulchinsky, I. et al. "Finding Alphas: A Quantitative Approach to Building Trading Strategies." New York, NY: Wiley, 2015.
Tables
| Quantity | Minimum | 1st Quartile | Median | Mean | 3rd Quartile | Maximum |
|---|---|---|---|---|---|---|
| Sharpe ratio $S_i$ (annualized) | 1.238 | 1.929 | 2.224 | 2.265 | 2.498 | 4.162 |
| Turnover $T_i$ (daily) | 0.1571 | 0.3429 | 0.4752 | 0.5456 | 0.6474 | 1.604 |
| Holding period $1/T_i$ (days) | 0.6235 | 1.545 | 2.104 | 2.391 | 2.916 | 6.365 |
| Cents-per-share $C_i$ | 0.1324 | 0.3125 | 0.3969 | 0.4814 | 0.5073 | 2.031 |
| Volatility $\sigma_i$ (daily) | 0.9318 | 1.194 | 1.395 | 1.747 | 2.019 | 10.44 |
| Return $R_i$ (annualized) | 3.285 | 4.4 | 5.441 | 6.015 | 6.296 | 28.72 |
| Correlation $\Psi_{ij}$, $i > j$ (%) | -15.09 | 7.457 | 14.31 | 15.86 | 22.91 | 87.33 |
Table 1. Summary (using the R function summary()) for the annualized Sharpe ratio $S_i$, daily turnover $T_i$, average holding period $1/T_i$, cents-per-share $C_i$, daily return volatility $\sigma_i$, annualized average daily return $R_i$, and pair-wise correlations $\Psi_{ij}$ with $i > j$ (see Section 3). The performance figures are exclusive of any trading or transaction costs, price impact, etc.
| | Estimate | Standard error | t-statistic |
|---|---|---|---|
| Intercept | -3.509 | 0.295 | -11.88 |
| $\ln(\sigma_i)$ | 0.761 | 0.046 | 16.65 |
Overall: F-statistic = 277.2
Table 2. Summary (using the R function summary()) for the cross-sectional regression of $\ln(R_i)$ over $\ln(\sigma_i)$ with the intercept. See Subsection 3.1 for details. Also see Figure 2.
| | Estimate | Standard error | t-statistic |
|---|---|---|---|
| Intercept | -3.435 | 0.324 | -10.60 |
| $\ln(\sigma_i)$ | 0.775 | 0.052 | 14.84 |
| $\ln(T_i)$ | -0.023 | 0.040 | -0.57 |
Overall: F-statistic = 137.8
Table 3. Summary for the cross-sectional regression of $\ln(R_i)$ over $\ln(\sigma_i)$ and $\ln(T_i)$ with the intercept. See Subsection 3.1 for details.
| | Estimate | Standard error | t-statistic |
|---|---|---|---|
| Intercept | 0.1587 | 0.0017 | 95.18 |
| Linear term in $\ln(T/\kappa)$ | 0.0067 | 0.0023 | 2.907 |
| Bilinear term in $\ln(T/\kappa)$ | 0.0474 | 0.0063 | 7.537 |
Overall: F-statistic = 32.55
Table 4. Summary for the cross-sectional regression of the pair-wise correlations over the linear and bilinear log-turnover variables with the intercept. See Subsection 3.2 for details. Also see Figure 3.
| | Estimate | Standard error | t-statistic |
|---|---|---|---|
| Intercept | -6.174 | 0.062 | -100.1 |
| $\ln(T_i)$ | 0.368 | 0.068 | 5.412 |
Overall: F-statistic = 29.29
Table 5. Summary for the cross-sectional regression of $\ln(\sigma_i)$ over $\ln(T_i)$ with the intercept. See Subsection 3.2 for details. Also see Figure 4.
Figures
Figure 1. Density (using the R function density()) plots for the annualized Sharpe ratio $S_i$, daily turnover $T_i$, cents-per-share $C_i$, daily return volatility $\sigma_i$, annualized average daily return $R_i$, and pair-wise correlations $\Psi_{ij}$ with $i > j$ (see Table 1 and Section 3). The "extreme" outliers in two of these distributions are due to the delay-0 alphas (see Section 2).
Figure 2. Horizontal axis: $\ln(\sigma_i)$; vertical axis: $\ln(R_i)$. The dots represent the data points. The straight line plots the linear regression fit $\ln(R_i) = -3.509 + 0.761\,\ln(\sigma_i)$. See Table 2.
Figure 3. See Table 4 and Subsection 3.2. The numeric coefficients are the regression coefficients in Table 4.
Figure 4. Horizontal axis: $\ln(T_i)$; vertical axis: $\ln(\sigma_i)$. The dots represent the data points. The straight line plots the linear regression fit $\ln(\sigma_i) = -6.174 + 0.368\,\ln(T_i)$. See Table 5.
Zura Kakushadze, Ph.D., is the President and a Co-Founder of Quantigic Solutions LLC and a Full Professor in the Business School and the School of Physics at Free University of Tbilisi. Email: zura@quantigic.com
DISCLAIMER: This address is used by the corresponding author for no purpose other than to indicate his professional affiliation as is customary in publications. In particular, the contents of this paper are not intended as an investment, legal, tax or any other such advice, and in no way represent views of Quantigic Solutions LLC, the website www.quantigic.com or any of their other affiliates.
Paraphrasing Blondie's (Clint Eastwood) one-liners from a great 1966 motion picture The Good, the Bad and the Ugly (directed by Sergio Leone).
"An alpha is a combination of mathematical expressions, computer source code, and configuration parameters that can be used, in combination with historical data, to make predictions about future movements of various financial instruments." [Tulchinsky et al., 2015] Here "alpha" - following the common trader lingo - generally means any reasonable "expected return" that one may wish to trade on and is not necessarily the same as the "academic" alpha. In practice, the detailed information about how alphas are constructed may often not even be available, e.g., the only data available could be the position data, so "alpha" then is a set of instructions to achieve certain stock (or other instrument) holdings by some times $t$ (e.g., a tickers-by-holdings matrix for each $t$).
We picked these alphas largely based on simplicity considerations, so they can be presented within the inherent limitations of a paper. There also exist myriad other, "non-formulaic" (coded and too-complex-to-present) alphas.
More precisely, depending on the alpha and the industry classification used, neutralization can be w.r.t. sectors, industries, subindustries, etc.; different classifications use different nomenclature for levels of similar granularity.

In [Kakushadze and Tulchinsky, 2015] the alpha return volatility was not directly available and was estimated indirectly from the Sharpe ratio, cents-per-share and turnover data. Here we use realized volatility data directly.

Depending on the construction, a priori the turnover might add value via the specific (idiosyncratic) risk for alphas.

Here we use the log of the turnover as opposed to the turnover itself, as the latter has a skewed, roughly log-normal distribution, while pair-wise correlations take values in $[-1, 1]$ (in fact, their distribution is tighter; see below).
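Neutralization w.r.t. a classification level amounts to removing the group-wise mean of the alpha on each date, so that the alpha has zero net exposure within every sector, industry or subindustry. A minimal sketch for a single date, assuming plain (unweighted) cross-sectional demeaning; the function name `neutralize` and the string group labels are illustrative:

```python
import numpy as np

def neutralize(alpha, groups):
    """Cross-sectionally demean `alpha` within each group label (e.g. industry)."""
    alpha = np.asarray(alpha, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(alpha)
    for g in np.unique(groups):
        mask = groups == g
        # Subtract the group mean so each group's values sum to zero.
        out[mask] = alpha[mask] - alpha[mask].mean()
    return out
```

Passing a single label for all stocks reduces this to plain market neutralization; finer labels (subindustries rather than sectors) remove more structure from the signal.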
On longer horizons, for a discussion of mean-reversion (contrarian) and momentum (trend following) strategies, see, e.g., [Avellaneda and Lee, 2010] and [Jegadeesh and Titman, 1993], respectively, and references therein.

Four of our 101 alphas in Appendix A, namely, the alphas numbered 42, 48, 53 and 54, are delay-0 alphas. They are assumed to be traded at, or as close as possible to, the close of the trading day for which they are computed.

On the other hand, if the alpha (3) is executed as close as possible to yesterday's close, then it is delay-0.

Here "vwap", as usual, stands for "volume-weighted average price".
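Two mechanics from these footnotes can be made concrete: vwap is the volume-weighted mean of trade prices, and the delay of an alpha determines how far its inputs must be lagged relative to the day it trades. The sketch below assumes daily arrays indexed by trading day; `vwap` and `lag` are hypothetical helper names, and the comments describe the delay convention only as stated in the footnotes above.

```python
import numpy as np

def vwap(prices, volumes):
    """Volume-weighted average price over a set of trades (or intraday bars)."""
    prices = np.asarray(prices, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    return float((prices * volumes).sum() / volumes.sum())

def lag(series, days):
    """Shift a daily series by `days`, padding the start with NaN.

    A delay-1 alpha trading on day t may only use inputs through day t-1,
    i.e. lag(inputs, 1)[t]. A delay-0 alpha may also use day t's data
    (days=0), and is then traded at or as close as possible to day t's close.
    """
    if days < 0:
        raise ValueError("days must be non-negative (no look-ahead)")
    series = np.asarray(series, dtype=float)
    if days == 0:
        return series.copy()
    out = np.full_like(series, np.nan)
    out[days:] = series[:-days]
    return out
```

For instance, trades of 100 shares at 10 and 300 shares at 11 give a vwap of 4300 / 400 = 10.75.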
Perhaps a more precise analogy would be between the turnover and the ratio of ADDV (average daily dollar volume) to market cap; however, this is not critical for our purposes here.

For liquidity as a style risk factor, see, e.g., [Pastor and Stambaugh, 2003] and references therein.

See, e.g., [Grinold and Kahn, 2000] and references therein.

Variances are relatively stable and can be computed from historical data (sample variances). It is the off-diagonal elements of the sample covariance matrix, to wit, the correlations, that are unstable out-of-sample.

The log of the turnover as a factor in risk models for alpha portfolios was suggested in [Kakushadze, 2014].
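One way to combine these two footnotes is to take the correlation structure from a factor model whose style-factor loadings are the log of each alpha's turnover, while keeping the (stable) sample variances on the diagonal. The sketch below assumes an intercept plus a single log-turnover factor, daily cross-sectional OLS regressions to estimate factor returns, and residual variances as specific risk. The function name is hypothetical, and this is a simplified illustration, not the risk model of [Kakushadze, 2014] or of this paper.

```python
import numpy as np

def factor_model_covariance(returns, turnover):
    """Covariance of N alphas from a (days x N) return matrix.

    Hypothetical sketch: loadings = [1, ln(turnover)]; factor returns from
    daily cross-sectional OLS; model correlations rescaled by sample vols.
    """
    returns = np.asarray(returns, dtype=float)             # (days, n)
    n = returns.shape[1]
    loadings = np.column_stack([np.ones(n),
                                np.log(np.asarray(turnover, dtype=float))])  # (n, 2)
    # Regress each day's cross-section of returns on the loadings.
    coef, *_ = np.linalg.lstsq(loadings, returns.T, rcond=None)  # (2, days)
    factor_returns = coef.T                                 # (days, 2)
    residuals = returns - factor_returns @ loadings.T       # (days, n)
    specific_var = residuals.var(axis=0, ddof=1)
    factor_cov = np.cov(factor_returns, rowvar=False)       # (2, 2)
    model_cov = np.diag(specific_var) + loadings @ factor_cov @ loadings.T
    # Keep only the model's correlation structure (the unstable part)...
    d = np.sqrt(np.diag(model_cov))
    model_corr = model_cov / np.outer(d, d)
    # ...and restore sample variances (the stable part) on the diagonal.
    sample_vol = returns.std(axis=0, ddof=1)
    return model_corr * np.outer(sample_vol, sample_vol)
```

Because the factor part and the diagonal specific part are both positive semi-definite, the resulting matrix is too, and its diagonal matches the sample variances by construction. Turnover values must be strictly positive for the log to be defined.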
Suppressing alpha weights by the turnover can add value, but may be highly correlated with volatility suppression.

Roughly speaking, when the specific risk is computed via nontrivial (proprietary) methods, the column of the factor loadings matrix corresponding to the turnover is no longer proportional to the log of the turnover but is a more complex function of the turnover, and the specific risk also depends on the turnover nontrivially and is not quadratic in the log of the turnover.

For proprietary reasons, we are not at liberty to state precisely which ones.
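The overlap between turnover suppression and volatility suppression can be checked directly by building both weight vectors and correlating them across alphas. A minimal sketch, assuming the simplest schemes, weights proportional to 1/volatility and to 1/turnover, each normalized to sum to one; the function names are hypothetical:

```python
import numpy as np

def suppressed_weights(scale):
    """Weights proportional to 1/scale, normalized to sum to 1."""
    inv = 1.0 / np.asarray(scale, dtype=float)
    return inv / inv.sum()

def weight_overlap(volatility, turnover):
    """Cross-alpha correlation of volatility- vs turnover-suppressed weights."""
    w_vol = suppressed_weights(volatility)
    w_turn = suppressed_weights(turnover)
    return float(np.corrcoef(w_vol, w_turn)[0, 1])
```

This only illustrates the mechanics: in the extreme case where turnover is exactly proportional to volatility across alphas, the two schemes produce identical weights and the correlation is 1, so turnover suppression adds nothing beyond volatility suppression.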