Plotting time trends (write pseudo code)
Question 1 (10.0 points)
You have loaded a dataset of annual fundamental company data into memory. The dataset consists of the following variables (columns):
gvkey: firm identifier
assets_total: total book assets (in nominal terms)
year: fiscal year
cpi: Consumer Price Index (CPI) for the given year
cpi2020: CPI for 2020
Your goal is to plot the time trend in inflation-adjusted firm size. Describe (in pseudo-code) the steps you would take to:
Process the data to account for inflation.
Summarize firm size for each year (e.g., by averaging or aggregating values).
Generate the desired plot.
Joining datasets I (write pseudo code)
Question 2 (10.0 points)
You are given two datasets:
Annual Company Data: Contains company data organized by company identifier (gvkey) and fiscal year (year).
Your goal is to add the cpi column from the second dataset to the first dataset. To do this, you need to align the annual data with the appropriate monthly inflation data. For simplicity, match each year in the first dataset to the December cpi of the same year in the second dataset.
Describe (in pseudo-code) the steps you would take to:
Align the two datasets.
Perform the join to retain all rows and columns of the first dataset while adding the cpi column.
Joining datasets II (interpret code)
Question 3 (10.0 points)
You are given two datasets:
Dataset 1 contains two columns:
ead: The earnings announcement date (a date variable, e.g., YYYY-MM).
permno: The stock identifier.
Each ead-permno combination appears only once.
Dataset 2 contains three columns:
permno: The stock identifier.
date: The calendar day (a date variable, e.g., YYYY-MM).
prc: The closing price of the stock on a given day.
Each permno-date combination appears only once.
You are also provided the following code:
    use dataset1
    generate date = ead - 7
    joinby permno date using dataset2, unmatched(master)
    tab _merge
    drop _merge
    count if prc != .
    replace date = date-1
    joinby permno date using dataset2, unmatched(master) update
    tab _merge
    drop _merge
    count if prc!=.
Tasks:
In your own words, describe the purpose of this code. What is the goal of these operations?
Explain what happens in each step of the code and how it contributes to the overall goal.
You are given a panel dataset of monthly stock data with the following variables:
permno: the stock identifier
month: the calendar month, e.g., YYYY-MM
ret: the monthly stock return in decimal format (e.g., 0.05 for 5%)
beme: the book-to-market ratio of the stock measured at the end of the previous month
mktcap: the market capitalization of the stock measured at the end of the previous month
mktrf: the monthly market return minus the monthly risk-free rate (excess market return)
You are also provided the following code:
    xtile2 beme10 = beme, nquantiles(10) by(month)
    bysort month beme10: egen ret_vw = wtmean(ret), weight(mktcap)
    duplicates drop month beme10, force
    forvalues i = 1/10 {
        reg ret_vw mktrf if beme10==`i'
        outreg2 using "temp.txt", stats(coef tstat) aster(coef) bdec(3) tdec(2) label append
    }
Tasks:
Purpose of the Code:
In your own words, describe the purpose of this code. What is the goal of these operations? Why might this analysis be useful in a financial context?
Step-by-Step Explanation:
Explain what happens in each step of the code and how it contributes to the overall goal. Be sure to address:
The role of xtile2 and how the dataset is segmented by deciles.
How the weighted mean is computed and used.
The purpose of dropping duplicates.
The regression analysis for each decile and the generation of output in "temp.txt".
You are given a dataset of daily stock returns (named crsp_stocks_2000_2024) with the following variables (columns):
permno: The stock identifier.
date: The calendar day (e.g., "2024-01-15").
ret: The stock return for the given stock and calendar day.
Write the few lines of code to:
Load the dataset into memory.
Winsorize or truncate (your choice) the variable ret at the 0.5% level symmetrically.
Plot a histogram of the winsorized returns.
You are given a dataset similar to the one described in the question above: a dataset of daily stock data (named crsp_stocks_2000_2024). Unlike the dataset from the previous question, this dataset contains end-of-day prices rather than daily returns. That is, it has the following variables (columns):
permno: the stock identifier
date: the calendar day (e.g., "2024-01-15")
prc: the end-of-day stock price for the given stock and calendar day
Your goal is to create daily stock returns from prices according to the formula
$r_{i,t} = \frac{p_{i,t}}{p_{i,t-1}} - 1$,
where:
$i$ is the stock identifier,
$t$ indexes the calendar day,
$p$ is the end-of-day price, and
$r$ is the daily stock return.
Write the few lines of code that achieve this goal and ensure the result includes a new column for returns named ret.
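One way the return formula above could be applied is sketched below in plain Python. This is only an illustration under stated assumptions, not a model answer: the rows are assumed to be dictionaries with the keys permno, date and prc, the day t−1 is taken to be the previous available observation of the same stock, and the function name add_returns and the toy prices are hypothetical.

```python
from collections import defaultdict


def add_returns(rows):
    """Add a 'ret' key to each row: prc today / prc on the previous observation - 1."""
    by_stock = defaultdict(list)
    for row in rows:
        by_stock[row["permno"]].append(row)
    for stock_rows in by_stock.values():
        # ISO dates ("YYYY-MM-DD") sort chronologically as plain strings.
        stock_rows.sort(key=lambda r: r["date"])
        prev = None
        for row in stock_rows:
            if prev is None or prev["prc"] in (None, 0) or row["prc"] is None:
                # First observation of a stock, or a missing/zero price: no return.
                row["ret"] = None
            else:
                row["ret"] = row["prc"] / prev["prc"] - 1.0
            prev = row
    return rows


# Hypothetical toy data (made-up prices), deliberately out of order.
toy = [
    {"permno": 1, "date": "2024-01-16", "prc": 150.0},
    {"permno": 1, "date": "2024-01-15", "prc": 100.0},
    {"permno": 2, "date": "2024-01-15", "prc": 50.0},
    {"permno": 2, "date": "2024-01-16", "prc": 25.0},
]
add_returns(toy)
```

Grouping by permno before taking the lag matters: without it, the first price of one stock would be compared with the last price of another.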
You have loaded a panel dataset of annual accounting data for U.S. corporations. The dataset contains the following variables:
gvkey: The company identifier.
year: The fiscal year (an integer variable, e.g. 2005).
at: The total book assets of the company in that fiscal year.
Each gvkey-year combination appears only once.
You are also given the following code that can be run on this dataset:
    g isMissing = (missing(at) & year == 1993)
    bysort gvkey: egen isDropped = total(isMissing)
    drop if isDropped == 1
    drop isDropped
What is the goal of these operations, and how does each line contribute to the overall goal? Choose the correct answers from those provided below. More than one answer can be correct.
[ ] The first line of the code is adding a variable to the dataset that can take the values 0, 1 or NA.
[ ] The overall goal of the code is to remove companies from the dataset that have any missing observations on total book assets.
[ ] The overall goal of the code is to remove all companies from the dataset that have a missing observation on total book assets in 1993.
[ ] The second line of the code is adding a variable to the dataset that takes the value 1 if the company had missing total assets in year 1993, and 0 otherwise.
[ ] The second line of the code is adding a variable to the dataset that takes the value 1 if the company had missing total assets in any fiscal year, and 0 otherwise.
[ ] The first line of the code is adding a variable to the dataset that can take the values 0 or 1.
[ ] The second line of the code is adding a variable to the dataset that contains the number of fiscal years for which the company had missing total assets.
[ ] The first line of the code is adding a numeric variable to the dataset that can take any value in the interval [0,1].
[ ] The overall goal of the code is to remove all observations from the dataset.
Construct cumulative returns (write pseudo code)
Question 8 (10.0 points)
You are given a panel dataset with stock returns for many different stocks at monthly frequency. The variables are:
permo: the stock identifier
month: the calendar month, in the format "YYYY-MM"
ret: the stock return of the given stock in the given month
For each stock and each month, your goal is to generate a new variable, cum_ret_5yr, which measures the cumulative five-year forward return. The cumulative five-year forward return is mathematically defined as:
$r_{i,t \to t+60} = \prod_{s=t+1}^{t+60} (1 + r_{i,s}) - 1$,
where
$r_{i,t}$ is the return for stock $i$ in month $t$,
$t+60$ represents the month 60 months after $t$ (five years).
Write pseudo-code to calculate this new variable. Your pseudo-code should describe:
Steps to sort and group the data to ensure calculations are done within each stock (permo).
The computation of the five-year forward return for each stock and month.
How to handle edge cases, such as months where a full five-year period is not available in the data.
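The steps listed above could be approached as in the following plain-Python sketch. It is one possible illustration, not a reference solution, and it rests on assumptions: rows are dictionaries with the keys permo, month ("YYYY-MM") and ret; the function name add_forward_cum_ret and the horizon parameter are hypothetical; and the chosen edge-case rule is to leave cum_ret_5yr missing (None) whenever any of the next 60 calendar months is absent or has a missing return.

```python
from collections import defaultdict


def month_index(month):
    """Map "YYYY-MM" to a running month count so consecutive months differ by 1."""
    year, mon = month.split("-")
    return int(year) * 12 + int(mon) - 1


def add_forward_cum_ret(rows, horizon=60):
    """Add 'cum_ret_5yr': the product of (1 + ret) over months t+1..t+horizon, minus 1."""
    # Group returns by stock, keyed by month index, so lookups never cross stocks.
    by_stock = defaultdict(dict)
    for row in rows:
        by_stock[row["permo"]][month_index(row["month"])] = row["ret"]
    for row in rows:
        rets = by_stock[row["permo"]]
        t = month_index(row["month"])
        window = [rets.get(s) for s in range(t + 1, t + horizon + 1)]
        if any(r is None for r in window):
            # Edge case: incomplete forward window (end of sample or a gap).
            row["cum_ret_5yr"] = None
        else:
            growth = 1.0
            for r in window:
                growth *= 1.0 + r
            row["cum_ret_5yr"] = growth - 1.0
    return rows


# Hypothetical toy data; horizon=2 keeps the example small (the task uses 60).
toy = [
    {"permo": 1, "month": "2020-01", "ret": 0.1},
    {"permo": 1, "month": "2020-02", "ret": 0.5},
    {"permo": 1, "month": "2020-03", "ret": 1.0},
]
add_forward_cum_ret(toy, horizon=2)
```

Looking months up by calendar index rather than by row position means a stock with a gap in its history does not silently get a window that spans more than five years.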
Interpret regression results (I)
Question 9 (5.0 points)
In an empirical research article, you find the following table:
The regressions are linear panel regressions with fixed effects.
The variable definitions are as follows:
Entrepreneurship dummy: a dummy variable equal to 1 if individual i starts a business in year t+1
Owner: a dummy variable equal to 1 if individual i owns a house in year t
Δp: The house price appreciation in the region where individual i resides, defined as the fractional change in house prices over the past five years: $\Delta p = \frac{p_t}{p_{t-5}} - 1$, where $p_t$ and $p_{t-5}$ represent house prices at time $t$ and $t-5$, respectively.
How should one interpret the coefficient of Owner x Δp, which is reported as 0.014?
[ ] House owners are 1.4 percentage points more likely to start a business than non-owners after house prices have increased by 100% over five years.
[x] Owners are more inclined to start a business, especially when house prices have increased over the past five years.
[ ] When house prices increase by 100 percent, house ownership increases by 1.4 percent.
[x] House owners start on average 1.4 businesses.
[ ] When house prices increase by 100% over five years, entrepreneurs are 1.4 percentage points less likely to buy a house.
[x] House owners are 0.014 percentage points more likely to start a business.
[ ] House owners start on average 0.014 businesses per year.
[ ] Home owners are more inclined to buy a second house, especially when house prices have increased over the past five years.
[ ] House owners are 1.4 percentage points more likely to start a business than non-owners.
Interpret regression results (II)
Question 10 (10.0 points)
You are reading an empirical article about family-owned firms. The article assesses the question of whether the appointment of a family CEO hurts or benefits the performance of such firms compared to the appointment of an external, non-family related CEO. To do that, the paper uses data on CEO transitions in family firms, where the founder-CEO passes the CEO role to either a family member or an external manager. A key regression that you will see in this paper is:
$y_i = a_1 + X_i' b_1 + c_1 \, \text{famCEO}_i + \varepsilon_{1i}$,
where
$y$ is a measure of firm performance
$i$ indexes a firm
famCEO is a dummy variable equal to 1 for a family CEO
$X_i$ is a vector of control variables
You don't have the time to read the paper in detail, so you quickly browse through it and find the following tables:
For these regressions, the researchers use another variable:
Gender of the first born child is male: a dummy variable equal to 1 if the gender of the first born child of the incumbent CEO is male, and zero otherwise.
What is the overall purpose of these regressions? What is each of these regression tables contributing to the overall purpose of this analysis? To answer these questions, it is sufficient to focus only on the highlighted regression coefficients (among the many coefficients in these tables).
[ ] In this analysis, an instrumental variables framework is essential, because firm performance is determined by many unobservable factors.
[x] The "Gender of the first born child is male" dummy serves as an instrumental variable in this analysis.
[ ] The coefficient in column 1 of the third table shows that, typically, firm performance is worse under family CEOs.
[ ] The overall goal of the paper is to establish the causal effect of family CEOs on firm performance.
[x] In this analysis, an instrumental variables framework is essential, because transitions from one CEO to the next can be driven by many unobservable factors.
[x] The overall goal of the paper is to document that firm performance is worse under a family CEO.
[ ] The family CEO dummy serves as an instrumental variable in this analysis.
[ ] The coefficient in column 1 of the third table shows that there is a negative causal effect of family CEOs on firm performance.
[ ] The overall goal of the paper is to establish the causal effect of firm performance on the decision to appoint a family-related or an external CEO.
[ ] When the first born child of a founder CEO is male, it is more likely that there is a transition to a family CEO as opposed to an external CEO.
[ ] The coefficient in column 6 of the third table shows that there is a negative causal effect of family CEOs on firm performance.
[ ] The coefficient in column 6 of the third table shows that there is a negative association between family CEOs and firm performance.
Plotting time trends (write pseudo code)
Question 1
You have loaded a dataset of annual fundamental company data into memory. The dataset consists of the following variables (columns):
gvkey: firm identifier
assets_total: total book assets (in nominal terms)
year: fiscal year
cpi: Consumer Price Index (CPI) for the given year
cpi2020: CPI for 2020
Your goal is to plot the time trend in inflation-adjusted firm size over time. Describe (in pseudo-code) the steps you would take to:
Process the data to account for inflation.
Summarize firm size for each year (e.g., by averaging or aggregating values).
Generate the desired plot.
To inflation-adjust total assets to year-2020 dollars, we need to multiply nominal total book assets by the ratio of the year-2020 CPI level to the current-year CPI level, i.e. the ratio cpi2020/cpi.
To do this, I would create a new variable as the product of nominal total assets and the ratio cpi2020/cpi, i.e. according to the formula assets_total_adj = assets_total ⋅ cpi2020/cpi.
In the next step, I would generate a new variable that summarizes assets_total_adj by year, e.g. by its mean or median. Call that new variable assets_total_adj_mean.
I may have to be mindful of missing values of assets_total_adj when I compute its mean per year.
I can do this either by retaining the full panel dataset and adding a variable that contains the same number (the mean/median for a given year) in each row of the same year, or by collapsing the dataset (creating a new dataset) so that it contains only one observation per year and only two variables: year and assets_total_adj_mean.
Finally, I would plot the variable assets_total_adj_mean against year, i.e. assets_total_adj_mean on the y-axis and year on the x-axis.
If I use the full panel dataset, I should make sure to plot only one observation per year. If I have already collapsed the dataset to one observation per year, this is already ensured.
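The steps described in this answer can be sketched in plain Python as follows, taking the "collapse" route with the mean as the yearly summary. This is a minimal illustration under assumptions: the panel is represented as a list of dictionaries with the column names from the question, the toy values are made up, the function names are hypothetical, and the plotting helper assumes the third-party matplotlib package is installed (it is defined but not called here).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows standing in for the loaded panel (values are made up).
rows = [
    {"gvkey": "A", "year": 2019, "assets_total": 100.0, "cpi": 250.0, "cpi2020": 260.0},
    {"gvkey": "B", "year": 2019, "assets_total": 300.0, "cpi": 250.0, "cpi2020": 260.0},
    {"gvkey": "A", "year": 2020, "assets_total": 120.0, "cpi": 260.0, "cpi2020": 260.0},
    {"gvkey": "B", "year": 2020, "assets_total": None, "cpi": 260.0, "cpi2020": 260.0},
]


def yearly_mean_real_assets(rows):
    """Collapse the panel to one inflation-adjusted mean per year."""
    by_year = defaultdict(list)
    for row in rows:
        # Skip rows with missing inputs so they do not distort the yearly mean.
        if row["assets_total"] is None or row["cpi"] is None or row["cpi2020"] is None:
            continue
        # assets_total_adj = assets_total * cpi2020 / cpi (year-2020 dollars)
        assets_total_adj = row["assets_total"] * row["cpi2020"] / row["cpi"]
        by_year[row["year"]].append(assets_total_adj)
    return {year: mean(values) for year, values in sorted(by_year.items())}


def plot_trend(trend):
    """Plot the yearly mean against year (requires matplotlib, an assumption)."""
    import matplotlib.pyplot as plt

    years = list(trend)
    plt.plot(years, [trend[y] for y in years], marker="o")
    plt.xlabel("year")
    plt.ylabel("assets_total_adj_mean")
    plt.show()


assets_total_adj_mean = yearly_mean_real_assets(rows)
```

Because the result holds exactly one value per year, plotting it cannot draw duplicate points, which is the concern raised for the full-panel alternative.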