2 Managing Financial Data

In this chapter we will learn how to download/import financial time series data from various sources. Most importantly we will use the quantmod library through the package tidyquant to download financial data from a variety of sources. We will also mention other possibilities as well as the fact that one can always import and ‘.xlsx’ (Excel) file.

2.1 Getting Data

2.1.1 quantmod and the tidyquant package

The tidyquant package comes with a variety of readily compiled data sets and allows access to several different data sources. We start this chapter by loading the tidyverse and the tidyquant package. Basically tidyquant provides four different types of commands:

  1. Get a Stock Index using tq_index() or data form a Stock Exchange using tq_exchange()
  2. Get Quantitative Data using tq_get()
  3. Transmute data (change in place) using tq_transmute() and Mutate data (add new variable) by using tq_mutate(). These functions provide functionality that integrates xts, zoo, quantmod, TTR and the PerformanceAnalytics packages.
  4. Do some performance analysis using tq_performance() and aggregate portfolios using tq_portfolio().

We start by loading the relevant packages and then employ tq_exchange_options() to give us a list of all stock exchanges available through the package.

library(tidyverse)
library(tidyquant)
tq_exchange_options()
#> [1] "AMEX"   "NASDAQ" "NYSE"

In a second step we download data from all three exchanges and have a brief look. We start with data from AMEX7, followed by NASDAQ8 and the most famous NYSE9

amex <- tq_exchange("AMEX")
nasdaq <- tq_exchange("NASDAQ")
nyse <- tq_exchange("NYSE")
# amex
amex
# nasdaq
nasdaq
# nyse
nyse
#> # A tibble: 274 x 7
#>   symbol company        last.sale.price market.cap country   ipo.year industry  
#>   <chr>  <chr>                    <dbl>      <dbl> <chr>        <int> <chr>     
#> 1 AAMC   Altisource As~          25.7     52827918 United S~       NA "Investme~
#> 2 AAU    Almaden Miner~           0.398   54682731 Canada        2015 ""        
#> # ... with 272 more rows
#> # A tibble: 4,503 x 7
#>   symbol company       last.sale.price market.cap country  ipo.year industry    
#>   <chr>  <chr>                   <dbl>      <dbl> <chr>       <int> <chr>       
#> 1 AACG   ATA Creativi~            2.68   84959184 China          NA Service to ~
#> 2 AADI   Aadi Bioscie~           29.4   611603870 United ~       NA Biotechnolo~
#> # ... with 4,501 more rows
#> # A tibble: 3,135 x 7
#>   symbol company        last.sale.price  market.cap country  ipo.year industry  
#>   <chr>  <chr>                    <dbl>       <dbl> <chr>       <int> <chr>     
#> 1 A      Agilent Techn~           174.  52655580872 "United~     1999 Electrica~
#> 2 AA     Alcoa Corpora~            48.8  9124674397 ""           2016 Metal Fab~
#> # ... with 3,133 more rows

In a next step, we show a list of all indices that are available through the package using tq_index_options():

tq_index_options() # find all indices available
#> [1] "DOW"       "DOWGLOBAL" "SP400"     "SP500"     "SP600"

We download the holdings for all five indices10. To not having to download all individual index holdings, we make use of map() from the purrr package contained in the tidyverse. Thereby we apply the function tq_index() to each vector element and save the output in a list. In a second step, we name the different list elements using names() so that collapsing the list using bind_rows() and specifying the name of the .id variable as ‘IndexID’ gets us the name of each index in an extra .id column. From each of the five indices (DOW11, DOWGLOBAL12, SP40013, SP50014 and SP60015) we show the first two lines using group_by() and slice(). For all indices we find the following variables: Stock symbol, company name, identifier and the current weight in the index.16

indexholdings <- c("DOW","DOWGLOBAL","SP400","SP500","SP600") %>% map(tq_index)
names(indexholdings) <- c("DOW","DOWGLOBAL","SP400","SP500","SP600")
indexholdings %>% bind_rows(.id = "IndexID") %>% group_by(IndexID) %>%
  slice(1:2)
#> # A tibble: 10 x 9
#> # Groups:   IndexID [5]
#>   IndexID   symbol company       identifier sedol   weight sector    shares_held
#>   <chr>     <chr>  <chr>         <chr>      <chr>    <dbl> <chr>           <dbl>
#> 1 DOW       UNH    UnitedHealth~ 91324P10   29177~ 0.0790  Health C~     5714596
#> 2 DOW       GS     Goldman Sach~ 38141G10   24079~ 0.0752  Financia~     5714596
#> 3 DOWGLOBAL MT-NL  ArcelorMitta~ BYPBS6     BYPBS~ 0.0132  Materials       38443
#> 4 DOWGLOBAL MTN-ZA MTN Group Li~ 656320     65632~ 0.0124  Communic~      142262
#> 5 SP400     CGNX   Cognex Corpo~ 19242210   22082~ 0.00685 Informat~     1551993
#> 6 SP400     MOH    Molina Healt~ 60855R10   22127~ 0.00671 Health C~      513214
#> # ... with 4 more rows, and 1 more variable: local_currency <chr>

Last, we get a list of all valid data sets (websites) that we can get data from by employing the tq_get() command.

tq_get_options() # find all data sources available
#>  [1] "stock.prices"       "stock.prices.japan" "dividends"         
#>  [4] "splits"             "economic.data"      "quandl"            
#>  [7] "quandl.datatable"   "tiingo"             "tiingo.iex"        
#> [10] "tiingo.crypto"      "alphavantager"      "alphavantage"      
#> [13] "rblpapi"

Those are (at the time of writing this book):

  • ‘stock.prices’: Get the open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance.
  • ‘dividends’: Retrieves the dividends for a stock symbol from Yahoo Finance.
  • ‘splits’: Get the split ratio for a stock symbol from Yahoo Finance.
  • ‘stock.prices.japan’: Get the open, high, low, close, volume and adjusted stock prices for a stock symbol from Yahoo Finance Japan.
  • ’economic.data*: Get economic data from FRED.
  • ‘quandl’: Get data sets from Quandl. Requires a valid api-key, see quandl_api_key.
  • ‘quandl.datatable’: Get data tables from Quandl.
  • ‘tiingo’: Get data sets from Tiingo. Requires a valid api-key, see tiingo_api_key.
  • ‘tiingo.iex’: Get data sets from IEX, The Investors Exchange through Tiingo.
  • ‘tiingo.crypto’: Get cryptocurrency prices collected by Tiingo.
  • ‘alphavantager’: Get data sets from Alpha Vantager. Requires a valid api-key, see av_api_key.
  • ‘rblpapi’: Get data sets from Bloomberg. Requires a valid connection to a Bloomberg terminal, see Rblpapi::blpConnect().

The data set we will be using in the following consists of the ten largest stocks within the S&P500 that had an IPO before January 2000. Therefore we need to merge both data sets using inner_join() because we only want to keep symbols from the S&P500 that are also traded on NYSE or NASDAQ:

stocks.selection <- indexholdings %>% bind_rows(.id = "IndexID") %>% 
  filter(IndexID=="SP500") %>% select(-IndexID) %>%
  inner_join(rbind(nyse,nasdaq) %>% 
               select(symbol,last.sale.price,market.cap,ipo.year),by=c("symbol")) %>% # join data sets
  dplyr::filter(ipo.year<2000,!is.na(market.cap)) %>% # filter years with ipo<2000 or ipo=NA
  arrange(desc(weight)) %>% # sort in descending order
  slice(1:10)
stocks.selection

In a next step, we download stock prices from Yahoo Finance by using tq_get(). Data from that source usually comes in the OHLC format (open,high,low,close) with additional information (volume, adjusted). In plain text this means the first (open) and last (close) as well as the highest (high) and lowest (low) traded price of the day. Volume shows the physical number of shares traded of that stock today, whereas adjusted is adjusted for dividends using the methodology from the Center for Research in Security Prices. All variables are adjusted for stock splits. To better understand dividend adjustment, we could check

#> # A tibble: 1 x 3
#> # Groups:   symbol [1]
#>   symbol date       value
#>   <chr>  <date>     <dbl>
#> 1 MSFT   2004-11-15  3.08
#> # A tibble: 3 x 8
#> # Groups:   symbol [1]
#>   symbol date        open  high   low close    volume adjusted
#>   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>    <dbl>
#> 1 MSFT   2004-11-12  30.2  30.2  29.8  30.0 162269000     19.2
#> 2 MSFT   2004-11-15  27.3  27.5  27.2  27.4 104468000     19.5
#> 3 MSFT   2004-11-16  27.3  27.3  27.0  27.1  64522600     19.3

where one clearly sees the paid dividend to reduce the OHLC values, without having an impact on the adjusted price.

So we download data for all ten stocks as well as the S&P500-index for the time period 2000-01-01 to 2020-12-31.

stocks.prices <- stocks.selection$symbol %>% 
    tq_get(get  = "stock.prices",from = "2000-01-01",to = "2020-12-31") %>%
    group_by(symbol)
index.prices <- "^GSPC" %>% 
    tq_get(get  = "stock.prices",from = "2000-01-01",to = "2020-12-31") 
#> # A tibble: 20 x 8
#> # Groups:   symbol [10]
#>   symbol date         open   high    low  close    volume adjusted
#>   <chr>  <date>      <dbl>  <dbl>  <dbl>  <dbl>     <dbl>    <dbl>
#> 1 AAPL   2000-01-03  0.936  1.00   0.908  0.999 535796800    0.858
#> 2 AAPL   2000-01-04  0.967  0.988  0.903  0.915 512377600    0.786
#> 3 ADBE   2000-01-03 16.8   16.9   16.1   16.4     7384400   16.3  
#> 4 ADBE   2000-01-04 15.8   16.5   15.0   15.0     7813200   14.9  
#> 5 AMZN   2000-01-03 81.5   89.6   79.0   89.4    16117600   89.4  
#> 6 AMZN   2000-01-04 85.4   91.5   81.8   81.9    17487400   81.9  
#> # ... with 14 more rows

Dividends and stock splits can also be downloaded:

stocks.dividends <- stocks.selection$symbol %>% 
    tq_get(get  = "dividends",from = "2000-01-01",to = "2017-12-31") %>%
    group_by(symbol)
#> # A tibble: 6 x 3
#> # Groups:   symbol [3]
#>   symbol date         value
#>   <chr>  <date>       <dbl>
#> 1 AAPL   2012-08-09 0.00338
#> 2 AAPL   2012-11-07 0.00338
#> 3 ADBE   2000-03-23 0.00156
#> 4 ADBE   2000-06-30 0.00156
#> 5 CSCO   2011-03-29 0.06   
#> 6 CSCO   2011-07-05 0.06
stocks.splits <- stocks.selection$symbol %>% 
    tq_get(get  = "splits",from = "2000-01-01",to = "2017-12-31") %>%
    group_by(symbol)
#> # A tibble: 6 x 3
#> # Groups:   symbol [3]
#>   symbol date       value
#>   <chr>  <date>     <dbl>
#> 1 AAPL   2000-06-21 0.5  
#> 2 AAPL   2005-02-28 0.5  
#> 3 AAPL   2014-06-09 0.143
#> 4 ADBE   2000-10-25 0.5  
#> 5 ADBE   2005-05-24 0.5  
#> 6 CSCO   2000-03-23 0.5

A variety of (professional) data services are integrated into tidyquant (through quantmod) which I will list in the following subsections:

2.1.2 Quandl

Quandl provides access to many different financial and economic databases. To use it, one should acquire an api key by creating a Quandl account.17 Searches can be done using quandl_search() (I personally would use their homepage to do that). Data can be downloaded as before with tq_get(), be aware that you can download either single time series or entire datatables with the arguments get = "quandl" and get = "quandl.datatable". Note that in the example for ‘Apple’ below, the adjusted close prices are different from the ones of Yahoo. An example for a datatable is Zacks Fundamentals Collection B.

quandl_api_key("enter-your-api-key-here")
quandl_search(query = "Oil", database_code = "NSE", per_page = 3)
quandl.aapl <- c("WIKI/AAPL") %>%
    tq_get(get          = "quandl",
           from         = "2000-01-01",
           to           = "2019-12-31",
           column_index = 11, # numeric column number (e.g. 1)
           collapse     = "daily",  # can be “none”, “daily”, “weekly”, “monthly”, “quarterly”, “annual”
           transform    = "none")    # for summarizing data: “none”, “diff”, “rdiff”, “cumul”, “normalize”
#> # A tibble: 3 x 14
#>   symbol    date        open  high   low close  volume ex.dividend split.ratio
#>   <chr>     <date>     <dbl> <dbl> <dbl> <dbl>   <dbl>       <dbl>       <dbl>
#> 1 WIKI/AAPL 2000-01-03  105.  112.  102.  112. 4783900           0           1
#> 2 WIKI/AAPL 2000-01-04  108.  111.  101.  102. 4574800           0           1
#> 3 WIKI/AAPL 2000-01-05  104.  111.  103   104  6949300           0           1
#> # ... with 5 more variables: adj.open <dbl>, adj.high <dbl>, adj.low <dbl>,
#> #   adj.close <dbl>, adj.volume <dbl>

2.1.3 Alpha Vantage

Alpha Vantage provides access to a real-time and historical financial data. Here we also need to get and set an api key (for free). Then we can download either time series data specifying av_fun="TIME_SERIES_DAILY_ADJUSTED" or intraday data (av_fun="TIME_SERIES_INTRADAY" and interval="5min") via tq_get().

av_api_key("enter-your-api-key-here")
alpha.aapl <- c("AAPL") %>%
    tq_get(get          = "alphavantager",
           av_fun="TIME_SERIES_DAILY_ADJUSTED") # for daily data
alpha.aapl.id <- c("AAPL") %>%
    tq_get(get          = "alphavantager",
           av_fun="TIME_SERIES_INTRADAY",  # for intraday data
           interval="5min") # 5 minute intervals
#> # A tibble: 5 x 10
#>   symbol timestamp   open  high   low close adjusted_close    volume
#>   <chr>  <date>     <dbl> <dbl> <dbl> <dbl>          <dbl>     <dbl>
#> 1 AAPL   2021-05-07  131.  131.  129.  130.           130.  78973273
#> 2 AAPL   2021-05-10  129.  130.  127.  127.           127.  88071229
#> 3 AAPL   2021-05-11  124.  126.  123.  126.           126. 126142826
#> 4 AAPL   2021-05-12  123.  125.  122.  123.           123. 112172282
#> 5 AAPL   2021-05-13  125.  126.  124.  125.           125. 105861339
#> # ... with 2 more variables: dividend_amount <dbl>, split_coefficient <dbl>
alpha.aapl.id <- c("AAPL") %>%
    tq_get(get          = "alphavantager",
           av_fun="TIME_SERIES_INTRADAY",  # for intraday data
           interval="5min") # 5 minute intervals
#> # A tibble: 5 x 7
#>   symbol timestamp            open  high   low close  volume
#>   <chr>  <dttm>              <dbl> <dbl> <dbl> <dbl>   <dbl>
#> 1 AAPL   2021-09-28 11:45:00  143.  143.  143.  143.  863237
#> 2 AAPL   2021-09-28 11:50:00  143.  143.  143.  143. 1028675
#> 3 AAPL   2021-09-28 11:55:00  143.  143.  142.  143. 1216676
#> 4 AAPL   2021-09-28 12:00:00  143.  143.  143.  143.  794039
#> 5 AAPL   2021-09-28 12:05:00  143.  143.  142.  142. 1299687

2.1.4 FRED (Economic Data)

A large quantity of economic data can be extracted from the Federal Reserve Economic Data (FRED) database. Below we download the 1M- and 3M- risk-free-rate for the US. Note that these are annualized rates!

ir <- tq_get(c("TB1YR","TB3MS"), get = "economic.data") %>%
  group_by(symbol)
#> # A tibble: 6 x 3
#> # Groups:   symbol [2]
#>   symbol date       price
#>   <chr>  <date>     <dbl>
#> 1 TB1YR  2020-09-01  0.13
#> 2 TB1YR  2020-08-01  0.13
#> 3 TB1YR  2020-07-01  0.15
#> 4 TB3MS  2020-09-01  0.11
#> 5 TB3MS  2020-08-01  0.1 
#> 6 TB3MS  2020-07-01  0.13

2.1.5 Bloomberg and Datastream

Bloomberg is officially integrated into the tidyquant-package, but one needs to have Bloomberg running on the terminal in use. Refinitiv’s Datastream is not integrated but has a nice R-Interface in the package DatastreamDSWS2R. However, you need to have the Refinitiv Datastream Web Services API licensed! In this case the package allows for convenient retrieval of data. If this is not the case, then you have to manually retrieve your data using the Excel-Client, save it as “.xlsx”-file, and import using readxl::read_xlsx() from the readxl-package.

2.1.6 Fama-French Data (Kenneth French’s Data Library)

To download Fama-French data in batch there is a package FFdownload that I maintain and that now can be installed via install.packages("FFdownload") from CRAN or install its development version via remotes::install_github("sstoeckl/ffdownload") from my github page. Currently you can either download all data or skip the (large) daily files using the command exclude_daily=TRUE or specify an inputlist of file names that the package will try to match to the data sets of the website. For a detailed reference on how to use the package please see this article on my website. The result is a list of time series in xts format having the following structure: ‘FFdata$dataset$timeframe$filecontent’.

The data sets (factors and factor portfolios) provided on Kenneth French’s data library all relate to research in asset pricing conducted by Kenneth French and Nobel laureate Eugene Fama. The most famous of their papers specify the famous three-factor model containing the market factor as well as a size and value factor (Fama and French 1992) and the five-factor model that adds profitability and investment (Fama and French 2014). One additional factor that has gained fame and can also be downloaded on their website is the momentum factor by Carhart (1997).

library(FFdownload)
library(timetk)
FFdownload(output_file = "data/FFdata_20201007.RData",         # output file for the final data set
           inputlist = c("F-F_Research_Data_Factors","F-F_Research_Data_Factors_daily_CSV","F-F_Momentum_Factor","F-F_Momentum_Factor_daily_CSV"),
           exclude_daily = TRUE) # exclude daily data
load(file = "data/FFdata_20201007.RData")
factors <- FFData$`x_F-F_Research_Data_Factors`$monthly$Temp2 %>% 
          tk_tbl(rename_index="date") %>% # make tibble
          left_join(FFData$`x_F-F_Momentum_Factor`$monthly$Temp2 %>% 
          tk_tbl(rename_index="date"), by ="date") %>%
          mutate(date=as.Date(date, frac=1)) %>% # make proper month-end date format
          gather(key=FFvar,value = price,-date) # gather into tidy format
factors %>% group_by(FFvar) %>% slice(1,n())
#> # A tibble: 10 x 3
#> # Groups:   FFvar [5]
#>   date       FFvar  price
#>   <date>     <chr>  <dbl>
#> 1 2000-01-31 HML    -1.89
#> 2 2020-12-31 HML    -1.43
#> 3 2000-01-31 Mkt.RF -4.74
#> 4 2020-12-31 Mkt.RF  4.63
#> 5 2000-01-31 Mom     1.92
#> 6 2020-12-31 Mom    -2.26
#> # ... with 4 more rows

2.1.7 Cryptocurrency Data

Cryptocurrency data could be downloaded from Coinmarketcap.com using my crypto2-package that can be installed via install.packages("crypto2") from CRAN or install its development version via remotes::install_github("sstoeckl/crypto2") from my github page. Available functions are

Be aware, that crypto currencies (CCs) are very volatile, risky and not controlled by any authority. A large number of coins has been de-listed from Coinmarketcap.com, which often led to a total loss for all investors in that crypto currency.

We start by downloading a list of all active crypto currencies.

library(crypto2)
# List all active coins
coins <- crypto_list(only_active=TRUE)
#> # A tibble: 6,881 x 8
#>      id name      symbol slug       rank is_active first_historica~ last_historical~
#>   <int> <chr>     <chr>  <chr>     <int>     <int> <date>           <date>          
#> 1     1 Bitcoin   BTC    bitcoin       1         1 2013-04-28       2021-09-29      
#> 2     2 Litecoin  LTC    litecoin     16         1 2013-04-28       2021-09-29      
#> 3     3 Namecoin  NMC    namecoin    716         1 2013-04-28       2021-09-29      
#> 4     4 Terracoin TRC    terracoin  2062         1 2013-04-28       2021-09-29      
#> 5     5 Peercoin  PPC    peercoin    762         1 2013-04-28       2021-09-29      
#> # ... with 6,876 more rows

From those crypto currencies, we select the 10 longest living coins and download additional information:

# retrieve information for all (the first 3) of those coins
coin_info <- crypto_info(coins,limit=3)
#> > Scraping crypto info
#> > Processing historical crypto data
#> # A tibble: 3 x 19
#>      id name     symbol category description     slug   logo    subreddit notice
#>   <int> <chr>    <chr>  <chr>    <chr>           <chr>  <chr>   <chr>     <chr> 
#> 1     1 Bitcoin  BTC    coin     "## **What Is ~ bitco~ https:~ bitcoin   ""    
#> 2     2 Litecoin LTC    coin     "## What Is Li~ litec~ https:~ litecoin  ""    
#> 3     3 Namecoin NMC    coin     "Namecoin (NMC~ namec~ https:~ namecoin  ""    
#> # ... with 10 more variables: date_added <chr>, twitter_username <chr>,
#> #   is_hidden <int>, date_launched <lgl>,
#> #   self_reported_circulating_supply <lgl>, self_reported_tags <lgl>,
#> #   status <dttm>, tags <list>, urls <list>, platform <lgl>

[TBD]

2.2 Manipulate Data

A variety of transformations can be applied to (financial) time series data. We present some examples while merging together our stock file with the index, the risk-free rate from FRED and the Fama-French-Factors.

Doing data transformations in tidy data sets is either called via transmute() (change variable/data set, only return calculated column) or mutate() (add transformed variable). In the tidyquant-package these functions are called tq_transmute() and tq_mutate(), because they simultaneously allow changes of periodicity (daily to monthly) and therefore the returned data set can have less rows than before. The core of these functions is the provision of a mutate_fun that can come from the the xts/zoo, quantmod (Quantitative Financial Modelling & Trading Framework for R) and TTR (Technical Trading Rules) packages.

In a first step we show how to change the periodicity of the data (where we keep the adjusted closing price and the volume information). For this we make use of the mutate_fun __to_period()__ where we specify the period as (assuming daily data) ‘weeks,’ ‘months,’ ‘quarters’ or ‘years.’ The option index_at allows us to select a final index and the corresponding format, namely ‘yearmon,’ ‘yearqtr’ (both formats only make sense with monthly/quarterly aggregation), ‘firstof,’ ‘lastof’ (specifying the first/last date of the period), startof or endof (detailing the first last date in the data given the respective period). select chooses which input columns to use/keep, while name overrides column-names. For certain aggregation times there exist ready-made commands to.weekly(), to.monthly(), to.quarterly() and to.yearly(). In the following we aggregate to weekly and monthly adjusted prices. Note, that the to-period() functions always select the last observation per period.

stocks.prices %>% 
  tq_transmute(select = c(adjusted,volume), mutate_fun = to.weekly, 
               indexAt = "lastof") %>% slice(1,2) %>% head(4)
#> # A tibble: 4 x 4
#> # Groups:   symbol [2]
#>   symbol date       adjusted    volume
#>   <chr>  <date>        <dbl>     <dbl>
#> 1 AAPL   2000-01-07    0.763 460734400
#> 2 AAPL   2000-01-14    0.770 390376000
#> 3 ADBE   2000-01-07   16.1     8253200
#> 4 ADBE   2000-01-14   16.5     5136800
stocks.prices %>% 
  tq_transmute(select = c(adjusted,volume), mutate_fun = to.monthly, 
               indexAt = "yearmon") %>% slice(1,2) %>% head(4)
#> # A tibble: 4 x 4
#> # Groups:   symbol [2]
#>   symbol date      adjusted    volume
#>   <chr>  <yearmon>    <dbl>     <dbl>
#> 1 AAPL   Jan 2000     0.795 701680000
#> 2 AAPL   Feb 2000     0.879 368961600
#> 3 ADBE   Jan 2000    13.7    16459200
#> 4 ADBE   Feb 2000    25.3     6718800

From this we can either calculate returns per period manually or use the function periodReturn() which allows to specify a period that takes care of the last step while simultaneously calculating the desired return. There are two types of returns available using type, the ‘arithmetic’ return and the ‘log’ return. Given \(P_t\) to be the price of an asset at time \(t\), we let \(p_t=ln(P_t)\) be its log-price.18

  • Then the arithmetic or discrete return is calculated as: \(R_t=\frac{P_{t} - P_{t-1}}{P_{t-1}}=\frac{P_{t}}{P_{t-1}}-1\), while
  • the log or continuously compounded return is: \(p_t - p_{t-1}=\left(\frac{P_t}{P_{t-1}}\right)\).

Both have their advantages, as the arithmetic return allows for easier cross-sectional aggregation (i.e. the formation of portfolios as weighted sum of returns) while the log-return allows for easier time series aggregation (i.e. the log-return in a month is just the sum of the daily log-returns). In the following we calculate monthly returns19 for the ten stocks and the index. We then merge the adjusted price and return information for each stock, while simultaneously merging the return of the S&P500 index as well as the 4 Fama-French-Carhart-Factors (Mkt.RF, SMB, HML, Mom) plus the risk-free interest rate (RF).

stocks.returns <- stocks.prices %>%
   tq_transmute(select = adjusted, mutate_fun = to.monthly, indexAt = "yearmon") %>%
  left_join(stocks.prices %>% 
              tq_transmute(select = adjusted, mutate_fun = periodReturn, period="monthly",
                           type="arithmetic", indexAt = "yearmon") %>%
              dplyr::rename(return = monthly.returns),by = c("symbol","date")) %>%
  left_join(index.prices %>% 
              tq_transmute(select = adjusted, mutate_fun = periodReturn, period="monthly", 
                           type = "arithmetic", indexAt = "yearmon") %>% 
              dplyr::rename(sp500 = monthly.returns),by = c("date")) %>%
  left_join(factors %>% mutate(price = price/100) %>% mutate(date=as.yearmon(date)) %>%
              pivot_wider(names_from = FFvar, values_from = price),by="date")
stocks.returns %>% ungroup() %>% slice(1:2,(n()-1):n())       # show first and last entry
#> # A tibble: 4 x 10
#>   symbol date      adjusted  return   sp500  Mkt.RF    SMB     HML     RF     Mom
#>   <chr>  <yearmon>    <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
#> 1 AAPL   Jan 2000     0.795 -0.0731 -0.0418 -0.0474 0.0579 -0.0189 0.0041  0.0192
#> 2 AAPL   Feb 2000     0.879  0.105  -0.0201  0.0245 0.215  -0.0981 0.0043  0.182 
#> 3 UPS    Nov 2020   168.     0.0955  0.108   0.125  0.0565  0.0213 0.0001 -0.124 
#> 4 UPS    Dez 2020   164.    -0.0283  0.0305  0.0463 0.0482 -0.0143 0.0001 -0.0226

2.3 The data we use in this book

2.3.1 Stock data sets

long/short data? features? Crypto? For applications!

References

Carhart, Mark M. 1997. “On Persistence in Mutual Fund Performance.” The Journal of Finance 52 (1): 57–82. https://doi.org/10.2307/2329556.
Fama, Eugene F., and Kenneth R. French. 1992. “The Cross-Section of Expected Stock Returns.” The Journal of Finance 47 (2): 427–65. https://doi.org/10.1111/j.1540-6261.1992.tb04398.x.
———. 2014. “A Five-Factor Asset Pricing Model.” Journal of Financial Economics. https://doi.org/10.1016/j.jfineco.2014.10.010.