# 3 Exploring Data

In this chapter we show how to explore and analyze the dataset created in Chapter 2, paying particular attention to the frequency and properties of the data (log-returns vs. discrete returns).
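The two return definitions can be compared on a short, made-up price vector (the numbers are purely illustrative and not taken from our dataset). Discrete returns compound multiplicatively, log-returns add up over time, and the two are linked by $r_t=\ln(1+R_t)$.

```r
# Hypothetical price path, for illustration only
prices <- c(100, 110, 99, 108.9)

# Discrete (simple) returns: R_t = P_t / P_{t-1} - 1
discrete <- prices[-1] / prices[-length(prices)] - 1

# Log-returns: r_t = log(P_t / P_{t-1}) = log(1 + R_t)
logret <- diff(log(prices))

# Multi-period return: discrete returns compound, log-returns add up
total_discrete <- prod(1 + discrete) - 1
total_log <- sum(logret)

# Converting log-returns back into discrete returns
back_to_discrete <- exp(logret) - 1

print(round(discrete, 4))
print(c(total_discrete = total_discrete, total_log = total_log))
```

For small returns the two definitions are close, since $\ln(1+R)\approx R$; the gap widens for large moves.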

library(tidyverse)
library(tidyquant)

We load the data and take a first look at it:

#> # A tibble: 2 x 11
#>   symbol company    identifier sedol weight sector    shares_held local_currency
#>   <chr>  <chr>      <chr>      <chr>  <dbl> <chr>           <dbl> <chr>
#> 1 AAPL   Apple Inc. 03783310   2046~ 0.0616 Informat~   168974140 USD
#> 2 MSFT   Microsoft~ 59491810   2588~ 0.0607 Informat~    81130190 USD
#> # ... with 3 more variables: last.sale.price <dbl>, market.cap <dbl>,
#> #   ipo.year <int>

## 3.1 Plotting and Charting Data

In this section we show how to create various graphs of financial time series, which help us to better understand their properties before we go on to calculate and test their statistics.

### 3.1.1 Time-series plots

We can now plot the return series of each stock directly as a bar chart. In the following, I

stocks.returns.monthly %>% select(symbol, date, return) %>%
ggplot(aes(x = date, y = return, fill = symbol)) + geom_col() + facet_wrap(~symbol)

Often we want to compare the performance of different investments graphically. This can be done by assuming an investment of one dollar at a particular point in time and plotting the resulting cumulated time series. We aggregate the arithmetic returns of periods $1,\dots,t$ as $$1+R_{1:t}=\prod_{s=1}^{t}(1+R_{s}).$$ The resulting series is depicted below, with the y-axis log-transformed because of the extraordinary performance of some of the companies relative to the others.
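The cumulation of one dollar can be checked on a toy return vector (made-up numbers, not our data): `cumprod()` yields the value of the dollar after each period, and its last element equals the full product over all periods.

```r
# Hypothetical monthly discrete returns of one stock (illustration only)
returns <- c(0.05, -0.02, 0.03, 0.10)

# Value of one dollar invested before the first period, after each period t
wealth <- cumprod(1 + returns)

# Cumulated net return over the first t periods: wealth minus the initial dollar
cum_return <- wealth - 1

print(round(wealth, 6))
print(round(cum_return, 6))
```

In a grouped tibble the same `cumprod(1 + return)` inside `mutate()` produces one wealth index per stock.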

stocks.returns.monthly %>% select(symbol, date, return) %>%
group_by(symbol) %>% mutate(wealth = cumprod(1 + return)) %>%
ggplot(aes(x = date, y = wealth, col = symbol)) + geom_line() + scale_y_log10()

Another series that matters when talking about performance is the drawdown, which describes the decline from a historical peak in a cumulated return series. Unfortunately, the function Drawdowns() from PerformanceAnalytics only accepts xts input, therefore we stick to only plotting
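The drawdown can be made concrete as the relative distance of the wealth index $W_t$ from its running maximum, $DD_t = W_t/\max\left(1, \max_{s\le t} W_s\right) - 1$. The base-R sketch below uses made-up returns; counting the initial dollar as the first peak is an assumption of this sketch, not a statement about how `Drawdowns()` is implemented.

```r
# Hypothetical monthly discrete returns (illustration only)
returns <- c(0.10, -0.20, 0.05, 0.25, -0.10)

# Wealth index: value of one dollar invested before the first period
wealth <- cumprod(1 + returns)

# Running peak; assumption: the initial dollar counts as the first peak
peak <- cummax(c(1, wealth))[-1]

# Drawdown: relative decline from the running peak (0 at new highs, negative otherwise)
drawdown <- wealth / peak - 1

# Maximum drawdown: the deepest decline over the whole period
max_drawdown <- min(drawdown)

print(round(drawdown, 4))
print(max_drawdown)
```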

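stocks.returns.monthly %>% select(symbol, date, return) %>% group_by(symbol) %>%
mutate(dd = as.numeric(Drawdowns(xts::xts(return, order.by = date)))) %>%
ggplot(aes(x = date, y = dd, col = symbol)) + geom_line()

A similar chart can be produced by means of the PerformanceAnalytics package: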

stocks.returns.monthly %>% ungroup() %>% select(symbol, date, return) %>%
pivot_wider(names_from = symbol, values_from = return) %>%
timetk::tk_xts(silent = TRUE) %>%
chart.CumReturns(wealth.index = TRUE, legend.loc = "topleft")

### 3.1.4 Quantile Plots
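A quantile-quantile (QQ) plot compares the empirical quantiles of a return series with those of a reference distribution, usually the normal. The coordinates behind such a plot can be computed with base R's `qqnorm(..., plot.it = FALSE)`. The simulated Student-t "returns" below are an assumption for illustration, chosen because fat tails show up as points bending away from the reference line in the outer quantiles. With ggplot2, `stat_qq()` and `stat_qq_line()` draw the corresponding plot from a `sample` aesthetic.

```r
set.seed(42)
# Simulated fat-tailed "returns" (scaled Student t with 3 df); illustration only
returns <- 0.02 * rt(500, df = 3)

# Theoretical normal quantiles (x) against the observed returns (y), without plotting
qq <- qqnorm(returns, plot.it = FALSE)

# Reference line through the first and third quartiles, as qqline() draws it by default
sample_q <- quantile(returns, c(0.25, 0.75))
theory_q <- qnorm(c(0.25, 0.75))
slope <- diff(sample_q) / diff(theory_q)
intercept <- sample_q[1] - slope * theory_q[1]

# Vertical distance of the most extreme point from the line hints at tail behaviour
extreme <- which.max(abs(qq$x))
deviation <- qq$y[extreme] - (intercept + slope * qq$x[extreme])
print(unname(deviation))
```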

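Putting it all together, GGally::ggpairs() combines pairwise scatterplots, correlations and marginal distributions in a single chart matrix (shown here for the iris example data):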

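# 'output' is assumed to hold the output format of the current build
# (e.g. set in a setup chunk from knitr::opts_knit$get("rmarkdown.pandoc.to"))
pm <- GGally::ggpairs(iris)
if (output %in% c("latex", "docx")) {
  pm                              # static plot for PDF and Word output
} else if (output == "html") {
  plotly::ggplotly(pm)            # interactive plot for HTML output
} else {
  print("No format defined for this output filetype")
}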

## 3.2 Analyzing Data

### 3.2.1 Calculating Statistics, Testing and Factor Exposure

We calculate statistics both directly and by using PerformanceAnalytics through tidyquant.

Relevant quantities include summary statistics, sample mean and covariance estimates, higher moments, tests for (multivariate) normality, quantiles and other risk measures per asset and time period, as well as auto-correlation and predictability.
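Several of these statistics can be sketched with base R alone. The example below runs on simulated returns for two hypothetical assets; the skewness and kurtosis are simple moment estimators without small-sample correction, Shapiro-Wilk is used as a univariate normality test, and the Ljung-Box lag of 6 is an arbitrary choice. None of this is meant to reproduce PerformanceAnalytics' own estimators exactly.

```r
set.seed(1)
# Simulated monthly returns for two hypothetical assets (illustration only)
n <- 120
returns <- cbind(A = rnorm(n, 0.01, 0.05), B = 0.02 * rt(n, df = 4))

# Sample mean vector and covariance matrix
mu <- colMeans(returns)
sigma <- cov(returns)

# Higher moments from standardized returns (simple moment estimators)
moments <- function(x) {
  z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
  c(skewness = mean(z^3), excess_kurtosis = mean(z^4) - 3)
}
higher <- apply(returns, 2, moments)

# Univariate normality test per asset (Shapiro-Wilk, stats package)
normality_p <- apply(returns, 2, function(x) shapiro.test(x)$p.value)

# Historical 5% quantile, reported as a loss: a simple historical Value-at-Risk
var_5 <- -apply(returns, 2, quantile, probs = 0.05)

# First-order autocorrelation and Ljung-Box test as a first check for predictability
ac1 <- apply(returns, 2, function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2])
ljung_p <- apply(returns, 2, function(x) Box.test(x, lag = 6, type = "Ljung-Box")$p.value)

print(round(rbind(mean = mu, higher, normality_p, var_5, ac1, ljung_p), 4))
```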

In addition, we consider factor analysis, i.e. the estimation of betas and alphas.
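For a single factor, the beta from the regression $R_i - R_f = \alpha_i + \beta_i (R_m - R_f) + \varepsilon_i$ equals the sample covariance of the excess returns divided by the sample variance of the market excess return, and alpha is the intercept. The sketch below checks this on simulated excess returns; the true alpha of 0.002 and beta of 1.2 are made-up values.

```r
set.seed(7)
n <- 240
# Simulated monthly market excess returns and one stock's excess returns (illustration only)
mkt_excess <- rnorm(n, 0.006, 0.045)
stock_excess <- 0.002 + 1.2 * mkt_excess + rnorm(n, 0, 0.03)

# OLS estimates: intercept = alpha, slope = beta
fit <- lm(stock_excess ~ mkt_excess)
alpha_hat <- unname(coef(fit)[1])
beta_hat <- unname(coef(fit)[2])

# The OLS slope equals sample covariance over sample variance
beta_moments <- cov(stock_excess, mkt_excess) / var(mkt_excess)
alpha_moments <- mean(stock_excess) - beta_moments * mean(mkt_excess)

print(c(alpha = alpha_hat, beta = beta_hat))
```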


### 3.2.6 Exposure to Factors

The stocks in our example all have a certain exposure to risk factors (e.g. the Fama-French factors we have added to our dataset). Let us estimate these exposures by regressing each stock's return on the factors Mkt.RF, SMB and HML, using the methods from Section 1.2.3:

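stocks.factor_exposure <- stocks.returns.monthly %>%
  group_by(symbol) %>%
  nest_by() %>%                    # one row per stock, its data in the list column 'data'
  mutate(model = list(as_tibble_row(coef(lm(return ~ Mkt.RF + SMB + HML, data = data))))) %>%
  ungroup() %>%
  select(-data) %>%
  unnest(model) %>%                # one column per estimated coefficient
  select(-`(Intercept)`)           # keep only the factor loadings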
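To see what such a per-stock regression returns, here is a self-contained base-R sketch of the same idea on simulated data. The stock symbols, factor values and "true" loadings are made up; only the column names Mkt.RF, SMB and HML mirror our dataset.

```r
set.seed(123)
n <- 60
# Simulated factor returns; column names mirror the Fama-French factors in our dataset
factors <- data.frame(Mkt.RF = rnorm(n, 0.006, 0.045),
                      SMB = rnorm(n, 0.002, 0.03),
                      HML = rnorm(n, 0.001, 0.03))

# Hypothetical true loadings (Mkt.RF, SMB, HML) for two made-up stocks
true_loadings <- rbind(STOCK1 = c(1.1, 0.3, -0.2), STOCK2 = c(0.8, -0.1, 0.5))

# Long-format panel: one row per stock and month
panel <- do.call(rbind, lapply(rownames(true_loadings), function(s) {
  b <- true_loadings[s, ]
  data.frame(symbol = s, factors,
             return = as.vector(as.matrix(factors) %*% b) + rnorm(n, 0, 0.01))
}))

# One regression per stock; keep the factor loadings and drop the intercept
exposures <- do.call(rbind, lapply(split(panel, panel$symbol), function(d) {
  coef(lm(return ~ Mkt.RF + SMB + HML, data = d))[-1]
}))
print(round(exposures, 2))
```

Keeping the full `coef()` vector instead of dropping its first element would retain the intercept, i.e. each stock's alpha.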