yfR facilitates importing stock prices from Yahoo Finance, organizing the data in the tidy format and speeding up the process using a cache system and parallel computing. yfR is the second and backwards-incompatible version of BatchGetSymbols, released in 2016 (see the vignette "yfR and BatchGetSymbols" for details).
In a nutshell, Yahoo Finance (YF) provides a vast repository of stock price data from around the globe. It covers a significant number of markets and assets and is used extensively in academic research and teaching. To import financial data from YF, all you need is a ticker (the id of a stock, e.g. “GM” for General Motors) and a time period, defined by a first and last date.
The main function of the package, yfR::yf_get, returns a dataframe with the financial data. All price data is measured in the currency of the exchange where the asset trades. For example, price data for GM (NYSE/US) is measured in dollars, while price data for PETR3.SA (B3/BR) is measured in Reais (Brazilian currency).
The returned data contains the following columns:
ticker: The requested tickers (ids of stocks);
ref_date: The reference day (this can also be year/month/week when using argument freq_data);
price_open: The opening price of the day/period;
price_high: The highest price of the day/period;
price_low: The lowest price of the day/period;
price_close: The close/last price of the day/period;
volume: The financial volume of the day/period, in the unit of the exchange;
price_adjusted: The stock price adjusted for corporate events such as splits and dividends; this is usually the series you want when studying stocks, since it reflects the actual financial return to stockholders;
ret_adjusted_prices: The arithmetic or log return (see input type_return) for the adjusted stock prices;
ret_closing_prices: The arithmetic or log return (see input type_return) for the closing stock prices;
cumret_adjusted_prices: The accumulated arithmetic/log return for the period (starts at 100%).
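To make the three return columns concrete, the following base R sketch computes arithmetic returns, log returns and an accumulated return from a short, made-up price vector. It illustrates the standard definitions only; it is not yfR's internal code, and the prices and variable names are hypothetical.

```r
# Hypothetical adjusted prices for four consecutive days (made-up numbers)
price_adjusted <- c(100, 110, 99, 108.9)
n <- length(price_adjusted)

# Arithmetic return: P_t / P_{t-1} - 1 (the first observation has no return)
ret_arit <- c(NA, price_adjusted[-1] / price_adjusted[-n] - 1)

# Log return: log(P_t / P_{t-1})
ret_log <- c(NA, diff(log(price_adjusted)))

# Accumulated arithmetic return, starting at 1 (i.e. 100%)
cumret_arit <- cumprod(1 + ifelse(is.na(ret_arit), 0, ret_arit))

print(data.frame(price_adjusted, ret_arit, ret_log, cumret_arit))
```

By construction, the accumulated return equals each price divided by the first price, and exp(ret_log) - 1 recovers the arithmetic return.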
The easiest way to find the ticker of a company's stock is to search for it on the Yahoo Finance website, using the search bar at the top of the page.
A company can have many different stocks traded in different markets. For example, Petrobras is traded at NYQ (New York Stock Exchange), SAO (Sao Paulo/Brazil, B3 exchange) and BUE (Buenos Aires/Argentina exchange), all with different symbols (tickers). For market indices, a list of tickers is available here.
Features of yfR:
Fetches daily/weekly/monthly/annual stock prices/returns from Yahoo Finance and outputs a dataframe (tibble) in the long format (stacked data);
A new feature called collections facilitates the download of multiple tickers from a particular market/index. You can, for example, download data for all stocks in the SP500 index with a simple call to yf_collection_get("SP500");
A session-persistent smart cache system is available by default. This means that the data is saved locally and only missing portions are downloaded, if needed.
All dates are compared to a benchmark ticker such as SP500 and, whenever an individual asset does not have a sufficient number of dates, the software drops it from the output. This means you can choose to ignore tickers with a high proportion of missing dates.
A customized function called yf_convert_to_wide() can transform the long dataframe into a wide format (tickers as columns), much used in portfolio optimization. The output is a list where each element is a different target variable (prices, returns, volumes).
Parallel computing with package furrr is available, speeding up the data importation process.
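The benchmark comparison described in the list above can be pictured with a small base R sketch. It uses made-up dates and tickers, and the 75% cutoff (thresh) is an assumption chosen for illustration; the package's actual argument names, defaults and internals may differ.

```r
# Dates on which the (hypothetical) benchmark traded
bench_dates <- as.Date("2022-06-01") + 0:9

# Made-up long-format data: "AAA" has every date, "BBB" only 4 of the 10
df_long <- data.frame(
  ticker   = c(rep("AAA", 10), rep("BBB", 4)),
  ref_date = c(bench_dates, bench_dates[1:4])
)

# Assumed cutoff: keep tickers with at least 75% of the benchmark dates
thresh <- 0.75

# Proportion of benchmark dates available for each ticker
prop_dates <- tapply(
  df_long$ref_date, df_long$ticker,
  function(d) mean(bench_dates %in% d)
)

# Keep only the tickers that meet the cutoff
keep <- names(prop_dates)[prop_dates >= thresh]
df_clean <- df_long[df_long$ticker %in% keep, ]

print(prop_dates)
print(unique(df_clean$ticker))
```

In this toy data, "BBB" covers only 40% of the benchmark dates and is dropped, while "AAA" is kept. Lowering thresh would retain tickers with more missing dates.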
# CRAN (not yet available)
#install.packages('yfR')
# Github (dev version)
devtools::install_github('ropensci/yfR')
# ropensci
install.packages("yfR", repos = "https://ropensci.r-universe.dev")
library(yfR)

# set options for algorithm
my_ticker <- 'FB'
first_date <- Sys.Date() - 30
last_date <- Sys.Date()

# fetch data
df_yf <- yf_get(tickers = my_ticker,
                first_date = first_date,
                last_date = last_date)
#>
#> ── Running yfR for 1 stocks | 2022-05-28 --> 2022-06-27 (30 days) ──
#>
#> ℹ Downloading data for benchmark ticker ^GSPC
#> ℹ (1/1) Fetching data for FB
#> ! - not cached
#> ✔ - cache saved successfully
#> ✔ - got 18 valid rows (2022-05-31 --> 2022-06-24)
#> ✔ - got 100% of valid prices -- Feliz que nem lambari de sanga!
#> ℹ Binding price data
#>
#> ── Diagnostics ─────────────────────────────────────────────────────────────────
#> ✔ Returned dataframe with 18 rows -- Looking good!
#> ℹ Using 5.4 kB at /tmp/Rtmpoa1adQ/yf_cache for 1 cache files
#> ℹ Out of 1 requested tickers, you got 1 (100%)
# output is a tibble with data
head(df_yf)
#> # A tibble: 6 × 11
#> ticker ref_date price_open price_high price_low price_close volume
#> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FB 2022-05-31 195. 198. 192. 194. 26131103
#> 2 FB 2022-06-01 197. 200. 185 189. 36623495
#> 3 FB 2022-06-02 188. 201. 188. 199. 31951582
#> 4 FB 2022-06-03 196. 197. 190. 191. 19464993
#> 5 FB 2022-06-06 194. 197. 188. 194. 30574242
#> 6 FB 2022-06-07 192. 197. 191. 196. 18828687
#> # … with 4 more variables: price_adjusted <dbl>, ret_adjusted_prices <dbl>,
#> # ret_closing_prices <dbl>, cumret_adjusted_prices <dbl>
Package yfR is based on quantmod (@joshuaulrich) and uses one of its functions (quantmod::getSymbols) for fetching raw data from Yahoo Finance. As with any API, there is significant work in maintaining the code. Joshua was always fast and open-minded in implementing required changes, and I’m very grateful for it.