Most at-home traders feel the urge to create a trading strategy as fast as possible. Yet many of them backtest their own strategies without ever having heard of feature engineering, let alone mastered fundamental finance concepts and Python skills. Only a few seem to understand the ins and outs of financial analysis with Python, real-time data aggregation, and strategy origination.
While the intertwined worlds of finance and Python are undeniably fascinating, knowing where to start can be overwhelming.
If you feel lost in this jungle of financial concepts and incomplete Python scripts, FinancePy could be your new go-to resource. This article guides you step by step, with the full Python script provided, through retrieving Bitcoin OHLCV data from Kraken's API for the last 720 days, computing key financial metrics such as log returns and rolling averages, and visualizing the results.
This article serves as a primer and aims to equip anyone with the foundational skills to create their own simple strategy in the future.
Think of this article as your gateway to “systematic” finance and discover how you can leverage the power of Python to create your own trading strategy from the ground up. More advanced articles will follow.
Although we’ll use real market data, please keep in mind that this article is a tutorial, not financial advice, and aims to enhance our finance knowledge and Python skills. Note that this article focuses on Bitcoin, but the following script can be applied to other assets of your choice.
Part 1: Fetching 720 Days of Historical Data from Kraken
We first import the required libraries and then initialize a connection to Kraken's public API. Here we fetch data from Kraken specifically, but note that the “ccxt” Python library can also fetch data from other exchanges such as Binance. Symbol and timeframe formats may vary between exchanges, so you might need to check the “ccxt” documentation before fetching data from another exchange.
We then define the asset's symbol and the timeframe. We are looking for daily data on Bitcoin (“BTC”) traded in US dollars on Kraken, i.e. the symbol “BTC/USD” with the timeframe “1d”. We fetched the data on October 1, 2023, at the time of writing; if you run the script at a later date, your results will differ.
`fetch_ohlcv` retrieves the OHLCV (Open, High, Low, Close, Volume) data for the specified symbol and timeframe. OHLCV data is the foundation for many types of technical analysis and for building a strategy.
Here, our DataFrame `data` has 720 entries, one per day (Kraken's OHLC endpoint returns at most 720 of the most recent candles per request). It’s important to know how many days of historical data are available for technical indicators and backtesting, and to check the data type of each of our variables.
import ccxt
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Initializing the connection with Kraken
kraken = ccxt.kraken()
# We determine the symbol and desired timeframe
symbol = 'BTC/USD'
timeframe = '1d'
# We extract OHLCV data through `fetch_ohlcv()`
ohlcv = kraken.fetch_ohlcv(symbol, timeframe)
# We create the DataFrame `data` from our `ohlcv` array
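data = pd.DataFrame(ohlcv, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
# Convert the millisecond timestamps to dates and use them as the index
data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
data.set_index('timestamp', inplace=True)
print(data)
# Let's get more information from our data
data.info()  # prints column dtypes and non-null counts
print(data.describe())  # summary statistics for each column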
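Since knowing how many days we actually received matters for the indicators that follow, it can be worth checking the date range and looking for missing days before going further. The sketch below runs offline on a small made-up sample shaped like the rows `fetch_ohlcv()` returns (`[timestamp_ms, open, high, low, close, volume]`); the prices are invented and one day is dropped on purpose. To check the real download, apply the same last few lines to `data` instead of `sample`.

```python
import pandas as pd

# Made-up sample in the same shape ccxt returns: [timestamp_ms, open, high, low, close, volume]
day_ms = 24 * 60 * 60 * 1000
start_ms = 1_696_118_400_000  # 2023-10-01 00:00:00 UTC
rows = [
    [start_ms + i * day_ms, 100.0 + i, 101.0 + i, 99.0 + i, 100.5 + i, 10.0]
    for i in range(10)
    if i != 4  # skip one day on purpose to simulate a gap
]

sample = pd.DataFrame(rows, columns=['timestamp', 'open', 'high', 'low', 'close', 'volume'])
sample['timestamp'] = pd.to_datetime(sample['timestamp'], unit='ms')
sample.set_index('timestamp', inplace=True)

# How many daily candles do we have, and which period do they cover?
first_day, last_day = sample.index.min(), sample.index.max()
print(f"{len(sample)} rows from {first_day.date()} to {last_day.date()}")
# 9 rows from 2023-10-01 to 2023-10-10

# Compare against a complete daily calendar to spot missing days
expected_days = pd.date_range(first_day, last_day, freq='D')
missing_days = expected_days.difference(sample.index)
print("Missing days:", [day.strftime('%Y-%m-%d') for day in missing_days])
# Missing days: ['2023-10-05']
```

Because Bitcoin trades every day, a complete daily series should have no gaps; for stocks you would compare against business days instead.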
Part 2: Calculating Advanced Financial Metrics and Technical Indicators
Now that we have our Bitcoin `data` ready, we get to the interesting part, calculating key financial metrics:
Logarithmic Returns (log return)
Rolling Volatility
50-day and 200-day Rolling Averages
Sharpe Ratio
Understanding Our Variables
1. Logarithmic Returns: Unlike raw price differences, which depend on the price level, logarithmic returns provide a consistent, scale-free metric for comparing returns over time. You calculate them by taking the natural log of the ratio of consecutive prices, i.e. the natural log of today's closing price divided by yesterday's. Log returns are continuously compounded returns, which is why they handle compounding consistently: the return over several days is simply the sum of the daily log returns. You can see how we calculate the log returns in the code below.
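In symbols, with $P_t$ the closing price on day $t$ and $R_t$ the simple return, the daily log return and its sum over $n$ days are:

$$
R_t = \frac{P_t}{P_{t-1}} - 1, \qquad r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(1 + R_t)
$$

$$
\sum_{t=1}^{n} r_t = \ln\left(\frac{P_1}{P_0}\right) + \dots + \ln\left(\frac{P_n}{P_{n-1}}\right) = \ln\left(\frac{P_n}{P_0}\right)
$$

The intermediate prices cancel out, which is what makes log returns additive over time.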
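Cool characteristics of log returns!
Compounding: Multi-period log returns are simply the sum of the single-period log returns, whereas simple returns must be compounded multiplicatively.
Statistical Properties: They are often assumed to be approximately normally distributed, which simplifies modeling, although real asset returns (Bitcoin's in particular) tend to have fatter tails than a normal distribution.
Stability: They are less skewed than simple returns, which are bounded below at -100% but unbounded above, whereas log returns can take any real value.
Note that for small returns the difference between log returns and simple returns is minimal, but as returns become larger, the difference becomes more pronounced: a 1% simple return corresponds to a log return of about 0.995%, while a 50% simple return corresponds to a log return of only about 40.5%.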
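The properties above can be checked numerically. This is a small sketch on four made-up closing prices (not Bitcoin data): it confirms that daily log returns sum to the log of the total price change, that simple returns have to be multiplied rather than added, and how far the two measures drift apart for small and large moves.

```python
import numpy as np

# Made-up closing prices over four days
prices = np.array([100.0, 110.0, 99.0, 120.0])

simple_returns = prices[1:] / prices[:-1] - 1
log_returns = np.log(prices[1:] / prices[:-1])

# Log returns add up: their sum equals the log of the total price change
total_log_return = log_returns.sum()
print(np.isclose(total_log_return, np.log(prices[-1] / prices[0])))  # True

# Simple returns must be compounded (multiplied) instead of summed
total_simple_return = np.prod(1 + simple_returns) - 1
print(np.isclose(total_simple_return, prices[-1] / prices[0] - 1))  # True

# Small moves: log and simple returns are close; large moves: they diverge
for r in (0.01, 0.50):
    print(f"simple {r:.3%} -> log {np.log1p(r):.3%}")
# simple 1.000% -> log 0.995%
# simple 50.000% -> log 40.547%
```

`np.log1p(r)` computes `ln(1 + r)`, which is exactly the conversion from a simple return to a log return.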
2. Rolling Volatility: Rolling volatility measures how much a price series varies over time. Here, our 21-day rolling volatility is the standard deviation of the log returns over the past 21 days, expressed on a daily (not annualized) basis. It gives us a moving snapshot of how volatile Bitcoin's price has been over the past two years and helps us assess its potential market risk.
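To see what `rolling(window=21).std()` actually computes, the sketch below uses randomly generated returns in place of the real data: each value is the sample standard deviation of the 21 most recent log returns. The annualization line at the end is an illustrative assumption (square-root-of-time scaling with 365 days per year, since Bitcoin trades every day), not something the script in this article does.

```python
import numpy as np
import pandas as pd

# Made-up daily log returns (random, seeded for reproducibility)
rng = np.random.default_rng(seed=42)
log_returns = pd.Series(rng.normal(loc=0.0, scale=0.03, size=60))

# 21-day rolling standard deviation, as in the tutorial
rolling_vol = log_returns.rolling(window=21).std()

# The last value is the sample standard deviation (ddof=1) of the last 21 returns
manual_last = log_returns.iloc[-21:].std()
print(np.isclose(rolling_vol.iloc[-1], manual_last))  # True

# Assumption for illustration: scale daily volatility to an annual figure
# with the square-root-of-time rule, using 365 days for Bitcoin
annualized_vol = rolling_vol * np.sqrt(365)
```

The square-root-of-time rule assumes independent daily returns, so the annualized figure is only an approximation.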
3. 50-day and 200-day Rolling Averages: These metrics smooth out our price data to reveal underlying trends. They are widely used in trend-following strategies to identify potential buy or sell opportunities based on crossovers, or on where the Bitcoin price sits relative to these averages.
4. Sharpe Ratio: This is a measure of risk-adjusted return. It describes how much excess return you can expect to receive for the extra volatility of holding a riskier asset, like Bitcoin in this case. A higher Sharpe ratio indicates a better risk-adjusted performance. The formula involves subtracting the risk-free rate from the asset's return and then dividing by its volatility. This gives us a sense of the 'price' of the risk taken to achieve a certain level of return.
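Written out, with $R_p$ the asset's return, $R_f$ the risk-free rate and $\sigma_p$ the standard deviation of the asset's returns over the same period:

$$
S = \frac{R_p - R_f}{\sigma_p}
$$

When working from daily returns, a common convention (an approximation that treats daily returns as independent) is to compute the ratio on daily data and scale it by $\sqrt{N}$, where $N$ is the number of periods per year (365 for Bitcoin, which trades every day; 252 for assets traded only on business days), $\bar{r}$ is the mean daily log return and $\sigma_r$ its standard deviation:

$$
S_{\text{annual}} = \sqrt{N}\,\frac{\bar{r} - r_f^{\text{daily}}}{\sigma_r}, \qquad r_f^{\text{daily}} \approx \frac{R_f}{N}
$$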
In this tutorial we arbitrarily set the risk-free rate to 1%, but keep in mind that the appropriate risk-free rate depends on the analysis. In U.S. markets, the yield on the three-month U.S. Treasury bill is often used as the risk-free rate, given the very low risk of default by the U.S. government. We will cover this topic in a future article.
Focus: As of June 30, 2023, the S&P 500 Portfolio Sharpe ratio was 0.88 (source: Investopedia).
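Because the 1% figure is an annual rate while our returns are daily, it has to be expressed per day before being subtracted from a daily return. The sketch below shows two common conversions; the 365-day year is an assumption suited to an asset that trades every day.

```python
import numpy as np

annual_risk_free_rate = 0.01  # 1% per year, as in the tutorial
periods_per_year = 365        # assumption: Bitcoin trades every day

# Simple approximation: divide the annual rate evenly across days
daily_rf_simple = annual_risk_free_rate / periods_per_year

# Compounding-consistent alternative: the daily rate that compounds to 1% over a year
daily_rf_compound = (1 + annual_risk_free_rate) ** (1 / periods_per_year) - 1

print(f"{daily_rf_simple:.8f}")  # 0.00002740
print(f"{daily_rf_compound:.8f}")
```

At a 1% annual rate the two versions differ by well under a millionth per day, so the simple division is usually good enough.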
# Computing log returns
data['log_return'] = np.log(data['close'] / data['close'].shift(1))
# Computing rolling volatility
data['rolling_volatility'] = data['log_return'].rolling(window=21).std()
# Computing 50-day and 200-day moving averages
data['50_MA'] = data['close'].rolling(window=50).mean()
data['200_MA'] = data['close'].rolling(window=200).mean()
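# Computing the annualized Sharpe ratio
risk_free_rate = 0.01  # annual risk-free rate (1%, set arbitrarily)
periods_per_year = 365  # Bitcoin trades every day; use 252 for assets traded only on business days
daily_risk_free_rate = risk_free_rate / periods_per_year  # convert the annual rate to a daily rate
excess_returns = data['log_return'] - daily_risk_free_rate  # pandas skips the leading NaN below
annualized_sharpe_ratio = excess_returns.mean() / excess_returns.std() * np.sqrt(periods_per_year)
print(f"Annualized Sharpe ratio: {annualized_sharpe_ratio:.2f}")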
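If you plan to compare several assets, it can help to wrap the Sharpe calculation in a small function. The helper below, `annualized_sharpe`, is hypothetical (it is not part of ccxt or pandas) and assumes the conventions of a daily-data Sharpe ratio: the annual risk-free rate is divided evenly across periods, and the per-period ratio is scaled by the square root of the number of periods per year.

```python
import numpy as np
import pandas as pd

def annualized_sharpe(log_returns: pd.Series, annual_rf: float = 0.01,
                      periods_per_year: int = 365) -> float:
    """Hypothetical helper: annualized Sharpe ratio from periodic log returns."""
    returns = log_returns.dropna()                    # drop the leading NaN from shift(1)
    excess = returns - annual_rf / periods_per_year   # per-period excess returns
    return float(excess.mean() / excess.std() * np.sqrt(periods_per_year))

# Usage on made-up returns alternating between +2% and 0%
sample = pd.Series([np.nan] + [0.02, 0.0] * 50)
print(round(annualized_sharpe(sample), 2))
print(round(annualized_sharpe(sample, periods_per_year=252), 2))
```

With the real data, `annualized_sharpe(data['log_return'])` would use 365 periods; pass `periods_per_year=252` for assets that trade only on business days.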
Rolling Averages Focus
Let's take a moment to talk about rolling averages, also known as moving averages. They are a data-smoothing technique: by averaging data over a sliding window, they help reveal underlying trends and patterns. They are very useful for getting a first understanding of our data.
The window size (50 or 200 days in our case) determines how many days are averaged, and therefore how much the average lags behind the price. With pandas' `rolling(window=50)`, the 50-day rolling average for a given day is the mean of that day's closing price and the 49 preceding ones, which gives a smoother line that can help reveal longer-term trends. A 200-day rolling average reflects an even longer horizon.
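A toy example makes the window and the lag concrete. With a 3-day window (instead of 50 or 200, to keep it short) and made-up prices, each value averages the current day and the two before it, so after a jump in price the average needs `window - 1` more days to catch up.

```python
import pandas as pd

# Made-up closing prices: flat at 100, then a jump to 110 on the fifth day
close = pd.Series([100, 100, 100, 100, 110, 110, 110, 110], dtype=float)

# 3-day rolling mean: each value averages the current day and the 2 days before it
ma_3 = close.rolling(window=3).mean()
print(ma_3.round(2).tolist())
# [nan, nan, 100.0, 100.0, 103.33, 106.67, 110.0, 110.0]
```

The first two values are NaN because fewer than 3 observations are available, and the average only reaches 110 two days after the price does.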
In financial analysis, these rolling averages have multiple purposes:
Trend Identification: A rising moving average typically suggests an uptrend, while a falling moving average indicates a downtrend.
Support and Resistance Levels: In chart analysis, moving averages can act as support or resistance lines.
Signal Generation: When a short-term moving average (like the 50-day MA) crosses above a long-term moving average (like the 200-day MA), it's considered a bullish signal, and vice versa.
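Crossovers like the one described in the last point can be flagged with a few lines of pandas. The function below, `find_crossovers`, is a hypothetical helper written for this sketch; it marks +1 on the day the short average moves above the long one, -1 on the day it moves below, and ignores the warm-up period where either average is still NaN, so no spurious signal appears when the 200-day average first becomes available.

```python
import pandas as pd

def find_crossovers(short_ma: pd.Series, long_ma: pd.Series) -> pd.Series:
    """Hypothetical helper: +1 on bullish crossovers, -1 on bearish ones, 0 otherwise."""
    above = (short_ma > long_ma).astype(int)
    signal = above.diff().fillna(0).astype(int)
    # Only trust a change when both averages exist today and yesterday
    valid = (short_ma.notna() & long_ma.notna()
             & short_ma.shift(1).notna() & long_ma.shift(1).notna())
    return signal.where(valid, 0)

# Usage with made-up averages
short = pd.Series([float('nan'), 1.0, 2.0, 3.0, 2.0, 1.0])
long_ = pd.Series([float('nan'), 2.0, 2.5, 2.5, 2.5, 2.5])
print(find_crossovers(short, long_).tolist())  # [0, 0, 0, 1, -1, 0]
```

On the real data, `find_crossovers(data['50_MA'], data['200_MA'])` would return one value per day aligned with `data`.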
If you want to create a trading strategy, or just follow some key metrics for your technical analysis, rolling averages can be a good starting point.
Understanding NaNs in our Data
In our DataFrame `data`, some entries are NaN (Not a Number). These come from calculations that need earlier observations: the rolling metrics, and also the first log return, which has no previous closing price to compare with.
This concept is straightforward to understand. When computing a 50-day rolling average, the first 49 entries have no value because fewer than 50 observations are available to average. These entries are therefore filled with NaN, which is a common occurrence in time series analysis.
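You can confirm where the NaNs come from by counting them per column. The sketch rebuilds the same columns on a made-up 250-day random-walk price series, so it runs without a Kraken connection. Note that the rolling volatility gets 21 NaNs rather than 20, because the first log return is itself NaN. Whether to drop these rows with `dropna()` depends on the analysis: it aligns all metrics, at the cost of discarding the first 199 rows.

```python
import numpy as np
import pandas as pd

# Made-up closing prices for 250 days (random walk, seeded)
rng = np.random.default_rng(seed=0)
close = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=250))))

df = pd.DataFrame({'close': close})
df['log_return'] = np.log(df['close'] / df['close'].shift(1))
df['rolling_volatility'] = df['log_return'].rolling(window=21).std()
df['50_MA'] = df['close'].rolling(window=50).mean()
df['200_MA'] = df['close'].rolling(window=200).mean()

# Number of NaNs created by each calculation
print(df.isna().sum())
# Expected counts: close 0, log_return 1, rolling_volatility 21, 50_MA 49, 200_MA 199

# Keep only rows where every metric is defined
complete = df.dropna()
print(len(complete))  # 51
```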
Part 3: Visualizing Our Data and Metrics
We plot BTC closing prices along with the moving averages over the period. It can also be useful to visualize the asset's average price and average volatility, which we represent as dashed red horizontal lines.
# Background grid (this style was named 'seaborn-whitegrid' before Matplotlib 3.6)
plt.style.use('seaborn-v0_8-whitegrid')
# Plotting BTC closing prices (last 720 days) and average with the 50 and 200 MAs
average_price = data['close'].mean()
plt.figure(figsize = (14, 7))
data[['close', '50_MA', '200_MA']].plot(ax = plt.gca(), title = "BTC/USD Price Trends & Moving Averages")
plt.axhline(average_price, color = 'red', linestyle = '--', label = f"BTC Average Price: ${average_price:.2f}")
plt.ylabel("Price (USD)")
plt.legend()
plt.show()
# Visualizing Rolling Volatility with average volatility
average_volatility = data['rolling_volatility'].mean()
plt.figure(figsize = (14, 7))
data['rolling_volatility'].plot(title = "Bitcoin's 21-Day Rolling Volatility")
plt.axhline(average_volatility, color = 'red', linestyle = '--', label = f"Average Volatility: {average_volatility:.6f}")
plt.ylabel("Volatility")
plt.legend()
plt.show()
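If you prefer a single figure, the two charts can share a time axis with `plt.subplots`. This is a sketch built around a hypothetical helper, `plot_price_and_volatility`, which expects a DataFrame with the same columns as `data`; the usage lines build a made-up 300-day series so the example runs on its own.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.figure import Figure

def plot_price_and_volatility(df: pd.DataFrame) -> Figure:
    """Hypothetical helper: price and moving averages on top, rolling volatility below."""
    fig, (ax_price, ax_vol) = plt.subplots(2, 1, figsize=(14, 9), sharex=True)
    df[['close', '50_MA', '200_MA']].plot(ax=ax_price, title="BTC/USD Price Trends & Moving Averages")
    ax_price.set_ylabel("Price (USD)")
    df['rolling_volatility'].plot(ax=ax_vol, title="21-Day Rolling Volatility")
    ax_vol.axhline(df['rolling_volatility'].mean(), color='red', linestyle='--', label="Average volatility")
    ax_vol.set_ylabel("Volatility")
    ax_vol.legend()
    fig.tight_layout()
    return fig

# Usage on a made-up price series (replace `demo` with `data` for the real download)
rng = np.random.default_rng(seed=1)
dates = pd.date_range('2022-01-01', periods=300, freq='D')
demo = pd.DataFrame({'close': 100 * np.exp(np.cumsum(rng.normal(0, 0.02, size=300)))}, index=dates)
demo['50_MA'] = demo['close'].rolling(window=50).mean()
demo['200_MA'] = demo['close'].rolling(window=200).mean()
demo['rolling_volatility'] = np.log(demo['close']).diff().rolling(window=21).std()

fig = plot_price_and_volatility(demo)
plt.show()
```

Sharing the x-axis makes it easier to see whether volatility spikes coincide with sharp price moves or with crossovers of the moving averages.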
Conclusion
These building blocks, though “systematic”, are just the tip of the iceberg. You now know how to fetch historical market data using the ccxt library, calculate key trading metrics, and visualize them.
With this foundational knowledge, you are now better equipped to tackle more advanced topics. The real challenge, however, lies in integrating financial concepts with Python programming to uncover profitable trading strategies. It takes time, practice, and extensive research, but it's a fascinating and rewarding journey.
In future articles, we will explore backtesting, the discovery and tweaking of trading features, selecting a trading universe, and much more.
Thanks for reading and stay tuned for more articles in this Toolkit series!
Resources:
Eng Guan. (2023, March 3; updated March 10). Magic of Log Returns: Concept – Part 1. allquant. Retrieved September 30, 2023.
Hayes, A. (2023, May 21). What Is the Risk-Free Rate of Return, and Does It Really Exist? Reviewed by Cierra Murry; fact checked by Pete Rathburn. Investopedia. Retrieved September 29, 2023.
Fernando, J. (2023, May 11). Sharpe Ratio: Definition, Formula, and Examples. Reviewed by Margaret James; fact checked by Katrina MuniChiello. Investopedia. Retrieved September 30, 2023.