Python for Finance: Portfolio Optimization and the value of Diversifying.

In this article I will show you how to create and efficient portfolio and the importance of choosing uncorrelated assets

13 min readApr 26, 2021

The following topics will be addressed in this post:

How to get the stocks data from Yahoo Finance directly with Python.
Plotting the price and returns of the stocks.
How to get the logarithmic returns from the prices.
Calculate the annual expected return with CAPM and pyportfolioopt library.
Compute the annual Covariance Matrix, the annual Variance of the portfolio, and the annual Volatility and the annual expected return of the portfolio.
Optimizing the weights of the portfolio for Sharpe Ratio and Minimal Variance with pyportfolioopt library.
How to allow short selling in the optimal solution.
How to get integer optimal quantities of stocks to have in the portfolio.
Conclusions and the value of choosing uncorrelated assets.

But before we start with Python a little bit of the theory behind must be addressed. The first thing a person should have clear when investing is the level of risk they are willing to take, that’s called the risk and return trade off. The risk is a personal choice that each investor must take, that's why I will show you how to optimize your portfolio for minimum volatility and also for Sharpe Ratio. But first, let’s talk a little bit about the Efficient Frontier.

What is the Efficient Frontier?

According to Markowitz Portfolio Theory the Efficient Frontier are all the portfolio sets that given a risk level maximizes the expected return of the investor.

Image by author: Efficient Frontier of the Portfolio created below.

As you can see in the picture above, the portfolios that are in the red line are in what we call the Efficient Frontier. Any portfolio that is below the red line for the same amount of risk, can improve the expected return with a different combination of the portfolio weights.

What is the volatility?

The volatility measures the dispersion of the returns from the mean. Mathematically the dispersion accounts for the standard deviation of the returns of the assets. One of the assumptions we need to make when we work with the returns is that they can be modelled as a normal distribution, where the risk or volatility is the standard deviation of the data.

What is the Sharpe Ratio?

The Sharpe Ratio indicate us the risk of an investment measured by it’s risk. The ratio is the average excess of return above the risk free rate adjusted by the volatility of the asset. The greater the Sharpe Ratio the better. As you can see in the image of the efficient frontier, the blue dot represents the maximum Sharpe Ratio that you can achieve with all the possible Portfolios.

Image by author: Mathematical Formulation of the Sharpe Ratio

What is a good Sharpe Ratio?

A Sharpe Ratio above 1.0 is considered an acceptable Sharpe Ratio.
A Sharpe Ratio above 2.0 is considered a good Sharpe Ratio
And finally, a Sharpe Ratio above 3.0 is considered an excellent Sharpe Ratio.

Let’s go to Python

First thing first, we need to import the libraries that we will use.

import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import statistics
import seaborn as sns
from pandas_datareader import data
from pulp import *

You may need to install PuLp and pandas-datereader library; and to do that you can run the following commands:

pip install pulp
pip install pandas-datareader

Now we can start working. First, we need to choose the assets we want to work with:

For this case, I will choose the biggest US tech companies in market capitalization. Those companies are: Apple (AAPL), Microsoft (MSFT), Amazon (AMZN), Alphabet (GOOGL), Facebook (FB), Tesla (TSLA) and Visa (V).
These companies are listed in the NASDAQ (^IXIC): the second largest index in the US. The NASDAQ generally is the place where the tech companies are listed. That’s why we are going to use this index as benchmark or market portfolio.

Obtaining the Data

To obtain the data, we set up a list with the names of the tickers of the stocks we are going to use. In the case of Apple, for example, it appears listed on the NASDAQ as AAPL. Also, the ticker name of our benchmark, the NASDAQ, is ^IXIC.

assets = ["AAPL", "MSFT","AMZN","GOOGL","FB","TSLA","V","^IXIC]

Now that we have the list of the assets, we must define the date from which we are going to obtain the data. For this example, we are going to obtain the data from 5 years ago until now.

initial_date = "2016-04-20"
today = datetime.today().strftime('%Y-%m-%d')

With this information we are ready to obtain the data from Yahoo Finance, but first we must create and empty dataset where the data is going to be stored.

df_prices = pd.DataFrame()

Then we use the following Python function that will automatically give us the “Adj Close” column of the Yahoo Finance data from the date we chose.

# We give the function the empty dataframe, the name of the assets, the initial date and the end date:def datosYahoo(dataframe,asset_list,start,finish):
    for i in asset_list:
        dataframe[i] = data.DataReader(i,data_source='yahoo',start= start , end=finish)["Adj Close"]
    return dataframe
df = datosYahoo(df_prices,assets,initial_date,today)
df

Now we are ready to plot the price of the stocks during time.

Plotting the Stocks Prices

plt.figure(figsize=(12.2,4.5)) 
for i in df.columns.values:
    plt.plot( df[i],  label=i)
plt.title('Price of the Stocks')
plt.xlabel('Date',fontsize=18)
plt.ylabel('Price in USD',fontsize=18)
plt.legend(df.columns.values, loc='upper left')
plt.show()

Image by author: Price of the stock in time.

It’s pretty clear in the plot the impact of the pandemic, it can be seen in the gray line that represents the NASDAQ index at the beginning of 2020.

The plot above has one problem, it doesn’t show the trend of the stocks prices in time. So we are going to plot the return of the stocks in time. We do this by dividing each row by the initial price.

Image by author: Return of the stock in time.

In the above picture we can clearly see how much these stocks have returned in time. It’s impressive to see that the price of Tesla has almost multiplied by 15 since the initial date. Wish I had invested in Tesla back in 2017. Congratulations to the ones who did.

Logarithmic Returns

Even though the Python library pyportfolioopt allows to calculate the expected return only with the prices of the stocks. It’s a good exercise to calculate the return by ourselves.

The Logarithmic Returns are defined by the following equation:

Image by author: Annual Logarithmic Return

This can be done simply in python with two lines of code:

df = np.log(df).diff()
df = df.dropna()
df

We eliminate the missing values because due to the logarithmic transformation we lose the first data. The resulting dataframe is the following:

Table by author: Resulting Dataframe with the logarithmic returns.

Are in fact the returns normally distributed?

plt.figure(figsize=(12.2,4.5)) 
for i in df.columns.values:
    plt.hist( df[i],  label=i, bins = 200)
plt.title('Returns Histogram')
plt.xlabel('Fecha',fontsize=18)
plt.ylabel('Precio en USD',fontsize=18)
plt.legend(df.columns.values)
plt.show()

Image by author: Histogram of the Returns

Effectively the normality of the returns holds.

CAPM: Capital Asset Pricing Model.

One of the requirements to solve the optimization problem that suppose to find the minimum variance portfolio is to have the expected return. An easy way to do this is to use the CAPM model.

Model Formulation

The model tell us that the expected return of the asset i (E(Ri)) is given by the sum of the risk free rate (Rf) plus the beta of the asset multiplied by the market expected return (E(Rm)) subtracted by the risk free rate.

So where do I get the Risk Free Rate?

The real risk free rate can be calculated as the subtraction between the Yield to Maturity of a Treasury Bond and the inflation.

But of all bonds with different durations which one should I choose?

You should choose a bond that has the same duration as your investment.
In this case, as we are trying to calculate the expected return of our stocks one year from now, we are going to use a 1 year T-bill.
For this example the annual rate of this bond is 0.07% (04/20/2021).

Expected Return of the Market Portfolio

In this case, one option is to assume the historic annual return of the Benchmark that you will use. The classic Benchmark is the S&P 500, but as in this case we are working with tech companies, we will use the NASDAQ.

Back to Python

Now we have everything to use the Python (pyportfolioopt) library that automatically computes the expected return using CAPM.

Screenshot of https://pyportfolioopt.readthedocs.io/en/latest/: Inputs of the function

You may need to install the library which is achieved by running the following command:

pip install PyPortfolioOpt

As the function ask us the market prices as inputs separated from the returns of our assets we separate them:

df_assets =  df.loc[:, df.columns != '^IXIC']
df_assets

Now the same with the Benchmark:

df_benchmark1 =  df.loc[:, df.columns == '^IXIC']

Computing the Expected Return

As we are working with the returns we must set returns_data = True.

retornos1 = expected_returns.capm_return(df_assets, market_prices = df_benchmark1, returns_data= True, risk_free_rate=0.07/100, frequency=252)
retornos1

From which we obtain:

Image by author: Expected return using CAPM.

Annual Covariance Matrix

The Covariance Matrix, is a square matrix that measures the variation between two variables, in this case; returns. The covariance matrix can be easily calculated in Python:

df_cov = df_activos.cov()*252

We multiply the returns by 252, because during a year there are 252 business days.

The resulting dataframe is the following:

Table by author: Resulting Covariance Matrix

Variance of the Portfolio

First, we need to create an arbitrary portfolio:

Let’s give all the assets the same weights.
To calculate the weights easily and automatically independent of how many assets there are, I created the following Python function:

# The function give us the equal weights for all assets:def pesosPortafolio(dataframe):
    array = []
    for i in dataframe.columns:
        array.append(1/len(dataframe.columns))
    arrayFinal = np.array(array)
    return arrayFinal# Pesos are the resulting weightspesos = pesosPortafolio(df_activos)
pesos

As a result, we obtain the following weights vector:

array([0.14285714, 0.14285714, 0.14285714, 0.14285714, 0.14285714,
       0.14285714, 0.14285714])

Now we can calculate the variance of the portfolio:

Image by author: Variance of the portfolio

Where, w corresponds to the weights of the portfolio and summation symbol corresponds to the covariance matrix.

#Portfolio Variance:varianza_portafolio = pesos.T @ df_cov @pesos
"The variance of the portfolio is:" + " " + str(round(varianza_portafolio*100,1))+"%"#Result:'The variance of the portfolio is: 5.5%'

Then, the volatility is calculated as the square root of the variance. That can be calculated using the numpy sqrt function as follows:

# Portfolio Volatility
volatilidad_portafolio = np.sqrt(varianza_portafolio)
"The volatility of the portfolio is:" + " " + str(round(volatilidad_portafolio*100,1))+"%"#Result:
'The volatility of the portfolio is: 23.5%'

Expected Return of the Portfolio

# Expected return of the portfolio
retorno_portafolio = np.sum(pesos*retornos1)
'The expected annual return of the portfolio is:' + ' ' + str(round(retorno_portafolio*100,3)) + '%'# Result: 
'The expected annual return of the portfolio is: 18.723%'

Let’s see if we can find a portfolio that with the same amount or less risk has a better expected return.

Optimization for Minimum Variance: with Short Selling

Before we start optimizing let’s take a look to what is Short Selling with this great explanatory video from the Wall Street Survivor.

To sum up:

Short selling is borrowing a quantity of stocks, let’s say 10 stocks at 10 dollars a share, from a broker and selling them immediately.
Then when the prices go down, yo rebuy the stock to repay your broker. Imagine that the stock went down to 5 dollars a share.
Now, to repay your broker, you only need to spend 50 dollars to rebuy the 10 stocks you owe. So now you have a 50 dollars profit.

Optimization

The problem we are facing is the following:

Then using the pyportfolioopt library we optimize the portfolio for minimum volatility allowing short selling:

That is obtained by indicating the function that it must minimize volatility (ef.min_volatility)
To allow short selling (weight_bounds = (-1,1)). This means that short selling is the equivalent to have negative weights in your optimal solution.

ef = EfficientFrontier(retornos1, df_cov, weight_bounds=(-1,1))
weights = ef.min_volatility() 
cleaned_weights = ef.clean_weights() 
print(cleaned_weights) 
ef.portfolio_performance(verbose=True)

As a result we obtain the following optimal weights:

OrderedDict([('AAPL', 0.09437), ('MSFT', -0.04576), ('AMZN', 0.25826), ('GOOGL', 0.22786), ('FB', 0.01725), ('TSLA', -0.02033), ('V', 0.46836)])
Expected annual return: 19.8%
Annual volatility: 23.4%
Sharpe Ratio: 0.76

In the optimal result we must do short selling with stocks of Tesla and Microsoft.
The expected annual return is 19.8%.
The annual volatility is 23.4%.
And last but not least, the Sharpe Ratio is 0.76.

If you want to optimize for minimum variance but you don’t want to do short selling, you only need to set weight_bounds = (0,1). Let’s see an example but in this case let’s optimize it for Sharpe Ratio.

Optimization for Sharpe Ratio: without Short Selling

For maximizing the Sharpe Ratio we must change the command min.volatility for ef.max_sharpe().

Then in Python:

ef = EfficientFrontier(retornos1, df_cov,weight_bounds=(0,1))
weights = ef.max_sharpe() 
cleaned_weights = ef.clean_weights() 
print(cleaned_weights) 
ef.portfolio_performance(verbose=True)

And then we obtain the following results:

OrderedDict([('AAPL', 0.22438), ('MSFT', 0.30354), ('AMZN', 0.06793), ('GOOGL', 0.12507), ('FB', 0.07497), ('TSLA', 0.04913), ('V', 0.15499)])
Expected annual return: 22.0%
Annual volatility: 24.8%
Sharpe Ratio: 0.81

In the optimal solution the expected return is 22.0%, not much better than the minimum variance problem.
The annual volatility is 24.8%.
And the Sharpe Ratio is 0.81.

Integer Programming

As you can only buy integer quantities of stocks you need to find the discrete optimal allocation to invest in each stock. That is solve in python with the following code:

from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices
latest_prices = get_latest_prices(df_activos)
pesos = cleaned_weights 
da = DiscreteAllocation(pesos, latest_prices, total_portfolio_value=10.000)
allocation, leftover = da.lp_portfolio()
print("Quantities of Stock To buy:", allocation)
print("Money leftover: ${:.2f}".format(leftover))

Obtaining the following result:

Discrete allocation: {'AAPL': 636, 'MSFT': 342, 'AMZN': 85, 'TSLA': 15, 'V': 81}
Funds remaining: $1.96

This are the optimal quantities to invest in the portfolio containing these stocks.

The value of Diversification

One of the possible reasons of why the portfolio has a bad Sharpe Ratio and also is unable to drastically reduce the variance is because the assets are extremely correlated between each other.

Let’s see the correlation matrix of the assets:

correlation_mat = df.corr()
plt.figure(figsize=(12.2,4.5))
sns.heatmap(correlation_mat, annot = True)
plt.title('Matriz de Correlación')
plt.xlabel('Activos',fontsize=18)
plt.ylabel('Activos',fontsize=18)
plt.show()

Image by author: Correlation Matrix of the Assets.

With the exception of Tesla, that also happens to have one of the greatest volatilities among these assets, all of the remaining stocks are extremely correlated (closer to 1 it means that the assets are very correlated) with each other.

Having a portfolio of these stocks is not very helpful to diversify the risk. Diversification, commonly described as not having all the eggs in one basket, doesn’t mean to have own the entire S&P 500, it means to invest smartly in the least correlated assets you can find. These, will drastically reduce the risk of the portfolio without hurting the expected return.

Let’s see one last example:

What if we add Bitcoin to our portfolio. Cryptocurrency is one of the most unstable assets you can buy, it has historic annual volatility of more or less a 100%.

Let’s have a look to the correlation matrix first:

It’s clear now that Bitcoin is uncorrelated with every other asset in the portfolio.

So I calculated the expected returns of each assets using the exponentially-weighted mean of (daily) historical returns because bitcoin is uncorrelated with the market so it almost has a Beta equal to cero. So using CAPM would not be optimal in this case.

But the volatilities are the same as the last portfolio so we are going to compare how does our new portfolio behave in comparison with the previous one in terms of volatility considering we have an incredibly unstable asset.

We optimize the portfolio as before not allowing short selling. Obtaining the following:

OrderedDict([('BTC-USD', 0.07027), ('AAPL', 0.07762), ('MSFT', 0.0), ('AMZN', 0.20695), ('GOOGL', 0.17435), ('FB', 0.02476), ('TSLA', 0.0), ('V', 0.44605)])
Expected annual return: 40.7%
Annual volatility: 21.6%
Sharpe Ratio: 1.79

The expected return almost double from last portfolio but we know it’s not comparable because we use different methods to calculate it.
But, the volatility is almost the same as the previous portfolio and we optimal solution tell us to buy a 7% of Bitcoin, an asset that by itself has a volatility of almost 100% a year.
The Sharpe Ratio also almost double to 1.79.

Conclusion

Diversifying risk means much more than having a lot of assets, it means having uncorrelated assets so when one goes down, the other assets in your portfolio don’t go down too.

Choosing your assets correctly can give you tremendous dividends, not only with a higher expected return, but also reducing the risk or at least the same risk level but with a higher expected return.

You can put your eggs in different baskets but if you have to many eggs they can brake too, so it’s better to have a small quantity but very good quality eggs in your basket to reduce the risk of them braking.