# Python for Finance: Portfolio Optimization and the Value of Diversifying

## In this article I will show you how to create an efficient portfolio and the importance of choosing uncorrelated assets

The following topics will be addressed in this post:

- How to get stock data from **Yahoo Finance** directly with Python.
- **Plotting the price** and **returns** of the stocks.
- How to get the **logarithmic returns** from the prices.
- Calculating the **annual expected return** with **CAPM** and the pyportfolioopt library.
- Computing the annual **Covariance Matrix**, the annual **Variance** of the portfolio, the annual **Volatility** and the annual **expected return of the portfolio**.
- **Optimizing the weights** of the portfolio for **Sharpe Ratio** and **Minimal Variance** with the pyportfolioopt library.
- How to allow **short selling** in the optimal solution.
- How to get **integer optimal quantities** of stocks to have in the portfolio.
- Conclusions and the value of choosing **uncorrelated assets**.

But before we start with Python, a little bit of the theory behind it must be addressed. The first thing a person should be clear about when investing is the **level of risk they are willing to take**; this is called the **risk and return trade-off**. The level of risk is a personal choice that each investor must make, which is why I will show you how to optimize your portfolio both for **minimum volatility** and for the **Sharpe Ratio**. But first, let’s talk a little bit about the **Efficient Frontier**.

# What is the Efficient Frontier?

According to **Markowitz Portfolio Theory**, the **Efficient Frontier** is the set of all portfolios that, **for a given level of risk, maximize the expected return** of the investor.

As you can see in the picture above, the portfolios that lie on the red line are on what we call the Efficient Frontier. Any portfolio **below the red line** can improve its **expected return** for the **same amount of risk** by using a different combination of portfolio weights.

**What is the volatility?**

Volatility measures the **dispersion** of the returns around their mean. Mathematically, this dispersion is the **standard deviation** of the returns of the asset. One of the assumptions we need to make when we work with returns is that they can be **modelled as a normal distribution**, where the risk or volatility is the **standard deviation of the data**.
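As a minimal sketch of this definition, the snippet below computes the daily volatility of a short series of returns with Python’s `statistics` module and annualizes it with the square root of 252 trading days. The return values are invented for illustration, and the square-root scaling assumes that daily returns are independent of each other.

```python
import math
import statistics

# Hypothetical daily returns, invented for illustration only
daily_returns = [0.012, -0.008, 0.005, -0.015, 0.010, 0.003, -0.004]

# Volatility is the sample standard deviation of the returns
daily_volatility = statistics.stdev(daily_returns)

# Annualize assuming 252 trading days and independent daily returns
annual_volatility = daily_volatility * math.sqrt(252)

print(f"Daily volatility:  {daily_volatility:.4f}")
print(f"Annual volatility: {annual_volatility:.4f}")
```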

**What is the Sharpe Ratio?**

The **Sharpe Ratio** tells us the return of an investment adjusted for its risk. It is the average return in excess of the risk-free rate, divided by the volatility of the asset. The **greater the Sharpe Ratio, the better**. As you can see in the image of the efficient frontier, the blue dot represents the maximum Sharpe Ratio that you can achieve among all the possible portfolios.
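The definition can be written as a one-line function. This is only a sketch: the function name `sharpe_ratio` and the example figures (12% expected return, 20% volatility, 2% risk-free rate) are my own assumptions for illustration.

```python
def sharpe_ratio(expected_return, volatility, risk_free_rate):
    """Excess return over the risk-free rate per unit of volatility."""
    return (expected_return - risk_free_rate) / volatility

# Hypothetical portfolio: 12% expected return, 20% volatility, 2% risk-free rate
print(round(sharpe_ratio(0.12, 0.20, 0.02), 2))  # → 0.5
```

All three inputs must refer to the same period (here, one year), otherwise the ratio is not meaningful.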

**What is a good Sharpe Ratio?**

- A Sharpe Ratio **above 1.0** is considered an **acceptable** Sharpe Ratio.
- A Sharpe Ratio **above 2.0** is considered a **good** Sharpe Ratio.
- And finally, a Sharpe Ratio **above 3.0** is considered an **excellent** Sharpe Ratio.

**Let’s go to Python**

First things first: we need to import the libraries that we will use.

```python
import pandas as pd
import numpy as np
from datetime import datetime
import matplotlib.pyplot as plt
import statistics
import seaborn as sns
from pandas_datareader import data
from pulp import *
```

You may need to install the **PuLP** and **pandas-datareader** libraries; to do that, you can run the following commands:

```bash
pip install pulp
pip install pandas-datareader
```

Now we can start working. First, we need to choose the assets we want to work with:

- For this case, I will choose the **biggest US tech companies by market capitalization**. Those companies are: Apple (AAPL), Microsoft (MSFT), Amazon (AMZN), Alphabet (GOOGL), Facebook (FB), Tesla (TSLA) and Visa (V).
- Most of these companies are listed on the **NASDAQ**, the second-largest stock exchange in the US and generally the place where **the tech companies are listed** (Visa trades on the NYSE). That’s why we are going to use the NASDAQ Composite index **(^IXIC)** as the **benchmark or market portfolio**.

## Obtaining the Data

To obtain the data, we set up a list with the **tickers** of the stocks we are going to use. Apple, for example, is listed on the NASDAQ as **AAPL**. The ticker of our benchmark, **the NASDAQ**, is ^IXIC.

`assets = ["AAPL", "MSFT", "AMZN", "GOOGL", "FB", "TSLA", "V", "^IXIC"]`

Now that we have the list of the assets, we must define the date from which we are going to obtain the data. For this example, we are going to obtain the data **from 5 years ago until now**.

```python
initial_date = "2016-04-20"
today = datetime.today().strftime('%Y-%m-%d')
```

With this information we are ready to **obtain the data from Yahoo Finance**, but first we must create an empty DataFrame where the data is going to be stored.

`df_prices = pd.DataFrame()`

Then we use the following Python function, which will automatically give us the **“Adj Close”** column of the Yahoo Finance data starting from the date we chose.

```python
# We give the function the empty dataframe, the names of the assets,
# the initial date and the end date:
def datosYahoo(dataframe, asset_list, start, finish):
    for i in asset_list:
        dataframe[i] = data.DataReader(i, data_source='yahoo', start=start, end=finish)["Adj Close"]
    return dataframe

df = datosYahoo(df_prices, assets, initial_date, today)
df
```

Now we are ready to plot the price of the stocks over time.

# Plotting the Stocks Prices

```python
plt.figure(figsize=(12.2, 4.5))
for i in df.columns.values:
    plt.plot(df[i], label=i)
plt.title('Price of the Stocks')
plt.xlabel('Date', fontsize=18)
plt.ylabel('Price in USD', fontsize=18)
plt.legend(df.columns.values, loc='upper left')
plt.show()
```

The impact of the pandemic is pretty clear in the plot: it can be seen at the beginning of 2020 in the gray line that represents the NASDAQ index.

The plot above has one problem: because each stock trades at a very different price level, it doesn’t show the **trend of the stock prices** over time in a comparable way. So we are going to plot the return of the stocks over time instead. We do this by dividing each row by the initial price.
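One possible way to write this normalization is sketched below. The helper names `normalize_prices` and `plot_normalized` are hypothetical, and the small DataFrame is made up so the snippet runs on its own.

```python
import pandas as pd

def normalize_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Divide every row by the first row, so each column starts at 1.0."""
    return prices.div(prices.iloc[0])

def plot_normalized(prices: pd.DataFrame) -> None:
    """Plot the growth of one unit of currency invested in each asset."""
    import matplotlib.pyplot as plt  # imported here so normalize_prices works without it

    normalized = normalize_prices(prices)
    plt.figure(figsize=(12.2, 4.5))
    for column in normalized.columns:
        plt.plot(normalized[column], label=column)
    plt.title('Normalized Price of the Stocks')
    plt.xlabel('Date', fontsize=18)
    plt.ylabel('Price relative to the initial date', fontsize=18)
    plt.legend(loc='upper left')
    plt.show()

# Tiny made-up example: stock A doubles, stock B gains 50%
example = pd.DataFrame({"A": [10.0, 15.0, 20.0], "B": [200.0, 250.0, 300.0]})
print(normalize_prices(example))
```

With the data downloaded earlier, the call would be `plot_normalized(df)`, made while `df` still holds prices.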

In the above picture we can clearly see how much these stocks have **returned over time**. It’s impressive to see that the price of Tesla has been **multiplied by almost 15 since the initial date**. Wish I had invested in Tesla back in 2017. Congratulations to the ones who did.

# Logarithmic Returns

Even though the Python library pyportfolioopt can calculate the expected return directly from the prices of the stocks, it’s a good exercise to calculate the returns ourselves.

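The logarithmic return of a stock on day *t* is defined by the **following equation**, where *P_t* is its price on day *t*:

$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right) = \ln(P_t) - \ln(P_{t-1})$$

This can be done simply in Python with two lines of code: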

```python
df = np.log(df).diff()
df = df.dropna()
df
```

We **eliminate the missing values** because taking the difference leaves the first row without a return. The resulting dataframe is the following:
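To see what the two lines are doing, here is a small sketch on invented prices. It checks that the difference of the logarithms equals the logarithm of the price ratio, and that log returns add up over time.

```python
import numpy as np

# Hypothetical closing prices, invented for illustration
prices = np.array([100.0, 102.0, 99.0, 105.0])

# Log return of each day: ln(P_t) - ln(P_{t-1})
log_returns = np.diff(np.log(prices))

# Same values computed from the price ratio ln(P_t / P_{t-1})
ratio_returns = np.log(prices[1:] / prices[:-1])

# Log returns are additive: their sum is the log return of the whole period
total_log_return = log_returns.sum()

print(log_returns)
print(np.isclose(total_log_return, np.log(prices[-1] / prices[0])))  # → True
```

Note that four prices produce only three returns, which is why the first row of the dataframe is dropped.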

## Are the returns in fact normally distributed?

```python
plt.figure(figsize=(12.2, 4.5))
for i in df.columns.values:
    plt.hist(df[i], label=i, bins=200)
plt.title('Returns Histogram')
plt.xlabel('Daily log return', fontsize=18)
plt.ylabel('Frequency', fontsize=18)
plt.legend(df.columns.values)
plt.show()
```

Indeed, the **normality of the returns** approximately holds.
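A histogram is a visual check. A complementary numeric check, sketched here, is to compare the skewness and excess kurtosis of the returns with those of a normal distribution, which are both zero. The function name is hypothetical, and simulated normal returns stand in for the real data so the snippet runs on its own.

```python
import numpy as np

def skewness_and_excess_kurtosis(returns):
    """Return the sample skewness and excess kurtosis of a 1-D array."""
    returns = np.asarray(returns, dtype=float)
    standardized = (returns - returns.mean()) / returns.std()
    skewness = np.mean(standardized ** 3)
    excess_kurtosis = np.mean(standardized ** 4) - 3.0
    return skewness, excess_kurtosis

# Simulated normal returns stand in for real data here (an assumption)
rng = np.random.default_rng(seed=0)
simulated = rng.normal(loc=0.0005, scale=0.02, size=100_000)

skew, kurt = skewness_and_excess_kurtosis(simulated)
print(f"Skewness: {skew:.3f}, excess kurtosis: {kurt:.3f}")
```

On the real data the call would be, for example, `skewness_and_excess_kurtosis(df["AAPL"])`. Values far from zero would suggest that the normal model is only a rough approximation for that asset.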

# CAPM: Capital Asset Pricing Model

One of the requirements for solving the optimization problem of finding the minimum variance portfolio is to have the **expected return** of each asset. An easy way to get it is to use the **CAPM model**.

## Model Formulation

The model tells us that the **expected return of the asset i (E(R*i*))** is given by the sum of the **risk free rate (Rf)** plus the **beta** of the asset multiplied by the **market expected return (E(R*m*))** minus the risk free rate:

$$E(R_i) = R_f + \beta_i \left(E(R_m) - R_f\right)$$

## So where do I get the Risk Free Rate?

The **real risk free rate** can be calculated as the **difference** between the **Yield to Maturity** of a **Treasury Bond** and **inflation**.

**But of all the bonds with different maturities, which one should I choose?**

- You should choose a **bond whose maturity matches the horizon of your investment**.
- In this case, as we are trying to calculate the expected return of our stocks one year from now, we are going to use a **1 year T-bill**.
- For this example, the annual rate of this bond is 0.07% (04/20/2021).

## Expected Return of the Market Portfolio

In this case, one option is to assume the **historic annual return of the benchmark** that you will use. The **classic benchmark is the S&P 500**, but since we are working with tech companies here, we will use the **NASDAQ**.
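Putting the pieces of this section together, CAPM can be sketched by hand with numpy. This is an illustration under my own assumptions, not the library’s implementation: the function name is hypothetical, beta is estimated as the covariance with the market divided by the market variance, the market return is annualized with a simple mean times 252, and the data is made up. A library may annualize differently (for example, by compounding).

```python
import numpy as np

def capm_expected_return(asset_returns, market_returns, risk_free_rate, periods_per_year=252):
    """Annual expected return from CAPM using periodic (e.g. daily) returns."""
    asset_returns = np.asarray(asset_returns, dtype=float)
    market_returns = np.asarray(market_returns, dtype=float)

    # Beta: covariance of the asset with the market over the market variance
    covariance = np.cov(asset_returns, market_returns)[0, 1]
    beta = covariance / np.var(market_returns, ddof=1)

    # Market expected return: mean periodic return scaled to one year
    market_expected_return = market_returns.mean() * periods_per_year

    expected_return = risk_free_rate + beta * (market_expected_return - risk_free_rate)
    return beta, expected_return

# Made-up daily market returns and an asset that moves 1.5 times as much
market = np.array([0.010, -0.005, 0.007, -0.012, 0.004, 0.009, -0.003])
asset = 1.5 * market

beta, expected = capm_expected_return(asset, market, risk_free_rate=0.07 / 100)
print(f"Beta: {beta:.2f}, expected annual return: {expected:.2%}")
```

An asset with a beta above 1 amplifies the market’s excess return, and one with a beta near zero is expected to earn little more than the risk free rate.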

# Back to Python

Now we have everything we need to use the pyportfolioopt library, which **automatically computes the expected return using CAPM**.

You may need to **install the library**, which you can do by running the following command:

`pip install PyPortfolioOpt`

As the function asks for the **market prices** as an input **separate** from the returns of our assets, we split them:

```python
df_assets = df.loc[:, df.columns != '^IXIC']
df_assets
```

Now the same with the Benchmark:

`df_benchmark1 = df.loc[:, df.columns == '^IXIC']`

## Computing the Expected Return

As we are working with returns rather than prices, **we must set returns_data = True**.

```python
from pypfopt import expected_returns

retornos1 = expected_returns.capm_return(df_assets, market_prices=df_benchmark1, returns_data=True, risk_free_rate=0.07/100, frequency=252)
retornos1
```

From which we obtain:

**Annual Covariance Matrix**

The **Covariance Matrix** is a square matrix that measures how each pair of variables, in this case the returns of two assets, moves together. It can be easily calculated in Python:

`df_cov = df_assets.cov()*252`

We multiply the daily covariance matrix by 252 because there are about **252 business days** in a year.

The resulting dataframe is the following:

# Variance of the Portfolio

First, we need to create an arbitrary portfolio:

- Let’s give all the **assets the same weight**.
- To calculate the weights easily and automatically, regardless of how many assets there are, I created the following Python function:

```python
# The function gives us equal weights for all assets:
def pesosPortafolio(dataframe):
    array = []
    for i in dataframe.columns:
        array.append(1/len(dataframe.columns))
    arrayFinal = np.array(array)
    return arrayFinal

# pesos are the resulting weights
pesos = pesosPortafolio(df_assets)
pesos
```

As a result, we obtain the following **weights vector**:

```
array([0.14285714, 0.14285714, 0.14285714, 0.14285714, 0.14285714,
       0.14285714, 0.14285714])
```

Now we can calculate the **variance of the portfolio**:

$$\sigma_p^2 = w^T \Sigma \, w$$

where *w* corresponds to the **weights of the portfolio** and *Σ* (capital sigma) corresponds to the **covariance matrix**.

```python
# Portfolio Variance:
varianza_portafolio = pesos.T @ df_cov @ pesos
"The variance of the portfolio is:" + " " + str(round(varianza_portafolio*100,1)) + "%"

# Result:
# 'The variance of the portfolio is: 5.5%'
```

Then, the **volatility** is calculated as the **square root of the variance**. That can be done with the numpy sqrt function as follows:

```python
# Portfolio Volatility
volatilidad_portafolio = np.sqrt(varianza_portafolio)
"The volatility of the portfolio is:" + " " + str(round(volatilidad_portafolio*100,1)) + "%"

# Result:
# 'The volatility of the portfolio is: 23.5%'
```

## Expected Return of the Portfolio

```python
# Expected return of the portfolio
retorno_portafolio = np.sum(pesos*retornos1)
'The expected annual return of the portfolio is:' + ' ' + str(round(retorno_portafolio*100,3)) + '%'

# Result:
# 'The expected annual return of the portfolio is: 18.723%'
```

Let’s see if we can find a portfolio that has a better expected return with the same amount of risk or less.

# Optimization for Minimum Variance: with Short Selling

Before we start optimizing, let’s take a look at what **Short Selling** is with this great explanatory video from **Wall Street Survivor**.

To sum up:

- **Short selling** is borrowing a quantity of stocks from a broker, let’s say 10 stocks at 10 dollars a share, and selling them immediately.
- Then, when the **price goes down**, you rebuy the stocks to repay your broker. Imagine that the stock went down to 5 dollars a share.
- Now, to repay your broker, you only need to **spend 50 dollars to rebuy** the 10 stocks you owe. Since you sold them for 100 dollars, you now have a 50 dollar profit.
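The arithmetic of the example above can be written as a small function. This is a simplified sketch: the function name is hypothetical, and it ignores broker fees, borrowing interest and margin requirements.

```python
def short_sale_profit(shares, sell_price, rebuy_price):
    """Profit from borrowing shares, selling them, and rebuying them later."""
    proceeds = shares * sell_price        # cash received when the borrowed shares are sold
    cost_to_cover = shares * rebuy_price  # cash spent to return the shares to the broker
    return proceeds - cost_to_cover

print(short_sale_profit(shares=10, sell_price=10, rebuy_price=5))   # → 50
print(short_sale_profit(shares=10, sell_price=10, rebuy_price=15))  # → -50
```

The second call shows the other side of the trade: if the price goes up instead of down, the short seller loses money.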

## Optimization

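The problem we are facing is the following, where *w* is the vector of weights and *Σ* is the annual covariance matrix:

$$\min_{w} \; w^T \Sigma \, w \quad \text{subject to} \quad \sum_{i} w_i = 1, \quad -1 \le w_i \le 1$$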

Then using the pyportfolioopt library we **optimize the portfolio for minimum volatility** allowing short selling:

- That is obtained by telling the library that it must minimize volatility (**ef.min_volatility()**).
- To allow short selling, we set **weight_bounds = (-1,1)**. This means that short selling is equivalent to having **negative weights** in your optimal solution.

```python
from pypfopt import EfficientFrontier

ef = EfficientFrontier(retornos1, df_cov, weight_bounds=(-1,1))
weights = ef.min_volatility()
cleaned_weights = ef.clean_weights()
print(cleaned_weights)
ef.portfolio_performance(verbose=True)
```

As a result we obtain the following **optimal weights**:

```
OrderedDict([('AAPL', 0.09437), ('MSFT', -0.04576), ('AMZN', 0.25826), ('GOOGL', 0.22786), ('FB', 0.01725), ('TSLA', -0.02033), ('V', 0.46836)])
Expected annual return: 19.8%
Annual volatility: 23.4%
Sharpe Ratio: 0.76
```

- In the optimal result we must **short sell** the stocks of **Tesla and Microsoft**.
- The **expected annual return** is 19.8%.
- The **annual volatility** is 23.4%.
- And last but not least, the **Sharpe Ratio** is 0.76.
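To build some intuition for what the optimizer is doing, here is a sketch of the textbook closed-form solution of the minimum variance problem when the only constraint is that the weights sum to 1. It is not what the library runs: it ignores the (-1,1) bounds on each weight, the function name is hypothetical, and the covariance matrix is made up.

```python
import numpy as np

def min_variance_weights(cov_matrix):
    """Minimum-variance weights subject only to the weights summing to 1."""
    cov_matrix = np.asarray(cov_matrix, dtype=float)
    ones = np.ones(cov_matrix.shape[0])
    # Solve cov_matrix @ x = 1 instead of inverting the matrix explicitly
    raw = np.linalg.solve(cov_matrix, ones)
    return raw / raw.sum()

# Made-up annual covariance matrix for three assets
cov = np.array([
    [0.090, 0.030, 0.010],
    [0.030, 0.060, 0.015],
    [0.010, 0.015, 0.040],
])

w_min = min_variance_weights(cov)
w_equal = np.ones(3) / 3

print("Weights:", np.round(w_min, 4))
print("Min-variance volatility:", np.sqrt(w_min @ cov @ w_min))
print("Equal-weight volatility:", np.sqrt(w_equal @ cov @ w_equal))
```

Any negative entry in the resulting weights would correspond to a short position, just like the negative weights in the result above.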

If you want to optimize for minimum variance but you don’t want to short sell, you only need to set **weight_bounds = (0,1)**. Let’s see an example of this, but this time optimizing for the Sharpe Ratio.

# Optimization for Sharpe Ratio: without Short Selling

To **maximize the Sharpe Ratio**, we replace the call to ef.min_volatility() with **ef.max_sharpe()**.

Then in Python:

```python
ef = EfficientFrontier(retornos1, df_cov, weight_bounds=(0,1))
weights = ef.max_sharpe()
cleaned_weights = ef.clean_weights()
print(cleaned_weights)
ef.portfolio_performance(verbose=True)
```

And then we obtain the following results:

```
OrderedDict([('AAPL', 0.22438), ('MSFT', 0.30354), ('AMZN', 0.06793), ('GOOGL', 0.12507), ('FB', 0.07497), ('TSLA', 0.04913), ('V', 0.15499)])
Expected annual return: 22.0%
Annual volatility: 24.8%
Sharpe Ratio: 0.81
```

- In the **optimal solution** the **expected return is 22.0%**, not much better than in the minimum variance problem.
- The **annual volatility** is 24.8%.
- And the **Sharpe Ratio** is 0.81.

# Integer Programming

As you can only buy **integer quantities of stocks**, you need to find the discrete allocation that best matches the optimal weights. That is solved in Python with the following code:

```python
from pypfopt.discrete_allocation import DiscreteAllocation, get_latest_prices

# Use the price data (not the log returns) of the assets, without the benchmark
latest_prices = get_latest_prices(df_prices.drop(columns='^IXIC'))
pesos = cleaned_weights
da = DiscreteAllocation(pesos, latest_prices, total_portfolio_value=10000)
allocation, leftover = da.lp_portfolio()
print("Discrete allocation:", allocation)
print("Funds remaining: ${:.2f}".format(leftover))
```

Obtaining the following result:

```
Discrete allocation: {'AAPL': 636, 'MSFT': 342, 'AMZN': 85, 'TSLA': 15, 'V': 81}
Funds remaining: $1.96
```

These are the **optimal quantities** to invest in the portfolio containing these stocks.
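To see why a dedicated allocation step is needed at all, here is a naive alternative that simply rounds every position down to whole shares. It is a sketch for comparison only, not the method used above: the function name, tickers, weights and prices are all made up, and it only handles positive weights.

```python
import math

def floor_allocation(weights, prices, total_value):
    """Round each position down to whole shares and report the unspent cash."""
    allocation = {}
    spent = 0.0
    for ticker, weight in weights.items():
        shares = math.floor(weight * total_value / prices[ticker])
        if shares > 0:
            allocation[ticker] = shares
            spent += shares * prices[ticker]
    return allocation, total_value - spent

# Made-up weights and prices, for illustration only
example_weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
example_prices = {"AAA": 120.0, "BBB": 45.0, "CCC": 300.0}

naive_allocation, naive_leftover = floor_allocation(example_weights, example_prices, total_value=10_000)
print(naive_allocation)          # → {'AAA': 41, 'BBB': 66, 'CCC': 6}
print(f"${naive_leftover:.2f}")  # → $310.00
```

Rounding down leaves a noticeable amount of cash uninvested, which is the kind of gap an integer optimization tries to minimize.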

# The Value of Diversification

One of the possible reasons why the portfolio has a **bad Sharpe Ratio** and is also unable to **drastically reduce the variance** is that the assets are **highly correlated** with each other.

Let’s see the **correlation matrix** of the assets:

```python
correlation_mat = df.corr()
plt.figure(figsize=(12.2, 4.5))
sns.heatmap(correlation_mat, annot=True)
plt.title('Correlation Matrix')
plt.xlabel('Assets', fontsize=18)
plt.ylabel('Assets', fontsize=18)
plt.show()
```

With the **exception of Tesla**, which also happens to have one of the greatest volatilities among these assets, all of the **remaining stocks are highly correlated** with each other (the closer the value is to 1, the more correlated the assets are).

Having a portfolio of these stocks is not very helpful for diversifying risk. Diversification, **commonly described as not putting all your eggs in one basket**, doesn’t mean owning the entire S&P 500; it means investing smartly in the **least correlated assets you can find**. These will drastically **reduce the risk of the portfolio** without hurting the expected return.
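The effect of correlation on risk can be seen with just two assets. The sketch below uses the standard two-asset variance formula; the function name and the figures (two assets with 25% volatility each, held 50/50) are assumptions chosen for illustration.

```python
import math

def two_asset_volatility(w1, vol1, vol2, correlation):
    """Volatility of a portfolio with weight w1 in asset 1 and 1 - w1 in asset 2."""
    w2 = 1.0 - w1
    variance = (
        (w1 * vol1) ** 2
        + (w2 * vol2) ** 2
        + 2.0 * w1 * w2 * vol1 * vol2 * correlation
    )
    return math.sqrt(variance)

# Two hypothetical assets, each with 25% annual volatility, held 50/50
for rho in (1.0, 0.8, 0.5, 0.0):
    vol = two_asset_volatility(0.5, 0.25, 0.25, rho)
    print(f"correlation {rho:>4}: portfolio volatility {vol:.1%}")
```

With a correlation of 1 the portfolio is exactly as risky as each asset (25%), while with a correlation of 0 its volatility drops to about 17.7%, even though neither asset became any safer on its own.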

## Let’s see one last example:

What if we add Bitcoin to our portfolio? **Cryptocurrency** is one of the most **unstable** assets you can buy; it has a historic annual volatility of around 100%.

Let’s have a look at the correlation matrix first:

It’s clear now that **Bitcoin is uncorrelated** with every other asset in the portfolio.

So I calculated the expected return of each asset using the **exponentially-weighted mean of (daily) historical returns**, because Bitcoin is uncorrelated with the market and therefore has a **Beta almost equal to zero**, so using CAPM would not be appropriate in this case.
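As a rough sketch of what an exponentially-weighted mean does, the snippet below uses pandas’ `ewm` on made-up daily returns. The function name, the `span` values and the simple scaling by 252 are my own assumptions; a library implementation may annualize differently (for example, by compounding).

```python
import pandas as pd

def ema_annual_return(daily_returns: pd.DataFrame, span: int = 500, periods_per_year: int = 252) -> pd.Series:
    """Annualized exponentially-weighted mean of daily returns, per asset."""
    # ewm(...).mean() gives a running weighted average; the last row uses all data
    latest_ema = daily_returns.ewm(span=span).mean().iloc[-1]
    return latest_ema * periods_per_year

# Made-up daily returns: the older half is negative, the recent half positive
example_returns = pd.DataFrame({"XYZ": [-0.01] * 50 + [0.01] * 50})

print(ema_annual_return(example_returns, span=20))
print(example_returns.mean() * 252)  # plain mean, for comparison
```

Because recent observations receive more weight, the exponentially-weighted estimate is positive here, while the plain mean of the same series is zero.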

But the **volatilities are the same as in the last portfolio**, so we are going to compare how our new portfolio behaves against the previous one in terms of volatility, considering that we now have an incredibly unstable asset.

We optimize the portfolio as before, **not allowing short selling**, and obtain the following:

```
OrderedDict([('BTC-USD', 0.07027), ('AAPL', 0.07762), ('MSFT', 0.0), ('AMZN', 0.20695), ('GOOGL', 0.17435), ('FB', 0.02476), ('TSLA', 0.0), ('V', 0.44605)])
Expected annual return: 40.7%
Annual volatility: 21.6%
Sharpe Ratio: 1.79
```

- The **expected return** almost doubles compared with the last portfolio, but we know it’s not comparable because we used **different methods to calculate it**.
- But the volatility is **almost the same as in the previous portfolio**, even though the optimal solution tells us to hold 7% of Bitcoin, an asset that by itself has a volatility of almost 100% a year.
- The **Sharpe Ratio** more than doubles, to 1.79.

# Conclusion

Diversifying risk means **much more than having a lot of assets**; it means having uncorrelated assets, so that when one goes down, the other assets in your portfolio don’t go down too.

**Choosing your assets correctly** can pay **tremendous dividends**: either a **higher expected return** with reduced risk, or at least the same level of risk with a higher expected return.

You can put your eggs in different baskets, but if you have **too many eggs they can break too**, so it’s better to have a small quantity of very good quality eggs in your basket to reduce the risk of them breaking.