# How we optimized finding similar assets for tax-loss harvesting

**With this post, we’re starting a new section: FinTech Product Story. Here, technology leaders describe their product’s path from day one to success. They share the challenges they encountered, what they did to overcome them, and the technology hacks and solutions they used along the way.**

**Originally published on the AgentRisk blog.**

Automated tax-loss harvesting is a core feature of our **AgentRisk Wealth** and **AgentRisk Lite** products for individual investors, and of our **AgentRisk Overlay** product for financial advisors. Finding similar assets to replace the asset being sold is the most computationally intensive part of the process. In this post, we share how we optimized this computation and how it can be improved further.

# What is tax-loss harvesting and how does it work?

Tax-loss harvesting is the practice of selling an asset that is at a loss and realizing this loss to help minimize your tax bill. It goes without saying that **nothing we cover in this post should be viewed as tax advice** and you should definitely talk to your tax accountant before applying any of this information to your portfolio.

Selling an asset at a loss is half the battle in tax-loss harvesting. The second half is finding a similar-enough asset to replace the one you sold. If you don’t do this, your portfolio allocation ends up skewed. You also can’t just sell and buy the same asset back-to-back; that would be cheating, wouldn’t it? In fact, the tax code categorizes this as a “wash sale”. This is why you need to find a similar-enough asset and use the proceeds from selling the original asset to buy it.

# Tax-loss harvesting in practice

This may sound a bit confusing, so let’s give a simple example.

Let’s assume that sometime in January 2020 you bought 100 shares of SPY (a very popular ETF that tracks the S&P 500) at $330 per share. This asset is now at $280 per share, meaning that you have a total loss of $5,000. If you want to realize this loss and collect the tax credit, you sell all 100 shares of SPY, with total proceeds of $28,000.

Now, you’ll want to replace SPY with a “similar enough” asset so that your portfolio stays practically the same. We’ll describe the details of how to do this below, but for now, let’s assume that VOO (another S&P 500 ETF) is the asset we’ll replace SPY with. At a price of $265 per share, our proceeds will buy 105 whole shares of VOO, with $175 left over. Easy-peasy.
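The arithmetic in this example can be checked with a few lines of Python. This is only a sketch of the numbers quoted above: the function name `harvest_example` is hypothetical, and it assumes whole shares only, with no fees or commissions.

```python
def harvest_example(shares, buy_price, current_price, replacement_price):
    """Return (realized_loss, proceeds, replacement_shares, leftover_cash)."""
    # Loss realized by selling the whole position at the current price.
    realized_loss = shares * (buy_price - current_price)
    proceeds = shares * current_price
    # Assumption: only whole shares of the replacement asset can be bought.
    replacement_shares = int(proceeds // replacement_price)
    leftover_cash = proceeds - replacement_shares * replacement_price
    return realized_loss, proceeds, replacement_shares, leftover_cash


# 100 shares of SPY bought at $330, sold at $280, replaced with VOO at $265.
loss, proceeds, voo_shares, cash = harvest_example(100, 330, 280, 265)
print(loss, proceeds, voo_shares, cash)  # 5000 28000 105 175
```

The $175 left over is simply what remains because 28,000 is not an exact multiple of 265.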

# How to find similar assets

At a high level, when looking for an asset to replace SPY with, we want to find an asset whose performance closely tracks that of SPY. The algorithm for this can be summarized in the following python-like pseudocode:

```python
# get_daily_returns and calc_correlation are helpers left undefined here
def find_similar_assets(asset_universe):
    similar_assets = []
    # calculate daily returns for SPY
    daily_ret_spy = get_daily_returns("SPY")
    for asset in asset_universe:
        # calculate daily returns for asset
        daily_ret_asset = get_daily_returns(asset)
        # calculate correlation coefficient of the two
        correl = calc_correlation(daily_ret_spy, daily_ret_asset)
        similar_assets.append({'asset': asset, 'correl': correl})
    # return results in descending order of correlation
    return sorted(similar_assets, key=lambda a: a['correl'], reverse=True)
```

Some initial observations and explanations on the pseudocode above:

- The calculation is linear in the size of `asset_universe`. This potentially includes all symbols from NYSE, NASDAQ, and ASE, as well as all mutual funds, which adds up to more than 32K symbols.
- Each iteration is independent, which means this loop has high potential for speedup when run in parallel.
- Each iteration requires us to calculate the daily returns for each symbol we consider. This calculation is based on historical price data for that symbol.
- The calculation of the correlation coefficient is the heavy hitter of the loop, and its cost depends on the length of the daily returns arrays.
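These observations can be put together in a small runnable sketch. It is not our production code: the helper names (`daily_returns`, `pearson`, `rank_similar`), the in-memory `prices` dictionary, and the toy price series are all assumptions made for illustration. It uses simple daily returns and the Pearson correlation coefficient, and it maps the per-asset work over a thread pool purely to show that iterations are independent. In CPython, threads will not speed up this kind of pure-Python arithmetic, so a process pool or a vectorized numerical library would be the more realistic choice for actual speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def daily_returns(prices):
    """Simple daily returns: (p[t] - p[t-1]) / p[t-1]."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (linear in length)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_similar(target, prices, max_workers=4):
    """Rank every other symbol by correlation of its daily returns with target."""
    target_ret = daily_returns(prices[target])
    candidates = [symbol for symbol in prices if symbol != target]

    def score(symbol):
        # Each call touches only its own symbol, so calls are independent.
        correl = pearson(target_ret, daily_returns(prices[symbol]))
        return {"asset": symbol, "correl": correl}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(score, candidates))
    # Most similar (highest correlation) first.
    return sorted(results, key=lambda r: r["correl"], reverse=True)


# Toy closing prices, made up for illustration (not market data).
prices = {
    "AAA": [100.0, 102.0, 101.0, 105.0, 104.0],
    "BBB": [50.0, 51.0, 50.5, 52.5, 52.0],  # exactly half of AAA
    "CCC": [80.0, 79.0, 80.5, 78.0, 79.5],  # moves against AAA
}
ranking = rank_similar("AAA", prices)
print([r["asset"] for r in ranking])  # ['BBB', 'CCC']
```

Because `BBB` is a scaled copy of `AAA`, its daily returns are identical and its correlation is 1, so it ranks first. Note that this sketch assumes all series cover the same dates; real price histories would need to be aligned first.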