Implied Volatility Skew Strategy Mid-Week Update Using Python (2/03/2016)

To see this weekend's prediction click here.

Overall this strategy has been impressive in its trial run over the last 4.5 weeks. Given the volatility and uncertainty in the broad markets this week, I figured I'd put together a mid-week update of the strategy using Python and the BarChart OnDemand API. To see my original article on the basics of using the BarChart OnDemand API click here.

First I import the basic modules needed to execute the script:

import numpy as np
import pandas as pd
import pandas_datareader.data as web
from pandas.tseries.offsets import *

From there I define a couple of convenience functions. The first is a one-off function for querying the BarChart API for a single symbol. The other is a bulk function that aggregates the portfolio's price data into an HDF5 file for easy querying later on. Remember that apikey is your own API key.

# ================================================================== #
def _get_barChart_px(sym):
    '''Query the BarChart API for a single symbol's minute bars'''
    start = '20160201'
    freq = 'minutes'
    api_url = construct_barChart_url(sym, start, freq, api_key=apikey)
    csvfile = pd.read_csv(api_url, parse_dates=['timestamp'])
    csvfile.set_index('timestamp', inplace=True)
    # convert timestamps from UTC to US/Eastern
    csvfile.index = csvfile.index.tz_localize('utc').tz_convert('US/Eastern')
    return csvfile
# ================================================================== #
def construct_barChart_url(sym, start_date, freq, api_key=apikey):
    '''Function to construct barchart api url'''    
    url = 'http://marketdata.websol.barchart.com/getHistory.csv?' +\
            'key={}&symbol={}&type={}&startDate={}'.format(api_key, sym, freq, start_date)
    return url
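
For reference, here's the kind of URL the helper builds (with your own key in place of YOUR_KEY):

url = construct_barChart_url('SPY', '20160201', 'minutes', api_key=apikey)
# http://marketdata.websol.barchart.com/getHistory.csv?key=YOUR_KEY&symbol=SPY&type=minutes&startDate=20160201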
   
# ================================================================== #
def get_minute_data(syms):
    '''Function to retrieve minute data for multiple stocks'''
    print('Running Get Minute Data')
    # The API expects start dates as 'YYYYMMDD' strings;
    # a helper for this conversion is sketched after this function
    start = '20160201'
    freq = 'minutes'    
    symbol_count = len(syms)
    N = symbol_count
    try:
        for i, sym in enumerate(syms, start=1):
            api_url = construct_barChart_url(sym, start, freq, api_key=apikey)
            try:
                csvfile = pd.read_csv(api_url, parse_dates=['timestamp'])
                csvfile.set_index('timestamp', inplace=True)
                # convert timestamps from UTC to US/Eastern
                csvfile.index = csvfile.index.tz_localize('utc').tz_convert('US/Eastern')
                symbol_store.put('{}'.format(sym), csvfile, format='table')
            except Exception:
                # skip any symbol that fails to download or parse
                continue
            N -= 1
            pct_total_left = N / float(symbol_count)
            print('{}..[done] | {} of {} symbols collected | percent remaining: {:>.2%}'.format(sym, i, symbol_count, pct_total_left))
    except Exception as e:
        print(e)
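
As noted in the comments above, the API expects dates as 'YYYYMMDD' strings. A minimal helper for that conversion might look like this (the name to_barchart_date is my own, not part of the API):

def to_barchart_date(dt):
    '''Convert a date/datetime/Timestamp to the 'YYYYMMDD' string the API expects'''
    return pd.Timestamp(dt).strftime('%Y%m%d')

# e.g. to_barchart_date(pd.datetime(2016, 2, 1)) returns '20160201'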

Next I run the function to aggregate the portfolio's stock prices. Note that price_path is simply the local directory where I keep the HDF5 files.

longs = ['VO', 'GDX', 'XHB', 'XLB', 'HACK', 'XLY', 'XLP', 'XLU']
shorts = ['ACWI', 'VWO', 'IYJ', 'VB', 'VPU', 'ECH', 'VGK', 'IWB']

today = pd.datetime.today().date()
symbol_store = pd.HDFStore(price_path + 'Implied_Volatility_ETF_Tracking_{}.h5'.format(today.strftime('%m-%d-%y')))

symbols = longs+shorts
get_minute_data(symbols)
symbol_store.close()

After aggregating the data I perform some simple clean-up operations and calculate a mid-price proxy for each minute of available data, using the average of each bar's open and close.

'''grab data from our previously created hdf5 file using keys'''
data = {}
loc = price_path + 'Implied_Volatility_ETF_Tracking_{}.h5'.format(today.strftime('%m-%d-%y'))
with pd.HDFStore(loc, 'r') as DATA:
    for key in DATA.keys():
        data[key] = DATA[key]
dat = pd.Panel.from_dict(data) # convert Python Dict to Pandas Panel        
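
For orientation, the Panel's three axes map to the HDF5 keys, the minute timestamps, and the bar fields; a quick way to inspect them (my own aside, not part of the original workflow):

print(dat.items)       # one entry per HDF5 key, e.g. '/VO', '/GDX', ...
print(dat.major_axis)  # the minute timestamps
print(dat.minor_axis)  # the bar fields, e.g. open, high, low, close, volume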

'''construct minute mid-prices'''
mids = pd.DataFrame()
for symbol in dat:
    mids[symbol] = (dat[symbol]['open'] + dat[symbol]['close']) / 2

'''HDFStore prefixes each key with a forward slash; strip it so the
column names match the ticker lists above'''
mids.columns = [sym.replace('/', '') for sym in mids.columns]
mids.head()

Because not every ETF records a trade in every minute, I forward-fill the last available price before calculating the log returns. Then I calculate the cumulative sum of the returns for both the long and short legs of the portfolio for comparison.

mids = mids.ffill()
lrets = np.log(mids / mids.shift(1))

crets = lrets.cumsum()
last_cret = crets.ix[-1]

'''pull the final cumulative return for each leg'''
long_rets = last_cret[longs]
short_rets = last_cret[shorts]
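
As a quick aside, log returns are convenient here because they add across time: the cumulative sum over the window equals the log of the total price ratio. A toy check (my own illustration, not part of the strategy code):

px = pd.Series([100.0, 101.0, 99.5, 102.0])
lr = np.log(px / px.shift(1))
# the sum of per-period log returns equals the log of the total price ratio
assert abs(lr.sum() - np.log(px.iloc[-1] / px.iloc[0])) < 1e-12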

Finally, we arrive at the moment of truth. How has the strategy performed over the first three trading sessions of the week?

net_gain = long_rets.mean() - short_rets.mean()
print('long positions mean return: {:.3%}\nshort positions mean return: {:.3%}\ngross strategy pnl: {:.3%}'.format(long_rets.mean(), short_rets.mean(), net_gain))

Not bad at all! How does that compare to the US major market averages? I perform almost the same process for the SPY, QQQ, and DIA ETFs, except I use the single-symbol BarChart API function defined above. I also calculate the mid-price for each minute of data. 

'''get price data from BarChart'''
spy = _get_barChart_px('SPY')
qqq = _get_barChart_px('QQQ')
dia = _get_barChart_px('DIA')

'''calculate the midprice'''
spy_mid = (spy['open'] + spy['close']) / 2
qqq_mid = (qqq['open'] + qqq['close']) / 2
dia_mid = (dia['open'] + dia['close']) / 2

'''calculate the returns'''
spyr = np.log(spy_mid / spy_mid.shift(1))
qqqr = np.log(qqq_mid / qqq_mid.shift(1))
diar = np.log(dia_mid / dia_mid.shift(1))

'''calculate the cumulative sum of the returns'''
spyc = spyr.cumsum().ix[-1]
diac = diar.cumsum().ix[-1]
qqqc = qqqr.cumsum().ix[-1]

print('spy returns: {:.2%}\ndia returns: {:.2%}\nqqq returns: {:.2%}\nstrategy gross return: {:.2%}'.format(spyc, diac, qqqc, net_gain))

Wow! Not even close.
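
For a visual follow-up, here's a minimal matplotlib sketch comparing the strategy's cumulative gross return to SPY, assuming the lrets, longs, shorts, and spyr objects from above are still in memory:

import matplotlib.pyplot as plt

# equal-weighted long leg minus equal-weighted short leg, cumulated minute by minute
strategy_curve = (lrets[longs].mean(axis=1) - lrets[shorts].mean(axis=1)).cumsum()

fig, ax = plt.subplots(figsize=(10, 5))
strategy_curve.plot(ax=ax, label='IV skew strategy (gross)')
spyr.cumsum().plot(ax=ax, label='SPY')
ax.set_ylabel('cumulative log return')
ax.legend(loc='best')
plt.show()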