Get Free Financial Data w/ Python (State Street ETF Holdings - SPY)

One issue I frequently encounter during my research is the need to compare an individual stock, or a collection of stocks, against its ETF benchmark. To do this I need accurate ETF holdings data.

Generally this information is located on the ETF provider's website, but it is often inconvenient to access. Most websites, including the ETF provider's, do something like the following: they show only the top 10 holdings, while the full list we really need is accessible only by clicking the highlighted download link.

SPY ETF Holdings Page

This isn't a major issue until you need to access multiple ETF holdings pages. State Street Global Advisors is the ETF provider for SPY, and this is the page structure they use most frequently, so I figured it would be a major time saver to write a script to automate this important yet repetitive task.

This code requires the following third-party components to execute: 

  • Selenium
  • Google Chromedriver (allows Selenium to open and control the Chrome browser)

Before we get to the code, you must have Chromedriver downloaded and unzipped. Make sure to grab the filepath as we will need it. 


# ----- import modules -----
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import os
import time
from pprint import pprint as pp

Next you will need to grab the correct XPaths from the webpages of interest. I use XPath in this situation because the Python script was able to find the correct clickable links every time without issue.


"""
When you first open the State Street website you will need to navigate to the 'Holdings' tab and then to the .xls download link
"""

# ----- webpage xpath -----
holdings_xpath = r"//*[@id='tabs']/a[3]"
xls_xpath = r"//*[@id='FUND_TOP_HOLDINGS']/a"

Next you need to construct a reusable, generalized URL string which can be used for any of the State Street ETFs. In this example we will be using SPY only. Additionally, I recommend creating a generalized filepath string for the actual downloaded file. This is so we can confirm that the download has completed correctly before exiting the browser in a later step.


# ----- generalized URL string -----
symbol = 'SPY'
url = r"https://www.spdrs.com/product/fund.seam?ticker={}".format(symbol)
'''
The default naming convention for the holdings file is 'holdings-spy.xls', where the ETF label is lowercase
'''
# my_etf_data_dir is a placeholder: set it to the directory Chrome downloads into
file_string = os.path.join(my_etf_data_dir, 'holdings-{}.xls'.format(symbol.lower()))
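
As a minimal sketch, the two strings above can be produced by one small helper, so that each ticker yields its own URL and expected file path. The function name `build_targets` and the `etf_data` directory are hypothetical; the URL pattern and the 'holdings-spy.xls' naming convention are the ones described above.

```python
import os

# URL pattern taken from the snippet above
BASE_URL = "https://www.spdrs.com/product/fund.seam?ticker={}"


def build_targets(symbol, download_dir):
    """Return (url, expected_file_path) for one State Street ETF ticker."""
    url = BASE_URL.format(symbol)
    # the downloaded file uses the lowercase ticker in its name
    file_name = "holdings-{}.xls".format(symbol.lower())
    return url, os.path.join(download_dir, file_name)


# 'etf_data' is an illustrative directory name, not a required one
url, file_string = build_targets("SPY", "etf_data")
print(url)
print(file_string)
```

Using `os.path.join` means the directory string does not need a trailing slash.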

Now it's time to set up our Chromedriver preferences via the 'ChromeOptions' class. You must define a default download directory for this to work properly. During this step I also define my Chromedriver filepath for convenience.


# ----- Chromedriver options/preferences -----
chromeOptions = webdriver.ChromeOptions()
# insert_my_default_dir is a placeholder for your own download directory
prefs = {'download.default_directory': insert_my_default_dir}
chromeOptions.add_experimental_option('prefs', prefs)
chromedriver_path = insert_my_chromedriver_filepath
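
One way to make the download directory less error-prone is to build the prefs dictionary in a helper. This is only a sketch under two assumptions: that Chrome generally expects an absolute path for 'download.default_directory', and that you want the 'download.prompt_for_download' preference turned off so no save dialog appears. The helper name `build_download_prefs` is hypothetical.

```python
import os


def build_download_prefs(download_dir):
    """Return a Chrome 'prefs' dict that saves downloads into download_dir."""
    abs_dir = os.path.abspath(download_dir)   # assumption: Chrome wants an absolute path
    os.makedirs(abs_dir, exist_ok=True)       # create the folder if it is missing
    return {
        "download.default_directory": abs_dir,
        "download.prompt_for_download": False,  # assumption: skip the save dialog
    }
```

The returned dictionary can be passed to `add_experimental_option('prefs', ...)` in place of the hand-written one.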

Now for the 'money' code. In this step we will instantiate the webdriver (fancy word for automated browser), tell it to navigate to our previously defined URL, tell it to wait until the 'Holdings' tab is visible, click the tab link, then wait again until the 'Download All Holdings .xls' link is visible, click it, confirm the file has downloaded and finally exit the browser. 


"""
I often use prettyprint functions to tell me what's happening with the code; feel free to delete them if you like, they are not required.
"""
pp('{} running holdings download..[start]'.format(symbol))
# Selenium 3-style call; Selenium 4 instead takes service=Service(chromedriver_path) and options=chromeOptions
driver = webdriver.Chrome(executable_path=chromedriver_path, chrome_options=chromeOptions)
driver.set_page_load_timeout(90) # avoid hanging browser
try:
    driver.get(url)
    # wait until the 'Holdings' tab is visible, then click it
    holdings_element = WebDriverWait(driver, 30) \
        .until(EC.visibility_of_element_located((By.XPATH, holdings_xpath)))
    holdings_element.click()
    # wait until the 'Download All Holdings .xls' link is visible, then click it
    xls_element = WebDriverWait(driver, 30) \
        .until(EC.visibility_of_element_located((By.XPATH, xls_xpath)))
    xls_element.click() # start download
    # the code below checks the file exists before exiting the browser
    for i in range(1, 10, 2):
        time.sleep(i / 20.0)
        if os.path.isfile(file_string):
            break
except Exception as e:
    print(e)
finally:
    driver.quit()
    pp('{} running holdings download..[complete]'.format(symbol))
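
The file check in the loop above sleeps for only about 1.25 seconds in total, which may not be enough on a slow connection. A possible alternative, sketched here as an assumption rather than as part of the original script, is to poll with an explicit timeout and also wait for Chrome's temporary '.crdownload' files to disappear. The name `wait_for_download` and its default values are hypothetical.

```python
import glob
import os
import time


def wait_for_download(path, timeout=30.0, poll=0.5):
    """Return True once `path` exists and no '.crdownload' file remains
    in its folder; return False if `timeout` seconds pass first."""
    folder = os.path.dirname(os.path.abspath(path))
    pattern = os.path.join(glob.escape(folder), "*.crdownload")
    deadline = time.monotonic() + timeout
    while True:
        in_progress = glob.glob(pattern)  # Chrome's partial download files
        if os.path.isfile(path) and not in_progress:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

Calling `wait_for_download(file_string)` right after the download click, and before `driver.quit()`, would replace the fixed loop.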

That's it. Now you should have the SPY holdings .xls file on your local hard drive.  If you want to get fancy you can throw this code into a function or class structure like I have. This allows you to run the code in a loop if, for example, you have 10 different State Street ETFs whose holdings data you need.
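
As a rough sketch of that loop, the function below calls a per-ticker download routine for each symbol and records any failure instead of stopping. Everything here is illustrative: `download_all` is a hypothetical name, `download_one` stands for whatever function you wrap the Selenium steps in, and `fake_download` is a stand-in so the example runs without a browser.

```python
from pprint import pprint as pp


def download_all(symbols, download_one):
    """Call download_one(symbol) for each ticker.

    Returns a dict mapping each symbol to None on success
    or to the error message on failure.
    """
    results = {}
    for symbol in symbols:
        try:
            download_one(symbol)
            results[symbol] = None
        except Exception as exc:  # keep going if one ticker fails
            results[symbol] = str(exc)
    return results


def fake_download(symbol):
    # Stand-in for the real Selenium routine; fails for one ticker on purpose
    if symbol == "XLE":
        raise RuntimeError("download link not found")


results = download_all(["SPY", "XLF", "XLE"], fake_download)
pp(results)
```

Collecting errors per ticker makes it easy to retry only the ETFs that failed.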