Day Trade SPY Options Using an XGBoost Predictive Model and Python
Use the first 30 minutes of the trading day (9:30 to 10:00) as input to an XGBoost model that predicts the 10:30 price, then buy a CALL or PUT contract based on that prediction.
Photo by Vladislav Babienko on Unsplash
First 30 minutes of the trading day
The first 30 minutes of the trading day are often volatile due to a combination of factors:
- Overnight news affects investor sentiment.
- Market orders at the open create a supply-demand imbalance.
- Lower liquidity can exaggerate price movements.
- Emotional trading from retail investors can cause erratic behavior.
- Institutional strategies may exploit or contribute to the volatility.
- Unfilled orders from the previous session add to the imbalance.
- The opening sets the tone for the rest of the day.
- Reduced information at the start can make the market more unpredictable.
These factors collectively result in heightened price swings and trading volumes during the market’s initial 30 minutes, which can be a trading opportunity.
The Approach
The premise is that by analyzing the first 30 minutes of each trading day over roughly 30 days of historical data, one can glean a pattern in the symbol’s behavior for the rest of the day. I implemented an XGBoost predictive model to find that pattern.
features = 1-minute bars from 9:30 to 10:00 (Open, High, Low, Close, Volume)
target = price at 10:30
The features and target are arbitrary. You can adjust the times as you see fit.
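As a rough illustration of how the two windows divide a morning's bars, the sketch below labels 1-minute timestamps as feature bars (9:30 to 10:00, inclusive) or as the target bar (10:30). The function name `label_bar` and the three constants are illustrative and not part of the article's code; changing the constants is one way the times could be adjusted.

```python
from datetime import time

# Assumed window boundaries, taken from the times stated above
FEATURE_START = time(9, 30)
FEATURE_END = time(10, 0)
TARGET_TIME = time(10, 30)

def label_bar(bar_time):
    """Return 'feature', 'target', or None for a bar's time of day."""
    if FEATURE_START <= bar_time <= FEATURE_END:
        return "feature"
    if bar_time == TARGET_TIME:
        return "target"
    return None  # bars outside both windows are ignored

sample_times = (time(9, 30), time(10, 0), time(10, 15), time(10, 30))
labels = [label_bar(t) for t in sample_times]
print(labels)
```

Bars between 10:01 and 10:29 fall into neither group, so the model never sees them.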
Import the necessary packages and download
I used yfinance to download the data. Note that yfinance only allows 7 days per request at the 1-minute interval, so a loop is added to fetch the history in chunks.
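The chunking idea can be looked at on its own with plain dates, before any download is involved. This is a minimal sketch, not the article's loop: the name `seven_day_windows` and the example end date are made up, and the windows here are contiguous, whereas the loop in the article steps back one extra minute between chunks.

```python
from datetime import datetime, timedelta

def seven_day_windows(start, end, max_days=7):
    """Yield (window_start, window_end) pairs, newest first, each at most max_days long."""
    window_end = end
    while window_end > start:
        # Clamp so the oldest window never reaches back past `start`
        window_start = max(window_end - timedelta(days=max_days), start)
        yield window_start, window_end
        window_end = window_start

end = datetime(2023, 10, 25)       # arbitrary example date
start = end - timedelta(days=24)   # same 24-day span the article requests
windows = list(seven_day_windows(start, end))
for window_start, window_end in windows:
    print(window_start.date(), "->", window_end.date())
```

A 24-day span splits into three full 7-day windows plus one 3-day remainder.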
import warnings
warnings.filterwarnings("ignore")

import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

def download_histdata(start_date, end_date):
    # Initialize empty DataFrame to hold the data
    all_data = pd.DataFrame()
    # Loop through to get 7-day chunks of data, newest first
    current_end_date = end_date
    while current_end_date > start_date:
        # Never reach back past the requested start date
        current_start_date = max(current_end_date - timedelta(days=7), start_date)
        print(f"Fetching data from {current_start_date} to {current_end_date}")
        # Download data for the (up to) 7-day period
        data = yf.download('SPY', start=current_start_date, end=current_end_date, interval='1m', progress=False)
        print(f"Fetched {data.shape[0]} data points.")
        # Prepend the older chunk so the index stays chronological
        all_data = pd.concat([data, all_data], axis=0)
        # Decrement the end date for the next iteration
        current_end_date = current_start_date - timedelta(minutes=1)  # One minute before the chunk just fetched
    # Newer yfinance versions may return (field, ticker) MultiIndex columns; keep only the field level
    if isinstance(all_data.columns, pd.MultiIndex):
        all_data.columns = all_data.columns.get_level_values(0)
    # Drop duplicates, if any
    all_data = all_data.loc[~all_data.index.duplicated(keep='first')]
    # Drop 'Adj Close' column
    if 'Adj Close' in all_data.columns:
        all_data.drop('Adj Close', axis=1, inplace=True)
    # Round to 2 decimal places
    all_data = all_data.round(2)
    # Remove timezone information
    all_data.index = all_data.index.tz_localize(None)
    # Keep only 9:30am to 10:30am (the 3:00pm to 4:00pm filter is left commented out)
    filtered_data_morning = all_data.between_time('09:30', '10:30')
    #filtered_data_afternoon = all_data.between_time('15:00', '16:00')
    filtered_data = filtered_data_morning
    print(filtered_data.tail(120))
    # Save to CSV
    if not filtered_data.empty:
        filtered_data.to_csv('SPY_30d_1m.csv')
        print("CSV file generated: SPY_30d_1m.csv")
    else:
        print("DataFrame is empty. No CSV file generated.")
    return filtered_data

end_date = datetime.now()
start_date = end_date - timedelta(days=24)
all_data = download_histdata(start_date, end_date)
Implement XGBoost
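Logic is added to determine whether to trade a PUT or a CALL by comparing last_close_price (the 10:00 close) with last_predicted_price (the predicted 10:30 price). If the prediction is above the 10:00 close, the choice is a CALL; otherwise it is a PUT.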
def implement_xgboost(data):
    # Filter the DataFrame to include only the relevant time slots
    # (.copy() so that adding a column below does not write to a slice)
    morning_data = data.between_time('09:30', '10:00').copy()
    cut_off_data = data.between_time('10:00', '10:00')
    target_data = data.between_time('10:30', '10:30').copy()
    # Merge the morning data with the target data at 10:30 based on the date
    morning_data['Date'] = morning_data.index.date
    target_data['Date'] = target_data.index.date
    merged_data = pd.merge(morning_data, target_data[['Date', 'Close']], on='Date', how='inner', suffixes=('_morning', '_target'))
    # Prepare features and target variable
    X = merged_data.drop(columns=['Date', 'Close_target'])
    y = merged_data['Close_target']
    # Train-test split; shuffle=False keeps rows in time order, so y_pred[-1]
    # belongs to the latest morning in the data rather than to a random row
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
    # Initialize and train the model
    model = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=100)
    model.fit(X_train, y_train)
    # Make predictions
    y_pred = model.predict(X_test)
    print(f"cut_off at 10:00 {cut_off_data['Close'].tail(1)}")
    print(f"prediction at 10:30 {y_pred[-1]}")
    # Calculate RMSE
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    print(f"RMSE: {rmse}")
    # Determine the option type
    last_close_price = cut_off_data['Close'].tail(1).iloc[0]
    last_predicted_price = y_pred[-1]
    if last_close_price < last_predicted_price:
        option_type = 'call'
    else:
        option_type = 'put'
    return option_type

if not all_data.empty:
    print("Implementing XGBoost...")
    option_type = implement_xgboost(all_data.copy())
    # get_option_chain is not shown in this article; it is expected to
    # return the selected in-the-money call and put
    itm_call, itm_put = get_option_chain('SPY')
    if option_type == 'call':
        print(f'In the money CALL {itm_call}')
    else:
        print(f'In the money PUT {itm_put}')
Determine the smallest In-the-Money Option
To automate the option selection, I added code to choose the option contract (PUT or CALL) as determined by XGBoost. I also calculated the number of contracts, and the shares they control, based on $1,000 of capital.
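The `get_option_chain` helper called earlier is not shown, so its exact logic is unknown. One plausible reading of "smallest in-the-money option" is the in-the-money contract whose strike sits closest to the current price. The sketch below applies that reading to plain dictionaries; the function name `pick_nearest_itm`, the strikes, and the prices are all made up for illustration. With yfinance, the rows would presumably come from `yf.Ticker('SPY').option_chain(expiry)`, whose `.calls` and `.puts` tables include `strike` and `lastPrice` columns.

```python
def pick_nearest_itm(contracts, spot, option_type):
    """Return the in-the-money contract whose strike is closest to spot, or None."""
    if option_type == "call":
        itm = [c for c in contracts if c["strike"] < spot]  # calls are ITM below spot
    else:
        itm = [c for c in contracts if c["strike"] > spot]  # puts are ITM above spot
    if not itm:
        return None
    return min(itm, key=lambda c: abs(spot - c["strike"]))

# Hypothetical chain: made-up numbers, not market data
calls = [
    {"strike": 428, "lastPrice": 3.10},
    {"strike": 429, "lastPrice": 2.40},
    {"strike": 431, "lastPrice": 1.05},
]
puts = [
    {"strike": 429, "lastPrice": 0.95},
    {"strike": 431, "lastPrice": 2.30},
    {"strike": 432, "lastPrice": 3.00},
]
spot = 430.25

itm_call = pick_nearest_itm(calls, spot, "call")
itm_put = pick_nearest_itm(puts, spot, "put")
print(f"ITM call: {itm_call}")
print(f"ITM put: {itm_put}")
```

Returning the whole row keeps `itm_call['lastPrice']` and `itm_put['lastPrice']` available for the position sizing that follows.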
def calculate_contracts(capital, option_price):
    # Option prices are quoted per share and one contract covers 100 shares,
    # so a single contract costs option_price * 100
    contract_cost = option_price * 100
    # Calculate the number of contracts that can be purchased with the given capital
    num_contracts = int(capital // contract_cost)
    # Calculate the number of shares that can be controlled with these contracts
    num_shares = num_contracts * 100
    return num_contracts, num_shares

capital = 1000 # $1000 capital
if option_type == 'call':
    option_price = itm_call['lastPrice'] # Quoted (per-share) price of the call
else:
    option_price = itm_put['lastPrice'] # Quoted (per-share) price of the put
num_contracts, num_shares = calculate_contracts(capital, option_price)
print(f"Number of contracts that can be purchased: {num_contracts}")
There it is. I would love to hear your comments, and make sure to follow me. Cheers!!





