About
Lab 1 : Python Libraries for time series
Importing the Libraries
Necessary libraries: numpy, pandas, matplotlib, statsmodels, seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot,autocorrelation_plot
import seaborn as sns
from statsmodels.tsa.stattools import adfuller,kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
from statsmodels.tsa.arima.model import ARIMA
Reading the dataset
df = pd.read_csv('Microsoft_Stock.csv')
df
df = pd.read_csv('Microsoft_Stock.csv', parse_dates=['Date'], index_col='Date') # parses the Date column and sets it as a DatetimeIndex
df = df.sort_index()
df
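A quick sanity check (a sketch, not part of the original lab steps) that the Date column was parsed correctly:
# confirm the index is a DatetimeIndex and inspect its range
print(type(df.index))
print(df.index.min(), df.index.max())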
Lab 2 : Feature Engineering Time Series Data
print(df.info())
print(df.describe())
print(df.corr())
7-day moving average
df['7_day_MA'] = df['Close'].rolling(window=7).mean()
df
4-day moving average
df['4_day_MA'] = df['Close'].rolling(window=4).mean()
df
5-day centered moving average
df['5_day_CMA'] = df['Close'].rolling(window = 5,center=True).mean()
df
df['Year'] = df.index.year
df['Month'] = df.index.month
df['Day'] = df.index.day
df
df['fluctuation'] = df['High'] - df['Low']
df
df['change'] = df['Close'].pct_change()
df
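Another feature that is often engineered for time series (a sketch, not part of the original lab steps) is a lagged copy of the series:
# hypothetical extra feature: the previous day's closing price as a lag-1 feature
df['Close_lag_1'] = df['Close'].shift(1)
df[['Close', 'Close_lag_1']].head()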
Lab 3 : Visualizing the time series
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(),annot = True,cmap = 'coolwarm')
plt.title('Correlation heatmap')
plt.show()
plt.plot(df['Close'])
plt.title('Plot for the closing stock values')
plt.show()
Pandas has a built-in function for exactly this, called the lag plot. It plots the observation at time t on the x-axis and the observation at the next time step (t+1) on the y-axis.
-> If the points cluster along a diagonal line from the bottom-left to the top-right of the plot, it suggests a positive correlation.
-> If the points cluster along a diagonal line from the top-left to the bottom-right, it suggests a negative correlation.
-> Either relationship is useful because it can be modeled. The tighter the points cluster around the diagonal, the stronger the relationship; the more they spread away from it, the weaker the relationship. A ball in the middle or points scattered across the whole plot suggests a weak relationship or none at all.
lag_plot(df['Close'])
plt.show()
lag_plot(df['Volume'])
plt.show()
autocorrelation_plot(df['Close'])
plt.show()
Lab 4 : Resampling and Interpolation
Resampling changes the frequency of the data, for example from daily to monthly, while interpolation fills in the missing values that resampling can introduce (here, the calendar days with no trading data when upsampling to daily frequency).
upsampled_df = df.resample('D').mean()
upsampled_df
upsampled_df = upsampled_df.interpolate(method='linear')
upsampled_df
downsampled_df = df.resample('M').mean()
downsampled_df
# Resample to monthly frequency, taking the average for each month
monthly_df = df['Close'].resample('M').mean()
# Plot the monthly data
plt.figure(figsize=(12, 6))
plt.plot(monthly_df, label='Monthly Resampled Close Price')
plt.title('Monthly Resampled Microsoft Stock Price')
plt.xlabel('Date')
plt.ylabel('Average Monthly Close Price')
plt.legend()
plt.show()
Lab 5 : Explore different power-based transforms for time series forecasting
df['Log_Close'] = np.log(df['Close'])
plt.plot(df['Close'])
plt.show()
plt.plot(df['Log_Close'])
plt.show()
# Plot original and log-transformed data
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Original Close')
plt.plot(df['Log_Close'], label='Log Transformed Close')
plt.title('Original and Log-Transformed Close Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
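The lab title mentions power-based transforms more generally, while only the log transform is shown above. A minimal sketch of a Box-Cox transform, assuming scipy is installed and noting that the Close prices are strictly positive:
from scipy import stats
# Box-Cox estimates the power (lambda) that best stabilises the variance;
# lambda = 0 corresponds to the log transform
boxcox_values, fitted_lambda = stats.boxcox(df['Close'])
df['BoxCox_Close'] = boxcox_values
print('Fitted lambda:', fitted_lambda)
plt.figure(figsize=(12, 6))
plt.plot(df['BoxCox_Close'], label='Box-Cox Transformed Close')
plt.title('Box-Cox Transformed Close Price')
plt.legend()
plt.show()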
Lab 6 : Moving average smoothing for time series forecasting
df
df['30_day_MA'] = df['Close'].rolling(window = 30).mean()
df
plt.plot(df['30_day_MA'])
plt.show()
plt.figure(figsize=(12,6))
plt.plot(df['Close'],label = 'Close')
plt.plot(df['30_day_MA'],label = '30 day moving average')
plt.legend()
plt.show()
Lab 7 : Identification of White noise in time series
White noise has a constant mean, constant variance, and no autocorrelation. You can check this visually or with an autocorrelation function (ACF) plot.
len(df)
1511
white_noise = np.random.normal(size = len(df))
plt.figure(figsize=(12,6))
plt.plot(white_noise,label = 'White Noise')
plt.title('White Noise time series')
plt.legend()
plt.show()
plot_acf(white_noise)
plt.show()
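A quick numeric check (a sketch) of the constant-mean and constant-variance properties described above:
# for this simulated series the mean should be close to 0 and the standard deviation close to 1
print('Mean :', white_noise.mean())
print('Std  :', white_noise.std())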
Lab 8 : Identification of a random walk time series
A random walk is a type of time series where each value is the sum of the previous value and a random step. It is a non-stationary process, meaning its statistical properties like mean and variance change over time.
random_walk = np.cumsum(np.random.normal(size = len(df)))
plt.figure(figsize=(12, 6))
plt.plot(random_walk, label='Random Walk')
plt.title('Random Walk Time Series')
plt.legend()
plt.show()
plot_acf(random_walk,lags=30)
plt.show()
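To connect this with stationarity testing, an ADF test can be run on the simulated walk (a sketch using the adfuller import from Lab 1); a large p-value is expected because a random walk is non-stationary:
# the null hypothesis (unit root / non-stationary) should not be rejected here
rw_adf = adfuller(random_walk, autolag='AIC')
print('ADF stat : ', rw_adf[0])
print('p - val : ', rw_adf[1])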
Lab 9 : Decompose Time Series Data
decomposition = seasonal_decompose(df['Close'],model = 'additive',period = 30)
decomposition.plot()
plt.show()
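The individual components can also be plotted separately (a sketch using the attributes of the DecomposeResult object):
# observed, trend, seasonal and residual components on separate axes
fig, axes = plt.subplots(4, 1, figsize=(12, 8), sharex=True)
axes[0].plot(decomposition.observed)
axes[0].set_title('Observed')
axes[1].plot(decomposition.trend)
axes[1].set_title('Trend')
axes[2].plot(decomposition.seasonal)
axes[2].set_title('Seasonal')
axes[3].plot(decomposition.resid)
axes[3].set_title('Residual')
plt.tight_layout()
plt.show()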
Lab 10 : Use and Remove Trends
decomposition.trend
df['Detrended_Close'] = df['Close'].diff()
plt.figure(figsize=(12, 6))
plt.plot(df['Detrended_Close'], label='Detrended Close')
plt.title('Detrended Time Series')
plt.xlabel('Date')
plt.ylabel('Detrended Close')
plt.legend()
plt.show()
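Differencing is one way to remove a trend. An alternative sketch, subtracting the trend component estimated by seasonal_decompose in Lab 9 (NaNs appear at the edges where the trend is undefined):
df['Trend_Removed_Close'] = df['Close'] - decomposition.trend
plt.figure(figsize=(12, 6))
plt.plot(df['Trend_Removed_Close'], label='Close minus estimated trend')
plt.title('Detrending by Subtracting the Decomposition Trend')
plt.legend()
plt.show()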
Lab 11 : Use and Remove Seasonality
plt.plot(df['Close'])
# Perform seasonal differencing
df['Seasonally_Differenced_Close'] = df['Close'].diff(periods=30)
# Plot the seasonally differenced time series
plt.figure(figsize=(12, 6))
plt.plot(df['Seasonally_Differenced_Close'], label='Seasonally Differenced Close')
plt.title('Seasonally Differenced Time Series')
plt.xlabel('Date')
plt.ylabel('Seasonally Differenced Close')
plt.legend()
plt.show()
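Similarly, a sketch of removing seasonality by subtracting the seasonal component estimated in Lab 9, rather than by differencing:
df['Deseasonalized_Close'] = df['Close'] - decomposition.seasonal
plt.figure(figsize=(12, 6))
plt.plot(df['Deseasonalized_Close'], label='Deseasonalized Close')
plt.title('Close Price with the Seasonal Component Removed')
plt.legend()
plt.show()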
Lab 12 : Stationarity in Time Series Data
result = adfuller(df['Close'],autolag='AIC')
print('ADF stat : ',result[0])
print('p - val : ',result[1])
ADF stat : 1.7371362899271037
p - val : 0.9982158366942122
if result[1] < 0.05:
    print("Stationary")
else:
    print("Non-stationary")
res = adfuller(df['Detrended_Close'].dropna(),autolag='AIC')
print(res[0]," ",res[1])
res_kpss = kpss(df['Close'],regression='c')
print(res_kpss[0]," ",res_kpss[1])
res_kpss_detrended = kpss(df['Detrended_Close'].dropna(),regression='c')
print(res_kpss_detrended[0]," ",res_kpss_detrended[1])
if res_kpss[1] < 0.05:
    print("The series is non-stationary")
else:
    print("The series is stationary")
Lab 13, 14, 15 : Moving Average and ARIMA Models for Forecasting with Residual Plots
p: the order of the autoregressive (AR) part. d: the order of differencing needed to make the series stationary (determined separately). q: the order of the moving average (MA) part.
ts = df['Close']
plt.plot(ts)
adfuller(ts,autolag='AIC')[1]
ts = ts.diff().dropna()
plt.plot(ts)
adfuller(ts,autolag='AIC')
This shows that one order of differencing is enough, so d = 1.
# Moving average model for Forecasting
ma_model = ARIMA(ts, order=(0, 0, 3))
ma_result = ma_model.fit()
print(ma_result.summary())
plot_acf(ts)
plt.show()
In the autocorrelation plot the correlations drop to near zero after lag 3, so q = 3.
plot_pacf(ts)
plt.show()
Since the partial autocorrelation plot drops to near zero after lag 2, p = 2.
model = ARIMA(ts,order = (2,0,3))
model_ARIMA = model.fit()
print(model_ARIMA.summary())
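Since these labs are about forecasting, here is a short sketch of producing forecasts from the fitted ARIMA model; because ts was differenced, the forecasts are day-over-day changes and are cumulated back to the price scale (the 10-step horizon is an arbitrary choice for illustration):
# forecast the next 10 values of the differenced series
forecast_diff = model_ARIMA.forecast(steps=10)
print(forecast_diff)
# undo the differencing: cumulative sum of forecast changes added to the last observed close
forecast_close = df['Close'].iloc[-1] + forecast_diff.cumsum()
print(forecast_close)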
Residual plots
residuals_arima = model_ARIMA.resid
residuals_arima
plt.plot(residuals_arima)
plt.show()
residuals_ma = ma_result.resid
residuals_ma
plt.plot(residuals_ma)
plt.show()
# Plot residuals
plt.figure(figsize=(12, 6))
plt.plot(residuals_arima, label='Residuals')
plt.axhline(0, color='red', linestyle='--')
plt.title('ARIMA Model Residuals')
plt.legend()
plt.show()
# Plot ACF of residuals
plot_acf(residuals_arima, lags=20)
plt.title('ACF of Residuals')
plt.show()
If the residuals are randomly distributed and the ACF shows no significant correlation, the model is likely a good fit.
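As a quantitative complement to the ACF plot, a Ljung-Box test can be applied to the residuals (a sketch using statsmodels' acorr_ljungbox); large p-values suggest the residuals are uncorrelated:
from statsmodels.stats.diagnostic import acorr_ljungbox
# Ljung-Box test on the ARIMA residuals at lags 10 and 20
lb_test = acorr_ljungbox(residuals_arima, lags=[10, 20])
print(lb_test)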
print(residuals_arima.describe())
If the mean of residuals is close to zero and the residuals appear normally distributed, the model likely captures the underlying pattern in the data well.
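A density plot of the residuals (a sketch using pandas' built-in KDE plot, which requires scipy) gives a visual check on the normality point above:
# a roughly bell-shaped, zero-centred curve supports the normality assumption
residuals_arima.plot(kind='kde')
plt.title('Density of ARIMA Model Residuals')
plt.show()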
ma_result.mse
5.119954877729448
model_ARIMA.mse
5.065261829993087