About
Lab 1 : Python Libraries for time series
Importing the Libraries
Necessary libraries: numpy, pandas, matplotlib, statsmodels, seaborn
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import lag_plot,autocorrelation_plot
import seaborn as sns
from statsmodels.tsa.stattools import adfuller,kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf,plot_pacf
from statsmodels.tsa.arima.model import ARIMA
Reading the dataset
df = pd.read_csv('Microsoft_Stock.csv')
df
df = pd.read_csv('Microsoft_Stock.csv', parse_dates=['Date'], index_col='Date') # parses the Date column and sets it as a DatetimeIndex
df = df.sort_index()
df
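A quick sanity check (a sketch, not part of the original lab steps) that the Date column was parsed correctly:
# confirm the index is a DatetimeIndex and inspect its range
print(type(df.index))
print(df.index.min(), df.index.max())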
Lab 2 : Feature Engineering Time Series Data
print(df.info())
print(df.describe())
print(df.corr())
7-day moving average
df['7_day_MA'] = df['Close'].rolling(window=7).mean()
df
4-day moving average
df['4_day_MA'] = df['Close'].rolling(window=4).mean()
df
5-day centered moving average
df['5_day_CMA'] = df['Close'].rolling(window = 5,center=True).mean()
df
df['Year'] = df.index.year
df['Month'] = df.index.month
df['Day'] = df.index.day
df
df['fluctuation'] = df['High'] - df['Low']
df
df['change'] = df['Close'].pct_change()
df
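Another feature that is often engineered for time series (a sketch, not part of the original lab steps) is a lagged copy of the series:
# hypothetical extra feature: the previous day's closing price as a lag-1 feature
df['Close_lag_1'] = df['Close'].shift(1)
df[['Close', 'Close_lag_1']].head()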
Lab 3 : Visualizing the time series
plt.figure(figsize=(10,8))
sns.heatmap(df.corr(),annot = True,cmap = 'coolwarm')
plt.title('Correlation heatmap')
plt.show()
plt.plot(df['Close'])
plt.title('Plot for the closing stock values')
plt.show()
Pandas has a built-in function for exactly this, called the lag plot. It plots the observation at time t on the x-axis and the observation at the next time step (t+1) on the y-axis.
-> If the points cluster along a diagonal line from the bottom-left to the top-right of the plot, it suggests a positive correlation.
-> If the points cluster along a diagonal line from the top-left to the bottom-right, it suggests a negative correlation.
-> Either relationship is useful because it can be modeled. The tighter the points cluster around the diagonal, the stronger the relationship; the more they spread away from it, the weaker the relationship. A ball in the middle or points scattered across the whole plot suggests a weak relationship or none at all.
lag_plot(df['Close'])
plt.show()
lag_plot(df['Volume'])
plt.show()
autocorrelation_plot(df['Close'])
plt.show()
Lab 4 : Resampling and Interpolation
Resampling changes the frequency of the data, for example from daily to monthly, while interpolation fills in the missing values that resampling can introduce (here, the calendar days with no trading data when upsampling to daily frequency).
upsampled_df = df.resample('D').mean()
upsampled_df
upsampled_df = upsampled_df.interpolate(method='linear')
upsampled_df
downsampled_df = df.resample('M').mean()
downsampled_df
# Resample to monthly frequency, taking the average for each month
monthly_df = df['Close'].resample('M').mean()
# Plot the monthly data
plt.figure(figsize=(12, 6))
plt.plot(monthly_df, label='Monthly Resampled Close Price')
plt.title('Monthly Resampled Microsoft Stock Price')
plt.xlabel('Date')
plt.ylabel('Average Monthly Close Price')
plt.legend()
plt.show()
Lab 5 : Explore different power-based transforms for time series forecasting
df['Log_Close'] = np.log(df['Close'])
plt.plot(df['Close'])
plt.show()
plt.plot(df['Log_Close'])
plt.show()
# Plot original and log-transformed data
plt.figure(figsize=(12, 6))
plt.plot(df['Close'], label='Original Close')
plt.plot(df['Log_Close'], label='Log Transformed Close')
plt.title('Original and Log-Transformed Close Price')
plt.xlabel('Date')
plt.ylabel('Price')
plt.legend()
plt.show()
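The lab title mentions power-based transforms more generally, while only the log transform is shown above. A minimal sketch of a Box-Cox transform, assuming scipy is installed and noting that the Close prices are strictly positive:
from scipy import stats
# Box-Cox estimates the power (lambda) that best stabilises the variance;
# lambda = 0 corresponds to the log transform
boxcox_values, fitted_lambda = stats.boxcox(df['Close'])
df['BoxCox_Close'] = boxcox_values
print('Fitted lambda:', fitted_lambda)
plt.figure(figsize=(12, 6))
plt.plot(df['BoxCox_Close'], label='Box-Cox Transformed Close')
plt.title('Box-Cox Transformed Close Price')
plt.legend()
plt.show()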
Lab 6 : Moving average smoothing for time series forecasting
df
df['30_day_MA'] = df['Close'].rolling(window = 30).mean()
df
plt.plot(df['30_day_MA'])
plt.show()
plt.figure(figsize=(12,6))
plt.plot(df['Close'],label = 'Close')
plt.plot(df['30_day_MA'],label = '30 day moving average')
plt.legend()
plt.show()
Lab 7 : Identification of White noise in time series
White noise has a constant mean, constant variance, and no autocorrelation. You can check this visually or with an autocorrelation function (ACF) plot.
len(df)
1511
white_noise = np.random.normal(size = len(df))
plt.figure(figsize=(12,6))
plt.plot(white_noise,label = 'White Noise')
plt.title('White Noise time series')
plt.legend()
plt.show()
plot_acf(white_noise)
plt.show()
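A quick numeric check (a sketch) of the constant-mean and constant-variance properties described above:
# for this simulated series the mean should be close to 0 and the standard deviation close to 1
print('Mean :', white_noise.mean())
print('Std  :', white_noise.std())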
Lab 8 : Identification of a random walk time series
A random walk is a type of time series where each value is the sum of the previous value and a random step. It is a non-stationary process, meaning its statistical properties like mean and variance change over time.
random_walk = np.cumsum(np.random.normal(size = len(df)))
plt.figure(figsize=(12, 6))
plt.plot(random_walk, label='Random Walk')
plt.title('Random Walk Time Series')
plt.legend()
plt.show()
plot_acf(random_walk,lags=30)
plt.show()
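To connect this with stationarity testing, an ADF test can be run on the simulated walk (a sketch using the adfuller import from Lab 1); a large p-value is expected because a random walk is non-stationary:
# the null hypothesis (unit root / non-stationary) should not be rejected here
rw_adf = adfuller(random_walk, autolag='AIC')
print('ADF stat : ', rw_adf[0])
print('p - val : ', rw_adf[1])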
Lab 9 : Decompose Time Series Data
decomposition = seasonal_decompose(df['Close'],model = 'additive',period = 30)
decomposition.plot()
plt.show()
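The individual components can also be plotted separately (a sketch using the attributes of the DecomposeResult object):
# observed, trend, seasonal and residual components on separate axes
fig, axes = plt.subplots(4, 1, figsize=(12, 8), sharex=True)
axes[0].plot(decomposition.observed)
axes[0].set_title('Observed')
axes[1].plot(decomposition.trend)
axes[1].set_title('Trend')
axes[2].plot(decomposition.seasonal)
axes[2].set_title('Seasonal')
axes[3].plot(decomposition.resid)
axes[3].set_title('Residual')
plt.tight_layout()
plt.show()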
Lab 10 : Use and Remove Trends
decomposition.trend
df['Detrended_Close'] = df['Close'].diff()
plt.figure(figsize=(12, 6))
plt.plot(df['Detrended_Close'], label='Detrended Close')
plt.title('Detrended Time Series')
plt.xlabel('Date')
plt.ylabel('Detrended Close')
plt.legend()
plt.show()
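Differencing is one way to remove a trend. An alternative sketch, subtracting the trend component estimated by seasonal_decompose in Lab 9 (NaNs appear at the edges where the trend is undefined):
df['Trend_Removed_Close'] = df['Close'] - decomposition.trend
plt.figure(figsize=(12, 6))
plt.plot(df['Trend_Removed_Close'], label='Close minus estimated trend')
plt.title('Detrending by Subtracting the Decomposition Trend')
plt.legend()
plt.show()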
Lab 11 : Use and Remove Seasonality
plt.plot(df['Close'])
# Perform seasonal differencing
df['Seasonally_Differenced_Close'] = df['Close'].diff(periods=30)
# Plot the seasonally differenced time series
plt.figure(figsize=(12, 6))
plt.plot(df['Seasonally_Differenced_Close'], label='Seasonally Differenced Close')
plt.title('Seasonally Differenced Time Series')
plt.xlabel('Date')
plt.ylabel('Seasonally Differenced Close')
plt.legend()
plt.show()
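Similarly, a sketch of removing seasonality by subtracting the seasonal component estimated in Lab 9, rather than by differencing:
df['Deseasonalized_Close'] = df['Close'] - decomposition.seasonal
plt.figure(figsize=(12, 6))
plt.plot(df['Deseasonalized_Close'], label='Deseasonalized Close')
plt.title('Close Price with the Seasonal Component Removed')
plt.legend()
plt.show()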
Lab 12 : Stationarity in Time Series Data
result = adfuller(df['Close'],autolag='AIC')
print('ADF stat : ',result[0])
print('p - val : ',result[1])
ADF stat : 1.7371362899271037
p - val : 0.9982158366942122
if result[1] < 0.05:
    print("Stationary")
else:
    print("Non-stationary")
res = adfuller(df['Detrended_Close'].dropna(),autolag='AIC')
print(res[0]," ",res[1])
res_kpss = kpss(df['Close'],regression='c')
print(res_kpss[0]," ",res_kpss[1])
res_kpss_detrended = kpss(df['Detrended_Close'].dropna(),regression='c')
print(res_kpss_detrended[0]," ",res_kpss_detrended[1])
if res_kpss[1] < 0.05:
    print("The series is non-stationary")
else:
    print("The series is stationary")
Lab 13, 14, 15 : Moving Average and ARIMA Models for Forecasting with Residual Plots
p: the order of the autoregressive (AR) part. d: the order of differencing needed to make the series stationary (determined separately). q: the order of the moving average (MA) part.
ts = df['Close']
plt.plot(ts)
adfuller(ts,autolag='AIC')[1]
ts = ts.diff().dropna()
plt.plot(ts)
adfuller(ts,autolag='AIC')
This shows that one order of differencing is enough, so d = 1.
# Moving average model for Forecasting
ma_model = ARIMA(ts, order=(0, 0, 3))
ma_result = ma_model.fit()
print(ma_result.summary())
plot_acf(ts)
plt.show()
In the autocorrelation plot the correlations drop to near zero after lag 3, so q = 3.
plot_pacf(ts)
plt.show()
Since the partial autocorrelation plot drops to near zero after lag 2, p = 2.
model = ARIMA(ts,order = (2,0,3))
model_ARIMA = model.fit()
print(model_ARIMA.summary())
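Since these labs are about forecasting, here is a short sketch of producing forecasts from the fitted ARIMA model; because ts was differenced, the forecasts are day-over-day changes and are cumulated back to the price scale (the 10-step horizon is an arbitrary choice for illustration):
# forecast the next 10 values of the differenced series
forecast_diff = model_ARIMA.forecast(steps=10)
print(forecast_diff)
# undo the differencing: cumulative sum of forecast changes added to the last observed close
forecast_close = df['Close'].iloc[-1] + forecast_diff.cumsum()
print(forecast_close)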
Residual plots
residuals_arima = model_ARIMA.resid
residuals_arima
plt.plot(residuals_arima)
plt.show()
residuals_ma = ma_result.resid
residuals_ma
plt.plot(residuals_ma)
plt.show()
# Plot residuals
plt.figure(figsize=(12, 6))
plt.plot(residuals_arima, label='Residuals')
plt.axhline(0, color='red', linestyle='--')
plt.title('ARIMA Model Residuals')
plt.legend()
plt.show()
# Plot ACF of residuals
plot_acf(residuals_arima, lags=20)
plt.title('ACF of Residuals')
plt.show()
If the residuals are randomly distributed and the ACF shows no significant correlation, the model is likely a good fit.
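As a quantitative complement to the ACF plot, a Ljung-Box test can be applied to the residuals (a sketch using statsmodels' acorr_ljungbox); large p-values suggest the residuals are uncorrelated:
from statsmodels.stats.diagnostic import acorr_ljungbox
# Ljung-Box test on the ARIMA residuals at lags 10 and 20
lb_test = acorr_ljungbox(residuals_arima, lags=[10, 20])
print(lb_test)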
print(residuals_arima.describe())
If the mean of residuals is close to zero and the residuals appear normally distributed, the model likely captures the underlying pattern in the data well.
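A density plot of the residuals (a sketch using pandas' built-in KDE plot, which requires scipy) gives a visual check on the normality point above:
# a roughly bell-shaped, zero-centred curve supports the normality assumption
residuals_arima.plot(kind='kde')
plt.title('Density of ARIMA Model Residuals')
plt.show()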
ma_result.mse
5.119954877729448
model_ARIMA.mse
5.065261829993087