PSTAT 174 Class Project: Time Series Analysis of Housing Starts

The decline of the United States housing market in 2007 had a pervasive negative effect on the United States economy. Housing prices peaked in 2006 and started declining in 2007. The decline is believed to have been caused by several factors, including relaxed mortgage lending standards and deregulation. Rising foreclosure rates led to a credit crisis, which is believed to be the primary cause of the 2007-2009 recession. During the recession, housing demand fell and construction of new housing declined rapidly.

This is why I decided to do a time series analysis of United States housing starts: housing starts tend to be a good indicator of how well the economy is doing and how well it is going to do. If the housing market is doing well, the result is economic growth. If there is a sustained decline in the housing market, then economic growth will tend to slow as well. By building a time series model, we can forecast whether economic conditions are going to improve or decline.

For this project, I used data released by the Federal Reserve Bank of St. Louis to do my time series analysis. I first used the Quandl package to import the data into R and then plotted the time series.

# astsa: acf2(); forecast: auto.arima(), tsdisplay(), forecast(); tseries: adf.test(); Quandl: data download
library(astsa)
library(forecast)
library(tseries)
library(Quandl)

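# Download monthly housing starts (FRED series HOUST) via Quandl as a ts object
housing <- Quandl("FRED/HOUST", trim_start="1979-12-31", trim_end="2014-04-30", collapse="monthly", type="ts")
# Make the monthly frequency and the December 1979 start explicit
housingts <- ts(housing, frequency = 12, start = c(1979, 12))
plot(housingts, xlab = "Year", ylab = "Housing starts (thousands of units)")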



From the graph, we can see housing starts declining around 2007 because of the subprime mortgage crisis.
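Because the download needs Quandl access, here is a small offline sketch of how `ts()` indexes monthly data with `frequency = 12` and `start = c(1979, 12)`. The values are hypothetical placeholders, not housing starts; only the time index matters. With 413 observations the series ends in April 2014, matching the `trim_end` date above, and `window()` can pull out a sub-period such as 2007 through 2009.

```r
# Placeholder values standing in for the downloaded series (hypothetical data)
x <- ts(seq_len(413), frequency = 12, start = c(1979, 12))

start(x)      # 1979 12 (December 1979)
end(x)        # 2014 4 (April 2014)
frequency(x)  # 12 observations per year

# window() extracts a sub-period by (year, month)
bust <- window(x, start = c(2007, 1), end = c(2009, 12))
length(bust)  # 36 months
```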

acf2(ts(housingts, frequency=1))
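The `ts(housingts, frequency=1)` wrapper only changes how the lags are labelled. The sketch below uses simulated white noise as a hypothetical stand-in for the data, and base R's `acf()` rather than `astsa::acf2()`, to show that with `frequency = 12` lags are measured in years (1/12, 2/12, ...) while with `frequency = 1` they are whole months; the autocorrelations themselves are identical.

```r
set.seed(42)
x <- ts(rnorm(413), frequency = 12, start = c(1979, 12))  # simulated stand-in

a12 <- acf(x, lag.max = 36, plot = FALSE)                                # lags in years
a1  <- acf(ts(as.numeric(x), frequency = 1), lag.max = 36, plot = FALSE)  # lags in months

a12$lag[2]  # 0.08333333 (1/12 of a year)
a1$lag[2]   # 1 (one month)
all.equal(as.numeric(a12$acf), as.numeric(a1$acf))  # TRUE: same values, different labels
```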

> adf.test(housingts)

Augmented Dickey-Fuller Test

data: housingts
Dickey-Fuller = -2.1506, Lag order = 7, p-value = 0.5138
alternative hypothesis: stationary


From the time series graph, we can conclude that the series is not stationary. The ACF decays slowly, which indicates strong autocorrelation in the data and is a typical sign of non-stationarity. The large p-value (0.51) from the ADF test further supports this: we cannot reject the test's null hypothesis that the series has a unit root.
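To illustrate the slow decay described above, the following sketch compares the sample ACF of a simulated random walk (non-stationary) with that of white noise (stationary). Both series are hypothetical stand-ins, not the housing data.

```r
set.seed(1)
n  <- 413
wn <- rnorm(n)      # white noise: stationary
rw <- cumsum(wn)    # random walk: non-stationary

acf_rw <- acf(rw, lag.max = 24, plot = FALSE)$acf[-1]  # drop lag 0
acf_wn <- acf(wn, lag.max = 24, plot = FALSE)$acf[-1]

round(acf_rw[c(1, 6, 12, 24)], 2)  # stays large and decays slowly
round(acf_wn[c(1, 6, 12, 24)], 2)  # close to zero
2 / sqrt(n)                        # approximate half-width of the 95% bands
```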

The next step is to make the data stationary. To do this, we take a lagged difference of the data (housingdiff) and then look at the ACF again.
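The code that creates `housingdiff` is not shown. A minimal sketch, assuming a regular lag-1 difference (`housingdiff <- diff(housingts)`), is below; it uses a simulated random walk as a hypothetical stand-in for the data. Differencing a random walk recovers its stationary white-noise increments exactly and costs one observation, whereas a seasonal difference with `lag = 12` would cost twelve.

```r
set.seed(1)
e  <- rnorm(413)                                          # white-noise increments
rw <- ts(cumsum(e), frequency = 12, start = c(1979, 12))  # simulated random walk

d1  <- diff(rw)            # regular difference: rw[t] - rw[t-1]
d12 <- diff(rw, lag = 12)  # seasonal difference: rw[t] - rw[t-12], for comparison

length(d1)   # 412
length(d12)  # 401
start(d1)    # 1980 1: the first month is lost
all.equal(as.numeric(d1), e[-1])  # TRUE: differencing undoes the cumulative sum
```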


> acf2(housingdiff)


The ACF of the differenced data no longer decays slowly, and the Dickey-Fuller test supports the hypothesis that the differenced data is stationary: the p-value is small enough (below 0.01) to reject the null hypothesis of a unit root. The few significant lags that remain are due to the seasonal nature of the data.


> adf.test(housingdiff)

Augmented Dickey-Fuller Test

data: housingdiff
Dickey-Fuller = -6.6809, Lag order = 7, p-value = 0.01
alternative hypothesis: stationary

Warning message:
In adf.test(housingdiff) : p-value smaller than printed p-value
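As a rough illustration of what `adf.test()` computes, the sketch below builds the augmented Dickey-Fuller regression by hand with `lm()`: the change in the series is regressed on a constant, a linear trend, the lagged level, and k lagged changes, and the test statistic is the t-value on the lagged level. As far as I know, `tseries::adf.test()` uses this constant-plus-trend form with a default lag order of trunc((n - 1)^(1/3)), which gives the "Lag order = 7" printed above. `adf_stat()` is a hypothetical helper; the p-value step, which `adf.test()` interpolates from a table and truncates at 0.01 (hence the warning), is not reproduced.

```r
# Hypothetical helper: ADF t-statistic from a regression with a constant, a trend,
# the lagged level and k lagged differences (assumes k >= 1)
adf_stat <- function(x, k = trunc((length(x) - 1)^(1/3))) {
  x   <- as.numeric(x)
  dx  <- diff(x)                  # dx[t] = x[t+1] - x[t]
  idx <- (k + 1):length(dx)       # rows with k earlier differences available
  y     <- dx[idx]                # current change
  level <- x[idx]                 # level just before each change
  trend <- idx                    # linear time trend
  lags  <- sapply(seq_len(k), function(i) dx[idx - i])  # lagged changes
  fit <- lm(y ~ level + trend + lags)
  coef(summary(fit))["level", "t value"]
}

set.seed(2)
stationary <- arima.sim(list(ar = 0.5), n = 413)  # simulated stationary AR(1)
trunc((413 - 1)^(1/3))  # 7, the default lag order for n = 413
adf_stat(stationary)    # strongly negative: evidence against a unit root
```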

Now that the data set is stationary, we can look at the significant lags in the ACF and PACF plots to choose the model orders. We want to be conservative with our estimates, so we begin by trying an ARIMA(2,1,3)(2,1,3) model with a seasonal period of 12.

> fit213213<-arima(housingts, order = c(2, 1, 3),seasonal = list(order = c(2, 1, 3), period = 12)) 
> tsdisplay(residuals(fit213213))
> fit213213$aic
[1] 4851.315
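The `order` and `seasonal` arguments of `stats::arima()` can be tried offline on R's built-in `AirPassengers` data. The sketch below fits the classic "airline" ARIMA(0,1,1)(0,1,1) model with period 12 to the log series purely to illustrate the call pattern; it is not the housing model. The fit uses only 131 of the 144 observations, because one regular and one seasonal (lag-12) difference consume 1 + 12 = 13 values.

```r
# Illustrative seasonal ARIMA on built-in data (not the housing series)
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),                             # (p, d, q)
             seasonal = list(order = c(0, 1, 1), period = 12))  # (P, D, Q), period 12

names(coef(fit))  # "ma1" "sma1"
fit$nobs          # 131 = 144 - 1 - 12 observations after differencing
fit$aic           # the same kind of value as fit213213$aic above

# Base-R stand-in for tsdisplay(residuals(fit)): residual autocorrelations
acf(residuals(fit), lag.max = 36, plot = FALSE)
```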


We can also use the auto.arima() function from the forecast package to see which model orders it selects automatically.



> fit.start=auto.arima(housingts)
> fit.start

Series: housingts

          ar1      ar2      ma1      ma2      ma3     sar1     sar2
       1.2874  -0.3256  -1.6559   0.8715  -0.1725  -0.1352  -0.2074
s.e.   0.5709   0.5348   0.5665   0.7266   0.2052   0.0520   0.0517

sigma^2 estimated as 9231: log likelihood=-2466.11
AIC=4948.21 AICc=4948.57 BIC=4980.38

We can try other models to see if we can do better.
> fit112200 <- arima(housingts, order = c(1, 1, 2), seasonal = list(order = c(2, 0, 0), period = 12))
> fit112200$aic
[1] 4945.036


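Since this model has a lower AIC than the auto.arima() model (4945.04 versus 4948.21) while estimating two fewer coefficients, it is the better model to use. The AIC of 4851.315 for the ARIMA(2,1,3)(2,1,3) model is not directly comparable, because that model also takes a seasonal difference and is therefore fitted to a different, shorter differenced series (400 rather than 412 observations).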
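The printed information criteria can be reproduced from the log likelihoods, which makes the comparison concrete. This sketch assumes that the parameter count includes the innovation variance sigma^2 (so k = number of coefficients + 1) and that the effective sample size is 412, i.e. the 413 monthly observations from December 1979 to April 2014 minus one for the regular difference. Under those assumptions it matches the printed AIC, AICc and BIC values to within rounding.

```r
# k counts the ARMA coefficients plus sigma^2 (assumption stated above)
info_criteria <- function(loglik, n_coef, n_eff) {
  k   <- n_coef + 1
  aic <- -2 * loglik + 2 * k
  c(AIC  = aic,
    AICc = aic + 2 * k * (k + 1) / (n_eff - k - 1),
    BIC  = -2 * loglik + log(n_eff) * k)
}

n_eff <- 413 - 1                               # one observation lost to differencing
auto  <- info_criteria(-2466.11, 7, n_eff)     # auto.arima() model
alt   <- info_criteria(-2466.52, 5, n_eff)     # fit112200

round(rbind(auto.arima = auto, fit112200 = alt), 2)
```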





Looking at the ACF and PACF of the residuals, there seem to be significant lags at 13 and 26. The plots show 36 lags, and with 95% confidence bands each lag has a 5 percent chance of falling outside the bands by chance alone, so even for white noise we can expect 36*0.05 = 1.8, or about 2, lags to fall outside. Since most of the values fall within the confidence bands, the residuals are sufficiently close to white noise.
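The back-of-the-envelope count above can be made a little more precise. Treating the 36 lags as independent (an approximation), the number of lags outside 95% bands under white noise is Binomial(36, 0.05), so seeing two or more is slightly more likely than not. A formal alternative is the Ljung-Box test in base R's `Box.test()`. The sketch runs it on simulated white noise as a hypothetical stand-in for `residuals(fit112200)`, with `fitdf = 5` for the five estimated coefficients (ar1, ma1, ma2, sar1, sar2).

```r
n_lags <- 36
alpha  <- 0.05

n_lags * alpha                # 1.8 lags expected outside the bands by chance
1 - pbinom(1, n_lags, alpha)  # about 0.54: probability of 2 or more exceedances

# Ljung-Box test on simulated white noise (stand-in for the model residuals)
set.seed(3)
res <- rnorm(412)
lb  <- Box.test(res, lag = 36, type = "Ljung-Box", fitdf = 5)
lb$parameter  # df = 36 - 5 = 31
lb$p.value    # a large p-value gives no evidence against white noise
```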

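We can also look at the diagnostic plots from tsdiag(), which show the standardized residuals, the ACF of the residuals, and Ljung-Box p-values.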

> tsdiag(fit112200)

> summary(fit112200)
Series: housingts

          ar1      ma1      ma2     sar1     sar2
       0.9308  -1.2837    0.362  -0.1427  -0.2054
s.e.   0.0601   0.0720    0.045   0.0517   0.0517

sigma^2 estimated as 9250: log likelihood=-2466.52
AIC=4945.04 AICc=4945.24 BIC=4969.16


From the summary, we can see that the coefficients are significant, because 0 does not fall within any of their approximate 95% confidence intervals (estimate ± 1.96 standard errors).
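The significance check can be written out using the estimates and standard errors from the summary. This sketch assumes approximate normality of the estimates, so each 95% interval is estimate ± 1.96 standard errors, and a coefficient is flagged as significant when its interval excludes 0.

```r
est <- c(ar1 = 0.9308, ma1 = -1.2837, ma2 = 0.362, sar1 = -0.1427, sar2 = -0.2054)
se  <- c(ar1 = 0.0601, ma1 = 0.0720,  ma2 = 0.045, sar1 = 0.0517,  sar2 = 0.0517)

z     <- qnorm(0.975)                 # about 1.96
lower <- est - z * se
upper <- est + z * se
significant <- lower > 0 | upper < 0  # TRUE when the interval excludes 0

round(cbind(estimate = est, lower, upper), 3)
significant                           # TRUE for all five coefficients
```

With the fitted object itself, `confint(fit112200)` should produce intervals of the same form through R's default method.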

Now that we have a seasonal ARIMA model, we can use it for forecasting.

> plot(forecast(fit112200, h=10))
> forecast(fit112200, h=4)
         Point Forecast    Lo 80    Hi 80    Lo 95    Hi 95
May 2014       1034.663 911.4092 1157.916 846.1627 1223.163
Jun 2014       1043.814 897.0021 1190.625 819.2848 1268.342
Jul 2014       1044.469 875.3739 1213.563 785.8605 1303.077
Aug 2014       1049.710 859.2006 1240.218 758.3513 1341.068


The four point forecasts in the table rise slowly, from about 1,035 thousand units in May 2014 to about 1,050 thousand units in August 2014. Since the units are thousands of housing units, even these small changes represent thousands of new homes. However, because of the significant lags that remain in the residual ACF and PACF, and because the prediction intervals widen with each step ahead, the model will only work for forecasts that are not too far into the future.
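The widening of the prediction intervals can be read straight off the forecast table. The sketch below recomputes the 95% interval widths and backs out the implied forecast standard error, assuming the intervals are symmetric normal intervals (point ± 1.96 se). The one-step standard error comes out at about 96.2, matching sqrt(9250) from the estimated sigma^2, and it reproduces the May 2014 80% lower bound.

```r
fc <- data.frame(
  month = c("May 2014", "Jun 2014", "Jul 2014", "Aug 2014"),
  point = c(1034.663, 1043.814, 1044.469, 1049.710),
  lo95  = c(846.1627, 819.2848, 785.8605, 758.3513),
  hi95  = c(1223.163, 1268.342, 1303.077, 1341.068)
)

fc$width95 <- fc$hi95 - fc$lo95                    # full width of the 95% interval
fc$se      <- (fc$hi95 - fc$point) / qnorm(0.975)  # implied forecast standard error

fc[, c("month", "width95", "se")]  # both columns grow with the horizon
sqrt(9250)                         # one-step se implied by sigma^2
fc$point[1] - qnorm(0.9) * fc$se[1]  # about 911.41, the table's Lo 80 for May 2014
```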

These forecasts make sense because the housing market has recently had a slow upward trend. From this, we can expect the economy to recover slowly over the coming months and financial conditions to improve. When the economy improves, consumers have more money and create demand for housing, and new housing is built to meet that demand. This reasoning is supported by the following graph, which shows that housing starts decline during recessions and rise when economic conditions improve.


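# Plot the data and forecast, then shade approximate recession periods
# (ytop is set above the data range so each rectangle spans the full plot height)
plot(forecast(fit112200, h=4))
rect(xleft=2006, xright=2009, ybottom=0, ytop=7000, col="#123456A0")
rect(xleft=1990, xright=1991, ybottom=0, ytop=7000, col="#123456A0")
rect(xleft=1981, xright=1982, ybottom=0, ytop=7000, col="#123456A0")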

Since the end of the 2007-2009 recession, there has been a steady increase in new housing starts, and our model's forecasts support that steady growth.