Hawai'i Median Home Price Forecast with prophet

In this project I'll attempt to forecast Hawai'i Median Home Prices with the prophet library and explore some intermediate features along the way. I'll take a look at seasonality, changepoints, growth modes, anomaly omission, and prior scales in order to find a plausibly accurate forecast for Home Price. While this would typically be fairly straightforward, we'll see that the pandemic introduced some volatility that needs to be accounted for in order to find a well-fitting model.

You can find this project in my GitHub repo: https://github.com/sam-tritto/hawaii-house-price-forecast

Rainbow on the other side of the Ala Wai canal in Waikiki, HI on the island of O'ahu

 Data

 The data of Median Home Sale Price of a Single Family Residence was downloaded from one of Zillow's housing data sets which you can find here: https://www.zillow.com/research/data/

By inspecting the data, we can see that each row is a city, state pair, with the weekly time series stored as columns. The data goes back to 2008, which is enough history to forecast with. I'll subset this to the Honolulu, HI metro area; however, there is also Hilo, HI if you'd prefer to see a Big Island trend.

 Prophet expects data to have the independent time variable as "ds", and the dependent variable of Median Home Sale Price as "y".
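The reshaping described above can be sketched roughly as follows. This is a minimal sketch on a toy frame, not the actual Zillow file: the column names RegionName and StateName, the weekly date columns, and every price in it are assumptions made up for illustration.

```python
import pandas as pd

# Toy stand-in for the wide file: one row per metro, one column per week.
# Column names and prices are illustrative assumptions, not real Zillow values.
wide = pd.DataFrame({
    "RegionName": ["Honolulu, HI", "Hilo, HI"],
    "StateName": ["HI", "HI"],
    "2008-02-02": [600000.0, 300000.0],
    "2008-02-09": [605000.0, 302000.0],
    "2008-02-16": [610000.0, 301000.0],
})

id_cols = ["RegionName", "StateName"]

# Keep only the Honolulu row.
honolulu = wide[wide["RegionName"].str.contains("Honolulu")]

# Melt the date columns into rows, then rename to the ds / y pair Prophet expects.
df = (
    honolulu
    .melt(id_vars=id_cols, var_name="ds", value_name="y")
    .drop(columns=id_cols)
    .assign(ds=lambda d: pd.to_datetime(d["ds"]))
    .dropna(subset=["y"])
    .sort_values("ds")
    .reset_index(drop=True)
)
```

The result is a two-column frame with one row per week, which is the shape Prophet's fit method works with.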

Model 1 - Out of the Box Linear Growth

 Before I start tuning the model, I'll first try to see how the data looks and fits on an out of the box model. Since the data is weekly, I'll need to adjust the periods parameter, so 52*5 will mean we have a 5 year forecast. Also the freq parameter should be set to "W", again for weekly. 
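A minimal sketch of what that out of the box fit might look like, assuming df already has the ds and y columns. The helper name fit_baseline and the constant HORIZON_WEEKS are my own for illustration; the Prophet calls themselves (Prophet, fit, make_future_dataframe, predict) are the library's standard ones.

```python
import pandas as pd

HORIZON_WEEKS = 52 * 5  # 5 years of weekly steps


def fit_baseline(df: pd.DataFrame, periods: int = HORIZON_WEEKS) -> pd.DataFrame:
    """Fit a default Prophet model and forecast `periods` weeks ahead."""
    # Imported inside the function so the helper can be defined
    # even where Prophet is not installed.
    from prophet import Prophet

    model = Prophet()  # all defaults: linear growth, automatic seasonality
    model.fit(df)

    # Extend the history by `periods` weekly rows, then predict over all of it.
    future = model.make_future_dataframe(periods=periods, freq="W")
    return model.predict(future)
```

The returned frame contains yhat along with yhat_lower and yhat_upper for every historical and future date.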

Plotting the forecast we can see that... wait... is the y-axis in Millions? We've got some saving to do. We can also see here that while the housing price was growing fairly steadily, something happened around 2020 that caused the Median Home Sale Price to skyrocket. Oh, right. The pandemic. It caused an influx of remote workers and retirees moving to the islands to live their best lives. This increasing trend continued until around 2023, and then prices began to dip down again. This will be a tricky problem for prophet to figure out, as there isn't much volatility in the data for it to learn from prior to this event. There also aren't any correcting trends to be found since the initial up and down (it hasn't gone back up again). So it looks like prophet out of the box will assume that the recent downward trend was an exception to the rule; you can see here that the forecast goes way up into the sky. And even if we are somehow able to get prophet to understand that there was a recent dip, it might assume the downward trend will continue forever, which is obviously not the case.


 Model 2 - Seasonality, Changepoints, and Holidays

The first thing I'll do is add in the pandemic as a custom holiday. This will help the model understand that there is something special happening. I'll create a small data frame of the information that prophet expects for its holidays. Aside from that, I'll need to know the start and stop dates and then add them as a list into the changepoints variable. The lower window is how many days to consider before the start date and the upper window is how many days to consider after the start date. There are 122 days between the start and stop dates, so I'll use that as my upper window. It might also be worth mentioning that I approximated these dates.
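Here is a rough sketch of that holiday data frame and changepoint list. The two dates below are placeholders I picked only because they sit 122 days apart; they are not necessarily the approximated dates used in the project. The column names holiday, ds, lower_window, and upper_window are the ones Prophet looks for.

```python
import pandas as pd

# Placeholder dates (assumption): chosen only so the gap is 122 days.
start = pd.Timestamp("2020-03-01")
stop = pd.Timestamp("2020-07-01")

upper_window = (stop - start).days  # days after the start date to include

# One-row holiday frame in the layout Prophet expects.
pandemic = pd.DataFrame({
    "holiday": ["pandemic"],
    "ds": [start],
    "lower_window": [0],            # no days before the start date
    "upper_window": [upper_window],
})

# The same two dates double as explicit changepoints for the trend.
changepoints = [start, stop]
```

Both objects can then be handed to the model through the holidays and changepoints arguments of Prophet.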

 Next, I'll change the seasonality parameters. I'll include yearly and weekly seasonality since the data is weekly, and set their prior scale to 0.5, which will allow the model to fit with reasonable flexibility. The prior scale parameters are effectively regularization parameters for the model, and the term "prior" refers to them setting the scale of Bayesian prior distributions. I've also added in a custom yearly seasonality, to be able to control the Fourier order and period, and then another custom monthly seasonality. At the end of the day, with this data, the seasonality parameters won't do too much - seasonality isn't the issue, it's the pandemic.
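A sketch of how those seasonality settings could be wired up. The 0.5 prior scale comes from the text above, but the seasonality names, periods, and Fourier orders in CUSTOM_SEASONALITIES are assumptions for illustration, as is the helper name build_seasonal_model.

```python
# Custom seasonal components; names, periods (in days), and Fourier orders
# are illustrative assumptions, not the project's actual values.
CUSTOM_SEASONALITIES = [
    {"name": "custom_yearly", "period": 365.25, "fourier_order": 10},
    {"name": "monthly", "period": 30.5, "fourier_order": 5},
]


def build_seasonal_model(prior_scale: float = 0.5):
    """Return an unfitted Prophet model with built-in and custom seasonality."""
    # Imported here so the config above is usable without Prophet installed.
    from prophet import Prophet

    model = Prophet(
        yearly_seasonality=True,
        weekly_seasonality=True,
        seasonality_prior_scale=prior_scale,  # smaller values regularize more
    )
    for spec in CUSTOM_SEASONALITIES:
        model.add_seasonality(
            name=spec["name"],
            period=spec["period"],
            fourier_order=spec["fourier_order"],
            prior_scale=prior_scale,
        )
    return model
```

A higher Fourier order lets a seasonal component wiggle more, while a lower prior scale pulls it back toward zero, so the two are worth tuning together.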

Next, I pass in the changepoints and holidays, giving the holiday a little more flexibility in its prior scale.

And finally, I'll try to account for the fact that I really don't anticipate the Median Home Price going to 0, or to 1 billion for that matter, by setting the growth parameter to logistic, which then requires me to set a floor and cap value for the response.

 Similarly, I'll need to set a floor and cap for the future predictions as well. I've chosen values that resemble the min and max of what's in the historical data. 
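The floor and cap setup might look something like the sketch below. The 550,000 floor is the value used in this project, but the cap value, the helper names add_bounds and fit_logistic, and the default holidays_prior_scale (Prophet's own default of 10) are assumptions for illustration.

```python
import pandas as pd

FLOOR = 550_000    # floor used in this project
CAP = 1_300_000    # hypothetical cap; pick something near the historical max


def add_bounds(frame: pd.DataFrame, floor: float = FLOOR, cap: float = CAP) -> pd.DataFrame:
    """Return a copy of `frame` with the floor and cap columns logistic growth needs."""
    out = frame.copy()
    out["floor"] = floor
    out["cap"] = cap
    return out


def fit_logistic(df, holidays, changepoints, years: int = 2,
                 holidays_prior_scale: float = 10.0):
    """Fit a logistic-growth model and forecast `years` of weekly steps."""
    # Imported here so add_bounds can be used without Prophet installed.
    from prophet import Prophet

    model = Prophet(
        growth="logistic",
        holidays=holidays,
        changepoints=changepoints,
        holidays_prior_scale=holidays_prior_scale,
    )
    model.fit(add_bounds(df))  # history needs floor and cap

    # The future frame needs the same two columns before predicting.
    future = add_bounds(model.make_future_dataframe(periods=52 * years, freq="W"))
    return model, model.predict(future)
```

Using one helper for both the history and the future frame keeps the bounds from drifting apart between fitting and predicting.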

 You can see the changepoints in the resulting plot as the dashed red vertical lines and the floor and cap as the black dashed horizontal lines. You can also see the curvature of the logistic growth, which stays truer to the actual growth we've been seeing historically. While I was able to capture the downward trend, it's also not terribly realistic that this will continue to go back down to the floor of 550,000. It will most likely continue to increase as it's been doing for 100 years. So this might be an OK solution for a shorter forecast; here I've used 2 years.

 Model 3 - Omitting Anomalies

If we want a longer forecast, but don't want the effects of the anomalous pandemic, then one strategy would be to simply omit that data from our model. We can do this easily with boolean indexing. I've chosen to keep only the most recent data point from that period, to capture the lowest point of the new downward trend.
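A minimal sketch of that boolean indexing on a toy frame. The anomaly start date and the placeholder y values are assumptions for illustration; only the idea of dropping the pandemic rows while keeping the latest observation comes from the text.

```python
import pandas as pd

# Toy weekly series standing in for the real ds / y frame (placeholder values).
dates = pd.date_range("2019-01-05", periods=300, freq="W-SAT")
df = pd.DataFrame({"ds": dates, "y": range(300)})

anomaly_start = pd.Timestamp("2020-03-01")  # hypothetical start of the anomaly

# Keep everything before the anomaly, plus the single most recent row.
keep = (df["ds"] < anomaly_start) | (df["ds"] == df["ds"].max())
trimmed = df[keep].reset_index(drop=True)
```

Prophet does not need evenly spaced observations, so the gap left in the middle of the history is not a problem when fitting.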

 The model is similar to the second one above, but without the pandemic holiday or changepoints.

 We're left with a reasonably accurate forecast that completely ignores the steep increase seen during the pandemic (aside from the most current value), but also captures the less recent logistic growth seen pre-pandemic. We know home prices would continue to grow; by ignoring the pandemic, however, we can more plausibly model that growth well out into the future. For comparison, the first out of the box model was about 500,000 more than this model 5 years out, which is obviously a lot more savings.

Get in touch at:       mr.sam.tritto@gmail.com