Scale up your time series forecasting with Prophet
One of the more common data science tasks utilized across many organizations is time series forecasting. Forecasting provides businesses with insights into the future which enables effective business planning, short-term and long-term goal setting, and an ability to anticipate changes in the market, to name a few. Forecasting can help businesses gain a competitive edge by giving them the best chance of making the right decisions.
Despite the many benefits, forecasting can be time-consuming and challenging due to the temporal nature of the observations. There is a multitude of models to choose from, ranging from classical methods such as autoregressive integrated moving average (ARIMA) to more advanced neural network approaches such as neural network autoregression (NNETAR). Each of these methods also has its own hyperparameters that must be tuned to improve the accuracy of the forecasts. Finally, the data requirements for these methods often involve intensive cleaning, scaling, or transformation.
The combination of these things often makes it hard to generate accurate and stable forecasting models.
What is Prophet?
Prophet is an open source library from Facebook, available in R and Python, for forecasting time series data. It is based on an additive model in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects (CRAN).
Prophet simplifies the forecasting process by doing a lot of the work for you while generating reasonable results right out of the box. However, it also offers experienced forecasters the ability to tune the method. Prophet decomposes a time series into three main components (trend, seasonality, and holidays) plus an error term:
y(t) = g(t) + s(t) + h(t) + εt
g(t) is the trend function that models non-periodic changes in the value of the time series
s(t) is periodic changes such as daily, weekly, or yearly seasonality
h(t) is the holiday effect, which can occur on a specific day or range of days
εt is the model error, assumed to be normally distributed
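To make the decomposition concrete, here is a toy sketch in plain Python. The functional forms (a straight-line trend, a single sine wave, a fixed holiday bump) and every parameter value are illustrative assumptions, not what Prophet fits internally. The only point is that the components are simply added together.

```python
import math


def trend(t, slope=2.0, intercept=100.0):
    """g(t): non-periodic growth, modeled here as a straight line."""
    return intercept + slope * t


def seasonality(t, period=12, amplitude=10.0):
    """s(t): a repeating pattern, modeled here as one sine wave per period."""
    return amplitude * math.sin(2 * math.pi * t / period)


def holiday(t, holiday_steps=(11,), effect=25.0):
    """h(t): a fixed bump on the time steps flagged as holidays."""
    return effect if t in holiday_steps else 0.0


def additive_model(t, noise=0.0):
    """y(t) = g(t) + s(t) + h(t) + error term."""
    return trend(t) + seasonality(t) + holiday(t) + noise


# Two years of monthly values with the error term left at zero.
series = [additive_model(t) for t in range(24)]
print(series[:4])
```

Because the model is additive, each component can be inspected or adjusted on its own, which is what makes the component plots later in this article possible.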
Prophet allows you to configure the trend using parameters, add custom seasonalities, and add custom holidays or special events.
Let’s walk through a simple example using the Python Prophet library. The data set used in this example is from Kaggle, https://www.kaggle.com/podsyp/time-series-starter-dataset.
Input the data
Prophet’s only data requirement is that the input data set must be a pandas DataFrame with two columns: ds for the date or datetime value, and y for the measure we want to forecast.
The data set provides monthly sales revenue data from January 2015 to April 2020. First, we input the data, rename the appropriate columns, and then split it into a training and test data set.
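The preparation step might look like the sketch below. The column names Period and Revenue, the helper function names, and the 12-month test window (fc_period) are assumptions made for illustration, and a synthetic frame stands in for the Kaggle file so the snippet runs on its own.

```python
import pandas as pd


def to_prophet_frame(raw, date_col, value_col):
    """Keep the two columns Prophet needs and rename them to ds and y."""
    df = raw[[date_col, value_col]].rename(columns={date_col: "ds", value_col: "y"})
    df = df.assign(ds=pd.to_datetime(df["ds"]))
    return df.sort_values("ds").reset_index(drop=True)


def split_by_time(df, test_periods):
    """Hold out the last rows as the test set; time series are never shuffled."""
    return df.iloc[:-test_periods].copy(), df.iloc[-test_periods:].copy()


# Stand-in for the real file: 64 monthly rows, January 2015 to April 2020.
# The column names here are hypothetical.
raw = pd.DataFrame({
    "Period": pd.date_range("2015-01-01", "2020-04-01", freq="MS"),
    "Revenue": [1000.0 + 25.0 * i for i in range(64)],
})

fc_period = 12  # assumed length of the test window, in months
data = to_prophet_frame(raw, date_col="Period", value_col="Revenue")
train, test = split_by_time(data, fc_period)
print(len(train), len(test))
```

With the real data set, the synthetic frame would be replaced by a pd.read_csv() call on the downloaded file, with date_col and value_col set to whatever the file's actual column names are.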
A quick plot of the raw data is shown in Figure 1. This clearly shows a positive trend and some of the seasonal patterns in the Sales revenue data. Also, no missing data or outliers are present in the data.
If you have missing data, set the missing values to NA (None or NaN in pandas) and Prophet will generate predictions for them.
If you encounter outliers, it is recommended to remove them because they can negatively impact the uncertainty in the future forecasts.
Figure 1: Plot of raw Sales Revenue data
Fit the model
Now we fit the model on the training data using some of the seasonality options. Because the data is monthly, we know it will not have daily or weekly seasonality components, so those options are set to False.
To predict the Sales revenue using the model, we pass the number of periods in the test data (fc_period) and the monthly frequency to make_future_dataframe(). The resulting dataframe also contains the training dates, so the output includes predictions for the training period as well.
Running plot_components() on the predictions displays the components of the model, as shown in Figure 2. The trend plot confirms the positive trend we saw in the raw data (Figure 1), and the yearly plot shows strong yearly seasonality in Sales revenue, with peaks just before spring, at the start of summer, and at the start of winter.
Figure 2: Component Plots
Next, we create Figure 3, which displays the plot of predictions.
Notice we also added the add_changepoints_to_plot() option to the graph. Changepoints are points where the time series abruptly changes trajectory. Prophet automatically detects these changes and allows the trend to adapt appropriately, but you can also use them to tune the model if the trend looks like it’s overfitting or underfitting.
The graph displays the following:
· Black dots – Actual values
· Blue solid line – Predictions
· Light blue band – Uncertainty interval
· Dotted red line – Changepoint
· Red line – Trend lines before/after change points
Visual review of the test predictions looks good, but additional performance measures help to determine the fit of the model.
Figure 3: Sales Revenue Forecasts
Assess Model Performance
There are several statistics you can use to determine model fit. The two shown here are among the most common:
MAE – Mean Absolute Error, which is on the same scale as the data and is robust to outliers
MAPE – Mean Absolute Percentage Error, which is the absolute error normalized by the actual value, computed for every data point and then averaged.
For this forecast, the value of MAE implies that, on average, the forecast's distance from the true value is $948.05. The value of MAPE implies that, on average, the forecast's distance from the true value is 5.9% of the true value.
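Both measures are simple to compute by hand. The sketch below uses made-up numbers purely for illustration; they are not the article's test data and do not reproduce the figures quoted above.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute distance between forecast and truth, in data units."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute error as a percentage of the actual value.

    Undefined when any actual value is zero.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Illustrative values only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)
print(f"MAE: {mae:.2f}, MAPE: {mape:.1f}%")
```

In the workflow above, actual would be the y column of the test set and predicted would be the last fc_period values of yhat in the forecast.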
What about the Holidays Component?
Our example above didn’t use the third main component of the Prophet model, holidays, because none were included. However, adding them is straightforward:
1. Generate a dataframe of the dates of the holidays or special events you think have an impact on your forecasts. In our example above, we noted that Figure 2 showed a potential seasonal impact, so we created a seasons dataframe: March 1 for the beginning of spring, June 21 for the beginning of summer, and December 21 for the holiday season.
2. Apply the holiday component to your model using the holidays=<dataframe> option.
Ready to forecast using your time series data?
This simple example shows how Prophet can help you quickly start using forecasting to help your organization make better decisions and plan for the future. Do you want to learn more about how to apply these and other techniques to your data? Contact us now at Scalesology. Our team of data scientists is ready to work with you to empower your business with insights into how to run and scale it efficiently.