When working with time-series data it can be important to apply filtering to remove noise. This story shows how to implement a low-pass filter in SQL / BigQuery that can come in handy when improving ML features.
Filtering of time-series data is one of the most useful preprocessing tools in Data Science. In reality, data is almost always a combination of signal and noise, where noise is not only the non-periodic part of the data but anything that does not represent the information of interest. For example, imagine daily visitation to a retail store. If you are interested in how seasonal changes affect visitation, you are probably not interested in short-term patterns driven by the day of the week (visitation might be higher overall on Saturdays than on Mondays, but that is not what you want to measure).
Even though this might look like a small issue in the data, noise or irrelevant information (like the short-term visitation pattern) increases the complexity of your features and, thus, affects your model. If you do not remove that noise, you need to adjust your model complexity and the volume of training data accordingly to avoid overfitting.
This is where filtering comes to the rescue. Similar to how one would remove outliers from a training set or less important metrics from a feature set, time-series filtering removes noise from a time-series feature. In short: time-series filtering is a cleaning tool for your data. Applying it restricts your data to only the frequencies (or temporal patterns) you are interested in and, thus, yields a cleaner signal that can improve your subsequent statistical or machine-learning model (see Figure 1 for a synthetic example).
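To make the retail example concrete, here is a minimal sketch in BigQuery SQL. It is not the filter built in this story: it uses a centered 7-day moving average, which is one of the simplest low-pass filters, purely as an illustration. The `daily_visits` data is synthetic, and all names (`daily_visits`, `visit_date`, `visits`, `visits_smoothed`) are hypothetical. The sketch also assumes exactly one row per day with no gaps.

```sql
-- Synthetic data (assumption): a slow seasonal trend plus a Saturday bump.
WITH daily_visits AS (
  SELECT
    day AS visit_date,
    100
      + 20 * SIN(2 * ACOS(-1) * n / 365)            -- seasonal signal of interest
      + IF(EXTRACT(DAYOFWEEK FROM day) = 7, 30, 0)  -- weekly "noise" (7 = Saturday)
      AS visits
  FROM UNNEST(GENERATE_DATE_ARRAY(DATE '2023-01-01', DATE '2023-12-31')) AS day
    WITH OFFSET AS n
)
SELECT
  visit_date,
  ROUND(visits, 1) AS visits,
  -- Centered 7-row window: every weekday appears exactly once,
  -- so the weekly pattern averages out while the slow trend remains.
  ROUND(
    AVG(visits) OVER (
      ORDER BY visit_date
      ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
    ),
    1
  ) AS visits_smoothed
FROM daily_visits
ORDER BY visit_date;
```

Note that the first and last three rows are averaged over fewer than seven days, so the weekly pattern is not fully removed at the edges of the series.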
A detailed walkthrough of what a filter is and how it works is beyond the scope of this story (and a very complex topic in general). However, on a high level, filtering can be seen as a modification of an input signal by applying another signal (also called kernel or filter…