In this article, I will talk about one of the most popular data pipeline design patterns — event streaming. Among other benefits, it enables lightning-fast data analytics, letting us build reporting dashboards that update in real time. I will demonstrate how this can be achieved by building a streaming data pipeline with AWS Kinesis and Redshift, which can be deployed in just a few steps using infrastructure as code. We will use AWS CloudFormation to describe our data platform architecture and simplify deployment.
Imagine that, as a data engineer, you are tasked with creating a data pipeline that connects server event streams to a data warehouse solution (Redshift) in order to transform the data and build an analytics dashboard.
What is a data pipeline?
It is a sequence of data processing steps. Because the stages are logically connected by the flow of data, each stage produces an output that serves as the input for the following stage.
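To make this idea concrete, here is a minimal Python sketch of three chained stages; the stage names and sample events are placeholders for illustration, not part of any AWS service.

```python
# Toy illustration of pipeline stages: each stage's output feeds the next one.

def extract():
    # Pretend these are raw server events.
    return [{"user_id": 1, "event": "click"}, {"user_id": 2, "event": "view"}]

def transform(events):
    # Keep only click events and add a derived field.
    return [{**e, "is_click": True} for e in events if e["event"] == "click"]

def load(rows):
    # In a real pipeline this would write to a warehouse table;
    # here we simply print the rows.
    for row in rows:
        print(row)

# The output of each stage is the input of the following stage.
load(transform(extract()))
```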
I previously wrote about it in this article:
For example, event data can be generated by a source at the back end and sent to an event stream built with Kinesis Data Firehose or a Kafka topic. That stream can then feed a number of different consumers or destinations.
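As a sketch of the producer side, the snippet below uses boto3 to push a single event into a Kinesis Data Firehose delivery stream; the stream name and event payload are hypothetical.

```python
import json
import boto3

# Hypothetical delivery stream name; replace with the one created in your stack.
DELIVERY_STREAM = "my-event-stream"

firehose = boto3.client("firehose")

event = {"user_id": 42, "event": "page_view", "ts": "2023-01-01T00:00:00Z"}

# Firehose expects the payload as bytes; a trailing newline keeps records
# separated when they are later batched and copied into the warehouse.
firehose.put_record(
    DeliveryStreamName=DELIVERY_STREAM,
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```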
Streaming is a “must-have” solution for enterprise data because it processes data as it arrives, enabling real-time analytics.
In our use case, we can set up an ELT streaming data pipeline into AWS Redshift. Amazon Kinesis Data Firehose offers this kind of seamless integration: streaming data is loaded directly into a data warehouse table. The data can then be transformed to create reports, with Amazon QuickSight as a BI tool, for example.
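In the article this integration is declared in the CloudFormation template; as a rough boto3 equivalent, a delivery stream with a Redshift destination might look like the sketch below. All ARNs, the JDBC URL, table name, and credentials are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Placeholders for illustration; in the real setup these values would come
# from the CloudFormation stack (IAM role, Redshift cluster, staging bucket).
firehose.create_delivery_stream(
    DeliveryStreamName="events-to-redshift",
    DeliveryStreamType="DirectPut",
    RedshiftDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
        "ClusterJDBCURL": "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev",
        "CopyCommand": {
            "DataTableName": "events",
            "CopyOptions": "FORMAT AS JSON 'auto'",
        },
        "Username": "awsuser",
        "Password": "replace-me",
        # Firehose stages records in S3 first, then issues a COPY into Redshift.
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
            "BucketARN": "arn:aws:s3:::my-firehose-staging-bucket",
            "CompressionFormat": "UNCOMPRESSED",
        },
    },
)
```

Once records land in the `events` table, the transformation step can run as SQL inside Redshift, and the resulting tables can back a QuickSight dashboard.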