In this article, I will talk about one of the most popular data pipeline design patterns — event streaming. Among other benefits, it enables lightning-fast data analytics, letting us build reporting dashboards that update in real time. I will demonstrate how this can be achieved by building a streaming data pipeline with AWS Kinesis and Redshift, which can be deployed in just a few steps using infrastructure as code. We will use AWS CloudFormation to describe our data platform architecture and simplify deployment.
Imagine that, as a data engineer, you are tasked with creating a data pipeline that connects server event streams to a data warehouse solution (Redshift) in order to transform the data and build an analytics dashboard.
What is a data pipeline?
It is a sequence of data processing steps. Because the stages are logically connected by the flow of data, each stage produces an output that serves as the input for the following stage.
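To make this idea concrete, here is a minimal Python sketch of three chained stages; the stage names and sample events are placeholders for illustration, not part of any AWS service.

```python
# Toy illustration of pipeline stages: each stage's output feeds the next one.

def extract():
    # Pretend these are raw server events.
    return [{"user_id": 1, "event": "click"}, {"user_id": 2, "event": "view"}]

def transform(events):
    # Keep only click events and add a derived field.
    return [{**e, "is_click": True} for e in events if e["event"] == "click"]

def load(rows):
    # In a real pipeline this would write to a warehouse table;
    # here we simply print the rows.
    for row in rows:
        print(row)

# The output of each stage is the input of the following stage.
load(transform(extract()))
```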
I previously wrote about it in this article:
For example, event data can be generated by a source at the back end and sent to an event stream built with Kinesis Data Firehose or a Kafka topic. That stream can then feed a number of different consumers or destinations.
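As a sketch of the producer side, the snippet below uses boto3 to push a single event into a Kinesis Data Firehose delivery stream; the stream name and event payload are hypothetical.

```python
import json
import boto3

# Hypothetical delivery stream name; replace with the one created in your stack.
DELIVERY_STREAM = "my-event-stream"

firehose = boto3.client("firehose")

event = {"user_id": 42, "event": "page_view", "ts": "2023-01-01T00:00:00Z"}

# Firehose expects the payload as bytes; a trailing newline keeps records
# separated when they are later batched and copied into the warehouse.
firehose.put_record(
    DeliveryStreamName=DELIVERY_STREAM,
    Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
)
```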
Streaming is a “must-have” solution for enterprise data because it processes data as it arrives, enabling real-time analytics.
In our use case, we can set up an ELT streaming data pipeline into AWS Redshift. Amazon Kinesis Data Firehose offers this kind of seamless integration: streaming data is loaded directly into a data warehouse table. The data can then be transformed to create reports, with Amazon QuickSight as a BI tool, for example.
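In the article this integration is declared in the CloudFormation template; as a rough boto3 equivalent, a delivery stream with a Redshift destination might look like the sketch below. All ARNs, the JDBC URL, table name, and credentials are placeholders.

```python
import boto3

firehose = boto3.client("firehose")

# Placeholders for illustration; in the real setup these values would come
# from the CloudFormation stack (IAM role, Redshift cluster, staging bucket).
firehose.create_delivery_stream(
    DeliveryStreamName="events-to-redshift",
    DeliveryStreamType="DirectPut",
    RedshiftDestinationConfiguration={
        "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
        "ClusterJDBCURL": "jdbc:redshift://my-cluster.abc123.eu-west-1.redshift.amazonaws.com:5439/dev",
        "CopyCommand": {
            "DataTableName": "events",
            "CopyOptions": "FORMAT AS JSON 'auto'",
        },
        "Username": "awsuser",
        "Password": "replace-me",
        # Firehose stages records in S3 first, then issues a COPY into Redshift.
        "S3Configuration": {
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-role",
            "BucketARN": "arn:aws:s3:::my-firehose-staging-bucket",
            "CompressionFormat": "UNCOMPRESSED",
        },
    },
)
```

Once records land in the `events` table, the transformation step can run as SQL inside Redshift, and the resulting tables can back a QuickSight dashboard.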