I recently needed a data warehouse tool for a new data project. This story is about how I built it from scratch and organized everything in it. Designing a data platform is not a trivial task, and modern data warehouse solutions are often at the center of its architecture. They provide robust data governance features, simplified data querying using ANSI SQL and enhanced data modelling capabilities. Organizing everything inside, i.e. data environments, tests, naming conventions, databases, schemas and tables, can be challenging due to the high number of data sources and the complexity of the required transformations. This story might be useful for beginner and intermediate-level users who would like to learn advanced data warehousing techniques. I would also like to hear from seasoned data practitioners what they think about data warehouse design and how they would typically organize everything inside.
Designing a data platform
As a data engineer, I design data pipelines every day. Pipelines are what a modern data platform consists of, and the platform must be cost-effective, scalable and easy to maintain in the long run. Designing pipelines for data-intensive applications is always challenging, and a modern data warehouse (DWH) aims to simplify and enhance this process by providing easy access to data, better data governance capabilities and easy-to-maintain data transformations required for analytics and business intelligence.
It always makes sense to use a DWH in our data platform when users would like to access and explore data themselves and there is a business requirement for reporting. Modern data warehouses simplify data access and data governance, and I believe they are an integral part of any modern data platform. I previously raised this discussion here [1]:
I chose to use the data lake as a permanent landing area and to stage data before I actually load it into the data warehouse. Cloud service providers offer cloud storage…