Add One Line of SQL to Optimise Your BigQuery Tables | by Matt Chapman | Dec, 2023

Clustering: A simple way to group similar rows and prevent unnecessary data processing

In my previous article, I explained how to optimise SQL queries using partitioning.

Now, I’m writing the sequel! (Dad joke, anyone?)

This article will look at clustering: another powerful optimisation technique you can use in BigQuery. Like partitioning, clustering can help you write more performant queries that are quicker and cheaper to run. If you want to develop your SQL toolkit and build those higher-level Data Science skills, this is a great place to start.

In BigQuery, a clustered table is a table that keeps similar rows grouped together in physical “blocks”.

For example, picture a table called user_signups that keeps track of all the people registering an account on a fictitious website. It’s got four columns:

registration_date: the date on which the user created an account
country: the country where the user is based
tier: the user’s plan (“Free” or “Paid”)
username: the user’s username

If we wanted, we could cluster the table by country so that users from the same country are stored near each other in the table.
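As a rough sketch of what that could look like in BigQuery SQL: the dataset name `my_dataset` and the column types below are assumptions made for illustration (the article only names the columns), and the `CLUSTER BY` clause is the one line that asks BigQuery to group rows by country.

```sql
-- Hypothetical dataset name and assumed column types, for illustration only.
CREATE TABLE my_dataset.user_signups (
  registration_date DATE,   -- the date on which the user created an account
  country STRING,           -- the country where the user is based
  tier STRING,              -- the user's plan ("Free" or "Paid")
  username STRING           -- the user's username
)
CLUSTER BY country;         -- keep rows with the same country in nearby blocks

-- A query that filters on the clustering column gives BigQuery the chance
-- to skip blocks that cannot contain matching rows.
SELECT username, tier
FROM my_dataset.user_signups
WHERE country = 'Lebanon';  -- example filter value, not from the article
```

How much data a filter like this actually skips depends on the size of the table and how the values are distributed, so treat the saving as something to measure rather than assume.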
