Super powerful functions in BigQuery — MAX_BY and MIN_BY

Jash Bhatt
5 min readOct 24, 2023

Introduction

BigQuery is a powerful and scalable data warehouse that lets you run fast and complex queries on massive amounts of data. But sometimes, you may need to perform some simple operations that are not so easy to express in SQL. For example, how do you get the value from one column based on the maximum or minimum value of another column?

You may have encountered this problem before and tried to solve it using subqueries, joins, or window functions. But these solutions can be cumbersome, inefficient, or hard to read. Wouldn’t it be nice if there was a simpler and more elegant way to do this?

Well, guess what? There is! BigQuery recently introduced two new functions that can help you with this task: max_by and min_by. These functions are essentially shortcuts for ANY_VALUE(x HAVING MAX y) and ANY_VALUE(x HAVING MIN y), respectively. And if you have no idea what ANY_VALUE or HAVING are, don’t worry, I’ll explain them later.

What are max_by and min_by functions?

The max_by and min_by functions are aggregate functions that return the value from one column based on the maximum or minimum value of another column within a group. The syntax for these functions is:

max_by(x, y) [OVER over_clause]
min_by(x, y) [OVER over_clause]

where x and y are expressions that can be any data type, and over_clause is an optional clause that specifies a window frame for the aggregation. If you omit the over_clause, the functions will operate on the entire input table or subquery.

Let’s see some examples of how these functions work.

Examples of max_by and min_by functions

Suppose you have a table called fruits that contains information about the sales of different fruits in different stores. The table looks like this:

store fruit   sold 
A apple 20
A banana 15
A pear 25
B apple 30
B banana 10
B pear 20

Now, let’s say you want to find out which fruit sold the most in each store. You could use a subquery with a window function to rank the fruits by sales within each store, and then filter out the ones with the highest rank. But that would be too much work, right? Instead, you can use the max_by function to get the same result with a single line of code:

SELECT 
store,
max_by(fruit, sold) AS best_seller
FROM fruits GROUP BY store;

This query will return:

store best_seller
A pear
B apple

As you can see, the max_by function returns the fruit that has the maximum value of sold for each store. It’s like saying: “Give me the fruit that has the highest sales in each store”.

Similarly, if you want to find out which fruit sold the least in each store, you can use the min_by function:

SELECT store, min_by(fruit, sold) AS worst_seller FROM fruits GROUP BY store;

This query will return:

store worst_seller
A banana
B banana

The min_by function returns the fruit that has the minimum value of sold for each store. It’s like saying: “Give me the fruit that has the lowest sales in each store”.

How do max_by and min_by functions work under the hood?

If you are curious about how these functions work under the hood, here is a brief explanation. The max_by and min_by functions are actually synonyms for ANY_VALUE(x HAVING MAX y) and ANY_VALUE(x HAVING MIN y), respectively. The ANY_VALUE function is another aggregate function that returns any value from a group of values. The HAVING clause is a filter that applies to the group of values after they are aggregated.

So, when you write max_by(x, y), you are actually saying: “Give me any value of x from the group of values that have the maximum value of y”.

And when you write min_by(x, y), you are saying: “Give me any value of x from the group of values that have the minimum value of y”.

But wait, what if there are multiple values of x that have the same maximum or minimum value of y? Which one will be returned by the max_by or min_by function? Well, the answer is: it depends. The functions are nondeterministic, which means that they can return any value from the group that satisfies the condition. There is no guarantee that they will return the same value every time you run the query. This is because BigQuery does not store or process the data in a fixed order, and it may use different algorithms or optimizations to execute the query. So, if you need a consistent and predictable result, you should not rely on these functions alone. You should either use a more specific expression to break ties, or use another function like ARRAY_AGG to get all the possible values.

Why should you use max_by and min_by functions?

Now that you know what these functions are and how they work, you may wonder why you should use them in your queries. Here are some reasons why these functions can make your life easier:

  • They are simple and concise. You can write a single line of code instead of a complex subquery or join.
  • They are expressive and intuitive. You can clearly state what you want to get from your data without worrying about how to get it.
  • They are flexible and powerful. You can use them with any data type and any window frame.
  • They are fun and funny. You can use them to create some interesting and humorous queries, such as finding out which celebrity has the most followers on Twitter, or which movie has the highest rating on IMDb.

Conclusion

In this story, we learned about the max_by and min_by functions in BigQuery, which are new aggregate functions that return the value from one column based on the maximum or minimum value of another column within a group. We saw some examples of how to use these functions with different data types and window frames, and how they work under the hood. We also discussed some reasons why these functions can make your life easier when writing SQL queries.

Big thanks for diving into the world of newest BigQuery functions with me! I hope you found this story both informative and entertaining. To keep the data-driven party going, do clap and follow me here on Medium and on LinkedIn. I’m serving up more BigQuery insights, tips, and even a dash of SQL humor.🚀📊

And if you have any questions or feedback, feel free to leave a comment below.

Happy querying! 😊

--

--