Queries with HAVING

The SELECT statement has two different sections which are used for conditions: WHERE and HAVING. The WHERE section is used more often and contains conditions which will be applied to each row. If a query contains aggregate functions, such as MAX or SUM, the SELECT statement may also contain GROUP BY and HAVING sections. The conditions of the HAVING section are applied after grouping the SQL query results. Typically the conditions of the HAVING section always contain aggregate functions, while conditions in the WHERE section may only contain aggregate functions inside a subquery.

When you write a query which contains aggregate functions, Pony needs to determine if the resulting SQL will contain the GROUP BY and HAVING sections and where it should put each condition from the Python query. If a condition contains an aggregate function, Pony places the condition into the HAVING section. Otherwise it places the condition into the WHERE section.

Consider the following query, which returns the tuples (Group, count_of_students):

  1. select((s.group, count(s)) for s in Student
  2. if s.group.dept.number == 44 and avg(s.gpa) > 4)

In this query we have two conditions. The first condition is s.group.dept.number == 44. Since it doesn`t include an aggregate function, Pony will place this condition into the WHERE section. The second condition avg(s.gpa) > 4 contains the aggregate function avg and will be placed into the HAVING section.

Another question is what columns Pony should add to the GROUP BY section. According to the SQL standard, any non-aggregated column which placed into the SELECT statement should be added to the GROUP BY section too. Let’s consider the following query:

  1. SELECT A, B, C, SUM(D), MAX(E), COUNT(F)
  2. FROM T1
  3. WHERE ...
  4. GROUP BY ...
  5. HAVING ...

According to the SQL standard, we need to include the columns A, B and C into the GROUP BY section, because these columns appear in the SELECT list and don’t wrapped with any aggregate function. Pony does exactly this. If your aggregated Pony query returns a tuple with several expressions, any non-aggregated expression will be placed into the GROUP BY section. Let’s consider the same Pony query again:

  1. select((s.group, count(s)) for s in Student
  2. if s.group.dept.number == 44 and avg(s.gpa) > 4)

This query returns the tuples (Group, count_of_students). The first element of the tuple, the Group instance, is not aggregated, so it will be placed into the GROUP BY section:

  1. SELECT "s"."group", COUNT(DISTINCT "s"."id")
  2. FROM "Student" "s", "Group" "group-1"
  3. WHERE "group-1"."dept" = 44
  4. AND "s"."group" = "group-1"."number"
  5. GROUP BY "s"."group"
  6. HAVING AVG("s"."gpa") > 4

The s.group expression was placed into the GROUP BY section, and the condition avg(s.gpa) > 4 was placed into the HAVING section of the query.

Sometimes the condition which should be placed into the HAVING section contains some non-aggregated columns. Such columns will be added to the GROUP BY section, because according to the SQL standard it is forbidden to use a non-aggregated column inside the HAVING section, if it was not added to the GROUP BY list.

Another example:

  1. select((item.order, item.order.total_price,
  2. sum(item.price * item.quantity))
  3. for item in OrderItem
  4. if item.order.total_price < sum(item.price * item.quantity))

This query has the following condition: item.order.total_price < sum(item.price * item.quantity), which contains an aggregate function and should be added to the HAVING section. But the part item.order.total_price is not aggregated. Hence, it will be added to the GROUP BY section in order to satisfy the SQL requirements.