PROGRAMMING LANGUAGES

What’s faster, COUNT(*) or COUNT(*) with LIMIT in SQL? Let’s check

In a previous blog post, we’ve advertised the use of SQL EXISTS rather than COUNT(*) to check for existence of a value in SQL.

I.e. to check if in the Sakila database, actors called WAHLBERG have played in any films, instead of:

SELECT count(*)
FROM actor a
JOIN film_actor fa USING (actor_id)
WHERE a.last_name = 'WAHLBERG'

Do this:

SELECT EXISTS (
  SELECT 1 FROM actor a
  JOIN film_actor fa USING (actor_id)
  WHERE a.last_name = 'WAHLBERG'
)

(Depending on your dialect you may require a FROM DUAL clause, or a CASE expression if BOOLEAN types aren’t supported).

Check for multiple rows

But what if you want to check if there are at least 2 (or N) rows? In that case, you cannot use EXISTS, but have to revert to using COUNT(*). However, instead of just counting all matches, why not add a LIMIT clause as well? So, if you want to check if actors called WAHLBERG have played in at least 2 films, instead of this:

SELECT (
  SELECT count(*)
  FROM actor a
  JOIN film_actor fa USING (actor_id)
  WHERE a.last_name = 'WAHLBERG'
) >= 2

Write this:

SELECT (
  SELECT count(*)
  FROM (
    SELECT *
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = 'WAHLBERG'
    LIMIT 2
  ) t
) >= 2

In other words:

  1. Run the join query with a LIMIT 2 in a derived table
  2. Then COUNT(*) the rows (at most 2) from that derived table
  3. Finally, check if the count is high enough

Does it matter?

In principle, the optimiser could have figured this out itself, especially because we used a constant to compare the COUNT(*) value with. But did it really apply the transformation?

Let’s check execution plans and benchmark the query on various RDBMS.

PostgreSQL 15

No LIMIT

Result  (cost=14.70..14.71 rows=1 width=1) (actual time=0.039..0.039 rows=1 loops=1)
InitPlan 1 (returns $1)
-> Aggregate (cost=14.69..14.70 rows=1 width=8) (actual time=0.037..0.037 rows=1 loops=1)
-> Nested Loop (cost=0.28..14.55 rows=55 width=0) (actual time=0.009..0.032 rows=56 loops=1)
-> Seq Scan on actor a (cost=0.00..4.50 rows=2 width=4) (actual time=0.006..0.018 rows=2 loops=1)
Filter: ((last_name)::text="WAHLBERG"::text)
Rows Removed by Filter: 198
-> Index Only Scan using film_actor_pkey on film_actor fa (cost=0.28..4.75 rows=27 width=4) (actual time=0.003..0.005 rows=28 loops=2)
Index Cond: (actor_id = a.actor_id)
Heap Fetches: 0

With LIMIT

Result  (cost=0.84..0.85 rows=1 width=1) (actual time=0.023..0.024 rows=1 loops=1)
InitPlan 1 (returns $1)
-> Aggregate (cost=0.83..0.84 rows=1 width=8) (actual time=0.021..0.022 rows=1 loops=1)
-> Limit (cost=0.28..0.80 rows=2 width=240) (actual time=0.016..0.018 rows=2 loops=1)
-> Nested Loop (cost=0.28..14.55 rows=55 width=240) (actual time=0.015..0.016 rows=2 loops=1)
-> Seq Scan on actor a (cost=0.00..4.50 rows=2 width=4) (actual time=0.008..0.008 rows=1 loops=1)
Filter: ((last_name)::text="WAHLBERG"::text)
Rows Removed by Filter: 1
-> Index Only Scan using film_actor_pkey on film_actor fa (cost=0.28..4.75 rows=27 width=4) (actual time=0.005..0.005 rows=2 loops=1)
Index Cond: (actor_id = a.actor_id)
Heap Fetches: 0

To understand the difference, focus on these rows:

Before:

Nested Loop  (cost=0.28..14.55 rows=55 width=0) (actual time=0.009..0.032 rows=56 loops=1)

After:

Nested Loop  (cost=0.28..14.55 rows=55 width=240) (actual time=0.015..0.016 rows=2 loops=1)

In both cases, the estimated number of rows produced by the join is 55 (i.e. all WAHLBERGs are expected to have played in a total of 55 films according to statistics).

But int he second execution the actual rows value is much lower, because we only needed 2 rows before we could stop execution of the operation, because of the LIMIT above.

Benchmark results:

Using our recommended SQL benchmarking technique that compares running two queries many times (5 runs x 2000 executions in this case) on the same instance directly from within the RDBMS using procedural languages (to avoid network latency, etc.), we get these results:

RUN 1, Statement 1: 2.61927
RUN 1, Statement 2: 1.01506
RUN 2, Statement 1: 2.47193
RUN 2, Statement 2: 1.00614
RUN 3, Statement 1: 2.63533
RUN 3, Statement 2: 1.14282
RUN 4, Statement 1: 2.55228
RUN 4, Statement 2: 1.00000 -- Fastest run is 1
RUN 5, Statement 1: 2.53801
RUN 5, Statement 2: 1.02363

The fastest run is 1 units of time, slower runs run in multiples of that time. The complete COUNT(*) query is consistently and significantly slower than the LIMIT query.

Both the plans and benchmark results speak for themselves.

Oracle 23c

With Oracle 23c, we can finally use BOOLEAN types and omit DUAL, yay!

No FETCH FIRST:

SQL_ID  40yy0tskvs1zw, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/ ( SELECT count(*)
FROM actor a JOIN film_actor fa USING (actor_id)
WHERE a.last_name="WAHLBERG" ) >= 2

Plan hash value: 2539243977

---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 6 |
| 2 | NESTED LOOPS | | 1 | 55 | 56 |00:00:00.01 | 6 |
| 3 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | 2 | 2 |00:00:00.01 | 2 |
|* 4 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 2 |00:00:00.01 | 1 |
|* 5 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 2 | 27 | 56 |00:00:00.01 | 4 |
| 6 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

4 - access("A"."LAST_NAME"='WAHLBERG')
5 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")

With FETCH FIRST:

SQL_ID  f88t1r0avnr7b, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/( SELECT count(*)
from ( select * FROM actor a JOIN
film_actor fa USING (actor_id) WHERE a.last_name =
'WAHLBERG' FETCH FIRST 2 ROWS ONLY ) t )
>= 2

Plan hash value: 4019277616

------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 6 | | | |
|* 2 | VIEW | | 1 | 2 | 2 |00:00:00.01 | 6 | | | |
|* 3 | WINDOW BUFFER PUSHED RANK | | 1 | 55 | 2 |00:00:00.01 | 6 | 2048 | 2048 | 2048 (0)|
| 4 | NESTED LOOPS | | 1 | 55 | 56 |00:00:00.01 | 6 | | | |
| 5 | TABLE ACCESS BY INDEX ROWID| ACTOR | 1 | 2 | 2 |00:00:00.01 | 2 | | | |
|* 6 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 2 |00:00:00.01 | 1 | | | |
|* 7 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 2 | 27 | 56 |00:00:00.01 | 4 | | | |
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 | | | |
------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("from$_subquery$_005"."rowlimit_$$_rownumber"<=2)
3 - filter(ROW_NUMBER() OVER ( ORDER BY NULL )<=2)
6 - access("A"."LAST_NAME"='WAHLBERG')
7 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")

Uh oh, this doesn’t look better. The NESTED LOOPS operation doesn’t seem to have gotten the memo from the WINDOW BUFFER PUSHED RANK operation about the query being aborted. The E-Rows (estimated) and A-Rows (actual) values still match, so the JOIN seems to be executed completely.

For good measure, let’s also try:

With ROWNUM:

I had hoped that this undead syntax belongs only to distant memories after Oracle 12c introduced the standard SQL FETCH syntax, but let’s try what happens with this alternative:

SELECT (
  SELECT count(*)
  FROM (
    SELECT *
    FROM actor a
    JOIN film_actor fa USING (actor_id)
    WHERE a.last_name = 'WAHLBERG'
    AND ROWNUM <= 2 -- Yuck, but it works
  ) t
) >= 2

The plan is now:

SQL_ID  6r7w9d0425j6c, child number 0
-------------------------------------
SELECT /*+GATHER_PLAN_STATISTICS*/( SELECT count(*)
from ( select * FROM actor a JOIN
film_actor fa USING (actor_id) WHERE a.last_name =
'WAHLBERG' AND ROWNUM <= 2 ) t ) >= 2

Plan hash value: 1271700124

-----------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 0 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 4 |
| 2 | VIEW | | 1 | 2 | 2 |00:00:00.01 | 4 |
|* 3 | COUNT STOPKEY | | 1 | | 2 |00:00:00.01 | 4 |
| 4 | NESTED LOOPS | | 1 | 55 | 2 |00:00:00.01 | 4 |
| 5 | TABLE ACCESS BY INDEX ROWID BATCHED| ACTOR | 1 | 2 | 1 |00:00:00.01 | 2 |
|* 6 | INDEX RANGE SCAN | IDX_ACTOR_LAST_NAME | 1 | 2 | 1 |00:00:00.01 | 1 |
|* 7 | INDEX RANGE SCAN | IDX_FK_FILM_ACTOR_ACTOR | 1 | 27 | 2 |00:00:00.01 | 2 |
| 8 | FAST DUAL | | 1 | 1 | 1 |00:00:00.01 | 0 |
-----------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - filter(ROWNUM<=2)
6 - access("A"."LAST_NAME"='WAHLBERG')
7 - access("A"."ACTOR_ID"="FA"."ACTOR_ID")

Now, that’s what I’m talking about. The NESTED LOOPS operation has a A-Rows value of 2, as it should have. The COUNT STOPKEY operation knows how to tell its successors to behave.

Benchmark results:

Run 1, Statement 1 : 1.9564
Run 1, Statement 2 : 2.98499
Run 1, Statement 3 : 1.07291
Run 2, Statement 1 : 1.69192
Run 2, Statement 2 : 2.66905
Run 2, Statement 3 : 1.01144
Run 3, Statement 1 : 1.71051
Run 3, Statement 2 : 2.63831
Run 3, Statement 3 : 1 -- Fastest run is 1
Run 4, Statement 1 : 1.61544
Run 4, Statement 2 : 2.67334
Run 4, Statement 3 : 1.00786
Run 5, Statement 1 : 1.72981
Run 5, Statement 2 : 2.77913
Run 5, Statement 3 : 1.02716

Whoopsies. Indeed, it appears that the FETCH FIRST 2 ROWS ONLY clause is bad in this case. It even made performance worse than if we omit it and count the complete result. However, the ROWNUM filter helped greatly, just like before with PostgreSQL’s LIMIT. I would consider this an optimiser bug in Oracle. FETCH FIRST should be an operation that can be pushed down to various other operations

MySQL

No LIMIT:

-> Rows fetched before execution  (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Select #2 (subquery in projection; run only once)
-> Aggregate: count(0) (cost=1.35 rows=1) (actual time=0.479..0.479 rows=1 loops=1)
-> Nested loop inner join (cost=1.15 rows=2) (actual time=0.077..0.110 rows=56 loops=1)
-> Covering index lookup on a using idx_actor_last_name (last_name="WAHLBERG") (cost=0.45 rows=2) (actual time=0.059..0.061 rows=2 loops=1)
-> Covering index lookup on fa using PRIMARY (actor_id=a.actor_id) (cost=0.30 rows=1) (actual time=0.011..0.021 rows=28 loops=2)

With LIMIT:

-> Rows fetched before execution  (cost=0.00..0.00 rows=1) (actual time=0.000..0.000 rows=1 loops=1)
-> Select #2 (subquery in projection; run only once)
-> Aggregate: count(0) (cost=4.08..4.08 rows=1) (actual time=0.399..0.400 rows=1 loops=1)
-> Table scan on t (cost=2.62..3.88 rows=2) (actual time=0.394..0.394 rows=2 loops=1)
-> Materialize (cost=1.35..1.35 rows=2) (actual time=0.033..0.033 rows=2 loops=1)
-> Limit: 2 row(s) (cost=1.15 rows=2) (actual time=0.024..0.025 rows=2 loops=1)
-> Nested loop inner join (cost=1.15 rows=2) (actual time=0.024..0.024 rows=2 loops=1)
-> Covering index lookup on a using idx_actor_last_name (last_name="WAHLBERG") (cost=0.45 rows=2) (actual time=0.014..0.014 rows=1 loops=1)
-> Covering index lookup on fa using PRIMARY (actor_id=a.actor_id) (cost=0.30 rows=1) (actual time=0.008..0.008 rows=2 loops=1)

We again get the Nested loop inner join row with the wanted difference:

Before:

Nested loop inner join  (cost=1.15 rows=2) (actual time=0.077..0.110 rows=56 loops=1)

After:

Nested loop inner join  (cost=1.15 rows=2) (actual time=0.024..0.024 rows=2 loops=1)

Benchmark results:

Again, the LIMIT is helpful, though the difference is less impressive:

0	1	1.2933
0 2 1.0089
1 1 1.2489
1 2 1.0000 -- Fastest run is 1
2 1 1.2444
2 2 1.0933
3 1 1.2133
3 2 1.0178
4 1 1.2267
4 2 1.0178

SQL Server

No LIMIT:

  |--Compute Scalar(DEFINE:([Expr1006]=CASE WHEN [Expr1004]>=(2) THEN (1) ELSE (0) END))
|--Compute Scalar(DEFINE:([Expr1004]=CONVERT_IMPLICIT(int,[Expr1010],0)))
|--Stream Aggregate(DEFINE:([Expr1010]=Count(*)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([a].[actor_id]))
|--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a]), WHERE:([sakila].[dbo].[actor].[last_name] as [a].[last_name]='WAHLBERG'))
|--Index Seek(OBJECT:([sakila].[dbo].[film_actor].[PK__film_act__086D31FF6BE587FC] AS [fa]), SEEK:([fa].[actor_id]=[sakila].[dbo].[actor].[actor_id] as [a].[actor_id]) ORDERED FORWARD)

With LIMIT:

  |--Compute Scalar(DEFINE:([Expr1007]=CASE WHEN [Expr1005]>=(2) THEN (1) ELSE (0) END))
|--Compute Scalar(DEFINE:([Expr1005]=CONVERT_IMPLICIT(int,[Expr1011],0)))
|--Stream Aggregate(DEFINE:([Expr1011]=Count(*)))
|--Top(TOP EXPRESSION:((2)))
|--Nested Loops(Inner Join, OUTER REFERENCES:([a].[actor_id]))
|--Table Scan(OBJECT:([sakila].[dbo].[actor] AS [a]), WHERE:([sakila].[dbo].[actor].[last_name] as [a].[last_name]='WAHLBERG'))
|--Index Seek(OBJECT:([sakila].[dbo].[film_actor].[PK__film_act__086D31FF6BE587FC] AS [fa]), SEEK:([fa].[actor_id]=[sakila].[dbo].[actor].[actor_id] as [a].[actor_id]) ORDERED FORWARD)

The text version doesn’t indicate actual rows, even with SHOWPLAN_ALL, so let’s just look at what happens in the benchmark:

Benchmark results:

Run 1, Statement 1: 1.92118
Run 1, Statement 2: 1.00000 -- Fastest run is 1
Run 2, Statement 1: 1.95567
Run 2, Statement 2: 1.01724
Run 3, Statement 1: 1.91379
Run 3, Statement 2: 1.01724
Run 4, Statement 1: 1.93842
Run 4, Statement 2: 1.04926
Run 5, Statement 1: 1.95567
Run 5, Statement 2: 1.03448

And again, an impressive 2x improvement for this particular query

Conclusion

Just as with our previous blog post about COUNT(*) vs EXISTS, the seemingly obvious is true again in this case where we want to check if N or more rows exist in a query. If we blindly count all the rows, then we’ve seen much worse performance than if we helped the optimiser with a LIMIT or TOP clause, or ROWNUM in Oracle.

Technically, an optimiser could have detected this optimisation itself, but as our previous article about optimisations that don’t depend on the cost model has shown, optimisers don’t always do everything they can.

Unfortunately, in Oracle’s case, the standard SQL syntax made things slower (in this benchmark). This doesn’t mean it’s generally slower for all cases, but it’s something worth looking out for. There are still cases where ancient ROWNUM clause is better optimised. This is one of those cases.

Whether syntax X is faster than syntax Y can be shown by studying execution plans (not just with estimates, but with actual values), or by running a simple SQL benchmark. As always with benchmarks, be careful when interpreting results, double check, try more alternatives.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button