How to implement FILTER semantics with Oracle JSON aggregate functions

testv September 16, 2024

0 4 2 minutes read

A cool standard SQL:2003 feature is the aggregate FILTER clause, which is supported natively by at least these RDBMS:

ClickHouse
CockroachDB
DuckDB
Firebird
H2
HSQLDB
PostgreSQL
SQLite
Trino
YugabyteDB

The following aggregate function computes the number of rows per group which satifsy the FILTER clause:

SELECT
  COUNT(*) FILTER (WHERE BOOK.TITLE LIKE 'A%'),
  COUNT(*) FILTER (WHERE BOOK.TITLE LIKE 'B%'),
  ...
FROM BOOK

This is useful for pivot style queries, where multiple aggregate values are computed in one go. For most basic types of aggregate function, it can be emulated simply by using CASE expressions, because standard aggregate functions ignore NULL values when aggregating. The following is equivalent to the above, in all RDBMS:

SELECT
  COUNT(CASE WHEN BOOK.TITLE LIKE 'A%' THEN 1 END),
  COUNT(CASE WHEN BOOK.TITLE LIKE 'B%' THEN 1 END),
  ...
FROM BOOK

What if we’re aggregating JSON?

Things are a bit different when aggregating JSON. Look at the following example, where we don’t want to count the books, but list them in a JSON array, or object:

SELECT
  JSON_ARRAYAGG(BOOK.TITLE)
    FILTER (WHERE BOOK.LANGUAGE_ID = 1),
  JSON_OBJECTAGG('id-' || BOOK.ID, BOOK.TITLE)
    FILTER (WHERE BOOK.LANGUAGE_ID = 2),
  ...
FROM BOOK

Things are different with these collection aggregate functions, because NULL values are actually interesting there, so we want to list them in the resulting JSON document. Assuming there are books with a NULL title, we might get:

|JSON_ARRAYAGG                |JSON_OBJECTAGG                      |
|-----------------------------|------------------------------------|
|["1984", "Animal Farm", null]| "id-4" : "Brida", "id-17" : null |

This makes emulating the FILTER clause (e.g. on Oracle) much harder, because we cannot just use ABSENT ON NULL like this:

SELECT
  JSON_ARRAYAGG(
    CASE WHEN T_BOOK.LANGUAGE_ID = 1 THEN T_BOOK.TITLE END 
    ABSENT ON NULL
  ),
  JSON_OBJECTAGG(
    'id-' || T_BOOK.ID, 
    CASE WHEN T_BOOK.LANGUAGE_ID = 2 THEN T_BOOK.TITLE END
    ABSENT ON NULL
  )
FROM T_BOOK;

Because now, the legitimate null titled books are missing and we’re getting this instead:

|JSON_ARRAYAGG         |JSON_OBJECTAGG  |
|----------------------|----------------|
|["1984","Animal Farm"]|"id-4":"Brida"|

We cannot use NULL ON NULL either, because that would just turn the FILTER semantics into a mapping semantics, and produce too many values:

|JSON_ARRAYAGG                        |JSON_OBJECTAGG                                                   |
|-------------------------------------|-----------------------------------------------------------------|
|["1984","Animal Farm",null,null,null]|"id-1":null,"id-4":"Brida","id-3":null,"id-2":null,"id-17":null|

E.g. while id-3 and id-2 values are NULL because the FILTER emulating CASE expression maps them to NULL, the id-17 value really has a NULL title.

Workaround: Wrap data in an array

As a workaround, we can:

Wrap legitimate data into an array
Apply ABSENT ON NULL to remove rows due to the FILTER emulation
Unwrap data again from the array

For the unwrapping, we’re going to be using JSON_TRANSFORM:

SELECT
  JSON_TRANSFORM(
    JSON_ARRAYAGG(
      CASE 
        WHEN T_BOOK.LANGUAGE_ID = 1 

        -- Wrap legitimate data into an array, including nulls
        THEN JSON_ARRAY(T_BOOK.TITLE NULL ON NULL)
      END 

      -- Remove NULLs due to FILTER emulation
      ABSENT ON NULL
    ),

    -- Unwrap data gain from the array
    NESTED PATH '$[*]' (REPLACE '@' = PATH '@[0]')
  ),

  JSON_TRANSFORM(
    JSON_OBJECTAGG(
      'id-' || T_BOOK.ID, 
      CASE 
        WHEN T_BOOK.LANGUAGE_ID = 2 

        -- Wrap legitimate data into an array, including nulls
        THEN JSON_ARRAY(T_BOOK.TITLE NULL ON NULL)
      END

      -- Remove NULLs due to FILTER emulation
      ABSENT ON NULL
    ),

    -- Unwrap data gain from the array
    NESTED PATH '$.*' (REPLACE '@' = PATH '@[0]')
  )
FROM T_BOOK;