JAVA

Using jOOQ’s Implicit Join From Within the JOIN .. ON Clause – Java, SQL and jOOQ.

testv September 25, 2024

0 13 4 minutes read

Starting with jOOQ 3.11, type safe implicit JOIN have been made available, and they’ve been enhanced to be supported also in DML statements in jOOQ 3.17. Today, I’d like to focus on a somewhat weird but really powerful use-case for implicit JOIN, when joining additional tables from within an explicit JOIN‘s ON clause.

The use case

The jOOQ code generator makes heavy use of jOOQ when querying the various dictionary views. In PostgreSQL, most queries go to the SQL standard information_schema, but every now and then, the standard meta data is insufficient, and we also have to query the pg_catalog, which is more complete but also much more technical.

For a lot of information_schema views, there exists an almost equivalent pg_catalog table which contains the same information. For example:

`information_schema`	`pg_catalog`
`schemata`	`pg_namespace`
`tables` or `user_defined_types`	`pg_class`
`columns` or `attributes`	`pg_attribute`

Interestingly, PostgreSQL being an ORDBMS, tables and user defined types are the same thing and often interchangeable in the type system, but that’s a topic for a future blog post.

The point of this blog post is that often, when querying a view like information_schema.attributes, we also have to query pg_catalog.pg_attribute to get additional data. For example, in order to find the declared array dimension of a UDT (User Defined Type) attribute, we have to access pg_catalog.pg_attribute.attndims, as this information is nowhere to be found in the information_schema. See also jOOQ feature request #252, where we’ll add support for H2 / PostgreSQL multi dimensional arrays.

So, we might have a UDT like this:

CREATE TYPE u_multidim_a AS (
  i integer[][],
  n numeric(10, 5)[][][],
  v varchar(10)[][][][]
);

The canonical SQL way to access the pg_attribute table from the attributes view is:

SELECT 
  is_a.udt_schema,
  is_a.udt_name,
  is_a.attribute_name,
  pg_a.attndims 
FROM information_schema.attributes AS is_a
  JOIN pg_attribute AS pg_a 
    ON is_a.attribute_name = pg_a.attname 
  JOIN pg_class AS pg_c
    ON is_a.udt_name = pg_c.relname 
    AND pg_a.attrelid = pg_c.oid
  JOIN pg_namespace AS pg_n
    ON is_a.udt_schema = pg_n.nspname 
    AND pg_c.relnamespace = pg_n.oid
WHERE is_a.data_type = 'ARRAY'    
ORDER BY
  is_a.udt_schema,
  is_a.udt_name,
  is_a.attribute_name,
  is_a.ordinal_position

To visualise:

                +----- udt_schema = nspname ------> pg_namespace
                |                                      ^
                |                                      |
                |                                     oid
                |                                      =
                |                                     relnamespace
                |                                      |
                |                                      v 
                +------- udt_name = relname ------> pg_class
                |                                      ^
                |                                      |
                |                                     oid
                |                                      =
                |                                     attrelid
                |                                      |
                |                                      v 
is.attributes <-+- attribute_name = attname ------> pg_attribute

And now, we can see a few of our integration test user defined types, containing multi dimensional arrays:

|udt_schema|udt_name    |attribute_name|attndims|
|----------|------------|--------------|--------|
|public    |u_multidim_a|i             |2       |
|public    |u_multidim_a|n             |3       |
|public    |u_multidim_a|v             |4       |
|public    |u_multidim_b|a1            |1       |
|public    |u_multidim_b|a2            |2       |
|public    |u_multidim_b|a3            |3       |
|public    |u_multidim_c|b             |2       |

But look at all those JOIN expressions. They’re definitely no fun. We have to spell out the entire path from pg_attribute to pg_namespace, only to make sure we’re not fetching any ambiguously named data from other UDTs or other schemata.

Using implicit joins instead

And that’s where the power of implicit JOIN come in play. What we really want to write in SQL is this:

SELECT 
  is_a.udt_schema,
  is_a.udt_name,
  is_a.attribute_name,
  pg_a.attndims 

-- This table we need
FROM information_schema.attributes AS is_a

-- And also this one
JOIN pg_attribute AS pg_a 
  ON is_a.attribute_name = pg_a.attname 

-- But the path joins from pg_attribute to pg_namespace should 
-- be implicit
  AND pg_a.pg_class.relname = is_a.udt_name
  AND pg_a.pg_class.pg_namespace.nspname = is_a.udt_schema
WHERE is_a.data_type = 'ARRAY'    
ORDER BY
  is_a.udt_schema,
  is_a.udt_name,
  is_a.attribute_name,
  is_a.ordinal_position

It’s not that much shorter, but it’s definitely very convenient to no longer have to think about how to join the different steps. Note that unlike other cases, where we used implicit joins via these paths in SELECT or WHERE, this time we’re using them from within a JOIN .. ON clause! In jOOQ, we can write:

Attributes isA = ATTRIBUTES.as("is_a");
PgAttribute pgA = PgAttribute.as("pg_a");

ctx.select(
       isA.UDT_SCHEMA,
       isA.UDT_NAME,
       isA.ATTRIBUTE_NAME,
       pgA.ATTNDIMS)
   .from(isA)
   .join(pgA)
     .on(isA.ATTRIBUTE_NAME.eq(pgA.ATTNAME))
     .and(isA.UDT_NAME.eq(pgA.pgClass().RELNAME))
     .and(isA.UDT_SCHEMA.eq(pgA.pgClass().pgNamespace().NSPNAME))
   .where(isA.DATA_TYPE.eq("ARRAY"))
   .orderBy(
       isA.UDT_SCHEMA,
       isA.UDT_NAME,
       isA.ATTRIBUTE_NAME,
       isA.ORDINAL_POSITION)
   .fetch();

The generated SQL looks slightly different from the original one, as jOOQ’s implicit JOIN algorithm will never flatten the JOIN tree in order to preserve any potential JOIN operator precedence, which is important in the event of there being LEFT JOIN, FULL JOIN or other operators present. The output looks more like this:

FROM information_schema.attributes AS is_a
  JOIN (
    pg_catalog.pg_attribute AS pg_a
      JOIN (
        pg_catalog.pg_class AS alias_70236485
          JOIN pg_catalog.pg_namespace AS alias_96617829
            ON alias_70236485.relnamespace = alias_96617829.oid
      )
        ON pg_a.attrelid = alias_70236485.oid
    )
    ON (
      is_a.attribute_name = pg_a.attname
      AND is_a.udt_name = alias_70236485.relname
      AND is_a.udt_schema = alias_96617829.nspname
    )

As you can see, the “readable” table aliases (is_a and pg_a) are the user-provided ones, whereas the “unreadable,” system generated ones (alias_70236485 and alias_96617829) are the ones originating from the implicit JOIN. And, again, it’s important that these implicit joins are embedded right where they belong, with the path root pg_a, from which we started the path expressions. That’s the only way we can retain the correct JOIN operator precedence semantics, e.g. if we had used a LEFT JOIN between is_a and pg_a

Future improvements

In the future, there might be even better JOIN paths that allow for connecting such graphs directly, because every time you have to join information_schema.attributes and pg_catalog.pg_attribute, you’ll have to repeat the same equalities on the (udt_schema, udt_name, attribute_name) tuple, and while implicit JOIN have been helpful, it’s easy to see how this can be further improved. The ideal query would be:

SELECT 
  is_a.udt_schema,
  is_a.udt_name,
  is_a.attribute_name,
  pg_a.attndims 
FROM information_schema.attributes AS is_a

-- Magic here
MAGIC JOIN pg_attribute AS pg_a 
  ON jooq_do_your_thing
WHERE is_a.data_type = 'ARRAY'    
ORDER BY
  is_a.udt_schema,
  is_a.udt_name,
  is_a.attribute_name,
  is_a.ordinal_position

But we’re not quite there yet.

Getting access to these join paths

Neither the information_schema views, nor the pg_catalog tables expose any foreign key meta data, which are a prerequisite for implicit join path expressions and other jOOQ code generation features. This isn’t a huge problem as you can specify synthetic foreign keys to the code generator, for precisely this reason. See also our previous blog post about synthetic foreign keys for information schema queries. In this case, all we need is at least this specification:

<configuration>
  <generator>
    <database>
      <syntheticObjects>
        <foreignKeys>
          <foreignKey>
            <tables>pg_attribute</tables>
            <fields><field>attrelid</field></fields>
            <referencedTable>pg_class</referencedTable>
          </foreignKey>
          <foreignKey>
            <tables>pg_class</tables>
            <fields><field>relnamespace</field></fields>
            <referencedTable>pg_namespace</referencedTable>
          </foreignKey>
        </foreignKeys>
      </syntheticObjects>
    </database>
  </generator>
</configuration>

And ta-dah, we have our JOIN paths as seen in the previous examples.