Using jOOQ’s Implicit Join From Within the JOIN .. ON Clause – Java, SQL and jOOQ.
Starting with jOOQ 3.11, type safe implicit JOIN
have been made available, and they’ve been enhanced to be supported also in DML statements in jOOQ 3.17. Today, I’d like to focus on a somewhat weird but really powerful use-case for implicit JOIN
, when joining additional tables from within an explicit JOIN
‘s ON
clause.
The use case
The jOOQ code generator makes heavy use of jOOQ when querying the various dictionary views. In PostgreSQL, most queries go to the SQL standard information_schema
, but every now and then, the standard meta data is insufficient, and we also have to query the pg_catalog
, which is more complete but also much more technical.
For a lot of information_schema
views, there exists an almost equivalent pg_catalog
table which contains the same information. For example:
information_schema |
pg_catalog |
---|---|
schemata |
pg_namespace |
tables or user_defined_types |
pg_class |
columns or attributes |
pg_attribute |
Interestingly, PostgreSQL being an ORDBMS, tables and user defined types are the same thing and often interchangeable in the type system, but that’s a topic for a future blog post.
The point of this blog post is that often, when querying a view like information_schema.attributes
, we also have to query pg_catalog.pg_attribute
to get additional data. For example, in order to find the declared array dimension of a UDT (User Defined Type) attribute, we have to access pg_catalog.pg_attribute.attndims
, as this information is nowhere to be found in the information_schema
. See also jOOQ feature request #252, where we’ll add support for H2 / PostgreSQL multi dimensional arrays.
So, we might have a UDT like this:
CREATE TYPE u_multidim_a AS (
i integer[][],
n numeric(10, 5)[][][],
v varchar(10)[][][][]
);
The canonical SQL way to access the pg_attribute
table from the attributes
view is:
SELECT
is_a.udt_schema,
is_a.udt_name,
is_a.attribute_name,
pg_a.attndims
FROM information_schema.attributes AS is_a
JOIN pg_attribute AS pg_a
ON is_a.attribute_name = pg_a.attname
JOIN pg_class AS pg_c
ON is_a.udt_name = pg_c.relname
AND pg_a.attrelid = pg_c.oid
JOIN pg_namespace AS pg_n
ON is_a.udt_schema = pg_n.nspname
AND pg_c.relnamespace = pg_n.oid
WHERE is_a.data_type = 'ARRAY'
ORDER BY
is_a.udt_schema,
is_a.udt_name,
is_a.attribute_name,
is_a.ordinal_position
To visualise:
+----- udt_schema = nspname ------> pg_namespace | ^ | | | oid | = | relnamespace | | | v +------- udt_name = relname ------> pg_class | ^ | | | oid | = | attrelid | | | v is.attributes <-+- attribute_name = attname ------> pg_attribute
And now, we can see a few of our integration test user defined types, containing multi dimensional arrays:
|udt_schema|udt_name |attribute_name|attndims| |----------|------------|--------------|--------| |public |u_multidim_a|i |2 | |public |u_multidim_a|n |3 | |public |u_multidim_a|v |4 | |public |u_multidim_b|a1 |1 | |public |u_multidim_b|a2 |2 | |public |u_multidim_b|a3 |3 | |public |u_multidim_c|b |2 |
But look at all those JOIN
expressions. They’re definitely no fun. We have to spell out the entire path from pg_attribute
to pg_namespace
, only to make sure we’re not fetching any ambiguously named data from other UDTs or other schemata.
Using implicit joins instead
And that’s where the power of implicit JOIN
come in play. What we really want to write in SQL is this:
SELECT
is_a.udt_schema,
is_a.udt_name,
is_a.attribute_name,
pg_a.attndims
-- This table we need
FROM information_schema.attributes AS is_a
-- And also this one
JOIN pg_attribute AS pg_a
ON is_a.attribute_name = pg_a.attname
-- But the path joins from pg_attribute to pg_namespace should
-- be implicit
AND pg_a.pg_class.relname = is_a.udt_name
AND pg_a.pg_class.pg_namespace.nspname = is_a.udt_schema
WHERE is_a.data_type = 'ARRAY'
ORDER BY
is_a.udt_schema,
is_a.udt_name,
is_a.attribute_name,
is_a.ordinal_position
It’s not that much shorter, but it’s definitely very convenient to no longer have to think about how to join the different steps. Note that unlike other cases, where we used implicit joins via these paths in SELECT
or WHERE
, this time we’re using them from within a JOIN .. ON
clause! In jOOQ, we can write:
Attributes isA = ATTRIBUTES.as("is_a");
PgAttribute pgA = PgAttribute.as("pg_a");
ctx.select(
isA.UDT_SCHEMA,
isA.UDT_NAME,
isA.ATTRIBUTE_NAME,
pgA.ATTNDIMS)
.from(isA)
.join(pgA)
.on(isA.ATTRIBUTE_NAME.eq(pgA.ATTNAME))
.and(isA.UDT_NAME.eq(pgA.pgClass().RELNAME))
.and(isA.UDT_SCHEMA.eq(pgA.pgClass().pgNamespace().NSPNAME))
.where(isA.DATA_TYPE.eq("ARRAY"))
.orderBy(
isA.UDT_SCHEMA,
isA.UDT_NAME,
isA.ATTRIBUTE_NAME,
isA.ORDINAL_POSITION)
.fetch();
The generated SQL looks slightly different from the original one, as jOOQ’s implicit JOIN
algorithm will never flatten the JOIN
tree in order to preserve any potential JOIN
operator precedence, which is important in the event of there being LEFT JOIN
, FULL JOIN
or other operators present. The output looks more like this:
FROM information_schema.attributes AS is_a
JOIN (
pg_catalog.pg_attribute AS pg_a
JOIN (
pg_catalog.pg_class AS alias_70236485
JOIN pg_catalog.pg_namespace AS alias_96617829
ON alias_70236485.relnamespace = alias_96617829.oid
)
ON pg_a.attrelid = alias_70236485.oid
)
ON (
is_a.attribute_name = pg_a.attname
AND is_a.udt_name = alias_70236485.relname
AND is_a.udt_schema = alias_96617829.nspname
)
As you can see, the “readable” table aliases (is_a
and pg_a
) are the user-provided ones, whereas the “unreadable,” system generated ones (alias_70236485
and alias_96617829
) are the ones originating from the implicit JOIN
. And, again, it’s important that these implicit joins are embedded right where they belong, with the path root pg_a
, from which we started the path expressions. That’s the only way we can retain the correct JOIN
operator precedence semantics, e.g. if we had used a LEFT JOIN
between is_a
and pg_a
Future improvements
In the future, there might be even better JOIN
paths that allow for connecting such graphs directly, because every time you have to join information_schema.attributes
and pg_catalog.pg_attribute
, you’ll have to repeat the same equalities on the (udt_schema, udt_name, attribute_name)
tuple, and while implicit JOIN
have been helpful, it’s easy to see how this can be further improved. The ideal query would be:
SELECT
is_a.udt_schema,
is_a.udt_name,
is_a.attribute_name,
pg_a.attndims
FROM information_schema.attributes AS is_a
-- Magic here
MAGIC JOIN pg_attribute AS pg_a
ON jooq_do_your_thing
WHERE is_a.data_type = 'ARRAY'
ORDER BY
is_a.udt_schema,
is_a.udt_name,
is_a.attribute_name,
is_a.ordinal_position
But we’re not quite there yet.
Getting access to these join paths
Neither the information_schema
views, nor the pg_catalog
tables expose any foreign key meta data, which are a prerequisite for implicit join path expressions and other jOOQ code generation features. This isn’t a huge problem as you can specify synthetic foreign keys to the code generator, for precisely this reason. See also our previous blog post about synthetic foreign keys for information schema queries. In this case, all we need is at least this specification:
<configuration>
<generator>
<database>
<syntheticObjects>
<foreignKeys>
<foreignKey>
<tables>pg_attribute</tables>
<fields><field>attrelid</field></fields>
<referencedTable>pg_class</referencedTable>
</foreignKey>
<foreignKey>
<tables>pg_class</tables>
<fields><field>relnamespace</field></fields>
<referencedTable>pg_namespace</referencedTable>
</foreignKey>
</foreignKeys>
</syntheticObjects>
</database>
</generator>
</configuration>
And ta-dah, we have our JOIN
paths as seen in the previous examples.