JAVA

A Quick and Dirty Way to Concatenate Two Vaguely Related Tables in SQL – Java, SQL and jOOQ.

Every now and then I run across a use case for the arcane NATURAL JOIN SQL operator, and I’m even more delighted when I can make that a NATURAL FULL JOIN. A few past blog posts on the subject include:

Recently, I stumbled upon a question on Reddit:

Is there such a thing that can join two completely different tables with no relation but acts like a union?

At first I though of the UNION CORRESPONDING syntax, which doesn’t really exist in most SQL dialects, even if it’s a standard feature. But then, I remembered that this is again a perfect use case for NATURAL FULL JOIN, this time slightly differently from the above example where two tables are compared for contents. This time, we want to make sure the two joined tables never have matching rows, in order to get the UNION like behaviour.

Consider the Sakila database. In that database, we have 3 tables containing people, including:

The tables are defined as follows:

CREATE TABLE actor (
  actor_id integer NOT NULL PRIMARY KEY,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  last_update timestamp NOT NULL
);

CREATE TABLE customer (
  customer_id integer NOT NULL PRIMARY KEY,
  store_id integer NOT NULL,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  email varchar(50),
  address_id integer NOT NULL,
  active boolean NOT NULL,
  create_date date NOT NULL,
  last_update timestamp
);

CREATE TABLE staff (
  staff_id integer NOT NULL,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  address_id integer NOT NULL,
  email varchar(50),
  store_id integer NOT NULL,
  active boolean NOT NULL,
  username varchar(16) NOT NULL,
  password varchar(40),
  last_update timestamp NOT NULL,
  picture bytea
);

As can be seen, the common columns are really only (FIRST_NAME, LAST_NAME, LAST_UPDATE), all the other columns are table-specific. Using the following query, we can concatenate all the data:

SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;

The result looks something like this:

|source  |first_name|last_name|last_update            |actor_id|...|customer_id|...|staff_id|...|
|--------|----------|---------|-----------------------|--------|---|-----------|---|--------|---|
|actor   |PENELOPE  |GUINESS  |2006-02-15 04:34:33.000|1       |...|           |   |        |   |
|actor   |NICK      |WAHLBERG |2006-02-15 04:34:33.000|2       |...|           |   |        |   |
|actor   |ED        |CHASE    |2006-02-15 04:34:33.000|3       |...|           |   |        |   |
|customer|MARY      |SMITH    |2006-02-15 04:57:20.000|        |   |1          |...|        |   |
|customer|PATRICIA  |JOHNSON  |2006-02-15 04:57:20.000|        |   |2          |...|        |   |
|customer|LINDA     |WILLIAMS |2006-02-15 04:57:20.000|        |   |3          |...|        |   |
|staff   |Mike      |Hillyer  |2006-02-15 04:57:16.000|        |   |           |   |1       |...|
|staff   |Jon       |Stephens |2006-02-15 04:57:16.000|        |   |           |   |2       |...|

Some observations:

  • The matched columns (i.e. columns by the same name) of the NATURAL JOIN are at the beginning. They include the synthetic SOURCE column, which is different for each join source, so we never have a match, which was desired. We want UNION semantics (i.e. concatenate the 3 tables), not match them.
  • The columns that are unique to each tables are listed afterwards. They contain data only if they belong to the relevant SOURCE

This technique is obviously not for every day, but it can be occasionally useful. So don’t underestimate the powers of NATURAL FULL JOIN

Caveats

I cheated a little bit. The actual Sakila database schema has a conflict between CUSTOMER and STAFF tables:

CREATE TABLE customer (
  customer_id integer NOT NULL PRIMARY KEY,
  store_id integer NOT NULL,
  first_name varchar(45) NOT NULL,
  last_name varchar(45) NOT NULL,
  email varchar(50),
  address_id integer NOT NULL,
  activebool boolean NOT NULL,
  create_date date NOT NULL,
  last_update timestamp
  active integer NOT NULL, -- This is an integer
);

With this table definition, I got this error for my query:

SQL Error [42804]: ERROR: JOIN/USING types integer and boolean cannot be matched

So, to fix this, I patched the CUSTOMER table definition:

-- Patch
WITH customer AS (
  SELECT 
    customer_id, 
    store_id, 
    first_name, 
    last_name, 
    email, 
    address_id, 
    activebool as active, 
    create_date, 
    last_update 
  FROM customer
)

-- Original query
SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;

Wishing that BigQuery’s useful * REPLACE (...) syntax was more widely available.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button