Changing SELECT .. FROM Into FROM .. SELECT Does Not “Fix” SQL – Java, SQL and jOOQ.
Every now and then, I see folks lament the SQL syntax’s peculiar disconnect between
Most recently here in a Youtube comment reply to a recent jOOQ/kotlin talk. Let’s look at why jOOQ didn’t fall into this trap of trying to “fix” this, and why this is even a trap.
The English Language
SQL has a simple syntax model. All commands start with a verb in imperative form, as we “command” the database to execute a statement. Common commands include:
SELECT
INSERT
UPDATE
DELETE
MERGE
TRUNCATE
CREATE
ALTER
DROP
All of these are verbs in imperative form. Think about adding an exclamation mark everywhere, e.g. INSERT [this record]!
The Order of Operations
We can argue that natural languages are very poor inspiration for computer programming languages, which tend to be more mathematical (some more than others). A lot of criticism about the SQL language is that it does not “compose” (in its native form).
We can argue, that it would be much better for a more composable SQL language to start with FROM
, which is the first operation in SELECT
according to the logical order of operations. E.g.
FROM book
WHERE book.title LIKE 'A%'
SELECT book.id, book.title
Yes, that would be better in the sense that it would be more logical. First, we declare the data source, predicates, etc. and only in the end would we declare the projection. With the Java Stream
API, we would write:
books.stream()
.filter(book -> book.title.startsWith("A"))
.map(book -> new B(book.id, book.title))
The benefits of this would be:
- No disconnect between syntax and logic
- Hence: No confusion around syntax, specifially why you can’t reference
SELECT
aliases inWHERE
, for example. - Better auto-completion (because you don’t write stuff that isn’t declared yet, first)
In a way, this ordering would be consistent with what some RDBMS implemented when RETURNING
data from DML statements, such as:
INSERT INTO book (id, title)
VALUES (3, 'The Book')
RETURNING id, created_at
With DML statements, the command (“imperative”) is still INSERT
, UPDATE
, DELETE
, i.e. a verb that clearly tells the database what to do with the data. The “projection” is more of an afterthought. A utility that is occasionally useful, hence RETURNING
can be placed at the end.
RETURNING
seems like a pragmatic choice of syntax, and isn’t even part of the standard. The standard defines the <data change delta table>
, as implemented by Db2 and H2, whose syntax is:
SELECT id, created_at
FROM FINAL TABLE (
INSERT INTO book (id, title)
VALUES (3, 'The Book')
) AS book
I mean, why not. I don’t have a strong preference for one or the other syntax (jOOQ supports both and emulates them into one another). SQL Server invented a third variant, whose syntax is probably the least intuitive (I always have to look up the exact location of the OUTPUT
clause):
INSERT INTO book (id, title)
OUTPUT id, created_at
VALUES (3, 'The Book')
Cypher query language
Probably worth mentioning here is that there exists a modern query language out there that is sufficiently popular to be considered for such discussions: The Cypher Query Language from neo4j. With a simple “trick”, it both:
- Maintained the language model where a verb in imperative form starts a statement (the verb is
MATCH
, which is similar toFROM
, but it is a verb), so it inherits SQL’s “strength” of being intuitive also for non-programmers. - Reversed the logical order of operations within the reading statements, to be of the form
MATCH .. RETURN
, makingRETURN
the universal form of projecting things for all operations, not justSELECT
. - Reused
MATCH
also for writing operations, includingDELETE
orSET
(which corresponds to SQL’sUPDATE
)
While operating on a different data paradigm (the network model as opposed to the relational model), I’ve always found the Cypher Query Language to be generally superior to SQL in terms of syntax, at least on a high level. If I had to actually “fix” SQL by creating SQL 2.0, I’d take inspiration here.
Fixing this in an API like jOOQ isn’t worth it
As discussed before, SQL has some obvious shortcomings, and there exist better languages like Cypher solving the same kind of problem. But SQL is here, and it’s 50 years old, and it will stay. It won’t be fixed.
That’s something that just has to be accepted:
SQL won’t be fixed
It will be amended. It incorporates new ideas, including:
It always does so in an idiomatic, SQL style way. If you’re reading the SQL standard, or if you’re working with PostgreSQL, which is very close to the standard, you’ll feel that SQL is quite consistent as a language. Or, it is consistently weird, depending on your tastes.
For jOOQ, one of the main success factors has always been to be as close as possible to this vision of what SQL really is in terms of syntax. A lot of folks are very effective writing native SQL. Since Java has text blocks, it has become a lot more bearable to just copy paste a static SQL query from your SQL editor into your Java program, and e.g. execute it with JDBC or with jOOQ’s plain SQL templating API:
for (Record record : ctx.fetch(
"""
SELECT id, title
FROM book
WHERE title LIKE 'A%'
"""
)) {
System.out.println(record);
}
This approach is sufficient for very simple applications out there. If your “application” runs a total of 5 distinct SQL queries, you can do it with JDBC alone (although, once you’ve started to get a hang of jOOQ, you’ll probably use jOOQ even for those applications as well).
But jOOQ really shines when your application has 100s of queries, including many dynamic ones, and your database has 100s of tables, in case of which the type safety and model safety benefits really help. However, it can shine only when your SQL query translates 1:1 to the jOOQ API. Randomly fixing SQL to some extent in this most important statement (SELECT
) won’t do the trick.
Because: Where will you stop fixing SQL? SQL is still weird even if you switch to FROM .. SELECT
. For example, the semantics of GROUP BY
is still weird. Or the relationship between DISTINCT
and ORDER BY
. E.g. this would appear to be much better at first (e.g. to separate SELECT
and DISTINCT
, which shouldn’t be located so closely together):
FROM book
WHERE book.title LIKE 'A%'
SELECT book.title
DISTINCT
ORDER BY book.title
But the weird caveats would still not disappear, namely that you can ORDER BY
expressions that are not listed in SELECT
in the absence of DISTINCT
, but not in the presence of DISTINCT
(see our previous article about that).
Alternative syntaxes in other DSL APIs
So, where does the “fixing” of SQL stop? When will SQL be “fixed?” It will never be fixed, and as such, an API like jOOQ would be much harder to learn that it should be. Some competing APIs follow this model, e.g.
Both of these APIs are based on the idea that SQL needs “fixing,” and that a more “native,” a more “idiomatic” feel of the API would be somewhat better. Some examples:
Slick:
Here’s an example from the getting started guide:
This corresponds to the following SQL:
SELECT max(price)
FROM coffees
It’s arguably a bit more idiomatic. It looks like ordinary Scala collection API usage, removing the SQL feel from the equation. After all, the usual map(x => y)
collection methods really correspond to a SQL SELECT
clause (a “projection”).
Exposed:
Here’s an example from Baeldung:
StarWarsFilms
.slice(StarWarsFilms.sequelId.count(), StarWarsFilms.director)
.selectAll()
.groupBy(StarWarsFilms.director)
The API introduces new terms, e.g.
slice
which means the same thing asmap()
orSELECT
, though foreign to both SQL or kotlin collection APIsselectAll
, which corresponds to the relational algebra term “selection”, corresponding to SQLWHERE
Synthetic convenience syntax instead of “fixing” SQL
jOOQ doesn’t follow down this road and never will. SQL is what it is, and jOOQ won’t be able to “fix” that. The 1:1 mapping between SQL syntax and jOOQ API means that even if you want to use something sophisticated, like:
Even then, jOOQ won’t let you down and will allow you to write exactly what you have in mind in terms of SQL feature. I mean, would it really make sense to support CONNECT BY
in Slick or Exposed? Probably not. They would have to invent their own syntax to give access to SQL recursion. But will it be complete? That’s a problem jOOQ won’t have.
The only reason why some syntax is not available is because it’s not possible yet (and please do send a feature request). The example of FOR XML
is an excellent one. SQL Server invented this FOR
clause, and while it’s convenient for simple cases, it’s not very powerful for complex ones. I much prefer standard SQL/XML and SQL/JSON syntax, (which jOOQ also supports). But while I don’t very much like the syntax, jOOQ won’t judge. What good would a third syntax, entirely invented by jOOQ be for users? As I said before.
When will the “fixing” stop?
It will never stop. The alternatives I’ve mentioned will run into very difficult questions down the line when they start adding more features, if they start adding more features. While it is always easy to implement a simple SELECT .. FROM .. WHERE
query builder, and support that functionality using arbitrary API, claiming SQL has been “fixed,” it is much harder to evolve this API, addressing all sorts of advanced SQL use-cases. Just look at their issue trackers for feature requests like CTEs. The answer is always: “Use native SQL.”
Even “simple” SQL features, such as UNION
become more complex once basic SQL syntax is changed. The semantics is already tricky enough in SQL (and it’s entirely SQL’s fault, sure), but “fixing” these things is never as simple as it may look at first.
Now, there are 2 exceptions to this rule:
Synthetic syntax
One exception is: “Synthetic syntax.” The most powerful synthetic syntax in jOOQ are implicit joins. Implicit joins aren’t “fixing” SQL, they’re “enhancing” SQL with a syntax that SQL itself might have (hopefully will have, eventually). Just like there exist SQL dialects, which “enhance” the SQL standard, e.g.
jOOQ is very conservative about such synthetic syntax. There are a lot of good ideas, but few are forward compatible. Each one of these syntaxes makes other SQL transformation features more complex, and each one has flaws that may not have been addressed yet (e.g. as of jOOQ 3.16, implicit joins are not possible in DML statements such as UPDATE
, DELETE
, even if they make a lot of sense there as well. See issue #7508).
Convenience syntax
Another type of improvement is what I call “convenience syntax.” For example, irrespective of the underlying RDBMS, jOOQ allows you to write:
select(someFunction()); // No FROM clause
selectFrom(someTable); // No explicit SELECT list
In both cases, users can omit clauses that may be mandatory in the underlying SQL dialect, and jOOQ fills the generated SQL with a reasonable default:
- A
FROM DUAL
table declaration, or something similar - A
SELECT *
projection declaration, or something similar
Conclusion
The idea that jOOQ should stick to SQL syntax on a 1:1 basis was a gamble I took 13 years ago, when I made jOOQ. I wanted to design jOOQ in a way that everyone who already knew SQL would have no problems learning jOOQ, because everything is absolutely straightforward. The technique behind this API design is described here.
Others have attempted to “fix” SQL by either making their API very idiomatic considering the target language, or by inventing a new language.
13 years later, I have found that the 1:1 mimicking approach is the only viable one, as I keep discovering new, arcane SQL features:
Creating a language is incredibly difficult (let’s consider an internal DSL API to be a sort of language). It’s almost impossible to design properly, if the goal is to support pretty much any underlying SQL feature, unless, the designer lets go of this dream of “fixing” things, and starts embracing the “dream” of “supporting” things. All the things.
SQL is what it is. And that means, the syntax is SELECT .. FROM
, not FROM .. SELECT
.