How to Plot an ASCII Bar Chart with SQL – Java, SQL and jOOQ.
No need for expensive Tableau subscriptions. Ditch Microsoft Excel. Just use native PostgreSQL to quickly visualise your data!
Here’s an idea I had for a while. As you may know, jOOQ can produce fancy charts from your jOOQ results. But that requires you use jOOQ, and you may not be using jOOQ, because you’re not coding in Java/Kotlin/Scala (otherwise, you’d be using jOOQ). That’s OK. I thought, why not do it with SQL (PostgreSQL, to be specific) directly, then? After all, I’m me:
So, with the following fancy query, which I’ll maintain and develop further on this github repo, you will be able to easily plot just about anything directly from your favourite SQL editor.
Just look at it. Look:
-- The example uses https://www.jooq.org/sakila, but you can just replace
-- the "source" table with anything else
with
-- This part is what you can modify to adapt to your own needs
--------------------------------------------------------------
-- Your data producing query here
source (key, value) as (
select payment_date::date::timestamp, sum(amount)
from payment
where extract(year from payment_date) < 2006
group by payment_date::date::timestamp
order by payment_date::date::timestamp
),
-- Some configuration items:
constants as (
select
-- the height of the y axis
15 as height,
-- the width of the x axis, if normalise_x, otherwise, ignored
25 as width,
-- the bar characters
'##' as characters,
-- the characters between bars
' ' as separator,
-- the padding of the labels on the y axis
10 as label_pad,
-- whether to normalise the data on the x axis by
-- - filling gaps (if int, bigint, numeric, timestamp,
-- timestamptz)
-- - scaling the x axis to "width"
true as normalise_x
),
-- The rest doesn't need to be touched
--------------------------------------
-- Pre-calculated dimensions of the source data
source_dimensions (kmin, kmax, kstep, vmin, vmax) as (
select
min(key), max(key),
(max(key) - min(key)) / max(width),
min(value), max(value)
from source, constants
),
-- Normalised data, which fills the gaps in case the key data
-- type can be generated with generate_series (int, bigint,
-- numeric, timestamp, timestamptz)
source_normalised (key, value) as (
select k, coalesce(sum(source.value), 0)
from source_dimensions
cross join constants
cross join lateral
generate_series(kmin, kmax, kstep) as t (k)
left join source
on source.key >= t.k and source.key < t.k + kstep
group by k
),
-- Replace source_normalised by source if you don't like the
-- normalised version
actual_source (i, key, value) as (
select row_number() over (order by key), key, value
from source_normalised, constants
where normalise_x
union all
select row_number() over (order by key), key, value
from source, constants
where not normalise_x
),
-- Pre-calculated dimensions of the actual data
actual_dimensions (
kmin, kmax, kstep, vmin, vmax, width_or_count
) as (
select
min(key), max(key),
(max(key) - min(key)) / max(width),
min(value), max(value),
case
when every(normalise_x) then least(max(width), count(*)::int)
else count(*)::int
end
from actual_source, constants
),
-- Additional convenience
dims_and_consts as (
with
temp as (
select *,
(length(characters) + length(separator))
* width_or_count as bar_width
from actual_dimensions, constants
)
select *,
(bar_width - length(kmin::text) - length(kmax::text))
as x_label_pad
from temp
),
-- A cartesian product for all (x, y) data points
x (x) as (
select generate_series(1, width_or_count) from dims_and_consts
),
y (y) as (
select generate_series(1, height) from dims_and_consts
),
-- Rendering the ASCII chart
chart (rn, chart) as (
select
y,
lpad(y * (vmax - vmin) / height || '', label_pad)
|| ' | '
|| string_agg(
case
when height * actual_source.value / (vmax - vmin)
>= y then characters
else repeat(' ', length(characters))
end, separator
order by x
)
from
x left join actual_source on actual_source.i = x,
y, dims_and_consts
group by y, vmin, vmax, height, label_pad
union all
select
0,
repeat('-', label_pad)
|| '-+-'
|| repeat('-', bar_width)
from dims_and_consts
union all
select
-1,
repeat(' ', label_pad)
|| ' | '
|| case
when x_label_pad < 1 then ''
else kmin || repeat(' ', x_label_pad) || kmax
end
from dims_and_consts
)
select chart
from chart
order by rn desc
;
Running against the sakila database, you’ll get this fancy chart:
chart | ----------------------------------------------------------------------------------------+ 11251.7400 | ## | 10501.6240 | ## | 9751.50800 | ## | 9001.39200 | ## | 8251.27600 | ## | 7501.16000 | ## ## ## ## | 6751.04400 | ## ## ## ## | 6000.92800 | ## ## ## ## | 5250.81200 | ## ## ## ## ## ## | 4500.69600 | ## ## ## ## ## ## | 3750.58000 | ## ## ## ## ## ## ## ## | 3000.46400 | ## ## ## ## ## ## ## ## | 2250.34800 | ## ## ## ## ## ## ## ## ## ## ## | 1500.23200 | ## ## ## ## ## ## ## ## ## ## ## ## | 750.116000 | ## ## ## ## ## ## ## ## ## ## ## ## | -----------+----------------------------------------------------------------------------| | 2005-05-24 00:00:00 2005-08-23 00:00:00|
Isn’t that something!
How does it work?
The query has 3 parts:
source
: The actual query, producing data. This is what you can substitute and place your own, insteadconstants
: The configuration section, where you can tweak dimensions, bar chart characters, etc.- the rest, which you don’t need to tamper with
The source
is just a query like this:
source (key, value) as (
select payment_date::date::timestamp, sum(amount)
from payment
where extract(year from payment_date) < 2006
group by payment_date::date::timestamp
order by payment_date::date::timestamp
)
It produces all revenue per payment date from the payment table. While payment_date
is a timestamp
, we cast that to date to be able to get daily revenue. But in order to fill the gaps using PostgreSQL’s generate_series
, we have to cast the date value back to timestamp, because surprisingly, there’s no native generate_series(date, date)
function in PostgreSQL.
All you have to do is produce a set of data in a key/value form. You can replace this by anything else, e.g. to get cumulative revenue:
source (key, value) as (
select
payment_date::date::timestamp,
sum(sum(amount)) over (order by payment_date::date::timestamp)
from payment
where extract(year from payment_date) < 2006
group by payment_date::date::timestamp
order by payment_date::date::timestamp
)
… for which you (currently) have to patch the normalisation back to false
(the padding of gaps isn’t correct yet). Also, to save space, I’ve made the bars a bit slimmer:
'#' as characters,
'' as separator,
false as normalise_x
And now you’ll get this nice chart showing the exponential increase of revenue that we wish so dearly to show our managers (it’s not actually exponential, because now, the gaps aren’t respected, but duh, it’s just generated sample data):
chart | -----------------------------------------------------+ 66872.4100 | #| 62414.2493 | ##| 57956.0886 | ####| 53497.9280 | ######| 49039.7673 | #######| 44581.6066 | ##########| 40123.4460 | ###########| 35665.2853 | #############| 31207.1246 | ###############| 26748.9640 | ##################| 22290.8033 | ####################| 17832.6426 | ######################| 13374.4820 | #########################| 8916.32133 | #############################| 4458.16066 | #################################| -----------+-----------------------------------------| | 2005-05-24 00:00:00 2005-08-23 00:00:00|
Awesome, huh! Play around with it here:
Send your pull requests with improvements. Challenges:
- Stacked charts
- Fill gaps also for cumulative data
- Other features?
The sky is the limit.