
How to Plot an ASCII Bar Chart with SQL – Java, SQL and jOOQ.

No need for expensive Tableau subscriptions. Ditch Microsoft Excel. Just use native PostgreSQL to quickly visualise your data!

Here’s an idea I’ve had for a while. As you may know, jOOQ can produce fancy charts from your jOOQ results. But that requires that you use jOOQ, and you may not be using jOOQ because you’re not coding in Java/Kotlin/Scala (otherwise, you’d be using jOOQ). That’s OK. So I thought: why not do it directly in SQL (PostgreSQL, to be specific)? After all, I’m me:

So, with the following fancy query, which I’ll maintain and develop further in this GitHub repo, you’ll be able to easily plot just about anything directly from your favourite SQL editor.

Just look at it. Look:

-- The example uses https://www.jooq.org/sakila, but you can just replace
-- the "source" table with anything else
with 

  -- This part is what you can modify to adapt to your own needs
  --------------------------------------------------------------

  -- Your data producing query here 
  source (key, value) as (
    select payment_date::date::timestamp, sum(amount)
    from payment
    where extract(year from payment_date) < 2006
    group by payment_date::date::timestamp
    order by payment_date::date::timestamp
  ),
  
  -- Some configuration items:
  constants as (
    select
    
      -- the height of the y axis
      15 as height, 

      -- the width of the x axis, if normalise_x, otherwise, ignored
      25 as width, 

      -- the bar characters
      '##' as characters,

      -- the characters between bars
      ' ' as separator,
      
      -- the padding of the labels on the y axis
      10 as label_pad, 
      
      -- whether to normalise the data on the x axis by
      -- - filling gaps (if int, bigint, numeric, timestamp, 
      --   timestamptz)
      -- - scaling the x axis to "width"
      true as normalise_x
  ),
  
  -- The rest doesn't need to be touched
  --------------------------------------
  
  -- Pre-calculated dimensions of the source data
  source_dimensions (kmin, kmax, kstep, vmin, vmax) as (
    select 
      min(key), max(key), 
      (max(key) - min(key)) / max(width), 
      min(value), max(value)
    from source, constants
  ),
  
  -- Normalised data, which fills the gaps in case the key data
  -- type can be generated with generate_series (int, bigint, 
  -- numeric, timestamp, timestamptz)
  source_normalised (key, value) as (
    select k, coalesce(sum(source.value), 0)
    from source_dimensions
      cross join constants
      cross join lateral 
        generate_series(kmin, kmax, kstep) as t (k)
      left join source 
        on source.key >= t.k and source.key < t.k + kstep
    group by k
  ),

  -- Replace source_normalised by source if you don't like the 
  -- normalised version
  actual_source (i, key, value) as (
    select row_number() over (order by key), key, value 
    from source_normalised, constants
    where normalise_x
    union all
    select row_number() over (order by key), key, value
    from source, constants
    where not normalise_x
  ),
    
  -- Pre-calculated dimensions of the actual data
  actual_dimensions (
    kmin, kmax, kstep, vmin, vmax, width_or_count
  ) as (
    select 
      min(key), max(key), 
      (max(key) - min(key)) / max(width), 
      min(value), max(value), 
      case
        when every(normalise_x) then least(max(width), count(*)::int) 
        else count(*)::int 
      end
    from actual_source, constants
  ),
  
  -- Additional convenience
  dims_and_consts as (
    with 
      temp as (
        select *, 
          (length(characters) + length(separator)) 
            * width_or_count as bar_width
        from actual_dimensions, constants
      )
    select *,
      (bar_width - length(kmin::text) - length(kmax::text)) 
        as x_label_pad
    from temp
  ),
  
  -- A cartesian product for all (x, y) data points
  x (x) as (
    select generate_series(1, width_or_count) from dims_and_consts
  ),
  y (y) as (
    select generate_series(1, height) from dims_and_consts
  ),

  -- Rendering the ASCII chart
  chart (rn, chart) as (
    select
      y,
      lpad(y * (vmax - vmin) / height || '', label_pad) 
        || ' | ' 
        || string_agg(
             case 
               when height * actual_source.value / (vmax - vmin) 
                 >= y then characters 
               else repeat(' ', length(characters)) 
             end, separator 
             order by x
           )
    from 
      x left join actual_source on actual_source.i = x, 
      y, dims_and_consts
    group by y, vmin, vmax, height, label_pad
    union all
    select 
      0, 
      repeat('-', label_pad) 
        || '-+-' 
        || repeat('-', bar_width)
    from dims_and_consts
    union all
    select 
      -1, 
      repeat(' ', label_pad) 
        || ' | ' 
        || case 
             when x_label_pad < 1 then '' 
             else kmin || repeat(' ', x_label_pad) || kmax 
           end
    from dims_and_consts
  )
select chart
from chart
order by rn desc
;

Running this against the Sakila database, you’ll get this fancy chart:

chart                                                                                   |
----------------------------------------------------------------------------------------+
11251.7400 |                                                       ##                   |
10501.6240 |                                                       ##                   |
9751.50800 |                                                       ##                   |
9001.39200 |                                                       ##                   |
8251.27600 |                                                       ##                   |
7501.16000 |                                     ##                ##             ## ## |
6751.04400 |                                     ##                ##             ## ## |
6000.92800 |                                     ##                ##             ## ## |
5250.81200 |                   ##                ##             ## ##             ## ## |
4500.69600 |                   ##                ##             ## ##             ## ## |
3750.58000 |                   ## ##             ## ##          ## ##             ## ## |
3000.46400 |                   ## ##             ## ##          ## ##             ## ## |
2250.34800 |    ##             ## ##          ## ## ##          ## ## ##          ## ## |
1500.23200 | ## ##             ## ##          ## ## ##          ## ## ##          ## ## |
750.116000 | ## ##             ## ##          ## ## ##          ## ## ##          ## ## |
-----------+----------------------------------------------------------------------------|
           | 2005-05-24 00:00:00                                     2005-08-23 00:00:00|

Isn’t that something!

How does it work?

The query has three parts:

  • source: The actual data-producing query. This is the part you replace with your own query.
  • constants: The configuration section, where you can tweak dimensions, bar chart characters, etc.
  • The rest, which you don’t need to touch.
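Conceptually, “the rest” renders the chart from top to bottom: at row y, a column gets a bar cell whenever height * value / (vmax - vmin) >= y. Here’s that core idea as a minimal Python sketch (ascii_chart is my name, not part of the query; it skips the axes and x-labels of the real thing):

```python
def ascii_chart(data, height=15, characters="##", separator=" ", label_pad=10):
    """Render ordered (key, value) pairs as ASCII bars.

    Mirrors the SQL chart CTE: a cell is drawn at row y when
    height * value / (vmax - vmin) >= y. Assumes vmax > vmin.
    """
    vmin = min(v for _, v in data)
    vmax = max(v for _, v in data)
    span = vmax - vmin
    lines = []
    for y in range(height, 0, -1):  # top row first, like "order by rn desc"
        # y-axis label, right-padded like lpad(y * (vmax - vmin) / height ...)
        label = str(round(y * span / height, 4)).rjust(label_pad)
        # one cell per data point, like string_agg(case ... end, separator)
        bars = separator.join(
            characters if height * v / span >= y else " " * len(characters)
            for _, v in data
        )
        lines.append(label + " | " + bars)
    return "\n".join(lines)
```

With three points and a height of 4, the tallest value fills every row and the smallest only the bottom one, just as in the SQL output above.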

The source is just a query like this:

  source (key, value) as (
    select payment_date::date::timestamp, sum(amount)
    from payment
    where extract(year from payment_date) < 2006
    group by payment_date::date::timestamp
    order by payment_date::date::timestamp
  )

It produces the revenue per payment date from the payment table. While payment_date is a timestamp, we cast it to date in order to get daily revenue. But to fill the gaps using PostgreSQL’s generate_series, we have to cast that date value back to timestamp, because, surprisingly, there’s no native generate_series(date, date) overload in PostgreSQL.
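The gap filling itself (the source_normalised CTE) splits the key range into buckets of size kstep = (kmax - kmin) / width, sums the values per bucket, and produces 0 for empty buckets. A rough Python equivalent (normalise is my name; it assumes keys that support subtraction and division by an int, i.e. numbers or timestamps):

```python
def normalise(data, width):
    """Fill gaps and scale the x axis to `width` buckets.

    Mirrors the source_normalised CTE: bucket t.k collects all keys in
    [t.k, t.k + kstep), and generate_series(kmin, kmax, kstep) is
    inclusive of kmax, so there are width + 1 buckets.
    """
    kmin = min(k for k, _ in data)
    kmax = max(k for k, _ in data)
    kstep = (kmax - kmin) / width
    buckets = []
    for i in range(width + 1):
        lo = kmin + i * kstep
        hi = lo + kstep
        # coalesce(sum(...), 0): empty buckets become 0
        buckets.append((lo, sum(v for k, v in data if lo <= k < hi)))
    return buckets
```

The same arithmetic works for datetime keys, since timestamp differences divide into timedeltas just like interval / int does in PostgreSQL.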

All you have to do is produce a set of data in key/value form. You can replace this with anything else, e.g. to get cumulative revenue:

  source (key, value) as (
    select 
      payment_date::date::timestamp,
      sum(sum(amount)) over (order by payment_date::date::timestamp)
    from payment
    where extract(year from payment_date) < 2006
    group by payment_date::date::timestamp
    order by payment_date::date::timestamp
  )
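If the nested aggregation looks odd: the inner sum(amount) is the ordinary group by aggregate producing daily revenue, while the outer sum(...) over (order by ...) is a window function computing the running total over those daily sums. In Python terms (cumulative is my name, not part of the query):

```python
from itertools import accumulate

def cumulative(daily):
    """Running total over (key, value) pairs that are already grouped and
    ordered by key, like sum(sum(amount)) over (order by payment_date)."""
    keys = [k for k, _ in daily]
    return list(zip(keys, accumulate(v for _, v in daily)))
```

Each output value is the sum of all daily values up to and including that key.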

… for which you (currently) have to set normalise_x back to false (the filling of gaps isn’t correct yet for cumulative data). Also, to save space, I’ve made the bars a bit slimmer:

  '#' as characters,
  '' as separator,
  false as normalise_x

And now you’ll get this nice chart showing the exponential increase in revenue that we so dearly wish to show our managers (it’s not actually exponential, because the gaps aren’t respected now, but duh, it’s just generated sample data):

chart                                                |
-----------------------------------------------------+
66872.4100 |                                        #|
62414.2493 |                                       ##|
57956.0886 |                                     ####|
53497.9280 |                                   ######|
49039.7673 |                                  #######|
44581.6066 |                               ##########|
40123.4460 |                              ###########|
35665.2853 |                            #############|
31207.1246 |                          ###############|
26748.9640 |                       ##################|
22290.8033 |                     ####################|
17832.6426 |                   ######################|
13374.4820 |                #########################|
8916.32133 |            #############################|
4458.16066 |        #################################|
-----------+-----------------------------------------|
           | 2005-05-24 00:00:00  2005-08-23 00:00:00|

Awesome, huh? Play around with it, and send your pull requests with improvements. Challenges:

  • Stacked charts
  • Fill gaps also for cumulative data
  • Other features?

The sky is the limit.


