How to calculate percentile (90th) in a table transformer?

Green_ Desmond
Contributor
July 3, 2024

How do I calculate the 90th percentile of a series of numbers in a table, using a table transformer macro?

Let's say I want the 90th percentile of the values in the 'Elapsed Time' column of a table. What's would be the SQL and easiest way to do this, as I can't see any dedicated functions for this?

1 answer

1 accepted

2 votes
Answer accepted
Stiltsoft support
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 4, 2024

Hi @Green_ Desmond ,

Think you need smth like a PECENTILE function in Excel. So you may take the Table Spreadsheet or Spreadsheet from Table (if your source table is a reused one or macro generated) macros and look into the PERCENTILE_INC and PERCENTILE_EXC functions.

Green_ Desmond
Contributor
July 4, 2024

That could work.

I created a Table Spreadsheet and tried to do some logic to filter rows where the value in column F=1, and to then work out the 90th percentile of those qualifying rows. I'll want to do similar where the value of F=2, or 3 or 4. But one thing at a time.

I have that first calculation working in Excel, as:

=PERCENTILE.INC(IF(F:F=1,K:K),0.9)

If I try it in Confluence as:

=PERCENTILE_INC(IF(F:F=1,K:K),0.9)

the I get a #VALUE! error, and also the same when I try and avoid the headers with:

=PERCENTILE_INC(IF(F2:F1000=1,K2:K1000),0.9)

What do you suggest?

Stiltsoft support
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 4, 2024

We can suggest splitting your functions.

Firstly, you may filter out your rows with F=1, F=2 and so on:

=IF(F1=1,,K1)

If F=1, then an empty cell will be set. If F is not 1, then the according value from the K column will be taken. You copy this function across the whole column.

Then for each of these new columns (with F=1, 2, 3 and so on) you may apply the PERCENTILE_INC function.

Green_ Desmond
Contributor
July 4, 2024

I can use a Table Spreadsheet and then a Table Spreadsheet Include to report on a specifc range for the 90th percentile values I'll have worked out, through the use of helper columns etc.

All seems a bit messy and I try to use a table filter or table transformer and the summary data has gridlines and formatting issues.

Is there no way to work out percentiles through some SQL?

Stiltsoft support
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 4, 2024

I don't think that there is a special function in the AlaSQL library on which the Table Transformer macro is based on.

The workaround may be to count percentiles manually: sort your initial table, assign indexes to each row, separate the required number of rows for each case and so on. It seems even more messier and overloaded.

Green_ Desmond
Contributor
July 4, 2024

Yes, I want to try and solve this all within SQL.

The following seems close to working in table transformer, but I get a 'TypeError: Cannot read properties of undefined (reading 'Elapsed Time')' error.

SELECT T1.'Elapsed Time'
FROM (
SELECT T1.'Elapsed Time',
@row_num = @row_num + 1 AS row_num
FROM (SELECT @row_num = 0) AS init,
(SELECT T1.'Elapsed Time'
FROM T1
WHERE T1.'Priority' = 1
ORDER BY T1.'Elapsed Time') AS ordered
) AS ranked
WHERE row_num = (SELECT FLOOR(0.9 * COUNT(*)) + 1
FROM T1
WHERE T1.'Priority' = 1);

ChatGPT gives me various other alternatives and another one which was close to working was using LIMIT, but this function seemed to need a literal rather than a variable.

SELECT @total_count = COUNT(*) FROM T1 WHERE T1.'Priority' = 1; 

SELECT @percentile_rank = FLOOR(0.9 * @total_count) + 1;

SELECT T1.'Elapsed Time' AS '90th Percentile Elapsed Time' FROM T1

WHERE T1.'Priority'= 1 ORDER BY T1.'Elapsed Time'

LIMIT @percentile_rank, 1;

 There are other options with temporary tables, but that couldn't get that to work in Confluence SQL.

All I want is for this SQL to return a plain text value, the 90th percentile of those 'Elapsed Time' values where 'Priority' =1, hence the wish to avoid spreadsheets etc.

Stiltsoft support
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 4, 2024

Don't refer to the columns as 

T1.'Elapsed Time'

in external SELECTs. That's from your error seems to come.

The T1 was your initial table, you can refer to it as T1 only in the first internal SELECT.

In the upper (high-level, external) SELECTS you need to refer to your columns without the T1 prefix: 'Elapsed Time'.

Also please note that you may split your complex SQL query and place several Table Transformers on top of each other. Sometimes it helps if you need to refer to filtered and sorted tables several times.

Green_ Desmond
Contributor
July 4, 2024

At least it is working without error now, but returning no rows.

Still not sure it is right, based on what you last said?

That row_num value is returning 4, so I was hoping it would match on that index.

 

SELECT 'Elapsed Time' FROM ( SELECT 'Elapsed Time', @row_num = @row_num + 1 AS row_num FROM (SELECT @row_num = 0) AS init, (SELECT T1.'Elapsed Time' FROM T1 WHERE T1.'Priority' = 1 ORDER BY T1.'Elapsed Time') AS ordered ) AS ranked WHERE row_num = (SELECT FLOOR(0.9 * COUNT(*)) + 1 FROM T1 WHERE T1.'Priority' = 1);
Stiltsoft support
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 4, 2024

Our developers suggest the following query:

SET @id = (SELECT ROUND(0.9*COUNT(*)) FROM T1 WHERE T1.'F' = 1);
SELECT 'N' FROM (SELECT * FROM T1 ORDER BY T1.'N') WHERE ROWNUM(*) = @id

The 'F' column is the column which you filter (=1 or !=1), the 'N' column is used to count percentile.

Green_ Desmond
Contributor
July 4, 2024

That wasn't quite working.

I think the inner SQL also has to include the WHERE clause.

SET @id = (SELECT ROUND(0.9*COUNT(*)) FROM T1 WHERE T1.'Priority' = 1);

SELECT 'Elapsed Time' FROM (SELECT * FROM T1 WHERE T1.'Priority'=1 ORDER BY T1.'Elapsed Time')
WHERE ROWNUM(*) = @id

 

Like Stiltsoft support likes this
Stiltsoft support
Rising Star
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 5, 2024

Our developers looked into the query once more and provided two options:

  • With ranking and better rounding

SET @id = (SELECT CEIL(0.9*COUNT(*)) FROM T1 WHERE T1.'F' = 1);
SELECT 'N' FROM (SELECT * FROM T1 WHERE T1.'F' = 1 ORDER BY T1.'N') WHERE ROWNUM(*) = @id

  • With interpolation that works similar to Excel

SET @x = (SELECT 0.9*(COUNT(*) - 1) + 1 FROM T1 WHERE T1.'F' = 1);
SET @lower =
(SELECT 'N' FROM (SELECT * FROM T1 WHERE T1.'F' = 1 ORDER BY T1.'N') WHERE ROWNUM(*) = FLOOR(@x));
SET @Higher =
(SELECT 'N' FROM (SELECT * FROM T1 WHERE T1.'F' = 1 ORDER BY T1.'N') WHERE ROWNUM(*) = CEIL(@x));
SELECT @lower + (@higher - @lower) * (@x - FLOOR(@x)) AS 'N'

The feature request to introduce the PERCENTILE function in the Table Transformer has been also created.

Green_ Desmond
Contributor
July 5, 2024

I really like the interpolation, the results I was getting were somehow a little blunt without this. As you say, this is now comparable to the Excel PERCENTILE.INC calculation, albeit we had to replicate the logic!

That's really magic - thank you!

Like # people like this

Suggest an answer

Log in or Sign up to answer
TAGS
AUG Leaders

Atlassian Community Events