How to Aggregate Document Data over Rolling 12-Month Periods for U-Chart Analysis in PostgreSQL?

Donie March 3, 2024

Hello Atlassian Community,

I am working on a SQL query in PostgreSQL to evaluate document data across four rolling 12-month periods. The objective is to use the ouput to create a U chart that illustrates the defectiveness of documents based on specific criteria. Each document (identified by contentid) is evaluated for defectiveness within these periods based on three criteria:

  1. Update Status: A document is considered to have passed if it has been updated in the last 12 months.
  2. View Count: A document passes if it has accumulated 70 views or more in the last 12 months.
  3. Document Version: A document passes if it has a version number greater than 3.

A document is deemed defective if it fails to meet one of the above criteria.
A document passes if it meets two or more of the criteria above.  The challenge I’m facing involves aggregating this data correctly over each rolling period and calculating the defectiveness based on the criteria mentioned.

The Specific Challenges Are:

  • Aggregating Over Rolling Periods: I need to evaluate each document’s status over distinct rolling 12-month periods (Q1 2022 to Q4 2022, Q2 2022 to Q1 2023, etc.). I am unsure how to structure the query to dynamically adjust for these periods and correctly aggregate document data.
  • Calculating Defectiveness: For each document in each period, I must calculate whether it meets the criteria listed above. The calculation of whether a document is defective (based on the criteria) should result in a dataset that provides the number of defective documents per period, which will be used to generate a U chart.

The Query I’ve Attempted:

WITH Periods AS (
 SELECT 'Q1 2022 to Q4 2022' AS Period, DATE '2022-01-01' AS StartDate, DATE '2022-12-31' AS EndDate
 UNION ALL
 SELECT 'Q2 2022 to Q1 2023', DATE '2022-04-01', DATE '2023-03-31'
 UNION ALL
 SELECT 'Q3 2022 to Q2 2023', DATE '2022-07-01', DATE '2023-06-30'
 UNION ALL
 SELECT 'Q4 2022 to Q3 2023', DATE '2022-10-01', DATE '2023-09-30'
),
DocumentEvaluation AS (
 SELECT
   p.Period,
   c.contentid,
   -- Determine if the document version is greater than 3
   (MAX(c.version) > 3) AS VersionPass,
   -- Determine if the document has been updated within the last 12 months of the period end date
   BOOL_OR(MAX(c.lastmoddate) > (p.EndDate - INTERVAL '1 year')) AS UpdatePass,
   -- Determine if the document has 70 or more views within the last 12 months of the period end date
   COUNT(CASE WHEN ao."LAST_VIEW_DATE" BETWEEN (p.EndDate - INTERVAL '1 year') AND p.EndDate THEN 1 ELSE 0 END) >= 70 AS ViewPass
 FROM
   public.content c
   LEFT JOIN public."AO_92296B_AORECENTLY_VIEWED" ao ON c.contentid = ao."CONTENT_ID",
   Periods p
 WHERE
   -- Include documents created at any time up to the end date of the period
   c.creationdate <= p.EndDate
 GROUP BY
   p.Period, c.contentid
),
DefectiveDocuments AS (
 SELECT
   Period,
   contentid,
   -- A document is defective if it fails any of the criteria
   (NOT VersionPass OR NOT UpdatePass OR NOT ViewPass) AS IsDefective
 FROM
   DocumentEvaluation
)
SELECT
 Period,
 -- Count the total number of documents evaluated in the period as SampleSize
 COUNT(*) AS SampleSize,
 -- Sum up the documents marked as defective
 SUM(CASE WHEN IsDefective THEN 1 ELSE 0 END) AS Defects
FROM
 DefectiveDocuments
GROUP BY
 Period
ORDER BY
 Period;

My Questions Are:

  • How can I correctly structure the SQL query to evaluate each document against the rolling 12-month periods defined?
  • What is the best approach to calculate whether a document is defective based on the three criteria mentioned, within the context of each rolling period?
  • Is there an efficient way to aggregate this data in PostgreSQL that facilitates easy analysis and visualization in a U chart?

Useful link:  Confluence Data Model

I appreciate any guidance or examples you can provide to help address these challenges. Thank you in advance for your assistance!

1 answer

0 votes
Donie March 4, 2024

This is not an answer just an addition to my original post, any insight to the unique identifier for page versioning in Confluence would also be beneficial.

I’m seeking assistance in understanding how Confluence tracks changes to page titles over time, specifically through its database structure. When a page title is updated—for instance, changing from “This is a page” to “This is the same page with a new title”—Confluence must use a unique identifier to associate both titles with the same page content across its versions.

Could anyone kindly share:

What is the name of this unique identifier?
• In which table within the Confluence database is this identifier located?

Understanding this will help in tracking how page titles evolve over time while maintaining a link to their original content. Thank you in advance for your insights!

Suggest an answer

Log in or Sign up to answer
TAGS
AUG Leaders

Atlassian Community Events