List all spaces with indicators in raw SQL

How to list Confluence spaces and extract different information (last contributor and date, number of pages, blogpost, like, comment, total size attachment...) in pure SQL? and all with one quick query.

Ok let's see how to do it step by step.

 

Context

The goal is to build an SQL query to list all the Confluence spaces of an Atlassian server or Atlassian Datacenter instance (*). Here is the list of information and indicators retrieved for each space in this request:

  • ID, name, key ;
  • Space description.
  • Creator and creation date of the space.
  • Most recent date of content creation with the creator.
  • Most recent date of modification of content with the modifier.
  • Number of page.
  • Number of blogpost.
  • Number of custom content.
  • Number of comment.
  • Number of like.
  • Number of attachments file.
  • Total size of all attachments file.

We did not do any specific benchmark to measure query execution time. But as an example we obtain in our test Confluence instance an execution time of 9.8 seconds for 768 spaces.

(*) Tested only on Confluence server 7.13 with PostgreSQL. The query is written for PostgreSQL database without use exotic SQL function, so there shouldn't be a lof of adjustement to do for other DBMS.

 

See final SQL request

https://gist.github.com/v20100v/c6a196ef364276b62429c0af90bad137

 

Building this query step by step

To list all Confluence spaces

SELECT
s.spaceid as space_id,
s.spacename as space_name,
s.spacekey as space_key
FROM spaces s

Let's add the description property of a space

To do this, we need to join two table. The description of a space is in the content table. Then just filter on content type condition (contenttype=SPACEDESCRIPTION) like this :

SELECT
s.spaceid as space_id,
s.spacename as space_name,
s.spacekey as space_key,
sd.body as space_description
FROM spaces s
LEFT JOIN content ON content.spaceid = s.spaceid
LEFT JOIN bodycontent sd ON sd.contentid = content.contentid
WHERE content.contenttype = 'SPACEDESCRIPTION'

Let's add the space's creator with the creation date

Still in the table space, we find the creator and creationdate fields that can be easily joined.

SELECT
s.spaceid as space_id,
s.spacename as space_name,
s.spacekey as space_key,
sd.body as space_description,
s.creationdate as space_creation_date,
up.username as space_creator_username,
u.display_name as space_creator_displayname,
u.email_address as space_creator_email
FROM spaces s
LEFT JOIN content ON content.spaceid = s.spaceid
LEFT JOIN bodycontent sd ON sd.contentid = content.contentid
LEFT JOIN user_mapping up ON up.user_key = s.creator
LEFT JOIN cwd_user u ON u.user_name = up.username
WHERE content.contenttype = 'SPACEDESCRIPTION'

Be carefull to the interpretation of the lastmodifier and lastmoddate field! It is by no means the last contributor in the space who would have modified a page or deposited an attachment. In fact, this is the last user to have modified the description field of the space.

We will see below how to identify the last contributor to a space, and more precisely the last creator of new content, and the last user to have modified content.

Let's add the last creator and the last creation date foreach space

Just use the MAX() aggregate function on the creationdate field of the content table, and group the result by space. In this example, we have excluded the description of a space from the contributions of a space. So if you want to focus only on pages or blogposts, it will suffice to filter on these types of content.

SELECT c2.spaceid, MAX(c2.creationdate) as last_creationdate
FROM content c2
WHERE c2.content_status = 'current'
AND c2.creationdate IS NOT NULL
AND c2.contenttype != 'SPACEDESCRIPTION'
GROUP BY c2.spaceid

Ok it's easy to fetch the last creation date, but to add the associated creator it's a bit more difficult. To do this, we use an SQL subquery like this.

SELECT c.spaceid, up.username, c.creationdate
FROM content c
LEFT JOIN user_mapping up ON c.creator = up.user_key,
(
SELECT c2.spaceid, MAX(c2.creationdate) as last_creationdate
FROM content c2
WHERE c2.content_status = 'current'
AND c2.creationdate IS NOT NULL
GROUP BY c2.spaceid
) last_creation
WHERE c.content_status = 'current'
AND c.spaceid = last_creation.spaceid
AND c.creationdate = last_creation.last_creationdate

Ok nice! We can now add to our starting query with a simple join on the space identifier.

SELECT
s.spaceid as space_id,
s.spacename as space_name,
s.spacekey as space_key,
sd.body as space_description,
s.creationdate as space_creation_date,
up.username as space_creator_username,
u.display_name as space_creator_displayname,
u.email_address as space_creator_email,
last_creation.username as last_creator,
last_creation.creationdate as last_creation_date

FROM spaces s

LEFT JOIN (
SELECT c.spaceid,up.username, c.creationdate
FROM content c
LEFT JOIN user_mapping up ON c.creator = up.user_key,
(
SELECT c2.spaceid, MAX(c2.creationdate) as last_creationdate
FROM content c2
WHERE c2.content_status = 'current'
AND c2.creationdate IS NOT NULL
AND c2.contenttype != 'SPACEDESCRIPTION'
GROUP BY c2.spaceid
) last_creation
WHERE c.content_status = 'current'
AND c.spaceid = last_creation.spaceid
AND c.creationdate = last_creation.last_creationdate
) last_creation ON last_creation.spaceid = s.spaceid

LEFT JOIN content ON content.spaceid = s.spaceid
LEFT JOIN bodycontent sd ON sd.contentid = content.contentid
LEFT JOIN user_mapping up ON up.user_key = s.creator
LEFT JOIN cwd_user u ON u.user_name = up.username
WHERE content.contenttype = 'SPACEDESCRIPTION' 

Let's add the last modifier and the last modification date foreach space

We do the same thing by doing the following left join:

SELECT
(...)
last_modification.username as last_modifier,
last_modification.lastmoddate as last_modification_date

FROM spaces s


LEFT JOIN
(
SELECT c3.spaceid, up2.username, c3.lastmoddate
FROM content c3
LEFT JOIN user_mapping up2 ON c3.lastmodifier = up2.user_key,
(
SELECT c4.spaceid, MAX(c4.lastmoddate) as last_modificationdate
FROM content c4
WHERE c4.content_status = 'current'
AND c4.lastmoddate IS NOT NULL
AND c4.contenttype != 'SPACEDESCRIPTION'
GROUP BY c4.spaceid
) last_modification
WHERE c3.content_status = 'current'
AND c3.spaceid = last_modification.spaceid
AND c3.lastmoddate = last_modification.last_modificationdate
) last_modification ON last_modification.spaceid = s.spaceid

(...)

Let's count the number of page, blogpost, attachment file and custom content foreach space

Rather than separating this calculation into several queries with GROUP BY clause, we group them into a single sub-query for performance reasons. It is much faster to do a matrix transpose than to perform several joins. 

LEFT JOIN(
SELECT
c5.spaceid,
c5.contenttype,
CASE WHEN c5.contenttype = 'PAGE'
THEN COUNT(c5.contenttype) END nb_of_page,
CASE WHEN c5.contenttype = 'BLOGPOST'
THEN COUNT(c5.contenttype) END nb_of_blogpost,
CASE WHEN c5.contenttype = 'ATTACHMENT'
THEN COUNT(c5.contenttype) END nb_of_attachment,
CASE WHEN c5.contenttype = 'CUSTOM'
THEN COUNT(c5.contenttype) END nb_of_custom
FROM content c5
WHERE c5.content_status = 'current'
AND c5.contenttype != 'SPACEDESCRIPTION'
GROUP BY c5.contenttype, c5.spaceid
) t ON t.spaceid = s.spaceid

And finally we add it to our starting query with a simple join on the space identifier, as we saw previously. But this time we add an additional GROUP BY taking the MAX in order to denormalize the result, i.e. do the transpose.

SELECT DISTINCT
s.spaceid as space_id,
s.spacename as space_name,
s.spacekey as space_key,
s.spacetype as space_type,
s.spacestatus as space_status,
sd.body as space_description,
s.creationdate as space_creation_date,
up.username as space_creator_username,
u.display_name as space_creator_displayname,
u.email_address as space_creator_email,
MAX(t.nb_of_page) as nb_of_page,
MAX(t.nb_of_blogpost) as nb_of_blogpost,
MAX(t.nb_of_attachment) as nb_of_attachment,
MAX(t.nb_of_custom) as nb_of_custom

FROM spaces s

(...)

LEFT JOIN(
SELECT
c5.spaceid,
c5.contenttype,
CASE WHEN c5.contenttype = 'PAGE'
THEN COUNT(c5.contenttype) END nb_of_page,
CASE WHEN c5.contenttype = 'BLOGPOST'
THEN COUNT(c5.contenttype) END nb_of_blogpost,
CASE WHEN c5.contenttype = 'ATTACHMENT'
THEN COUNT(c5.contenttype) END nb_of_attachment,
CASE WHEN c5.contenttype = 'CUSTOM'
THEN COUNT(c5.contenttype) END nb_of_custom
FROM content c5
WHERE c5.content_status = 'current'
AND c5.contenttype != 'SPACEDESCRIPTION'
GROUP BY c5.contenttype, c5.spaceid
) t ON t.spaceid = s.spaceid

(...)

WHERE content
.contenttype = 'SPACEDESCRIPTION'
GROUP BY
s.spaceid, s.spacename, s.spacekey, s.spacetype,
s.spacestatus, sd.body, up.username, u.user_name,
u.display_name, u.email_address, last_creator,
last_creation_date, last_modifier, last_modification_date

Let's count the number of comment

A comment is also contain into the content table. But we must do a reflexive join to find the associated space, because a comment does not have an associated spaceid. And so you have to find the page associated with the comment to find the associated space.

SELECT DISTINCT
(...)

MAX(t2.nb_of_comment) as nb_of_comment,

(...)

LEFT JOIN(
SELECT c7.spaceid, COUNT(c6.contenttype) as nb_of_comment
FROM content c6
JOIN content c7 ON c6.pageid = c7.contentid
WHERE c6.prevver IS NULL
AND c6.content_status = 'current'
AND c6.contenttype = 'COMMENT'
GROUP BY c6.contenttype, c7.spaceid
) t2 ON t2.spaceid = s.spaceid

(...)

Let's count the number of likes

We use the likes table.

SELECT DISTINCT
(...)
MAX(t3.nb_of_like) as nb_of_like

(...)

LEFT JOIN
(
SELECT c8.spaceid, COUNT(l.id) as nb_of_like
FROM likes l
JOIN content c8 ON l.contentid = c8.contentid
WHERE c8.prevver IS NULL
AND c8.content_status = 'current'
GROUP BY c8.spaceid
) t3 ON t3.spaceid = s.spaceid

(...)

Let's add the total size of all attachments foreach space

We use the contentproperties table.

SELECT DISTINCT
(...)
MAX(t4.total_attachments_size) as total_attachments_size

(...)

LEFT JOIN
(
SELECT c9.spaceid, SUM(cp.LONGVAL) as total_attachments_size
FROM content c9
JOIN content c10 ON c9.contentid = c10.pageid
JOIN contentproperties cp ON c10.contentid = cp.contentid
WHERE c10.contenttype = 'ATTACHMENT'
GROUP BY c9.spaceid
) t4 ON t4.spaceid = s.spaceid

(...)

Here we go !

 

Bonus (trick)

What if I want to paginate the result?

To implement a server-side paging system (for a Confluence data mining API), the simplest idea is to add the usual statement "LIMIT $limit OFFSET $offset" at the end. But the performance gain is extremely limited with regard to the number of rows in each join to be processed. Indeed, in our test instance, extracting 10 spaces takes almost 9.1 seconds.

The paging gain is minimal. In fact, for the gain to be significant, the dataset of each join should be filtered with the following WHERE condition :

(...)

AND s.spaceid IN (
SELECT spaceid FROM spaces ORDER BY spaceid LIMIT 10 OFFSET 0
)

By adding this WHERE condition on every join, extracting 10 spaces takes almost 518ms in our test instance. Harder, Better, Faster... Stronger !!!

Below is an example of adding this pagination clause in the join that calculates the total size of the attachments. For the implementation with an application server (Java, PHP, nodeJs...), it is enough to replace the numerical values of this example by the variables: limit and offset.

SELECT DISTINCT
(...)

LEFT JOIN
(
SELECT c9.spaceid, SUM(cp.LONGVAL) as total_attachments_size
FROM content c9
JOIN content c10 ON c9.contentid = c10.pageid
JOIN contentproperties cp ON c10.contentid = cp.contentid
WHERE c10.contenttype = 'ATTACHMENT'
AND c10.spaceid IN (SELECT spaceid FROM spaces ORDER BY spaceid LIMIT 10 OFFSET 0)
GROUP BY c9.spaceid
) t4 ON t4.spaceid = s.spaceid

WHERE content.contenttype = 'SPACEDESCRIPTION'
AND s.spaceid IN (SELECT spaceid FROM spaces ORDER BY spaceid LIMIT 10 OFFSET 0)

GROUP BY
s.spaceid, s.spacename, s.spacekey, s.spacetype,
s.spacestatus, sd.body, up.username, u.user_name,
u.display_name, u.email_address, last_creator,
last_creation_date, last_modifier, last_modification_date

 

2 comments

Srinatha T
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
October 22, 2022

@Vincent Blain . Wow. So many useful SQL's in one page. Very useful article. Thank you for sharing this. 

Like Vincent Blain likes this
Kishan Sharma
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
August 19, 2024

Great info, thank you for sharing @Vincent Blain !!

Comment

Log in or Sign up to comment
TAGS
AUG Leaders

Atlassian Community Events