Find and identify all empty pages in Confluence

As a user of our company wiki, which is based on Confluence, I want to find and identify pages that are empty apart from a title, so that I can decide whether to trash them or ping peers to fill them with our knowledge.

4 answers

1 accepted

2 votes
Accepted answer

Here's a macro that will do it. Not the most efficient macro, but it sounds like you just need to run this once to get a list. I can't vouch for it being perfect, but I did a quick test and it seems to work.

## Macro title: Find Empty Pages
## Macro has a body: N
## Body processing: n/a
## Output: HTML
##
## Developed by: Matthew J. Horn
## Date created: 07/31/2013
## @noparams

#set ($pageListArray = [])
#set ($spaceHome = $space.getHomePage())

#macro ( process $rp )
  #set ($pagelist = $rp.getSortedChildren() )  ## returns List<Page>
  #foreach( $child in $pagelist )
    #set($p = $pageListArray.add( $child ) )
    #if( $child.hasChildren() )
      #process ( $child )
    #end
  #end
#end

#process ( $spaceHome )

<table class="confluenceTable">
 <tbody>
 <tr>
  <th class="confluenceTh">Title</th>
  <th class="confluenceTh">Size</th>
 </tr>

 #foreach( $child in $pageListArray)   ## child is of type Page
   <tr>
     <td class="confluenceTd">$child.getTitle()</td>
     <td class="confluenceTd">$child.getBodyAsStringWithoutMarkup().length()</td>
   </tr>
 #end 

</tbody>
</table>

Matthew, this is working well. However, how can I see this at the level of the complete instance? I have some 20-odd spaces and want to know how many empty pages I have in each space.

Hi Stefan

Did you try a simple select on the database?

"select contentid from bodycontent where body is NULL" will show you all contentids that don't have a body.

"select * from content where contentid = XYZ" should list some more information about those pages.

Sure, you can combine those SQL queries using joins or subselects, but that's something I'm not into :-)

Kind regards
André

EDIT:

Hmm, Confluence is tricky...
The body column is a CLOB and can't be combined out of the box...
I searched around and, after some trial and error, found:

SQL: select contentid from bodycontent where to_char(substr(body,0,100)) is NULL;

That should list all pages/contentids whose first 100 characters are NULL :-)
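
The join mentioned above could look roughly like the sketch below. It runs against an in-memory SQLite database rather than a real Confluence database, so everything in it is a stand-in: the table names `content` and `bodycontent` and the columns `contentid` and `body` come from the queries in this thread, while the `title` column and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# Hypothetical miniature of the two tables named in this thread.
# contentid and body appear in the queries above; title is an assumed column.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (contentid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE bodycontent (contentid INTEGER, body TEXT);
    INSERT INTO content VALUES (1, 'Roadmap'), (2, 'Meeting notes'), (3, 'Glossary');
    INSERT INTO bodycontent VALUES (1, '<p>Q3 goals</p>'), (2, NULL), (3, NULL);
""")

# One query instead of two: join the body table to the content table
# so that each row without a body comes back with its title.
EMPTY_PAGES_SQL = """
    SELECT c.contentid, c.title
    FROM content c
    JOIN bodycontent b ON b.contentid = c.contentid
    WHERE b.body IS NULL
    ORDER BY c.contentid
"""

empty_pages = conn.execute(EMPTY_PAGES_SQL).fetchall()
for contentid, title in empty_pages:
    print(contentid, title)
```

On a real database where the body column is a CLOB, the `WHERE` condition would probably need the `to_char(substr(...))` workaround from the edit above instead of a plain `IS NULL`.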

Hi André, thank you for your reply. I forgot to say that I do not have database access at the moment. This could be a solution anyway, but I would prefer one integrated into, for example, the advanced page of a given space. This does not seem to exist yet?

I have modified the code, and this version lists every space together with its empty pages.

## Macro title: Find Empty Pages
## Macro has a body: N
## Body processing: n/a
## Output: HTML
##
## Developed by: Matthew J. Horn
## Date created: 07/31/2013
## @noparams
## Modified by: Pranjal Shukla on 13/1/2016
 
#set ($spaces = $spaceManager.getAllSpaces())
#foreach( $space in $spaces )
#set ($spaceHome = $space.getHomePage())
#set ($pageListArray = [])
 
#macro ( process $rp )
  #set ($pagelist = $rp.getSortedChildren() )  ## returns List<Page>
  #foreach( $child in $pagelist )
    #set($p = $pageListArray.add( $child ) )
    #if( $child.hasChildren() )
      #process ( $child )
    #end
  #end
#end
 
#process ( $spaceHome )
 
<h1>$space.getName()</h1>
<table class="confluenceTable">
 <tbody>
 <tr>
  <th class="confluenceTh">Title</th>
  <th class="confluenceTh">Size</th>
 </tr>
 
 #foreach( $child in $pageListArray)   ## child is of type Page
 #if( $child.getBodyAsStringWithoutMarkup().length()==0 )
   <tr>
     <td class="confluenceTd">$child.getTitle()</td>
     <td class="confluenceTd">$child.getBodyAsStringWithoutMarkup().length()</td>
   </tr>
 #end
 #end
 
</tbody>
</table>
#end

Hello,

Maybe a late answer, but we had the same problem and the same requests from our users. I will post my solution; maybe somebody can use it.

I checked Matthew J. Horn's great answer, but it has some problems:

* it lists all the pages in a space, not only the empty ones

* only the page title is shown; links to the pages would be convenient

 

I did some tests and noticed that it is hard to identify empty pages with 100% certainty when using the size of the content as a String. I didn't find a better way to check for an empty page, though, so that indeed seems to be the best tool at our disposal.

I tested with empty pages that have some layout (like sections), pages that use macros without other text (with or without output), pages with very little text, very small images, etc.

I found that with a threshold of 10 (the length of the String), almost all of the non-empty pages are filtered out, although some false positives can remain.
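
To make the threshold idea concrete outside of Confluence, here is a minimal Python sketch. It is not what `getBodyAsStringWithoutMarkup()` does internally; the helper names `text_length` and `looks_empty` are hypothetical, and stripping tags with the standard-library `HTMLParser` and trimming whitespace are assumptions made purely for illustration.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and drops every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def text_length(markup: str) -> int:
    """Length of the text left after removing tags and outer whitespace."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return len("".join(parser.parts).strip())


def looks_empty(markup: str, threshold: int = 10) -> bool:
    """Treat a page as empty when its text is at most `threshold` characters."""
    return text_length(markup) <= threshold


print(looks_empty("<div><p> </p></div>"))   # layout only
print(looks_empty("<p>TODO</p>"))           # very little text, still flagged
print(looks_empty("<p>This page documents the release process.</p>"))
```

The second call shows the kind of false positive described above: a page with a few characters of real text still falls under the threshold.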

 

So, starting from Matthew J. Horn's solution, I made this:

## Macro title: Find Empty Pages
## Macro has a body: N
## Body processing: n/a
## Output: HTML
##
## Original by: Matthew J. Horn : https://community.atlassian.com/t5/Confluence-questions/Find-and-identify-all-empty-pages-in-Confluence/qaq-p/131649
## Updated by: Loïc Dewerchin
## Date created: 07/31/2013
## @noparams

#set ($pageListArray = [])
#set ($spaceHome = $space.getHomePage())

#macro ( process $rp )
  #set ($pagelist = $rp.getSortedChildren() )  ## returns List<Page>
  #foreach( $child in $pagelist )
    #set($p = $pageListArray.add( $child ) )
    #if( $child.hasChildren() )
      #process ( $child )
    #end
  #end
#end

#process ( $spaceHome )

<ac:macro ac:name="note">
<ac:rich-text-body>
    <p>Add a warning about possible false positives</p>
  </ac:rich-text-body>
</ac:macro>

<table class="confluenceTable">
 <tbody>
 <tr>
  <th class="confluenceTh">Page</th>
  <th class="confluenceTh">Author</th>
  <th class="confluenceTh">Creation date</th>
  <th class="confluenceTh">Update date</th>
 </tr>

 #foreach( $child in $pageListArray)   ## child is of type Page

  #if( $child.getBodyAsStringWithoutMarkup().length() <= 10 )
     <tr>
       <td class="confluenceTd"><a href="$child.getUrlPath()">$child.getTitle()</a></td>
       <td class="confluenceTd">$child.getCreatorName()</td>
       <td class="confluenceTd">$child.getCreationDate()</td>
       <td class="confluenceTd">$child.getLastModificationDate()</td>
     </tr>
   #end
 #end

</tbody>
</table>

 

This user macro shows only the empty pages and provides a link to each page plus some additional info.
