Find and identify all empty pages in Confluence

Stefan Glase July 30, 2013

As a user of our company wiki, which is based on Confluence, I want to find and identify pages that are empty apart from a title, so that I can decide whether to trash them or ping peers to fill them with our knowledge.

5 answers

1 accepted

2 votes
Answer accepted
Matthew J. Horn
Rising Star
July 30, 2013

Here's a macro that will do it. Not the most efficient macro, but it sounds like you just need to run this once to get a list. I can't vouch for it being perfect, but I did a quick test and it seems to work.

## Macro title: Find Empty Pages
## Macro has a body: N
## Body processing: n/a
## Output: HTML
##
## Developed by: Matthew J. Horn
## Date created: 07/31/2013
## @noparams

#set ($pageListArray = [])
#set ($spaceHome = $space.getHomePage())

#macro ( process $rp )
  #set ($pagelist = $rp.getSortedChildren() )  ## returns List<Page>
  #foreach( $child in $pagelist )
    #set($p = $pageListArray.add( $child ) )
    #if( $child.hasChildren() )
      #process ( $child )
    #end
  #end
#end

#process ( $spaceHome )

<table class="confluenceTable">
 <tbody>
 <tr>
  <th class="confluenceTh">Title</th>
  <th class="confluenceTh">Size</th>
 </tr>

 #foreach( $child in $pageListArray)   ## child is of type Page
   <tr>
     <td class="confluenceTd">$child.getTitle()</td>
     <td class="confluenceTd">$child.getBodyAsStringWithoutMarkup().length()</td>
   </tr>
 #end 

</tbody>
</table>

Pranjal Shukla January 12, 2016

Matthew, this is working well. However, how can I see this at the level of the whole instance? I have some 20-odd spaces and want to know how many empty pages I have in each space.

2 votes
Loïc Dewerchin March 20, 2018

Hello,

This may be a late answer, but we had the same problem and the same requests from our users. I will post my solution in case somebody can use it.

 

I checked Matthew J. Horn's great answer, but it has some problems:

* it lists all the pages in a space, not only the empty ones

* it shows only the page title; links to the pages would be convenient

 

I did some tests and noted that it is hard to identify empty pages with 100% certainty when using the size of the content as a String. I didn't find a better way to check for an empty page, though, so that indeed seems to be the best tool at our disposal.

I tested with empty pages that have some layout (like sections), pages that use macros without any other text (with or without output), pages with very little text, pages with very small images, and so on.

I found that with a threshold of 10 (length of the String), almost all of the non-empty pages are filtered out, although some false positives can remain.
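
To illustrate the threshold check outside Confluence, here is a minimal Python sketch. It is not Confluence code: it assumes page bodies are available as HTML-like strings, `visible_text` and `looks_empty` are hypothetical helpers, and the tag stripping is only a rough stand-in for `getBodyAsStringWithoutMarkup()`. The sample pages are made up.

```python
from html.parser import HTMLParser

EMPTY_THRESHOLD = 10  # same cut-off as described above


class _TextCollector(HTMLParser):
    """Collects only the text nodes of a markup string."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(body: str) -> str:
    """Rough stand-in for stripping markup from a page body."""
    collector = _TextCollector()
    collector.feed(body)
    collector.close()  # flush any buffered trailing text
    return "".join(collector.chunks)


def looks_empty(body: str, threshold: int = EMPTY_THRESHOLD) -> bool:
    """True when at most `threshold` characters of text remain.

    Assumption: surrounding whitespace is trimmed before counting,
    which the Velocity macro does not do.
    """
    return len(visible_text(body).strip()) <= threshold


# Made-up sample bodies for illustration only.
pages = {
    "Only a title": "",
    "Layout only": "<ac:layout><ac:layout-section><p> </p></ac:layout-section></ac:layout>",
    "Tiny note": "<p>todo</p>",
    "Real content": "<p>This page documents our release process.</p>",
}

for title, body in pages.items():
    print(title, looks_empty(body))
```

The "Tiny note" page is reported as empty even though it contains a word, which is the kind of false positive a length threshold can produce.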

 

So, starting from Matthew J. Horn's solution, I made this:

## Macro title: Find Empty Pages
## Macro has a body: N
## Body processing: n/a
## Output: HTML
##
## Original by: Matthew J. Horn : https://community.atlassian.com/t5/Confluence-questions/Find-and-identify-all-empty-pages-in-Confluence/qaq-p/131649
## Updated by: Loïc Dewerchin
## Date created: 07/31/2013
## @noparams

#set ($pageListArray = [])
#set ($spaceHome = $space.getHomePage())

#macro ( process $rp )
  #set ($pagelist = $rp.getSortedChildren() )  ## returns List<Page>
  #foreach( $child in $pagelist )
    #set($p = $pageListArray.add( $child ) )
    #if( $child.hasChildren() )
      #process ( $child )
    #end
  #end
#end

#process ( $spaceHome )

<ac:macro ac:name="note">
<ac:rich-text-body>
    <p>Add a warning about possible false positives</p>
  </ac:rich-text-body>
</ac:macro>

<table class="confluenceTable">
 <tbody>
 <tr>
  <th class="confluenceTh">Page</th>
  <th class="confluenceTh">Author</th>
  <th class="confluenceTh">Creation date</th>
  <th class="confluenceTh">Update date</th>
 </tr>

 #foreach( $child in $pageListArray)   ## child is of type Page

  #if( $child.getBodyAsStringWithoutMarkup().length() <= 10 )
     <tr>
       <td class="confluenceTd"><a href="$child.getUrlPath()">$child.getTitle()</a></td>
       <td class="confluenceTd">$child.getCreatorName()</td>
       <td class="confluenceTd">$child.getCreationDate()</td>
       <td class="confluenceTd">$child.getLastModificationDate()</td>
     </tr>
   #end
 #end

</tbody>
</table>

 

This user macro shows only the empty pages and provides a link to each page plus some additional info.

Jochen Berdi June 16, 2021

Great Macro...

Is it possible to limit the search by labels?

1 vote
Andre Lehmann
Rising Star
July 30, 2013

Hi Stefan,

did you try a simple select on the database?

"select contentid from bodycontent where body is NULL" will show you all content IDs that don't have a body.

"select * from content where contentid = XYZ" should list some more information about each of those pages.

Sure, you can combine those SQL queries with joins or subselects, but that's something I'm not into :-)

Kind regards
André

EDIT:

Hmm, Confluence is tricky...
The body column is a CLOB and can't be compared out of the box...
I searched around, did some trial and error, and found:

SQL: select contentid from bodycontent where to_char(substr(body,0,100)) is NULL;

That should list all pages/content IDs where the first 100 chars are NULL :-)
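
The queries above could be combined with a join, roughly as in the sketch below. It runs from Python against a throwaway in-memory SQLite database, not a real Confluence database. The table names and the `contentid` and `body` columns come from the queries above; the `title` and `contenttype` columns and all sample rows are assumptions for illustration. Real schemas differ by Confluence version and database, so treat this as a starting point only.

```python
import sqlite3

# Toy schema: a cut-down stand-in for the two Confluence tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE content (contentid INTEGER PRIMARY KEY, title TEXT, contenttype TEXT);
    CREATE TABLE bodycontent (contentid INTEGER, body TEXT);
    INSERT INTO content VALUES (1, 'Release notes', 'PAGE'),
                               (2, 'Placeholder', 'PAGE'),
                               (3, 'Whitespace only', 'PAGE');
    INSERT INTO bodycontent VALUES (1, '<p>Version 1.0 shipped.</p>'),
                                   (2, NULL),
                                   (3, '   ');
    """
)

# Join the two tables so titles come back directly, and treat
# NULL, empty, and space-only bodies alike.
EMPTY_PAGES_SQL = """
    SELECT c.contentid, c.title
    FROM content c
    JOIN bodycontent b ON b.contentid = c.contentid
    WHERE c.contenttype = 'PAGE'
      AND (b.body IS NULL OR TRIM(b.body) = '')
    ORDER BY c.contentid
"""

empty_pages = conn.execute(EMPTY_PAGES_SQL).fetchall()
for contentid, title in empty_pages:
    print(contentid, title)
```

On a database where the body is a CLOB, the body comparison would probably need a conversion like the `to_char(substr(...))` call above, and a real query would likely also have to restrict the result to current page versions.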

Stefan Glase July 30, 2013

Hi Andre, thank you for your reply. I forgot to say that I do not have database access at the moment. This could be a solution anyway, but I would prefer something integrated into the advanced page for a given space, for example. That does not seem to exist yet?

0 votes
Jonny Carter
Rising Star
May 20, 2020

I'm admittedly rather late to the party, but this would be a good use case for ScriptRunner for Confluence's Search Extractors.

A custom search extractor with the following code:

import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

if (searchable instanceof Page) {
    Page page = searchable as Page
    if (page.bodyAsStringWithoutMarkup.isEmpty() || page.bodyAsStringWithoutMarkup.isAllWhitespace()) {
        document.add(new StringField("empty", "true", Field.Store.YES))
    }
}

Will find all pages where the body is either empty or all whitespace. Of course, you can tweak the above script to match your own ideas about what constitutes an "empty" page.

You'll need to rebuild Confluence's indexes afterward, but then a simple Confluence search for empty : true should find any empty pages.
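
As an illustration of adjusting what counts as "empty", here is a small Python sketch that is independent of ScriptRunner. The `is_blank` helper and the `INVISIBLE` set are assumptions made for this example. It also treats a few characters that render as blank but may not count as whitespace in every language's whitespace check.

```python
# Characters that render as blank but are not always treated as whitespace.
# Assumption: this short list is illustrative, not exhaustive.
INVISIBLE = {
    "\u00a0",  # no-break space
    "\u200b",  # zero-width space
    "\ufeff",  # byte order mark / zero-width no-break space
}


def is_blank(text: str) -> bool:
    """True when text is empty or holds only whitespace/invisible characters."""
    return all(ch.isspace() or ch in INVISIBLE for ch in text)


samples = ["", "   \n\t", "\u00a0\u00a0", "\u200b", "x", " draft "]
for sample in samples:
    print(repr(sample), is_blank(sample))
```

The same idea could be carried over to the extractor script by replacing its emptiness condition with a stricter or looser check.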

0 votes
Pranjal Shukla January 12, 2016

I have modified the code, and this gets me a list of all the spaces with the names of their empty pages.

## Macro title: Find Empty Pages
## Macro has a body: N
## Body processing: n/a
## Output: HTML
##
## Developed by: Matthew J. Horn
## Date created: 07/31/2013
## @noparams
## Modified by: Pranjal Shukla on 13/1/2016
 
#set ($spaces = $spaceManager.getAllSpaces())
#foreach( $space in $spaces )
#set ($spaceHome = $space.getHomePage())
#set ($pageListArray = [])
 
#macro ( process $rp )
  #set ($pagelist = $rp.getSortedChildren() )  ## returns List<Page>
  #foreach( $child in $pagelist )
    #set($p = $pageListArray.add( $child ) )
    #if( $child.hasChildren() )
      #process ( $child )
    #end
  #end
#end

#process ( $spaceHome )

<h1>$space.getName()</h1>
<table class="confluenceTable">
 <tbody>
 <tr>
  <th class="confluenceTh">Title</th>
  <th class="confluenceTh">Size</th>
 </tr>

 #foreach( $child in $pageListArray)   ## child is of type Page
 #if( $child.getBodyAsStringWithoutMarkup().length()==0 )
   <tr>
     <td class="confluenceTd">$child.getTitle()</td>
     <td class="confluenceTd">$child.getBodyAsStringWithoutMarkup().length()</td>
   </tr>
 #end
 #end

</tbody>
</table>
#end
