Could ScriptRunner be used to identify Confluence pages with broken image links?

Anja Brkljacic
Contributor
September 2, 2020

Hello, 

The company I work for, along with many others, has been affected by this Confluence bug (https://jira.atlassian.com/browse/CONFSERVER-55928), which causes images in Confluence pages not to appear within the page. Atlassian has released a fix in Confluence 7.7.3 that will prevent this from happening again, but it won't fix existing pages or identify the pages that are affected.

Does anyone know if it would be possible to write code using ScriptRunner for Confluence that would identify pages that are affected by this issue, so we can quickly find all the affected pages and correct them?

Thank you,

Anja Brkljacic

3 answers

1 accepted

1 vote
Answer accepted
Aidan Derossett [Adaptavist]
Rising Star
September 3, 2020

Hey there Anja! :D

In short, yes absolutely you can do this! :)

I've reproduced the bug that you linked, and it looks like any page that is experiencing it will contain a placeholder "unknown-attachment" image in the Storage Format that looks something like this:

<p><br /></p>
<p><ac:image><ri:url ri:value="{hiddenPersonalUrl}/plugins/servlet/confluence/placeholder/unknown-attachment?locale=en_US&amp;version=2" /></ac:image></p>

So, one thing we can do is search the content of your pages to see whether a page contains this "unknown-attachment" image. You could technically just create a custom script that checks every single page in your instance, but something like that could take a very long time to run, so it's probably not the best approach. Instead, an easier way to do this would be to create a simple Search Extractor. Doing this will allow you to go through each space individually (instead of every space all at once) and identify troublesome pages. I tested this locally and used a search extractor with the following code as my inline script:

import com.atlassian.confluence.pages.Page
import org.apache.lucene.document.Field
import org.apache.lucene.document.StringField

// "searchable" and "document" are supplied to the inline script by the search extractor
if (searchable instanceof Page) {
    Page page = searchable as Page
    // True if any stored body of the page references the placeholder image
    def containsUnknown = page.bodyContents.any { it.body?.contains("unknown-attachment") }

    // Store the flag on the index document so it can be searched after a reindex
    document.add(new StringField("containsUnknown", containsUnknown ? "true" : "false", Field.Store.YES))
}

Keep in mind, after creating this extractor you'll need to reindex your instance so that all of your content is appropriately flagged with the "containsUnknown" field. But after indexing, you should be able to run an Advanced Search like the following and specify which space(s) you'd like to search:


[Screenshot of the Advanced Search: Screen Shot 2020-09-03 at 4.41.35 PM.png]

Now, disclaimer, I only tested this on a very small group of test pages and in an instance that's basically empty, so your mileage may vary. But I'd give that a shot and see if it returns the problem pages that you're looking for. 
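
If the plain substring check ever flags pages that only mention "unknown-attachment" in their text, a stricter match might help. The following is a minimal sketch in plain Groovy, not something from the tested extractor: the class name BrokenImageCheck, its methods, and the example.com URL are all hypothetical, and the pattern is an assumption based only on the single Storage Format sample shown above, so it may need adjusting for your instance.

```groovy
import java.util.regex.Pattern

// Hypothetical helper; not part of ScriptRunner or the Confluence API.
class BrokenImageCheck {
    // Assumption: the placeholder always appears as an ri:value URL that
    // contains "/placeholder/unknown-attachment", as in the sample above.
    static final Pattern PLACEHOLDER_URL =
        Pattern.compile('ri:value="[^"]*/placeholder/unknown-attachment[^"]*"')

    // True when the storage format contains at least one placeholder image URL.
    static boolean hasBrokenImage(String storageFormat) {
        storageFormat != null && PLACEHOLDER_URL.matcher(storageFormat).find()
    }

    // Number of placeholder image URLs found in the storage format.
    static int countBrokenImages(String storageFormat) {
        if (storageFormat == null) {
            return 0
        }
        def matcher = PLACEHOLDER_URL.matcher(storageFormat)
        int count = 0
        while (matcher.find()) {
            count++
        }
        count
    }
}

// Made-up sample modelled on the Storage Format snippet above.
def sample = '<p><ac:image><ri:url ri:value="https://wiki.example.com/plugins/servlet/confluence/placeholder/unknown-attachment?locale=en_US&amp;version=2" /></ac:image></p>'
println "Broken image found: ${BrokenImageCheck.hasBrokenImage(sample)}"
```

Inside the extractor, the same idea would mean replacing the contains("unknown-attachment") call with a regex match on each body. The trade-off is that a stricter pattern could miss pages if the placeholder shows up in a form different from the sample.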

Hope that helps! :D

Best,

Aidan

Anja Brkljacic
Contributor
September 4, 2020

Thanks, Aidan! We will try this out and let you know how it worked/mark it as the answer :)

Anja Brkljacic
Contributor
September 9, 2020

We tried this and it seems to have worked so far :) thanks, again!

0 votes
Jessie Wang_ScriptRunner_The Adaptavist Group
Atlassian Partner
September 28, 2022

Hi all, this script is now available in our script library for ScriptRunner for Confluence Server/DC (tested by our engineers).

Feel free to copy or customise it as you wish: https://library.adaptavist.com/entity/identify-pages-with-broken-image-links

Deployment type: Server
Version: 7.5.2