How to identify which page is hitting an unrecoverable error and blocking Space PDF export

 

As one of the best practices for business continuity planning (BCP), your team might want to export a whole space to PDF. However, some Confluence pages may contain invalid data, and they block the PDF export by raising unrecoverable exceptions.

I'll introduce two diagnosis approaches to identify which page is causing the export to fail.

 

Diagnosis Approach #1 

Here's the way to identify the problematic page. Note that this is only applicable for Confluence Server and not for Confluence Cloud.

 

  1. Visit "Logging and Profiling" at http://__YOUR_HOST__/admin/viewlog4j.action
  2. Submit the following setting from "Add New Entry"
    • Class/Package Name: com.atlassian.confluence.extra.flyingpdf
    • New Level: ALL
  3. In "Existing Levels", change the log level of the following entry:
    • Class/Package Name: com.atlassian.confluence.importexport.impl.PdfExporter
    • New Level: ALL
  4. Click "Save" at the bottom of the page
  5. Reproduce the problem by exporting the space to PDF
  6. Go back to "Logging and Profiling" and set "Log4j Logging" to "Production" to restore the default log levels

 


 

With this setting, Confluence outputs log entries like the one below, which help to identify the problematic page:

 

DEBUG [Long running task: PDF Space Export] [extra.flyingpdf.html.RenderedXhtmlBuilder] renderToHtml Rendering to exported XHTML page id=___PAGE_ID__ (__PAGE_TITLE__)
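To narrow things down, it can help to pull the last of these entries out of the log after a failed export. The following is a minimal sketch, assuming the failing page is the last one logged before the exception; the function name `last_rendered_page` is hypothetical, and the log file path depends on your installation.

```shell
#!/bin/bash
# Hypothetical helper: print the page id and title from the last
# "Rendering to exported XHTML" entry found in the given log file.
# Assumption: the page rendered last before the failure is the suspect.
last_rendered_page() {
  local log_file="$1"
  grep -o 'Rendering to exported XHTML page id=.*' "$log_file" \
    | tail -n 1 \
    | sed 's/^Rendering to exported XHTML //'
}
```

For example, `last_rendered_page /path/to/confluence-home/logs/atlassian-confluence.log` would print something like `page id=__PAGE_ID__ (__PAGE_TITLE__)`. The path shown here is an assumption; adjust it to wherever your instance writes its application log.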

Diagnosis Approach #2

Here's another way to identify the problematic page. It's time-consuming but works for both Confluence Server and Confluence Cloud. The idea is to export every page in the target space one by one, so that the pages containing invalid data fail with an error.

 

  1. Make sure the jq command is installed
  2. Edit the script below to change the variables to your own settings
  3. Grant the execution permission to the script
  4. Run the script and get a list of URLs in stdout
  5. Open the URLs in your browser

 

#!/bin/bash
# Usage:
# chmod u+x ./snippet.sh
# bash ./snippet.sh

# Advisory for Atlassian Cloud
# Instead of using your password, you should generate an API token at https://id.atlassian.com/manage/api-tokens and set it as C_PASSWORD for calling APIs.
C_SCHEME="https"
C_HOST="example.com:8090"
C_USER="admin"
C_PASSWORD=""
TARGET_SPACE="SP"

curl -u "${C_USER}:${C_PASSWORD}" -sG "${C_SCHEME}://${C_HOST}/rest/api/content/search" --data-urlencode "cql=(type in (page, blogpost) and space in (${TARGET_SPACE}))" \
| jq -r '.results[]?.id' \
| xargs -I{} echo "${C_SCHEME}://${C_HOST}/spaces/flyingpdf/pdfpageexport.action?pageId={}"
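
The search endpoint returns its results in pages, so a single request may not cover a large space. The following is a rough sketch of a paged variant, assuming the endpoint accepts offset-based `start` and `limit` parameters and reports a `size` field per response; please verify this against your Confluence version. `PAGE_SIZE`, `MAX_REQUESTS`, `export_url` and `list_export_urls` are hypothetical names introduced for illustration.

```shell
#!/bin/bash
# Same settings as in the script above; override them via the environment.
C_SCHEME="${C_SCHEME:-https}"
C_HOST="${C_HOST:-example.com:8090}"
C_USER="${C_USER:-admin}"
C_PASSWORD="${C_PASSWORD:-}"
TARGET_SPACE="${TARGET_SPACE:-SP}"
PAGE_SIZE=50       # hypothetical number of results per request
MAX_REQUESTS=1000  # safety cap in case the server ignores "start"

# Build the single-page PDF export URL for a content id.
export_url() {
  echo "${C_SCHEME}://${C_HOST}/spaces/flyingpdf/pdfpageexport.action?pageId=$1"
}

# Request result pages until an empty one comes back, printing one URL per id.
list_export_urls() {
  local start=0 req response size id
  for ((req = 0; req < MAX_REQUESTS; req++)); do
    response=$(curl -u "${C_USER}:${C_PASSWORD}" -sG \
      "${C_SCHEME}://${C_HOST}/rest/api/content/search" \
      --data-urlencode "cql=(type in (page, blogpost) and space in (${TARGET_SPACE}))" \
      --data-urlencode "start=${start}" \
      --data-urlencode "limit=${PAGE_SIZE}") || return 1
    # Stop if the response is not valid JSON or contains no more results.
    size=$(jq -r '.size // 0' <<<"$response") || return 1
    if [ "$size" -eq 0 ]; then
      break
    fi
    while read -r id; do
      if [ -n "$id" ]; then
        export_url "$id"
      fi
    done < <(jq -r '.results[]?.id' <<<"$response")
    start=$((start + size))
  done
}
```

Calling `list_export_urls` prints the same kind of URL list as the script above, one per line, so the remaining steps stay unchanged.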

 

To open multiple URLs at once, use a browser extension for opening lists of URLs if needed.

 

If you are a system admin of Confluence Server, you can also run the query below against the Confluence database to list the page IDs in the target space.

 

select c.contentid
from content c
join spaces s
  on c.spaceid = s.spaceid
where s.spacekey in ('SP')
  and c.contenttype in ('PAGE', 'BLOGPOST');

 

Related Feature Request

 
