Bugs when exporting sites to PDF

April 22, 2022

Hi,

I wanted to report a bug, but for some reason the link brought me here instead of to Jira, so feel free to move this if you want.
I was looking for a way to export all Confluence pages in my organization to an offline format, e.g. PDF. I found that this can be done from the Confluence settings ... if you have the proper permissions. I didn't have those permissions, and I'm not a fan of discussing such things with support, so I decided to write a script that would do it for me.

Our Confluence has ~30 spaces with ~12 000 pages in total, so, as you might expect, I ran into a couple of issues along the way that might be interesting to you. Before I list them, please note that I'm here to discuss the substance, and I won't get involved in discussions with people who are overly attached to forum rules. If you want to throw my findings in the trash and remove this thread, I don't mind; it's your forum and your decision.

Please also note that I don't know which Confluence version we are using, and I might not be able to check. I don't know what server it runs on, and I can't test on another instance, so detailed reproduction and log gathering are up to the Atlassian team. Anyway, here is the list of my findings:

1. Query time grows dramatically with the start offset when querying pages. We have a limit of 500 results per query, so I use that limit below, but even if I set the limit to 1 the result is the same. Please take a look at these 4 queries:

a) <domain>/rest/api/content?type=page&start=0&limit=500
b) <domain>/rest/api/content?type=page&start=500&limit=500
c) <domain>/rest/api/content?type=page&start=3000&limit=500
d) <domain>/rest/api/content?type=page&start=11000&limit=500

Query a) finishes in 1 second. Query b) finishes in 2-3 seconds. Query c) finishes in 1 minute. Query d) finished in 25 minutes.
At first I thought the server had noticed unusual behavior from my side and that this was some protection mechanism. However, I tried multiple times over 2 days, and every time, the bigger the start offset I gave, the longer it took to return results.
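
For anyone who wants to reproduce the timings, here is a minimal Python sketch that builds the four URLs above and measures how long each one takes. The base URL and the `fetch` callable are placeholders (assumptions, not part of the original script); `fetch` stands for whatever authenticated HTTP GET your setup needs.

```python
import time
from urllib.parse import urlencode


def page_query_url(base_url, start, limit=500):
    """Build the content query URL used in queries a) to d)."""
    query = urlencode({"type": "page", "start": start, "limit": limit})
    return f"{base_url}/rest/api/content?{query}"


def time_queries(fetch, base_url, starts, limit=500):
    """Call fetch(url) for each start offset; return (start, seconds) pairs.

    fetch is a hypothetical callable that performs the authenticated
    GET request; it is passed in so the timing logic stays independent
    of any particular HTTP library or login method.
    """
    timings = []
    for start in starts:
        url = page_query_url(base_url, start, limit)
        began = time.perf_counter()
        fetch(url)
        timings.append((start, time.perf_counter() - began))
    return timings
```

Calling `time_queries(fetch, "<domain>", [0, 500, 3000, 11000])` would cover the same offsets as the queries listed above.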

2. PDF generation hangs forever if the page contains inaccessible resources. I noticed that my script hung for some page IDs. It was not able to export the page even after I left it running for 2-3 hours. At first I thought it was an issue with my script, but then I tried exporting manually from the browser and that also hung forever, so I checked those pages more carefully. It happened for only 5 of the 12 000 pages, but it always happened with the same pages. What those 5 pages had in common was that some of their content was hidden behind permissions. On one page I also found a note saying that I would need to be in a certain group to see the page fully. The restriction itself doesn't really matter; the real bug is that PDF generation takes forever instead of simply finishing immediately (as it does for pages to which I had no access at all).
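
Until this is fixed, a script can at least avoid hanging on such pages by using a timeout and skipping the page. Below is a minimal Python sketch of that workaround. The names `download`, `export_all`, and `export_one` are hypothetical, the 120-second timeout is an arbitrary choice, and authentication is left out; the export URL itself is not shown here and would have to be supplied by the caller.

```python
import socket
import urllib.error
import urllib.request


def download(url, timeout_seconds=120):
    """Fetch a URL, giving up if the server stays silent for too long.

    The timeout applies to each blocking socket operation, so a server
    that never answers raises an error instead of hanging forever.
    """
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        return response.read()


def export_all(page_ids, export_one):
    """Export every page, collecting the IDs that time out or fail.

    export_one is a hypothetical callable that takes a page ID and
    returns the PDF bytes, e.g. by calling download() on the export URL.
    """
    exported, skipped = {}, []
    for page_id in page_ids:
        try:
            exported[page_id] = export_one(page_id)
        except (TimeoutError, socket.timeout, urllib.error.URLError):
            # Remember the page so it can be checked manually later.
            skipped.append(page_id)
    return exported, skipped
```

The `skipped` list then contains the handful of page IDs worth inspecting by hand.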

3. Huge PDFs are generated when photos are involved. The generated PDFs were usually ~40 KB, so I was surprised to see that one page produced a 450 MB file. I visited that page manually and noticed that it contains only 12 photos. I thought that maybe someone had uploaded the photos in a high resolution, so I downloaded them manually, and it turned out that a single photo is 8 MB. That's not small, but 12 photos at 8 MB each is roughly 100 MB, so the PDF should be ~100 MB, not 450 MB. I think PDF generation could be optimized.

/ M
