How to programmatically clean up old drafts and purge the trash in Confluence with Python?

Hello!

In this article, we will learn how to do Confluence maintenance with Python and the Confluence REST API, and how to run the script from your CI system, e.g. Atlassian Bamboo. The scripts are also easy to automate and extend.

Nowadays, many Confluence instances have collaborative editing enabled or have gone without maintenance for a long time. For example, on our instance the cleanup reduced the DB backup in text format by ~16%.

 

Let's start with the trash cleaner.

  1. The algorithm is simple: get all pages from a space's trash and purge them.
  2. The REST API reference is located here, e.g. https://docs.atlassian.com/ConfluenceServer/rest/6.11.0/
  3. For the implementation, Python with the requests module is an easy choice.
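The three steps above can be sketched directly against the REST API with requests. This is a minimal sketch under assumptions, not the article's script: it targets Confluence Server/Data Center, it assumes GET /rest/api/content accepts status=trashed for listing a space's trash and that DELETE /rest/api/content/{id}?status=trashed purges one page (as far as I know, this is what atlassian-python-api calls under the hood), and it expects an already authenticated requests.Session to be passed in. The function name purge_space_trash is hypothetical.

```python
def purge_space_trash(base_url, space_key, session, limit=500):
    """Purge every trashed page of one space using the raw REST API.

    Sketch only. Assumes Confluence Server/Data Center, where
    GET /rest/api/content?spaceKey=...&status=trashed lists trashed pages and
    DELETE /rest/api/content/{id}?status=trashed purges one of them.
    `session` is expected to behave like an authenticated requests.Session.
    """
    api = base_url.rstrip("/") + "/rest/api/content"
    purged = []
    while True:
        # Step 1: list the next batch of trashed pages. We always ask for
        # start=0, because purged pages disappear from the listing.
        resp = session.get(api, params={"spaceKey": space_key, "type": "page",
                                        "status": "trashed", "start": 0,
                                        "limit": limit})
        resp.raise_for_status()
        pages = resp.json().get("results", [])
        if not pages:
            return purged
        # Step 2: purge each page of the batch from the trash.
        for page in pages:
            url = "{}/{}".format(api, page["id"])
            session.delete(url, params={"status": "trashed"}).raise_for_status()
            purged.append(page["id"])
```

For example, with session = requests.Session() and session.auth = ("user", "password"), calling purge_space_trash("https://confluence.example.com", "TEST", session) returns the ids of the purged pages. Always requesting start=0 is deliberate: every purge shrinks the trash, so the next batch moves to the front of the list.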

To avoid working with the raw REST API directly, I use the Python module atlassian-python-api, which keeps the code short and simple.

def clean_pages_from_space(confluence, space_key):
    """
    Remove all pages from the trash of the given space
    :param confluence: an atlassian.Confluence client instance
    :param space_key: key of the space whose trash should be purged
    :return: None
    """
    limit = 500
    while True:
        # Always read from start=0: purged pages drop out of the trash,
        # so the next batch moves to the front of the list.
        values = confluence.get_all_pages_from_space_trash(space=space_key, start=0, limit=limit)
        if len(values) == 0:
            print("For space {} trash is empty".format(space_key))
            break
        for value in values:
            print(value['title'])
            confluence.remove_page_from_trash(value['id'])

Feel free to use the full example script: https://github.com/atlassian-python-api/atlassian-python-api/blob/master/examples/confluence-trash-cleaner.py

 

 

The next step is cleaning up draft pages.

Of course, in this use case we need a threshold to decide how old a draft must be before we remove it.

For that, I use the variable:

DRAFT_DAYS = 30
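Comparing a draft's last edit with DRAFT_DAYS needs a little care with time zones. Confluence returns version.when as an ISO 8601 string which, judging by how the draft cleaner below parses it, ends with a UTC offset such as +03:00; the cleaner drops that offset and compares naive datetimes, so the computed age can be off by a few hours. A timezone-aware check could look like the sketch below. is_stale_draft is a hypothetical helper, datetime.fromisoformat needs Python 3.7+, and treating a timestamp without an offset as UTC is my assumption.

```python
from datetime import datetime, timedelta, timezone


def is_stale_draft(when, now=None, days=30):
    """Return True if a draft's version.when timestamp is older than `days`.

    Assumes `when` is an ISO 8601 string such as 2018-09-14T10:11:12.000+03:00.
    """
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
    last = datetime.fromisoformat(when.replace("Z", "+00:00"))
    if last.tzinfo is None:
        # No offset in the string: treat it as UTC (an assumption).
        last = last.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # Both datetimes are timezone-aware, so the subtraction respects offsets.
    return now - last > timedelta(days=days)
```

Passing days=DRAFT_DAYS keeps the threshold in one place, e.g. is_stale_draft(draft_page['version']['when'], days=DRAFT_DAYS).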

import datetime


def clean_draft_pages_from_space(confluence, space_key, count, date_now):
    """
    Remove draft pages older than DRAFT_DAYS from a space, using date_now as reference
    :param confluence: an atlassian.Confluence client instance
    :param space_key: key of the space to clean
    :param count: running total of removed drafts
    :param date_now: reference time, e.g. datetime.datetime.now()
    :return: int counter
    """
    pages = confluence.get_all_draft_pages_from_space(space=space_key, start=0, limit=500)
    for page in pages:
        page_id = page['id']
        draft_page = confluence.get_draft_page_by_id(page_id=page_id)
        last_date_string = draft_page['version']['when']
        # 'when' typically looks like 2018-09-14T10:11:12.000+03:00; keep only
        # the first 19 characters (date and time), dropping millis and offset.
        last_date = datetime.datetime.strptime(last_date_string[:19], "%Y-%m-%dT%H:%M:%S")
        if (date_now - last_date) > datetime.timedelta(days=DRAFT_DAYS):
            count += 1
            print("Removing page with page id: " + page_id)
            confluence.remove_page_as_draft(page_id=page_id)
            print("Removed page with date {}".format(last_date_string))
    return count

The full example script is here: https://github.com/atlassian-python-api/atlassian-python-api/blob/master/examples/confluence-draft-page-cleaner.py
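The draft cleaner reads a single batch of drafts (start=0, limit=500), so drafts beyond that batch are never checked. Unlike the trash cleaner, it cannot simply re-read from start=0 in a loop, because drafts that are still fresh stay in the listing. A hypothetical helper such as iter_paginated below walks start/limit pages until an empty batch comes back; advancing start by the number of items actually returned also copes with a server that caps limit below the requested value. This is a sketch, not part of atlassian-python-api.

```python
def iter_paginated(fetch, limit=500):
    """Yield every item from a start/limit-paginated call (hypothetical helper).

    `fetch(start, limit)` must return a list; an empty list ends the paging.
    """
    start = 0
    while True:
        batch = fetch(start, limit)
        if not batch:
            return
        yield from batch
        # Advance by what we actually got: the server may cap `limit`.
        start += len(batch)
```

For the draft cleaner, the call would be list(iter_paginated(lambda start, limit: confluence.get_all_draft_pages_from_space(space=space_key, start=start, limit=limit))). Collect the full list first and only then remove old drafts, since deleting while paging shifts the offsets and skips items.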

 

That's all. I hope it helps you clean up your Confluence easily. Next time I will show how to clean up page versions and attachment versions, because those use cases free up a lot of disk space.

P.S. Set these scripts up in your CI system so you can delegate the cleanup to your teammates.


 

Cheers,

Gonchik Tsymzhitov

10 comments

Pavel September 14, 2018

Hi, Gonchik

Purging the trash works well.

But for cleaning draft pages: the method get_all_draft_pages_from_space returns only the current user's draft pages in the space.

 

Best regards,

Pavel Dmitriev

Gonchik Tsymzhitov
Community Leader
September 14, 2018

Hi Pavel, 

 

Thanks for the feedback.

That sounds reasonable. But I tested it on my Confluence instance, and the results were the same.

Anyway, I will push your idea and compare on other instances.

 

Thanks! 

Cheers,

Gonchik Tsymzhitov 

Pavel September 14, 2018

Gonchik, thanks.

And there is no delete action in the block for draft pages:

if (date_now - last_date) > datetime.timedelta(days=DRAFT_DAYS):
    count += 1
    print("Removed page with date {}".format(last_date_string))

Best regards,

Pavel Dmitriev

Gonchik Tsymzhitov
September 14, 2018

@Pavel The examples have been adjusted. (FYI: they show how you can do it based on the wrapper.)

You can see here:

https://github.com/AstroMatt/atlassian-python-api/pull/74

I have also changed the logic of a few methods.

You may also run into this error in your scripts: https://confluence.atlassian.com/confkb/removing-orphaned-draft-316113059.html

 

 

 

Cheers,

Gonchik Tsymzhitov

Anderson Hsu June 25, 2020

How can I use it with Confluence version 4.2.3? Thanks a lot.

Regards, Hsu Yao Chang

Gonchik Tsymzhitov
July 27, 2020

@Anderson Hsu Sorry for the delay. I would recommend looking for the equivalent XML-RPC calls.

Unfortunately, I don't have a 4.2.3 instance to test it on.

What about upgrading your instance?

sachin gangam August 25, 2020

Thank you for the post, @Gonchik Tsymzhitov. This link, https://github.com/atlassian-python-api/atlassian-python-api/blob/master/examples/confluence-draft-page-cleaner.py, gives me a 404. Do you have an updated URL?

Gonchik Tsymzhitov
August 25, 2020
sachin gangam August 27, 2020

Hello @Gonchik Tsymzhitov, thank you for providing the link. I tried a few of the scripts and they worked great, thanks for that. But I am unable to get confluence-trash-cleaner.py to work. When I run confluence-trash-cleaner.py -vvv, it doesn't give me any output or throw any errors. I have checked my space and I still see the pages in the trash. I also made sure I have space admin permissions. Can you please help? Below is the code. Did I miss something?

Note: I only want to purge the pages from the trash for a single space.

#!/usr/bin/python
# coding=utf-8
from atlassian import Confluence

confluence = Confluence(
    url='https://confluence-site-url',
    username='sachin',
    password='***********')


def clean_pages_from_space(confluence, TEST):
    """
    Remove all pages from trash for related space
    :param confluence:
    :param space_key:
    :return:
    """
    limit=500
    flag = True
    step = 0
    while flag:
        values = confluence.get_all_pages_from_space_trash(space=space_key, start=0, limit=limit)
        step += 1
        if len(values) == 0:
            flag = False
            print("For space {} trash is empty".format(space_key))
        else:
            for value in values:
                print(value['title'])
                confluence.remove_page_from_trash(value['id'])

I have also tried the one from this link 

Gonchik Tsymzhitov
Community Leader
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
August 28, 2020

@sachin gangam 

Please add

import logging

logging.basicConfig(level=logging.DEBUG)

to see any errors.
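Besides logging, note that the script as posted never calls clean_pages_from_space: Python only defines the function and exits, which by itself explains a silent run with no output. Its second parameter is also named TEST while the body uses space_key, so a call would stop with a NameError. Below is a sketch of the function with both points addressed, reporting through the logging module. The guard that raises an error when already purged pages keep reappearing (for example, because of missing permissions) is my own addition, not part of atlassian-python-api.

```python
import logging

log = logging.getLogger(__name__)


def clean_pages_from_space(confluence, space_key, limit=500):
    """Purge all trashed pages of one space and return how many were purged.

    `confluence` is an atlassian.Confluence client; the parameter is named
    space_key so that the references to it in the body resolve.
    """
    purged = 0
    seen = set()
    while True:
        values = confluence.get_all_pages_from_space_trash(space=space_key, start=0, limit=limit)
        if not values:
            log.info("For space %s trash is empty", space_key)
            return purged
        ids = {value["id"] for value in values}
        # If pages we already purged come back, the delete had no effect
        # (e.g. missing permission): stop instead of looping forever.
        if ids & seen:
            raise RuntimeError("Pages are still in the trash of {} after purging".format(space_key))
        seen |= ids
        for value in values:
            log.info("Purging %s (id %s)", value["title"], value["id"])
            confluence.remove_page_from_trash(value["id"])
            purged += 1
```

At the bottom of the script, after creating the client as in the question, call logging.basicConfig(level=logging.DEBUG) and then clean_pages_from_space(confluence, 'TEST'), where 'TEST' stands for your space key.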
