How to programmatically clean versions of page with python for Confluence?

Hi Atlassian community

This is one of the routine steps in my Confluence administration work, for both internal and customer-facing installations.

You can find my previous articles here: 1, 2.

For continuous cleanup, I use atlassian-python-api and Atlassian Bamboo. This makes it easy to automate the job and extend its functionality.

My use case was cleaning up after a large number of pages had been created and updated programmatically over a long period, e.g.:

Screenshot 2018-11-03 at 12.59.57.png

After the cleanup, page load metrics improve, and the Lucene indexes and the database take up less disk space.

Well, let's start.

1. Remove a page version using one of two methods, depending on whether you run Cloud or an on-premises release:

    def remove_content_history(self, page_id, version_number):
        """
        Remove content history. It works as experimental method
        :param page_id:
        :param version_number: version number
        :return:
        """
        url = "rest/experimental/content/{id}/version/{versionNumber}".format(id=page_id, versionNumber=version_number)
        self.delete(url)

    def remove_content_history_in_cloud(self, page_id, version_id):
        """
        Remove content history. It works in CLOUD
        :param page_id:
        :param version_id:
        :return:
        """
        url = "rest/api/content/{id}/version/{versionId}".format(id=page_id, versionId=version_id)
        self.delete(url)
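To make the difference between the two methods easier to see, here is a minimal sketch that only builds the REST paths they format. The helper name `version_delete_path` and its `cloud` flag are hypothetical and not part of atlassian-python-api; the path templates are copied from the two methods above.

```python
def version_delete_path(page_id, version, cloud=False):
    """Return the REST path used to delete one page version.

    Hypothetical helper for illustration; the templates mirror the
    two library methods shown above.
    """
    if cloud:
        template = "rest/api/content/{id}/version/{version}"
    else:
        template = "rest/experimental/content/{id}/version/{version}"
    return template.format(id=page_id, version=version)


server_path = version_delete_path("12345", 1)
cloud_path = version_delete_path("12345", 1, cloud=True)
print(server_path)  # rest/experimental/content/12345/version/1
print(cloud_path)   # rest/api/content/12345/version/1
```

Printing the path before sending a DELETE request is a cheap way to check which endpoint a script is about to call.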

2. Next, define a function that takes a page_id and the number of page versions to keep.

def page_version_remover(server, content_id, remained_page_numbers):
    """Delete the oldest versions of a page until only remained_page_numbers are left."""
    response = server.get_content_history(content_id)
    if not response.get('latest'):
        return
    latest_version_count = int(response.get('lastUpdated').get('number'))
    if latest_version_count > remained_page_numbers:
        print("Page {} has {} versions".format(
            server.url_joiner(server.url, "/pages/viewpage.action?pageId=" + str(content_id)), latest_version_count))
        # Always delete version 1: the loop relies on the remaining versions being renumbered after each deletion
        for _ in range(latest_version_count - remained_page_numbers):
            server.remove_content_history(content_id, 1)
    else:
        print('Page has no more versions than the number to keep')
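Because deleting history is destructive, it can help to dry-run the counting logic first. The sketch below is not part of the original script: `FakeServer` and `prune_versions` are hypothetical names, and the fake assumes that the remaining versions are renumbered from 1 after each deletion, which is what the loop above relies on.

```python
class FakeServer:
    """In-memory stand-in for the Confluence client (hypothetical, dry runs only)."""

    def __init__(self, version_count):
        self.version_count = version_count
        self.deleted = []

    def get_content_history(self, content_id):
        # Mimics only the two keys that the remover reads.
        return {"latest": True, "lastUpdated": {"number": self.version_count}}

    def remove_content_history(self, content_id, version_number):
        # Assumption: remaining versions are renumbered after a deletion.
        self.deleted.append((content_id, version_number))
        self.version_count -= 1


def prune_versions(server, content_id, keep):
    """Delete the oldest versions until at most `keep` remain; return the number deleted."""
    history = server.get_content_history(content_id)
    if not history.get("latest"):
        return 0
    latest = int(history["lastUpdated"]["number"])
    to_delete = max(latest - keep, 0)
    for _ in range(to_delete):
        server.remove_content_history(content_id, 1)
    return to_delete


server = FakeServer(version_count=10)
deleted = prune_versions(server, "12345", keep=3)
print(deleted, server.version_count)  # 7 3
```

With 10 versions and 3 to keep, the loop issues 7 deletions, each targeting version 1, which matches the range used in the function above.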

3. Finally, fetch all page IDs from the space and loop over them:

def get_all_page_ids_from_space(confluence, space_key):
    """
    Collect the ids of all pages in a space, fetching them in batches.
    :param confluence: Confluence client instance
    :param space_key: key of the space to scan
    :return: list of page ids
    """
    limit = 500
    step = 0
    page_ids = []

    while True:
        values = confluence.get_all_pages_from_space(space=space_key, start=limit * step, limit=limit)
        step += 1
        if len(values) == 0:
            # An empty batch means all pages have been fetched
            break
        for value in values:
            print("Retrieve page with title: " + value['title'])
            page_ids.append(value['id'])
    if not page_ids:
        print("Did not find any pages, please check permissions")
    print("Found in space {} pages {}".format(space_key, len(page_ids)))
    return page_ids

def reduce_page_numbers(confluence, page_id, remained_page_history_count):
    page_version_remover(confluence, page_id, remained_page_history_count)


# space_key and remained_count are expected to be defined earlier in the script
page_ids = get_all_page_ids_from_space(confluence, space_key)
for page_id in page_ids:
    reduce_page_numbers(confluence, page_id=page_id, remained_page_history_count=remained_count)
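The batch-by-batch fetching in step 3 can also be written as a generator, so that pages are processed as they arrive instead of being collected in a list first. This is only a sketch: `iter_page_ids`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names, and the fake fetch function stands in for the real client call.

```python
def iter_page_ids(fetch_pages, limit=500):
    """Yield page ids batch by batch until a fetch returns an empty list."""
    start = 0
    while True:
        batch = fetch_pages(start=start, limit=limit)
        if not batch:
            return
        for page in batch:
            yield page["id"]
        start += limit


# Hypothetical data standing in for the pages of a real space.
FAKE_PAGES = [{"id": str(n), "title": "Page {}".format(n)} for n in range(7)]


def fake_fetch(start, limit):
    """Return one slice of the fake pages, like a paged REST call would."""
    return FAKE_PAGES[start:start + limit]


ids = list(iter_page_ids(fake_fetch, limit=3))
print(ids)  # ['0', '1', '2', '3', '4', '5', '6']
```

With the real client, the fetch function could be `lambda start, limit: confluence.get_all_pages_from_space(space=space_key, start=start, limit=limit)`, which is the same call the function in step 3 makes.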

That's all. You can put the script into a repository and trigger your plan in Bamboo.

You can find the full version here: full example

P.S. Next time I will share some advice, based on the community's and my own experience, on generating pages automatically. I hope you will not run into problems like the ones in this screenshot :)

MicrosoftTeams-image.png

 

Cheers,

Gonchik Tsymzhitov

3 comments

When trying to use 

confluence.remove_content_history_in_cloud(page_id=247712308, version_id=2)

I get a 404 error from the request, and this error in particular is raised
requests.exceptions.HTTPError: null for uri: 

This might be because different versions in the history seem to be associated with different page ids. E.g., when navigating the history, version 2 has a URL ending like
/pages/viewpage.action?pageId=247679396&navigatingVersions=true
and version 3 has this one
/pages/viewpage.action?pageId=247713252&navigatingVersions=true
etc.


Is this a known issue with the API? Or am I doing something wrong?
@Gonchik Tsymzhitov would you please have any insight?

Also, the "Full example" link above unfortunately no longer works.
