Uploading HTML via the REST API through a Python script gives a 400 error for some HTML documents

Hi,

I have a Python script that uses the Confluence REST API to download a page, make changes to the HTML, and re-upload the page.

My current script works fine if I replace the HTML with valid HTML that I create myself. The code programmatically updates the version and PUTs it to the server.

But now I want to edit the page programmatically. I download the HTML, patch a new row into an HTML table, and try to re-upload it -- this fails with a 400 error. I also tried just round-tripping (downloading the page HTML and uploading it unchanged), and this fails as well. Downloading and uploading just a <p>Hello World</p> works great -- it replaces the page content as expected.

I am looking for guidance on how to make this work. I notice a lot of <link> tags (where an @user reference is) and tried removing those before re-uploading, but that did not work either.

Best,

Friedrich Brunzema

2 answers

Stephen Deutsch Community Champion Jun 09, 2017

Are you working with HTML, or the XHTML-based Confluence Storage Format? I think that Confluence will only accept its own storage format (returned when using ?expand=body.storage).

The hello world example just happens to be both valid HTML and storage format, but it's not always the case. Tables are the same in both, though, so that shouldn't cause an issue necessarily.

Confluence is usually pretty good about returning information about the error, so have you checked the message that you get back with the 400? You could try using the REST API Browser for debugging.
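To see that message from a script, it helps to print the response body rather than only the status code, since raise_for_status() does not show it. A minimal sketch, assuming the error body is JSON with a "message" field (an assumption about the response shape, not something confirmed in this thread) and falling back to the raw text otherwise. The helper name describe_error is hypothetical:

```python
import json


def describe_error(status_code, body_text):
    """Return a readable one-line summary of an error response."""
    try:
        payload = json.loads(body_text)
    except ValueError:
        payload = None
    # Assumed shape: {"statusCode": 400, "message": "..."}.
    if isinstance(payload, dict) and payload.get("message"):
        detail = payload["message"]
    else:
        # Not JSON, or no message field: fall back to the raw body.
        detail = body_text.strip() or "(empty body)"
    return "HTTP {0}: {1}".format(status_code, detail)
```

With requests, this could be called as describe_error(r.status_code, r.text) just before r.raise_for_status().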

Hi,

I actually found the answer. So hard. Sigh. Editing a Confluence page programmatically involves several steps:

1.  Read the data: [url]?expand=body.storage
2.  Load the JSON response text into an object (json.loads())
3.  Extract the HTML: json_object['body']['storage']['value']
4.  Convert the returned page from storage to 'view' format by POSTing to /rest/api/contentbody/convert/view with {'representation': 'storage', 'value': html} as the data. This returns JSON containing HTML sanitized for viewing.
5.  Get the text from display_json['value'].

You can now mess with the HTML
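As an illustration of the kind of edit the original question describes (adding a row to a table), here is a minimal string-based sketch. It assumes the page HTML contains a table with a closing </tbody> tag and appends to the last one; the helper name append_table_row is hypothetical. A real HTML parser such as lxml.html would be more robust on pages with several tables:

```python
import html


def append_table_row(page_html, cells):
    """Insert a new <tr> just before the last </tbody> in page_html."""
    # Escape cell text so characters like & and < stay valid markup.
    row = "<tr>{0}</tr>".format(
        "".join("<td>{0}</td>".format(html.escape(str(c))) for c in cells)
    )
    marker = "</tbody>"
    index = page_html.rfind(marker)
    if index == -1:
        raise ValueError("no </tbody> found in page HTML")
    return page_html[:index] + row + page_html[index:]
```

The returned string is what would then be converted back to storage format.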

Next you have to convert it back to storage format by POSTing to /rest/api/contentbody/convert/storage with {'representation': 'editor', 'value': html}.

One of the steps caused the character &Acirc; to sneak in, which I replaced with nothing.

Then you do the upload, making sure to increase the version number of the page.

I will post some code below.

The character appears to show up when converting back to storage format when there is a non-breaking space (&nbsp;): it becomes '&Acirc;&nbsp;'. At least that's what I've found, because it doesn't show up on every page.
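One plausible cause, not confirmed in this thread: a non-breaking space (U+00A0) is the two bytes C2 A0 in UTF-8, and if those bytes are decoded as Latin-1 at any point, the first byte turns into 'Â' (&Acirc;). The sketch below reproduces that effect and shows a narrower cleanup than replacing every &Acirc; with nothing: it drops the stray character only when it directly precedes a non-breaking space, so a legitimate 'Â' elsewhere survives. The helper name strip_stray_acirc is hypothetical:

```python
import re

NBSP = "\u00a0"

# UTF-8 bytes of a non-breaking space, wrongly decoded as Latin-1.
mangled = NBSP.encode("utf-8").decode("latin-1")


def strip_stray_acirc(text):
    """Remove 'Â' or '&Acirc;' only when it precedes a non-breaking space."""
    return re.sub(r"(?:\u00c2|&Acirc;)(?=\u00a0|&nbsp;)", "", text)
```

If this is indeed the cause, keeping the HTML as text throughout (rather than encoding it to bytes between steps) may avoid the stray character altogether.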

I also get strange behavior where every time I publish certain pages and fetch them again, there is an additional newline in the HTML between certain blocks. I have to regex-replace these with single newlines; otherwise there will be an absurd amount of space between HTML blocks after a few publishes.
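The newline cleanup described above could look like this. It is a minimal sketch that collapses any run of two or more newlines into one; the helper name collapse_blank_lines is hypothetical. Note that it would also collapse blank lines inside preformatted or code blocks, so it is only suitable when those do not matter:

```python
import re


def collapse_blank_lines(text):
    """Replace any run of two or more newlines with a single newline."""
    # \r?\n also covers Windows-style line endings.
    return re.sub(r"(?:\r?\n){2,}", "\n", text)
```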

# coding: utf-8
import argparse
import getpass
import datetime
import json
import keyring
import requests
import lxml.html

# -----------------------------------------------------------------------------
# Globals

BASE_URL = "http://your-server/rest/api/content"
VIEW_URL = "http://your-server/pages/viewpage.action?pageId="
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36"


def pprint(data):
    # Pretty-print a JSON-serializable object (debugging helper).
    print(json.dumps(
        data,
        sort_keys=True,
        indent=4,
        separators=(', ', ' : ')))


def get_page_ancestors(auth, pageid):
    # Get basic page information plus the ancestors property

    url = '{base}/{pageid}?expand=ancestors'.format(
        base=BASE_URL,
        pageid=pageid)

    r = requests.get(url, auth=auth, headers={'Content-Type': 'application/json', 'USER-AGENT': USER_AGENT})

    r.raise_for_status()

    return r.json()['ancestors']


def get_page_info(auth, page_id):
    url = '{base}/{page_id}'.format(
        base=BASE_URL,
        page_id=page_id)

    r = requests.get(url, auth=auth, headers={'Content-Type': 'application/json', 'USER-AGENT': USER_AGENT})

    r.raise_for_status()

    return r.json()


def convert_db_to_view(auth2, html):
    url = 'http://your-server/rest/api/contentbody/convert/view'

    data2 = {
        'value': html,
        'representation': 'storage'
    }

    r = requests.post(url,
                      data=json.dumps(data2),
                      auth=auth2,
                      headers={'Content-Type': 'application/json'}
                      )
    r.raise_for_status()
    return r.json()


def convert_view_to_db(auth2, html):
    url = 'http://your-server/rest/api/contentbody/convert/storage'

    data2 = {
        'value': html,
        'representation': 'editor'
    }

    r = requests.post(url,
                      data=json.dumps(data2),
                      auth=auth2,
                      headers={'Content-Type': 'application/json'}
                      )
    r.raise_for_status()
    return r.json()


def write_data(auth, html, page_id):
    info = get_page_info(auth, page_id)

    ver = int(info['version']['number']) + 1

    ancestors = get_page_ancestors(auth, page_id)

    anc = ancestors[-1]
    del anc['_links']
    del anc['_expandable']
    del anc['extensions']

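    # NOTE: this overrides the fetched title with a fixed one;
    # remove this line to keep the page's existing title.
    info['title'] = "Team City Change Log"

    data = {
        'id': str(page_id),
        'type': 'page',
        'title': info['title'],
        'version': {'number': ver},
        'ancestors': [anc],
        'body': {
            'storage':
                {
                    'representation': 'storage',
                    # Pass the text through as-is; str() can mangle bytes
                    # or fail on non-ASCII text.
                    'value': html,
                }
        }
    }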

    data = json.dumps(data)

    url = '{base}/{page_id}'.format(base=BASE_URL, page_id=page_id)

    our_headers = {'Content-Type': 'application/json', 'USER-AGENT': USER_AGENT}

    r = requests.put(
        url,
        data=data,
        auth=auth,
        headers=our_headers
    )

    r.raise_for_status()

    print("Wrote '%s' version %d" % (info['title'], ver))
    print("URL: %s%d" % (VIEW_URL, page_id))

    return ""


def read_data(auth, page_id):
    url = '{base}/{page_id}?expand=body.storage'.format(base=BASE_URL, page_id=page_id)
    r = requests.get(
        url,
        auth=auth,
        headers={'Content-Type': 'application/json', 'USER-AGENT': USER_AGENT}
    )

    r.raise_for_status()

    return r


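def patch_html(auth, options):
    # Fetch the page and pull out the body in storage format.
    json_text = read_data(auth, options.pageid).text
    json2 = json.loads(json_text)
    html_storage_txt = json2['body']['storage']['value']

    # Convert storage format to view HTML, which is easier to edit.
    # The value is kept as text (not encoded to bytes) so it can be
    # serialized to JSON again later.
    html_display_json = convert_db_to_view(auth, html_storage_txt)
    html_display_txt = html_display_json['value']

    # PATCH
    # Custom patching of the HTML goes here; this placeholder leaves it unchanged.
    new_view_string = html_display_txt

    # Convert the edited HTML back to storage format before uploading.
    return convert_view_to_db(auth, new_view_string)['value']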


def get_login(username=None):
    if username is None:
        username = getpass.getuser()

    password = keyring.get_password('confluence_script', username)

    if password is None:
        password = getpass.getpass()
        keyring.set_password('confluence_script', username, password)

    return username, password


def main():
    parser = argparse.ArgumentParser()

    parser.add_argument(
        "-u",
        "--user",
        default=getpass.getuser(),
        help="Specify the username to log into Confluence")

    parser.add_argument(
        "pageid",
        type=int,
        help="Specify the Confluence page id to overwrite")

    options = parser.parse_args()

    auth = get_login(options.user)

    html = patch_html(auth, options)
    html = html.replace('&Acirc;', '')
    write_data(auth, html, options.pageid)
    return

if __name__ == "__main__": main()
