How do I script the exporting and importing of spaces?

I need to export over 500 Confluence spaces, and I don't have weeks to do it.  Naïvely, I thought that I could use the remote interfaces to do it, but the REST API doesn't have the function and the XMLRPC version has been turned off.  I am assured that the SOAP version of the same method has somehow survived, but the time required to create a simple SOAP client is more than I can spend, because I don't know how to do it, and my efforts to learn how have not yet borne fruit.

So I'm back to trying to script the UI.  I get some encouraging results using a browser extension that lets me compose the requests and send them, but so far, I haven't managed to get them to work.

I sent a POST request to /spaces/doExportSpace.action with the values sent by the form presented by /spaces/exportspacexml.action, but I get a permission-denied response.

The browser is already logged in, so I'm guessing that there's something I'm failing to copy from the login response to my export request.  So what is that?  I've taken note of a cookie called "seraph.confluence" and the hidden fields in the request form.

I'm getting pretty desperate here; I've taken a week trying to figure out something that should have been an hour's work.  

What can you tell me?  The objective here is to be able to script hundreds of exports; I've tried everything anyone has suggested in hours of searching, but without success.

 

3 answers

1 vote
Stephen Deutsch Community Champion Sep 26, 2017

Hi Brian,

What's your language of choice? Powershell? Python? Javascript?

Seeing your other question, I see two options. You could either do the request via SOAP (which is easier than you would expect), or you could script the downloading with a headless browser like Nightmare.js.

Let me know what direction you're looking to take and I'll see if I can whip something together.

I'll use whatever I can get and use... Here's what I've tried already.  I began by using a Chrome browser extension called Restlet Client, which lets you compose requests and chain them into test sequences.  I sent the series of requests described above (after first having called the /dologin.action).  While it has some ability to take data from previous requests (and lets the browser handle all the cookie exchanges unless you override it), it didn't have the ability to get the contents of the hidden atl_token field supplied by the exportspacexml method so as to use it in the doexportspace call (there were others, but their values didn't change, so I hard-coded them).  

I copied the field by hand with some success, so I used its feature that translates the request into a curl command, which I invoked from a standard Unix shell and scripted the shell to get data from one response to use in the next.  This also worked to an extent, but every time I did the doexportspace request, I still got permission problems.

I also tried the XMLRPC methods, as you've seen.

I have considered scripting the browser itself using Javascript, or even composing a test with Selenium, but haven't decided which way to go yet with that.

If the SOAP request is as easy as you suggest, I'd like some pointers to where I can do it more easily than I've seen; I tried the Apache CXF extension in Eclipse, and the nice, user-friendly project creation tool asked for data I had no idea how to find or compose, and required me to create my own server for testing.  I gave up on that because I ran out of patience with looking up something I didn't know only to find that finding the answer required looking up something else...

I'd like most to know what nuance I'm missing in the sequence that keeps it from thinking it's coming from an authorized user. But as I said at the beginning, I'll try whatever I can get to work, and thanks for the offer. 
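The manual step described above (copying the hidden atl_token field from the export form into the doExportSpace request) could be automated along these lines. This is only a sketch: the sample form is invented for illustration, and every field name other than atl_token (here "key") is a hypothetical placeholder, not something confirmed for any Confluence version.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def export_payload(form_html, overrides=None):
    """Build POST data from the form's hidden fields plus explicit overrides."""
    payload = hidden_fields(form_html)
    payload.update(overrides or {})
    return payload


# Invented sample standing in for the page served by exportspacexml.action.
sample_form = """
<form action="doExportSpace.action" method="post">
  <input type="hidden" name="atl_token" value="abc123">
  <input type="hidden" name="key" value="DEMO">
  <input type="text" name="ignored" value="x">
</form>
"""
print(export_payload(sample_form))  # {'atl_token': 'abc123', 'key': 'DEMO'}
```

In a real run, the form page and the POST would presumably need to go through the same requests.Session so that the login cookies and the token belong to one session; whether that alone clears the permission error is not something this sketch can confirm.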

Stephen Deutsch Community Champion Sep 26, 2017

If you let me know your operating system it would be helpful.

I imagine it would; sorry for the omission in all that verbosity.

I'm using OS X 10.11.

For the record, I'm glad that I didn't spend any more effort on trying to concoct a SOAP client: contrary to the assertion above, a full export can't be done using SOAP. Though the three-parameter exportSpace call remains, the one taking a fourth parameter, a Boolean value specifying whether to do a full export, has been removed.

I am greatly piqued at this, because in all of the documentation and all of my conversations with the support people, I did not discover this until I had actually made a correct call unsuccessfully.  They claim that this is documented, but saying that the interface is deprecated and will be removed in a future release does not inform me that one version of one method has already been removed, let alone forewarn me of which release will be affected before I waste untold hours trying to use it.

1 vote
Stephen Deutsch Community Champion Nov 25, 2017

Hi Brian,

I know it's a really late answer, but I hadn't forgotten, it just took longer than I expected to be able to test properly. I am also including it in case someone else needs to do the same thing.

I wrote a script in Python (version 3.x) that exports all spaces:

import requests
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

user = "admin"
password = "admin"

# Reuse one authenticated HTTP session for both the SOAP calls and the downloads.
session = Session()
session.auth = HTTPBasicAuth(user, password)
client = Client('http://localhost:1990/confluence/rpc/soap-axis/confluenceservice-v2?WSDL',
                transport=Transport(session=session))

print("Logging in...")
token = client.service.login(user, password)
print("Getting Spaces")
spaces = client.service.getSpaces(token)
numSpaces = len(spaces)
for index, space in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space.key))
    # exportSpace returns the URL of the generated export archive.
    siteExportUrl = client.service.exportSpace(token, space.key, "TYPE_XML")
    print(siteExportUrl)
    filename = siteExportUrl.split('/')[-1]
    print("Saving as: " + filename)
    # Stream the archive to disk rather than loading it into memory.
    r = session.get(siteExportUrl, stream=True)
    with open(filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    # Pause between spaces so the server is not overloaded.
    sleep(5)

You will need to install zeep before executing, but that should be the only dependency. Just change out the username and password, and change localhost:1990 to the base url of your confluence instance. The script pauses between each space for 5 seconds so as not to overload your server too much.
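With hundreds of spaces, one failed export would stop a loop like the one above partway through. A minimal sketch of a more forgiving loop follows. It assumes the per-space work is wrapped in a function passed in as export_one; that name, fake_export, and the space keys are all hypothetical and used only for illustration.

```python
import time


def export_all(space_keys, export_one, pause_seconds=5, sleep=time.sleep):
    """Call export_one(key) for each key; return (succeeded, failed) lists."""
    succeeded, failed = [], []
    for index, key in enumerate(space_keys, start=1):
        try:
            export_one(key)
            succeeded.append(key)
        except Exception as error:  # broad on purpose: record it and keep going
            failed.append((key, str(error)))
        if index < len(space_keys):
            sleep(pause_seconds)  # pause between spaces, not after the last one
    return succeeded, failed


def fake_export(key):
    """Stand-in for a real export call; fails for one key to show the handling."""
    if key == "BAD":
        raise RuntimeError("permission denied")


done, errors = export_all(["DOC", "BAD", "OPS"], fake_export, pause_seconds=0)
print(done)    # ['DOC', 'OPS']
print(errors)  # [('BAD', 'permission denied')]
```

The failed list can then be retried separately or exported by hand.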

Thanks; I'll give it a try. 

Able to export spaces. Need to verify if content and comments , etc are part of space or  not

I assume that the following assumptions are true of your comment:

  • In the first sentence (which is incomplete, lacking both subject and verb), you are saying that at present, you are able to export spaces.  Alternatively, that someone can do it now.
  • Second, you want to know whether the "content and comments , etc" are part of the data in the export.  Seems obvious, because otherwise, it wouldn't be clear which items you're talking about, but it isn't literally what you asked.

Those are the most likely probabilities, but the question could admit of other interpretations.

To the first sentence, which by my assumptions is not a question but a statement of fact, I can affirm that it is possible for some person(s) to export the file, though you don't specify that it can be scripted, which is the issue to which you've posted your reply, so that's not absolutely certain.

To the second, I will confirm that all data pertinent to the space and its content (including, among other things, the content of every page, blog post, comment, attachment, label, and every extant version of the above, along with the user IDs of those who added them and the time they were added) is indeed in the export file.  But this only happens when you export the entire space, using the XML export option, which is mostly explained in the admin interface where the export is requested.

If these assumptions are wrong, please clarify; also, if you want an answer to the question as I have interpreted it, it would be best if you raised another support issue since, not being relevant to this one, it may be overlooked.

FYI, this SOAP client doesn't work, see Brian's answer in the conversation above.

Amazing that there's just no way to do this short of mechanical web clients like Selenium or Mechanize.  Ridiculous.

Indeed; in case I didn't make it as clear as I thought I did, it doesn't matter which, or what kind of, SOAP client you use. It won't work, because the method no longer exists.

I should add that the delay while I tried to figure out a way to do this caused my client to abandon the project altogether.  I got paid for the work already done, but nothing more.

Quite.  We're realizing that for this reason and many more, perhaps Confluence is not the tool for us.  There have been a lot of reliability issues recently with the cloud-hosted service, which got me trying to create a local mirror for when the cloud-hosted site is offline.  Oh well.

Rather than a local mirror, have you considered just moving to Server?

For the price they want?  Haha, no.

I followed the script given by Stephen Deutsch, and I am able to export all the spaces in Confluence. But I could see differences when comparing the result with a manually downloaded space (space tools --> contents and tools --> Export --> select XML and full export): in a few spaces attachments were missing, and in a few there were differences in the entities.xml file.

 

Is there any working way to export all spaces without losing any content?
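A comparison like the one described above, a scripted export against a manual full export of the same space, could be automated roughly as follows. This sketch assumes both exports are zip archives and only compares entry names and uncompressed sizes. The in-memory archives and their entry names are invented for the demonstration and are not taken from a real export.

```python
import io
import zipfile


def archive_entries(zip_source):
    """Return {entry name: uncompressed size} for a zip path or file object."""
    with zipfile.ZipFile(zip_source) as archive:
        return {info.filename: info.file_size
                for info in archive.infolist() if not info.is_dir()}


def compare_exports(scripted, manual):
    """Report entries that are missing, extra, or differently sized."""
    a = archive_entries(scripted)
    b = archive_entries(manual)
    return {
        "missing_from_scripted": sorted(set(b) - set(a)),
        "extra_in_scripted": sorted(set(a) - set(b)),
        "size_differs": sorted(n for n in set(a) & set(b) if a[n] != b[n]),
    }


def _make_zip(files):
    """Build an in-memory zip from {name: text} (demo helper only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    buffer.seek(0)
    return buffer


# Invented stand-ins for a scripted export and a manual full export.
scripted = _make_zip({"entities.xml": "<a/>"})
manual = _make_zip({"entities.xml": "<a/><b/>", "attachments/1/2": "data"})
report = compare_exports(scripted, manual)
print(report)
```

Passing two file paths instead of the in-memory buffers works the same way. A size difference in entities.xml only says that the files differ, not which entities are missing.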

Any hope here?  I'm running well over my deadline, and even if I do script it, it will probably take days.  I'd be working on it myself, but I've already spent days on that, which is why I'm behind schedule.
