How do I script the exporting and importing of spaces?

I need to export over 500 Confluence spaces, and I don't have weeks to do it.  Naïvely, I thought that I could use the remote interfaces to do it, but the REST API doesn't have the function and the XMLRPC version has been turned off.  I am assured that the SOAP version of the same method has somehow survived, but the time required to create a simple SOAP client is more than I can spend, because I don't know how to do it, and my efforts to learn how have not yet borne fruit.

So I'm back to trying to script the UI.  I get some encouraging results using a browser extension that lets me compose the requests and send them, but so far, I haven't managed to get them to work.

I sent a POST request to /spaces/doExportSpace.action with the values sent by the form presented by /spaces/exportspacexml.action, but I get a permission-denied response.

The browser is already logged in, so I'm guessing that there's something I'm failing to copy from the login response to my export request.  So what is that?  I've taken note of a cookie called "seraph.confluence" and the hidden fields in the request form.

I'm getting pretty desperate here; I've taken a week trying to figure out something that should have been an hour's work.  

What can you tell me?  The objective here is to be able to script hundreds of exports; I've tried everything anyone has suggested in hours of searching, but without success.

 

3 answers

Hi Brian,

I know it's a really late answer, but I hadn't forgotten; it just took longer than I expected to be able to test properly. I am also including it in case someone else needs to do the same thing.

I wrote a script in Python (version 3.0 or later) that exports all spaces:

import requests
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

user = "admin"
password = "admin"

session = Session()
session.auth = HTTPBasicAuth(user, password)
client = Client('http://localhost:1990/confluence/rpc/soap-axis/confluenceservice-v2?WSDL',
transport=Transport(session=session))

print("Logging in...")
token = client.service.login(user, password)
print("Getting Spaces")
spaces = client.service.getSpaces(token)
numSpaces = len(spaces)
for index, space in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space.key))
    siteExportUrl = client.service.exportSpace(token, space.key, "TYPE_XML")
    print(siteExportUrl)
    filename = siteExportUrl.split('/')[-1]
    print("Saving as: " + filename)
    r = session.get(siteExportUrl, stream=True)
    with open(filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    sleep(5)

You will need to install zeep before executing, but that should be the only dependency. Just change the username and password, and change localhost:1990 to the base URL of your Confluence instance. The script pauses for 5 seconds after each space so as not to overload your server.
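If you adapt the script to more than one instance, the two string manipulations it relies on can be pulled into small helpers. This is only a sketch: the function names are hypothetical, the WSDL path is the one used in the script above, the example export URL is invented, and it assumes (as the script does) that the export URL ends in the archive's file name.

```python
from urllib.parse import urlsplit

# Path taken from the script above; everything else here is illustrative.
WSDL_PATH = "/rpc/soap-axis/confluenceservice-v2?WSDL"


def wsdl_url(base_url):
    """Build the SOAP WSDL URL from a Confluence base URL (hypothetical helper)."""
    return base_url.rstrip("/") + WSDL_PATH


def export_filename(export_url):
    """Return the last path segment of the export URL, ignoring any query string."""
    path = urlsplit(export_url).path
    return path.rstrip("/").split("/")[-1]


# The export URL below is made up purely to exercise the helper.
print(wsdl_url("http://localhost:1990/confluence/"))
print(export_filename("http://localhost:1990/confluence/download/temp/example-export.xml.zip"))
```

Stripping the query string before taking the last segment avoids writing a file whose name contains "?" if the server ever appends parameters to the download link.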

Thanks; I'll give it a try. 

Able to export spaces. Need to verify if content and comments , etc are part of space or  not

I am assuming the following about your comment:

  • In the first sentence (which is incomplete, lacking both subject and verb), you are saying that at present, you are able to export spaces.  Alternatively, that someone can do it now.
  • Second, you want to know whether the "content and comments , etc" are part of the data in the export.  Seems obvious, because otherwise, it wouldn't be clear which items you're talking about, but it isn't literally what you asked.

Those are the most likely readings, but the question could admit of other interpretations.

To the first sentence, which by my assumptions is not a question but a statement of fact, I can affirm that it is possible for some person(s) to export the file, though you don't specify that it can be scripted, which is the issue to which you've posted your reply, so that's not absolutely certain.

To the second, I will confirm that all data pertinent to the space and its content is indeed in the export file. This includes, among other things, the content of every page, blog post, comment, attachment, and label, and every extant version of the above, along with the user IDs of those who added them and the times they were added. But this only happens when you export the entire space, using the XML export option, which is mostly explained in the admin interface where the export is requested.

If these assumptions are wrong, please clarify; also, if you want an answer to the question as I have interpreted it, it would be best if you raised another support issue since, not being relevant to this one, it may be overlooked.

FYI, this SOAP client doesn't work; see Brian's answer in the conversation above.

Amazing that there's just no way to do this short of mechanical web clients like Selenium or Mechanize.  Ridiculous.

Indeed. In case I didn't make it as clear as I thought I did: it doesn't matter which, or what kind of, SOAP client you use. It won't work because the method no longer exists.

I should add that the delay while I tried to figure out a way to do this caused my client to abandon the project altogether.  I got paid for the work already done, but nothing more.

Quite.  We're realizing that for this reason and many more, perhaps Confluence is not the tool for us.  There have been a lot of reliability issues recently with the cloud-hosted service, which got me trying to create a local mirror for when the cloud-hosted site is offline.  Oh well.

Rather than a local mirror, have you considered just moving to Server?

For the price they want?  Haha, no.

I followed the script given by Stephen Deutsch, and I am able to export all the spaces in Confluence. But I could see a difference when comparing with a manually downloaded space (space tools --> contents and tools --> Export --> select XML and full export): in a few spaces attachments were missing, and in a few there were some differences in the entities.xml file.

 

Is there any working way to export all spaces without losing any content?

Hello,

Thanks for sharing the script.

I am trying to invoke it, but I get errors from the Zeep module when invoking the exportSpace() method.

Here's the error:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\zeep\xsd\elements\element.py in validate(self, value, render_path)
246 if not self.is_optional and not self.nillable and value in (None, NotSet):
247 raise exceptions.ValidationError(
--> 248 "Missing element %s" % (self.name), path=render_path)
ValidationError: Missing element in3 (exportSpace.in3)

Slightly modified script I am using:

import getpass
import requests
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

requests.packages.urllib3.disable_warnings()
auth = ('my_account', getpass.getpass())
session = Session()
session.verify = False
session.auth = auth


WIKI = 'https://confluence.example.net/instance/rpc/soap-axis/confluenceservice-v2?WSDL'
client = Client(WIKI, transport=Transport(session=session))
print("Logging in...")
token = client.service.login(auth[0], auth[1])

#-Testing exportSpace() on personal space
print("Exporting personal space...")
spaces = ["~my_account"]
numSpaces = len(spaces)
for index, spacekey in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, spacekey))
    print(spacekey)
    # This three-argument call is the one that raises the ValidationError shown above
    siteExportUrl = client.service.exportSpace(token, spacekey, "TYPE_XML")
    print(siteExportUrl)
    sleep(5)

Any clues as to why this error occurs and how to fix it?

Thank you.

Hi Stan!

I got the same error and fixed it by adding one more argument to the exportSpace() method. I found in this documentation that the last argument is a boolean, exportAll:

String exportSpace(String token, String spaceKey, String exportType, boolean exportAll)

You can try something like this:

siteExportUrl = client.service.exportSpace(token, "~my_account", "TYPE_XML", True)

 Hope this helps.
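Reports in this thread differ on whether a given instance accepts the fourth argument, so a script meant to run against several instances could try the four-argument call first and fall back to three. The following is a minimal sketch, not a tested recipe: export_fn stands in for client.service.exportSpace, the helper name is hypothetical, and which exception a particular SOAP client raises for a wrong argument count is an assumption (TypeError is used as the default here; with zeep you may also need to pass zeep.exceptions.ValidationError, the exception seen in the traceback above).

```python
def export_space(export_fn, token, space_key, export_type="TYPE_XML",
                 arity_errors=(TypeError,)):
    """Try the four-argument exportSpace call, then fall back to three arguments.

    export_fn stands in for client.service.exportSpace. The exception types
    that signal a wrong argument count are an assumption; pass the ones your
    client actually raises in arity_errors.
    """
    try:
        return export_fn(token, space_key, export_type, True)
    except arity_errors:
        return export_fn(token, space_key, export_type)


# Stand-ins for the two server variants reported in this thread.
def four_arg_server(token, space_key, export_type, export_all):
    return "four:{}".format(space_key)


def three_arg_server(token, space_key, export_type):
    return "three:{}".format(space_key)


print(export_space(four_arg_server, "tok", "DEMO"))
print(export_space(three_arg_server, "tok", "DEMO"))
```

Trying the four-argument form first means a full export is requested wherever the server supports it. Whether the three-argument fallback produces a complete export is a separate question raised elsewhere in this thread.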

Hello, guys. Could you please help to modify this script in order to export certain spaces using their keys as list instead of getting all of them from Confluence instance? Thanks.

Hi @Vitalii_Vuilov 

Try this one.

'''
Simplified version of a Stephen Deutsch's script
Uses predefined list of space keys in 'spaces' variable
'''

import requests
import getpass
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

user = "admin"
password = getpass.getpass()

session = Session()
session.auth = HTTPBasicAuth(user, password)
client = Client('http://localhost:1990/confluence/rpc/soap-axis/confluenceservice-v2?WSDL',
transport=Transport(session=session))

print("Logging in...")
token = client.service.login(user, password)
print("Getting Spaces")

#Specify space keys in spaces list of spaces
spaces = list()
spaces = ('myspacekey1', 'myspacekey2')
#spaces = client.service.getSpaces(token)
numSpaces = len(spaces)

# enumerate() supplies the index; iterating over 'spaces' alone would fail to unpack
for index, space in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space))
    siteExportUrl = client.service.exportSpace(token, space, "TYPE_XML", True)
    print(siteExportUrl)
    filename = siteExportUrl.split('/')[-1]
    print("Saving as: " + filename)
    r = session.get(siteExportUrl, stream=True)
    with open(filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    sleep(5)

I've made the following minuscule changes to the original script provided by Stephen Deutsch.

1. Added getpass library (import getpass adds the library namespace)

2. Added invocation of getpass() method provided by getpass library so that you don't need to specify the clear text password within the script. You'll be asked to do so at the run time.

password = getpass.getpass()

3. Added a predefined list of space keys:

spaces = ('myspacekey1', 'myspacekey2')

I am not a developer, so I've added

spaces = list()

to initialize an empty list. I don't think it's necessary here...

4. Looped over the predefined 'spaces' list; enumerate() is still needed to supply the index, so the line is

for index, space in enumerate(spaces):

5. Changed every use of 'space.key' to just 'space' because we don't have a space object anymore; we have 'space' strings taken from the 'spaces' list.

Here

print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space))

and here

siteExportUrl = client.service.exportSpace(token, space, "TYPE_XML", True)

6. Added the 'True' flag as the fourth argument of the exportSpace() method.

Hope this helps.

Stan

Thanks a lot, Stan! I will try!

After running the script I got this error:

python exportspaces.py
File "exportspaces.py", line 37
shutil.copyfileobj(r.raw, f)
^
IndentationError: expected an indented block

P.S. fixed by shifting that block

"6. Added the 'True' bit flag as a third parameter of the exportSpace() method's argument." I had to remove it otherwise it says too many arguments.

@Vitalii_Vuilov Thanks for the edits; they might help others. It seems the forum software altered the formatting, and since Python is indentation-sensitive, the 'for' loop didn't work. Fixed in the script.

Hi Brian,

What's your language of choice? Powershell? Python? Javascript?

Seeing your other question, I see two options. You could either do the request via SOAP (which is easier than you would expect), or you could script the downloading with a headless browser like Nightmare.js.

Let me know what direction you're looking to take and I'll see if I can whip something together.

I'll use whatever I can get and use... Here's what I've tried already.  I began by using a Chrome browser extension called Restlet Client, which lets you compose requests and chain them into test sequences.  I sent the series of requests described above (after first having called the /dologin.action).  While it has some ability to take data from previous requests (and lets the browser handle all the cookie exchanges unless you override it), it didn't have the ability to get the contents of the hidden atl_token field supplied by the exportspacexml method so as to use it in the doexportspace call (there were others, but their values didn't change, so I hard-coded them).  

I copied the field by hand with some success, so I used its feature that translates the request into a curl command, which I invoked from a standard Unix shell and scripted the shell to get data from one response to use in the next.  This also worked to an extent, but every time I did the doexportspace request, I still got permission problems.
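For anyone retracing these steps, the hand-copying of hidden fields can be automated with the standard library. This is a sketch only: the sample markup and the field name "key" are invented, only atl_token comes from the description above, and extracting the token does not by itself explain or fix the permission error.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML."""
    parser = HiddenFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Made-up form markup; the real page's fields and values will differ.
sample = '''
<form action="doexportspace.action" method="post">
  <input type="hidden" name="atl_token" value="abc123">
  <input type="hidden" name="key" value="DEMO"/>
  <input type="text" name="other" value="x">
</form>
'''
print(hidden_fields(sample))
```

The resulting dictionary could then be merged into the form data of the follow-up POST, sent with the same session so that cookies such as seraph.confluence carry over.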

I also tried the XMLRPC methods, as you've seen.

I have considered scripting the browser itself using Javascript, or even composing a test with Selenium, but haven't decided which way to go yet with that.

If the SOAP request is as easy as you suggest, I'd like some pointers to where I can do it more easily than I've seen; I tried the Apache CXF extension in Eclipse, and the nice, user-friendly project creation tool asked for data I had no idea how to find or compose, and required me to create my own server for testing.  I gave up on that because I ran out of patience with looking up something I didn't know only to find that finding the answer required looking up something else...

I'd like most to know what nuance I'm missing in the sequence that keeps it from thinking it's coming from an authorized user. But as I said at the beginning, I'll try whatever I can get to work, and thanks for the offer. 

If you let me know your operating system it would be helpful.

I imagine it would; sorry for the omission in all that verbosity.

I'm using OS X 10.11.

For the record, I'm glad that I didn't spend any more effort on trying to concoct a SOAP client, because contrary to the assertion above, it can't be done using SOAP, because though the three-parameter doexportspace call remains, the one taking a fourth parameter, which is a Boolean value specifying whether to do a full export, has been removed. 

I am greatly piqued at this, because in all of the documentation and all of my conversations with the support people, I did not discover this until I had actually made a correct call unsuccessfully.  They claim that this is documented, but saying that the interface is deprecated and will be removed in a future release does not inform me that one version of one method has already been removed, let alone forewarn me of which release will be affected before I waste untold hours trying to use it.

Any hope here?  I'm running well over my deadline, and even if I do script it, it will probably take days.  I'd be working on it myself, but I've already spent days on that, which is why I'm behind schedule.
