How do I script the exporting and importing of spaces?

Brian M Thomas September 22, 2017

I need to export over 500 Confluence spaces, and I don't have weeks to do it.  Naïvely, I thought that I could use the remote interfaces to do it, but the REST API doesn't have the function and the XMLRPC version has been turned off.  I am assured that the SOAP version of the same method has somehow survived, but the time required to create a simple SOAP client is more than I can spend, because I don't know how to do it, and my efforts to learn how have not yet borne fruit.

So I'm back to trying to script the UI.  I get some encouraging results using a browser extension that lets me compose the requests and send them, but so far, I haven't managed to get them to work.

I sent a POST request to /spaces/doExportSpace.action with the values sent by the form presented by /spaces/exportspacexml.action, but I get a permission-denied response.

The browser is already logged in, so I'm guessing that there's something I'm failing to copy from the login response to my export request.  So what is that?  I've taken note of a cookie called "seraph.confluence" and the hidden fields in the request form.

I'm getting pretty desperate here; I've taken a week trying to figure out something that should have been an hour's work.  

What can you tell me?  The objective here is to be able to script hundreds of exports; I've tried everything anyone has suggested in hours of searching, but without success.

 

5 answers

4 votes
Frank Liang December 13, 2022

Is there a CLOUD Confluence export all spaces script?

2 votes
Stephen Deutsch
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
November 25, 2017

Hi Brian,

I know it's a really late answer, but I hadn't forgotten, it just took longer than I expected to be able to test properly. I am also including it in case someone else needs to do the same thing.

I wrote a script in Python (version >3.0) that allows one to be able to export all spaces:

import requests
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

user = "admin"
password = "admin"

# Reuse one authenticated HTTP session for the SOAP calls and the downloads
session = Session()
session.auth = HTTPBasicAuth(user, password)
client = Client('http://localhost:1990/confluence/rpc/soap-axis/confluenceservice-v2?WSDL',
                transport=Transport(session=session))

print("Logging in...")
token = client.service.login(user, password)
print("Getting Spaces")
spaces = client.service.getSpaces(token)
numSpaces = len(spaces)
for index, space in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space.key))
    # exportSpace returns the URL of the generated export archive
    siteExportUrl = client.service.exportSpace(token, space.key, "TYPE_XML")
    print(siteExportUrl)
    filename = siteExportUrl.split('/')[-1]
    print("Saving as: " + filename)
    # Stream the archive to disk instead of loading it into memory
    r = session.get(siteExportUrl, stream=True)
    with open(filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    # Pause between spaces so as not to overload the server
    sleep(5)

You will need to install zeep before executing, but that should be the only dependency. Just change out the username and password, and change localhost:1990 to the base url of your confluence instance. The script pauses between each space for 5 seconds so as not to overload your server too much.
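The download step in the script above writes whatever the server returns, so a failed request could be saved as if it were an export. A minimal sketch of a more defensive download follows. It assumes `session` is the same authenticated `requests.Session`; the helper names `filename_from_url` and `download_export` are made up for illustration and are not part of the original script.

```python
from urllib.parse import urlparse


def filename_from_url(url, default="export.zip"):
    """Return the last path segment of url, ignoring any query string."""
    name = urlparse(url).path.rsplit("/", 1)[-1]
    return name or default


def download_export(session, url, chunk_size=1 << 20):
    """Stream url to a local file and return the file name.

    session is assumed to be a requests.Session that already carries
    the credentials (as in the script above).
    """
    filename = filename_from_url(url)
    with session.get(url, stream=True) as response:
        # Fail loudly rather than saving an error page as an archive
        response.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return filename
```

In the loop, `download_export(session, siteExportUrl)` would stand in for the `session.get` / `shutil.copyfileobj` lines.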

Brian M Thomas November 25, 2017

Thanks; I'll give it a try. 

Krishna Ponnekanti March 15, 2018

Able to export spaces. Need to verify if content and comments , etc are part of space or  not

Brian M Thomas March 15, 2018

I assume that the following assumptions are true of your comment:

  • In the first sentence (which is incomplete, lacking both subject and verb), you are saying that at present, you are able to export spaces.  Alternatively, that someone can do it now.
  • Second, you want to know whether the "content and comments , etc" are part of the data in the export.  Seems obvious, because otherwise, it wouldn't be clear which items you're talking about, but it isn't literally what you asked.

Those are the most likely probabilities, but the question could admit of other interpretations.

To the first sentence, which by my assumptions is not a question but a statement of fact, I can affirm that it is possible for some person(s) to export the file, though you don't specify that it can be scripted, which is the issue to which you've posted your reply, so that's not absolutely certain.

To the second, I will confirm that all data pertinent to the space and its content -- including, among other things, the content of every page, blog post, comment, attachment, label, and every extant version of the above, along with the user IDs of those who added them and the time they were added -- is indeed in the export file.  But this only happens when you export the entire space, using the XML export option, which is mostly explained in the admin interface where the export is requested.

If these assumptions are wrong, please clarify; also, if you want an answer to the question as I have interpreted it, it would be best if you raised another support issue since, not being relevant to this one, it may be overlooked.

Michael Stella March 28, 2018

FYI, this SOAP client doesn't work, see Brian's answer in the conversation above.

Amazing that there's just no way to do this short of mechanical web clients like Selenium or Mechanize.  Ridiculous.

Brian M Thomas March 28, 2018

Indeed; in case I didn't make it clear as I thought I did, it doesn't matter which, or what kind, of SOAP client you use, it won't work because the method no longer exists.

I should add that the delay while I tried to figure out a way to do this caused my client to abandon the project altogether.  I got paid for the work already done, but nothing more.

Michael Stella March 28, 2018

Quite.  We're realizing that for this reason and many more, perhaps Confluence is not the tool for us.  There have been a lot of reliability issues recently with the cloud-hosted service, which got me trying to create a local mirror for when the cloud-hosted site is offline.  Oh well.

Nic Brough -Adaptavist-
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
March 28, 2018

Rather than a local mirror, have you considered just moving to Server?

Michael Stella March 28, 2018

For the price they want?  Haha, no.

Krishna Ponnekanti May 6, 2018

I followed the script given by Stephen Deutsch and am able to export all the spaces in Confluence. But I could see a diff when comparing against a manually downloaded space (space tools --> contents and tools --> Export --> select XML and full export): in a few spaces attachments were missing, and in a few there were some differences in the entities.xml file.

Is there any working way to export all spaces without losing any content?
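To narrow down a difference like the one described above, it may help to compare the entry lists of the two archives before looking inside entities.xml. The following is only a sketch: it assumes both exports are ordinary zip files, and the function names are invented for illustration. A size difference in entities.xml alone does not necessarily mean content was lost.

```python
import zipfile


def archive_listing(path_or_file):
    """Map each entry name in a zip archive to its uncompressed size."""
    with zipfile.ZipFile(path_or_file) as zf:
        return {info.filename: info.file_size for info in zf.infolist()}


def compare_exports(scripted, manual):
    """Report entries that differ between two space-export archives."""
    a = archive_listing(scripted)
    b = archive_listing(manual)
    common = set(a) & set(b)
    return {
        # Present in the manual export but absent from the scripted one
        "missing_in_scripted": sorted(set(b) - set(a)),
        # Present in the scripted export but absent from the manual one
        "missing_in_manual": sorted(set(a) - set(b)),
        # Present in both, but with different uncompressed sizes
        "size_differs": sorted(n for n in common if a[n] != b[n]),
    }
```

Calling `compare_exports("scripted.zip", "manual.zip")` (file names hypothetical) returns three sorted lists that show which attachments, if any, are missing from the scripted export.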

Stan Ry
March 25, 2019

Hello,

Thanks for sharing the script.

I am trying to invoke it, but I get errors from the Zeep module when invoking the exportSpace() method.

Here's the error:

~\AppData\Local\Continuum\anaconda3\lib\site-packages\zeep\xsd\elements\element.py in validate(self, value, render_path)
246 if not self.is_optional and not self.nillable and value in (None, NotSet):
247 raise exceptions.ValidationError(
--> 248 "Missing element %s" % (self.name), path=render_path)
ValidationError: Missing element in3 (exportSpace.in3)

Slightly modified script I am using:

import getpass
import requests
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

# Certificate verification is disabled below, so silence the resulting warnings
requests.packages.urllib3.disable_warnings()
auth = ('my_account', getpass.getpass())
session = Session()
session.verify = False
session.auth = auth


WIKI = 'https://confluence.example.net/instance/rpc/soap-axis/confluenceservice-v2?WSDL'
client = Client(WIKI, transport=Transport(session=session))
print("Logging in...")
token = client.service.login(auth[0], auth[1])

#-Testing exportSpace() on personal space
print("Exporting personal space...")
spaces = ["~my_account"]
numSpaces = len(spaces)
for index, spacekey in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, spacekey))
    print(spacekey)
    # This three-argument call is the one that raises the ValidationError above
    siteExportUrl = client.service.exportSpace(token, spacekey, "TYPE_XML")
    print(siteExportUrl)
    sleep(5)

Any clues as to why this error occurs and how to fix it?

Thank you.

kalvims April 4, 2019

Hi Stan!

I got the same error, and I fixed it by passing one more argument to the exportSpace() method. I found in this documentation that the last argument is boolean exportAll:

String exportSpace(String token, String spaceKey, String exportType, boolean exportAll)

You can try something like this:

siteExportUrl = client.service.exportSpace(token, "~my_account", "TYPE_XML", True)

 Hope this helps.
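Other posts in this thread report the opposite behaviour (the four-argument form being rejected with "too many arguments"), so which signature is accepted presumably depends on the Confluence version. A hedged sketch of a wrapper that tries the four-argument form first and falls back to three arguments is shown below. The helper name `export_space_compat` is invented, and the broad `except` is deliberate because the exact exception type raised by the SOAP client is an assumption. Whether the three-argument form produces a full export is not established in this thread.

```python
def export_space_compat(service, token, space_key, export_type="TYPE_XML"):
    """Call exportSpace with exportAll=True, falling back to three arguments.

    service is assumed to be the zeep `client.service` proxy from the
    scripts in this thread. Returns (export_url, used_export_all).
    """
    try:
        url = service.exportSpace(token, space_key, export_type, True)
        return url, True
    except Exception:
        # Assumption: a rejected signature surfaces as an exception here.
        # If the retry also fails, its exception propagates to the caller.
        url = service.exportSpace(token, space_key, export_type)
        return url, False
```

Usage would look like `siteExportUrl, full = export_space_compat(client.service, token, "~my_account")`, with `full` indicating which form succeeded.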

Vitalii Vuilov October 15, 2019

Hello, guys. Could you please help to modify this script in order to export certain spaces using their keys as list instead of getting all of them from Confluence instance? Thanks.

Stan Ry
October 17, 2019

Hi @Vitalii Vuilov 

Try this one.

'''
Simplified version of Stephen Deutsch's script
Uses a predefined list of space keys in the 'spaces' variable
'''

import requests
import getpass
import shutil
from time import sleep
from requests import Session
from requests.auth import HTTPBasicAuth
from zeep import Client
from zeep.transports import Transport

user = "admin"
# Prompt for the password at run time instead of storing it in the script
password = getpass.getpass()

session = Session()
session.auth = HTTPBasicAuth(user, password)
client = Client('http://localhost:1990/confluence/rpc/soap-axis/confluenceservice-v2?WSDL',
                transport=Transport(session=session))

print("Logging in...")
token = client.service.login(user, password)
print("Getting Spaces")

#Specify space keys in spaces list of spaces
spaces = list()
spaces = ('myspacekey1', 'myspacekey2')
#spaces = client.service.getSpaces(token)
numSpaces = len(spaces)

# enumerate() supplies the index; 'space' is a plain key string here
for index, space in enumerate(spaces):
    print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space))
    siteExportUrl = client.service.exportSpace(token, space, "TYPE_XML", True)
    print(siteExportUrl)
    filename = siteExportUrl.split('/')[-1]
    print("Saving as: " + filename)
    r = session.get(siteExportUrl, stream=True)
    with open(filename, 'wb') as f:
        shutil.copyfileobj(r.raw, f)
    sleep(5)

I've made the following minuscule changes to the original script provided by Stephen Deutsch.

1. Added getpass library (import getpass adds the library namespace)

2. Added invocation of getpass() method provided by getpass library so that you don't need to specify the clear text password within the script. You'll be asked to do so at the run time.

password = getpass.getpass()

3. Added a predefined list of space keys:

spaces = ('myspacekey1', 'myspacekey2')

I am not a developer, so I've added

spaces = list()

to initialize an empty list. I don't think it's necessary here...

4. Changed the loop so that it iterates over the space keys in 'spaces' (enumerate() is still needed to get the index) with this line

for index, space in enumerate(spaces):

5. Changed every use of 'space.key' to just 'space' because we don't have a space object anymore; we have 'space' strings taken from the 'spaces' list.

Here

print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, space))

and here

siteExportUrl = client.service.exportSpace(token, space, "TYPE_XML", True)

 6. Added the 'True' flag as the fourth argument of the exportSpace() method.

Hope this helps.

Stan

Vitalii Vuilov October 17, 2019

Thanks a lot, Stan! I will try!

Vitalii Vuilov October 17, 2019

After running the script I got this error:

python exportspaces.py
File "exportspaces.py", line 37
shutil.copyfileobj(r.raw, f)
^
IndentationError: expected an indented block

P.S. fixed by shifting that block

Vitalii Vuilov October 17, 2019

"6. Added the 'True' bit flag as a third parameter of the exportSpace() method's argument." I had to remove it otherwise it says too many arguments.

Stan Ry
October 17, 2019

@Vitalii Vuilov Thanks for the edits; they might help others. It seems the forum altered the script's formatting, and since Python is indentation-sensitive, the 'for' loop didn't work. Fixed in the script.

Priya Pathak January 10, 2024

I am getting the error below. Could you please help me here?

File "/opt/homebrew/lib/python3.11/site-packages/zeep/client.py", line 76, in __init__
self.wsdl = Document(wsdl, self.transport, settings=self.settings)
File "/opt/homebrew/lib/python3.11/site-packages/zeep/wsdl/wsdl.py", line 92, in __init__
self.load(location)

1 vote
bipin nepal September 26, 2022

Hello, any updates on this? @Brian M Thomas, were you able to figure out a solution?

I have around 300 spaces to export from one instance (Data Center) and import into another instance (Data Center).

Is there any way to script both the import and the export without using plugins?

Just thinking about that many spaces is already giving me a minor heart attack.

Bipin Nepal September 26, 2022

Also looping @Stephen Deutsch

Krishna Ponnekanti September 26, 2022

Is the user directory the same or different? If you have a different user group set and have to map users to new user IDs, it would be a problem; otherwise it would be great.

Bipin Nepal September 27, 2022

@Krishna Ponnekanti In both instances the users are stored in the Jira internal directory, and in both instances the email and display name are the same, but the username is different.

Krishna Ponnekanti September 27, 2022

How are you going to readjust or remap the old usernames to the new usernames?

If you can solve that problem, the above script with minor changes will take care of exporting all the spaces. Importing the spaces is manual anyhow.

Bipin Nepal September 27, 2022

I'm looking for a way to use the deprecated Confluence XML-RPC and SOAP APIs.

As the REST API does not have an update option :( I'm thinking of bulk-updating such usernames in the source and then doing the export/import, so that in both instances the username, email, and display name will be the same.

I haven't found a solution for the user update yet :(

If you can help me make changes to the above script, I would be very grateful.
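Since the email and display name reportedly match across the two instances while the usernames differ, one preparatory step could be to build an old-to-new username table keyed on email. The sketch below only produces that table; it does not rename anyone, and how to apply the mapping is not settled in this thread. The input shape (dicts with 'username' and 'email' keys) and the function name are assumptions made for illustration.

```python
def build_username_map(source_users, target_users):
    """Map source usernames to target usernames by case-insensitive email.

    Each user is assumed to be a dict with 'username' and 'email' keys.
    Returns (mapping, unmatched_source_usernames).
    """
    by_email = {}
    ambiguous = set()
    for user in target_users:
        email = user["email"].strip().lower()
        if email in by_email:
            # Two target accounts share this email: do not guess
            ambiguous.add(email)
        by_email[email] = user["username"]

    mapping, unmatched = {}, []
    for user in source_users:
        email = user["email"].strip().lower()
        if email in by_email and email not in ambiguous:
            mapping[user["username"]] = by_email[email]
        else:
            unmatched.append(user["username"])
    return mapping, unmatched
```

The `unmatched` list is returned separately so that accounts without a unique counterpart can be reviewed by hand.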

Bipin Nepal September 27, 2022

@Krishna Ponnekanti I have executed the above script and it throws

File "src/lxml/etree.pyx", line 1848, in lxml.etree.QName.__init__
File "src/lxml/apihelpers.pxi", line 1754, in lxml.etree._tagValidOrRaise
ValueError: Invalid tag name u'Object[]'

Krishna Ponnekanti September 27, 2022

Please use the script below for exporting the spaces. My Python version is 2.7.5; adjust the syntax to suit your Python version.

 

import os
import subprocess
import requests
import json
import time
import datetime
import logging
from time import sleep

## account which has Confluence system admin access
USER = 'username'
## password
PWD = 'password'

######### Confluence URL
URL = '<yourConfluenceURL>/rpc/json-rpc/confluenceservice-v2?os_authType=basic'
begin_time = datetime.datetime.now()
timestr = time.strftime("%Y%m%d-%H%M%S")

# Create and configure the logger
logging.basicConfig(filename="export_"+timestr+".log", format='%(asctime)s %(message)s', filemode='w')

# Get the root logger object
logger = logging.getLogger()

# Set the threshold of the logger to DEBUG
logger.setLevel(logging.DEBUG)

################################
# Export Space
################################
def exportSpace(spaceKey):
    logger.info('Exporting Space')
    payload = { 'jsonrpc' : '2.0', 'method' : 'exportSpace', 'params' : [spaceKey, 'TYPE_XML', 'true'], 'id' : 7 }
    headers1 = {'Accept':'application/json', 'Content-type':'application/json' }
    resp = requests.post(URL, auth = (USER, PWD), json=payload, headers=headers1)
    logger.info(resp.status_code)
    logger.info(resp.text)
    rj = resp.json()
    logger.info(rj["result"])
    # The result is the download URL of the generated export
    return rj["result"]

################################
# Download Space
################################
def downloadSpace(downloadurl, spaceKey):
    logger.info('Download Space')
    OUTPUT = spaceKey+'.zip'
    nocheck = '--no-check-certificate'
    userinput = '--user='+USER
    pwdinput = '--password='+PWD
    subprocess.call(['wget', '--auth-no-challenge', userinput, pwdinput, '-O', OUTPUT, downloadurl, nocheck])


spaces = []  # replace with the list of space keys to export

numSpaces = len(spaces)
# First list the keys that will be processed
for index, space in enumerate(spaces):
    print( "space.key ---->>>>> " +space)
try:
    for index, spacekey in enumerate(spaces):
        print("Exporting space {} of {} - {} using URL:".format(index+1, numSpaces, spacekey))
        try:
            resu = exportSpace(spacekey)
            downloadSpace(resu, spacekey)
        except Exception as e:
            # Log the failure and continue with the next space
            print(e)
        sleep(5)
except Exception as ex:
    print(ex)
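The script above reads `rj["result"]` directly, which raises a bare KeyError when the server answers with an error instead. Under the JSON-RPC 2.0 convention a response carries either a `result` or an `error` member, so a small guard can make failures easier to diagnose. This is a sketch only: the names `JsonRpcError` and `jsonrpc_result` are invented, and whether Confluence's endpoint follows the convention exactly is an assumption.

```python
class JsonRpcError(Exception):
    """Raised when a JSON-RPC response carries an error or no result."""


def jsonrpc_result(body):
    """Return the 'result' of a decoded JSON-RPC response dict, or raise."""
    err = body.get("error")
    if err is not None:
        # Prefer the human-readable message when the error is an object
        message = err.get("message", err) if isinstance(err, dict) else err
        raise JsonRpcError(message)
    if "result" not in body:
        raise JsonRpcError("response has neither 'result' nor 'error': %r" % (body,))
    return body["result"]
```

Inside `exportSpace`, `return jsonrpc_result(resp.json())` could replace the direct lookup, so that the per-space `except` block prints the server's own message.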

Bipin Nepal October 15, 2022

Thanks, @Krishna Ponnekanti, I was able to export the spaces.

However, the user-mapping problem you mentioned is causing trouble; it looks like there is no way around it.

1 vote
Stephen Deutsch
September 26, 2017

Hi Brian,

What's your language of choice? Powershell? Python? Javascript?

Seeing your other question, I see two options. You could either do the request via SOAP (which is easier than you would expect), or you could script the downloading with a headless browser like Nightmare.js.

Let me know what direction you're looking to take and I'll see if I can whip something together.

Brian M Thomas September 26, 2017

I'll use whatever I can get and use... Here's what I've tried already.  I began by using a Chrome browser extension called Restlet Client, which lets you compose requests and chain them into test sequences.  I sent the series of requests described above (after first having called the /dologin.action).  While it has some ability to take data from previous requests (and lets the browser handle all the cookie exchanges unless you override it), it didn't have the ability to get the contents of the hidden atl_token field supplied by the exportspacexml method so as to use it in the doexportspace call (there were others, but their values didn't change, so I hard-coded them).  

I copied the field by hand with some success, so I used its feature that translates the request into a curl command, which I invoked from a standard Unix shell and scripted the shell to get data from one response to use in the next.  This also worked to an extent, but every time I did the doexportspace request, I still got permission problems.
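For reference, the sequence described above (fetch the export form, copy its hidden fields including atl_token, post them to the export action) can be sketched in Python as follows. This is not a confirmed fix for the permission problem reported here. It assumes `session` is a `requests.Session` that is already logged in, and the `key` query parameter and the helper names are assumptions; the real form may require additional fields.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in html."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def export_via_form(session, base_url, space_key):
    """GET the export form, then POST its hidden fields back.

    The 'key' parameter name is an assumption, not taken from the thread.
    """
    form = session.get(base_url + "/spaces/exportspacexml.action",
                       params={"key": space_key})
    form.raise_for_status()
    data = hidden_fields(form.text)  # includes atl_token if the form has one
    return session.post(base_url + "/spaces/doExportSpace.action", data=data)
```

Using one session object for both requests keeps the cookies and the token from the same login, which is the part that is awkward to reproduce with hand-chained curl commands.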

I also tried the XMLRPC methods, as you've seen.

I have considered scripting the browser itself using Javascript, or even composing a test with Selenium, but haven't decided which way to go yet with that.

If the SOAP request is as easy as you suggest, I'd like some pointers to where I can do it more easily than I've seen; I tried the Apache CXF extension in Eclipse, and the nice, user-friendly project creation tool asked for data I had no idea how to find or compose, and required me to create my own server for testing.  I gave up on that because I ran out of patience with looking up something I didn't know only to find that finding the answer required looking up something else...

I'd like most to know what nuance I'm missing in the sequence that keeps it from thinking it's coming from an authorized user. But as I said at the beginning, I'll try whatever I can get to work, and thanks for the offer. 

Stephen Deutsch
September 26, 2017

If you let me know your operating system it would be helpful.

Brian M Thomas September 26, 2017

I imagine it would; sorry for the omission in all that verbosity.

I'm using OS X 10.11.

Brian M Thomas November 29, 2017

For the record, I'm glad that I didn't spend any more effort on trying to concoct a SOAP client, because contrary to the assertion above, it can't be done using SOAP, because though the three-parameter doexportspace call remains, the one taking a fourth parameter, which is a Boolean value specifying whether to do a full export, has been removed. 

I am greatly piqued at this, because in all of the documentation and all of my conversations with the support people, I did not discover this until I had actually made a correct call unsuccessfully.  They claim that this is documented, but saying that the interface is deprecated and will be removed in a future release does not inform me that one version of one method has already been removed, let alone forewarn me of which release will be affected before I waste untold hours trying to use it.

0 votes
Brian M Thomas October 10, 2017

Any hope here?  I'm running well over my deadline, and even if I do script it, it will probably take days.  I'd be working on it myself, but I've already spent days on that, which is why I'm behind schedule.
