List all pages in a space showing titles of page and IDs

Can anyone please help me get a list of all pages, with page titles and page IDs, within a Confluence space using a bash / Python script?

I want to generate a list of all pages showing the page title and page ID.

Thanks in advance

Vikas

2 answers

2 votes

Hi Vikas!

There are a few ways to accomplish this:

REST API

The REST API will likely be the best way to retrieve data from Confluence in a bash or Python script. See Confluence REST API Examples for terminal and Python commands that use the API.

The following URL will return a JSON list of all pages in the instance (replace <base-URL> with the base URL for your instance):

http://<base-URL>/rest/api/content?type=page&start=0&limit=99999

You can then use Python to parse the JSON and find the ID and title of each page (useful article on JSON parsing with Python: Working with JSON data in Python).
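
A minimal sketch of that parsing step, assuming the response has a top-level `results` array whose entries carry `id` and `title` fields. The function name and the sample payload are made up for illustration and are not taken from a real instance:

```python
import json

def extract_titles_and_ids(payload: str) -> list[tuple[str, str]]:
    """Return (title, id) pairs from a /rest/api/content JSON response body."""
    data = json.loads(payload)
    # Each entry under "results" is assumed to describe one page.
    return [(page["title"], page["id"]) for page in data.get("results", [])]

# Hypothetical response body, trimmed to the fields used here.
sample = (
    '{"results": ['
    '{"id": "65537", "title": "Home"}, '
    '{"id": "65540", "title": "Meeting notes"}'
    '], "size": 2}'
)

for title, page_id in extract_titles_and_ids(sample):
    print(f"{page_id}\t{title}")
```

In a real script the `payload` string would be the body returned by the URL above rather than a hard-coded sample.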

Database

While the REST API would be most convenient to use with a Python/bash script, you can also get all the page titles and IDs from the database with the following query:

SELECT title, contentid
FROM content
WHERE contenttype = 'PAGE'
 AND prevver IS NULL
 AND content_status = 'current';

I hope this helps!
-Zak

0 votes

@Zak Laughton When I use the following, I get only 200 results. Is that set by the Confluence Server admin?

http://<base-URL>/rest/api/content?type=page&start=0&limit=99999

This URL works for me:

https://<base-URL>/wiki/rest/api/space/{SPACE_KEY}/content?start=0&limit=9999&type=page

But there are still some problems:

1. The results are still capped: the limit is 1000.

2. When I add the parameter expand=children.page, the limit parameter has no effect. (In fact, the limit drops back to 200.)

@Vikas Shrivastava @antony terrence
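
Since the server appears to cap `limit` whatever value is requested, one way to cope is to step the `start` offset by the number of results actually returned. The sketch below assumes the `/rest/api/content` endpoint accepts `spaceKey`, `type`, `start` and `limit`, and that each response carries a `results` array. The `fetch` callable (anything that takes a URL and returns the decoded JSON) and the function name are hypothetical:

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

def iter_space_pages(fetch: Callable[[str], dict], base_url: str,
                     space_key: str, page_size: int = 200) -> Iterator[dict]:
    """Yield every page in a space by advancing the `start` offset."""
    start = 0
    while True:
        query = urlencode({"spaceKey": space_key, "type": "page",
                           "start": start, "limit": page_size})
        batch = fetch(f"{base_url}/rest/api/content?{query}")
        results = batch.get("results", [])
        if not results:
            # An empty batch means the offset has moved past the last page.
            return
        yield from results
        # Advance by what was actually returned, not by the requested limit,
        # because the server may silently lower the limit.
        start += len(results)
```

This costs one extra request at the end (the empty batch), but it does not depend on knowing what the effective cap is.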

I had to get the first set of results and then loop for as long as the response contained a next link. Even when I set the limit to 99999, I get a maximum of 500 results per call. So a simple task like getting the details of all pages takes multiple calls. I am sure there are areas where Atlassian could reduce the number of calls required, and this scenario is one of them. The depth parameter does not work.
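
A rough Python sketch of that loop, assuming the next link lives under `_links.next` and is a path relative to the base URL. The helper names `fetch_json` and `iter_results` are hypothetical, and basic authentication is only one of several options an instance may accept:

```python
import base64
import json
import urllib.request
from typing import Callable, Iterator

def fetch_json(url: str, user: str, password: str) -> dict:
    """GET a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def iter_results(fetch: Callable[[str], dict], base_url: str,
                 first_path: str) -> Iterator[dict]:
    """Yield results from every batch, following _links.next until it is absent."""
    path = first_path
    while path:
        batch = fetch(base_url + path)
        yield from batch.get("results", [])
        # The next link is assumed to be relative to base_url; None ends the loop.
        path = batch.get("_links", {}).get("next")

# Example wiring (placeholders, not real credentials):
# fetch = lambda url: fetch_json(url, "user", "password")
# for page in iter_results(fetch, "https://<base-URL>", "/rest/api/content?type=page&limit=500"):
#     print(page["id"], page["title"])
```

Passing `fetch` in as a parameter keeps the loop independent of how authentication is done and makes it easy to try out against canned responses.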

Yes. In the end, I made multiple calls to get all pages. But I found another problem: there is also a limit on the "children" field when I add the parameter expand=children.page.

(The limit is 25.) So I can't generate the tree structure. This is confusing.

https://<base-URL>/wiki/rest/api/space/{SPACE_KEY}/content?expand=children.page&type=page&limit=9999
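
One possible way around the cap on children, not something confirmed in this thread, is to request expand=ancestors instead and rebuild the tree from each page's parent. The sketch assumes every result carries an `ancestors` list ordered from the top of the space down to the direct parent. The function names and sample data are invented for illustration:

```python
from collections import defaultdict
from typing import Optional

def build_tree(pages: list[dict]) -> dict[Optional[str], list[str]]:
    """Map each parent page ID to the IDs of its direct children."""
    children = defaultdict(list)
    for page in pages:
        ancestors = page.get("ancestors", [])
        # The last ancestor is assumed to be the immediate parent;
        # top-level pages have no ancestors and are filed under None.
        parent_id = ancestors[-1]["id"] if ancestors else None
        children[parent_id].append(page["id"])
    return dict(children)

def render_tree(pages: list[dict]) -> list[str]:
    """Return indented 'title (id)' lines, one per page, in tree order."""
    titles = {page["id"]: page["title"] for page in pages}
    tree = build_tree(pages)
    lines = []

    def walk(parent_id: Optional[str], depth: int) -> None:
        for child_id in tree.get(parent_id, []):
            lines.append("  " * depth + f"{titles[child_id]} ({child_id})")
            walk(child_id, depth + 1)

    walk(None, 0)
    return lines

# Invented sample in the assumed shape of results fetched with expand=ancestors.
sample_pages = [
    {"id": "1", "title": "Home", "ancestors": []},
    {"id": "2", "title": "Guides", "ancestors": [{"id": "1"}]},
    {"id": "3", "title": "Install", "ancestors": [{"id": "1"}, {"id": "2"}]},
    {"id": "4", "title": "FAQ", "ancestors": [{"id": "1"}]},
]
print("\n".join(render_tree(sample_pages)))
```

Because each page only reports its own parent chain, the 25-item cap on `children` never comes into play. Pages whose parent is missing from the fetched set would not be printed by this sketch.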

I am seeing a 404 Not Found when I use the API call above.

Hi guys, I used this script to list all pages from a specific space via the API:

# Assumes $serverUrl, $SpaceKey and $headers (with authentication) are already defined.
$url = "https://$($serverUrl)/rest/api/space/$($SpaceKey)/content/page?limit=99999"
$allSpacePages = @()
do {
    $response = Invoke-RestMethod -Method "GET" -Headers $headers -Uri $url -UseBasicParsing
    $allSpacePages += $response.results
    # _links.next is only present while more results remain.
    $next = $response._links.next
    if ($next) { $url = "https://$($serverUrl)$($next)" }
} while ($next)

This really does go through every "_links.next" (I tested it step by step in Postman) until that object is null, and it returns about 6,500 pages from the space, but...
when I listed all pages from the space via an SQL query:

SELECT * FROM [Cfl-Db].[dbo].[CONTENT]
WHERE CONTENTTYPE = 'PAGE' AND SPACEID = 51118093
ORDER BY TITLE

I got about twice as many pages, roughly 12,000!

So the question is: why didn't the API call list all existing pages?

I use Data Center version 7.18.3.

Thank you for your answers :)

Guys, my fault :( I realized that the DB returns pages in every status, such as drafts, deleted, or current pages.
But even so, I still get more pages from the DB than from the API.

Fixed query:

SELECT * FROM [$db].[dbo].[CONTENT]
WHERE CONTENTTYPE = 'PAGE' AND CONTENT_STATUS = 'current' AND SPACEID = $spaceId
ORDER BY TITLE
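
One difference worth checking: the query in the first answer also filters on prevver IS NULL, while the fixed query here does not. The toy example below uses an in-memory SQLite table as a stand-in for the CONTENT table to show how much that one filter can change the count. The rows are invented, and the layout they assume (older versions of a page kept as separate rows with status 'current' and a non-null prevver) is an assumption to verify against your own database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (contentid INTEGER, title TEXT, "
    "contenttype TEXT, content_status TEXT, prevver INTEGER)"
)
rows = [
    (100, "Home", "PAGE", "current", None),   # latest version of the page
    (101, "Home", "PAGE", "current", 100),    # older version (assumed layout)
    (102, "Home", "PAGE", "current", 100),    # older version (assumed layout)
    (200, "Scratch", "PAGE", "draft", None),
    (300, "Old", "PAGE", "deleted", None),
]
conn.executemany("INSERT INTO content VALUES (?, ?, ?, ?, ?)", rows)

# Filter used in the fixed query: status only.
status_only = conn.execute(
    "SELECT COUNT(*) FROM content "
    "WHERE contenttype = 'PAGE' AND content_status = 'current'"
).fetchone()[0]

# Filter used in the first answer: status plus prevver IS NULL.
with_prevver = conn.execute(
    "SELECT COUNT(*) FROM content "
    "WHERE contenttype = 'PAGE' AND content_status = 'current' "
    "AND prevver IS NULL"
).fetchone()[0]

print(status_only, with_prevver)  # prints: 3 1
```

If the real table behaves like this toy one, adding AND PREVVER IS NULL to the fixed query may bring the database count closer to what the API returns.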
