Can anyone please help me get a list of all pages, showing page titles and page IDs, within a Confluence space using a bash / Python script?
I want to generate a list of all pages showing the page title and page ID.
Thanks in advance
Vikas
Hi Vikas!
There are a few ways to accomplish this:
REST API
The API will likely be the best way to retrieve data from Confluence in a bash or Python script. You can see Confluence REST API Examples for examples of terminal and Python commands for using the API.
The following URL returns a JSON list of pages in the instance (replace <base-URL> with the base URL for your instance). Note that the server may cap the number of results per request regardless of the limit value, so you may need to page through the results:
http://<base-URL>/rest/api/content?type=page&start=0&limit=99999
You can then use python to parse through the JSON to find the ID and title of each page (useful article on JSON parsing with Python: Working with JSON data in Python).
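As a minimal sketch of that parsing step, assuming the response body has a top-level "results" array whose items carry "id" and "title" fields. The sample JSON is made up for illustration, and extract_pages is a hypothetical helper name:

```python
import json


def extract_pages(body):
    """Return a list of (id, title) tuples from a content-listing response body."""
    data = json.loads(body)
    # Each entry in "results" describes one page.
    return [(page["id"], page["title"]) for page in data.get("results", [])]


# Made-up sample response, trimmed to the fields used here.
sample = (
    '{"results": ['
    '{"id": "123", "title": "Home"}, '
    '{"id": "456", "title": "FAQ"}'
    '], "size": 2}'
)

for page_id, title in extract_pages(sample):
    print(page_id, title)
```

In a real script, the body would come from the HTTP response for the URL above rather than from a hard-coded string.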
Database
While the REST API would be most convenient to use with a Python/bash script, you can also get all the page titles and IDs from the database with the following query:
SELECT title, contentid
FROM content
WHERE contenttype = 'PAGE'
AND prevver IS NULL
AND content_status = 'current';
I hope this helps!
-Zak
@Zak Laughton When I use the following, I get only 200 results. Is that set by the Confluence Server admin?
http://<base-URL>/rest/api/content?type=page&start=0&limit=99999
This URL works for me:
https://<base-URL>/wiki/rest/api/space/{SPACE_KEY}/content?start=0&limit=9999&type=page
But there are still some problems:
1. The result is still limited. The limit is 1000.
2. When I add a new param, expand=children.page, the limit param has no effect. (In fact, the limit drops back to 200...)
I had to get the first set of results and then loop for as long as the response contained a next link. Even when I set the limit to 99999, I get a maximum of 500 results per call. So for the simple task of getting all page details, we have to make multiple calls. I am sure there are areas where Atlassian could reduce the number of calls required, and this scenario is one of them. The depth parameter does not work.
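A rough Python sketch of that loop, for anyone who wants it outside PowerShell. It assumes basic auth, that base_url already includes any context path such as /wiki, and that "_links.next" is a path relative to that base. The names list_pages and http_get_json are hypothetical helpers, not part of any Atlassian library:

```python
import base64
import json
import urllib.request


def http_get_json(url, user, password):
    """Fetch a URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_pages(base_url, space_key, get_json, limit=200):
    """Collect (id, title) for every page in a space by following _links.next."""
    url = f"{base_url}/rest/api/content?spaceKey={space_key}&type=page&start=0&limit={limit}"
    pages = []
    while url:
        data = get_json(url)
        pages.extend((p["id"], p["title"]) for p in data.get("results", []))
        next_link = data.get("_links", {}).get("next")
        # Assumption: "next" is a relative path; stop once the server omits it.
        url = base_url + next_link if next_link else None
    return pages


# Example call (placeholders, not real credentials):
# pages = list_pages("https://<base-URL>", "SPACE_KEY",
#                    lambda u: http_get_json(u, "user", "password"))
```

Passing the fetch function in as get_json keeps the paging logic separate from the HTTP details, so the loop can be tried against canned responses before pointing it at a real instance.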
Yes. In the end I made multiple calls to get all pages. But I found another problem: there is also a limit on the "children" field when I add the param expand=children.page (the limit is 25), so I can't generate the tree structure. This is confusing.
https://<base-URL>/wiki/rest/api/space/{SPACE_KEY}/content?expand=children.page&type=page&limit=9999
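One possible workaround, not confirmed anywhere in this thread: instead of relying on expand=children.page, request expand=ancestors on the paged listing and rebuild the tree on the client side. The sketch below assumes each result carries an "ancestors" list ordered from the root down to the direct parent. The sample data is made up, and build_tree and print_tree are hypothetical helper names:

```python
from collections import defaultdict


def build_tree(pages):
    """Map each parent page ID to the IDs of its direct children.

    Top-level pages (those without ancestors) are stored under the key None.
    """
    children = defaultdict(list)
    for page in pages:
        ancestors = page.get("ancestors", [])
        # Assumption: the last ancestor in the list is the direct parent.
        parent_id = ancestors[-1]["id"] if ancestors else None
        children[parent_id].append(page["id"])
    return dict(children)


def print_tree(children, titles, parent_id=None, depth=0):
    """Print the page hierarchy, indenting each level by two spaces."""
    for page_id in children.get(parent_id, []):
        print("  " * depth + f"{titles[page_id]} ({page_id})")
        print_tree(children, titles, page_id, depth + 1)


# Made-up sample shaped like results fetched with expand=ancestors.
sample_pages = [
    {"id": "1", "title": "Home", "ancestors": []},
    {"id": "2", "title": "Guides", "ancestors": [{"id": "1"}]},
    {"id": "3", "title": "FAQ", "ancestors": [{"id": "1"}]},
    {"id": "4", "title": "Install", "ancestors": [{"id": "1"}, {"id": "2"}]},
]

tree = build_tree(sample_pages)
print_tree(tree, {p["id"]: p["title"] for p in sample_pages})
```

Because the parent comes from each page's own record, the 25-item cap on the expanded "children" field would not matter here, provided the full page list has been collected first.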
Hi guys, I used this script to list all pages from a specific space via the API:
# $serverUrl, $SpaceKey and $headers (including the Authorization header) must be set beforehand
$url = "https://$($serverUrl)/rest/api/space/$($SpaceKey)/content/page?limit=99999"
$response = Invoke-RestMethod -Method "GET" -Headers $headers -Uri $url -UseBasicParsing
$allSpacePages = @($response.results)
# Follow _links.next until the server stops returning one
while ($null -ne $response._links.next) {
    $url = "https://$($serverUrl)$($response._links.next)"
    $response = Invoke-RestMethod -Method "GET" -Headers $headers -Uri $url -UseBasicParsing
    $allSpacePages += $response.results
}
This really does go through all the "_links.next" links (I tested it step by step in Postman) until the object is null, and returns about 6500 pages from the space, but...
when I listed all pages from the space via an SQL query:
SELECT * FROM [Cfl-Db].[dbo].[CONTENT]
WHERE CONTENTTYPE = 'PAGE' AND SPACEID = 51118093
ORDER BY TITLE
I got about twice as many pages, around 12,000!
So the question is: why didn't the API call list all existing pages?
I use Data Center version 7.18.3
Thank you for your answers :)
Guys, my fault :( I realized that the DB query returns pages in every status, such as drafts, deleted, and current.
But even so, I get more pages from the DB than from the API.
Fixed query:
SELECT * FROM [$db].[dbo].[CONTENT]
WHERE CONTENTTYPE = 'PAGE' AND CONTENT_STATUS = 'current' AND SPACEID = $spaceId
ORDER BY TITLE
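To see which pages account for the difference, one option is to diff the two ID sets. This is only a sketch with made-up IDs: db_ids would come from the CONTENTID column and api_ids from the "id" field of the API results, and diff_ids is a hypothetical helper. Two things that might be worth checking, both unverified guesses: the earlier query in this thread also filters on prevver IS NULL, which the fixed query omits, and the API may only return pages the calling user is allowed to view.

```python
def diff_ids(db_ids, api_ids):
    """Return (only_in_db, only_in_api) as lists of ID strings in numeric order."""
    # The API returns IDs as strings while the DB returns numbers, so normalize first.
    db_set = {str(i) for i in db_ids}
    api_set = {str(i) for i in api_ids}
    only_in_db = sorted(db_set - api_set, key=int)
    only_in_api = sorted(api_set - db_set, key=int)
    return only_in_db, only_in_api


# Made-up IDs for illustration.
only_in_db, only_in_api = diff_ids([1, 2, 3, 10], ["2", "3", "4"])
print("Only in DB: ", only_in_db)
print("Only in API:", only_in_api)
```

Looking up a handful of the DB-only IDs in the CONTENT table (for example their PREVVER and CONTENT_STATUS values) should show quickly what kind of rows the API is leaving out.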