Delay in response for Confluence Cloud

Pranab Bhatta December 25, 2018

Hi All,

While working with the Confluence Cloud API (Search content by CQL), there is a property start (the starting index of the returned content) that takes an integer value. We are trying to fetch all the pages using "Search content by CQL". As we keep increasing the start value, once it reaches 10,000 or more, the response time increases to around 30 seconds per request; otherwise the response time is under 5 seconds. This is hampering the overall performance.

 

Not sure if this is a bug. Could you please guide us?

 

Thanks,

Pranab

2 answers

0 votes
Lawrence Chou
I'm New Here
July 4, 2023

We are also trying to crawl pages with the search API, like Pranab does. We didn't use the other RESTful endpoints for crawling due to their respective bugs and limitations.

One workaround we are currently trying is keeping the "page" offset small by using a sliding window, like:

 

1. CQL="type=blogpost and created > T1 and created <= T2", page=1

2. CQL="type=blogpost and created > T1 and created <= T2", page=2

3. CQL="type=blogpost and created > T1 and created <= T2", page=3

4. CQL="type=blogpost and created > T2 and created <= T3", page=1

... 

10000. CQL="type=blogpost and created > T1001 and created <= T1002", page=3 <---- small "page"

 

instead of:

 

1. CQL="type=blogpost", page=1

2. CQL="type=blogpost", page=2

...

10000. CQL="type=blogpost", page=10000  <---- big "page"

 

P.S. I can imagine Nic's point about having to "hold and work through a huge list" may be the cause of this limitation, but I still think it's a bug in the underlying implementation. The above is just a possible workaround.
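The sliding-window idea above can be sketched as follows. This is a minimal illustration, not Lawrence's actual code: the function name, the boundary timestamps, and the page sizing are all assumptions, and the CQL string shape follows the examples in this answer.

```python
def windowed_queries(boundaries, pages_per_window, limit=25):
    """Yield (cql, start) pairs. Pagination happens within each
    created-date window, so `start` never grows past
    pages_per_window * limit and never approaches the slow 10,000 offset."""
    for lo, hi in zip(boundaries, boundaries[1:]):
        cql = f'type=blogpost and created > "{lo}" and created <= "{hi}"'
        for page in range(pages_per_window):
            yield cql, page * limit
```

Each (cql, start) pair would then be sent to the search endpoint as the cql and start query parameters; the window boundaries just need to be chosen densely enough that no single window holds more than pages_per_window * limit results.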

0 votes
Nic Brough -Adaptavist-
Community Leader
December 26, 2018

Returning 10,000 pages is a lot of work, it's going to take time.

I'd suggest not doing that, and instead using the system as it is intended to be used. Why are you trying to get so many pages? That volume is useless to an end user, so you must be trying to do something other than provide pages to users, which is either pointless or something that other reporting would handle far better.

Pranab Bhatta December 26, 2018

Hi Nic,

 

Thanks for the reply. I am not trying to get 10,000 pages in a single request. If you check the API, there is a start parameter (the starting index of the returned content). We are getting the pages in batches: initially start is set to 0, then I increase it by 50 with each batch. When the start index reaches 10,000, I notice the considerable delay.
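The batching described above can be sketched as a URL generator. The base URL is a placeholder and the function name is mine; the path and the cql, limit, and start parameters follow the linked documentation.

```python
from urllib.parse import urlencode

def search_urls(base, cql, limit=50, total=10000):
    """Yield successive search URLs, advancing `start` by `limit`
    for each batch until `total` results have been covered."""
    for start in range(0, total, limit):
        query = urlencode({"cql": cql, "limit": limit, "start": start})
        yield f"{base}/wiki/rest/api/content/search?{query}"
```

The first few URLs carry start=0, start=50, start=100, and so on; the delay Pranab describes appears once start crosses 10,000.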

Here is the documentation link https://developer.atlassian.com/cloud/confluence/rest/#api-content-search-get

 

Thanks,

Pranab

Nic Brough -Adaptavist-
December 26, 2018

You're still asking it to work with a large number of results and it needs to build, hold and work through a huge list.

I'd suggest you look at simplifying what you're doing to a sane range of pages.


Pranab Bhatta December 26, 2018

Hi Nic,

 

Thanks for the reply. Sorry for asking the question again.

Let me explain again. Suppose you have over 100,000 pages in Confluence and you want to fetch all of them. As per the API documentation, you can fetch pages using [/wiki/rest/api/content/search]. Below is what I am trying to do.

1) Make the first API call, so my request will be [/wiki/rest/api/content/search?cql=type=page&limit=20&start=0].

2) The endpoint sends a response back with 20 records. This is not a large result set. Below is a sample response.

"_links": {
    "base": "https://ibmappcon.atlassian.net/wiki",
    "context": "/wiki",
    "next": "/rest/api/content/search?limit=20&start=20&cql=type=page",
    "self": "https://ibmappcon.atlassian.net/wiki/rest/api/content/search?cql=type=page"
}

3) Make the next API call by extracting the value of the next property shown above. Keep performing these calls until all 100,000 pages are retrieved.

4) As you keep making these calls, your start index will eventually reach 10,000, so the request will look like [/rest/api/content/search?limit=20&start=10000&cql=type=page]. Here you will notice a considerable delay in the response, and at some point you will get an HTTP 504.
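The loop in steps 1-4 can be sketched as follows. The fetch argument is any callable that returns the parsed JSON body for a URL, so the pagination logic stays independent of the HTTP client; the function name is mine, and the response shape follows the sample above.

```python
def crawl_all(fetch, base, first_path):
    """Follow the `next` link in `_links` until the server
    stops returning one, accumulating every page of results."""
    results, path = [], first_path
    while path:
        body = fetch(base + path)
        results.extend(body.get("results", []))
        path = body.get("_links", {}).get("next")
    return results
```

With a real client, fetch would be something like lambda url: requests.get(url, auth=...).json(), and base would be the site root up to and including /wiki, since the next links in the sample response are relative to that context.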

So my question is:

1) Is there a better way to fetch all the pages?

2) Will it improve the performance, i.e. the response time?

3) Will it resolve the HTTP 504 Gateway Timeout issue?

 

Thanks,

Pranab

Pranab Bhatta December 26, 2018

One more point I would like to add: we are trying to crawl and fetch all the pages along with their metadata.

 

Thanks,

Pranab

Nic Brough -Adaptavist-
December 28, 2018

Again, you're still asking it to work with a large number of results and it needs to build, hold and work through a huge list.

And, again, I'd suggest you look at simplifying what you're doing to a sane range of pages.

So, instead of trying to overload your system, I'd take a look at why you think you want to.  What are you trying to achieve?  What does the end user get out of this?

Pranab Bhatta January 8, 2019

Hi Nic,

Thanks for the reply.

Just wanted to confirm: is there any limit on the number of records that objects such as Page, Blogpost, Comment, Attachment, etc. can hold? I could not find any documentation to refer to.

 

Thanks,

Pranab
