Can't get list of Page IDs for all links on a page


tl;dr - With Automation + Confluence API I can get all the outgoing links on a given page. But I can't get the Page IDs for the links that are pages on the same Confluence site.

Sorry, the context for this is a bit long. Please bear with me.

I have a Confluence page that's a Directory of Post-Mortems. It contains parent pages under which various teams create their Post-Mortems/Root Cause Analyses.

I'm working on an automation that, every month, will search for all the descendants created in the last 60 days, add an "engineering-rca" label to them, and then send a report of the pages via email.

If I look up the Page IDs of all of the parent pages, I can make a search API call with a CQL query like so:

{{baseurl}}/rest/api/search?cql=ancestor%20in%20({{ancestors.urlEncode}})%20and%20created%20%3E%20now(%22-60d%22)%20and%20title%20!~%20%22draft%22

This is the CQL:

ancestor in ( 144615652 , 272599812 , 107385481 , 106134594 , 176561924 , 174031023 , 109644138 , 161088427 , 81396653 , 599735904 , 177937889 , 149088663 ) and created > now ( "-60d" )
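For anyone running the same search from a script instead of a rule, here is a minimal Python sketch of building that URL. It assumes `base_url` ends in `/wiki` (as `{{baseurl}}` does above) and uses only the standard library; `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def build_search_url(base_url: str, ancestor_ids: list, days: int = 60) -> str:
    """Build a v1 search URL for pages created under any of the given ancestors.

    base_url is assumed to end in /wiki, e.g. "https://YOURSITE/wiki".
    """
    ids = " , ".join(str(page_id) for page_id in ancestor_ids)
    cql = f'ancestor in ( {ids} ) and created > now ( "-{days}d" )'
    # safe="" percent-encodes every reserved character (spaces, quotes, '>', ',')
    # so the whole CQL string travels as a single query-parameter value.
    return f"{base_url}/rest/api/search?cql={quote(cql, safe='')}"


url = build_search_url("https://YOURSITE/wiki", [144615652, 272599812])
print(url)
```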

I can then access the list of descendants in {{webResponse.body.results}}.

I can add labels by passing the page IDs in {{webResponse.body.results.content.id}} to the labels API.
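A minimal Python sketch of the label step, assuming the v1 endpoint `POST /wiki/rest/api/content/{id}/label` with a JSON array of `{"prefix": "global", "name": ...}` objects (worth double-checking against the current REST docs). It only builds the requests; sending them with credentials is left out, and `label_requests` is a hypothetical name.

```python
import json


def label_requests(base_url, search_body, label="engineering-rca"):
    """Yield (url, json_body) pairs, one per search result, to add a label.

    Assumes the v1 endpoint POST {base_url}/rest/api/content/{id}/label,
    which accepts a JSON array of {"prefix", "name"} objects.
    """
    payload = json.dumps([{"prefix": "global", "name": label}])
    for result in search_body["results"]:
        # Same path as {{webResponse.body.results.content.id}} in the rule.
        page_id = result["content"]["id"]
        yield f"{base_url}/rest/api/content/{page_id}/label", payload


sample = {"results": [{"content": {"id": "599735904"}}, {"content": {"id": "177937889"}}]}
requests_to_send = list(label_requests("https://YOURSITE/wiki", sample))
```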

And I can send an email of links thusly:

<ul>
{{#webResponse.body.results}}<li><a href="{{baseurl}}{{content._links.webui}}">{{title}}</a></li>
{{/}}
</ul>
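The same report rendered in Python, as a sketch assuming the v1 search response shape the template relies on (`title` and `content._links.webui` on each result). `render_report` is a hypothetical helper; it also HTML-escapes titles and URLs so that a `&` in a page title can't break the markup.

```python
from html import escape


def render_report(base_url, search_body):
    """Build a <ul> of links to each search result, like the email template."""
    items = []
    for result in search_body["results"]:
        href = base_url + result["content"]["_links"]["webui"]
        # Escape both the URL and the title so '&', '<', and quotes stay literal.
        items.append(f'<li><a href="{escape(href)}">{escape(result["title"])}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


body = {"results": [{"title": "DB outage & recovery",
                     "content": {"_links": {"webui": "/spaces/ENG/pages/1/DB+outage"}}}]}
report = render_report("https://YOURSITE/wiki", body)
```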

So yeah, all that's cool.

But it's possible that users will add more parent pages to the Directory. I am watching the page, but it would be useful not to have to monitor it, look up the new page IDs, and add them to the CQL query manually.

Using the body-format=view option on the pages endpoint (to get the full HTML of the page) along with the match function (to find all of the a href tags), I can get a list of links with this regex:

{{webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}
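Outside Automation, the same pattern works with Python's `re` module once the smart-value escaping is removed. This is a sketch (`extract_links` is a hypothetical name), and like the original it only catches links where `href` is the first attribute after `<a`.

```python
import re

# The smart-value pattern without Automation's extra escaping: "<a", one
# whitespace character, 'href="', then capture everything up to the next quote.
HREF_RE = re.compile(r'<a\shref="([^"]+)"')


def extract_links(view_html: str) -> list:
    """Return every href that immediately follows '<a ' in the rendered page HTML."""
    return HREF_RE.findall(view_html)


page_html = '<p><a href="/wiki/x/AbCd">Tiny</a> <a class="ext" href="https://example.com">Ext</a></p>'
links = extract_links(page_html)
```

Here the second anchor is skipped because `class` comes before `href`. A more lenient pattern such as `<a\s[^>]*?href="([^"]+)"` should also catch anchors that carry other attributes first.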

(Adapted from @Andreas Eckhardt's answer to his question on Outgoing links via Python API)

So yes, if you are looking for a way to extract all links on a page, that's the solution.

Unfortunately not all those links contain Page IDs. On my Directory page the links are a mixture of:

full URLs that have the PageID embedded (https://YOURSITE/wiki/spaces/<SPACEKEY>/pages/<PAGEID>/<TITLE>)
pre-Cloud-style URLs (https://YOURSITE/wiki/display/<SPACEKEY>/<TITLE>)
Tiny Links (https://YOURSITE/wiki/x/<TINYURL>)

It's easy enough to use match to parse out PAGEID from the first example, and for the second example, there's the v1 content endpoint that lets you search by SPACEKEY and TITLE:

https://YOURSITE/wiki/rest/api/content?title=<TITLE>&spaceKey=<SPACEKEY>
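A Python sketch of both easy cases, assuming the usual Cloud page URL shape `/wiki/spaces/<SPACEKEY>/pages/<PAGEID>/<TITLE>` and that `/display/` titles use `+` for spaces. The helper names are hypothetical; the lookup URL still has to be fetched, and the ID read from `results[0].id` of the response.

```python
import re
from urllib.parse import unquote_plus, urlencode, urlparse

# Assumed Cloud page URL shape: /wiki/spaces/<SPACEKEY>/pages/<PAGEID>/<TITLE>
SPACES_RE = re.compile(r"/wiki/spaces/[^/]+/pages/(\d+)")
# Pre-Cloud shape: /wiki/display/<SPACEKEY>/<TITLE>, with '+' standing in for spaces
DISPLAY_RE = re.compile(r"/wiki/display/([^/]+)/([^/?#]+)")


def page_id_from_spaces_url(url):
    """Return the embedded page ID as a string, or None if the URL doesn't match."""
    m = SPACES_RE.search(url)
    return m.group(1) if m else None


def display_lookup_url(base_url, url):
    """Turn a /display/ link into the v1 content lookup URL, or None if it doesn't match."""
    m = DISPLAY_RE.search(urlparse(url).path)
    if not m:
        return None
    space_key, title = unquote_plus(m.group(1)), unquote_plus(m.group(2))
    # urlencode re-encodes the decoded title for use as a query parameter.
    return f"{base_url}/rest/api/content?" + urlencode({"title": title, "spaceKey": space_key})
```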

Tiny Links are trickier. I don't know if I can reverse-engineer them, especially in Automation, which doesn't have any Base64 functions that I'm aware of, haha.
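For what it's worth, tiny links are commonly described as a Base64 encoding of the page ID's little-endian bytes, with trailing `A`/`=` characters stripped and `/` and `+` swapped for `-` and `_`. Treat that as an assumption and check it against a page whose ID and tiny link you already know. A Python sketch of both directions under that assumption (function names are hypothetical):

```python
import base64


def page_id_to_tiny(page_id: int) -> str:
    """Encode a page ID using the commonly described tiny-link scheme (assumption)."""
    raw = page_id.to_bytes(8, "little")          # page ID as 8 little-endian bytes
    enc = base64.b64encode(raw).decode("ascii")  # standard Base64
    # Swap the two URL-unfriendly characters, then drop padding and trailing
    # 'A's (each 'A' is six zero bits, so nothing meaningful is lost).
    return enc.replace("/", "-").replace("+", "_").rstrip("A=")


def tiny_to_page_id(tiny: str) -> int:
    """Reverse page_id_to_tiny: undo the swaps, re-pad with 'A', decode little-endian."""
    enc = tiny.replace("-", "/").replace("_", "+")
    enc += "A" * (-len(enc) % 4)                 # restore a multiple-of-4 length
    return int.from_bytes(base64.b64decode(enc), "little")
```

The round trip is self-consistent by construction; whether it matches your site's tiny links is what needs verifying. Either way, it relies on byte-level work that Automation's math functions don't offer.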

But the PROBLEM IS: I'm trying to do this in a Branch that goes through all the links I've found. And that's a problem:

[Screenshot: the branch that loops over all of the links found]

So I can get Page IDs for all of the pre-Cloud URLs by doing a lookup. Or I can get the Page IDs from all of the full URLs that contain one. But because processing for each {{url}} stops after the first IF condition fails, I can't get both. I need an ELSE.
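What the branch is missing is a per-link if / elif / else. A minimal Python sketch of that routing, with hypothetical bucket names and assuming Cloud page links look like `/wiki/spaces/<SPACEKEY>/pages/<PAGEID>/...`, where each link lands in exactly one bucket:

```python
import re

# One pattern per link style; order matters because the first match wins.
PATTERNS = [
    ("page_id", re.compile(r"/wiki/spaces/[^/]+/pages/(\d+)")),
    ("display", re.compile(r"/wiki/display/([^/?#]+/[^/?#]+)")),
    ("tiny", re.compile(r"/wiki/x/([A-Za-z0-9_-]+)")),
]


def partition_links(urls):
    """Put each URL into exactly one bucket instead of stopping at the first failed IF."""
    buckets = {name: [] for name, _ in PATTERNS}
    buckets["other"] = []
    for url in urls:
        for name, pattern in PATTERNS:
            m = pattern.search(url)
            if m:
                buckets[name].append(m.group(1))
                break
        else:  # runs only if no pattern matched: the ELSE
            buckets["other"].append(url)
    return buckets
```

Python's `for ... else` plays the role of the ELSE here: the `else` clause runs only when the loop finishes without a `break`.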

Oh and ANOTHER PROBLEM, which @Bill Sheboy knows all too well:

I can't "store" these Page IDs anywhere in Automation where I can access them later. I was thinking about using page properties to store an "outgoing-pageIDs" property that I could GET and then PUT to update with each new Page ID I find. Ugh.

Anyways, that's where I'm currently stuck.

1 answer


Hi @Darryl Lee

Yikes! That is an interesting set of problems to solve for this scenario. Here are some ideas, in no particular order (or guess of effectiveness ;^)

1. Is there any way to divide-and-conquer this so the check / label update is made when the page is updated by people, and then the only work for the 60-day schedule is reporting?
2. Have you considered building a service to do this outside the rule (using a language more helpful to resolve the automation limitations), and then call that service with Send Web Request?
3. Or...delegation: each branched-to item calls an Incoming Webhook Trigger rule and that is where the if / else structure happens. I do not know if this will bypass the 10-loop service limit (as it is not really triggering other rules; it is calling them with Send Web Request).
4. Or...if those conditions in the branch are mutually exclusive (rather than sequentially additive), you could use three branches: one to process each case.
5. Or...(and this one is a reach) using ideas from my dynamic list searching technique article, separate the URLs from the three cases into variables for later parsing.

Good luck, and I hope you have a great week!

Kind regards,
Bill


Comment

AH, I actually think this is possible with 4 and 5.

#4 feels a little inelegant, because I'd be running the same branch 3 times and ignoring the failures.

But with 5... I can filter it...

So I created this variable, urlList:

{{#webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}url=={{.}}{{^last}}~~{{/}}{{/}}

And then for each branch I used:

and

{{urlList.split("~~").match("url==(https:\/\/roku.atlassian.net\/wiki\/display\/.*)")}}

(TinyURL reverse engineering TBD)

OK ha, but now I still have a problem of how/where to combine the page IDs from these separate branches. Page Properties...?


Comment


Optimized by removing "url==", because I'm not searching within a particular field:

So then the urlList variable is:

{{#webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}{{.}}{{^last}}~~{{/}}{{/}}

Branch to find Page IDs for "relative" links:

{{urlList.split("~~").match("(\/wiki\/spaces\/.*)")}}

Branch to do Page ID lookups for pre-cloud links:

{{urlList.split("~~").match("(https:\/\/roku.atlassian.net\/wiki\/display\/.*)")}}
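The urlList trick, a delimiter-joined string that each branch splits and filters, looks roughly like this in Python. It's a sketch with hypothetical helper names: as I understand it, Automation's `match()` returns the first capture group, approximated here with `m.group(1)`, and the `\/` escapes from the smart values aren't needed in Python regexes.

```python
import re


def build_url_list(links):
    """Like {{#...}}{{.}}{{^last}}~~{{/}}{{/}}: one string, items separated by "~~"."""
    return "~~".join(links)


def filter_url_list(url_list, pattern):
    """Roughly urlList.split("~~").match(pattern): keep the first capture group of each hit."""
    hits = []
    for item in url_list.split("~~"):
        m = re.search(pattern, item)
        if m:
            hits.append(m.group(1))
    return hits


url_list = build_url_list([
    "/wiki/spaces/ENG/pages/144615652/Outage",
    "https://roku.atlassian.net/wiki/display/ENG/Incident+Review",
    "/wiki/x/AbCd",
])
relative = filter_url_list(url_list, r"(/wiki/spaces/.*)")
pre_cloud = filter_url_list(url_list, r"(https://roku\.atlassian\.net/wiki/display/.*)")
```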


Comment

My suggestion for #5 was to parse the lists into IDs for each case and then merge them for use in a single branch / reporting step.
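The merge step on its own is small. A Python sketch, assuming each case has already produced a list of IDs (the variable names are hypothetical), that de-duplicates while keeping order and builds the `ancestor in ( ... )` clause for the CQL:

```python
def merge_page_ids(*id_lists):
    """Merge per-case ID lists into one, dropping duplicates but keeping first-seen order."""
    # dicts preserve insertion order (Python 3.7+), so fromkeys() acts as an ordered set;
    # str() lets string and numeric IDs for the same page collapse together.
    return list(dict.fromkeys(str(page_id) for ids in id_lists for page_id in ids))


from_spaces = ["144615652", "272599812"]
from_display = [272599812, "107385481"]  # mixed types are tolerated
from_tiny = ["599735904"]
merged = merge_page_ids(from_spaces, from_display, from_tiny)
ancestor_clause = "ancestor in ( " + " , ".join(merged) + " )"
```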

NB: I did indicate it was a reach to achieve that ;^)


Comment

Hey @Bill Sheboy, hopefully we can take a break for the holidays soon, although, kind of crazily, I think about this stuff "for fun".

So... yes, I should be able to parse out Page Ids from full URLs just using the match operator on URLs like this:

full URLs that have the PageID embedded (https://YOURSITE/wiki/spaces/<SPACEKEY>/pages/<PAGEID>/<TITLE>)

The problem is that to get the Page IDs when I only have a Space Key and page title, as in the URLs below, I have to make another web request, which I think has to go through a branch, and I can't build a list while I'm in a branch. Hence, I think... page properties.

pre-Cloud-style URLs (https://YOURSITE/wiki/display/<SPACEKEY>/<TITLE>)

And ha, I still haven't spent much time thinking about how to reverse-engineer a Base64 bytestring using the limited math functions that Automation provides. Hmm, it looks like your idea of making an external call is really the only route, in which case I really ought to create a web service that accepts an array of URLs, handles all of the possible cases, and returns an array of Page IDs.
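A minimal sketch of such a web service using only the Python standard library. Assumptions: the tiny-link decoding follows the commonly described scheme (unverified here), Cloud page URLs contain `/pages/<PAGEID>/`, and `lookup_display_page` is a hypothetical placeholder that would call the v1 content endpoint with credentials. A rule could then POST the whole list of links once with Send Web Request and read back a single array.

```python
import base64
import json
import re
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote_plus


def lookup_display_page(space_key, title):
    """Hypothetical placeholder. A real version would call
    GET /wiki/rest/api/content?title=<title>&spaceKey=<space_key>
    with credentials and return results[0]["id"] (or None if nothing matched)."""
    return None


def resolve(url, lookup=lookup_display_page):
    """Return the page ID (as a string) for one link, or None if it can't be resolved."""
    if m := re.search(r"/wiki/spaces/[^/]+/pages/(\d+)", url):
        return m.group(1)
    if m := re.search(r"/wiki/x/([A-Za-z0-9_-]+)", url):
        # Assumed tiny-link scheme: tweaked Base64 of the little-endian page ID.
        enc = m.group(1).replace("-", "/").replace("_", "+")
        enc += "A" * (-len(enc) % 4)
        return str(int.from_bytes(base64.b64decode(enc), "little"))
    if m := re.search(r"/wiki/display/([^/]+)/([^/?#]+)", url):
        return lookup(unquote_plus(m.group(1)), unquote_plus(m.group(2)))
    return None


def resolve_all(urls, lookup=lookup_display_page):
    """Resolve every URL, drop the unresolvable ones, and de-duplicate in order."""
    ids = (resolve(url, lookup) for url in urls)
    return list(dict.fromkeys(page_id for page_id in ids if page_id))


class ResolveHandler(BaseHTTPRequestHandler):
    """POST a JSON array of URLs; get back a JSON array of page IDs."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        urls = json.loads(self.rfile.read(length) or b"[]")
        body = json.dumps(resolve_all(urls)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ResolveHandler).serve_forever()
```

Calling `serve()` starts it on `127.0.0.1:8080`. A real deployment would need to be reachable from Atlassian's cloud, use HTTPS, and check some kind of shared secret.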


Comment


Gah, coal for asynchronous/parallel executions!

I was unable to use Page Properties, because Automation tries to update the property at the same time for every Page ID it finds, so I get version collisions.
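The collision is classic optimistic locking. As I understand the v1 content property API, each update must send the current version number plus one, and a stale number is rejected. Since every parallel branch execution reads the property before any of them writes, only the first write can succeed. A small in-memory Python simulation (no real API calls; the class and function names are hypothetical) shows the pattern:

```python
class VersionConflict(Exception):
    """Raised when a write doesn't carry exactly current version + 1."""


class PageProperty:
    """In-memory stand-in for a page property with optimistic versioning (illustrative only)."""

    def __init__(self):
        self.value = []
        self.version = 1

    def get(self):
        return list(self.value), self.version

    def put(self, value, version):
        if version != self.version + 1:
            raise VersionConflict(f"expected version {self.version + 1}, got {version}")
        self.value, self.version = value, version


def simulate_parallel_appends(prop, page_ids):
    """Every execution GETs first (as parallel branches do), then each tries its PUT."""
    snapshots = [(page_id, prop.get()) for page_id in page_ids]
    outcomes = []
    for page_id, (value, version) in snapshots:
        try:
            prop.put(value + [page_id], version + 1)
            outcomes.append("ok")
        except VersionConflict:
            outcomes.append("conflict")
    return outcomes


prop = PageProperty()
outcomes = simulate_parallel_appends(prop, ["144615652", "272599812", "107385481"])
```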


Comment

Indeed! I am concerned and curious to learn what they are actually implementing for the in-progress suggestion in this area, or if it is just a delaying tactic (pun intended ;^)

https://jira.atlassian.com/browse/AUTO-32

If it is a premium / enterprise-only feature to toggle branch processing type, that is going to cause quite the customer response!

Too bad there is no public REST API to read / write Confluence Databases to store the info in scenarios like yours.


Comment

Happy new year @Bill Sheboy!

So a big surprise to me was that Automation for Confluence and Jira recently added a Delay action! This isn't in the Confluence documentation yet; a Support engineer pointed it out to me:

[Screenshot: the Delay action]

With the following help text:

Delay

What does this component do?

When this action is added to a rule (before a condition or another action), it adds a time-based delay in between two components. The rule gets delayed by X amount of time before executing the next component in the rule.

Use smart values here: No

Things to note

A delay component can’t have more than 15 minutes or 900 seconds of delay. And the total amount of delay in a rule can’t be more than 60 minutes. This action is available only for Premium and Enterprise plans at the moment.

Adding the delay component to a branch only delays the execution of the branch’s components and does not affect the rest of the rule.


But here's the bad news:

It can't take smart values
It doesn't help our case, because inside an Advanced Branch all executions run in parallel, so even if I add a delay of 10 seconds, the delay is the same for ALL executions
And yes, I totally wanted to use the RANDOM math function to try to make the delays different for each execution, although then they would not be in sequence, which may or may not be a problem


Comment


Happy New Year, @Darryl Lee !!

Yes, and...regarding that feature: other than the problem that the Delay action is only available for Premium and Enterprise license levels, I see it as valuable only for a narrow set of uses:

to delay when expecting some external processing of known duration to conclude
to try to work around race conditions in some rule actions (e.g., issue updates, linking, if / else blocks, etc. not actually finishing when control returns to the next rule step)

Still no help for your original scenario, I think.

