Can't get list of Page IDs for all links on a page

Darryl Lee
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
December 22, 2024

tl;dr - With Automation + Confluence API I can get all the outgoing links on a given page. But I can't get the Page IDs for the links that are pages on the same Confluence site.

Sorry, the context for this is a bit long. Please bear with me.

I have a Confluence page that's a Directory of Post-Mortems. It contains parent pages under which various teams create their Post-Mortems/Root Cause Analyses. 

I'm working on an automation rule that, every month, will search for all the descendants created in the last 60 days, add an "engineering-rca" label to them, and then send a report of the pages via email.

If I look up the Page IDs of all of the parent pages, I'm able to make a search API call to do a CQL search like so:

{{baseurl}}/rest/api/search?cql=ancestor%20in%20({{ancestors.urlEncode}})%20and%20created%20%3E%20now(%22-60d%22)%20and%20title%20!~%20%22draft%22

This is the CQL:

ancestor in (144615652, 272599812, 107385481, 106134594, 176561924, 174031023, 109644138, 161088427, 81396653, 599735904, 177937889, 149088663) and created > now("-60d") and title !~ "draft"

I can then access the list of descendants in {{webResponse.body.results}}.
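For anyone building the same request outside of Automation, here is a minimal Python sketch of how the URL above could be assembled. The function name build_search_url and its parameters are my own illustrative choices, and the base URL is assumed to already include the /wiki context path, as {{baseurl}} does in the rule.

```python
from urllib.parse import quote


def build_search_url(base_url, ancestor_ids, days=60, exclude_title="draft"):
    """Return a v1 search URL for recently created descendants.

    base_url is assumed to include the /wiki context path.
    """
    id_list = ", ".join(str(page_id) for page_id in ancestor_ids)
    cql = (
        f"ancestor in ({id_list}) "
        f'and created > now("-{days}d") '
        f'and title !~ "{exclude_title}"'
    )
    # safe="" percent-encodes spaces, quotes, commas, and parentheses,
    # which is stricter than the hand-encoded URL but equivalent.
    return f"{base_url.rstrip('/')}/rest/api/search?cql={quote(cql, safe='')}"


if __name__ == "__main__":
    print(build_search_url("https://example.atlassian.net/wiki", [144615652, 272599812]))
```

Keeping the CQL as a readable string and encoding it in one place makes it easier to change the day window or the excluded title later.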

I can add labels using the page IDs in {{webResponse.body.results.content.id}} in conjunction with the labels API.

And I can send an email of links thusly:

<ul>

{{#webResponse.body.results}}<li><a href="{{baseurl}}{{content._links.webui}}">{{title}}</a>

{{/}}

</ul>

So yeah, all that's cool.

But it's possible that users will add additional parent pages to the Directory. While I am watching the page, I thought it would be useful if I didn't have to monitor, look up, and manually add page IDs to the CQL query.

Using the body-format=view option for the pages endpoint (to get full HTML of the page) along with the match function (to search for all of the a href tags) I can get a list of links, with this regex:

{{webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}

(Adapted from @Andreas Eckhardt's answer to his question on Outgoing links via Python API)

So yes, if you are looking for a way to extract all links on a page, that's the solution.
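If you are doing the same extraction outside of Automation, a rough Python equivalent might look like the sketch below. It uses the standard library's html.parser instead of a regex, so it also picks up links where href is not the first attribute of the a tag. The names LinkCollector and extract_links are hypothetical, and the input is assumed to be the HTML string found at body.view.value in the API response.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (for example named anchors).
            if name == "href" and value:
                self.links.append(value)


def extract_links(view_html):
    """Return all href values in document order."""
    collector = LinkCollector()
    collector.feed(view_html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    sample = '<p><a href="/wiki/spaces/ENG/pages/123/Title">A</a></p>'
    print(extract_links(sample))
```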

Unfortunately not all those links contain Page IDs. On my Directory page the links are a mixture of:

It's easy enough to use match to parse out PAGEID from the first example, and for the second example, there's the v1 content endpoint that lets you search by SPACEKEY and TITLE:

https://YOURSITE/wiki/rest/api/content?title=<TITLE>&spaceKey=<SPACEKEY>
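To make the cases concrete, here is a small Python sketch that sorts links into buckets. The URL shapes are assumptions on my part, based on the patterns used elsewhere in this thread: /wiki/spaces/SPACEKEY/pages/PAGEID/... for links that carry a Page ID, /wiki/display/SPACEKEY/TITLE for pre-Cloud links, and /wiki/x/CODE for Tiny Links. The function name classify_link is hypothetical.

```python
import re
from urllib.parse import unquote_plus, urlsplit

# Assumed URL shapes; adjust them to whatever your site actually produces.
PAGE_ID_RE = re.compile(r"^/wiki/spaces/[^/]+/pages/(\d+)")
DISPLAY_RE = re.compile(r"^/wiki/display/([^/]+)/([^/]+)$")
TINY_RE = re.compile(r"^/wiki/x/([^/]+)$")


def classify_link(url):
    """Return (kind, payload) for a relative or absolute link."""
    path = urlsplit(url).path
    match = PAGE_ID_RE.match(path)
    if match:
        return ("page_id", match.group(1))
    match = DISPLAY_RE.match(path)
    if match:
        # Titles in display URLs are assumed to use "+" for spaces.
        return ("space_title", (match.group(1), unquote_plus(match.group(2))))
    match = TINY_RE.match(path)
    if match:
        return ("tiny", match.group(1))
    return ("other", url)


if __name__ == "__main__":
    print(classify_link("/wiki/spaces/ENG/pages/144615652/Team+A"))
```

The "space_title" payload is the pair you would pass as spaceKey and title to the v1 content endpoint shown above.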

Tiny Links are trickier. I don't know if I can reverse-engineer them, especially in Automation, which doesn't have any Base64 functions that I'm aware of, haha.
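Outside of Automation, decoding might be feasible. The Python sketch below assumes the commonly described scheme: the Page ID is written as little-endian bytes, Base64-encoded, "/" is swapped for "-" and "+" for "_", and trailing "A" and "=" characters are stripped. Nothing in this thread confirms that scheme, so treat it as an assumption and check it against a page whose ID and Tiny Link you already know. The function names are hypothetical.

```python
import base64


def page_id_to_tiny(page_id):
    """Encode a page ID into a Tiny Link code (assumed scheme)."""
    raw = page_id.to_bytes(8, byteorder="little")
    encoded = base64.b64encode(raw).decode("ascii")
    # Trailing "A" characters are zero bits, so stripping them loses nothing.
    return encoded.replace("/", "-").replace("+", "_").rstrip("A=")


def tiny_to_page_id(tiny):
    """Decode a Tiny Link code back into a page ID (assumed scheme)."""
    # 8 bytes encode to 11 Base64 characters plus one "=" of padding.
    padded = tiny.replace("-", "/").replace("_", "+").ljust(11, "A") + "="
    return int.from_bytes(base64.b64decode(padded), byteorder="little")


if __name__ == "__main__":
    code = page_id_to_tiny(599735904)
    print(code, tiny_to_page_id(code))
```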

But the PROBLEM IS: I'm trying to do this in a Branch that goes through all the links I've found. And that's a problem:

Screenshot 2024-12-22 at 12.18.32 AM.png

So I can get Page IDs for all of the pre-Cloud URLs by doing a lookup. Or I can get all of the full URLs that contain a Page ID. But because processing for each {{url}} stops after the first IF fails, I can't get both. I need an ELSE.

Oh and ANOTHER PROBLEM, which @Bill Sheboy knows all too well: 

I can't "store" these Page IDs anywhere in Automation where I can access them later. I was thinking about using page properties to store an "outgoing-pageIDs" property that I could GET and then PUT to update with each new Page ID I find. Ugh.

Anyways, that's where I'm currently stuck. 

1 answer

0 votes
Bill Sheboy
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
December 22, 2024

Hi @Darryl Lee 

Yikes!  That is an interesting set of problems to solve the scenario.  Here are some ideas, in no particular order (or guess of effectiveness ;^)

  1. Is there any way to divide-and-conquer this so the check / label update is made when the page is updated by people, and then the only work for the 60-day schedule is reporting?
  2. Have you considered building a service to do this outside the rule (using a language more helpful to resolve the automation limitations), and then call that service with Send Web Request?
  3. Or...delegation: each branched-to item calls an Incoming Webhook Trigger rule and that is where the if / else structure happens.  I do not know if this will bypass the 10-loop service limit (as it is not really triggering other rules; it is calling them with Send Web Request).
  4. Or...if those conditions in the branch are mutually exclusive (rather than sequentially additive), you could use three branches: one to process each case.
  5. Or...(and this one is a reach) using ideas from my dynamic list searching technique article, separate the URLs from the three cases into variables for later parsing.

Good luck, and I hope you have a great week!

Kind regards,
Bill

Darryl Lee
Community Leader
December 23, 2024

AH, I actually think this is possible with 4 and 5.

4 feels a little inelegant, because I'd be running the same branch 3 times and ignoring the failures.

But with 5... I can filter it...

So I created this variable, urlList:

{{#webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}url=={{.}}{{^last}}~~{{/}}{{/}}

And then for each branch I used:

{{urlList.split("~~").match("url==(\/wiki\/spaces\/.*)")}}

and

{{urlList.split("~~").match("url==(https:\/\/roku.atlassian.net\/wiki\/display\/.*)")}}

(TinyURL reverse engineering TBD)

OK ha, but now I still have a problem of how/where to combine the page IDs from these separate branches. Page Properties...?

Darryl Lee
Community Leader
December 23, 2024

Optimized by removing "url==", because I'm not searching within a particular field:

So then, urlList:

{{#webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}{{.}}{{^last}}~~{{/}}{{/}}

Branch to find Page IDs for "relative" links:

{{urlList.split("~~").match("(\/wiki\/spaces\/.*)")}}

Branch to do Page ID lookups for pre-cloud links:

{{urlList.split("~~").match("(https:\/\/roku.atlassian.net\/wiki\/display\/.*)")}}
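For readers more comfortable with a general-purpose language, the join / split / filter idea above can be sketched in Python as follows. This only mirrors the logic and is not how Automation evaluates smart values. The function names are hypothetical, and I chose to anchor each pattern at the start of the URL, which may differ from how match behaves in a rule.

```python
import re


def join_urls(urls, delimiter="~~"):
    """Flatten a list of URLs into one delimited string, like urlList."""
    return delimiter.join(urls)


def select_urls(url_list, pattern, delimiter="~~"):
    """Split the delimited string and keep the URLs matching the pattern."""
    regex = re.compile(pattern)
    # re.match anchors at the start of each URL (a choice made for this sketch).
    return [url for url in url_list.split(delimiter) if regex.match(url)]


if __name__ == "__main__":
    url_list = join_urls([
        "/wiki/spaces/ENG/pages/1/A",
        "https://example.atlassian.net/wiki/display/ENG/B",
    ])
    print(select_urls(url_list, r"/wiki/spaces/.*"))
```

The delimiter only has to be a string that never appears inside a URL, which is why something like "~~" works.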

Bill Sheboy
Rising Star
December 23, 2024

My suggestion for #5 was to parse the lists into IDs for each case, and then merge them for use in a single branch / reporting need.

NB: I did indicate it was a reach to achieve that ;^)

Darryl Lee
Community Leader
December 23, 2024

Hey @Bill Sheboy hopefully we can take a break for the holidays soon, although kind of crazily I think about this stuff "for fun".

So... yes, I should be able to parse out Page Ids from full URLs just using the match operator on URLs like this:

The problem is that to get the Page Ids when I only have a Space Key and Page title, like below, I'm having to do another web request, which I think has to go through a branch, and I can't create a list while I'm in a branch. Hence, I think... properties.

And ha, I still haven't spent much time thinking about how to reverse-engineer a base64 bytestring using the limited maths functions that Automation provides. Hmm, it looks like your idea of making an external call is really the only route, in which case I really ought to create a web service that can accept an array of URLs, handle all of the possible cases, and return an array of Page IDs.
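As a starting point, the core of such a service could look something like the Python sketch below. It is only a sketch under assumptions: the URL shapes are guessed from the patterns used in this thread, Tiny Links are left unresolved, and lookup_by_title is a hypothetical caller-supplied function (for example a wrapper around the v1 content endpoint) so the example runs without network access. The point is that an ordinary if / elif / else and a plain list cover the two things a rule branch can't do.

```python
import re
from urllib.parse import unquote_plus, urlsplit

# Assumed URL shapes, not confirmed against every kind of Confluence link.
PAGE_ID_RE = re.compile(r"^/wiki/spaces/[^/]+/pages/(\d+)")
DISPLAY_RE = re.compile(r"^/wiki/display/([^/]+)/([^/]+)$")


def resolve_page_ids(urls, lookup_by_title):
    """Map each URL to a page ID, or None when it cannot be resolved.

    lookup_by_title(space_key, title) is supplied by the caller and
    should return a page ID or None.
    """
    resolved = []
    for url in urls:
        path = urlsplit(url).path
        id_match = PAGE_ID_RE.match(path)
        display_match = DISPLAY_RE.match(path)
        if id_match:
            # The ID is already in the URL.
            resolved.append(id_match.group(1))
        elif display_match:
            # Pre-Cloud link: needs a lookup by space key and title.
            space_key, title = display_match.groups()
            resolved.append(lookup_by_title(space_key, unquote_plus(title)))
        else:
            # Tiny Links and external links are not handled here.
            resolved.append(None)
    return resolved


if __name__ == "__main__":
    known = {("ENG", "Post Mortem 2024"): "272599812"}
    print(resolve_page_ids(
        ["/wiki/spaces/ENG/pages/144615652/Team+A",
         "https://example.atlassian.net/wiki/display/ENG/Post+Mortem+2024"],
        lambda space_key, title: known.get((space_key, title)),
    ))
```

Returning None for unresolved links keeps the output aligned with the input array, so the caller can tell which URLs still need attention.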
