tl;dr - With Automation + Confluence API I can get all the outgoing links on a given page. But I can't get the Page IDs for the links that are pages on the same Confluence site.
Sorry, the context for this is a bit long. Please bear with me.
I have a Confluence page that's a Directory of Post-Mortems. It contains parent pages under which various teams create their Post-Mortems/Root Cause Analyses.
I'm working on an automation that, every month, will search for all the descendants created in the last 60 days, add an "engineering-rca" label to them, and then send a report of the pages via email.
If I look up the Page IDs of all of the parent pages, I'm able to make a search API call to do a CQL search like so:
{{baseurl}}/rest/api/search?cql=ancestor%20in%20({{ancestors.urlEncode}})%20and%20created%20%3E%20now(%22-60d%22)%20and%20title%20!~%20%22draft%22
This is the CQL:
ancestor in ( 144615652 , 272599812 , 107385481 , 106134594 , 176561924 , 174031023 , 109644138 , 161088427 , 81396653 , 599735904 , 177937889 , 149088663 ) and created > now ( "-60d" )
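For anyone building the same request outside Automation, here is a minimal Python sketch of how the encoded search URL above could be assembled from a list of parent page IDs. The function name `build_search_url`, the `https://YOURSITE/wiki` base URL, and the shortened ID list are illustrative assumptions; only the CQL itself comes from the post.

```python
from urllib.parse import quote


def build_search_url(base_url, ancestor_ids, days=60, exclude_title="draft"):
    """Assemble the CQL search URL for pages created under the given ancestors."""
    id_list = ", ".join(str(i) for i in ancestor_ids)
    cql = (
        f"ancestor in ({id_list}) "
        f'and created > now("-{days}d") '
        f'and title !~ "{exclude_title}"'
    )
    # quote() percent-encodes spaces, double quotes, '>' and ','.
    # The characters in `safe` are left readable, as in the URL above.
    return f"{base_url}/rest/api/search?cql={quote(cql, safe='()!~-')}"


# Hypothetical base URL and a shortened ID list, for illustration only.
url = build_search_url("https://YOURSITE/wiki", [144615652, 272599812])
print(url)
```

In Automation the same job is done by {{ancestors.urlEncode}}; the sketch only shows which parts of the query end up percent-encoded.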
I can then access the list of descendants in {{webResponse.body.results}}.
I can add labels using the page IDs in {{webResponse.body.results.content.id}} in conjunction with the labels API.
And I can send an email of links thusly:
<ul>
{{#webResponse.body.results}}<li><a href="{{baseurl}}{{content._links.webui}}">{{title}}</a>
{{/}}
</ul>
So yeah, all that's cool.
But it's possible that users will add additional parent pages to the Directory. While I am watching the page, I thought it would be useful if I didn't have to monitor, look up, and manually add page IDs to the CQL query.
Using the body-format=view option for the pages endpoint (to get full HTML of the page) along with the match function (to search for all of the a href tags) I can get a list of links, with this regex:
{{webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}
(Adapted from @Andreas Eckhardt's answer to his question on Outgoing links via Python API)
So yes, if you are looking for a way to extract all links on a page, that's the solution.
Unfortunately not all those links contain Page IDs. On my Directory page the links are a mixture of:
full URLs that have the PageID embedded, (https://YOURSITE/wiki/spaces/<SPACEKEY>/<PAGEID>/<TITLE>)
pre-Cloud-style URLs (https://YOURSITE/wiki/display/<SPACEKEY>/<TITLE>)
Tiny Links (https://YOURSITE/wiki/x/<TINYURL>)
It's easy enough to use match to parse out PAGEID from the first example, and for the second example, there's the v1 content endpoint that lets you search by SPACEKEY and TITLE:
https://YOURSITE/wiki/rest/api/content?title=<TITLE>&spaceKey=<SPACEKEY>
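As a rough illustration of those two cases outside Automation, the Python sketch below pulls the Page ID out of a full URL and builds the v1 lookup URL for a pre-Cloud-style link. The function names are hypothetical. Two assumptions are baked in: the regex tolerates an optional `pages/` segment before the ID, and `+` in a display-style title is treated as a space.

```python
import re
from urllib.parse import unquote_plus, urlencode

# Full URL: /wiki/spaces/<SPACEKEY>/<PAGEID>/<TITLE>
# (assumption: an optional "pages/" segment may precede the ID)
PAGE_ID_RE = re.compile(r"/wiki/spaces/[^/]+/(?:pages/)?(\d+)(?:/|$)")
# Pre-Cloud style: /wiki/display/<SPACEKEY>/<TITLE>
DISPLAY_RE = re.compile(r"/wiki/display/([^/]+)/([^/?#]+)")


def page_id_from_url(url):
    """Return the embedded Page ID as a string, or None if there is none."""
    match = PAGE_ID_RE.search(url)
    return match.group(1) if match else None


def lookup_url_for_display_link(site, url):
    """Return the v1 content lookup URL for a display-style link, or None."""
    match = DISPLAY_RE.search(url)
    if not match:
        return None
    space_key, raw_title = match.groups()
    # Assumption: '+' in the path stands for a space in the page title.
    query = urlencode({"title": unquote_plus(raw_title), "spaceKey": space_key})
    return f"{site}/wiki/rest/api/content?{query}"


print(page_id_from_url("https://YOURSITE/wiki/spaces/ENG/12345/Some+Title"))
print(lookup_url_for_display_link("https://YOURSITE", "https://YOURSITE/wiki/display/ENG/My+Page"))
```

Each function returns None when the link is not its kind, so a caller can try one and fall through to the other.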
Tiny Links are trickier. I don't know if I can reverse-engineer them, especially in Automation, which doesn't have any Base64 functions that I'm aware of haha.
But the PROBLEM IS: I'm trying to do this in a Branch that goes through all the links I've found. And that's a problem:
So I can get Page IDs for all of the pre-Cloud URLs by doing a lookup. Or I can get all of the full urls that contain a Page Id. But because processing for each {{url}} stops after the first IF fails, I can't get both. I need an ELSE.
Oh and ANOTHER PROBLEM, which @Bill Sheboy knows all too well:
I can't "store" these Page IDs anywhere in Automation where I can access them later. I was thinking about using page properties to store an "outgoing-pageIDs" that I could GET and then PUT to update with each new Page Id I find. Ugh.
Anyways, that's where I'm currently stuck.
Hi @Darryl Lee
Yikes! That is an interesting set of problems to solve the scenario. Here are some ideas, in no particular order (or guess of effectiveness ;^)
Good luck, and I hope you have a great week!
Kind regards,
Bill
AH, I actually think this is possible with 4 and 5.
4 feels a little inelegant, because I'd be running the same branch 3 times and ignoring the failures.
But with 5... I can filter it...
So I created this variable, urlList:
{{#webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}url=={{.}}{{^last}}~~{{/}}{{/}}
And then for each branch I used:
{{urlList.split("~~").match("url==(\/wiki\/spaces\/.*)")}}
and
{{urlList.split("~~").match("url==(https:\/\/roku.atlassian.net\/wiki\/display\/.*)")}}
(TinyURL reverse engineering TBD)
OK ha, but now I still have a problem of how/where to combine the page IDs from these separate branches. Page Properties...?
Optimized by removing "url==", because I'm not searching within a particular field:
So then: urlList:
{{#webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}{{.}}{{^last}}~~{{/}}{{/}}
Branch to find Page IDs for "relative" links:
{{urlList.split("~~").match("(\/wiki\/spaces\/.*)")}}
Branch to do Page ID lookups for pre-cloud links:
{{urlList.split("~~").match("(https:\/\/roku.atlassian.net\/wiki\/display\/.*)")}}
My suggestion for #5 was in order to parse the lists into IDs for each case, followed by merging them for use in a single branch / reporting need.
NB: I did indicate it was a reach to achieve that ;^)
Hey @Bill Sheboy hopefully we can take a break for the holidays soon, although kind of crazily I think about this stuff "for fun".
So... yes, I should be able to parse out Page Ids from full URLs just using the match operator on URLs like this:
full URLs that have the PageID embedded, (https://YOURSITE/wiki/spaces/<SPACEKEY>/<PAGEID>/<TITLE>)
The problem is that to get the Page Ids when I only have a Space Key and Page title, like below, I'm having to do another web request, which I think has to go through a branch, and I can't create a list while I'm in a branch. Hence, I think... properties.
pre-Cloud-style URLs (https://YOURSITE/wiki/display/<SPACEKEY>/<TITLE>)
And ha, I still haven't spent much time thinking about how to reverse-engineer a base64 bytestring using the limited maths functions that Automation provides. Hum, looks like yeah, your idea of making an external call is really the only route, in which case I really ought to create a web service that can accept an array of URLs, handle all of the possible cases, and return an array of Page IDs.
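To make that idea concrete, here is a minimal Python sketch of what the core of such a service might look like. It is not a finished implementation. Every name in it is hypothetical, the title lookup is a hard-coded stand-in for the real REST call, and Tiny Links are deliberately left unresolved. Each URL is tried against an ordered list of resolvers, which gives the if/else behaviour the branch lacks, and the results are merged into one de-duplicated list.

```python
import re
from typing import Callable, Iterable, List, Optional, Tuple

Resolver = Callable[[str], Optional[str]]


def resolve_page_ids(urls: Iterable[str], resolvers: List[Resolver]) -> Tuple[List[str], List[str]]:
    """Try each resolver in order per URL; return (unique page IDs, unresolved URLs)."""
    page_ids: List[str] = []
    unresolved: List[str] = []
    for url in urls:
        for resolver in resolvers:
            page_id = resolver(url)
            if page_id is not None:
                if page_id not in page_ids:
                    page_ids.append(page_id)
                break
        else:
            # No resolver recognised this URL (e.g. a Tiny Link or an external site).
            unresolved.append(url)
    return page_ids, unresolved


def embedded_id(url: str) -> Optional[str]:
    """Page ID embedded in a /wiki/spaces/... URL (optional 'pages/' is an assumption)."""
    m = re.search(r"/wiki/spaces/[^/]+/(?:pages/)?(\d+)(?:/|$)", url)
    return m.group(1) if m else None


# Stand-in for the space key + title lookup; a real service would call the REST API here.
FAKE_TITLE_INDEX = {("ENG", "Old+Page"): "222"}


def display_lookup(url: str) -> Optional[str]:
    """Look up a /wiki/display/<SPACEKEY>/<TITLE> link in the stand-in index."""
    m = re.search(r"/wiki/display/([^/]+)/([^/?#]+)", url)
    return FAKE_TITLE_INDEX.get(m.groups()) if m else None


ids, leftovers = resolve_page_ids(
    [
        "/wiki/spaces/ENG/pages/111/New",
        "https://YOURSITE/wiki/display/ENG/Old+Page",
        "https://YOURSITE/wiki/x/AbCd",
        "/wiki/spaces/ENG/pages/111/New",
    ],
    [embedded_id, display_lookup],
)
print(ids, leftovers)
```

A Tiny Link decoder could later be added as a third resolver without touching the loop, and the returned ID list could be dropped straight into the ancestor in (...) clause.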