Outgoing links via Python API

Andreas Eckhardt July 5, 2024

Hi,
behind the three dots and Page Information in the Confluence GUI I can see a list of outgoing links. I'd like to retrieve this list via a script.
To check for "forbidden" links (e.g. links to removed servers or disallowed internet pages), I'd like to retrieve all outgoing links for all pages of a space via Python (ideally with the Atlassian Python API).
Is there something more efficient than HTML-parsing the Page Information page?

1 answer

Answer accepted
Andreas Eckhardt July 10, 2024

After searching for a while, I found a solution by updating the Atlassian Python API, which now has a function for matching a regular expression against a page:

regex = r'<a\s+href=[\'"]([^\'"]+)[\'"]|<ri:page([^\/]+)\/>' # Regex pattern for link elements
output = self.cmiatlapicaller.scrap_regex_from_page(pageid, regex)
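
For anyone who wants to see what the pattern captures without a Confluence connection, here is a minimal sketch that applies the same regex with Python's `re` module to a storage-format string. The `extract_links` helper, the `ATTR_RE` pattern, and the sample body are my own illustration and not part of the Atlassian Python API. I'm also assuming the `<ri:page>` attributes look like `ri:space-key` and `ri:content-title`.

```python
import re

# Same pattern as above: group 1 captures an href value, group 2 captures
# the raw attribute text of a self-closing <ri:page .../> element.
LINK_RE = re.compile(r'<a\s+href=[\'"]([^\'"]+)[\'"]|<ri:page([^\/]+)\/>')

# Hypothetical helper pattern: pulls space key and title out of group 2.
ATTR_RE = re.compile(r'ri:(content-title|space-key)="([^"]*)"')


def extract_links(storage_html):
    """Return (urls, pages) found in a storage-format page body."""
    urls, pages = [], []
    for href, page_attrs in LINK_RE.findall(storage_html):
        if href:
            # Plain <a href="..."> link
            urls.append(href)
        else:
            # Internal page reference, kept as a dict of its attributes
            pages.append(dict(ATTR_RE.findall(page_attrs)))
    return urls, pages


# Made-up sample body, not taken from a real page.
sample = (
    '<p><a href="https://example.com/docs">docs</a> '
    '<ac:link><ri:page ri:space-key="DEV" ri:content-title="Setup" /></ac:link></p>'
)
urls, pages = extract_links(sample)
print(urls)
print(pages)
```

One limitation of the pattern as written: a page title containing a `/` stops the `[^\/]+` part from matching, so such `<ri:page>` elements would be skipped.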



Darryl Lee
Community Leader
December 21, 2024

Hey thanks for your regex, @Andreas Eckhardt.

I was trying to find links on a page using Automation, not Python, so the syntax is slightly different, but the concepts are the same.

One thing I discovered that might be helpful to you or others is that I did not want to have to construct links out of the <ri:page ... > elements (which include page space and title), so I used body-format=view for the pages endpoint.

So then I make a Web Request in Automation to the API thusly:

https://YOURSITE.atlassian.net/wiki/api/v2/pages/{{page.id}}?body-format=view

This gives me the full HTML output for a page, from which I was then able to pull out the list of links with the match function, using a bit of your regex:

{{webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}
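
For the Python side of the original question, the same idea could be sketched roughly as follows. It assumes you already have the rendered HTML string (the `body.view.value` field of the response) in hand. The `find_forbidden_links` helper, the host list, and the sample HTML are hypothetical. The href pattern is a slightly broader variant of the one above, so that it also matches anchors with other attributes before `href`.

```python
import re
from urllib.parse import urlparse

# Broader variant of the href pattern: allows attributes before href.
HREF_RE = re.compile(r'<a\s[^>]*?href="([^"]+)"')


def find_forbidden_links(view_html, forbidden_hosts):
    """Return hrefs in rendered HTML whose host is in forbidden_hosts."""
    hits = []
    for href in HREF_RE.findall(view_html):
        host = urlparse(href).hostname  # None for relative links
        if host and host in forbidden_hosts:
            hits.append(href)
    return hits


# Made-up rendered body and host list, for illustration only.
view_html = (
    '<p><a href="https://old-server.example.com/wiki/x">old</a> '
    '<a class="external-link" href="https://atlassian.com">ok</a> '
    '<a href="/wiki/spaces/DEV/pages/1">internal</a></p>'
)
bad = find_forbidden_links(view_html, {"old-server.example.com"})
print(bad)
```

Comparing parsed hostnames rather than searching the raw URL string avoids false hits when a forbidden name merely appears in a path or query string.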

 

Deployment type: Cloud
Product plan: Standard