Hi,
In the Confluence GUI, under the three dots and then Page Information, I can see a list of outgoing links. I'd like to retrieve this list via a script.
To check for "forbidden" links (e.g. links to a removed server, disallowed internet pages, ...), I'd like to retrieve all outgoing links for every page in a space using Python (ideally the atlassian python api).
Is there something more efficient than HTML-parsing the Page Information page?
After a while of searching I found the answer by updating the atlassian python api, which now has a function to use regular expressions:
Hey thanks for your regex, @Andreas Eckhardt.
I was trying to find links on a page using Automation, not Python, so the syntax is slightly different, but the concepts are the same.
One thing I discovered that might be helpful to you or others is that I did not want to have to construct links out of the <ri:page ... > elements (which include page space and title), so I used body-format=view for the pages endpoint.
So then I make a Web Request in Automation to the API thusly:
https://YOURSITE.atlassian.net/wiki/api/v2/pages/{{page.id}}?body-format=view
This gives me the full HTML output for a page, where I was then able to use the match operator to split out the list of links, where I use a bit of your regex:
{{webResponse.body.body.view.value.match("<a\\shref=\\\"([^\\\"]+)\\\"")}}
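For anyone who wants the same idea in Python rather than Automation, here is a minimal sketch. It assumes a Confluence Cloud site, basic auth with an email address and API token, and the same `body-format=view` endpoint as above. The function names, the `forbidden_hosts` set, and the slightly looser regex (it allows other attributes before `href`) are my own illustration and not part of the atlassian python api.

```python
import base64
import json
import re
import urllib.request
from urllib.parse import urlparse

# Looser variant of the pattern above: tolerates attributes before href.
LINK_PATTERN = re.compile(r'<a\s[^>]*?href="([^"]+)"')


def fetch_view_html(site, page_id, email, api_token):
    """Fetch the rendered HTML of one page (assumes Cloud + API token auth)."""
    url = f"https://{site}/wiki/api/v2/pages/{page_id}?body-format=view"
    credentials = base64.b64encode(f"{email}:{api_token}".encode()).decode()
    request = urllib.request.Request(
        url, headers={"Authorization": f"Basic {credentials}"}
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Same path as webResponse.body.body.view.value in the Automation rule.
    return payload["body"]["view"]["value"]


def extract_links(view_html):
    """Return every href value found in the rendered page HTML."""
    return LINK_PATTERN.findall(view_html)


def find_forbidden(links, forbidden_hosts):
    """Return the links whose host is in forbidden_hosts (hypothetical blocklist)."""
    return [link for link in links if urlparse(link).hostname in forbidden_hosts]
```

Usage would be something like `find_forbidden(extract_links(fetch_view_html("YOURSITE.atlassian.net", page_id, email, token)), {"old-server.example.com"})`. Relative links have no host, so they are never reported as forbidden. A regex is fine for a quick check, but a real HTML parser would be more robust against unusual markup.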