Extracting content from Confluence Page

Patrick December 3, 2023

Hi,

I am looking to extract information from a specific section within a Confluence page. I am having trouble with which Smart Values to utilize.

For example, I have an automation that triggers, and when it does, a Slack message is sent with the page title using `{{page.title}}`. I have another section/heading within the page called Minutes and was wondering if there's a way to extract the information.

I created an excerpt and added the content in there, but I'm not sure how to query that information or if there's another way to do so.

Any assistance would be greatly appreciated!

Answer accepted
Darryl Lee
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
December 3, 2023

Hi @Patrick - Unfortunately Atlassian has not seen fit to give us access to the content of Confluence pages in Automation.

You could use Automation's Send web request action to call the Confluence REST API, which returns the content of the page as a smart value.

You could then use the various smart values text functions to search for your section and grab the content therein.

The specific endpoint you'd want to hit is something like:

https://your-domain.atlassian.net/wiki/rest/api/content/3965072?expand=body.storage
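For anyone who wants to try the same request outside of Automation first, here is a minimal Python sketch that only builds the URL and an Authorization header. The helper names are made up for illustration, and the Basic auth scheme (account email plus API token) is an assumption about a typical Confluence Cloud setup, not something stated in this thread.

```python
import base64
from urllib.parse import urlencode


def content_url(base_url: str, page_id: str, expand: str = "body.storage") -> str:
    """Build the content URL quoted above (hypothetical helper)."""
    query = urlencode({"expand": expand})
    return f"{base_url.rstrip('/')}/wiki/rest/api/content/{page_id}?{query}"


def basic_auth_header(email: str, api_token: str) -> dict:
    """Assumption: the site accepts Basic auth built from email and API token."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


url = content_url("https://your-domain.atlassian.net", "3965072")
print(url)
```

The same URL and header are what you would paste into the Send web request action; no Custom Data is involved for a plain GET.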

But yeah, it's a wee bit of work. There's a good tutorial on this here:

It says it's for Jira, but it should work for Confluence as well.

The page data ought to be accessible via this smart value:

{{webResponse.body.storage.value}}

So you could find stuff by looking for the excerpt like here:

<ac:structured-macro ac:name=\"excerpt\" 

Or looking for text between Headings: 

<ac:rich-text-body><h1>Minutes</h1><p>Here are the minutes</p></ac:rich-text-body></ac:structured-macro><h1>Not Minutes</h1>

But yeah, it's a little tricky. If you're up for the challenge, give it a shot, and let us know if you run into any problems. :-}

Patrick December 4, 2023

ANOTHER UPDATE:

I feel like I'm back in my software engineering days, having looked at a JSON file for so long. It turns out that the value is nested in another body. So, I finally got my value but had to use {{webResponse.body.body.storage.value}}.

Now the only thing I need to do is format it because the message is being sent as HTML text.
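To show why the extra "body" is needed, here is a small Python sketch with a trimmed, made-up response. The idea is that {{webResponse.body}} is the whole JSON document returned by the API, and that document has its own "body" key; the sample title and text are invented.

```python
import json

# Trimmed, made-up example of the JSON returned with ?expand=body.storage
response_text = (
    '{"id": "3965072", "title": "Team sync", '
    '"body": {"storage": {"value": "<h1>Minutes</h1><p>Here are the minutes</p>", '
    '"representation": "storage"}}}'
)

# Roughly what {{webResponse.body}} refers to: the parsed response document
page = json.loads(response_text)

# The page's own "body" key is the second hop, hence body.body.storage.value
storage_html = page["body"]["storage"]["value"]
print(storage_html)
```

Going straight to "storage" from the top level finds nothing, which matches the empty audit log seen with {{webResponse.body.storage.value}}.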

 

UPDATE:

So it looks like I'm now able to get what I need, but only using {{webResponse.body}}. If I go any deeper, {{webResponse.body.storage.value}} for example, the audit log just shows an empty log.

What I'm really hoping for is to get that value and use it in the Send Slack message action. I feel like I'm close!

 

Hi @Darryl Lee

Thank you so much for this! Following your instructions along with the other link you provided, I think I'm on the right track. Within {{body.storage.value}} I see the excerpt values, which are what I need. I'm a little lost on what to provide for the Custom Data portion of the automation rule.

Darryl Lee
Community Leader
December 4, 2023

Hey yeah, you don't need Custom Data unless you are making a web request to update something with the REST API. So you can leave that empty.

I was able to parse this on a page:

Minutes

These are the minutes of the meeting

Not Minutes

This is the next section, which is not minutes.

Using this code:

{{webResponse.body.body.storage.value.match(".*<h1>Minutes</h1>(.+?)<h1>.*")}}

What this means is:

"Match any characters (or none at all) before <h1>Minutes</h1>, then capture the text that follows it, stopping at the next <h1>"

So, assuming the minutes of your meeting fall between a Heading 1 of "Minutes" and some other Heading 1 text, that should capture it.
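If it helps to experiment outside of Automation, the same idea can be sketched in Python. This is only an approximation: Automation uses Java-flavored regex, the helper name is made up, and the sample HTML is a guess at what the page above looks like in storage format.

```python
import re

# Made-up storage-format HTML for the example page described above
html = (
    "<h1>Minutes</h1><p>These are the minutes of the meeting</p>"
    "<h1>Not Minutes</h1><p>This is the next section, which is not minutes.</p>"
)


def section_after_heading(html: str, heading: str):
    """Return the text between <h1>heading</h1> and the next <h1>, or None."""
    pattern = r"<h1>" + re.escape(heading) + r"</h1>(.+?)<h1>"
    found = re.search(pattern, html)
    return found.group(1) if found else None


print(section_after_heading(html, "Minutes"))
```

Note that the pattern needs a following <h1> to stop at, so a section that is the last one on the page would come back empty with this approach.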

BTW, unless you're using it for something else, I would skip using an Excerpt, because then you get rid of all of the "<ac:structured-macro..." stuff too.

ANYWAYS, assuming the Slack action can accept HTML, you could put the code above into the Message section of the "Send Slack message" action.

Oh dang, Slack probably can't deal with HTML, so you'll end up with paragraph tags (<p>), and possibly other formatting you don't want.

Ugh, stripping HTML can be a pain, but yeah, you could tack on some replaceAll commands, like:

{{webResponse.body.body.storage.value.match(".*<h1>Minutes</h1>(.+?)<h1>.*").replaceAll("</p>","\n").replaceAll("</*.+?>","")}}

That replaces the closing paragraph tag with a newline, and then strips out every other tag. It looks like Slack supports mrkdwn syntax, so if you wanted to get fancy, you could maybe add these to support some formatting:

.replaceAll("</*strong>","*")

.replaceAll("</*em>","_")

.replaceAll("</*del>","~")

.replaceAll("<li>","- ")

Yeah, that seems to have worked:

{{webResponse.body.body.storage.value.match(".*<h1>Minutes</h1>(.+?)<h1>.*").replaceAll("</p>","\n").replaceAll("</*strong>","*").replaceAll("</*em>","_").replaceAll("</*del>","~").replaceAll("<li>","- ").replaceAll("</*.+?>","")}}

It's gnarly though, and parsing HTML with Regex is dangerous territory. :-}
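For testing the replacement chain offline, here is a rough Python equivalent. It mirrors the order of the replaceAll calls above with the same patterns; the function name and the sample input are made up, and Python's regex engine is not identical to the Java one Automation uses.

```python
import re


def html_to_mrkdwn(html: str) -> str:
    """Apply the same replacements, in the same order, as the smart value above."""
    text = html.replace("</p>", "\n")        # closing paragraph -> newline
    text = re.sub(r"</*strong>", "*", text)  # bold
    text = re.sub(r"</*em>", "_", text)      # italic
    text = re.sub(r"</*del>", "~", text)     # strikethrough
    text = text.replace("<li>", "- ")        # list item marker
    return re.sub(r"</*.+?>", "", text)      # strip every remaining tag


print(html_to_mrkdwn("<p>Here are <strong>the</strong> <em>minutes</em></p><p>Done</p>"))
```

One thing this makes visible: closing </li> tags are stripped without adding a newline, so consecutive list items end up on one line. An extra replacement of "</li>" with a newline before the final strip might help, though that is untested in Automation.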

Patrick December 5, 2023

Thank you so much for this @Darryl Lee ! 

Daniel Blomqvist
Contributor
December 23, 2024

Hi @Patrick & @Darryl Lee 

I’ve followed this thread and managed to get the HTML into the audit log using the log action and this smart value logic, with my own headline in place of "Minutes".

{{webResponse.body.body.storage.value.match(".*<h1>Minutes</h1>(.+?)<h1>.*")}}

I just want to use the content to populate a text field and I was trying to do it in the Description field and as a comment but it doesn’t work.

When I add the smart value in the description it just says that the issue was successfully edited but nothing is added.

When I add the smart value in the comment the log says that the comment can’t be added empty.

I also tried adding this smart value to a variable and then use the variable smart value but with the same result.

Can you clarify how you get the HTML content populated in a field?

And maybe share an example of your automations?

Darryl Lee
Community Leader
December 23, 2024

Hi @Daniel Blomqvist -

You mention a Description field. So are you trying to capture text from a given Confluence page and include it in the description of a Jira ticket?

Interesting!

First off, in your Web Request to the Confluence API, you should definitely be checking this box:

[x] Delay execution of subsequent rule actions until we've received a response for this web request

Because you definitely don't want to try to use a value from the response before it has been fetched.

To see if you are in fact getting the data you expect, I would recommend using the audit logs to debug:

After your web request, I would try adding audit logs like:

Headline: {{webResponse.body.body.storage.value.match(".*<h1>Minutes</h1>(.+?)<h1>.*")}}

And if that comes up empty, you should examine the full dump of the HTML to see if there are any extra/different characters besides the <h1>Minutes... part that would cause the match to fail. You could log that thusly:

Full HTML: {{webResponse.body.body.storage.value}}
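Once you have the full HTML copied out of the audit log, a quick way to see what actually surrounds the heading is to print a slice around it. This is a small Python sketch; the helper name is made up, and the sample HTML (a heading carrying a style attribute) is purely an invented example of the kind of difference that would make the literal <h1>Minutes</h1> match fail.

```python
def context_around(html: str, needle: str, width: int = 40):
    """Return the text surrounding the first occurrence of needle, or None."""
    start = html.find(needle)
    if start == -1:
        return None
    return html[max(0, start - width): start + len(needle) + width]


# Invented sample: the heading text is present, but the tag has an attribute
sample = '<p>Intro</p><h1 style="text-align: center;">Minutes</h1><p>Notes</p>'

print("<h1>Minutes</h1>" in sample)           # the literal the regex expects
print(context_around(sample, "Minutes", 5))   # what is really around the text
```

If the first print shows False while the second shows the heading text, the pattern needs adjusting to whatever the slice reveals.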

Good luck - let us know how it goes!

Daniel Blomqvist
Contributor
December 23, 2024

Hi @Darryl Lee and thank you for the quick reply.

  1. Yes, I have "Delay execution..." ticked.
  2. Using the smart value {{webResponse.body.body.storage.value.match(".*<h1>NA/ANZ EXPERIENCE</h1>(.+?)<h1>.*")}} I did manage to get the content into the audit log and added to the issue, but only when it's a short text/table. [Screenshots: web request 1.png, web request 2.png]
    1. But when I put the content I need on the Confluence page (a table with 18 rows, action items, links, info boxes (callouts)), it returns empty. [Screenshot: web request 4.png]
    2. Is there a limit on how much the web response can "GET"?
  3. When it gets the small test content and adds it to the issue, it's added as raw HTML, which doesn't render the way the content should look. [Screenshot: web request 3.png]
    1. Is there a solution to make it render the same way as it does on the Confluence page?

I'm grateful for your help.
Merry Christmas and happy holidays!

Darryl Lee
Community Leader
December 23, 2024

Ah, I feel like this should be its own question @Daniel Blomqvist !

The problem you're running into is that Automation doesn't allow HTML in the Edit Description field - it's expecting Markdown.

If you really wanted to update the Description with rich content from Confluence (like tables), MAYBE you could use Advanced Edit with JSON to send the content as ADF (Atlassian Doc Format). There's a great example of this here:

"But wait, my content is in HTML," you might be saying!

Yes, but ... if you use the new v2 API's get page by ID endpoint, you can give it the body-format=atlas_doc_format parameter. Now this is a lot more complex than HTML, but in theory you should be able to extract just the table you need, and then use Advanced Edit with JSON to update Description.

But uh, yeah, it's a bit more work, and you'll probably be messing with match quite a bit, so I highly recommend testing with regex101.com, using their Java 8 flavor.
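To give a feel for what the ADF route involves, here is a hedged Python sketch that walks an ADF document and collects the nodes under a given heading. Automation cannot run Python, so this only illustrates the structure you would be matching against. It assumes the v2 response carries the ADF as a JSON string under body.atlas_doc_format.value, and the sample document and helper names are made up.

```python
import json


def heading_text(node: dict) -> str:
    """Join the text fragments inside a heading node."""
    return "".join(child.get("text", "") for child in node.get("content", []))


def nodes_under_heading(adf: dict, title: str) -> list:
    """Collect top-level nodes after the heading `title`, up to the next heading of any level."""
    collected, inside = [], False
    for node in adf.get("content", []):
        if node.get("type") == "heading":
            if inside:
                break  # the next heading ends the section
            inside = heading_text(node) == title
            continue
        if inside:
            collected.append(node)
    return collected


# Made-up, heavily trimmed ADF, serialized the way the value is assumed to arrive
sample_value = json.dumps({
    "type": "doc", "version": 1,
    "content": [
        {"type": "heading", "attrs": {"level": 1}, "content": [{"type": "text", "text": "Minutes"}]},
        {"type": "paragraph", "content": [{"type": "text", "text": "Here are the minutes"}]},
        {"type": "table", "content": []},
        {"type": "heading", "attrs": {"level": 1}, "content": [{"type": "text", "text": "Not Minutes"}]},
        {"type": "paragraph", "content": [{"type": "text", "text": "Something else"}]},
    ],
})

section = nodes_under_heading(json.loads(sample_value), "Minutes")
print([node["type"] for node in section])
```

In principle the collected nodes could be wrapped in a new {"type": "doc", "version": 1, "content": [...]} object for the Description, but that step is not verified here.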

Daniel Blomqvist
Contributor
December 24, 2024

Hi @Darryl Lee 

Thanks again for the answer.

Do you want me to open a new question? (because to me it fits perfectly to the question title "Extracting content from Confluence Page")

With the new v2 API's get page by ID endpoint, and the body-format=atlas_doc_format parameter I do get the entire content that I need but I can't manage to get the match function to return anything at all.

I've been trying for half the day and I tried:

  • Almost a hundred different expressions
  • Using regex101.com 
  • Using ChatGPT to create a regular expression for me
  • Moving the confluence content to an excerpt to get it on a separate page so that I don't need to be that specific in my match and could extract the entire content but also that didn't work because the 

I think maybe this is above my level of expertise, and I can't put much more time into trying to get this to work.

Too bad that Jira doesn't have an automation action "get excerpt from confluence" or similar.

Merry Christmas and Happy Holidays again.
