Confluence CLI - find pages which contain specific regex

Hello,

In our space there are some special characters that are not displayed properly, because we uploaded lots of HTML files via the REST API and had some issues with encoding.

Looking at the HTML markup, I could see that Confluence saves these characters like this:

&#XXX;

XXX can be any number, such as 133 or 128. In the editor the characters look like this: …

Now I want to find all pages containing these special characters to fix them manually. I tried to do it with the Confluence CLI as described here, but it doesn't work as expected. I guess one reason could be the special characters I use in the regex (.*&#.*;.*):

--action getPageList --space SPACE --regex2 ".*&#.*;.*"

Do you have any idea why it isn't working or any other suggestions to solve my problem?

Thank you in advance and best regards,

Nils

2 answers

1 accepted

1 vote

Answer accepted

Hi @Nils Leger ,

To find pages whose content contains "&#" using the --regex2 parameter, you need to know how &# is represented in the storage format and then use that value in the action.

The action below shows this for content containing &#:

--action getPageList --space SU --regex2 ".*&amp;#.*" --debug

Please see the "How to Get Confluence Storage Format" page; the screenshot below shows the storage format of a page.
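To illustrate why the escaped form matters, here is a minimal Python sketch. It assumes that a page's storage format writes a literal "&" as "&amp;", as in the action above, and that the page bodies are available as plain strings. The function name and the sample pages are made up for the example, and Python's regex engine stands in for whatever the CLI uses internally.

```python
import re

# Assumption: in storage format a literal "&" is written as "&amp;",
# so a broken reference such as &#133; shows up as "&amp;#133;".
ESCAPED_REF = re.compile(r"&amp;#\d+;")


def has_escaped_entity(storage_body: str) -> bool:
    """Return True if the storage-format text contains an escaped numeric reference."""
    return ESCAPED_REF.search(storage_body) is not None


# Hypothetical page bodies, for illustration only.
samples = {
    "Broken page": "<p>Wait&amp;#133; what?</p>",
    "Clean page": "<p>Tom &amp; Jerry</p>",
}

for title, body in samples.items():
    print(title, has_escaped_entity(body))
```

A pattern written with a bare "&#" would not match the first sample, because the ampersand is stored as "&amp;".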

We have opened a support request in our support portal, https://bobswift.atlassian.net/servicedesk/customer/portal/1/SUPPORT-3008, and have made you the reporter. Please let us know if you have any questions.

Regards,
Kishore Kumar Gangavath.

0 votes

Hi @Nils Leger - This regex may work better to find the pages with unwanted HTML entities: "&#\d{3};" (without the quotes).
That will find every three-digit entity; use "&#\d+;" instead to match any number of digits. However, you may only want to find certain HTML entities. In that case you may want to list them as alternatives.
For example, to filter for only &#1; or &#22; or &#31;, you would use "&#(1|22|31);" (again without the quotes). Note the parentheses: square brackets would define a character class that matches a single character, so "&#[1|22|31];" would not match &#22;.

To avoid manually fixing the pages, you could run the storePage action on the list of pages to replace those entities with a space, like so:

--action runFromPageList --space ASPACEKEY --regex2 "&#(1|22|31);" --common "-a storePage --id @pageId@ --findReplaceRegex \"&#(1|22|31);: \" "
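When listing specific entity numbers, the choice between square brackets and parentheses changes what is matched. The short Python sketch below compares the two forms; it is only an illustration, on the assumption that the CLI's regex engine treats character classes and groups the same way Python does for these simple patterns.

```python
import re

# Square brackets: a character class, matching ONE character out of 1, |, 2, 3.
char_class = re.compile(r"&#[1|22|31];")
# Parentheses: a group with alternatives, matching 1, 22, or 31 as a whole.
group = re.compile(r"&#(1|22|31);")

for entity in ("&#1;", "&#22;", "&#31;", "&#3;"):
    print(
        entity,
        "class:", bool(char_class.fullmatch(entity)),
        "group:", bool(group.fullmatch(entity)),
    )
```

The character class accepts &#1; and &#3; but rejects &#22; and &#31;, while the group accepts exactly the three listed numbers.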

Be sure to test carefully! Maybe make a test space with copies of some of the problem pages to test on before you do this for real.
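If replacing with a space would lose information, another option is to restore the character each reference was probably meant to be. The numbers mentioned in the question (128 and 133) fall in the range 128 to 159, where Windows-1252 places punctuation such as the ellipsis, and Python's html.unescape applies that mapping. The sketch below is an assumption-laden illustration, not a CLI feature: fix_entities is a made-up name, it expects page bodies as plain strings, and it leaves every reference outside 128 to 159 untouched so that markup-significant characters are not introduced.

```python
import html
import re

# Matches a numeric reference in raw form (&#133;) or escaped form (&amp;#133;).
NUMERIC_REF = re.compile(r"&(?:amp;)?#(\d+);")


def fix_entities(text: str) -> str:
    """Replace references numbered 128-159 with their Windows-1252 character."""

    def replace(match: re.Match) -> str:
        number = int(match.group(1))
        if 128 <= number <= 159:
            # html.unescape maps this range the way browsers do (Windows-1252).
            return html.unescape(f"&#{number};")
        return match.group(0)  # leave everything else exactly as it was

    return NUMERIC_REF.sub(replace, text)


print(fix_entities("Wait&amp;#133; that costs &#128;5"))
```

As with the storePage approach, try this on copies of the problem pages first.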
