Confluence CLI - find pages which contain specific regex

Hello,

In our space, some special characters are not displayed properly because we uploaded lots of HTML files via the REST API and had some encoding issues.

Looking at the HTML-Markup, I could see that Confluence saves these characters like that:

&#XXX;

XXX can be any number, such as 133 or 128. In the editor, the characters look like this: …

 

Now I want to find all pages containing these special characters so I can fix them manually. I tried to do it with the Confluence CLI as described here, but it doesn't work as expected. I guess one reason could be the special characters I use in the regex (.*&#.*;.*):

--action getPageList --space SPACE --regex2 ".*&#.*;.*"

Do you have any idea why it isn't working or any other suggestions to solve my problem?

 

Thank you in advance and best regards,

Nils

2 answers

1 accepted

Hi @Nils_Leger ,

To find pages whose content contains "&#" using the regex2 parameter, you need to know how &# is represented in the storage format and then use that value in the action.

Please see the following action for reference when the content contains &#:

--action getPageList --space SU --regex2 ".*&#.*" --debug

Please go through the How to Get Confluence Storage Format page and see the screenshot below, which shows the storage format of a page.
Snag_c50e82a.png
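As a rough illustration of checking the storage format outside the UI, here is a minimal Python sketch. It assumes the Confluence REST content resource with `expand=body.storage`; the base URL, the page ID, and the sample response are made up for the example, and the base URL must include any context path your instance uses. Whether the entity is stored as `&#133;` or in an escaped form such as `&amp;#133;` is something to verify on a real page, so the pattern accepts both.

```python
import json
import re
from urllib.parse import quote

# Matches decimal character references, either raw (&#133;) or
# escaped (&amp;#133;). Which form your pages use is an assumption to check.
ENTITY = re.compile(r"&(?:amp;)?#\d+;")


def storage_url(base_url, page_id):
    """Build the REST URL that returns a page including its storage-format body."""
    return f"{base_url.rstrip('/')}/rest/api/content/{quote(str(page_id))}?expand=body.storage"


def entities_in_response(response_text):
    """Return the numeric character references found in a content response."""
    body = json.loads(response_text)["body"]["storage"]["value"]
    return ENTITY.findall(body)


# Offline example with a hand-written (hypothetical) response body.
sample_response = json.dumps(
    {"id": "12345", "body": {"storage": {"value": "<p>Wait&#133; 5 &amp;#128;</p>"}}}
)
print(storage_url("https://confluence.example.com/", 12345))
print(entities_in_response(sample_response))
```

Once you know which form appears in the storage format, you can use that exact form in the --regex2 value.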

We have opened a support request in our support portal, https://bobswift.atlassian.net/servicedesk/customer/portal/1/SUPPORT-3008, and added you as the reporter. Please let us know if you have any questions.

Regards,
Kishore Kumar Gangavath.

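Hi @Nils_Leger - This regex may work better to find the pages with unwanted HTML entities: "&#\d{3};" (without the quotes).
That will find all three-digit entities; use "&#\d+;" if the number can have any length. However, you may only want to find certain HTML entities. In that case you may want to try sets of entities.
For example, to filter for only &#1; or &#22; or &#31;, you would use "&#(1|22|31);" (again without the quotes). Note the parentheses: square brackets would define a character class that matches a single character, not a list of alternatives.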
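Before running a pattern against a whole space, it can help to try it on a small sample. The sketch below uses Python's `re` module purely for illustration (the CLI's own regex engine may differ in details), and the sample string is invented. It compares a character class like `[1|22|31]`, which matches exactly one of the characters `1`, `2`, `3`, or `|`, with a grouped alternation.

```python
import re

# Hypothetical snippet of page content with several numeric character references.
sample = "a&#1;b&#22;c&#31;d&#133;e&#2;f"

char_class = re.compile(r"&#[1|22|31];")     # ONE character out of 1, 2, 3, |
alternation = re.compile(r"&#(?:1|22|31);")  # exactly 1, 22, or 31
any_entity = re.compile(r"&#\d+;")           # any decimal reference

print(char_class.findall(sample))   # ['&#1;', '&#2;']
print(alternation.findall(sample))  # ['&#1;', '&#22;', '&#31;']
print(any_entity.findall(sample))   # ['&#1;', '&#22;', '&#31;', '&#133;', '&#2;']
```

The character class misses &#22; and &#31; and wrongly picks up &#2;, which is why the grouped form is the safer choice for a set of entities.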

To avoid manually fixing the pages you could use the storePage action on the list of pages to replace those entities with a space like so:

--action runFromPageList --space ASPACEKEY --regex2 "&#(1|22|31);" --common "-a storePage --id @pageId@ --findReplaceRegex \"&#(1|22|31);: \""

Be sure to test carefully!  Maybe make a test space with copies of some of the problem pages to test on before you do this for real.
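One cheap way to test is a dry run on a local copy of a page's content before touching Confluence. The following is only a sketch in Python, not part of the CLI: the function name `replace_entities` and the sample markup are made up, and the entity set (1, 22, 31) is just the example from above.

```python
import re

# Same example set of entities as above: &#1;, &#22;, &#31;
ENTITY = re.compile(r"&#(?:1|22|31);")


def replace_entities(storage_text, replacement=" "):
    """Return (cleaned_text, number_of_replacements) for one page body."""
    return ENTITY.subn(replacement, storage_text)


# Hypothetical page body; &#133; is left alone because it is not in the set.
cleaned, count = replace_entities("<p>foo&#22;bar&#133;baz&#1;</p>")
print(cleaned)  # <p>foo bar&#133;baz </p>
print(count)    # 2
```

Comparing the count and the cleaned text with what you expect for a few problem pages should show quickly whether the pattern is too broad or too narrow.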
