Hello,
In our space, some special characters are not displayed correctly because we uploaded lots of HTML files via the REST API and had some issues with encoding.
Looking at the HTML markup, I could see that Confluence saves these characters like this:
&#XXX;
XXX can be any number, like 133 or 128. In the editor, the characters look like this: …
Now I want to find all pages containing these special characters so I can fix them manually. I tried to do this with the Confluence CLI as described here, but it doesn't work as expected. I suspect the reason could be the special characters I use in the regex (.*&#.*;.*):
--action getPageList --space SPACE --regex2 ".*&#.*;.*"
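One possible explanation, offered as an assumption because the thread does not say how the CLI applies --regex2: if the pattern has to match the entire page content, `.` does not cross line breaks by default, so a pattern like `.*&#.*;.*` never matches multi-line storage format. The Python sketch below shows that effect on a made-up sample. If this is the cause, prefixing the pattern with the inline flag `(?s)` (supported by both Java and Python regex) might be worth trying, but that is untested against the CLI.

```python
import re

# A small, multi-line sample of storage-format content (made up for illustration)
storage = "<p>First line</p>\n<p>Wait&#133; here</p>\n<p>Last line</p>"

broad = ".*&#.*;.*"

# Whole-content match: '.' stops at '\n' unless DOTALL is set
full_default = re.fullmatch(broad, storage) is not None
full_dotall = re.fullmatch(broad, storage, re.DOTALL) is not None

# Searching for the entity anywhere needs no leading/trailing .*
found = re.search(r"&#\d+;", storage) is not None

print(full_default, full_dotall, found)  # False True True
```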
Do you have any idea why it isn't working, or any other suggestions for solving my problem?
Thank you in advance and best regards,
Nils
Hi @Nils Leger,
To find pages whose content contains "&#" using the regex2 parameter, you need to know how &# is represented in the storage format and then use that value in the action.
See the action below for reference when the content contains &#:
--action getPageList --space SU --regex2 ".*&#.*" --debug
Please go through the How to Get Confluence Storage Format page and see the screenshot below of a page's storage format.
We have opened a support request in our support portal, https://bobswift.atlassian.net/servicedesk/customer/portal/1/SUPPORT-3008, and have added you as a reporter. Please let us know if you have any questions.
Regards,
Kishore Kumar Gangavath.
Hi @Nils Leger, this regex may work better for finding the pages with unwanted HTML entities: "&#\d{3};" (without the quotes).
That will find all entities with exactly three digits, such as &#133; and &#128;; use "&#\d+;" to match numeric entities of any length. However, you may only want to find certain HTML entities. In that case, you can try a set of entities.
For example, to filter for only &#1;, &#22; or &#31;, you would use "&#(1|22|31);" (again without the quotes).
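Square brackets and parentheses behave very differently here: `[1|22|31]` is a character class that matches a single character from the set {1, |, 2, 3}, whereas `(1|22|31)` is an alternation that matches exactly 1, 22 or 31. The Python sketch below (made-up sample entities; `hits` is a hypothetical helper) compares these patterns along with the `\d{3}` and `\d+` variants. Java's regex engine treats these constructs the same way.

```python
import re

# Made-up sample entities for illustration
samples = ["&#1;", "&#22;", "&#31;", "&#2;", "&#133;"]

char_class = re.compile(r"&#[1|22|31];")  # ONE character from {1, |, 2, 3}
alternation = re.compile(r"&#(1|22|31);")  # exactly 1, 22 or 31
three_digits = re.compile(r"&#\d{3};")     # exactly three digits
any_digits = re.compile(r"&#\d+;")         # any number of digits


def hits(pattern: re.Pattern) -> list[str]:
    """Return the samples the pattern matches in full."""
    return [s for s in samples if pattern.fullmatch(s)]


print(hits(char_class))    # ['&#1;', '&#2;']
print(hits(alternation))   # ['&#1;', '&#22;', '&#31;']
print(hits(three_digits))  # ['&#133;']
print(hits(any_digits))    # ['&#1;', '&#22;', '&#31;', '&#2;', '&#133;']
```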
To avoid fixing the pages manually, you could use the storePage action on the list of pages to replace those entities with a space, like so:
--action runFromPageList --space ASPACEKEY --regex2 "&#(1|22|31);" --common "-a storePage --id @pageId@ --findReplaceRegex \"&#(1|22|31);: \" "
Be sure to test carefully! Consider making a test space with copies of some of the problem pages and trying it there before you do this for real.
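As one additional precaution, you can preview the replacement offline on a copy of a page's storage format before running it through the CLI. The Python sketch below assumes the CLI's find/replace behaves like an ordinary regex substitution (an assumption, not confirmed here) and uses the alternation `&#(1|22|31);`, which matches exactly those three entities, with a single space as the replacement. `preview` is a hypothetical helper and the sample content is made up.

```python
import re

# Same entities as in the storePage find/replace, replaced by a single space
PATTERN = re.compile(r"&#(1|22|31);")


def preview(storage: str) -> tuple[str, int]:
    """Return the rewritten content and how many entities would be replaced."""
    return PATTERN.subn(" ", storage)


# Made-up storage-format snippet for illustration
before = "<p>Name&#22;Value&#31;end&#133;</p>"
after, count = preview(before)
print(count)  # 2
print(after)  # <p>Name Value end&#133;</p>
```

Entities outside the set (here `&#133;`) are left untouched, which is a quick way to confirm the pattern is not broader than intended.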