Confluence CLI - find pages which contain specific regex

Hello,

In our space, some special characters are not displayed properly. We uploaded lots of HTML files via the REST API and had some encoding issues.

Looking at the HTML markup, I could see that Confluence saves these characters like this:

&#XXX;

XXX can be any number, such as 133 or 128. In the editor the characters look like this: …

 

Now I want to find all pages containing these special characters so that I can fix them manually. I tried to do this with the Confluence CLI as described here, but it doesn't work as expected. I suspect the cause is the special characters in my regex (.*&#.*;.*):

--action getPageList --space SPACE --regex2 ".*&#.*;.*"

Do you have any idea why it isn't working or any other suggestions to solve my problem?

 

Thank you in advance and best regards,

Nils

2 answers

Answer accepted
Deleted user Sep 20, 2019

Hi @Nils Leger ,

To find pages whose content contains "&#" with the regex2 parameter, you need to know how &# is represented in the storage format and then use that representation in the action.

See the action below for reference when the content contains &#:

--action getPageList --space SU --regex2 ".*&#.*" --debug

Please see the How to Get Confluence Storage Format page; the screenshot below shows the storage format of a page.
[Screenshot: Snag_c50e82a.png]
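
A quick local check can illustrate why the storage format matters. The sketch below assumes that the XHTML-based storage format escapes a literal ampersand in page text as `&amp;`, so text displayed as `&#133;` would be stored as `&amp;#133;`. That is an assumption to verify against the storage format of one of your own pages. The sample strings and names are hypothetical, and Python's `re` module only stands in for the regex engine the CLI uses.

```python
import re

# Hypothetical storage-format snippets; compare with your own page's storage format.
stored_escaped = "<p>Price list&amp;#133; see below</p>"  # ampersand escaped as &amp;
stored_raw = "<p>Price list&#133; see below</p>"          # ampersand kept as-is

# Pattern from the question, and a variant that expects the escaped ampersand.
pattern_raw = re.compile(r".*&#.*;.*")
pattern_escaped = re.compile(r".*&amp;#\d+;.*")

def matches(pattern, text):
    """Return True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

for name, text in [("escaped", stored_escaped), ("raw", stored_raw)]:
    print(name, matches(pattern_raw, text), matches(pattern_escaped, text))
```

If the ampersand is stored escaped, the pattern from the question cannot match because the characters `&` and `#` are no longer adjacent in the stored text.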

We have opened a support request in our support portal, https://bobswift.atlassian.net/servicedesk/customer/portal/1/SUPPORT-3008, and added you as the reporter. Please let us know if you have any questions.

Regards,
Kishore Kumar Gangavath.

Hi @Nils Leger - This regex may work better for finding pages with unwanted HTML entities: "&#\d{3};" (without the quotes).
That will find every entity with a three-digit code. However, you may only want to find certain HTML entities. In that case, you can list the codes as alternatives in a group.
For example, to filter for only &#1, &#22 or &#31, you would use "&#(1|22|31);" (again without the quotes). Note the parentheses: square brackets would define a character class that matches only a single character.

To avoid manually fixing the pages, you could use the storePage action on the list of pages to replace those entities with a space, like so:

--action runFromPageList --space ASPACEKEY --regex2 "&#(1|22|31);" --common "-a storePage --id @pageId@ --findReplaceRegex  \"&#(1|22|31);: \" "
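
When selecting specific entity codes, the difference between a bracket expression and a parenthesized group is easy to check locally. The following is a minimal sketch using Python's `re` module as a stand-in for the CLI's regex engine (an assumption; these simple constructs behave the same way in common regex dialects). The sample strings are made up for illustration.

```python
import re

# Square brackets define a character class: exactly ONE character out of 1, 2, 3 or |.
char_class = re.compile(r"&#[1|22|31];")
# Parentheses define a group of alternatives: the whole number 1, 22 or 31.
group = re.compile(r"&#(1|22|31);")

samples = ["&#1;", "&#22;", "&#31;", "&#3;", "&#|;", "&#133;"]

for s in samples:
    in_class = bool(char_class.fullmatch(s))
    in_group = bool(group.fullmatch(s))
    print(f"{s:8} class={in_class} group={in_group}")
```

The character class accepts `&#1;`, `&#3;` and `&#|;` but rejects `&#22;` and `&#31;`, while the group accepts exactly `&#1;`, `&#22;` and `&#31;`.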

Be sure to test carefully!  Maybe make a test space with copies of some of the problem pages to test on before you do this for real.
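
One way to test before touching real pages is a dry run on copied page content. The sketch below is not part of the CLI. It assumes you have pasted or exported the content of a few problem pages into a dictionary, and the page titles, contents and the `preview_replacement` helper are all hypothetical.

```python
import re

# Assumed pattern: numeric entities 1, 22 or 31; adjust to the entities you target.
ENTITY = re.compile(r"&#(1|22|31);")

def preview_replacement(pages):
    """Return {title: (count, new_text)} for pages where a replacement would occur."""
    changes = {}
    for title, body in pages.items():
        new_body, count = ENTITY.subn(" ", body)
        if count:
            changes[title] = (count, new_body)
    return changes

# Hypothetical copies of page content taken from a test space.
sample_pages = {
    "Release notes": "Fixed&#22;bugs and&#31;typos",
    "Home": "Nothing to fix here",
}

for title, (count, new_body) in preview_replacement(sample_pages).items():
    print(f"{title}: {count} replacement(s) -> {new_body!r}")
```

Reviewing the preview for a handful of pages shows whether the pattern catches too much or too little before the same find and replace is run through storePage.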
