Hello!
I am (still) migrating an old wiki to a Confluence system. At first it was just copying and pasting text, but before long I found myself using the CLI tool to change lots of articles at once. Beats adding the same things to 800 separate articles, right?
Anyway, I am heavily using findReplace and findReplaceRegex, and it all went well until a few hours ago, when I ran into a little problem that has left me a little confused. Actually, I'm stumped. And out of ideas.
Here's what's up:
I'm trying to generate links using modifyPage and findReplaceRegex. I need to generate links for about 500 attribute descriptions. The strings I need to match, which are the same as the titles of the articles I want them to link to, all have the same formatting: starting with a @, all capitals and possibly containing numbers and underscores. For instance, I have an attribute description article called @ASKORDER which has, in its content, @LOCAL and @INLEVEL as plain text. I need to change these 2 strings into links to the articles @LOCAL and @INLEVEL respectively.
I started doing something very similar to a question I posed here: finding strings and wrapping them in <span> tags. I started by creating this regular expression:
(?!@title@)(((@)+([A-Z0-9]*[_]*)+([\s+]?[\n]?)+))
This matches all strings that start with one or more '@', are all capitals, possibly contain numbers and underscores, and are possibly followed by a space or newline. The whole thing is preceded by a negative lookahead which checks that the matched string is not the title of the article; I don't want to link the string @ASKORDER in the article @ASKORDER to itself :-)
Don't spend time looking at this regex to find the problem, it matches exactly what I want. Next I replace each match with itself wrapped in <a> tags:
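For readers who want to see what the lookahead does, here is a minimal Python sketch. It is an illustration, not the expression above: the pattern is a simplified stand-in (it drops the trailing-whitespace group), and the function name find_attribute_names is made up for the example.

```python
import re


def find_attribute_names(source, title):
    """Return every @NAME in source, skipping the page's own title.

    Simplified assumption: a name is '@' followed by capitals,
    digits or underscores. This is not the original expression.
    """
    # Negative lookahead: refuse a match that is exactly the title.
    # The inner lookahead keeps longer names such as @ASKORDER_2 matchable.
    pattern = re.compile(
        r"(?!" + re.escape(title) + r"(?![A-Z0-9_]))@[A-Z0-9_]+"
    )
    return pattern.findall(source)


if __name__ == "__main__":
    text = "@ASKORDER uses @LOCAL and @INLEVEL, not @ASKORDER_2."
    print(find_attribute_names(text, "@ASKORDER"))
```

The inner lookahead is a design choice: without it, a title like @ASKORDER would also block any longer attribute name that merely starts with @ASKORDER.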
<a href=/display/QUAESTOR/$0>$0</a>
This is then let loose on an attribute description article to see what happens. After completion it seemed to work like a charm: the attribute names @LOCAL and @INLEVEL were replaced by working links. Seemed I had just saved myself a lot of time.
But it seems I sold the bearskin before I shot the bear. When I opened one of the affected attribute description articles (@ASKORDER in this example) in the Confluence editor and saved it, the generated links were changed. Now they don't link to the right articles for @LOCAL and @INLEVEL but to the article containing these 2 strings, @ASKORDER. I didn't touch the links; matter of fact, I didn't touch anything, I just opened the article in the editor and saved it again. But somehow the links are messed up because of that.
To recap: I have a regex which matches attribute names, and using that I am wrapping the matches in <a> tags. That works like a charm, but after editing an affected article the generated links all point to the article itself. The solution could be something simple, but I can't think that simple for now with all these scenarios and regexes swimming around in my head :-)
The script I'm using is this:
-a modifyPage --space @space@ --title "@title@" --file temp-page-source.txt --content "" --findReplaceRegex "(?!@title@)(((@)+([A-Z0-9]*[_]*)+([\s+]?[\n]?)+)):<a href=/display/QUAESTOR/$0>$0</a>" --noConvert
Calling it with this:
C:\Users\mvhees1\Desktop\confluence-cli-2.6.0>confluence --action runFromPageList --space "QUAESTOR" --title "@ASKORDER" --file "qlb_addLinksToAttributeNames.txt"
Which seems ok, since the articles are changed. But here it is anyway. This results in a link like this:
<a class="confluence-link" href="/display/QUAESTOR/@LOCAL" rel="nofollow">@LOCAL</a>
But after opening in editor and saving it is changed to this:
<a href="/display/QUAESTOR/@ASKORDER">@LOCAL</a>
It feels as if the link is wrongly interpreted after the article is resaved, a macro not being able to execute or something in that sense. I can't really put a finger on it.
Any idea what may be going wrong here? I sure hope it's not a Confluence bug or an unfixable problem; that would mean I have to manually add links to about 500 articles, not my favorite kind of work. Anyway, any advice, feedback or solutions are greatly appreciated!!
Regards,
Maarten van Hees
ATTEMPTED SOLUTIONS:
1. Confluence Storage Format solution suggested by Joseph Clark
I changed the expression to this:
-a modifyPage --space "QUAESTOR" --title "@ASKORDER" --content "" --findReplaceRegex "(?!@title@)(((@)+([A-Z0-9]*[_]*)+([\s+]?[\n]?)+)):<ac:link><ri:page ri:content-title="$0" /><ac:link-body><![CDATA[$0]]></ac:link-body></ac:link>" --noConvert
in order to use the Confluence Storage Format instead of the formatting I used. At first nothing at all seemed to happen, the strings to match didn't change, but when I stopped using --noConvert something DID happen: the string to match was changed to this (using "@LOCAL"):
<ac:link><ri:page ri:content-title="@LOCAL" /><ac:link-body><![CDATA@LOCAL]></ac:link-body></ac:link>
Which does look promising indeed. But again, it would seem that the inserted link is not interpreted right, because when I use --noConvert the page comes out unchanged. Notice the missing left square bracket after CDATA, is that of importance? It is in the replacement string, but seems to be getting filtered out.
Also, something that looks like an anomaly to me: the @LOCAL next to CDATA IS converted into a link. Check this screenshot and notice the underlined @LOCAL, which is now a link to the article containing this string (not @LOCAL but @ASKORDER):
It also looks to be an anchor link, as it has a # trailing the link. Not sure if it is of importance, but I might as well mention it. By the way, I changed all ':' in the XML formatting to its ASCII character because of the ':' used in the findReplaceRegex syntax.
Thanks so far, I feel like I'm making progress but I'm not quite there yet :) Any more ideas?
Hello Bob, I have been using getPageSource to check what I have to change, otherwise it would be difficult to form a regex (I can't account for hidden paragraph tags and such if I can't see them), and I have been trying to copy the link markup from that content, but no dice... I've also been using the regex testing tool you suggested in another question to test the regex against strings from getPageSource. Thanks for the heads-up on the noConvert bug, but I see it has been fixed now? About attaching a before and after code segment: check the third and fourth code segments in my question. They show the page before editing (after using the CLI to add the links) and after editing. Anyway, I'm first going to try my hand at Joseph's markup, it definitely points to something I have missed :)
Bob, I've updated the question with an "attempted solutions" bit. Maybe the --noConvert bug you described is the culprit here?
I would recommend you install the fix (2.7.0-snapshot) at least to remove doubt.
I would also say you need to make sure, using the regex tool, that the regex produces exactly the correct storage format from the input storage format before trying it with the command. Then check that the command does the same thing by again comparing its output with the expected storage format.
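One rough way to run that kind of check outside the regex tool is to test whether the generated fragment is at least well-formed XML. The sketch below is an illustration in Python under stated assumptions: the wrapper element and the namespace URIs are placeholders invented so that the ac: and ri: prefixes parse, and passing the check says nothing about whether Confluence will accept the markup. Fragments containing HTML entities such as &nbsp; would need extra handling.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (assumptions for this check only);
# they exist just so the ac: and ri: prefixes are bound.
WRAPPER = '<root xmlns:ac="urn:example:ac" xmlns:ri="urn:example:ri">{}</root>'


def is_well_formed(fragment):
    """Return True if the storage-format fragment parses as XML."""
    try:
        ET.fromstring(WRAPPER.format(fragment))
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    good = ('<ac:link><ri:page ri:content-title="@LOCAL" />'
            '<ac:plain-text-link-body><![CDATA[@LOCAL]]>'
            '</ac:plain-text-link-body></ac:link>')
    # The output reported in the question, with the broken CDATA section
    bad = ('<ac:link><ri:page ri:content-title="@LOCAL" />'
           '<ac:link-body><![CDATA@LOCAL]></ac:link-body></ac:link>')
    print(is_well_formed(good), is_well_formed(bad))
```

A check like this would flag the missing square bracket after CDATA straight away, before the markup ever reaches a page.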
Hi Maarten,
I think your problem is that, when using the Remote API and the CLI, you are not working with HTML. You are working with Confluence Storage Format. Confluence Storage Format is based on XHTML with some custom xml namespaces.
The Confluence editor has a built-in sanitising process that strips away blacklisted markup from Confluence pages whenever a page is saved. This includes stripping out custom HTML that does not match the storage format definition.
When you update the page to create the link, you need to create the link in storage format instead of HTML. A link to another page in the same space would look something like this:
<ac:link> <ri:page ri:content-title="@LOCAL" /> <ac:plain-text-link-body> <![CDATA[This is the anchor text]]> </ac:plain-text-link-body> </ac:link>
You can view a more complete description of our storage format here - https://confluence.atlassian.com/display/DOC/Confluence+Storage+Format
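As a sketch of how such a replacement could be built and tested before handing it to the CLI, the Python below wraps each attribute name in the link markup shown above. The pattern is a simplified assumption about what an attribute name looks like, and the function name link_attribute_names is hypothetical. Note that it is not safe to run twice on the same source: a second pass would also match the names inside the links it already created.

```python
import re


def link_attribute_names(source, title):
    """Wrap every @NAME in a storage-format link, except the page's own title."""
    # Assumed name format: '@' followed by capitals, digits or underscores.
    pattern = re.compile(
        r"(?!" + re.escape(title) + r"(?![A-Z0-9_]))@[A-Z0-9_]+"
    )

    def to_link(match):
        name = match.group(0)
        # The name is used both as the target page title and as the link text.
        return (
            f'<ac:link><ri:page ri:content-title="{name}" />'
            f"<ac:plain-text-link-body><![CDATA[{name}]]>"
            "</ac:plain-text-link-body></ac:link>"
        )

    return pattern.sub(to_link, source)


if __name__ == "__main__":
    print(link_attribute_names("<p>@ASKORDER needs @LOCAL</p>", "@ASKORDER"))
```

Using a replacement function rather than a replacement string sidesteps backreference syntax differences between regex flavours, at the cost of not being a drop-in for findReplaceRegex.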
Thanks Joseph, this looks like a possible solution for me. I'll check it out and let you know what I find.
Joseph, check the attempted solution bit I added to the question above and tell me what you think :)