Broken links after XML space import

ColinM
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
April 7, 2016

 

Hello support,

I have run a successful space import from my PreProd Confluence instance to my Production instance.

In general everything is fine, but a few internal links are broken.

  • On the original wiki (exported), PageA is linking to PageB and pageC:
    • Link to pageB is a link with page title
    • Link to pageC is a link with pages/viewpage.action?pageId={id}
  • On the target wiki (imported), PageA is still linking to pageB, but pageC link is now broken.
    • Link to pageB is still using the target page title
    • Link to pageC became a create+edit link with the target title


In storage format, the link from PageA to PageC is a link by page title, not by a hardcoded pageId:

<ac:link><ri:page ri:content-title="Product Discontinuation (ABC)" /><ac:plain-text-link-body><![CDATA[product discontinuation]]></ac:plain-text-link-body></ac:link> 

This storage format is the same in both versions of the wiki (exported and imported). The page with that title (PageC) also exists in both wikis, BUT when viewing this page, the URL does not show the page title but the pageId:

  • Original wiki: pages/viewpage.action?pageId=60065714
  • Imported wiki: pages/viewpage.action?pageId=79495906

It looks like Confluence has all the information it needs to recover these links (page titles are unique within a space). How can I recover from this? Is there a way to force a reindex or something similar?

Is it possible to force Confluence to regenerate the page link from the title, so that

  • pages/viewpage.action?pageId=79495906

becomes:

  • display/spacekey/Product+Discontinuation+(ABC)

?


I am pretty sure that if I can trigger this kind of "reindexing", the internal links will work again.

UPDATE

In this case, the issue happens with pages that have parentheses in the name (page title = "Product Discontinuation (ABC)" for instance). I just saw this related KB article:

https://confluence.atlassian.com/confkb/confluence-page-urls-contain-pageid-instead-of-the-page-title-278692715.html


But there is one detail I don't understand: here '(' and ')' are forbidden characters, but if I append a random suffix to my page title, my link changes from:
/pages/viewpage.action?pageId=79495906
to
/display/spacekey/Product+Discontinuation+%28ABC%29+title

So... it looks like these characters can be handled in the page URL?
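
For readers comparing the two URL forms: the second one looks like ordinary form-style URL encoding of the title. The Python sketch below reproduces that shape with the standard library. The helper name display_path is made up for illustration, and this is not Confluence's actual URL-generation code, only a demonstration that parentheses can be percent-encoded without trouble.

```python
from urllib.parse import quote_plus


def display_path(space_key, title):
    # Hypothetical helper: mimics the /display/<space>/<title> shape quoted above.
    # quote_plus turns spaces into '+' and percent-encodes '(' and ')'.
    return f"/display/{space_key}/{quote_plus(title)}"


print(display_path("spacekey", "Product Discontinuation (ABC) title"))
# /display/spacekey/Product+Discontinuation+%28ABC%29+title
```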

We are now using Confluence 5.6.3


Thanks and best regards

Colin

 

 

3 answers

Answer accepted
ColinM
April 11, 2016

Hello,

I had a deeper look at the DB, and I found the root cause.

All my dead links actually involved more than the parentheses from my example: the original titles also contained a typographic quote (’).

This is the state of the DB in both wikis (exported from, and imported into):

  • exported from: (OK)
    • 60065326 is the page id of "PageA"
    • 60065714 is the page id of "Product Discontinuation (PLM ABC’s)"
    • For 60065326 : LINKS.DESTPAGETITLE = Product Discontinuation (PLM ABC’s)
    • For 60065326 : BODYCONTENT.BODY = <ri:page ri:content-title="Product Discontinuation (PLM ABC&rsquo;s)" />
    • For 60065714 : CONTENT.TITLE = Product Discontinuation (PLM ABC’s)

 

  • imported to: (Fail)
    • 79495974 is the page id of "PageA"
    • 79495906 is the page id of "Product Discontinuation (PLM ABC's)"
    • For 79495974 : LINKS.DESTPAGETITLE = Product Discontinuation (PLM ABC’s)
    • For 79495974 : BODYCONTENT.BODY = <ri:page ri:content-title="Product Discontinuation (PLM ABC&rsquo;s)" />
    • For 79495906 : CONTENT.TITLE = Product Discontinuation (PLM ABC's)

 

In the imported wiki, in the DB, the ’ in the page title became a ' .

As a result, all my links based on that page title were dead.
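
To make the mismatch concrete, here is a small Python sketch that pulls the link target out of the storage-format body, decodes the &rsquo; entity, and compares it with the imported CONTENT.TITLE. The function name link_targets is only illustrative, and decoding with html.unescape is an assumption about how the title should be read, not a description of Confluence's internal link resolver.

```python
import html
import re

# Matches the ri:content-title attribute of a storage-format page link.
TITLE_ATTR = re.compile(r'ri:content-title="([^"]*)"')


def link_targets(body):
    """Return the page titles referenced by <ri:page> links, with entities decoded."""
    return [html.unescape(raw) for raw in TITLE_ATTR.findall(body)]


body = '<ri:page ri:content-title="Product Discontinuation (PLM ABC&rsquo;s)" />'
imported_title = "Product Discontinuation (PLM ABC's)"  # straight quote, U+0027

target = link_targets(body)[0]
print(target == imported_title)  # False: the titles differ by one character
print(hex(ord(target[-3])), hex(ord(imported_title[-3])))  # 0x2019 0x27
```

The two strings look almost identical on screen, but U+2019 and U+0027 are different characters, so a lookup by exact title finds nothing.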

 

To fix that, I had to run a SQL query to repair all the page titles containing ' in order to get the ’ back:

-- {space_id} is a placeholder for the SPACEID of the imported space
UPDATE [db].[CONTENT]
SET TITLE = REPLACE(TITLE, '(PLM ABC''s)', '(PLM ABC’s)')
WHERE SPACEID = {space_id} AND TITLE LIKE '%(PLM ABC''s)%'

 

After that query, all the links were valid.
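
If more than one title is affected, the pages to rename can be listed before touching the DB. The Python sketch below assumes you have already exported two lists yourself (link target titles from LINKS.DESTPAGETITLE and page titles from CONTENT.TITLE for the space); the function names and the sample data are hypothetical. It only reports candidates and does not modify anything.

```python
# Typographic apostrophes are folded to the ASCII one, for comparison only.
FOLD = str.maketrans({"\u2019": "'", "\u2018": "'"})


def fold(title):
    return title.translate(FOLD)


def near_misses(link_targets, page_titles):
    """Pair each unresolved link target with a page title differing only by quote style."""
    existing = set(page_titles)
    # If two titles fold to the same string, the later one wins; check such cases by hand.
    by_folded = {fold(t): t for t in page_titles}
    pairs = []
    for target in link_targets:
        if target in existing:
            continue  # this link already resolves
        candidate = by_folded.get(fold(target))
        if candidate is not None:
            pairs.append((candidate, target))  # (current title, title the link expects)
    return pairs


# Hypothetical sample data mirroring the case above.
targets = ["Product Discontinuation (PLM ABC\u2019s)", "PageB"]
titles = ["PageA", "PageB", "Product Discontinuation (PLM ABC's)"]
for current, expected in near_misses(targets, titles):
    print(repr(current), "->", repr(expected))
```

Each reported pair is a page whose title would need the kind of REPLACE shown above.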

I think there is a bug somewhere in the export/import process: the right single quotation mark (’) became a plain single quote (').

 

ColinM
April 11, 2016

Atlassian has created a ticket for the corresponding issue:
https://jira.atlassian.com/browse/CONF-41354

Kay Brown
April 8, 2016

Hi Colin,

You can contact Atlassian Support here:  https://support.atlassian.com/customer/servicedesk-portal

Regards,

Kay

ColinM
April 8, 2016

I can see one "painful" solution here:

Write a script (SQL / Java ... ?) that:

  • scans all page titles and replaces / removes reserved characters
  • scans all page content and replaces / removes reserved characters inside the <ac:link><ri:page ri:content-title attribute value

Isn't there a better and faster solution?
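
For what it's worth, the content-scanning half of such a script could look roughly like the Python sketch below. It rewrites only the ri:content-title attribute values in a storage-format body and leaves everything else alone. The function names and the strip_parens rule are illustrative assumptions; which characters to replace, and how the body is read from and written back to Confluence, is left open.

```python
import re

# Captures the attribute prefix, its value, and the closing quote separately.
TITLE_ATTR = re.compile(r'(ri:content-title=")([^"]*)(")')


def rewrite_link_titles(body, clean):
    """Apply clean() to every ri:content-title value in a storage-format body."""
    return TITLE_ATTR.sub(
        lambda m: m.group(1) + clean(m.group(2)) + m.group(3), body
    )


def strip_parens(title):
    # Example rule only: drop parentheses from the title.
    return title.replace("(", "").replace(")", "")


body = '<ac:link><ri:page ri:content-title="Product Discontinuation (ABC)" /></ac:link>'
print(rewrite_link_titles(body, strip_parens))
# <ac:link><ri:page ri:content-title="Product Discontinuation ABC" /></ac:link>
```

The same clean() rule would have to be applied to the page titles themselves, otherwise the rewritten links would point at titles that no longer exist.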
