Don't Let Your Users End up in a Dead End!

 

redirect-dead-end-vs-detour-horizontal.png

(Image Credit: https://www.volacci.com/blog/drupal-module-redirect)

[Big thanks to my old colleague @Ed Bukoski from Netflix, who did pioneering work in developing a redirector they used for their successful migration to Confluence Cloud.]

Picture it: you're almost at the finish line. After weeks of planning, testing, and more re-testing, you've figured out how to migrate your on-prem Confluence to the Cloud. All of the pesky apps are set to migrate properly, and you are ready.

But then your VP asks, "Hey Darryl, what's going to happen to old Confluence page links in emails or Word docs? Those are still going to work, right?" 

You break into a cold sweat, thinking about those potentially broken links. They're everywhere: emails, docs, spreadsheets, presentations. And oh, what about integrated tools like Testrail, Salesforce, Zendesk? Internal apps? Oh no….

"Darryl? Hey Darryl, are you ok? You zoned out for a minute."

You reply, "Ah yeah, no, those will keep working. We're going to put in a redirector."

What's a redirector?

A redirector is when you configure your old site (or spin up a much smaller instance) to have something like NGINX or Apache accept requests for the old URL, and then redirect them to the new URL.

When you migrate from on-prem to Confluence Cloud, the URL of your site will change from something like

https://confluence.YOURCOMPANY.com to 

https://YOURCOMPANY.atlassian.net/wiki

UPDATE: Per MIG-507, you can now get a mapping file directly from Atlassian Support (post-migration) that contains a map of on-prem Confluence page IDs/short URLs -> corresponding URLs in Cloud. So you can skip ahead to "Putting the mapping table into use with Apache RewriteMap".

"Magic Redirector™"

Way back in 2019, @Ramona Scripcaru from Atlassian mentioned an undocumented feature of Confluence Cloud that I am calling the "Confluence Page Title Magic Redirector™", or "Magic Redirector™" for short.

On Confluence Cloud every page URL includes the page Id, one that does NOT match what you had on-prem:

https://YOURCOMPANY.atlassian.net/wiki/spaces/INFO/pages/8721/Simple+Page+Title

But Ramona revealed that you can actually use a special URL that will get automatically redirected to that URL. Notably, it is a URL that does not contain a page Id:

https://YOURCOMPANY.atlassian.net/wiki/display/SPACEKEY/Page+Title

If you look closely, you'll see the format matches the same pattern as a non-pageId URL on Server/DC:

https://confluence.YOURCOMPANY.com/display/SPACEKEY/Page+Title

So then, it seems like all we need to do is ask NGINX or Apache to point at the new host, with an extra /wiki added into the URL. Here's what those rules would look like:

NGINX

return 301 $scheme://YOURCOMPANY.atlassian.net/wiki$request_uri;

Apache

RedirectMatch 301 "^(.*)$" "https://YOURCOMPANY.atlassian.net/wiki$1"
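
To make the effect of those two rules concrete, here is a minimal Python sketch of the same transformation: swap the host and prepend /wiki to the path. The function name to_cloud_url and the CLOUD_HOST constant are made up for illustration; this only mirrors what the rules are meant to do and is not something the web server runs.

```python
from urllib.parse import urlsplit, urlunsplit

# Placeholder host taken from the article; replace with your own site.
CLOUD_HOST = "YOURCOMPANY.atlassian.net"


def to_cloud_url(old_url: str, prefix: str = "/wiki") -> str:
    """Swap the host for the Cloud host and prepend the prefix to the path."""
    parts = urlsplit(old_url)
    return urlunsplit(
        ("https", CLOUD_HOST, prefix + parts.path, parts.query, parts.fragment)
    )


print(to_cloud_url("https://confluence.YOURCOMPANY.com/display/SPACEKEY/Page+Title"))
```

Calling it with prefix="" gives the Jira variant, where nothing is added after the hostname.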

But wait, you're not done.

You have to consider Page Ids

On Confluence Server/DC if a page title has certain punctuation marks in it (%&?/\;"§+:), then the page's URL looks like this, with a page Id:

https://confluence.YOURCOMPANY.com/pages/viewpage.action?pageId=123456

Otherwise, it looks like this, with the title right there in the URL:

https://confluence.YOURCOMPANY.com/display/INFO/Simple+Page+Title

This is all explained in Atlassian's documentation:

The upshot is that if you have page titles with those characters, the URL for that page does not include the SPACEKEY and Page Title that the "Magic Redirector™" could use to then send users to the correct page. URLs for these pages just have a pageId.

Lookup page Ids

So what Ed figured out is that you can create a lookup table (using data extracted from SQL) to map a pageId to its SPACEKEY and Page Title, and then "Magic Redirector™" can do its thing.

Now Ed has amazing developers to write their own lookup tool+redirector in Java.

Mere mortals like me have to rely on existing tools, and luckily there is a handy Apache feature called RewriteMap that works perfectly for this.

Making a mapping table

So then, let's get into it. If you want to make a mapping table, you'll need the space keys, page Ids and titles from your Confluence Server/DC instance.

SQL script to get data

The fastest way to get this data is by connecting to the database directly. Here's a MySQL command that lets you get it for a specific space (great for testing):

SELECT CONTENTID, SPACEKEY, TITLE FROM CONTENT JOIN SPACES S on CONTENT.SPACEID = S.SPACEID WHERE CONTENTTYPE = 'PAGE' AND PREVVER IS NULL AND CONTENT_STATUS = 'current' AND S.SPACEKEY='INFO';

If you pipe that to a file, you should get something like this:

CONTENTID       SPACEKEY     TITLE
172376419       INFO         Slusho
176902673       INFO         Do Androids Dream of Electric Sheep?
176903761       INFO         Relocation Policy/Process
176903794       INFO         Story of Your Life
186251233       INFO         "Imagine If" Discovery Sessions
199307491       INFO         Talking Points
199308234       INFO         Eats; Shoots; Leaves
209349090       INFO         This + That

 So you need to turn that into a mapping table that looks like this:

176902673 /wiki/display/INFO/Do%2BAndroids%2BDream%2Bof%2BElectric%2BSheep%3F
199308234 /wiki/display/INFO/Eats%3B%2BShoots%3B%2BLeaves

PageId Mapping table script

I wrote a small Perl script to create the mapping table. To make the table smaller (which should make it faster), I omit pages whose titles do not have any punctuation marks, because their URLs should not require a lookup. (No page Id.) That's why the list gets shorter.

Script: https://github.com/darryllee/confredir/blob/main/pageidmap.pl

"Magic Redirector™" limitations and workaround

Oh, but there's one other hitch. The "Magic Redirector™" still doesn't like certain characters. In testing, I found that Confluence Cloud still does not like the following characters in titles: & / + %

So a URL like this will NOT get properly redirected:

https://YOURCOMPANY.atlassian.net/wiki/display/INFO/Page+With+This&That+or+This/That

While one option would be to go back to your Server/DC instance and rewrite all those titles to remove or replace the offending characters, I found another workaround - redirecting to a Search page with the title already entered. The URL (with encoding) ends up looking like this:
https://YOURCOMPANY.atlassian.net/wiki/search?text=Page%20With%20This%20%26%20That%20or%20This%2FThat

And so the user ends up on a page like this, which yes, requires a second click, but hopefully lands them on the right page:

Screenshot 2023-10-17 at 4.49.55 PM.png

So my Perl script also looks for & / + % in titles, and if found, redirects users to the Search page for that title. So they look like this:

176903761 /wiki/search?text=Relocation%20Policy%2FProcess
209349090 /wiki/search?text=This%20%2B%20That
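
For readers who would rather not read Perl, the rules described so far can be sketched in Python. This is not the linked script, just an illustration under the rules as stated in this article: titles containing & / + % go to the search page, other titles containing the Server/DC punctuation characters go to the /wiki/display/ form, and plain titles are skipped. The names map_line, PAGEID_CHARS and SEARCH_CHARS are hypothetical.

```python
from urllib.parse import quote

# Characters that, per the article, force a pageId-style URL on Server/DC.
PAGEID_CHARS = set('%&?/\\;"§+:')
# Characters the article found the Cloud title redirect cannot handle.
SEARCH_CHARS = set("&/+%")


def map_line(page_id, space_key, title, include_plain=False):
    """Return one 'pageId target' mapping line, or None if no lookup is needed."""
    chars = set(title)
    if chars & SEARCH_CHARS:
        # Fall back to a Cloud search for the title.
        return f"{page_id} /wiki/search?text={quote(title, safe='')}"
    if chars & PAGEID_CHARS or include_plain:
        # Spaces become '+', then everything is percent-encoded,
        # which matches the sample table in this article.
        encoded = quote(title.replace(" ", "+"), safe="")
        return f"{page_id} /wiki/display/{space_key}/{encoded}"
    return None


rows = [
    (172376419, "INFO", "Slusho"),
    (176902673, "INFO", "Do Androids Dream of Electric Sheep?"),
    (176903761, "INFO", "Relocation Policy/Process"),
    (199308234, "INFO", "Eats; Shoots; Leaves"),
    (209349090, "INFO", "This + That"),
]
for row in rows:
    line = map_line(*row)
    if line:
        print(line)
```

The include_plain flag is there for cases where every page needs an entry, not only the ones with punctuation in the title.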

Putting the mapping table into use with Apache RewriteMap

Here's the Apache rule that does the page Id to title mapping, using our table:

RewriteMap pageids "txt:/etc/httpd/pageids.txt"
RewriteCond %{REQUEST_URI} "^/pages/viewpage.action$"
RewriteCond %{QUERY_STRING} (?:^|&)pageId=([0-9]+)
RewriteRule . ${cloudhost}${pageids:%1|/wiki/spaces/TEST/pages/44073006/Migrated+Confluence+Page+Not+Found}? [L,R=301,NE]

RewriteMap defines your pageids lookup table and its location.

The first RewriteCond makes sure the original URL is actually trying to view a page by Id.

The second RewriteCond checks that there is a pageId specified, and stores it.

The RewriteRule (which should all be on one line) replaces the original request with what is found when searching the lookup table. I created an alternative page, "Migrated Confluence Page Not Found", to catch any pages that may have been missed.

  • L = Last rule, don't process any other ones after this one.
  • R=301 means redirect this link permanently
  • NE = No Escape - we are already encoding special characters in the lookup table, so we should not double-escape them.
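
If you want to check a mapping file before pointing Apache at it, the lookup-with-fallback behaviour can be simulated offline. The following is a rough Python sketch, not a replacement for the rule: load_map and redirect_target are hypothetical names, CLOUD_HOST stands in for whatever ${cloudhost} is set to, and the parsing assumes the simple "key value" layout shown earlier.

```python
from urllib.parse import parse_qs, urlsplit

# Stand-in for the ${cloudhost} value used in the rule (an assumption).
CLOUD_HOST = "https://YOURCOMPANY.atlassian.net"
FALLBACK = "/wiki/spaces/TEST/pages/44073006/Migrated+Confluence+Page+Not+Found"


def load_map(text):
    """Parse 'key value' lines, the same layout as the mapping table."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        table[key] = value
    return table


def redirect_target(old_url, table):
    """Return the URL the rule would redirect to, or None if it does not apply."""
    parts = urlsplit(old_url)
    if parts.path != "/pages/viewpage.action":
        return None
    page_ids = parse_qs(parts.query).get("pageId")
    if not page_ids:
        return None
    # Unknown page Ids land on the "not found" page, like the | default.
    return CLOUD_HOST + table.get(page_ids[0], FALLBACK)


table = load_map("176903761 /wiki/search?text=Relocation%20Policy%2FProcess\n")
print(redirect_target(
    "https://confluence.YOURCOMPANY.com/pages/viewpage.action?pageId=176903761",
    table,
))
```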

Tiny Links

One last fun bit. Confluence has a thing called Tiny Links. They allow you to have "shorter" URLs that can be shared, that look like this:

https://confluence.YOURCOMPANY.com/x/Y0FGCg

Unfortunately, these links also change when you migrate from Server/DC to Cloud, so if you want old links to automatically redirect… you guessed it, you will need a redirector.

We basically use the same data as above, but we generate a mapping file for every page, because Tiny Links could be used for potentially any page. But how to generate Tiny Links? Well, Atlassian has a script for that: How to programmatically generate the tiny link of a Confluence page 

I adapted their script to read from the MySQL data, and then convert it using Atlassian's algorithm to produce a table like this (accounting for the characters that the "Magic Redirector™" cannot handle):

Y0FGCg /wiki/display/INFO/Slusho
EVKLCg /wiki/display/INFO/Do%2BAndroids%2BDream%2Bof%2BElectric%2BSheep%3F
UVaLCg /wiki/search?text=Relocation%20Policy%2FProcess
claLCg /wiki/display/INFO/Story%2Bof%2BYour%2BLife
4fcZCw /wiki/display/INFO/%22Imagine%2BIf%22%2BDiscovery%2BSessions
4zDhCw /wiki/display/INFO/Talking%2BPoints
yjPhCw /wiki/display/INFO/Eats%3B%2BShoots%3B%2BLeaves
4ml6DA /wiki/search?text=This%20%2B%20That

Script: https://github.com/darryllee/confredir/blob/main/tinylinkmap.pl
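
One way to sanity-check rows in a table like this is to decode a Tiny Link ID back into a page Id and compare it with the SQL output. The sketch below is not the linked script. It assumes the ID is Base64 over the little-endian bytes of the page Id with trailing padding dropped, which is consistent with the sample rows above; the handling of - and _ as URL-safe stand-ins for / and + is my assumption and should be checked against Atlassian's script. The name page_id_from_tiny is hypothetical.

```python
import base64
import struct


def page_id_from_tiny(tiny_id: str) -> int:
    """Decode a Tiny Link ID back into the numeric page Id (sketch)."""
    if len(tiny_id) > 11:
        raise ValueError("too long for a 64-bit page Id")
    # Assumed URL-safe substitutions; verify against Atlassian's script.
    b64 = tiny_id.replace("-", "/").replace("_", "+")
    # Restore the dropped zero padding: 11 characters plus '=' is 8 bytes.
    b64 = b64.ljust(11, "A") + "="
    return struct.unpack("<Q", base64.b64decode(b64))[0]


print(page_id_from_tiny("Y0FGCg"))  # 172376419, the Slusho page above
```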

The Future

Well, last week, this bug was resolved:

Comment from 02/Oct/2023:
"URL mappings before and after migration to cloud, including pageId changes can now be requested via Atlassian support as part of post migration link fixing: Links broken after Server to Cloud or Cloud to Cloud migration | Confluence | Atlassian Documentation"

Well that's pretty cool. So you could write a rewrite rule that maps every old link to the exact new link, based on a mapping table you would construct from the CSV data from Atlassian support. The mappings would just be a lot more accurate and you would not have to worry about the undocumented and tricky nature of the Magic Redirector™.

What about Jira?

On Jira, issue keys stay the same, so the rules would be much simpler (especially since Jira Cloud does not require anything after the hostname like Confluence Cloud's /wiki/).

NGINX

return 301 $scheme://YOURCOMPANY.atlassian.net$request_uri;

Apache

RedirectMatch 301 "^(.*)$" "https://YOURCOMPANY.atlassian.net$1"

These rules would automatically redirect somebody going to an old Jira Server/DC link:

https://jira.YOURCOMPANY.com/browse/BUG-123

To the shiny new cloud site:

https://YOURCOMPANY.atlassian.net/browse/BUG-123

Nice, right? But wait, will this work for links to filters? What about boards?

Welp, all of the IDs for filters and boards change during migration, so old links like https://jira.YOURCOMPANY.com/secure/RapidBoard.jspa?rapidView=1157 will NOT work when translated to https://YOURCOMPANY.atlassian.net/secure/RapidBoard.jspa?rapidView=1157

Because the board that had an id of 1157 on Server/DC may now have an id of 38 on Cloud. Bummer.

Now it turns out that filter names are unique, so post-migration you could use the API to dump all the filter names and new IDs from the Cloud and use that to construct a mapping file, and maybe you can do this with boards too (although board names are NOT unique).

Reference:

Previous Discussions

Documentation on Confluence URL formats/Tiny Links

 

27 comments

Dave Liao
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
October 17, 2023

@Darryl Lee - this is glorious!

p.s. If only we had a way to handle redirects from a Cloud to an on-prem migration? 🫠

p.s.s. I'll say, this article brings back PTSD of a time when we migrated from one software platform to another, many moons ago... but it's a good PTSD?

Robert Wen_Cprime_
Community Leader
October 17, 2023

This is amazing!

Darryl Lee
Community Leader
October 17, 2023

So I really wanted to do this with NGINX, since years ago I switched to it from Apache for reverse proxying Jira/Confluence.

But when I came to write this I couldn't wrap my head around their map directive. Luckily, Google found this: URL mapping with Nginx.

So I think we need:

map $request_uri $old_id {
  "~^/pages/viewpage\.action\?pageId=([0-9]+)" $1;
}

map $old_id $new_path {
  include /etc/nginx/snippets/pageid.conf;
}

And pageid.conf (the file included above) would look very similar to our existing mapping file, with the addition of a semicolon at the end of each line:

176903761 /wiki/search?text=Relocation%20Policy%2FProcess;
209349090 /wiki/search?text=This%20%2B%20That;

And then finally we have this in the server block:

server {
  if ($new_path) {
    return 301 https://YOURCOMPANY.atlassian.net$new_path;
  }
}

The article has some important details about tuning map_hash_max_size and map_hash_bucket_size settings so it runs as optimally as possible.

I would love if there are NGINX experts out there who could weigh-in on this. Or maybe I'll actually try testing it soon. :-}

Matt Doar
Community Leader
October 27, 2023

That's really useful to have all this documented, thank you!

However I'm not sure that filter names are unique except per user. Should work if you add the userid though.

Dave Liao
Community Leader
October 27, 2023

@Matt Doar - that's true, filter names aren't unique!

I think Darryl meant filter IDs are unique for each filter, in each environment. That'll allow for a mapping table to be made for any redirector.

Darryl Lee
Community Leader
October 27, 2023

Oof, yeah @Matt Doar is 100% right. I must've tested trying to create another filter of MY OWN with the same name. Doh.

So yeah, you'd need to generate a mapping file by ALSO looking at Owners, and OH FUN, GDPR means you'll have to map user-readable usernames to hashed user IDs.

So I found the Cloud Endpoint to get all filters:

https://YOURSITE.atlassian.net/rest/api/3/filter/search?expand=owner

But weirdly (and ANNOYINGLY) there doesn't seem to be an equivalent call for ON-PREM? This seems confirmed by Ian Ragudo back in 2019: https://community.developer.atlassian.com/t/get-rest-api-3-filter-search/29459/4

And ugh, 4 years later, there still does not seem to be a method to search/dump all filters:

https://docs.atlassian.com/software/jira/docs/api/REST/9.11.0/#api/2/filter

SO uh, I guess you could dump them via SQL. Awesome.

SELECT * from searchrequest

(Dump everything? Sure, why not!)

Oof, and uh, to get that usermapping, welp, if you're LUCKY your OLD usernames match say, the first part of email addresses, so darryllee@MYWORK.com -> darryllee, and so you could take your exported users file and do a bit of Excel stripping and make yourself a little lookup table to match on-prem darryllee to the fun 123456:55fffff50-f179-4f17-f747-ff1484246f4f Cloud User Id.

BUT IF your old usernames don't match email (like if darryllee@MYWORK.com -> dlee) then UGH, I guess you then need to do a cross-check on Email address to map dlee to 123456:55fffff50-f179-4f17-f747-ff1484246f4f.

I leave that as an exercise for the reader. :-}
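
As a starting point for that exercise, here is a rough Python sketch of the join itself, assuming you have already pulled three things into memory: the on-prem filters (from SQL), the Cloud filters (from the search endpoint above), and a username -> Cloud User Id lookup. All of the dictionary keys used here (owner_username, owner_account_id, and so on) are made-up names for illustration, not actual column or API field names.

```python
def build_filter_map(old_filters, cloud_filters, user_map):
    """Map on-prem filter IDs to Cloud filter IDs by (owner, filter name)."""
    cloud_index = {
        (f["owner_account_id"], f["name"]): f["id"] for f in cloud_filters
    }
    mapping, unmatched = {}, []
    for f in old_filters:
        account_id = user_map.get(f["owner_username"])
        new_id = cloud_index.get((account_id, f["name"]))
        if new_id is None:
            # Unknown owner or no filter with that name: review by hand.
            unmatched.append(f["id"])
        else:
            mapping[f["id"]] = new_id
    return mapping, unmatched


# Hypothetical sample data, shaped the way this sketch expects it.
old = [
    {"id": 1157, "name": "My Bugs", "owner_username": "darryllee"},
    {"id": 1158, "name": "My Bugs", "owner_username": "dlee2"},
]
cloud = [
    {"id": 38, "name": "My Bugs",
     "owner_account_id": "123456:55fffff50-f179-4f17-f747-ff1484246f4f"},
]
users = {"darryllee": "123456:55fffff50-f179-4f17-f747-ff1484246f4f"}
print(build_filter_map(old, cloud, users))
```

Anything that ends up in the unmatched list would need a manual look.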

Darryl Lee
Community Leader
November 3, 2023

@Dave Liao redirecting from Cloud to On-Prem would be amazing, but it would likely have to be a service Atlassian owns/manages since they own *.atlassian.net.

HOWEVER, with custom domains now a reality, I suppose you could extract all of your Cloud URLs from a backup file, OR hit the API. The tricky part would be figuring out how to map them to On-Prem URLs. I wonder if import logs contain that data.

Darryl Lee
Community Leader
November 4, 2023

So on an old, but related post, regarding using mapping data that you can request from Atlassian, @eric_gagnon_banq asks,

what could guide decision on using one or the other approach?

Maybe it's a late solution for those that have completed the migration already? Or the mapping file doesnt provide solution for tinylink?

Someone else is working on the migration, I'll try to see if I can get a sample of the mapping file generated by migration tool and see if I can figure this out.

Based on a comment from @Daniel Serkowski there are two ways to get mapping data. 

1. If you finished the links migration process (post-migration task) recently, our support team can assist you in exporting audit logs in CSV format (as this feature is not yet public). The exported CSV files will contain all changes made from old to new links. Here's a small sample of such a file: 

...

However, please be aware of the following limitations:

  • We can only store this data for 29 days.
  • Only links that have been migrated will be included.

I don't think that data is what we want, because the links migration tool does NOT cover links outside of Jira/Confluence: emails, docs, spreadsheets, presentations, Testrail, Salesforce, Zendesk, Internal apps.

Instead I think you would want to request this from support:

if it does not meet your requirements (for instance, if the link migration process was carried out a while ago), the support team can offer an alternative method of exporting mappings (focused on pageId/tinyurl values). The method is based on database queries, so will require extra consent from you, etc. The format for this is as follows:

 

"server_page_id","cloud_page_id","server_page_type","cloud_page_type","server_page_title","cloud_page_title","server_space_key","cloud_space_key","server_url","cloud_url","tiny_server_url","tiny_cloud_url"

[Example mapping file does include tiny links!]

2. Currently, we don't have a specific workflow for this. However, to expedite the process, you could refer to this ticket (https://jira.atlassian.com/browse/MIG-507) or link my comment there in the description of your support ticket.

SO yeah, I think it might be better to get a full mapping table from Atlassian. I'm actually really excited about this now, because at my day job we FINALLY got approved to go forward with a migration, and I'm going to get to try this for real.

In answer to your question, I would use the table from Atlassian, because it will not involve any guessing. My mapping file will now have an exact ID to redirect people to, so each entry links to that instead of the page title, and MOST IMPORTANTLY you will not have to fuss with special characters or rely on the "Magic Redirector™". Instead you would just map old PageID to new PageID (with spacekey, which is also in the file from Atlassian):

176902673 /wiki/spaces/INFO/pages/835485712  
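
Turning that CSV into mapping lines like the one above could look roughly like this in Python. It is a sketch only: it relies on the column names quoted from Atlassian's comment (server_page_id, cloud_space_key, cloud_page_id), the sample row is made up to match the example line, and I have not yet run it against a real export. The name mapping_lines is hypothetical.

```python
import csv
import io


def mapping_lines(csv_text):
    """Yield 'old_page_id /wiki/spaces/KEY/pages/new_id' lines from the CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        yield (
            f"{row['server_page_id']} "
            f"/wiki/spaces/{row['cloud_space_key']}/pages/{row['cloud_page_id']}"
        )


# Made-up sample with only the columns this sketch reads.
sample = (
    '"server_page_id","cloud_page_id","server_space_key","cloud_space_key"\n'
    '"176902673","835485712","INFO","INFO"\n'
)
for line in mapping_lines(sample):
    print(line)
```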
eric_gagnon_banq
Contributor
November 6, 2023

Thanks for the answer about my question regarding the use of your solution vs what's offered by Atlassian.

Darryl Lee
Community Leader
November 6, 2023

One last clarification @eric_gagnon_banq - Atlassian doesn't provide a "solution", per se.

You're still going to be responsible for setting up the redirection service, using Apache or NGINX, etc.

What Atlassian can now provide is a CSV export of OLD and NEW page IDs and Tiny Links, which should make it possible to create a much more accurate mapping table than what I cobbled together.

But unfortunately it's still a bit of work. 

If Atlassian happens to be listening, what would be AMAZING is if they created an "upgrade" that replaced your on-prem Confluence with a redirector where you just need to upload the CSV you'd get from Support. It would listen on the same port Confluence was on (8090) and serve up redirects based on the data in the CSV.

eric_gagnon_banq
Contributor
November 7, 2023

"You're still going to be responsible for setting up the redirection service, using Apache or NGINX, etc."

Yes sure, I already understood that part.

Adrien RESTAUT December 15, 2023

Hello @Darryl Lee 

So I read your message from the 17th of October related to nginx, and I made it work.
First, let me thank you for creating that page, which is full of useful technical info.

Second, I'm no nginx expert, I spent several days on making the few lines in nginx.conf work...

Prepare the included pages

  • For pageIds, you need to format the CSV (use Notepad++) as follows:
33839684 688135;
43811752 688141;
  • For tinyURLs, like this:

~HoqcAg F4AK;
~I4qcAg GYAK;
~i4qcAg HYAK;
~noqcAg IYAK;

 --> I had to put ~ at the beginning of every line because nginx map is not case-sensitive, and you can see above that two of the lines are identical except for one capital letter...

Use nginx map twice in http block

Mind that our Confluence Server was located in /Confluence/ behind our webserver.

  • For pageIds:
http {
    map_hash_max_size 8192;
    map $request_uri $old_id {
        ~^/Confluence/pages/viewpage.action\?pageId=([0-9]+) $1;
        default $request_uri;
    }
    map $old_id $new_id {
        include /etc/nginx/snippets/rewritepageids.conf;
    }

--> The biggest difficulty I had is that you need to escape the ? above with a "\?".

  • For tinyURLs:
    map $uri $old_tinyid {
        ~^/Confluence/x/(.+) $1;
        default $uri;
    }
    map $old_tinyid $new_tinyid {
        include /etc/nginx/snippets/rewritetinyurls.conf;
    }

Return conditionally new URLs in location block

location /Confluence/ {
    if ($new_id) {
        return 301 https://customer.atlassian.net/wiki/pages/viewpage.action?pageId=$new_id;
    }
    if ($new_tinyid) {
        return 301 https://customer.atlassian.net/wiki/x/$new_tinyid;
    }
    rewrite ^/Confluence(.+)$ https://customer.atlassian.net/wiki$1 permanent;
}
I hope it works for you.
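
For anyone who prefers a script to Notepad++, the include-file format described in this comment could be generated along these lines. It is only a sketch with hypothetical names (nginx_map_lines, SAFE_KEY). It also anchors the case-sensitive patterns with ^ and $, which is an addition not present in the lines above, intended to stop one ID from matching inside a longer string; please test it against your own NGINX before relying on it.

```python
import re

# Only allow keys that need no escaping inside an NGINX regex.
SAFE_KEY = re.compile(r"^[A-Za-z0-9_-]+$")


def nginx_map_lines(pairs, case_sensitive=False):
    """Format (old, new) pairs as lines for an NGINX map include file."""
    lines = []
    for old, new in pairs:
        if not SAFE_KEY.match(old):
            raise ValueError(f"unexpected characters in key: {old!r}")
        # '~' makes NGINX treat the key as a case-sensitive regex.
        key = f"~^{old}$" if case_sensitive else old
        lines.append(f"{key} {new};")
    return lines


print("\n".join(nginx_map_lines([("33839684", "688135")])))
print("\n".join(nginx_map_lines([("HoqcAg", "F4AK")], case_sensitive=True)))
```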
Dam
Community Leader
May 4, 2024

really nice one ;) 

eric_gagnon_banq
Contributor
May 16, 2024

To share: we completed the migration a while ago. It was not fun, but it's done.

For redirection, I think the real pain will come in a few years, when admins realise that they no longer even own their URLs (no custom domain) and redirects become impossible.

The move to the new editor was (and still is) painful: with automatic conversion, it's not even possible to use the API to find out which pages did not convert automatically (there is only a preview).

We decided to keep our server available a bit longer in a pseudo read-only mode so that editors can go back and compare with the original content.

So I currently display a banner on our server instance using the mapping data received from Atlassian (there was just an error with blog posts that was minor for us; the data was otherwise usable and useful).

Something like this (translated from French):

"Wiki xyz has been migrated to Confluence Cloud, update your bookmarks! https://xyz.atlassian.net/wiki/spaces/somespace/pages/3333335

Archived site, available in read-only mode only."

From the mapping data Excel file, using a small Python script, I generated partitioned JSON files so as not to load all the mapping data on each page load (the biggest partitioned map is 34 KB). I have two types of maps: one by page ID and the other by space / page title.

I have embedded a little bit of JavaScript in the page (which was possible with Confluence Server) to parse the current URL, detect the pattern, fetch the mapping JSON file using the simple partition scheme, get the new target URL and display it in the banner.

When we close the server instance, we will have the option to go back to the Apache rewrite solution or just move the JavaScript to a simple standalone page (an HTML file and a folder with a few mapping JSON files) and provide a similar experience or automatically navigate to the new URL. (Tiny links are currently resolved by Confluence, so I will need to handle that too later.)

Darryl Lee
Community Leader
May 16, 2024

@eric_gagnon_banq Your banner solution sounds great. Would love if you could share your Javascript. 

Oof yeah, we are finally migrating this weekend and I'm definitely worried about conversion issues. It would be _sweet_ if there was some way to embed a link to the "original" from Cloud. (We are also keeping around a read-only version of our DC version for reference.)

Maybe it's time to learn Forge!

eric_gagnon_banq
Contributor
May 17, 2024

It's somewhat of a big hack and tailored to our context (low traffic / intranet only), but if this can be useful to someone else, here are the main parts and a few notes.

How it looks:

exemple_front_end_redirect.png

A few notes:

- Transition / temporary solution. We will close our Confluence server.

- This is for browser user experience. Bot traffic management is ignored.

- We have about a hundred spaces and about 9k pages. The partitioning strategy would probably need to be revised for a bigger site.

- I don't remember if I output the duplicated blog entries, but for blogs the JavaScript actually redirects to a search with the title and type blog.

- I leveraged ChatGPT to get this done quickly; some parts are a little bit obtuse.

- There could be bugs, but it's not mission critical; it's only there to ease the transition for our users (including me).

- Tiny links are not resolved by this code. They are handled by Confluence Server, which then redirects to the page ID URL (on Confluence Server). The URL of that page is then displayed in the banner, as in the basic behavior. For a complete redirect without leveraging Confluence Server itself, there would be other things to do (just like the Apache / NGINX solutions).

- So to implement the same mechanism without Confluence Server (or with it turned off), you would just need to create a basic HTML page with the JavaScript (without the read-only mode patch), redirect the traffic to that page (e.g. to /wiki), and adjust a few things for the message display or automatic redirection (with or without a timer). Tiny links would need to be handled in such a scenario.

- In the JavaScript, "domain-of-confluence-server/redirection" could have been named more clearly: it's not a redirection endpoint, only the base location of the redirection mapping JSON files.

Generation of the mapping files:

import json
import csv
import os

def create_directory(path):
    """Creates a directory if it doesn't exist."""
    if not os.path.exists(path):
        os.makedirs(path)

def simple_hash(key, num_buckets):
    """Simple hash function."""
    return sum(ord(char) for char in str(key)) % num_buckets

def write_json_file(data, directory, filename):
    """Writes data to a JSON file in the specified directory."""
    with open(os.path.join(directory, filename), 'w', encoding='utf-8') as file:
        json.dump(data, file, separators=(',', ':'), ensure_ascii=False)

file_path = 'mapping.csv'
output_directory = 'output_json'
space_partition_subfolder = 'space_partitions'
page_id_partition_subfolder = 'page_id_partitions'
num_partitions = 10

# Create directories
create_directory(output_directory)
create_directory(os.path.join(output_directory, space_partition_subfolder))
create_directory(os.path.join(output_directory, page_id_partition_subfolder))

# Initialize structures
partitioned_maps = [{} for _ in range(num_partitions)]
space_key_title_map = {}

# Read CSV and partition data
with open(file_path, mode='r', encoding='utf-8') as file:
    csv_reader = csv.DictReader(file, delimiter=',', quotechar='"')
    for row in csv_reader:
        page_id = int(row['server_page_id'])
        partition_number = simple_hash(page_id, num_partitions)
        space_key = row['server_space_key']

        partitioned_maps[partition_number][page_id] = [space_key, int(row['cloud_page_id'])]
        server_page_title = row['server_page_title']
        cloud_page_id = int(row['cloud_page_id'])

        if space_key not in space_key_title_map:
            space_key_title_map[space_key] = {}
        if server_page_title in space_key_title_map[space_key]:
            space_key_title_map[space_key][server_page_title].append(cloud_page_id)
            print(f"Duplicate found in '{space_key}' for title '{server_page_title}': {space_key_title_map[space_key][server_page_title]}")
        else:
            space_key_title_map[space_key][server_page_title] = [cloud_page_id]

# Write output files
for i, partition_map in enumerate(partitioned_maps):
    filename = f'page_id_map_partition_{i}.json'
    write_json_file(partition_map, os.path.join(output_directory, page_id_partition_subfolder), filename)

for space_key, title_map in space_key_title_map.items():
    filename = f'{space_key}_bytitle_map.json'
    write_json_file(title_map, os.path.join(output_directory, space_partition_subfolder), filename)
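
To show how the partitioned files are meant to be read back, here is a small Python sketch of the page ID lookup. It is an illustration only, mirroring the generation script above; the function name lookup_cloud_path is hypothetical, and it assumes the space key is the same on Cloud as on the server (the partition files store the server space key).

```python
import json
import os


def simple_hash(key, num_buckets):
    """Same digit-sum hash as the generation script above."""
    return sum(ord(char) for char in str(key)) % num_buckets


def lookup_cloud_path(page_id, base_dir, num_partitions=10):
    """Return '/wiki/spaces/KEY/pages/ID' for a server page ID, or None."""
    partition = simple_hash(page_id, num_partitions)
    path = os.path.join(
        base_dir, "page_id_partitions", f"page_id_map_partition_{partition}.json"
    )
    with open(path, encoding="utf-8") as handle:
        entries = json.load(handle)
    # json.dump turned the integer page IDs into string keys.
    entry = entries.get(str(page_id))
    if entry is None:
        return None
    space_key, cloud_page_id = entry
    return f"/wiki/spaces/{space_key}/pages/{cloud_page_id}"
```

Only one small partition file is opened per lookup, which is the point of the partitioning.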

Base template of the message injected in html (beginning of body):

(I have translated roughly)

<div class="aui-message aui-message-warning closeable">
<p><strong>The wiki has been moved to Confluence Cloud! </strong><span id="dynamicLinkConfluenceCloud"><a href="https://target-cloud-site.atlassian.net/wiki/spaces">https://target-cloud-site.atlassian.net/wiki/spaces</a></span></p>
Archived site, available in read-only mode only.
</div>

Script injected in html (end of body) for pseudo read-only mode (a few more cases handled than what is suggested here: https://confluence.atlassian.com/confkb/how-to-make-confluence-read-only-311920317.html):

<script type='text/javascript'> AJS.$('#editPageLink').hide()</script>
<script type='text/javascript'> AJS.$('#quick-create-page-button').hide()</script>
<script type='text/javascript'> AJS.$('#create-page-button').hide()</script>
<script type='text/javascript'> AJS.$('#action-copy-page-link').hide()</script>
<script type='text/javascript'> AJS.$('#action-move-page-dialog-link').hide()</script>
<script type='text/javascript'> AJS.$('#action-remove-content-link').hide()</script>
<script type='text/javascript'> AJS.$('#header .aui-header .aui-button.aui-button-primary.aui-style').hide()</script>

<script type='text/javascript'> AJS.$('.quick-comment-body').hide()</script>
<script type='text/javascript'> AJS.$('.restore-historical-version-trigger').hide()</script>
<script type='text/javascript'> AJS.$('.remove-historical-version-trigger').hide()</script>
<script type='text/javascript'> AJS.$('.page-tree-create-child-page-link').hide()</script>

 

Script that is injected in html of Confluence server (end of body):

<script type="text/javascript">
function updateMessage(targetUrl) {
    var linkElement = document.getElementById('dynamicLinkConfluenceCloud');
    if (linkElement) {
        linkElement.innerHTML = ` <a href="${targetUrl}">${targetUrl}</a>`;
    }
}

function constructConfluenceSearchURL(title, baseUrl) {
    const encodedTitle = encodeURIComponent('"' + title + '"');
    const searchUrl = `${baseUrl}/search?text=${encodedTitle}&type=blogpost`;
    return searchUrl;
}

async function redirectConfluenceURL(currentUrl) {
    const wikiBaseUrl = 'https://domain-of-confluence-server/confluence';
    const cloudBaseUrl = 'https://target-cloud-site.atlassian.net/wiki';
    // Beware of cross-domain restrictions; no issue if the map files are served from the same site as the Confluence server.
    const redirection_map_files_location = 'https://domain-of-confluence-server/redirection';
    let redirect_data = []; // If this stays empty, we fall back to the Cloud space directory at the end

    // Function to fetch partition data
    async function fetchPartitionData(url) {
        const response = await fetch(url);
        return response.json();
    }

    // Remove anchor but keep the information
    const anchor = currentUrl.hash;
    currentUrl.hash = '';

    // Detect URL form
    if (currentUrl.href.startsWith(`${wikiBaseUrl}/display/`)) {
        // Regex pattern to identify blog post URLs
        const blogPostRegex = /\/\d{4}\/\d{2}\/\d{2}\/([^\/]+)$/;

        // Title format
        const pathParts = currentUrl.pathname.split('/');
        const spaceKey = pathParts[3];

        // Hack. Dirty exit: the URL names only a space, no page title.
        if (pathParts.length == 4) {
            return `${cloudBaseUrl}/spaces/${spaceKey}${anchor}`;
        }

        let title = "";
        // Check for blog post pattern using regex and adjust title
        const blogPostMatch = currentUrl.pathname.match(blogPostRegex);
        if (blogPostMatch) {
            title = decodeURIComponent(blogPostMatch[1].replace(/\+/g, ' ')); // Use the captured title
            console.error("blog:" + title);
        } else {
            title = decodeURIComponent(pathParts[4].replace(/\+/g, ' '));
        }

        const partitionFileUrl = `${redirection_map_files_location}/space_partitions/${spaceKey}_bytitle_map.json`;
        try {
            const data = await fetchPartitionData(partitionFileUrl);
            if (title in data) {
                if (data[title].length == 1) {
                    let cloudPageId = data[title][0];
                    const newUrl = `${cloudBaseUrl}/spaces/${spaceKey}/pages/${cloudPageId}${anchor}`;
                    redirect_data.push(newUrl);
                } else {
                    // Not exactly one match for this title: point to a Cloud search instead
                    redirect_data.push(constructConfluenceSearchURL(title, cloudBaseUrl));
                }
            } else {
                const newUrl = `${cloudBaseUrl}/spaces/${spaceKey}`;
                redirect_data.push(newUrl);
            }
        } catch (error) {
            console.error('Error fetching partition data:', error);
        }
    } else if (currentUrl.href.includes('viewpage.action?pageId=')) {
        // Page ID format
        const queryParams = new URLSearchParams(currentUrl.search);
        const pageId = queryParams.get('pageId');
        const partitionNumber = simpleHash(pageId, 10); // Assuming 10 partitions
        const partitionFileUrl = `${redirection_map_files_location}/page_id_partitions/page_id_map_partition_${partitionNumber}.json`;
        try {
            const data = await fetchPartitionData(partitionFileUrl);
            if (pageId in data) {
                const cloudPageInfo = data[pageId];
                const newUrl = `${cloudBaseUrl}/spaces/${cloudPageInfo[0]}/pages/${cloudPageInfo[1]}${anchor}`;
                redirect_data.push(newUrl);
            }
        } catch (error) {
            console.error('Error fetching partition data:', error);
        }
    }

    if (redirect_data.length == 0) {
        redirect_data.push(cloudBaseUrl + "/spaces");
    }
    return redirect_data[0];
}

// Simple hash function for page ID partitioning
function simpleHash(key, numBuckets) {
    let sum = 0;
    key.toString().split('').forEach(char => {
        sum += char.charCodeAt(0);
    });
    return sum % numBuckets;
}

const currentUrl = new URL(window.location.href);

function redirectionUpdate() {
    redirectConfluenceURL(currentUrl)
        .then(newUrl => {
            updateMessage(newUrl);
        })
        .catch(() => {
            // Ignore: the banner keeps its default link to the Cloud site.
        });
}

document.addEventListener('DOMContentLoaded', redirectionUpdate);
</script>
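Since the map files are generated in Python but read in the browser, both sides have to agree on which partition a page ID lands in. Here is a minimal Python sketch that mirrors the simpleHash function above, so the two can be compared. The function names simple_hash and partition_file_for are made up for this illustration; only the file name pattern and the default of 10 partitions come from the script.

```python
def simple_hash(key, num_buckets=10):
    """Sum the character codes of the key, then take the remainder.

    Mirrors simpleHash() in the injected script. For digit-only page IDs,
    ord() and JavaScript's charCodeAt(0) give the same values.
    """
    return sum(ord(ch) for ch in str(key)) % num_buckets


def partition_file_for(page_id, num_buckets=10):
    """Name of the partition file the browser script will request (illustrative helper)."""
    return f"page_id_map_partition_{simple_hash(page_id, num_buckets)}.json"


if __name__ == "__main__":
    print(partition_file_for("182766657"))
```

If the generator and the browser script ever disagree on the bucket count, lookups will silently miss, so it is worth checking a handful of page IDs on both sides.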

 

 

eric_gagnon_banq
Contributor
May 17, 2024

"It would be _sweet_ if there was some way to embed a link to the "original" from Cloud."

I'm sure our users who are currently migrating pages to the new editor would have liked such a feature.

It would be possible to make a simple Chrome plugin to do just that, but Confluence Cloud page display is so slow now, and there are so many changes before the page is really ready, that it's difficult to coordinate a change to the DOM.

I tried Forge one night, thinking it could help me solve something, and then I realised that every instance of an app is its own React app in an iframe. I have looked at the replacement for a plugin that worked very well on Confluence Server; it's now slow to the point of being unusable. No wonder...

I would need to read some more, but I think that in Forge it's impossible to modify the DOM outside of that app container (that is, to change the DOM of the parent of the iframe hosting the "app"). I don't think the SDK offers any way to change things like the top of a Confluence page, or to inject any global behavior like we were able to do with scripts in Confluence Server.

 

Dave Liao
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
May 17, 2024

@eric_gagnon_banq - you rock for sharing, and for the detailed replies. Sent you Kudos! Because who doesn't need more virtual karma! 🙌

Simha Gontmaher
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
July 25, 2024

Hi @Darryl Lee

Thank you so much for sharing this!

 

I am trying to implement the same thing for Apache httpd. Atlassian supplied the files, so I have the mapping.

But the rewrite rule leads to a URL that ends in pageId= with no value.

It looks like it is not picking up the value from the mapping file.

I wonder whether it is possible to have one mapping file for both page IDs and tiny URLs, and to create a rewrite rule that redirects each to the right path.

This is what it looks like now (I placed the settings under VirtualHost):

<VirtualHost *:443>

ServerName ONPREMISESCONFLUENCE.DOMAIN.com

RewriteEngine On

RewriteMap pageMap "txt:/etc/httpd/conf/rewritepageids.txt"

RewriteRule "^(.*)$" https://CLOUDURL /wiki/pages/viewpage.action?pageId=${pageMap:%1} [R,L]

## A RewriteRule for tiny URLs should go here too

 

The mapping file looks like this:

123 456

tinyURLx tinyUrly

 

What do you think? :) 

Darryl Lee
Community Leader
July 25, 2024

Hi @Simha Gontmaher - your rule does not look like mine:

RewriteMap pageids "txt:/etc/httpd/pageids.txt"
RewriteCond %{REQUEST_URI} "^/pages/viewpage.action$"
RewriteCond %{QUERY_STRING} pageId=(.*)
RewriteRule . ${cloudhost}${pageids:%1|/wiki/spaces/TEST/pages/44073006/Migrated+Confluence+Page+Not+Found}? [L,R=301,NE]

Your rule looks like this:

RewriteMap pageMap "txt:/etc/httpd/conf/rewritepageids.txt"

RewriteRule "^(.*)$" https://CLOUDURL /wiki/pages/viewpage.action?pageId=${pageMap:%1} [R,L]

There's a couple of problems there.

Because your rule does not have any RewriteCond (conditions), it runs on EVERY request. Even URLs that do not have a pageID at all.

This rule should only run on requests that include "^/pages/viewpage.action$" AND have a "?pageId=(SOMENUMBER)". That is what you then use to lookup the new pageID in your lookup table.

So... Apache rewrite rules are... tricky things. This is why I tried to document my rule:

The first RewriteCond makes sure the original URL is actually trying to view a page by ID.

The second RewriteCond checks that there is a pageId specified, and stores it

The RewriteRule (which should all be on one line) replaces the original request with what is found when searching the lookup table. I created an alternative page, "Migrated Confluence Page Not Found", to catch any pages that may have been missed.

  • L = Last rule, don't process any other ones after this one.
  • R=301 means redirect this link permanently
  • NE = No Escape - we are already encoding special characters in the lookup table, so we should not double-escape them.
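To make the flow of those two conditions and the rule easier to follow, here is a rough Python sketch of the same decision logic. It is not how Apache evaluates rules internally, just an illustration: the function name redirect_for and the sample map entry are made up, and it assumes the lookup table maps an old page ID to a Cloud path (which is what the fallback value in the rule suggests). Unlike the `pageId=(.*)` pattern, it reads only the pageId parameter rather than everything after it.

```python
from urllib.parse import parse_qs

# Assumed values for illustration; replace with your own site and fallback page.
CLOUDHOST = "https://YOURSITE.atlassian.net"
NOT_FOUND = "/wiki/spaces/TEST/pages/44073006/Migrated+Confluence+Page+Not+Found"


def redirect_for(path, query, pageids):
    """Return the redirect target, or None if the rule would not fire."""
    # First RewriteCond: only requests for /pages/viewpage.action qualify.
    if path != "/pages/viewpage.action":
        return None
    # Second RewriteCond: a pageId must be present in the query string.
    page_id = parse_qs(query).get("pageId", [None])[0]
    if page_id is None:
        return None
    # RewriteRule: look up the old ID, falling back to the "not found" page.
    return CLOUDHOST + pageids.get(page_id, NOT_FOUND)


if __name__ == "__main__":
    sample_map = {"123": "/wiki/spaces/ABC/pages/456/Some+Page"}  # hypothetical entry
    print(redirect_for("/pages/viewpage.action", "pageId=123", sample_map))
    print(redirect_for("/pages/viewpage.action", "pageId=999", sample_map))
    print(redirect_for("/display/ABC/Some+Page", "", sample_map))
```

The third call returns None, which is the point of the conditions: requests that are not page-ID views are left for other rules to handle.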

I highly recommend finding someone in your company, or a local consultant who has experience with Apache Rewrite Rules if you don't quite understand what I did. 

Troubleshooting rules is often done by looking at log files in /var/log/apache2

Again, if you don't know what those are, you would definitely want to find somebody with a title like "Unix System Administrator" or "BOFH" to see if they can help you. They might have a beard. It's pretty old tech. :-}

Dr. Peter Heck September 16, 2024

Hi @Darryl Lee ,

we have received the file described here in MIG-507 from Atlassian Support.
It contains the following columns 
title,spacekey,source_pageid,destination_pageid,source_viewpage_url,destination_viewpage_url,source_tiny_url,destination_tiny_url,source_display_link,destination_display_link

The volume is approx. 50000 lines.

I have the following questions about your solution:
1. which columns are in your file “txt:/etc/httpd/pageids.txt”?
   Only these 2: source_pageid destination_pageid
2. what is in the variable: ${cloudhost}
3. is this performant with 50000 lines?

We have 3 cases to consider:
1. old: https://one.company.com/pages/viewpage.action?pageId=182766657
   new: https://company.atlassian.net/wiki/pages/viewpage.action?pageId=194215951
2. old: https://one.company.com/x/QczkCg
   new: https://company.atlassian.net/wiki/x/D4CTCw
3. old: https://one.company.com/display/3JP/30+years+proALPHA
   new: https://company.atlassian.net/wiki/display/3JP/30+years+proALPHA

Solution Case 1:
RewriteMap pageids "txt:/etc/httpd/pageids.txt"
RewriteCond %{REQUEST_URI} "^/pages/viewpage.action$"
RewriteCond %{QUERY_STRING} pageId=(.*)
RewriteRule . ${cloudhost}${pageids:%1|/wiki/spaces/TEST/pages/44073006/Migrated+Confluence+Page+Not+Found}? [L,R=301,NE]

Solution Case 2:
RewriteMap tinyurls "txt:/etc/httpd/tinyurls.txt"
???

Solution Case 3:
???

It would be great if you could help me.
Many thanks for your support :-).

Best regards,
Peter

Darryl Lee
Community Leader
September 16, 2024

Hi @Dr. Peter Heck - 

Welp, I'm long overdue in writing an update to this article now that we have actually MIGRATED to Cloud.

But enh, that sounds like a lot of work. Let's look at your specific questions:

1. which columns are in your file “txt:/etc/httpd/pageids.txt”?
   Only these 2: source_pageid destination_pageid

Correct. I think I used the unix cut command to extract just the first two columns, which on May 20, 2024 were named: server_page_id and cloud_page_id
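If you would rather not rely on cut (for example, because titles in the CSV contain commas), the same extraction can be sketched in Python with the csv module. This is only a sketch under assumptions: the function name build_map_text is made up, and the default column names are the ones from the header Dr. Peter Heck quoted. My own export called them server_page_id and cloud_page_id, so pass whatever your file uses.

```python
import csv
import io


def build_map_text(csv_text, key_col="source_pageid", value_col="destination_pageid"):
    """Turn two CSV columns into 'key value' lines for a txt: RewriteMap."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row in reader:
        key = (row.get(key_col) or "").strip()
        value = (row.get(value_col) or "").strip()
        # Skip incomplete rows; a map line needs both a key and a value.
        if key and value:
            lines.append(f"{key} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        "title,spacekey,source_pageid,destination_pageid\n"
        '"Hello, world",ABC,123,456\n'
        "No target,ABC,789,\n"
    )
    print(build_map_text(sample), end="")
```

Using a real CSV parser means a quoted title with a comma in it does not shift the columns, which is the main thing plain cut gets wrong.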

2. what is in the variable: ${cloudhost}

Oh sorry, I should've specified that it should be the URL of your site:

        Define cloudhost https://YOURSITE.atlassian.net

3. is this performant with 50000 lines?

My mapping file has over 300000 lines. I was concerned about performance, to be sure, so I took a few precautions:

  1. I apologize for not documenting it earlier, but I used httxt2dbm to convert my mapping files into a DBM Hash File, which "works exactly the same way as the txt map, but is much faster, because a DBM is indexed, whereas a text file is not. This allows more rapid access to the desired key."

    The process is pretty straightforward and described well here: https://httpd.apache.org/docs/current/rewrite/rewritemap.html#dbm

  2. I overallocated my instance. To redirect ~300000 pageIDs/tiny links + ~1600 boards + ~33000 filters, I spun up an m5.2xlarge, which is probably ridiculously too beefy to just run Apache doing rewrites. We can probably switch to a smaller instance to save some $$, as looking at monitoring, it seems like CPU utilization has been below 0.3% for its lifetime. But yeah, I preferred to err on the side of caution. Network traffic definitely shows the expected spike after we migrated and a gradual decline, but I'm still seeing regular requests daily (including a spike of around 2000 every day. Hm... somebody clearly did not change the endpoints in their script.)

We have 3 cases to consider:

Case 1 looks right. I'll go over Cases 2 and 3, which again I apologize for not documenting earlier.

2. old: https://one.YOURCOMPANY.com/x/QczkCg
   new: https://YOURCOMPANY.atlassian.net/wiki/x/D4CTCw

Here's what my rule looks like for this:

        # Tiny Links
        RewriteMap tinylinks "dbm=db:/etc/apache2/mappings/tinylinks.db"
        RewriteRule ^/x/(.*)$ ${cloudhost}/wiki/pages/viewpage.action?pageId=${tinylinks:$1|/wiki/spaces/TEST/pages/598016095/Migrated+Confluence+Page+Not+Found} [L,R=301]

My export file included these two columns: tiny_server_url, tiny_cloud_url

I pulled the CSV into Excel so that I could extract JUST the unique string from the old TinyURL, and then I mapped it to the Cloud Page ID. (My thinking is that a smaller lookup table means better performance, and I didn't want to map to the Cloud TinyURL because that would mean a second redirect.)

So instead of:

https://OLDSITEURL/x/CKNdKg https://NEWCLOUDSITE/wiki/x/I2bFB
https://OLDSITEURL/x/KQ_WKg https://NEWCLOUDSITE/wiki/x/XGbFB

My lookup table (before conversion to DBM) looked like:

CKNdKg  80045603
KQ_WKg  80045660
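I did this step in Excel, but it could be scripted as well. Here is a minimal Python sketch, assuming you already have pairs of (old tiny URL, Cloud page ID); the function names tiny_key and build_tinylink_lines are made up for this example.

```python
from urllib.parse import urlparse


def tiny_key(url):
    """Return the unique string after /x/ in a tiny URL, or None if there is none."""
    path = urlparse(url).path
    marker = "/x/"
    idx = path.rfind(marker)
    if idx == -1:
        return None
    return path[idx + len(marker):] or None


def build_tinylink_lines(pairs):
    """Build 'key cloud_page_id' lines from (old_tiny_url, cloud_page_id) pairs."""
    lines = []
    for old_url, cloud_page_id in pairs:
        key = tiny_key(old_url)
        if key:  # Skip rows without a usable tiny URL.
            lines.append(f"{key} {cloud_page_id}")
    return lines


if __name__ == "__main__":
    pairs = [
        ("https://OLDSITEURL/x/CKNdKg", "80045603"),
        ("https://OLDSITEURL/x/KQ_WKg", "80045660"),
    ]
    print("\n".join(build_tinylink_lines(pairs)))
```

The resulting lines are the text form of the lookup table, ready to be converted to DBM.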

3. old: https://one.YOURCOMPANY.com/display/3JP/30+years+proALPHA
   new: https://YOURCOMPANY.atlassian.net/wiki/display/3JP/30+years+proALPHA

Right. So this is the case for "what if the old URL didn't include a Page ID at all?"

I did document this, but I guess not explicitly:

        # Spaces / Full Titles
        RedirectMatch 301 /display/(.*)$ ${cloudhost}/wiki/display/$1

Hope this helps. And yeah, I should probably write an update to this article. Thanks for the reminder.

Dr. Peter Heck September 16, 2024

Hi @Darryl Lee ,

thank you so much for the detailed information. I will try to make it run.
One more request: Could you please change our real company name to company?
I had changed only the first case and cannot do it myself.
Thanks a lot!
Best regards,
Peter

Darryl Lee
Community Leader
September 17, 2024

Hi @Dr. Peter Heck -

One more request: Could you please change our real company name to company?
I had changed only the first case and cannot do it myself.

Oh, I figured it out. Fixed.

Dr. Peter Heck September 17, 2024

Hi @Darryl Lee ,

there is a little mistake in case 2:

RewriteRule ^/x/(.*)$ ${tinylinks:$1|/wiki/spaces/TEST/pages/598016095/Migrated+Confluence+Page+Not+Found} [L,R=301]

should be 

RewriteRule ^/x/(.*)$ ${cloudhost}/wiki/pages/viewpage.action?pageId=${tinylinks:$1|/wiki/spaces/TEST/pages/598016095/Migrated+Confluence+Page+Not+Found} [L,R=301]

I got all 3 cases running...
Best regards,
Peter 
