Splitting instances in more detail (Part 3)

This article is the next part of a story about splitting a JIRA and Confluence instance. If you are interested in how we prepared for the split and actually performed the migration, go back to the previous parts (Part 1 and Part 2). If you are already done with your migration, or simply want to know what to do afterwards, then keep reading, because this is the best place to learn about it from practice.

Preparing and performing a migration is one thing, but what should we do afterwards? We are now in a situation where we have two separate instances that used to be one. The question is what Monday morning will bring, when everyone starts using the new system.

Having experience with a lot of maintenance work done over weekends, we knew the real test was still ahead. Only normal day-to-day traffic could show us how things were working and how the migration really went. We knew we had to pay attention to everything and that a lot of support requests were coming. People would have problems with everything: some would try to keep using the old system, some would notice new problems. A change this big for everyone always means all hands on deck.

Step 9: Support

Support... This was something you cannot fully prepare for. We started a brand new system with old data, and it was a mystery how it would behave. We had many questions: how good would the performance be, would everyone be able to log in, would everyone see what they saw before the migration? We also had to create a new support project in Jira (the old one stayed on the source instance), so people had to get used to new rules for creating tickets. Overall, a lot of changes.

The business wakes up at 7 AM, but most people start at 9 AM. At 10 AM we were pretty shocked, because it was... quiet! This meant either that people had problems but did not know how to report them (and they would pile up somewhere, maybe in email), or that everything was fine and the change was fully transparent to everyone: goal reached!

We continued to use our monitoring tools (Zabbix and Grafana) for that. We noticed that performance was sometimes on the edge, but it never actually crossed it. With 1000 users suddenly hitting the system, all pages still had to be cached, so we decided not to touch anything until it got really bad. It was a good decision: hour by hour the load was going down, which meant the heaviest day of the week had finally passed!

Of course we handled more tickets than usual, mostly related to the migration. People (as usual) tried to log in to the old system. Some links hard-coded into our documentation had to be changed by their owners, but those were things we knew would happen. We were happy that nothing was really critical. We continued to monitor for a whole week, until we noticed that people had fully transitioned and stopped asking questions about it. This is when we could finally announce that the project was a success!

But it was not the end for us. The instance was running and people continued to use their projects (just on a new URL), but we had to do something with the data that now co-existed on both instances.

Step 10: Cleanup

Yes, this is where the very tired but happy team starts to think about cleanup. The split was successful, but we were temporarily running two instances with the same data. We no longer needed the projects and spaces that are not used by the migrated users, and the opposite was also true: the other users do not care about the migrated ones.

Someone could say that this step is not required, but it is actually a very important one. We cannot leave two huge instances. We need to remove what is not used and optimize resources so that we do not generate additional cost. It is also a matter of data safety: keeping data that does not belong to you is a responsibility, so cleaning it up as soon as possible was the last thing we had to do.

From a helicopter view it looks easy: we have a list of projects, we simply delete them on both sides and that is all. Well, it depends on the instance. If you are splitting a small instance, nobody probably cares much, but if you are taking care of a bigger one, it starts to be a challenge.

First we needed to determine how things behave overall: how long it takes to remove data (so we know how much potential downtime we need), whether it affects performance, and what could potentially surprise us. Using "delete" functions is also more risky than "add" or even "edit", since one single incorrect step can cause damage.

The cleanup started by removing everything on a test system. Why? Because then we had all the information about the size of the projects/spaces and the before and after state, and we were able to validate whether anything was missing. We also recorded how much time each object took to delete, and based on this we were able to plan the downtime.

In our case the total removal time was more than 8 hours, so we decided to split the work into 2-3 parts and start with the less important projects and spaces. Thanks to that we could see how things behave on PRD and verify the timing. After removing the first group it turned out that everything was about 1.5x slower on PRD than on TST. We adjusted all the remaining estimates accordingly and avoided potential problems.
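
As a worked example of that adjustment (all numbers below are illustrative, not our real measurements):

# Illustrative only: scale deletion timings measured on TST by the observed PRD/TST factor.
tst_minutes = {"group 1 (less important projects)": 120, "group 2": 180, "group 3": 200}
prd_factor = 1.5  # everything turned out to be about 1.5x slower on PRD than on TST

for group, minutes in tst_minutes.items():
    print(f"{group}: plan {minutes * prd_factor:.0f} min on PRD")
print(f"total: {sum(tst_minutes.values()) * prd_factor / 60:.1f} h of downtime to schedule")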

Deleting projects and spaces was pretty smooth. We wrote a simple Python script just for that task, and it was a good decision, since removing all of those projects/spaces manually would have taken more time than we wanted to spend. At the end we also had a report from the whole deletion, which might be handy in the future when someone asks us about it: we would know exactly what happened.
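
As a rough illustration, a minimal sketch of such a script could look like this (this is not our original script; it assumes Jira Server/Data Center REST endpoints, the requests library, a plain text list of project keys and an illustrative CSV report):

# delete_projects.py - minimal sketch, assuming Jira Server/Data Center.
import csv
import time

import requests

JIRA_URL = "https://jira.example.com"          # illustrative base URL
AUTH = ("admin.account", "password-or-token")  # use a dedicated service account


def delete_project(key):
    """Delete a single project and return (success, duration in seconds)."""
    start = time.time()
    response = requests.delete(f"{JIRA_URL}/rest/api/2/project/{key}", auth=AUTH)
    return response.status_code == 204, time.time() - start


def main():
    with open("projects_to_delete.txt") as source:
        keys = [line.strip() for line in source if line.strip()]

    # Keep a report so we can later answer "what exactly was deleted and how long did it take?"
    with open("deletion_report.csv", "w", newline="") as report:
        writer = csv.writer(report)
        writer.writerow(["project_key", "deleted", "seconds"])
        for key in keys:
            deleted, seconds = delete_project(key)
            writer.writerow([key, deleted, round(seconds, 1)])
            print(f"{key}: {'deleted' if deleted else 'FAILED'} in {seconds:.1f}s")


if __name__ == "__main__":
    main()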

Step 11: Research and Cleanup (again)

Deleting projects/spaces is only the beginning. It is obvious that now we have to check what is no longer used after that. Many objects were connected to a project or space, and now it is a matter of identifying them one by one, verifying that they are not used, and deleting them. So this phase again started with good research, so that we did not delete something that was still important.

This step is not easy. There are no native features that would give us a 100% accurate answer about what can be deleted and what needs to stay. We had to get a list of all schemes and objects and simply check whether there are any "orphans". In the case of Jira, a good approach is to start with the schemes that are directly connected to a project. Removing those "releases" all the objects connected to them, and then we can go deeper and deeper in the dependency tree.

For example:

Project A was using the ABC Issue Type Scheme, which contained the Bug, Action and Improvement issue types. Deleting Project A showed that the ABC Issue Type Scheme was no longer used, so we deleted it. After that we were able to check the issue types themselves and noticed that the Action issue type had only been used in Project A, so in the end we were able to delete Action as well.
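
Before deleting such an orphan candidate it is worth confirming with a query that nothing still references it. Here is a minimal sketch of such a check (the URL, credentials and the "Action" issue type are illustrative; it assumes the standard /rest/api/2/search endpoint):

# check_issue_type_usage.py - sketch for verifying that an issue type is no longer used anywhere.
import requests

JIRA_URL = "https://jira.example.com"          # illustrative
AUTH = ("admin.account", "password-or-token")


def issues_using_type(issue_type_name):
    """Return how many issues still use the given issue type (maxResults=0 fetches only the count)."""
    response = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": f'issuetype = "{issue_type_name}"', "maxResults": 0},
        auth=AUTH,
    )
    response.raise_for_status()
    return response.json()["total"]


if __name__ == "__main__":
    count = issues_using_type("Action")
    print("safe to delete" if count == 0 else f"still used by {count} issues")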

So we followed this rule (let's call it "top to bottom") for the other objects as well, and after a few weeks of work we had it all cleaned up. The biggest problem, of course, was with custom fields.

Determining whether a custom field is used or not requires more effort, since something that looks unused now may have been used in the past and still hold values in old tickets. So I recommend leaving custom fields until the end and double-checking everything a few times.
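
One way to double-check is to ask Jira, field by field, whether any issue (including old, closed ones) still holds a value. A minimal sketch, assuming the standard /rest/api/2/field and /rest/api/2/search endpoints (URL and credentials are illustrative):

# custom_field_usage_report.py - sketch: count how many issues still hold a value in each custom field.
import requests

JIRA_URL = "https://jira.example.com"          # illustrative
AUTH = ("admin.account", "password-or-token")


def count_issues(jql):
    response = requests.get(
        f"{JIRA_URL}/rest/api/2/search",
        params={"jql": jql, "maxResults": 0},
        auth=AUTH,
    )
    response.raise_for_status()
    return response.json()["total"]


def main():
    fields = requests.get(f"{JIRA_URL}/rest/api/2/field", auth=AUTH).json()
    for field in fields:
        if not field.get("custom"):
            continue
        numeric_id = field["id"].replace("customfield_", "")
        try:
            # "is not EMPTY" also matches old, closed tickets - exactly the trap described above.
            total = count_issues(f"cf[{numeric_id}] is not EMPTY")
        except requests.HTTPError:
            print(f'{field["name"]} ({field["id"]}): not searchable with JQL, check manually')
            continue
        print(f'{field["name"]} ({field["id"]}): {total} issues with a value')


if __name__ == "__main__":
    main()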

Step 12: Final touch

After removing those objects we were finally able to reduce the backups to an optimal size. Until then we had to keep two instances with the same huge database and disk footprint.

Speaking of disk space: did you know that removing a project does not remove its folder from the disk? Yes, so additional cleanup was required on the server side as well. We had to identify which projects were deleted and then (sadly) manually remove their attachment directories too. Of course there was other junk left as well (some thumbnails, etc.), so removing it also helped optimize disk space usage.

One thing to remember when preparing a report and marking something for deletion: sometimes a folder does not seem to match any project key, but the folder name is always connected to the initial key. Over time someone may have renamed the project, and that does not rename the folder on the server. So to be sure, check the old project keys in the database, where this information is stored until the project is deleted. If you do not do this, you might delete a folder that is still in use.
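
A minimal sketch of such a check, assuming a PostgreSQL database, the project_key table (which keeps the current and historical keys of a project until it is deleted) and the default attachment layout under the Jira home directory (paths and credentials are illustrative):

# find_orphan_attachment_dirs.py - sketch: list attachment folders that match no existing project key.
import os

import psycopg2

JIRA_HOME = "/var/atlassian/application-data/jira"   # illustrative path
ATTACHMENTS_DIR = os.path.join(JIRA_HOME, "data", "attachments")


def known_project_keys():
    """Collect current and historical keys of projects that still exist."""
    connection = psycopg2.connect(dbname="jiradb", user="jira", password="secret", host="localhost")
    with connection, connection.cursor() as cursor:
        cursor.execute("SELECT project_key FROM project_key")
        return {row[0] for row in cursor.fetchall()}


def main():
    keys = known_project_keys()
    for folder in sorted(os.listdir(ATTACHMENTS_DIR)):
        path = os.path.join(ATTACHMENTS_DIR, folder)
        if os.path.isdir(path) and folder not in keys:
            # Candidate only: review (and back up) before actually removing anything.
            print(f"orphan candidate: {path}")


if __name__ == "__main__":
    main()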

One of the remaining steps was to clean up all the plugin data. Yes, again we had to start with research to determine which group of people used a specific plugin. Sometimes it was obvious, because the plugin was restricted to a specific group (not open to everyone), but for plugins that add something to the UI at a global level things get more complicated. After a painful analysis we made a complete list of plugins that we would disable first and later (if nobody complained) remove.
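
One possible starting point for building such a list is the Universal Plugin Manager REST endpoint. A minimal sketch, assuming the /rest/plugins/1.0/ endpoint is available (field names can differ between versions; URL and credentials are illustrative):

# list_user_installed_plugins.py - sketch: list user-installed plugins as candidates for the disable-first review.
import requests

JIRA_URL = "https://jira.example.com"          # illustrative
AUTH = ("admin.account", "password-or-token")


def main():
    response = requests.get(f"{JIRA_URL}/rest/plugins/1.0/", auth=AUTH)
    response.raise_for_status()
    for plugin in response.json().get("plugins", []):
        if plugin.get("userInstalled"):
            state = "enabled" if plugin.get("enabled") else "disabled"
            print(f'{plugin.get("name")} ({plugin.get("key")}): {state}')


if __name__ == "__main__":
    main()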

By the way, here I have a request to all plugin vendors: always add a feature to restrict the use of a plugin to a specific group or individuals. Thanks to that it is not only possible to identify who is using it, but we (admins) are also able to install something for someone without affecting everyone on the instance. And please add separate plugin audit logs! :)

Step 13: Monitoring and Optimization

Finally, after all of that preparation, migration, and post-migration work, we reached the point where we could catch our breath. Now it was visible how both instances (source and target) behaved and how many resources we really needed. We started with one huge instance that used a lot of RAM and CPU, and as you remember we made a complete copy of the system, so now it was time to check whether we still needed that much power.

The good thing was that everything was stable, but after a few weeks we still optimized things a little so we could save some money on infrastructure. Of course many things also needed to be reorganized: scripts duplicated and adapted to the new environments, new IPs, etc. Everything was checked one by one and started to work like before.

After a few more weeks of monitoring both environments, everything was doing its job. We did not notice any more issues, so we were finally able to say that the whole migration and transition was fully completed!

Summary

To quickly summarize, I would like to say that this split was overall a success. It shows that everything is possible, but of course not everything is as simple as we initially think. Looking at the official documentation, it was only a few steps with not many details. This series of articles was created to show the additional details and specific phases that are also important and required. Of course every environment is different: one is small, another huge; some are not heavily customized, others are integrated from all sides. So every migration will be different, but I hope this information will be helpful to someone, at least to better plan and imagine how much effort might be required. I wish you successful splits and migrations.

I would like to thank you for taking the time to read through this series of articles, and if anything is unclear I am of course happy to answer any question related to this topic. Please leave a comment and I will definitely reply :)

2 comments

Jimi Wikman
Rising Star
January 1, 2024

Thanks for sharing Mirek, that is interesting reading for sure.

I have never been a fan of migration since I have never seen a setup that was clean, and moving to a new instance by bringing all that mess over is something I always try to avoid.

May I ask how clean your system was before you decided to split them, and what amount of cleaning you had to do in terms of removing old configurations like issue types, statuses, workflows and screens?

As a reference, I am working on cleaning our messy instance, where I estimate 3-5 years of cleaning will be needed, and with every split I would probably add another year or more to that estimate. This is because changing people's setup is a time-consuming activity, and just consolidating a dozen or so custom fields used in a few projects can take ages.

I also wonder if you saw any increase in the support needed to handle two instances rather than one and how that changed your strategies within the platforms?

Mirek
Community Leader
January 3, 2024

Hi @Jimi Wikman , 

Thank you for reading and replying. 

I was also not a fan of migrations, since I know that a lot of effort is required (especially if you have a big instance with a lot of customization), but sometimes you get to a point where it has to be done. I have already done a couple of "migrations" (splits, merges, platform and application changes) and cleanup is always one of the most important steps.

By cleanup I mean getting rid of something that is really not needed and focusing only on what is (or might be) used. Consolidation (when you, for example, merge two identical schemes or custom fields) is something a little bit different. It is a part of cleanup, but for me it is always the last step, done when things are stable. So if you estimate 3-5 years of cleanup, I can imagine how big an instance you are talking about, including of course removing and consolidating.

In the case of our split we did not do any cleanup before (only after), but we did a lot of analysis of how much belonged to each part. We agreed that since we were already living on one single instance that was not cleaned up, we could live with that a little longer, just on separate servers. But once we split, it became a priority to get rid of everything that did not belong to the specific groups/teams. So we pushed hard to eliminate unused objects as soon as possible. In the end it took around 2-3 months (it depends of course on how many people work on it at the same time; we divided the work, tested things on a test system to see the impact, and when we felt confident repeated the same steps on PRD).

I suggest breaking this huge estimate into small pieces. You always start with removing the main objects (projects/spaces) and then go deeper into the structure. I personally made a risk matrix and focused first on objects with low risk and low effort to remove (and to revert if needed). Thanks to that you start with small steps but keep moving forward. From this matrix it is also easily visible that you have to leave custom fields until the end :)

You generally need to get to a point where you start to see that things are getting smaller. You might still not know whether you can delete something or not, so it is better to leave it for a while and observe rather than take the risk of breaking something. So always focus first on those objects you are 100% sure can be touched, then look at what is left. After one round you will be much more aware of what you have on the instance, and that will be helpful later if you decide to do round 2. Always have a backup in some form (even a screenshot).

After the split we initially had to do more support on the new (target) instance, since people needed to adapt and we had to resolve issues that were not there previously. On the source instance we did not notice any increase, since for those users things were transparent or even better: the instance became faster once 1000 users moved somewhere else and we deleted 50% of the data. Of course we now have to do more work on the server side, since we have more machines, we had to build TST/DEV instances for both and perform all maintenance activities (like upgrades) that we previously did for one single instance. But overall, looking at the support load now, it has stabilized and there is no huge increase (although it is hard to say, since a lot of new things and processes have been implemented since the migration and this would require deeper analysis).
