I'm working on two different migration projects, and in both cases, CCMA is extremely slow and sometimes unresponsive.
Both Confluence instances are otherwise fast and work properly, so the problem seems to be specific to CCMA.
In the first case, the client has a Microsoft server environment (Confluence hosted on MS Server, with an MSSQL database). There, the bottleneck appears to be the communication with the database server, which causes a spike in the server CPU.
Words from support:
we could find on the catalina logs (Tomcat) a lot of stuck threads where the database was not able to handle the SQL queries responses putting them in a queue. That's why the CPU did a spike of 95% because it is receiving many more requests from the CCMA than it can handle.
In the second case, the client has a Linux-based server environment hosted in AWS. Our test server is a c5.2xlarge (CPU 8, Memory 16GB) with the database installed on the same server.
In this case, when the problem happens, the server CPU goes to 500% or 800%.
We can see this in the logs:
17-Aug-2022 00:25:36.452 WARNING [Catalina-utility-4] org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected Thread [http-nio-8090-exec-23 url: /rest/migration/latest/stats/usersGroups; user: gmuller] (id=) has been active for [67,496] milliseconds (since [8/17/22 12:24 AM]) to serve the same request for [http://34.211.xx.xxx:8090/rest/migration/latest/stats/usersGroups] and may be stuck (configured threshold for this StuckThreadDetectionValve is  seconds). There is/are  thread(s) in total that are monitored by this Valve and may be stuck.
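When catalina.out contains many of these warnings, it can help to pull out the thread, URL, and duration from each one to see which endpoints are stuck longest. The following is a minimal sketch, assuming the warnings follow the same layout as the line above (thread name followed by `url:` and `user:`, and a comma-grouped millisecond count). The function names and the `SAMPLE` line are only for illustration.

```python
import re

# Matches StuckThreadDetectionValve warnings shaped like the one quoted above.
# The "url:" and "user:" parts are optional, since a plain Tomcat thread name
# would not carry them.
STUCK_RE = re.compile(
    r"notifyStuckThreadDetected Thread \[(?P<thread>[^\s\]]+)"
    r"(?: url: (?P<url>[^;\]]+))?"
    r"(?:; user: (?P<user>[^\]]+))?\]"
    r".*?has been active for \[(?P<millis>[\d,]+)\] milliseconds"
)


def parse_stuck_threads(lines):
    """Return one dict per stuck-thread warning found in the given lines."""
    events = []
    for line in lines:
        match = STUCK_RE.search(line)
        if match is None:
            continue
        events.append({
            "thread": match.group("thread"),
            "url": match.group("url"),
            "user": match.group("user"),
            # Assumes "," is a thousands separator, as in "[67,496]".
            "seconds": int(match.group("millis").replace(",", "")) / 1000,
        })
    return events


def slowest_by_url(events):
    """Map each URL to the longest active time (in seconds) reported for it."""
    worst = {}
    for event in events:
        url = event["url"]
        worst[url] = max(worst.get(url, 0.0), event["seconds"])
    return worst


# Shortened copy of the warning above, used as sample input.
SAMPLE = (
    "17-Aug-2022 00:25:36.452 WARNING [Catalina-utility-4] "
    "org.apache.catalina.valves.StuckThreadDetectionValve.notifyStuckThreadDetected "
    "Thread [http-nio-8090-exec-23 url: /rest/migration/latest/stats/usersGroups; "
    "user: gmuller] (id=) has been active for [67,496] milliseconds "
    "(since [8/17/22 12:24 AM]) to serve the same request"
)

if __name__ == "__main__":
    for url, seconds in slowest_by_url(parse_stuck_threads([SAMPLE])).items():
        print(f"{url}: {seconds:.1f}s")
```

To run it against a real log, pass an open file handle for catalina.out to `parse_stuck_threads` instead of `[SAMPLE]`.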
After the "Check for errors" phase, we proceed to the "Review your migration" phase, and this is where the problem happens. The estimated time keeps spinning forever. If I leave CCMA working, it eventually shows a connection error.
The rest of the instance becomes unresponsive while CCMA is in that phase. For example, if I open another tab and try to navigate to any Confluence page, it will not load and will spin forever. If I close the CCMA tab, Confluence comes back to life at that same moment.
I tested with a very small number of spaces, only 5, and it took about 10 minutes to finish the estimated time calculation. For a customer that has 500 or 800 spaces, migrating in very small chunks is not a viable solution.
Any ideas? Is Confluence Cloud Migration Assistant really poorly optimized?
I was able to resolve the issue in both projects by downgrading the CCMA version.
I tested 3.3.1 and 3.2.3, and in both versions the problem doesn't happen and CCMA works smoothly. Ultimately, I'm using 3.3.1 since it supports attachments-only migration and has a new UI that allows more filters.
PS: I'm also using this dark feature to bypass the outdated CCMA version check: migration-assistant.disable.app-outdated-check
The root cause seems to be the latest version of CCMA. I tried to create a bug report at jira.atlassian.com, but they no longer allow end-users to create tickets there ¯\_(ツ)_/¯ so I don't know if the support engineers will create a bug or not.
This is an example of a REST endpoint that triggers a very long delay.
It seems to cause this query to hang for a long time:
select tasks0_.planId as planid13_18_0_, tasks0_.id as id2_18_0_, tasks0_.taskIndex as taskinde3_0_, tasks0_.id as id2_18_1_, tasks0_.taskIndex as taskinde3_18_1_, tasks0_.planId as planid13_18_1_, tasks0_.endTime as endtime4_18_1_, tasks0_.message as message5_18_1_, tasks0_.completionPercent as completi6_18_1_, tasks0_.doneResult as doneresu7_18_1_, tasks0_.startTime as starttim8_18_1_, tasks0_.executionStatus as executio9_18_1_, tasks0_.weight as weight10_18_1_, tasks0_.spaceKey as spaceke11_18_1_, tasks0_.scoped as scoped12_18_1_, tasks0_.taskType as tasktype1_18_1_ from mig_task tasks0_ where tasks0_.planId='d3f5df59-ab1f-43cf-adbe-bb42759bb06d';
Curiously, this query executes immediately when run from `psql`.
This is, of course, not on the same system, but on an AWS Amazon Linux instance.
CCMA is 3.3.7.
I obtained the query by modifying one shown in the output of the following:
SELECT * FROM pg_stat_activity where datname = 'confluence7_13_7' order by backend_start;
Interestingly, though, that query referenced "MIG_TASK", not "mig_task" but dunno if that could cause trouble.
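On the "MIG_TASK" versus "mig_task" question: PostgreSQL folds unquoted identifiers to lowercase, so the two spellings refer to the same table unless the name is wrapped in double quotes in the original statement. Whether CCMA quotes the name is not visible from the text above, so treat that part as an open assumption. The small sketch below only mimics the folding rule to show when the two names would differ; `fold_identifier` is a hypothetical helper, not part of any PostgreSQL client library.

```python
def fold_identifier(identifier: str) -> str:
    """Mimic how PostgreSQL resolves a single identifier.

    Unquoted identifiers are folded to lowercase. Double-quoted identifiers
    keep their case exactly, with "" inside the quotes meaning one literal ".
    """
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        return identifier[1:-1].replace('""', '"')
    return identifier.lower()


if __name__ == "__main__":
    # Unquoted: both spellings resolve to the same table name.
    print(fold_identifier("MIG_TASK") == fold_identifier("mig_task"))
    # Quoted uppercase: resolves to a different name than unquoted mig_task.
    print(fold_identifier('"MIG_TASK"') == fold_identifier("mig_task"))
```

So an unquoted MIG_TASK should be harmless, while a quoted "MIG_TASK" would only work if the table had actually been created with that uppercase name.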