When Fisheye is indexing Subversion it appears to do an "svn diff --summarize" and a full "svn diff" for each revision in the repository, even for revisions which are merely the creation of a branch. This means that a repository with large branches and a modest rate of branch creation takes an unreasonable amount of time to index. This behaviour seems completely unnecessary since there is no new source code to index when a branch is taken. It's merely a metadata change, one which is O(1) in Subversion.
For example, if I create a repository with a recent Linux kernel on the trunk and then branch it 100 times, it will take around 10 hours to index with Fisheye 4.2 on a fast machine. Our branches are larger than that, so we have found that Fisheye will not manage to index our repository at all, since it can't keep up with our relatively low rate of branch creation, on the order of a few a day.
My question is in two parts:
1) Is there anything we can do to improve indexing performance with the current version of Fisheye?
2) Are there any plans to fix this serious performance defect?
You're right, that is how FishEye handles SVN repositories. It's because everything in SVN is a convention. FishEye was designed to know about all paths in the repository. Files
/branches/branch1/file.txt are 2 completely different files from SVN's point of view. That the 2 are connected is purely a matter of convention. And in order to know about all files changed in a revision, FishEye uses the summarize command.
We have an open feature request to improve that handling. You can track it at https://jira.atlassian.com/browse/FE-3949.
In general, a good definition of the SVN symbolic rules helps indexing performance; however, it won't prevent FishEye from calling
svn diff --summarize
Thanks for your quick response.
I'm aware that branching and tagging is handled by convention. However, it's a convention which is very commonly followed and it's one that Fisheye already understands and relies on. The SVN symbolic rules are just a means of describing that convention for a repository. I certainly have no expectation that Fisheye will efficiently handle a repository which doesn't respect those conventions or where the symbolic rules are not set up correctly, but I do expect it to efficiently handle repositories where the conventions are respected.
Fundamentally, all the scanner needs to do is recognize that when a revision consists only of a copy from a directory identified as trunk, branch or tag by the SVN symbolic rules to another directory similarly identified by those rules, it is a branch or tag creation and should therefore be treated differently. This information is easily and cheaply obtained from the output of svn log -v; I've done it many times myself in various scripts I use to analyze our repository. There is certainly no need to use svn diff --summarize to get that information.
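A minimal sketch of that check, to make the idea concrete. It assumes the log has been exported with svn log -v --xml and that the repository uses a conventional /trunk, /branches/NAME, /tags/NAME layout. The patterns in ROOT_PATTERNS and the function names are hypothetical stand-ins for the SVN symbolic rules; this is not how Fisheye's scanner is implemented.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the SVN symbolic rules: paths that count as
# a trunk, branch or tag root in a conventionally laid out repository.
ROOT_PATTERNS = [
    re.compile(r"^/trunk$"),
    re.compile(r"^/branches/[^/]+$"),
    re.compile(r"^/tags/[^/]+$"),
]


def is_root(path):
    """Return True if path is a trunk, branch or tag root."""
    return any(p.match(path) for p in ROOT_PATTERNS)


def branch_creations(log_xml):
    """Yield (revision, source, destination) for revisions whose only
    change is a copy from one root to another, i.e. a pure branch or
    tag creation. Input is the text produced by `svn log -v --xml`."""
    for entry in ET.fromstring(log_xml).iter("logentry"):
        paths = entry.findall("paths/path")
        if len(paths) != 1:
            continue  # other changes in the same revision: not a pure copy
        changed = paths[0]
        src = changed.get("copyfrom-path")
        dst = (changed.text or "").strip()
        if changed.get("action") == "A" and src and is_root(src) and is_root(dst):
            yield int(entry.get("revision")), src, dst


# Hand-written sample in the shape of `svn log -v --xml` output.
SAMPLE = """<log>
<logentry revision="1"><paths>
<path action="A" kind="dir">/trunk</path>
<path action="A" kind="file">/trunk/file.txt</path>
</paths></logentry>
<logentry revision="2"><paths>
<path action="A" kind="dir" copyfrom-path="/trunk" copyfrom-rev="1">/branches/b1</path>
</paths></logentry>
<logentry revision="3"><paths>
<path action="M" kind="file">/branches/b1/file.txt</path>
</paths></logentry>
<logentry revision="4"><paths>
<path action="A" kind="dir" copyfrom-path="/trunk" copyfrom-rev="3">/tags/v1</path>
</paths></logentry>
</log>"""

for rev, src, dst in branch_creations(SAMPLE):
    print(f"r{rev}: {src} -> {dst}")
```

On the sample this reports revisions 2 and 4 as pure copies and skips revisions 1 and 3. The sketch is deliberately conservative: a copy that is modified within the same revision shows up as more than one changed path and is left for the normal indexing path.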
For these branching revisions I see no reason why Fisheye should need to reindex all the content on the branch since no new content has been created. I do understand that considerable metadata updating may be required but that would hopefully be much cheaper than reindexing the entirety of a branch.
I've added comments to FE-260 and FE-3949.