Improve FishEye repository scanning performance using Hadoop or something else

We have almost 600 Subversion repositories, some 30 GB or more in size. The last time we tried FishEye, the initial scan was taking over a month, and we decided it wasn't viable.

I'm wondering if Atlassian has considered supporting the use of Hadoop to improve scanning performance? Revisions seem like a unit of work that could be distributed to various nodes for processing.
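To illustrate what I mean by revisions being a distributable unit of work, here's a hypothetical sketch (FishEye exposes no such API; the function and its behavior are entirely my own invention for illustration):

```python
# Hypothetical sketch: partition a repository's revision range into
# contiguous chunks that independent worker nodes could index in
# parallel, map-reduce style. This is only an illustration of the
# idea, not anything FishEye or Hadoop provides out of the box.

def partition_revisions(head_revision, num_workers):
    """Split revisions 1..head_revision into contiguous ranges,
    one per worker, returned as inclusive (start, end) tuples."""
    chunk = -(-head_revision // num_workers)  # ceiling division
    ranges = []
    start = 1
    while start <= head_revision:
        end = min(start + chunk - 1, head_revision)
        ranges.append((start, end))
        start = end + 1
    return ranges

if __name__ == "__main__":
    # e.g. 100,000 revisions spread across 4 scanning nodes
    for r in partition_revisions(100_000, 4):
        print(r)
```

Each range could then be scanned independently, with the per-range indexes merged afterwards — the merge step is the hard part, which is presumably why nobody has built this.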

Other than Hadoop, does anyone have other suggestions for improving the performance?

7 answers

1 accepted

Atlassian has not responded so presumably it isn't possible.

2 votes

Bah, accidentally deleted my comment. Be sure that all your repos are structured in the way FishEye likes: https://answers.atlassian.com/questions/19281/how-can-i-reduce-the-size-of-the-fisheye-indexes

I ended up writing something that automatically generates the exclusion rules.
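For what it's worth, a generator like that can be quite simple. This is a hypothetical sketch (the directory names and the Ant-style pattern syntax are my assumptions — check what your FishEye version actually accepts before using anything like it):

```python
# Hypothetical sketch of auto-generating FishEye exclude patterns for
# the copy-heavy Subversion directories (tags, vendor drops, sandboxes)
# that the linked answer identifies as the main source of index bloat.
# Directory names and glob syntax here are assumptions, not FishEye's
# documented behavior.

def generate_exclusions(top_level_paths):
    """Given a repo's top-level paths (e.g. from `svn list`),
    return exclude patterns for copy-heavy directories."""
    noisy = {"tags", "sandbox", "vendor"}
    patterns = []
    for path in top_level_paths:
        name = path.rstrip("/")
        if name in noisy:
            patterns.append("/%s/**" % name)
    return patterns

if __name__ == "__main__":
    print(generate_exclusions(["trunk/", "branches/", "tags/", "vendor/"]))
```

Running something like this over `svn list` output for each of your ~600 repos and feeding the patterns into each repository's exclude settings keeps the rules consistent without maintaining them by hand.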

"30 GB repos" doesn't tell us much: if it's binary files, FishEye doesn't care; if it's metadata, it does.

Some minor hints (svn)

  • using http:// instead of https:// improved the performance a lot
  • svn also has a direct file access mechanism, if you can afford to access the svn server's disks directly from the fisheye servers

We were using file:// URLs and the repositories were stored on SAN, so that wasn't the issue. Thanks though.

FWIW - anything other than file:// access is pretty much a non-starter for real life svn repos.


I found http://confluence.atlassian.com/display/FISHEYE/Best+Practices+for+FishEye+Configuration for general performance tips. I'm still interested in the idea of using Hadoop though. Atlassian?

Hi Justin,


We have also run into problems with big repositories.

We have over 300 repos, each bigger than 10 GB.

We are also interested in your idea of using Hadoop.

Do you know how Hadoop could be used with FishEye/Crucible, and how it would improve scanning performance?

As far as I know it isn't possible. I was hoping Atlassian would weigh in on the possibility.

There's a feature request for a 'scanning agent' -- https://jira.atlassian.com/browse/FE-1988 -- vote for it if you like it.

So far the only 'distributed scanning' option is to spin up an auxiliary instance and do the initial scanning there.

