My question is: what is the best practice, or when should I use the project-level re-index as opposed to the full system reindex? Clearly, time is a factor. Does the project-level re-index only affect fields, etc., that are used in or added to a given project?
That is another application as well. Initially I thought the point was that my users would not have the whole environment denied to them while one update was made. P.S. Thank you for your work on your Practical Jira Admin book. It has been my handbook, road map, and life saver in travelling through the land of Jira while getting started with this application.
What we've generally found in our performance testing and work on support cases is that Lucene activity normally only becomes a problem when you have especially slow hardware or significantly (say 10-20x) higher write activity than a typical instance would get. Either one is *usually* traceable back to some kind of environmental problem.

That said, the JIRA Enterprise team has discussed the possibilities of how we might improve the indexing problem. With the obligatory disclaimer that I'm not promising any of these will ever actually be delivered, much less giving timelines or anything of the sort, some of the things we've looked at include:

* Finding ways to avoid the full reindex when we have a more reliable way to identify which issues are affected. If you just modified a field configuration scheme, then only the projects that actually use that scheme need to be reindexed, not all issues in the system.
* Making upgrade tasks that "require" a reindex but do not seriously cripple the system in the meantime wait until the administrator says to do that reindex instead of doing so immediately
* Sharding the indexes by "active" vs. "archived". This would be a kind of on-server version of archiving with the advantage that it is much simpler to implement, but the disadvantage that all the field configurations, permission schemes, etc. that *really* cause the slowdowns would still be there, so there is far less benefit.
* Sharding the indexes by project, which would give the ability to do things like reindex just that one project without issues from other projects being involved in the segment merge activity. Similarly, searches that are limited to a single project would have a much smaller index to work with more naturally this way.
* Taking better advantage of Near Real Time search capabilities, which allow searches to see up to date information before it has actually been committed to disk. The advantage is that your writers no longer block the readers. The disadvantage is uncertain but might, for example, be a larger window for index inconsistencies on unclean shutdowns.
* Looking at other ways we could do this besides using Lucene, such as indexing directly into the database or into another technology (related or not) like elasticsearch or solr. This has the disadvantage of breaking the API but is almost certainly going to be necessary, regardless.

If you (meaning Chris, Matt, or anyone else) have your own ideas to contribute to the list of possible improvements we could make to this picture, then we would definitely love to hear more.
Chris, interesting ideas, thanks for sharing them. There is a test search app written to showcase some newer parts of Lucene at http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html It made me think about having a main Lucene index for non-archived issues and a separate Lucene index for archived issues. Mind you, Lucene could probably be configured differently in JIRA for large indexes and that might be a good start.
@Matt Doar [ServiceRocket]: One of the problems (and the reason why I say we're going to have to break the searching APIs at some point, with 7.0 being the obvious target) is that Lucene 4.0 is a *huge* API break from Lucene 3.x. In particular, IndexSearcher.close() went away, and much of our logic for how to interact with the searcher is based around this method, which no longer exists. Thanks, Lucene! Clearly it was a mistake for us to bleed a third-party API without an abstraction layer and this is a lesson we've learned elsewhere with Quartz and Guava as well, but repairing the damage is hard and a bit slow because the API breaks have to wait for .0 releases. The point is that while Lucene may have awesome new features, we can't take advantage of them until we can upgrade to a newer version of Lucene. Since that breaks the API, our first priority has to be future-proofing ourselves against this sort of thing by introducing a search API that hides the fact that Lucene is what you're talking to. In the future, it might not be.
As for configuring JIRA differently for large indexes, this is already possible. However, we have not done the necessary experiments to actually give advice on what settings should be changed and in what way. You can find these in jpm.xml (meaning you set them in jira-config.properties) with keys that start with "jira.index.mergepolicy." and these correspond with the settings in http://lucene.apache.org/core/3_2_0/api/all/org/apache/lucene/index/TieredMergePolicy.html
There are times when a reindex is needed due to a very specific configuration change, such as adding a new custom field. When the configuration change only applies to certain projects (say a "Purchase Order Number" for a PURCHASING project), reindexing everything just to get that one field to show up properly in the one project that's using it is wasteful.
JIRA cannot currently tell when the change only matters for a subset of the projects (this is an improvement that the JIRA Enterprise team is looking into), but presumably you can. Re-indexing just the affected projects is a way to get the new field to show up where it is needed with less immediate impact to the system. I believe you still have to eventually do a full (background is okay!) reindex to make the warning message go away, but at least you can get up and running immediately.
Another example: if you know that something is wrong with the index (say you had manually taken an index backup and restored it, and you know that it is out of date), you could selectively reindex your highest-priority projects first and leave lower-priority projects until later.
We haven't really thought out formal "best practices" around this; it is just another tool for knowledgeable administrators to work with.
As to questions about specific fields, etc.: there is no such thing as partially reindexing an issue's fields at this time. At the code level, the specific affected indexes can be selected (meaning issues, comments, or change history, with worklogs slated to be indexed as well in 6.4). But the indexing technology (Lucene) doesn't understand the concept of modifying just one field of an indexed document; it only knows how to delete the previous document and create a complete replacement for it. Although I could see how we might be able to gain quite a bit of speed by creating the new document with just that one field changed, it really is much more complicated than this, because the field's value is generally not the only thing that changes. For example, to distinguish between a field being present-but-empty and it being irrelevant due to field configurations, there is also a field that tracks all of the visible fields, and that value would change, too.
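The delete-and-replace behavior described above can be illustrated with a toy in-memory index. This is only a sketch in Python, not Lucene or JIRA code: `ToyIndex`, `update_document`, `build_document`, and the `visible_fields` key are hypothetical names made up for illustration, and the sample issue data is invented.

```python
class ToyIndex:
    """Toy stand-in for a document index (hypothetical; not Lucene's API)."""

    def __init__(self):
        self._docs = {}    # issue key -> complete indexed document
        self.deletes = 0   # number of old documents dropped
        self.adds = 0      # number of complete documents written

    def add_document(self, key, fields):
        self._docs[key] = dict(fields)
        self.adds += 1

    def update_document(self, key, fields):
        # There is no in-place edit of a single field: the old document
        # is dropped and a complete replacement is written.
        if key in self._docs:
            del self._docs[key]
            self.deletes += 1
        self.add_document(key, fields)

    def get(self, key):
        return dict(self._docs[key])


def build_document(issue, visible_fields):
    """Build the complete document, including the derived visible-field list."""
    doc = {name: issue[name] for name in visible_fields if name in issue}
    # Derived value: changes whenever the field configuration changes,
    # even if the issue's own data did not.
    doc["visible_fields"] = sorted(visible_fields)
    return doc


index = ToyIndex()
issue = {"summary": "Order toner", "po_number": "PO-17"}  # invented sample data

index.add_document("PUR-1", build_document(issue, ["summary"]))
# A field configuration change makes po_number visible for this project,
# so the whole document is rebuilt and swapped in.
index.update_document("PUR-1", build_document(issue, ["summary", "po_number"]))
```

Note that the replacement differs in two places, the new `po_number` value and the derived `visible_fields` entry, which is the complication mentioned above.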
From system-view-project-operations-sections.xml:

<web-item key="reindex_project" name="Reindex Project" section="system.view.project.operations" i18n-name-key="webfragments.view.project.operations.item.reindex.project.name" weight="40">
    <label key="admin.projects.reindex.project" />
    <link linkId="reindex_project">/secure/admin/IndexProject.jspa?pid=$helper.project.id</link>
    <condition class="com.atlassian.jira.plugin.webfragment.conditions.HasSelectedProjectCondition" />
    <condition class="com.atlassian.jira.plugin.webfragment.conditions.UserIsAdminCondition" />
</web-item>

From actions.xml:

<action name="project.IndexProject" alias="IndexProject" roles-required="admin">
    <view name="success">/secure/project/views/projectindex.jsp</view>

So it should require normal JIRA admin access, not system admin, meaning that customer admins in Cloud have enough permission to do this, as well.
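For anyone who wants to work through several projects in priority order, here is a minimal Python sketch that builds the project reindex page URLs from the link path in the web-item above. The base URL, the project ids, and the priority numbers are made-up placeholders, and the helper names are hypothetical. The sketch only builds URLs; whether the action can be triggered outside a logged-in admin browser session is not something the configuration above confirms.

```python
from urllib.parse import urlencode

# Path taken from the <link> element of the reindex_project web-item.
REINDEX_PATH = "/secure/admin/IndexProject.jspa"


def project_reindex_url(base_url, project_id):
    """Return the project reindex page URL for one project id."""
    query = urlencode({"pid": project_id})
    return f"{base_url.rstrip('/')}{REINDEX_PATH}?{query}"


def reindex_urls_by_priority(base_url, project_priorities):
    """Order projects by priority (lower number first) and build their URLs.

    project_priorities maps project id -> priority; both are chosen by the
    administrator, since JIRA itself does not rank projects this way.
    """
    ordered = sorted(project_priorities, key=project_priorities.get)
    return [project_reindex_url(base_url, pid) for pid in ordered]


# Placeholder instance URL and project ids, for illustration only.
urls = reindex_urls_by_priority(
    "https://jira.example.com/",
    {10200: 2, 10000: 1},
)
for url in urls:
    print(url)
```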
I'm very interested to know whether any improvements over the version active at this time have been made in later versions (we use 7.1.2). Our re-index time is very lengthy and we cannot lock users out, as we're 24/7. That said, we need a large number of customizations to individual projects and are investigating the project-level re-index as a way to decrease the time needed for a full re-index. Has anyone employed this method? I would love to hear the results.