What Jira operations cause synchronous activity in RMISynchronousCacheReplicator?

We suffered a brief network partition in our JIRA Datacenter setup. Among other exceptions, we noticed the usage of RMISynchronousCacheReplicator in some stacktraces in http threads. When do user facing threads cause synchronous activity? Can this be configured to be asynchronous?

Stacktrace from our incident:

java.net.ConnectException: Connection timed out
        at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:619)
        at sun.rmi.transport.tcp.TCPChannel.createConnection(TCPChannel.java:216)
        at sun.rmi.transport.tcp.TCPChannel.newConnection(TCPChannel.java:202)
        at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:130)
        at net.sf.ehcache.distribution.RMICachePeer_Stub.remove(Unknown Source)
        at net.sf.ehcache.distribution.RMISynchronousCacheReplicator.replicateRemovalNotification(RMISynchronousCacheReplicator.java:243)
        at net.sf.ehcache.distribution.RMISynchronousCacheReplicator.notifyElementRemoved(RMISynchronousCacheReplicator.java:229)
        at net.sf.ehcache.event.RegisteredEventListeners.internalNotifyElementRemoved(RegisteredEventListeners.java:144)
        at net.sf.ehcache.event.RegisteredEventListeners.notifyElementRemoved(RegisteredEventListeners.java:124)
        at net.sf.ehcache.Cache.notifyRemoveInternalListeners(Cache.java:2322)
        at net.sf.ehcache.Cache.removeInternal(Cache.java:2305)
        at net.sf.ehcache.Cache.remove(Cache.java:2207)
        at net.sf.ehcache.Cache.remove(Cache.java:2125)
        at net.sf.ehcache.constructs.EhcacheDecoratorAdapter.remove(EhcacheDecoratorAdapter.java:154)
        at com.atlassian.cache.ehcache.LoadingCache.remove(LoadingCache.java:208)
        at com.atlassian.cache.ehcache.DelegatingCache.remove(DelegatingCache.java:135)
        at com.atlassian.jira.avatar.CachingTaggingAvatarStore.getByIdTagged(CachingTaggingAvatarStore.java:104)
        at com.atlassian.jira.avatar.AvatarManagerImpl.getByIdTagged(AvatarManagerImpl.java:117)
        at com.atlassian.jira.avatar.AvatarManagerImpl.lambda$transformToJIRAAvatar$480(AvatarManagerImpl.java:421)
        at com.atlassian.fugue.Option$Some.fold(Option.java:421)
        at com.atlassian.fugue.Option.flatMap(Option.java:287)
        at com.atlassian.jira.avatar.AvatarManagerImpl.transformToJIRAAvatar(AvatarManagerImpl.java:421)
        at com.atlassian.jira.avatar.AvatarManagerImpl.processAvatarData(AvatarManagerImpl.java:376)
        at com.atlassian.jira.avatar.AvatarManagerImpl.readAvatarData(AvatarManagerImpl.java:488)
        at com.atlassian.jira.web.servlet.AvatarToStream.sendAvatar(AvatarToStream.java:29)
        at com.atlassian.jira.web.servlet.AbstractAvatarServlet.defaultDoGet(AbstractAvatarServlet.java:82)
        at com.atlassian.jira.web.servlet.ViewUserAvatarServlet.defaultDoGet(ViewUserAvatarServlet.java:39)
        at com.atlassian.jira.web.servlet.AbstractAvatarServlet.doGet(AbstractAvatarServlet.java:39)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:622)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:729)

1 answer

1 accepted

0 vote
Chris Fuller Atlassian Team Oct 24, 2016

When do user facing threads cause synchronous activity?

Essentially, any write operation to a synchronous cache or the initial access to a replicated cache that has not yet bootstrapped values from another node.  At one point in time the Users and Groups caches behaved this way, but I think that was fixed in 7.0.  I have 7.1.0 checked out and can tell by the code that is definitely fixed there, so the only cache I know would do this is the TaskManager's.

Can this be configured to be asynchronous?

It definitely can in the code:

    https://docs.atlassian.com/atlassian-cache-api/2.11.3/atlassian-cache-api/apidocs/index.html

I could be wrong, but my understanding is that we use asynchronous replication by default and that only caches that explicitly opt in to being synchronous will replicate in that manner.  The only such cache I'm aware of offhand is the one for the TaskManager, which is one of the few value-replicating caches in JIRA (most replicate invalidation only).

You should be able to tell from a thread dump which cache was doing synchronous replication.  Without knowing which caches were involved, it isn't really possible to fairly evaluate what the potential consequences of changing them from synchronous to asynchronous might be.  It could be that the cache's author was just being paranoid, or it could be that asynchronous replication destabilized behaviour for their code, so they specified synchronous replication as a workaround.

 

thanks. indeed it looked to be from a removal from a cache related to avatars.

Chris Fuller Atlassian Team Oct 25, 2016

Hrmmmm... That isn't a cache I would expect to need to be synchronous, so I may be mistaken about the default?  I don't work on JDC anymore, so this is mostly from memory.  I do know that avatars were moved from the core to a plugin, so there may be something that changed during that process as well.  I'd probably need a stack trace to figure out whether it's peculiar to the avatar cache or not.

i added the full stacktrace.

Chris Fuller Atlassian Team Oct 25, 2016

From the stack trace a bit of digging, my guesses are:

  1. I don't see any custom settings on this cache, so we seem to be using synchronous replication by default.  There may be an opportunity to improve performance here by making cache replication asynchronous instead, but that could also cause instability if parts of the system don't respond as well to eventual consistency semantics, so it would need a lot of testing.
  2. This particular cache is doing a removal (and thus a replicated event) on every read request, which is just plain dumb.  You should report this to support.  The nasty code is this (from CachingTaggingAvatarStore):

@Override
public Avatar getByIdTagged(final Long avatarId) {
    try {
        return taggedAvatars.get(avatarId, () -> tagAndRetrieve(avatarId)).orElse(null);
    } finally {
        avatars.remove(avatarId);
    }
}

 

That `remove` probably isn't necessary at all.  The point is that once an avatar is "tagged" with the metadata that confirms it came from JIRA, we don't need to hold onto the untagged version of it anymore.  But on a cache hit, that untagged one is already gone, so the remove on the untagged avatar is just wasteful.  It could be moved inside the loading function or possibly removed altogether, depending on what the surrounding code does (I didn't dig in further than that).

 

Chris Fuller Atlassian Team Oct 25, 2016

In short, I think you saw these stack up not because they are synchronous, but because they are unnecessarily and excessively common replications.

can you suggest the right support address for this?

Chris Fuller Atlassian Team Oct 25, 2016

I think https://support.atlassian.com/servicedesk/customer/portal/22 is the correct place.  (And if it isn't, they'll know what is and forward it on).

thanks so much.

Suggest an answer

Log in or Sign up to answer
How to earn badges on the Atlassian Community

How to earn badges on the Atlassian Community

Badges are a great way to show off community activity, whether you’re a newbie or a Champion.

Learn more
Community showcase
Published Sunday in Agility

You asked for it, so we delivered: images on issues have arrived

A picture tells a thousand words. And agility boards have just released their latest feature: cover images on issues – so now your board can tell a story at first glance. Upload attachmen...

228 views 2 11
Read article

Atlassian User Groups

Connect with like-minded Atlassian users at free events near you!

Find a group

Connect with like-minded Atlassian users at free events near you!

Find my local user group

Unfortunately there are no AUG chapters near you at the moment.

Start an AUG

You're one step closer to meeting fellow Atlassian users at your local meet up. Learn more about AUGs

Groups near you