Sunday, February 14, 2010

A funny thing happened to my ActiveMQ Performance

In ActiveMQ we have a message cursoring system to ensure that there is no memory restriction on the size of the queues we use. However, we do cache messages to improve performance - as depicted below:


The cursor cache is limited by memory constraints that are defined in the ActiveMQ configuration. Usually the cache is valid, but if the memory limits are hit - or consumers lag too far behind producers - the cache is cleared down. However, for every message dispatched through the cursor after the cache is cleared, the cursor checks whether the Queue is empty in the store to determine if it can start using the cache again.
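The cache behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the ActiveMQ cursor code: the class `CursorCacheSketch`, its fields and the in-memory `store` queue are all hypothetical stand-ins, and the "memory constraint" is reduced to a simple message count.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the cursor cache idea; not the real ActiveMQ classes.
class CursorCacheSketch {
    private final int cacheLimit;                           // stand-in for the configured memory limit
    private final Deque<String> cache = new ArrayDeque<>();
    private final Deque<String> store = new ArrayDeque<>(); // stand-in for the persistent store
    private boolean cacheEnabled = true;
    int emptyChecks = 0;                                    // how often we had to ask the store

    CursorCacheSketch(int cacheLimit) {
        this.cacheLimit = cacheLimit;
    }

    void add(String message) {
        store.addLast(message);              // messages always go to the store
        if (cacheEnabled) {
            if (cache.size() < cacheLimit) {
                cache.addLast(message);
            } else {
                cache.clear();               // limit hit: clear the cache down
                cacheEnabled = false;
            }
        }
    }

    String dispatch() {
        String message = store.pollFirst();
        if (cacheEnabled) {
            cache.pollFirst();               // cache mirrors the store while it is valid
        } else {
            emptyChecks++;                   // one store check per dispatched message
            if (store.isEmpty()) {
                cacheEnabled = true;         // caught up: safe to start caching again
            }
        }
        return message;
    }

    boolean isCacheEnabled() {
        return cacheEnabled;
    }

    public static void main(String[] args) {
        CursorCacheSketch cursor = new CursorCacheSketch(2);
        cursor.add("a");
        cursor.add("b");
        cursor.add("c");                     // exceeds the limit, cache is cleared
        while (!cursor.isCacheEnabled()) {
            System.out.println("dispatched " + cursor.dispatch());
        }
        System.out.println("store checks: " + cursor.emptyChecks);
    }
}
```

The point to notice is the `else` branch in `dispatch()`: once the cache is invalid, the empty check runs for every single message, so its cost matters a lot.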

This all sounds eminently sensible. However, the code to check if the Queue was empty in the store called for a count of the outstanding messages. In order to determine the size of the Queue, the BTree index used for KahaDB would iterate through every BTree node and, er, count them. ActiveMQ would do this for every message dispatched after the cache became invalid.

Just to labour the point: for every message dispatched, every BTree node of the BTree index would be paged into memory in the same transaction and counted ...
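To put some rough numbers on that, here is a toy comparison of "count everything, then compare with zero" against a direct emptiness check. It is only a sketch under simplifying assumptions: `IndexEmptyCheck`, `Node`, `Index` and the `nodesLoaded` counter are hypothetical names, the tree is a flat root-plus-leaves structure rather than a real BTree, and it is neither the KahaDB code nor the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: these classes are hypothetical, not KahaDB's BTree index.
class IndexEmptyCheck {

    static final class Node {
        final List<Long> keys = new ArrayList<>();
        final List<Node> children = new ArrayList<>();
    }

    static final class Index {
        final Node root = new Node();
        long nodesLoaded = 0;   // stands in for index pages read into memory

        private Node load(Node node) {
            nodesLoaded++;
            return node;
        }

        // Expensive: walks the whole tree just to learn whether anything is there.
        long size() {
            return count(load(root));
        }

        private long count(Node node) {
            long total = node.keys.size();
            for (Node child : node.children) {
                total += count(load(child));
            }
            return total;
        }

        // Cheap: only the root needs to be looked at.
        boolean isEmpty() {
            Node r = load(root);
            return r.keys.isEmpty() && r.children.isEmpty();
        }
    }

    // Builds a root with the given number of leaf nodes, each holding some keys.
    static Index build(int leaves, int keysPerLeaf) {
        Index index = new Index();
        long key = 0;
        for (int i = 0; i < leaves; i++) {
            Node leaf = new Node();
            for (int k = 0; k < keysPerLeaf; k++) {
                leaf.keys.add(key++);
            }
            index.root.children.add(leaf);
        }
        return index;
    }

    public static void main(String[] args) {
        Index index = build(1000, 10);

        // Simulate 100 dispatches, each asking "is it empty?" by counting.
        for (int dispatched = 0; dispatched < 100; dispatched++) {
            boolean empty = index.size() == 0;
        }
        System.out.println("nodes loaded via size(): " + index.nodesLoaded);

        // Same 100 dispatches using the direct check.
        index.nodesLoaded = 0;
        for (int dispatched = 0; dispatched < 100; dispatched++) {
            boolean empty = index.isEmpty();
        }
        System.out.println("nodes loaded via isEmpty(): " + index.nodesLoaded);
    }
}
```

In this sketch the counting version touches every node on every dispatch, so the work grows with both the queue depth and the message rate, while the direct check stays constant per message.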

Although this is an issue with some serious side-effects - like being a memory hog when using KahaDB and seriously affecting performance - it does highlight that for distributed systems there is always going to be behaviour that you can't anticipate.

The reason I found this issue so interesting is that it wasn't obvious - it's an edge case that happens only under certain load conditions with certain configurations - but it was very straightforward to fix - which always makes me smile :)

This issue only affects KahaDB and is fixed in trunk, the ActiveMQ 5.3.1 branch and Fuse Message Broker versions - and for those interested - the code change is here.
