Feike Wierda

Living with Couchbase

Written by Feike Wierda on

The relationship between Copernica and Couchbase has not always been a happy one. On the one hand, we've been very happy with Couchbase's speed and flexibility; on the other, Couchbase has let us down (quite dramatically) more than once. Over the course of a few months, we've taken several steps, some big and some small, to increase Couchbase's reliability. Here are a few caveats we've found, mostly through trial and error.

Couchbase 1.8.x: Connect me not!
As with most new technologies we try, implementing Couchbase started small. And, as with most new technologies we try, as long as it's small, all is fine and dandy. Couchbase was no different: problems didn't start until we moved larger portions of our software to it. The first thing we found is that Couchbase 1.8.x, when running on a single node, will completely and utterly lock up when the number of connections rises above 2047. It will break. Hard.
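To see how close a node is to that ceiling, you can count its established connections. The following is a minimal sketch, not something from our production setup: it assumes a Linux host, reads text in the /proc/net/tcp format (IPv4 only), and the function name and the choice of port to watch are ours for illustration.

```python
CONNECTION_LIMIT = 2047  # the threshold we ran into on Couchbase 1.8.x
ESTABLISHED = "01"       # state code for ESTABLISHED in /proc/net/tcp


def count_established(proc_net_tcp, port):
    """Count established TCP connections whose local port is `port`.

    `proc_net_tcp` is the text content of /proc/net/tcp (IPv4 sockets only).
    """
    count = 0
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # The local address looks like "0100007F:2BCB": hex IP, hex port.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port == port and fields[3] == ESTABLISHED:
            count += 1
    return count


def near_limit(count, limit=CONNECTION_LIMIT, margin=0.8):
    """Return True once the count passes `margin` (a fraction) of the limit."""
    return count >= limit * margin
```

On a live node you would pass in the result of open("/proc/net/tcp").read() together with whichever port your clients connect to, and raise an alert when near_limit() returns True.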

Need. More. CPUs.
Couchbase 2.0.x specifies a minimum of four CPUs per node. Couchbase breaks when the per-CPU load average rises above 1, i.e. when Couchbase has to wait for CPU time. With Couchbase being very CPU hungry, that will happen when using only four CPUs, so it is advisable to go with more. By the way, Couchbase 2.0.0 is broken in more ways than one, so skip straight to 2.0.1.
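The per-CPU load average is simply the system load average divided by the number of CPUs. Here is a small sketch of that check using only Python's standard library; the function names and the default threshold of 1.0 are our own choices for illustration, and os.getloadavg() is only available on Unix-like systems.

```python
import os


def per_cpu_load(load_average, cpu_count):
    """Divide a system load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_average / cpu_count


def is_overloaded(load_average, cpu_count, threshold=1.0):
    """Return True when processes are, on average, waiting for CPU time."""
    return per_cpu_load(load_average, cpu_count) > threshold


def current_per_cpu_load():
    """Per-CPU load of this machine, based on the 1-minute load average."""
    one_minute, _five, _fifteen = os.getloadavg()
    return per_cpu_load(one_minute, os.cpu_count() or 1)
```

For example, a load average of 6.0 on a four-CPU node gives a per-CPU load of 1.5, which is over the line; the same load on eight CPUs gives 0.75.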

A virtual machine is not a node
We like virtualization; it makes us fast and flexible. Unlike us, Couchbase loathes virtualization. We have found that running Couchbase on virtual machines (specifically KVM) leads to unreliable results. Our advice: run Couchbase on physical machines at all times.

A node is not a cluster
For Couchbase to function the way it should, use as many nodes as you can get your hands on. After having struggled through several crashes when joining new nodes to a cluster in 2.0.0, we decided that a single node would probably be better. We were wrong. With multiple nodes spreading the load among each other, adding nodes leads to lower CPU usage on the individual nodes. It does, of course, introduce some overhead on the master node, so make sure you have ample CPUs available to prevent overloading.

Turn off autocompaction
With autocompaction enabled, Couchbase will automatically compact buckets when fragmentation reaches a certain percentage. Compaction reduces fragmentation and frees up disk space, so that is a good thing. We have found, however, that during compaction many connections to the cluster fail. Not compacting buckets at all will eventually consume all available disk space, which is a bad thing. To overcome this, we now compact buckets twice a day using a small script in a crontab.
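A script along these lines could trigger the compaction. This is a sketch rather than the script we actually run: it assumes the bucket compaction endpoint of the Couchbase 2.0 REST API (/pools/default/buckets/<bucket>/controller/compactBucket on the admin port 8091), so check the path against the documentation for your version. The host, bucket name and credentials are placeholders.

```python
import base64
import urllib.parse
import urllib.request


def compaction_request(host, bucket, user, password, port=8091):
    """Build the POST request that asks the cluster to compact one bucket.

    The endpoint path is an assumption based on the Couchbase 2.0 REST API.
    """
    path = "/pools/default/buckets/%s/controller/compactBucket" % (
        urllib.parse.quote(bucket, safe=""),
    )
    url = "http://%s:%d%s" % (host, port, path)
    request = urllib.request.Request(url, data=b"", method="POST")
    credentials = ("%s:%s" % (user, password)).encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


def compact_bucket(host, bucket, user, password, port=8091):
    """Send the compaction request and return the HTTP status code."""
    request = compaction_request(host, bucket, user, password, port)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status
```

Calling compact_bucket() once per bucket from a script scheduled with a crontab entry such as "0 3,15 * * *" would run the compaction twice a day; pick hours when traffic is low, since connections may fail while compaction runs.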

Set your limits
To prevent Couchbase from eating its way through too many open files, be sure to set the proper limits for the couchbase user. The default open files limit may very well be too low, which may lead to connection failures once that number is reached. Apparently, the proper config files will be added to the packages starting with version 2.0.2; in the meantime, you'll have to set them manually. To do this, add the following to /etc/security/limits.d/couchbase.conf:

couchbase - memlock unlimited
couchbase - nofile 100000
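
To confirm that the new limits are actually in effect, you can inspect them from a process started as the couchbase user in a fresh login session. The following is a minimal sketch using Python's standard resource module (Unix only); the function names are ours, and the target of 100000 mirrors the nofile value above.

```python
import resource


def limit_ok(soft_limit, wanted):
    """Return True if a soft limit is unlimited or at least `wanted`."""
    # RLIM_INFINITY can be a negative number, so test for it explicitly.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= wanted


def check_limits(nofile_wanted=100000):
    """Check the current process against the limits configured above."""
    nofile_soft, _nofile_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    memlock_soft, _memlock_hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return {
        "nofile": limit_ok(nofile_soft, nofile_wanted),
        "memlock": memlock_soft == resource.RLIM_INFINITY,
    }
```

If either entry in the result of check_limits() is False, the process is still running with the old limits, for instance because it was started before the file was added.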

Summing it all up
With the above modifications made, we have now been running Couchbase 2.0.1 without incident for a few weeks. Couchbase's near zero-conf setup suggests that it will run out of the box. The reality is that we had to stumble and fall quite a few times before finding a stable setup.