Managing distributed Solr Servers
We use the open-source search server Solr for real-time search on data stored in a Hadoop cluster. For our terabyte-scale dataset, we had to implement distributed search on multiple Lucene index partitions (shards). This article describes our solution to manage 40 independent Solr instances without human interaction, including transparent failover and automatic backup of the Lucene index.
In part 3 of this article series we discussed our use of Solr to enable real-time search on data stored in a Hadoop cluster. Because our data consists of a very large number of small documents (we treat a single log message, averaging 600 bytes, as one document), we need to split the index into multiple shards. In fact, we run one Solr instance on each data node of our Hadoop cluster.
Hadoop has many features to keep the cluster functional even when data nodes fail. Data node failures are detected automatically, and the blocks of data stored on the failed node are re-replicated to other nodes in the cluster. Where possible, tasks that were running on the failed node are restarted. In the end, the user of a Hadoop cluster does not even notice the failure of a data node.
Why we developed our own Solr Manager
Solr, on the other hand, offers only very limited management features out of the box. It is possible to remotely start and stop Solr cores, to deploy new indexes and also to perform distributed searches. But the default Solr distribution lacks sophisticated monitoring and cluster management tools.
My team at mgm technology partners has therefore implemented its own Solr manager with distributed management functionality similar to Hadoop:
- detect failed Solr instances
- detect new nodes in the Hadoop cluster
- gather metadata of each Solr core
- back up a Lucene index
- move an index from one Solr instance to another
With these features it is possible to have the same level of autonomy with respect to changes of the cluster topology as with the Hadoop software. We are able to detect failed nodes and move indexes which have been running on the failed node to other nodes in the cluster. Once the failed node is back online, this also gets detected and the indexes will be moved back to that server.
Monitoring the Health of Solr Instances
The Solr manager is a process that runs at regular intervals; we use an interval of 5 minutes. On each pass it identifies the current state of the Hadoop cluster, gathers data from all Solr instances and performs actions to keep the cluster healthy. The following paragraphs describe the individual steps of the process.
The first step is to determine which data nodes are running. We use the Hadoop API to retrieve the addresses of all running data nodes. We then iterate over this list and send a request to the Solr base URL on each data node to check whether a Solr instance is running there. The result of this step is a list of all running Solr instances in our cluster.
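The probing step can be sketched roughly as follows. This is a minimal illustration, not our production code: the list of data nodes is assumed to have been fetched from Hadoop already, and the port and path in `SOLR_BASE_URL` as well as the function names are hypothetical.

```python
from http.client import HTTPException
from typing import Callable, Iterable, List
from urllib.request import urlopen

# Hypothetical base URL pattern; the real port and path depend on the setup.
SOLR_BASE_URL = "http://{host}:8983/solr/"


def http_probe(host: str, timeout: float = 5.0) -> bool:
    """Return True if the Solr base URL on `host` answers with HTTP 200."""
    try:
        with urlopen(SOLR_BASE_URL.format(host=host), timeout=timeout) as response:
            return response.status == 200
    except (OSError, HTTPException):
        # Connection refused, timeout, HTTP error status, broken response, ...
        return False


def find_running_solr_instances(
    data_nodes: Iterable[str],
    probe: Callable[[str], bool] = http_probe,
) -> List[str]:
    """Keep only those data nodes on which a Solr instance responds."""
    return [host for host in data_nodes if probe(host)]
```

Passing the probe as a parameter keeps the filtering logic testable without a live cluster.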
The next step is to gather metadata about each Solr core. We use the CoreAdmin API to find out how many cores are running on each instance and how many documents are indexed in each core. Another very important piece of information is the optimize flag: if it is false, the index has been modified since it was last optimized. We need this flag to decide whether a core has to be backed up.
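As an illustration, the metadata could be extracted from a CoreAdmin STATUS response like this. The sketch assumes the response has already been fetched and decoded from JSON, and that each core reports `numDocs` and `optimized` under an `index` key; the exact layout depends on the Solr version, and `CoreInfo` and `parse_core_status` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class CoreInfo:
    host: str
    name: str
    num_docs: int
    optimized: bool


def parse_core_status(host: str, status_response: Dict) -> List[CoreInfo]:
    """Turn a decoded CoreAdmin STATUS response into one CoreInfo per core."""
    cores = []
    for name, core in status_response.get("status", {}).items():
        index = core.get("index", {})
        cores.append(
            CoreInfo(
                host=host,
                name=name,
                # Assumed keys; missing values fall back to conservative defaults.
                num_docs=int(index.get("numDocs", 0)),
                optimized=bool(index.get("optimized", False)),
            )
        )
    return cores
```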
To be able to correlate an index backup with a running Solr core, we need a mechanism to identify an index. To do this, each index contains a dummy document that carries a UUID as the shard name. We can identify each index with a simple query that returns only the dummy document. Once we have identified each index in the cluster, we have all the information we need to perform maintenance actions.
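A lookup of the dummy document might look like the sketch below. The field names `doctype` and `shard_name` and the marker value are purely hypothetical, since the actual schema is not shown here; the response is assumed to be the decoded JSON of a standard Solr select request.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

# Hypothetical schema details, chosen only for illustration.
MARKER_QUERY = "doctype:shard_marker"
SHARD_NAME_FIELD = "shard_name"


def marker_query_url(core_url: str) -> str:
    """Build a select URL that returns only the dummy document of a core."""
    params = urlencode(
        {"q": MARKER_QUERY, "fl": SHARD_NAME_FIELD, "rows": 1, "wt": "json"}
    )
    return f"{core_url.rstrip('/')}/select?{params}"


def extract_shard_uuid(select_response: Dict) -> Optional[str]:
    """Read the shard UUID from a decoded select response, if present."""
    docs = select_response.get("response", {}).get("docs", [])
    if not docs:
        return None
    return docs[0].get(SHARD_NAME_FIELD)
```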
Backing up and restoring a Lucene index
We use the Hadoop filesystem (HDFS) as central storage for our indexes. Each index is backed up to HDFS whenever it has changed. We use the optimize flag to detect changes. In addition, we monitor the number of documents in the index to make sure that we do not start a backup while data is still being indexed into the shard. A backup is only started when the optimize flag is false and the number of documents has not changed since the last run of the Solr manager.
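The backup condition boils down to a small predicate. The following sketch uses hypothetical names; how to treat a core that was not seen in the previous run is an assumption (here the manager simply waits one more cycle).

```python
from typing import Optional


def should_back_up(
    optimized: bool, num_docs: int, previous_num_docs: Optional[int]
) -> bool:
    """Decide whether a core should be backed up in this manager run."""
    if optimized:
        # Index unchanged since the last optimize: nothing new to back up.
        return False
    if previous_num_docs is None:
        # Assumption: a core without a previous observation waits one cycle.
        return False
    # Only back up once indexing into this shard has settled.
    return num_docs == previous_num_docs
```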
The backup is performed by a small shell script on the data node hosting the index. The script triggers an optimize of the index. In Solr we have configured a post-optimize hook which runs the snapshooter script after the optimize. Once a snapshot of the index has been created, our backup script zips it and writes it to HDFS.
Backups are performed in parallel, so that we utilize the performance of our Hadoop cluster.
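One possible way to run such jobs concurrently is a thread pool, as in the sketch below. The `action` callable stands in for whatever triggers the backup script on a data node; it and the function name are hypothetical, and the pool size is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_in_parallel(
    shards: Iterable[str],
    action: Callable[[str], None],
    max_workers: int = 8,
) -> Dict[str, bool]:
    """Run `action` for every shard concurrently; map each shard to success."""

    def attempt(shard: str) -> bool:
        try:
            action(shard)
            return True
        except Exception:
            # A failed job must not abort the jobs for the other shards.
            return False

    shard_list = list(shards)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        outcomes = list(pool.map(attempt, shard_list))
    return dict(zip(shard_list, outcomes))
```

Shards whose job failed are reported as `False`, so a later manager run can retry them.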
We can now read the UUIDs of all backups stored in HDFS. By comparing these UUIDs with the UUIDs of the indexes hosted by Solr in the Hadoop cluster, we can detect whether all index shards are currently deployed in running Solr instances, or whether there are indexes stored in HDFS that are not deployed in any Solr core.
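The comparison itself is a plain set difference. A minimal sketch with a hypothetical function name, assuming both UUID lists have already been collected:

```python
from typing import Iterable, List


def find_undeployed_shards(
    backed_up: Iterable[str], deployed: Iterable[str]
) -> List[str]:
    """Return UUIDs that have a backup in HDFS but no running Solr core."""
    return sorted(set(backed_up) - set(deployed))
```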
If there are indexes stored in HDFS which are not deployed on any of the data nodes, we create new Solr cores on the data nodes and restore the indexes into these cores. This is done with another shell script, which runs on the data node where we want to restore the index. The script copies the zipped index from HDFS, unzips it into the Solr data directory of the new core and starts the new core.
This restore process is performed in parallel to utilize all data nodes in the Hadoop cluster.
Rebalancing the cluster
Once we have ensured that every index stored in HDFS is running in a Solr core somewhere in the cluster, we move on to optimizing the performance of our cluster. This is done by ensuring that the indexes are evenly distributed among all available Solr instances. We aim for 2 cores on each instance. If there are instances running fewer than 2 cores and also instances running more than 2 cores, we move an index from one of the overloaded instances to a vacant instance.
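The rebalancing rule can be sketched as a simple planning loop. This is an illustrative greedy variant under our own assumptions (most loaded instance first, emptiest instance as destination), not necessarily the exact strategy of the Solr manager; all names are hypothetical.

```python
from typing import Dict, List, Tuple

TARGET_CORES = 2  # desired number of cores per Solr instance


def plan_moves(
    cores_per_instance: Dict[str, List[str]], target: int = TARGET_CORES
) -> List[Tuple[str, str, str]]:
    """Plan (shard, source, destination) moves from overloaded to vacant instances."""
    # Work on a copy so that the caller's view of the cluster stays untouched.
    load = {host: list(shards) for host, shards in cores_per_instance.items()}
    moves: List[Tuple[str, str, str]] = []
    while True:
        overloaded = [h for h in sorted(load) if len(load[h]) > target]
        vacant = [h for h in sorted(load) if len(load[h]) < target]
        if not overloaded or not vacant:
            # Moves only happen while both kinds of instance exist.
            return moves
        source = max(overloaded, key=lambda h: len(load[h]))
        destination = min(vacant, key=lambda h: len(load[h]))
        shard = load[source].pop()
        load[destination].append(shard)
        moves.append((shard, source, destination))
```

Each planned move would then be carried out with the backup and restore mechanism described earlier.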
After the rebalancing, our Lucene indexes should be evenly distributed across the data nodes. If we still have Solr instances running fewer than 2 cores, we can use them to create completely new index shards. This is usually the case when we add new data nodes to the cluster.
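How many new shards fit into the cluster follows directly from the target of 2 cores per instance. A small sketch with a hypothetical function name:

```python
from typing import Dict


def count_free_slots(core_counts: Dict[str, int], target: int = 2) -> int:
    """Count how many new shards can be created without exceeding the target."""
    return sum(max(0, target - count) for count in core_counts.values())
```

For example, a freshly added data node with no cores contributes two free slots.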
Now the Solr management cycle is finished. During this cycle we have ensured that we utilize all available Solr instances and that all our index shards are running in a Solr core. We also know how many documents are stored in each index, so that we can use the index with the smallest number of documents to index new data.
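Choosing the target for new data is then a minimum over the collected document counts. A minimal sketch; the function name is hypothetical, and breaking ties by shard name is our own assumption to keep the choice deterministic.

```python
from typing import Dict, Optional


def pick_shard_for_new_data(doc_counts: Dict[str, int]) -> Optional[str]:
    """Return the shard with the fewest documents, or None if there are no shards."""
    if not doc_counts:
        return None
    # Ties are broken by shard name (an assumption, for determinism).
    return min(doc_counts, key=lambda shard: (doc_counts[shard], shard))
```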
The Solr manager has proved to be very reliable in our test scenarios and has kept human intervention in daily cluster administration to a minimum. The algorithm is very robust and is able to cope with the standard failure scenarios we face.
We still have to cope with some limitations:
- The Solr manager is a single point of failure, similar to the Hadoop namenode.
- If more than one third of the data nodes fail, the manager is not able to restore all indexes. If this happens, the cluster will still be able to accept new data, but search results might be incomplete.
In a cluster with 40 data nodes, failures of single data nodes do happen. Without the monitoring and maintenance functionality of the Solr manager, each failure would lead to manual administration efforts. Thanks to the manager we have been able to run the cluster despite data node failures without the need for manual intervention.