We are currently experiencing a Geospatial Revolution that changes in how we navigate from A to B and how we search for locations like a specific sight or restaurants nearby. Geospatial search technology provides such information. This article shows how commercial applications can utilize geospatial search, e.g. for real estate search (qualifing real estates by their distance to the nearest kindergartens, schools, doctors, etc.), calculating building density in cities and so on.
For our recent online shop project, we required a full-text, multi-criteria product search. Lucene, the popular Java search engine, is an ideal candidate for this functionality. But in order to meet the high performance requirement, we had to extend its usage beyond standard full-text search. This posting describes our solution including index switching and using Lucene as a simple NoSQL database.
We use the open-source search server Solr for real-time search on data stored in a Hadoop cluster. For our terabyte-scale dataset, we had to implement distributed search on multiple Lucene index partitions (shards). This article describes our solution to manage 40 independent Solr instances without human interaction, including transparent failover and automatic backup of the Lucene index.
In the previous part of this article series we focused on the efficient storage of log data in Hadoop. We described how to store the data in Hadoop’s MapFiles, and we tweaked the configuration settings for increased data storage capacity and greater retrieval speed. Today, we discuss how to perform near real-time searches on up to 36.6 billion log messages by a clever combination of Hadoop with Lucene and Solr.