
Elasticsearch vs Hadoop Comparison

Elasticsearch vs Hadoop: a usage comparison. A discussion of where Elasticsearch fits in and where Hadoop excels, with use cases for both.

Elasticsearch

is for search.

It is a fast, eventually consistent search engine. You index documents in JSON format, and you can define mappings to specify the types of your fields.
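To make the idea of JSON documents and mappings concrete, here is a minimal sketch in Python. The field names, the sample document and the check_document helper are all made up for illustration. The body layout ("mappings" then "properties") is assumed to follow recent Elasticsearch versions, and older versions nest it differently. Nothing here talks to a cluster; it only builds the JSON you would send.

```python
import json

# Hypothetical mapping: field names and types are illustrative only.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "author": {"type": "keyword"},    # exact-match value
            "published": {"type": "date"},
            "views": {"type": "integer"},
        }
    }
}

# A sample document whose fields match the mapping above.
document = {
    "title": "Elasticsearch vs Hadoop",
    "author": "jdoe",
    "published": "2015-06-01",
    "views": 42,
}


def check_document(doc, mapping):
    """Return the sorted names of fields in doc that the mapping does not declare."""
    declared = mapping["mappings"]["properties"]
    return sorted(name for name in doc if name not in declared)


# Serialize both to JSON, the format Elasticsearch accepts over HTTP.
mapping_json = json.dumps(mapping, indent=2)
document_json = json.dumps(document)
print(mapping_json)
print(document_json)
```

The helper is only a local sanity check that a document does not introduce fields you forgot to declare. Elasticsearch itself decides how to treat undeclared fields.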

“Elasticsearch is a real-time distributed search and analytics engine. It allows you to explore your data at a speed and at a scale never before possible. It is used for full-text search, structured search, analytics, and all three in combination”


Some common use cases are:

– Shipping all your server event logs, user logs and application logs to Elasticsearch with Logstash, a log collection tool, and analyzing them with Kibana, Elastic's own analytics UI.

– Storing your application data in Elasticsearch. Almost all non-transactional data can be stored there in a structured format.
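As a rough illustration of the logging use case, the sketch below turns one raw log line into a structured JSON document of the kind you would index. The log format, the regular expression and the log_line_to_document function are assumptions made up for this example. This is not how Logstash is implemented or configured.

```python
import json
import re

# Assumed log format: "<ISO timestamp> <LEVEL> <free-text message>"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+) (?P<level>[A-Z]+) (?P<message>.*)$"
)


def log_line_to_document(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return match.groupdict()


line = "2015-06-01T12:00:00Z ERROR disk full on /dev/sda1"
doc = log_line_to_document(line)
print(json.dumps(doc))
```

Once the logs are structured like this, a field such as level can be filtered and aggregated instead of being searched as plain text.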

The Elasticsearch documentation provides the following examples:

  • Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.
  • The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles.
  • Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
  • GitHub uses Elasticsearch to query 130 billion lines of code.

Hadoop

is for big data analytics.

Hadoop, on the other hand, can store massive amounts of data. It can scale to thousands of machines, although Elasticsearch can scale similarly. The difference is that Elasticsearch is more like the internet, where every node knows about the other nodes, or at least will eventually learn about them, whereas Hadoop manages nodes in a master/slave arrangement in which all the data nodes in a cluster are known to one master node.

What sets them apart?

Hadoop’s MapReduce model for processing large datasets, together with its tool set for processing, querying and analyzing them, sets it apart from Elasticsearch.
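For readers who have not seen MapReduce, here is a toy word count in plain Python that mimics its three stages: map, shuffle and reduce. It runs in a single process on an in-memory list, so it shows only the shape of the programming model and not Hadoop's actual API or its distributed execution. The function names and the sample lines are made up for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input; in Hadoop this would be files spread across data nodes.
logs = ["error disk full", "warning disk slow", "error network down"]
counts = word_count(logs)
print(counts)
```

In a real cluster the map and reduce functions run in parallel on many machines, and the shuffle moves data between them, which is what lets the same simple model scale to very large inputs.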

Which one do you want to go for?

The size of your datasets should be the deciding factor. Although Elasticsearch can handle huge datasets much as Hadoop can, for the time being Hadoop is the better fit for extremely large datasets in the petabyte range.

If it were me, I would use Elasticsearch on top of Hadoop.