Bulk loading data into Elasticsearch

Every database needs a way to load many records at once. With MySQL, for instance, you can import a CSV file. Elasticsearch has a way too, and it generally works well. There are just a few important points about working with it that can trip you up.

  1. The Bulk API entry point. In this example, let's say we have an index called ‘library’ (referenced in the previous blog post on ES) and in it a type named, simply, ‘books’. The API URL has _bulk on the end of it:

    http://localhost:9200/library/books/_bulk

    As it happens, the index and type parts of this are not really needed, because you can put that information into the data file itself. So all you really need is this:

    http://localhost:9200/_bulk

  2. You can do a curl POST to this endpoint to load your data file. The second point is to use the --data-binary option, which sends the file exactly as written; the plain -d/--data option strips out the newline characters that the bulk format depends on. You can reference the actual data file with @data_file_path. So the curl call looks like this:

    curl -XPOST 'http://localhost:9200/library/books/_bulk' --data-binary @library_entries.json

  3. Now, about the data file itself. It is not a single legal JSON document, but rather a series of JSON objects, one per line (a format often called newline-delimited JSON, or NDJSON). Here is an abbreviated sample:
    { "index" : { "_index" : "library", "_type" : "books" } }
    { "isbn13" : "978-0553290998", "title" : "Nightfall", "authors" : "Isaac Asimov, Robert Silverberg" }
    { "index" : { "_index" : "library", "_type" : "books" } }
    { "isbn13" : "978-0141007540", "title" : "Empire: How Britain Made the Modern World", "authors" : "Niall Ferguson" }
    { "index" : { "_index" : "library", "_type" : "books" } }
    { "isbn13" : "978-0199931156", "title" : "The Rule of Empires", "authors" : "Timothy H. Parsons" }

    Read the data file as pairs of lines. The first line of each pair tells ES what to do: in this case, index (i.e. insert) a document into the specified index and type. You can also create, update, or delete documents. Note that a delete action stands alone, with no second line after it.

    The second line is the actual content: the document to store. The one trick here is that the full entry has to be on ONE LINE. ES uses newlines to tell where each line ends, so if you split an entry across several lines (say, to make it more readable), ES will fail to parse it. So: one line per entry.

  4. Finally, the data file must end with a newline after the last entry (in many editors this looks like a blank last line)! If it doesn't, you will likely lose the last entry.
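To tie these points together, here is a minimal Python sketch that builds a bulk body from a list of documents, optionally writes it to a file for the curl call, and prepares the POST without sending it. Python is just my choice for illustration, and the helpers build_bulk_body, write_bulk_file and make_bulk_request are hypothetical names. The sketch assumes the ‘library’ index and ‘books’ type from the example and the localhost:9200 URL from the curl call. Calling json.dumps without an indent argument keeps each document on one line (newlines inside string values are escaped), and the body always ends with a newline. The Content-Type: application/x-ndjson header is not needed by the 2.x releases, but newer releases expect it. Passing doc_type=None leaves out _type, for versions that have dropped mapping types.

```python
import json
import urllib.request


def build_bulk_body(docs, index="library", doc_type="books"):
    """Build a bulk request body: an action line followed by a source line
    for each document, with a newline after the very last line."""
    if not docs:
        raise ValueError("no documents to load")
    lines = []
    for doc in docs:
        meta = {"_index": index}
        if doc_type is not None:
            # Mapping types match the 2.x-era example; pass None to leave _type out.
            meta["_type"] = doc_type
        # First line of the pair: what ES should do with the next line.
        lines.append(json.dumps({"index": meta}))
        # Second line: the document itself. With no indent argument, json.dumps
        # keeps it on one line and escapes newlines inside string values.
        lines.append(json.dumps(doc))
    # The trailing newline keeps ES from dropping the last entry.
    return "\n".join(lines) + "\n"


def write_bulk_file(path, docs, **kwargs):
    """Write the bulk body to a file suitable for curl --data-binary @path."""
    # newline="\n" disables newline translation, so Windows won't write \r\n.
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(build_bulk_body(docs, **kwargs))


def make_bulk_request(body, url="http://localhost:9200/_bulk"):
    """Prepare (but do not send) a POST that carries the body bytes unchanged,
    much like curl --data-binary."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
```

Against a running node, urllib.request.urlopen(make_bulk_request(build_bulk_body(docs))) should send it. It is worth checking the "errors" flag in the JSON response, since a bulk request can succeed overall while individual entries fail.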

External References
Bulk API | Elasticsearch Reference [2.4]