Elasticsearch: stored scripts for bulk updates

First published: 2013-12-11

I’ve been trying to improve my game with Elasticsearch and found myself in a situation where I needed to update thousands of records in an index. Depending on their existing field values, some of those records wouldn’t need to be updated, but that couldn’t be determined without fetching them first.

Given the number of records and the fact that many similar operations would run concurrently, the risk of race conditions was high.

Then I heard about scripts, which can be used in bulk update requests. Here is a very simple example:

POST http://127.0.0.1:9200/_bulk
{"update":{"_index":"my_index","_type":"my_type","_id":"id1"}}
{"script":"ctx._source.counter += value","params":{"value":10}}
{"update":{"_index":"my_index","_type":"my_type","_id":"id2"}}
{"script":"ctx._source.counter += value","params":{"value":4}}
…

Scripts are really useful. You can use MVEL (the default embedded language), JavaScript, native Java or even Python. Some people have even managed to use JRuby scripts.
A scripted update is slower than a plain one, but it saves some network round trips and lets Elasticsearch decide whether and how each record must be updated.
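
The bulk body shown above is newline-delimited JSON: one action line followed by one payload line per document. As a rough sketch, it could be generated like this in Python. The helper name build_bulk_body and the sample data are made up for illustration, and the request format (the "_type" field, top-level "script" and "params" keys) simply mirrors the example in this post, so it may differ in other Elasticsearch versions.

```python
import json


def build_bulk_body(index, doc_type, script, updates):
    """Build the body of a _bulk request made of scripted updates.

    `updates` maps a document id to the params handed to the script.
    Hypothetical helper that mirrors the request format used in this post.
    """
    lines = []
    for doc_id, params in updates.items():
        # Action line: which document to update.
        action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Payload line: the script and the params it can reference by name.
        lines.append(json.dumps({"script": script, "params": params}))
    # Every line of a bulk body, including the last one, ends with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "my_index",
    "my_type",
    "ctx._source.counter += value",
    {"id1": {"value": 10}, "id2": {"value": 4}},
)
print(body)
```

The resulting string is what would be sent as the body of the POST to /_bulk. Note that the key in "params" has to match the variable name used in the script ("value" here).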

If the script is the same for a lot of updates, you can also choose to store it on the node and just reference it in the update action.

You can store it in config/scripts (you might have to create this directory). The base directory depends on your installation. The .deb package puts it in /etc/elasticsearch/. The Homebrew package puts it in /usr/local/Cellar/elasticsearch/_version_/config/scripts.

Create a file config/scripts/myscript.mvel and make sure it is accessible to the user who runs the Elasticsearch process.
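
Since accessibility is easy to get wrong (a commenter below lost time on a scripts directory that lacked the execute bit), here is a small Python sketch that checks it. The function name script_access_problems is invented for this example, and the check only reflects the permissions of whoever runs it, so it would have to be run as the same user as the Elasticsearch process to be meaningful.

```python
import os


def script_access_problems(script_path):
    """Return reasons why the current user may be unable to read script_path."""
    problems = []
    script_dir = os.path.dirname(os.path.abspath(script_path))
    if not os.path.isdir(script_dir):
        problems.append("missing directory: " + script_dir)
    elif not os.access(script_dir, os.R_OK | os.X_OK):
        # Without the execute bit a directory cannot be traversed,
        # so the files inside it are invisible to the process.
        problems.append("directory not readable/traversable: " + script_dir)
    elif not os.path.isfile(script_path):
        problems.append("missing file: " + script_path)
    elif not os.access(script_path, os.R_OK):
        problems.append("file not readable: " + script_path)
    return problems


if __name__ == "__main__":
    # Path used in this post; adjust it to your installation.
    for problem in script_access_problems("config/scripts/myscript.mvel"):
        print(problem)
```

An empty list means that nothing obvious is wrong with the permissions. It says nothing about whether the script itself compiles.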

Your update action then becomes:

POST http://127.0.0.1:9200/_bulk
{"update":{"_index":"my_index","_type":"my_type","_id":"id1"}}
{"script":"myscript","params":{"value":10}}
{"update":{"_index":"my_index","_type":"my_type","_id":"id2"}}
{"script":"myscript","params":{"value":4}}

Be careful with underscores in script names, since Elasticsearch uses them to map to a nested directory structure. For example, {"script":"my_perfect_script","params":{"value":4}} will look for a script at config/scripts/my/perfect/script.mvel
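
To make the naming rule concrete, here is a tiny Python sketch of the mapping as described above. It only restates the behaviour observed in this post (each underscore becomes a directory separator); the function name stored_script_path and its default arguments are hypothetical, and the actual lookup inside Elasticsearch may handle more cases.

```python
def stored_script_path(script_name, base_dir="config/scripts", extension="mvel"):
    """Guess where a stored script is looked up, per the rule described in this post."""
    # Assumption: every underscore in the name becomes a directory separator.
    return base_dir + "/" + script_name.replace("_", "/") + "." + extension


print(stored_script_path("my_perfect_script"))  # config/scripts/my/perfect/script.mvel
print(stored_script_path("myscript"))           # config/scripts/myscript.mvel
```

A name without underscores, such as myscript, maps straight to a file in config/scripts and avoids the surprise.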

I haven’t verified this yet, but it seems that the script must be copied to every node, and that the server may reload scripts automatically at regular intervals. Check the documentation for details.

According to the documentation and other sources, MVEL is convenient and easy to write (mine was done in a matter of minutes, as a first-time experience) but can be a little slow. When speed really matters, you can write native Java code. That takes a lot more boilerplate (it’s Java, right?) and the script must implement a predefined interface. I haven’t done this yet and will definitely post a follow-up if I do.

At first I had issues with the stored script. It was either not named properly or contained bugs, but the error message was less than informative. I found the answer in Elasticsearch’s log file: at startup, it complains if a script can’t be compiled (at least with MVEL scripts).


Comments

Jérémy Lecour 2014-02-10 10:16:08

I just struggled for more than 30 minutes with an MVEL script that would not run.

It will seem obvious to many of you, but it wasn’t to me (at least not immediately): the scripts directory didn’t have the execute bit set. As a result, Elasticsearch couldn’t see the script.

I maintain that these scripts are poorly handled by Elasticsearch, which should expose, through its API, the list of scripts that can actually be executed, along with any errors for scripts that were detected but cannot be executed (syntax errors, permission errors, …).