RumbleDB

Made for JSON

Query billions of objects.

Many datasets are in the JSON Lines format, with one JSON object per line.
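To make the format concrete, here is a minimal sketch in plain Python (standard library only, not RumbleDB itself). The three-record dataset and the helper name `read_json_lines` are made up for illustration.

```python
import json

# Hypothetical JSON Lines dataset: one complete JSON object per line.
DATASET = """\
{"id": 1, "name": "Ada", "langs": ["JSONiq", "SQL"]}
{"id": 2, "name": "Grace"}
{"id": 3, "name": "Linus", "langs": []}
"""

def read_json_lines(text):
    """Yield one parsed object per non-empty line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)

objects = list(read_json_lines(DATASET))
print(len(objects), "objects; first name:", objects[0]["name"])
```

Because every line is a self-contained object, a file in this format can be split at line boundaries and processed in parallel, which is what makes it a good fit for very large datasets.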

Just drop your dataset on HDFS and RumbleDB can read it directly from there and write its output right back to HDFS, or display it on your screen in an interactive shell.

RumbleDB also works with many other formats (Text, Parquet, CSV, SVM, ROOT...) and can read from many other file systems (local, S3, Azure, ...), and the list is growing.

JSON Lines website

Be productive with JSONiq

An intuitive, easy-to-learn language.

JSONiq is a declarative and functional language. It is user-friendly and easy to read and write, because it looks a lot like JSON.

JSONiq is as simple as SQL, but can do much more.

JSONiq website

Query nested data

An object, in an array, in an array, in an object.

Often, datasets are not in first normal form and data can be nested at multiple levels. But SQL was not made for nested data.

Forget EXPLODE() calls in Spark SQL and dot projections. JSONiq was born to read and write nested data.
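As an illustration of what "nested" means here, the sketch below builds a made-up record (an object, in an array, in an array, in an object) and walks it by hand in plain Python. The field names `order`, `boxes` and `sku` are hypothetical, and this is not RumbleDB code.

```python
import json

# Hypothetical record: an object, in an array, in an array, in an object.
record = json.loads(
    '{"order": 7, "boxes": [[{"sku": "a1"}, {"sku": "b2"}], [{"sku": "c3"}]]}'
)

def skus(rec):
    """Walk both array levels and yield the sku of every innermost object."""
    for box in rec["boxes"]:      # outer array
        for item in box:          # inner array
            yield item["sku"]     # innermost object

print(list(skus(record)))
```

Each level of nesting costs one more explicit loop here. In JSONiq, the same navigation would typically be a single path expression along the lines of `$record.boxes[][].sku`, where `[]` unboxes an array and `.sku` looks up a field.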

Query heterogeneous data

That's a string. Or an integer. Or an array. Or nothing.

Often, datasets are heterogeneous, especially when they have accumulated over years with evolving schemas.

When this happens, the data does not fit in DataFrames, and before RumbleDB one had no choice but to deal with this manually.

If a field is not always associated with a value of the same type: no worries. JSONiq handles this seamlessly.
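For a sense of what handling this by hand looks like, here is a small plain-Python sketch, assuming a made-up field `tag` that is sometimes a string, sometimes an integer, sometimes an array, and sometimes missing. It illustrates the problem only and says nothing about how RumbleDB works internally.

```python
import json

# Hypothetical records: the "tag" field changes type from line to line.
LINES = [
    '{"id": 1, "tag": "red"}',
    '{"id": 2, "tag": 42}',
    '{"id": 3, "tag": ["red", "blue"]}',
    '{"id": 4}',
]

def tags(obj):
    """Return the field as a list of values, whatever shape it has."""
    value = obj.get("tag")
    if value is None:            # field absent (or null): nothing to return
        return []
    if isinstance(value, list):  # array: already a collection of values
        return value
    return [value]               # string or integer: wrap the single value

all_tags = [t for line in LINES for t in tags(json.loads(line))]
print(all_tags)
```

Every case needs its own branch, and forgetting one fails only at runtime on the record that triggers it. A language whose data model is a sequence of arbitrary JSON items avoids this kind of per-type bookkeeping.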

Local execution

You won't believe what your laptop is capable of.

Under the hood, RumbleDB uses Apache Spark. Not all of us have clusters, so you can also use it to query JSON on your local drive. RumbleDB will spread the computations across your cores.

Apache Spark website

Query massive datasets

My computer is a cluster.

RumbleDB runs on large clusters of machines, with the data residing on any storage layer supported by Spark (HDFS, S3, ...). We have tested RumbleDB on up to 64 machines and on collections of more than 20 billion objects (10+ TB), but it supports any size that Apache Spark supports.

Try it now!

RumbleDB is now in beta.

Resources

JSONiq tutorial

JSONiq specification

RumbleDB documentation

License

RumbleDB is available under an Apache 2.0 license.

Contact

Email
rumble-users@googlegroups.com

© 2017-2026 Founders: Stefan Irimescu, Ghislain Fourny, Gustavo Alonso. ETH Zürich. All rights reserved.
Current contributors: Matteo Agnoletto, Henrik Pätzold, Jimmy Cai.
Past contributors: Renato Marroquin, Rodrigo Bruno, Falko Noé, Ioana Stefan, Andrea Rinaldi, Stevan Mihajlovic, Mario Arduini, Can Berker Çıkış, Elwin Stephan, David Dao, Zirun Wang, Ingo Müller, Dan-Ovidiu Graur, Thomas Zhou, Olivier Goerens, Alexandru Meterez, Pierre Motard, Remo Röthlisberger, Dominik Bruggisser, David Loughlin, David Buzatu, Marco Schöb, Maciej Byczko, Dwij Dixit, Omar Hammoud.

2.1.0 "Cedrus Libani" Beta