Rumble uses the JSONiq language, which was tailor-made for heterogeneous, nested JSON data.
JSONiq is a declarative and functional language. It is user-friendly and easy to read and write, because it looks a lot like JSON.
JSONiq's most powerful construct, the FLWOR expression, works much like SQL's SELECT-FROM-WHERE, but with the added flexibility that JSON needs.
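
To make the SELECT-FROM-WHERE analogy concrete, here is a minimal sketch of a JSONiq FLWOR expression (for, let, where, order by, return). The inline orders and their field names are invented for illustration; in practice the sequence would come from a file.

```jsoniq
(: Illustrative inline data; the field names are assumptions. :)
for $order in (                      (: FROM :)
  { "id": 1, "customer": "Ada",  "total": 120 },
  { "id": 2, "customer": "Bob",  "total": 35 },
  { "id": 3, "customer": "Cleo", "total": 250 }
)
where $order.total gt 100            (: WHERE :)
order by $order.total descending     (: ORDER BY :)
return {                             (: SELECT :)
  "customer": $order.customer,
  "total": $order.total
}
```

The query keeps the orders above 100 and returns one object per match, Cleo's first and then Ada's.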
If you have JSON data that does not fit on your machine, or that takes too long to process, Rumble is for you. It leverages Spark to spread the I/O workload across multiple machines and parallelizes as much of the query as it can.
If you have your JSON files ready, one object per line, all you need to do is copy them over to your HDFS cluster and you are ready to go.
With Rumble, you do not need to write Java or Scala code. Simply write what you want and run your query.
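
As a rough sketch of what such a query can look like, the example below filters a file stored on HDFS. It assumes a Rumble release that provides the `json-file` function for reading files with one object per line (other releases may name this function differently). The path and the `status` and `message` fields are hypothetical.

```jsoniq
(: Hypothetical HDFS path and field names; json-file is assumed to be available. :)
for $event in json-file("hdfs:///user/me/events.json")
where $event.status eq "error"
return $event.message
```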
JSONiq understands nested data natively. It is more than just SQL with dot syntax on top. Nested objects and arrays are a walk in the park.
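
A small sketch of nested navigation, using an invented order object: dots chain through nested objects, `[[n]]` looks up the n-th member of an array, and `[]` unboxes an array into a sequence of its members.

```jsoniq
(: Illustrative object; all field names are assumptions. :)
let $order := {
  "id": 1,
  "customer": { "name": "Ada", "address": { "city": "Zurich" } },
  "items": [
    { "sku": "A1", "price": 20 },
    { "sku": "B2", "price": 15 }
  ]
}
return {
  "city": $order.customer.address.city,     (: chained object lookup :)
  "first-sku": $order.items[[1]].sku,       (: array lookup, positions start at 1 :)
  "total": sum($order.items[].price)        (: unbox the array, then aggregate :)
}
```

This returns an object with the city "Zurich", the first SKU "A1", and a total of 35.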
Not all data fits into highly structured DataFrames. JSONiq is a NoSQL language that has the flexibility to deal with heterogeneous, missing, or extra fields.
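
The sketch below illustrates this on an invented sequence of objects in which some fields are missing and others are extra. Looking up an absent field yields the empty sequence rather than an error, so the comparison in the `where` clause simply does not match for that object.

```jsoniq
(: Illustrative data: "age" is missing for Bob, "nickname" exists only for Cleo. :)
for $person in (
  { "name": "Ada",  "age": 36 },
  { "name": "Bob" },
  { "name": "Cleo", "age": 41, "nickname": "Cle" },
  { "name": "Dan",  "age": 52 }
)
where $person.age ge 40
return
  if (exists($person.nickname))
  then $person.nickname
  else $person.name
```

Bob is skipped without any special handling, and the query returns "Cle" and "Dan".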