Breyten Ernsting

Webdeveloper at Open State Foundation.

Modeling a Spreadsheet in Elasticsearch

I recently worked on a project that involved extracting information for Excel files and storing this information in Elasticsearch. Doing this is quite fun and using some tricks you can emulate some of the basic functions of a spreadsheet with ease.

Mapping the data

The first step is to map the data in Elasticsearch. For the sake of simplicity, I assume the following:

  1. only a single worksheet.
  2. No formulae

A worksheet is basically a collection of cells that have a row, a column, and a value. Hence, the resulting mapping looks like this:

1
2
3
4
5
6
7
8
9
{
  "spreadsheet": {
    "properties": {
      "row": {"type": "string", "index": "not_analyzed"},
      "column": {"type": "integer"},
      "value": {"type": "float"}
    }
  }
}

Now you can load it with some sample data of your choosing.

Using facets to aggregate

Facetting is a really powerful tool in Elasticsearch and here I’ll show you how to use it to do aggregations.

Let’s say we can to caculate the sum of A2 and A3. In Excel, you would simple use the SUM function to accomplish this. In Elasticsearch you use the term stats facet like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
{
    "query": {
        "bool": {
            "must": [
               {"term": {
                  "row": {
                     "value": "A"
                  }
               }},
               {"range": {
                      "column": {
                         "from": 2,
                         "to": 3
                      }
               }}
            ]
        }
    },
    "facets": {
       "sum": {
           "terms_stats": {
              "key_field": "row",
              "value_field": "value"
           }
       }
    }
}

The result is returned in the total field of the sum facet. Doing a sum across rows is trivial (use a range or terms query), and using the facet you can even compute sums for the same columns across rows.

You can do even more aggregations in one go using facet filters, which makes Elasticsearch a powerful tool for emulating a spreadhseet, as long as you don’t need complex formulae.

Comments