Here are some jq tricks that I learned over the years for parsing JSON. By “learned” I mean that I know my way around them, not that I fully understand them all. This stuff is magical.


Jq commands

I’ll use the JSON below in all the examples, but won’t type it out every time:

{
  "foo": "bar",
  "numbers": [
    {
      "one": 1
    },
    {
      "two": 2
    },
    {
      "tree": "NaN"
    },
    {
      "foo": "NaN"
    }
  ]
}

Export the JSON if you want to copy-pasta the examples:

$ export my_json_above='{"foo": "bar", "numbers": [{"one": 1}, {"two": 2}, {"tree": "NaN"}, {"foo": "NaN"}]}'

Now the trickeryfoo.

Pretty Print

Probably the most used feature of jq is pretty-printing JSON. By pretty-print, I mean: split the input into multiple lines, indent them consistently, and colorize the output. For example:

$ echo '{"foo": "bar", "numbers": [{"one": 1}, {"two": 2}, {"tree": "NaN"}, {"foo": "NaN"}]}' | jq
{
  "foo": "bar",
  "numbers": [
    {
      "one": 1
    },
    {
      "two": 2
    },
    {
      "tree": "NaN"
    },
    {
      "foo": "NaN"
    }
  ]
}

In this example, jq operates on the entire JSON. jq can also operate on “parts”, or internal/nested objects, of the input JSON. jq uses “filters” to parse/modify its input.

In this example, the filter used is the identity filter, a single dot (.). This filter doesn’t modify the JSON; it returns its input unchanged. The identity filter is like multiplying a number by 1. In the pipeline above, jq . is equivalent to jq.
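For comparison, here is a minimal sketch of the same pretty-print in Python, assuming only the standard json module. The variable names raw and pretty are illustrative, not part of jq or of the original examples.

```python
import json

raw = '{"foo": "bar", "numbers": [{"one": 1}, {"two": 2}, {"tree": "NaN"}, {"foo": "NaN"}]}'

# Parse and re-serialize without changing anything: the closest thing to
# jq's identity filter. indent=2 matches jq's default indentation.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```

The data that comes out is the data that went in; only the layout changes.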

Individual object operations

To work with an internal object or value:

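  • to get the value of the key foo, the filter is .foo:

    $ echo "$my_json_above" | jq .foo
    "bar"


    In this case, the . in .foo is not the identity filter; .foo is the syntax to select the value stored under the key foo. The output is a JSON string, quotes included. Pass -r (--raw-output) to print bar without the quotes.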

  • to get all keys, the filter is keys[]:

    $ echo "$my_json_above" | jq 'keys[]'
    "foo"
    "numbers"
    
  • to get all elements of an array, the filter is []. To get all elements of the numbers array, we first select numbers and then apply [] to it:

    $ echo "$my_json_above" | jq '.numbers[]'
    {
      "one": 1
    }
    {
      "two": 2
    }
    {
      "tree": "NaN"
    }
    {
      "foo": "NaN"
    }
    

    Note: without the square brackets (.numbers instead of .numbers[]) you get the numbers value itself: the array as a whole. With the square brackets, you get each element of the array as a separate output.

  • to get all keys of the objects inside an array, combine the filters with a pipe (|):

    $ echo "$my_json_above" | jq '.numbers[] | keys[]'
    "one"
    "two"
    "tree"
    "foo"
    
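  • to get all values, the filter is .[] (values[] also works here, but only because values passes any non-null input through unchanged; it is not the counterpart of keys):

    $ echo "$my_json_above" | jq '.numbers[] | .[]'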
    1
    2
    "NaN"
    "NaN"
    
  • to get all keys and their values, the magic is 'keys[] as $k | "\($k), \(.[$k])"':

    $ echo "$my_json_above" | jq 'keys[] as $k | "key: _\($k)_   value: \(.[$k])"'
    "key: _foo_   value: bar"
    "key: _numbers_   value: [{\"one\":1},{\"two\":2},{\"tree\":\"NaN\"},{\"foo\":\"NaN\"}]"
    

    At the top level, this is much like the identity filter with some extra formatting text. It gets more interesting with the keys and values of nested objects:

    $ echo "$my_json_above" | jq '.numbers[] | keys[] as $k | "\($k): \(.[$k])"'
    "one: 1"
    "two: 2"
    "tree: NaN"
    "foo: NaN"
    

    Note: keys[] as $k binds each key, in turn, to the variable $k, which the rest of the filter can then reference.

  • to get the value of a key, or a default if the key does not exist, the filter is .key // "value" (the default is also used when the value is null or false):

    $ echo "$my_json_above" | jq '.foos // "bars"'
    "bars"
    
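  • to add or update key/value pairs, the filter is . * {"key": "value"}, which merges the two objects recursively (. + {"key": "value"} would do a shallow merge instead):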

    $ echo "$my_json_above" | jq '. * {"foo": "FOO", "bar": "baz"}'
    {
      "foo": "FOO",
      "numbers": [
        {
          "one": 1
        },
        {
          "two": 2
        },
        {
          "tree": "NaN"
        },
        {
          "foo": "NaN"
        }
      ],
      "bar": "baz"
    }
    
  • to delete a key and its value, the filter is del(.key):

    $ echo "$my_json_above" | jq 'del(.numbers)'
    {
      "foo": "bar"
    }
    
  • to delete every key whose value equals a specific value, the filter is del(.[] | select(. == "value")):

    $ echo "$my_json_above" | jq 'del(.[] | select(. == "bar"))'
    {
      "numbers": [
        {
          "one": 1
        },
        {
          "two": 2
        },
        {
          "tree": "NaN"
        },
        {
          "foo": "NaN"
        }
      ]
    }
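
For readers who think in Python, the object operations above map roughly onto plain dict operations. This is a sketch, not a reimplementation of jq: the helpers alternative and deep_merge are hypothetical names written for this illustration, covering only the behaviour of // and * described above.

```python
import json

doc = json.loads(
    '{"foo": "bar", "numbers": [{"one": 1}, {"two": 2}, {"tree": "NaN"}, {"foo": "NaN"}]}'
)


def alternative(value, default):
    """Rough analogue of jq's `//`: use the default for None (null) or False."""
    return default if value is None or value is False else value


def deep_merge(left, right):
    """Rough analogue of jq's `*`: merge dicts recursively, right side wins."""
    merged = dict(left)
    for key, value in right.items():
        if isinstance(merged.get(key), dict) and isinstance(value, dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


keys = sorted(doc)                                        # jq: keys
pairs = [f"{k}: {v}" for d in doc["numbers"] for k, v in d.items()]
fallback = alternative(doc.get("foos"), "bars")           # jq: .foos // "bars"
updated = deep_merge(doc, {"foo": "FOO", "bar": "baz"})   # jq: . * {...}
without_numbers = {k: v for k, v in doc.items() if k != "numbers"}  # jq: del(.numbers)
without_bar = {k: v for k, v in doc.items() if v != "bar"}  # jq: del(.[] | select(. == "bar"))
```

Every line builds a new value instead of changing doc, which mirrors how a jq filter emits a new document rather than editing its input.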
    

Array operations

Array:

  • the array operator is square brackets. To “extract” all elements from an array, the filter is []:

    $ echo "$my_json_above" | jq '.numbers[]'
    {
      "one": 1
    }
    {
      "two": 2
    }
    {
      "tree": "NaN"
    }
    {
      "foo": "NaN"
    }
    

    This first “selects” the numbers array and then all of its elements.

  • array indexes start at zero, so the first element is .[0]:

    $ echo "$my_json_above" | jq '.numbers[0]'
    {
      "one": 1
    }

    $ echo "$my_json_above" | jq '.numbers[2]'
    {
      "tree": "NaN"
    }
    
  • to get the last element of the array, use index -1:

    $ echo "$my_json_above" | jq '.numbers[-1]'
    {
      "foo": "NaN"
    }

    # To get the second to last:
    $ echo "$my_json_above" | jq '.numbers[-2]'
    {
      "tree": "NaN"
    }

    # You get the idea
    
  • the filter to get the length of an array (or the number of keys in an object) is length. Here we want the input of length to be the array, so we use the pipe (|) to feed the output of one filter into the next:

    $ echo "$my_json_above" | jq '.numbers | length'
    4
    
  • to append to an array, use the += operator:

    $ echo "$my_json_above" | jq '.numbers += [{"three": 3}]'
    {
      "foo": "bar",
      "numbers": [
        {
          "one": 1
        },
        {
          "two": 2
        },
        {
          "tree": "NaN"
        },
        {
          "foo": "NaN"
        },
        {
          "three": 3
        }
      ]
    }
    

    Another approach is the update operator, |=:

    $ echo "$my_json_above" | jq '.numbers |= . + [{"three": 3}]'
    {
      "foo": "bar",
      "numbers": [
        {
          "one": 1
        },
        {
          "two": 2
        },
        {
          "tree": "NaN"
        },
        {
          "foo": "NaN"
        },
        {
          "three": 3
        }
      ]
    }
    
  • to add an element at the first position of the array, “sum” the new element and the existing array. This is pretty much the same as the previous example, but with the operands swapped:

    $ echo "$my_json_above" | jq '.numbers |= [{"zero": 0}] + .'
    {
      "foo": "bar",
      "numbers": [
        {
          "zero": 0
        },
        {
          "one": 1
        },
        {
          "two": 2
        },
        {
          "tree": "NaN"
        },
        {
          "foo": "NaN"
        }
      ]
    }
    

    The order of the operands matters: the elements of the left-hand array come first in the result.

  • to delete the second element of an array, pass its path to del(), just like when deleting a key:

    $ echo "$my_json_above" | jq 'del(.numbers[1])'
    {
      "foo": "bar",
      "numbers": [
        {
          "one": 1
        },
        {
          "tree": "NaN"
        },
        {
          "foo": "NaN"
        }
      ]
    }
    
  • to delete all elements matching a criterion, pass the selected elements to the del() filter:

    $ echo "$my_json_above" | jq 'del(.numbers[] | select(.[] == "NaN"))'
    {
      "foo": "bar",
      "numbers": [
        {
          "one": 1
        },
        {
          "two": 2
        }
      ]
    }
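
The array operations above have close cousins in Python list syntax. The sketch below assumes the same sample document. All variable names are illustrative, and new lists are built instead of mutating the original, to mirror the way jq emits a new document.

```python
import json

doc = json.loads(
    '{"foo": "bar", "numbers": [{"one": 1}, {"two": 2}, {"tree": "NaN"}, {"foo": "NaN"}]}'
)
numbers = doc["numbers"]

first = numbers[0]                            # jq: .numbers[0]
last = numbers[-1]                            # jq: .numbers[-1]
count = len(numbers)                          # jq: .numbers | length

appended = numbers + [{"three": 3}]           # jq: .numbers += [{"three": 3}]
prepended = [{"zero": 0}] + numbers           # jq: .numbers |= [{"zero": 0}] + .
without_second = numbers[:1] + numbers[2:]    # jq: del(.numbers[1])

# jq: del(.numbers[] | select(.[] == "NaN"))
without_nans = [d for d in numbers if "NaN" not in d.values()]
```

Negative indexes behave the same way in both tools, and list concatenation with + keeps the left operand first, just like in the jq examples.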
    

Conclusion

jq is handy to get some values in an interactive way, piping from other commands or operating on files (like jq . data.json instead of cat data.json | jq .). jq is handy, but not straightforward. At least to me.

When you need to parse complex JSON, modify it, use it as --data for curl, etc., use Python instead 🐍

Really. Python is more understandable for “complex” operations. Don’t get me wrong, jq is great to visualize JSONs in the terminal. But the moment that you need to do some nasty actions to nested JSON thingies, Bash will hit you with a big stick.

Look at this Python code:

import json
json_str = '''
{
  "foo": "bar",
  "numbers": [{"one": 1}, {"two": 2}, {"tree": "NaN"}, {"foo": "NaN"}]
}
'''

# Turn the string into a "JSON" object
my_json = json.loads(json_str) # The JSON object is actually a Python dict

# Get the value of a key
foo = my_json['foo']
numbers = my_json['numbers']

# Get value of a key or a default one if the key doesn't exist
non_existent_key = my_json.get('blarg', 'default_value') 

# Add new key
my_json['bla'] = "bla"

# Update key
my_json['bla'] = "bleb"

# Delete a key
my_json.pop('foo')

# Append to `numbers`
my_json["numbers"].append({"xablau":"NaN"})

# Get all entries from `numbers` that hold no "NaN" value
non_nans = [
    number_dict
    for number_dict in my_json["numbers"]
    if "NaN" not in number_dict.values()
]

# Convert JSON dict to a valid JSON string
json_str_final = json.dumps(my_json)

Is it more readable? As a bonus, you get all Pythonic power at your fingertips. No magic needed.
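
As for the curl --data case mentioned earlier, a rough standard-library sketch could look like the one below. The endpoint URL is hypothetical and exists only for illustration. The request is built but never sent, so the snippet runs without network access.

```python
import json
import urllib.request

payload = {"foo": "bar", "numbers": [{"one": 1}, {"two": 2}]}

# Serialize the dict to JSON bytes, as curl --data would send them.
body = json.dumps(payload).encode("utf-8")

# Hypothetical endpoint, used only for illustration; nothing is sent here.
request = urllib.request.Request(
    "https://example.invalid/api",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# Passing `request` to urllib.request.urlopen() would perform the actual POST.
```

Because the payload stays an ordinary dict until the last moment, it can be modified with normal Python code before it is serialized.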