python-course.eu

15. JSON and PYTHON

By Bernd Klein. Last modified: 07 Nov 2021.

JSON

Introduction

JSON stands for JavaScript Object Notation. JSON is an open standard file and data interchange format. The content of a JSON file or JSON data is human-readable. JSON is used for storing and exchanging data. JSON data objects consist of attribute–value pairs. The data format of JSON looks very similar to a Python dictionary, but JSON is a language-independent data format. The JSON syntax is derived from JavaScript object notation syntax, but the JSON format is text only. JSON filenames use the extension .json.

dumps and loads

It is possible to serialize a Python dict object to a JSON formatted string by using dumps from the json module:

import json

d = {"a": 3, "b": 3, "c": 12}

json.dumps(d)

OUTPUT:

'{"a": 3, "b": 3, "c": 12}'

In this case, the JSON formatted string looks just like the Python dict literal we started with. In the following example, we can see a difference: "True" and "False" are turned into "true" and "false":

d = {"a": True, "b": False, "c": True}

d_json = json.dumps(d)
d_json

OUTPUT:

'{"a": true, "b": false, "c": true}'

We can transform the JSON string back into a Python dictionary:

json.loads(d_json)

OUTPUT:

{'a': True, 'b': False, 'c': True}
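
The functions dumps and loads work on strings. The json module also provides dump and load, which write to and read from file-like objects. The following is a minimal sketch of both pairs; it uses an in-memory io.StringIO buffer as a stand-in for an opened text file, and the name buffer is purely illustrative:

```python
import io
import json

d = {"a": True, "b": False, "c": True}

# dumps/loads: Python object <-> JSON string
d_json = json.dumps(d)
print(json.loads(d_json) == d)

# dump/load: Python object <-> file-like object
# (io.StringIO is an in-memory stand-in for an opened text file)
buffer = io.StringIO()
json.dump(d, buffer)
buffer.seek(0)                  # rewind before reading the data again
d_from_file = json.load(buffer)
print(d_from_file == d)
```

Both comparisons should print True, because this dictionary contains only strings and booleans, which survive the round trip unchanged.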

Differences between JSON and Python Dictionaries

If you got the idea that turning dictionaries into JSON strings is always structure-preserving, you are wrong:

persons = {"Isabella": {"surname": "Jones",
                       "address": ("Bright Av.", 
                                   34, 
                                   "Village of Sun")},
           "Noah": {"surname": "Horton",
                    "address": (None, 
                                None, 
                                "Whoville")}
          }


persons_json = json.dumps(persons)                                
print(persons_json)                          

OUTPUT:

{"Isabella": {"surname": "Jones", "address": ["Bright Av.", 34, "Village of Sun"]}, "Noah": {"surname": "Horton", "address": [null, null, "Whoville"]}}

We can see that the address tuple is turned into a list!

json.loads(persons_json)

OUTPUT:

{'Isabella': {'surname': 'Jones',
  'address': ['Bright Av.', 34, 'Village of Sun']},
 'Noah': {'surname': 'Horton', 'address': [None, None, 'Whoville']}}
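
The consequence of this conversion can be checked directly. The following short sketch reuses one address from the example above and shows that the round trip returns a list, which has to be converted back by hand if a tuple is needed:

```python
import json

address = ("Bright Av.", 34, "Village of Sun")

# The tuple is written as a JSON array and read back as a list
restored = json.loads(json.dumps(address))

print(type(restored).__name__)
print(restored == address)          # a list never equals a tuple
print(tuple(restored) == address)   # explicit conversion restores it
```

JSON has only one sequence type, the array, so the information "this was a tuple" is not stored anywhere in the JSON text.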

You can pretty-print JSON by using the optional indent parameter:

persons_json = json.dumps(persons, indent=4)                                
print(persons_json) 

OUTPUT:

{
    "Isabella": {
        "surname": "Jones",
        "address": [
            "Bright Av.",
            34,
            "Village of Sun"
        ]
    },
    "Noah": {
        "surname": "Horton",
        "address": [
            null,
            null,
            "Whoville"
        ]
    }
}
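
Besides indent, json.dumps has further optional parameters that influence the layout, for example sort_keys and separators. The following is a small sketch with a made-up dictionary; it contrasts a compact form with an indented one:

```python
import json

# Made-up sample data, just for illustration
d = {"b": [1, 2], "a": None}

# Compact: no spaces after the separators, keys sorted alphabetically
compact = json.dumps(d, separators=(",", ":"), sort_keys=True)
print(compact)

# Pretty: two spaces of indentation per nesting level
pretty = json.dumps(d, indent=2, sort_keys=True)
print(pretty)
```

Both strings describe the same data. The compact form is useful when size matters, the indented form when people have to read the file.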

Relationship between Python dicts and JSON Objects

PYTHON OBJECT      JSON OBJECT
dict               object
list, tuple        array
str                string
int, float         number
True               true
False              false
None               null

import json

d = {"d": 45, "t": 123}
x = json.dumps(d)
print(x)

lst = [34, 345, 234]
x = json.dumps(lst)
print(x)

int_obj = 199
x = json.dumps(int_obj)
print(x)

OUTPUT:

{"d": 45, "t": 123}
[34, 345, 234]
199
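
The mapping can also be observed in the other direction. The following sketch parses a made-up JSON text with json.loads and prints the Python type of each value:

```python
import json

# Made-up JSON text containing one value of each JSON kind
text = ('{"name": "Noah", "age": 34, "height": 1.82, '
        '"tags": ["a", "b"], "active": true, "address": null}')

obj = json.loads(text)

for key, value in obj.items():
    print(f"{key}: {type(value).__name__}")
```

Note that a JSON number becomes an int or a float depending on how it is written, and that null becomes None.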

There is another crucial difference: json.dumps accepts only str, int, float, bool or None as dictionary keys, as we can see in the following example:

board = {(1, "a"): ("white", "rook"),
         (1, "b"): ("white", "knight"),
         (1, "c"): ("white", "bishop"),
         (1, "d"): ("white", "queen"),
         (1, "e"): ("white", "king"),
         # further data skipped
        }

Calling json.dumps with board as an argument would result in the exception TypeError: keys must be str, int, float, bool or None, not tuple.
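
Even the accepted key types do not survive unchanged, because JSON object keys are always strings. The following sketch uses a made-up dictionary to show how the keys are converted and catches the exception for a tuple key. The exact wording of the error message may differ between Python versions:

```python
import json

# Made-up data; 2 is used instead of 1, because 1 and True
# would collide as keys of a Python dict
keys_demo = {2: "int", 2.5: "float", True: "bool", None: "None"}

encoded = json.dumps(keys_demo)
print(encoded)

# After decoding, all keys are strings
decoded = json.loads(encoded)
print(list(decoded.keys()))

try:
    json.dumps({(1, "a"): ("white", "rook")})
except TypeError as error:
    tuple_key_failed = True
    print("TypeError:", error)
else:
    tuple_key_failed = False
```

So a dictionary with int keys can be serialized, but reading it back gives a dictionary with str keys.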

To avoid this, we could use the optional parameter skipkeys:

board_json = json.dumps(board, 
                       skipkeys=True)

board_json

OUTPUT:

'{}'

We avoided the exception, but the result is not satisfying, because the data is missing!
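
The result above is empty because every key of board is a tuple. As a small sketch with a made-up dictionary, we can see that skipkeys drops only the entries with unsupported keys and keeps the rest:

```python
import json

# Made-up dictionary with one valid key and one tuple key
mixed_board = {"note": "start position",
               (1, "a"): ("white", "rook")}

result = json.dumps(mixed_board, skipkeys=True)
print(result)
```

The entry with the tuple key disappears without any warning, which is why skipkeys should be used with care.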

A better solution is to turn the tuples into strings, as we do in the following:

board2 = dict((str(k), val) for k, val in board.items())
board2

OUTPUT:

{"(1, 'a')": ('white', 'rook'),
 "(1, 'b')": ('white', 'knight'),
 "(1, 'c')": ('white', 'bishop'),
 "(1, 'd')": ('white', 'queen'),
 "(1, 'e')": ('white', 'king')}

Now json.dumps works without raising an exception:

board_json = json.dumps(board2)
board_json

OUTPUT:

'{"(1, \'a\')": ["white", "rook"], "(1, \'b\')": ["white", "knight"], "(1, \'c\')": ["white", "bishop"], "(1, \'d\')": ["white", "queen"], "(1, \'e\')": ["white", "king"]}'
 

The string keys become shorter and easier to read if we concatenate row and column:

board2 = dict((str(key[0])+key[1], value) for key, value in board.items())
board2

OUTPUT:

{'1a': ('white', 'rook'),
 '1b': ('white', 'knight'),
 '1c': ('white', 'bishop'),
 '1d': ('white', 'queen'),
 '1e': ('white', 'king')}

This dictionary can be serialized as before:

board_json = json.dumps(board2)
board_json

OUTPUT:

'{"1a": ["white", "rook"], "1b": ["white", "knight"], "1c": ["white", "bishop"], "1d": ["white", "queen"], "1e": ["white", "king"]}'
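
If the original tuple keys are needed again after loading, they have to be rebuilt by hand. The following is a minimal sketch of such a round trip. It assumes that every column is a single letter, so that the last character of a key is the column and everything before it is the row. The names encode_board and decode_board are hypothetical helpers, not part of the json module:

```python
import json

board = {(1, "a"): ("white", "rook"),
         (1, "b"): ("white", "knight")}


def encode_board(board):
    """Serialize a board with (row, column) keys to a JSON string."""
    return json.dumps({f"{row}{col}": piece
                       for (row, col), piece in board.items()})


def decode_board(text):
    """Rebuild the tuple keys and tuple values from the JSON string.

    Assumes single-letter columns, e.g. "1a" or "10h".
    """
    return {(int(key[:-1]), key[-1]): tuple(piece)
            for key, piece in json.loads(text).items()}


board_json = encode_board(board)
print(board_json)
print(decode_board(board_json) == board)
```

The conversion of the values with tuple is necessary as well, because the pieces come back as lists.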

Reading a JSON File

We will now read a JSON example file, json_example.json, which can be found in our data directory. We use an example from json.org.

# the with statement makes sure that the file is closed after reading
with open("data/json_example.json") as fh:
    json_ex = json.load(fh)
print(type(json_ex), json_ex)

OUTPUT:

<class 'dict'> {'glossary': {'title': 'example glossary', 'GlossDiv': {'title': 'S', 'GlossList': {'GlossEntry': {'ID': 'SGML', 'SortAs': 'SGML', 'GlossTerm': 'Standard Generalized Markup Language', 'Acronym': 'SGML', 'Abbrev': 'ISO 8879:1986', 'GlossDef': {'para': 'A meta-markup language, used to create markup languages such as DocBook.', 'GlossSeeAlso': ['GML', 'XML']}, 'GlossSee': 'markup'}}}}}
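
Since the result is an ordinary nested dictionary, single values can be reached by chaining keys. The following sketch does not read the file. Instead it parses an abridged copy of the structure shown in the output above, so that it runs without the data directory:

```python
import json

# Abridged copy of the glossary example shown above
text = """
{"glossary": {"title": "example glossary",
              "GlossDiv": {"title": "S",
                           "GlossList": {"GlossEntry": {
                               "ID": "SGML",
                               "GlossDef": {"GlossSeeAlso": ["GML", "XML"]}
                           }}}}}
"""

json_ex = json.loads(text)

# Walk down the nested dictionaries key by key
entry = json_ex["glossary"]["GlossDiv"]["GlossList"]["GlossEntry"]
print(entry["ID"])
print(entry["GlossDef"]["GlossSeeAlso"])

# dict.get avoids a KeyError if a key is missing in this abridged copy
print(entry.get("Abbrev", "not available"))
```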

If you work with jupyter-lab or jupyter-notebook, you might have wondered about the data format used by it. You may have guessed already: it is JSON. Let's read in a notebook file with the extension ".ipynb":

with open("data/example_notebook.ipynb") as fh:
    nb = json.load(fh)
print(nb)

OUTPUT:

{'cells': [{'cell_type': 'markdown', 'metadata': {}, 'source': ['# Titel\n', '\n', '## Introduction\n', '\n', 'This is some text\n', '\n', '- apples\n', '- pears\n', '- bananas']}, {'cell_type': 'markdown', 'metadata': {}, 'source': ['# some code\n', '\n', 'x = 3\n', 'y = 4\n', 'z = x + y\n']}], 'metadata': {'kernelspec': {'display_name': 'Python 3', 'language': 'python', 'name': 'python3'}, 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3}, 'file_extension': '.py', 'mimetype': 'text/x-python', 'name': 'python', 'nbconvert_exporter': 'python', 'pygments_lexer': 'ipython3', 'version': '3.7.6'}}, 'nbformat': 4, 'nbformat_minor': 4}

The structure is easier to see if we print each top-level key together with its value:

for key, value in nb.items():
    print(f"{key}:\n    {value}")

OUTPUT:

cells:
    [{'cell_type': 'markdown', 'metadata': {}, 'source': ['# Titel\n', '\n', '## Introduction\n', '\n', 'This is some text\n', '\n', '- apples\n', '- pears\n', '- bananas']}, {'cell_type': 'markdown', 'metadata': {}, 'source': ['# some code\n', '\n', 'x = 3\n', 'y = 4\n', 'z = x + y\n']}]
metadata:
    {'kernelspec': {'display_name': 'Python 3', 'language': 'python', 'name': 'python3'}, 'language_info': {'codemirror_mode': {'name': 'ipython', 'version': 3}, 'file_extension': '.py', 'mimetype': 'text/x-python', 'name': 'python', 'nbconvert_exporter': 'python', 'pygments_lexer': 'ipython3', 'version': '3.7.6'}}
nbformat:
    4
nbformat_minor:
    4
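
The text of each cell is stored as a list of lines under the key 'source'. As a small sketch, the following code joins these lines and prints the first line of every cell. It works on an abridged copy of the notebook dictionary shown above instead of the file itself:

```python
# Abridged copy of the notebook structure printed above
nb = {"cells": [{"cell_type": "markdown", "metadata": {},
                 "source": ["# Titel\n", "\n", "## Introduction\n"]},
                {"cell_type": "markdown", "metadata": {},
                 "source": ["# some code\n", "\n", "x = 3\n"]}],
      "nbformat": 4,
      "nbformat_minor": 4}

first_lines = []
for number, cell in enumerate(nb["cells"], start=1):
    text = "".join(cell["source"])          # lines already end with "\n"
    first_line = text.splitlines()[0]
    first_lines.append(first_line)
    print(f"cell {number} ({cell['cell_type']}): {first_line}")
```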

Another example file in our data directory is disaster_mission.json:

with open("data/disaster_mission.json") as fh:
    data = json.load(fh)
print(list(data.keys()))

OUTPUT:

['Reference number', 'Country', 'Name', 'Function']

Read JSON with Pandas

We can read a JSON file with the Pandas module as well.

import pandas

data = pandas.read_json("data/disaster_mission.json")
data

Write JSON files with Pandas

We can also write data to JSON files with Pandas:

# data is the DataFrame that we created with pandas.read_json above
data.to_json("data/disaster_mission2.txt")