Ordered JSON in Python with Requests

JSON objects are intrinsically unordered: the specification defines an object as an unordered set of name/value pairs. Nevertheless, there are lots of cases where you might want to load a JSON resource, work with it, and then serialise it back out again while retaining the original key order.

IIIF Presentation API manifests, for example, are much more readable in Manifest, Sequence, Canvas order, with the @context, @id, @type, label, description, and so on at the start.

As per this skeleton manifest from iiif.io:


{
  // Metadata about this sequence
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "http://example.org/iiif/book1/sequence/normal",
  "@type": "sc:Sequence",
  "label": "Current Page Order",

  "viewingDirection": "left-to-right",
  "viewingHint": "paged",
  "startCanvas": "http://example.org/iiif/book1/canvas/p2",

  // The order of the canvases
  "canvases": [
    {
      "@id": "http://example.org/iiif/book1/canvas/p1",
      "@type": "sc:Canvas",
      "label": "p. 1"
      // ...
    },
    {
      "@id": "http://example.org/iiif/book1/canvas/p2",
      "@type": "sc:Canvas",
      "label": "p. 2"
      // ...
    },
    {
      "@id": "http://example.org/iiif/book1/canvas/p3",
      "@type": "sc:Canvas",
      "label": "p. 3"
      // ...
    }
  ]
}

Take a simple Python program which gets a manifest, and then prints it.

import requests
import json

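# Fetch the manifest and decode it with Requests' built-in JSON decoder
manifest = requests.get('http://iiif.bodleian.ox.ac.uk/iiif/manifest/4304843f-bb7b-46f3-bf1a-442850b1a05b.json').json()

# Pretty-print the decoded object
print(json.dumps(manifest, indent=4))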

The original manifest – edited for brevity but otherwise unchanged – looks like this:


{
  "@context": "http://iiif.io/api/presentation/2/context.json", 
  "@id": "http://iiif.bodleian.ox.ac.uk/iiif/manifest/4304843f-bb7b-46f3-bf1a-442850b1a05b.json", 
  "@type": "sc:Manifest", 
  "label": "POSTER 1909/10-27", 
  "metadata": [
    {
      "label": "Shelfmark", 
      "value": "POSTER 1909/10-27"
    }, 
    {
      "label": "Source", 
      "value": "Luna: CPA Poster Collection"
    }, 
    {
      "label": "Collection", 
      "value": "History and Politics"
    }, 
    {
      "label": "Id", 
      "value": "4304843f-bb7b-46f3-bf1a-442850b1a05b"
    }
  ], 
  "description": "(Unemployed march): The 'People's Budget'! Genial foreigner: - \"How they must wish that Mr Lloyd George had taxed us instead of them.\"", 
  "attribution": "From: Luna: CPA Poster Collection<br>Rights: Photo: \u00a9 Bodleian Libraries, University of Oxford<br>Terms of Access: http://digital.bodleian.ox.ac.uk/terms.html<br>", 
  "logo": "http://www.bodleian.ox.ac.uk/__data/assets/image/0005/117176/logo.jpg", 
  "viewingHint": "individuals", 
  "sequences": [
    {
      "@id": "http://iiif.bodleian.ox.ac.uk/iiif/sequence/4304843f-bb7b-46f3-bf1a-442850b1a05b_default.json", 
      "@type": "sc:Sequence", 
      "label": "Default", 
      "canvases": [
        {
          "@id": "http://iiif.bodleian.ox.ac.uk/iiif/canvas/4304843f-bb7b-46f3-bf1a-442850b1a05b.json", 
          "@type": "sc:Canvas", 
          "label": "POSTER 1909/10-27", 
          "height": 8098, 
          "width": 10836, 
          "images": [
            {
              "@type": "oa:Annotation", 
              "motivation": "sc:painting", 
              "resource": {
                "@id": "http://iiif.bodleian.ox.ac.uk/iiif/image/4304843f-bb7b-46f3-bf1a-442850b1a05b/full/full/0/default.jpg", 
                "@type": "dctypes:Image", 
                "format": "image/jpeg", 
                "height": 8098, 
                "width": 10836, 
                "service": {
                  "@context": "http://iiif.io/api/image/2/context.json", 
                  "@id": "http://iiif.bodleian.ox.ac.uk/iiif/image/4304843f-bb7b-46f3-bf1a-442850b1a05b", 
                  "profile": "http://iiif.io/api/image/2/level1.json"
                }
              }, 
              "on": "http://iiif.bodleian.ox.ac.uk/iiif/canvas/4304843f-bb7b-46f3-bf1a-442850b1a05b.json"
            }
          ]
        }
      ]
    }
  ]
}

The Python code, though, produces output like this, again edited for brevity but otherwise unchanged:

{
    "viewingHint": "individuals", 
    "attribution": "From: Luna: CPA Poster Collection<br>Rights: Photo: \u00a9 Bodleian Libraries, University of Oxford<br>Terms of Access: http://digital.bodleian.ox.ac.uk/terms.html<br>", 
    "description": "(Unemployed march): The 'People's Budget'! Genial foreigner: - \"How they must wish that Mr Lloyd George had taxed us instead of them.\"", 
    "sequences": [
        {
            "canvases": [
                {
                    "height": 8098, 
                    "width": 10836, 
                    "images": [
                        {
                            "on": "http://iiif.bodleian.ox.ac.uk/iiif/canvas/4304843f-bb7b-46f3-bf1a-442850b1a05b.json", 
                            "motivation": "sc:painting", 
                            "resource": {
                                "service": {
                                    "profile": "http://iiif.io/api/image/2/level1.json", 
                                    "@context": "http://iiif.io/api/image/2/context.json", 
                                    "@id": "http://iiif.bodleian.ox.ac.uk/iiif/image/4304843f-bb7b-46f3-bf1a-442850b1a05b"
                                }, 
                                "format": "image/jpeg", 
                                "height": 8098, 
                                "width": 10836, 
                                "@id": "http://iiif.bodleian.ox.ac.uk/iiif/image/4304843f-bb7b-46f3-bf1a-442850b1a05b/full/full/0/default.jpg", 
                                "@type": "dctypes:Image"
                            }, 
                            "@type": "oa:Annotation"
                        }
                    ], 
                    "label": "POSTER 1909/10-27", 
                    "@id": "http://iiif.bodleian.ox.ac.uk/iiif/canvas/4304843f-bb7b-46f3-bf1a-442850b1a05b.json", 
                    "@type": "sc:Canvas"
                }
            ], 
            "@id": "http://iiif.bodleian.ox.ac.uk/iiif/sequence/4304843f-bb7b-46f3-bf1a-442850b1a05b_default.json", 
            "@type": "sc:Sequence", 
            "label": "Default"
        }
    ], 
    "label": "POSTER 1909/10-27", 
    "logo": "http://www.bodleian.ox.ac.uk/__data/assets/image/0005/117176/logo.jpg", 
    "@context": "http://iiif.io/api/presentation/2/context.json", 
    "seeAlso": "http://digital.bodleian.ox.ac.uk/inquire/p/4304843f-bb7b-46f3-bf1a-442850b1a05b", 
    "@id": "http://iiif.bodleian.ox.ac.uk/iiif/manifest/4304843f-bb7b-46f3-bf1a-442850b1a05b.json", 
    "@type": "sc:Manifest", 
    "metadata": [
       
        {
            "value": "POSTER 1909/10-27", 
            "label": "Shelfmark"
        }, 
        {
            "value": "Luna: CPA Poster Collection", 
            "label": "Source"
        }, 
        {
            "value": "History and Politics", 
            "label": "Collection"
        }, 
        {
            "value": "4304843f-bb7b-46f3-bf1a-442850b1a05b", 
            "label": "Id"
        }
    ]
}

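The original ordering is lost, and the manifest is less readable than the original, with the core manifest fields jumbled throughout. This is the behaviour of a plain dict on Python 2, which these examples were written for; from Python 3.7 onwards a plain dict preserves insertion order.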

Luckily, Python’s json module offers a solution: both json.load and json.loads accept an object_pairs_hook argument. The documentation describes it as follows:

object_pairs_hook is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion).

Loading the JSON with the OrderedDict class itself (not an instance) as the hook:

object_pairs_hook=OrderedDict

will deserialise each JSON object into an OrderedDict, and if that is serialised back out again, the original order is kept.
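
As a minimal sketch of the hook in isolation, the example below parses a made-up fragment written in the usual IIIF key order (it is not the Bodleian manifest, and the example.org URL is only a placeholder). The keys come back in document order and survive a round trip through json.dumps.

```python
import json
from collections import OrderedDict

# Hypothetical fragment for illustration; keys are in the usual IIIF order.
source = (
    '{"@context": "http://iiif.io/api/presentation/2/context.json", '
    '"@id": "http://example.org/iiif/book1/manifest", '
    '"@type": "sc:Manifest", '
    '"label": "Book 1"}'
)

# Pass the OrderedDict class itself; the decoder calls it with a list of
# (key, value) pairs for every JSON object it encounters.
manifest = json.loads(source, object_pairs_hook=OrderedDict)

keys = list(manifest.keys())
print(keys)

# Serialising the OrderedDict writes the keys in the order they were read.
round_tripped = json.dumps(manifest)
print(round_tripped)
```

Because the hook is applied to every object in the document, nested objects such as canvases and image resources are ordered too, not just the top level.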

However, the Requests module’s JSON decoder doesn’t use object_pairs_hook by default.

Here is the .json() method from the Requests source:

    def json(self, **kwargs):
        """Returns the json-encoded content of a response, if any.
        :param \*\*kwargs: Optional arguments that ``json.loads`` takes.
        :raises ValueError: If the response body does not contain valid json.
        """

        if not self.encoding and self.content and len(self.content) > 3:
            # No encoding set. JSON RFC 4627 section 3 states we should expect
            # UTF-8, -16 or -32. Detect which one to use; If the detection or
            # decoding fails, fall back to `self.text` (using chardet to make
            # a best guess).
            encoding = guess_json_utf(self.content)
            if encoding is not None:
                try:
                    return complexjson.loads(
                        self.content.decode(encoding), **kwargs
                    )
                except UnicodeDecodeError:
                    # Wrong UTF codec detected; usually because it's not UTF-8
                    # but some other 8-bit codec.  This is an RFC violation,
                    # and the server didn't bother to tell us what codec *was*
                    # used.
                    pass
        return complexjson.loads(self.text, **kwargs)

So, by default, the Python object returned from the .json() method on a response will be a plain dict, not an ordered dict that retains the order of the serialised data.
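
It is worth noting that the method quoted above forwards its keyword arguments to json.loads, so the hook can in principle be passed through .json() itself. The sketch below uses a hypothetical stand-in class, StubResponse, which mirrors only the final line of that method so that it runs without a network call. With a real response, the equivalent call would presumably be response.json(object_pairs_hook=OrderedDict).

```python
import json
from collections import OrderedDict


class StubResponse(object):
    """Hypothetical stand-in for a Requests response (not part of Requests)."""

    def __init__(self, text):
        self.text = text

    def json(self, **kwargs):
        # Mirrors the last line of the quoted method: kwargs go to json.loads.
        return json.loads(self.text, **kwargs)


# Made-up canvas fragment, used only for illustration.
response = StubResponse(
    '{"@id": "http://example.org/iiif/book1/canvas/p1", '
    '"@type": "sc:Canvas", "label": "p. 1"}'
)

plain = response.json()                                 # default: a plain dict
ordered = response.json(object_pairs_hook=OrderedDict)  # hook is forwarded

print(type(plain).__name__, type(ordered).__name__)
print(list(ordered.keys()))
```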

Luckily, there’s a simple workaround. Use the .text attribute of the Requests response, and generate an ordered dict from that.

import requests
import json
from collections import OrderedDict

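# Fetch the raw response body as text rather than a decoded object
source = requests.get('http://iiif.bodleian.ox.ac.uk/iiif/manifest/4304843f-bb7b-46f3-bf1a-442850b1a05b.json').text

# Decode it ourselves so that every JSON object becomes an OrderedDict
manifest = json.loads(source, object_pairs_hook=OrderedDict)

print(json.dumps(manifest, indent=4))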

Grab the text using requests.get, not the json. Then load that as an OrderedDict using object_pairs_hook, and you can work with the data and reserialise it back out again while maintaining the order.
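
To illustrate the "work with the data" step, here is a small sketch on a made-up sequence fragment (standing in for text fetched as above). It assumes nothing beyond the standard library: updating an existing key keeps its position, while a newly added key is appended at the end.

```python
import json
from collections import OrderedDict

# Hypothetical fragment standing in for a response body fetched via .text
source = (
    '{"@id": "http://example.org/iiif/book1/sequence/normal", '
    '"@type": "sc:Sequence", '
    '"label": "Current Page Order"}'
)

sequence = json.loads(source, object_pairs_hook=OrderedDict)

sequence["label"] = "Revised Page Order"  # existing key: position unchanged
sequence["viewingHint"] = "paged"         # new key: appended at the end

keys = list(sequence.keys())
output = json.dumps(sequence, indent=4)
print(output)
```

If a new key needs to appear somewhere other than the end, one option is to build a fresh OrderedDict from the pairs in the desired order before serialising.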