Appendix: JSON and Web APIs
In Files and Exceptions we saw how to read data from files. In this part, we will look at how to read data directly from web APIs. Web APIs are machine-readable online data sources. We will look at two different web APIs.
ECHR-OD API
The European Court of Human Rights Open Data (ECHR-OD) project provides data about ECHR cases. ECHR-OD provides machine-readable data for download, but also a public ECHR-OD API for online use. Here is the ECHR-OD API documentation.
Harvard’s Caselaw Access Project
We will also use data from Harvard’s Caselaw Access Project (CAP). CAP aims to make all published US court decisions freely available in a standard, machine-readable format. CAP and its data format are documented here.
Reading JSON from file
JSON
JSON (JavaScript Object Notation) is a machine-readable data format. Machine-readable data makes it easy to read and process the information with a computer. JSON data is usually tree structured, with multiple levels containing information.
In Python, JSON data is stored as lists and dictionaries. The top level can be either a list or a dictionary.
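To make this mapping concrete, here is a minimal sketch that parses a small JSON string with json.loads(). The JSON text is made up for illustration and is not taken from either API.

```python
import json

# A made-up JSON document: an object holding a string, a number and a list of objects
text = '{"court": "Example Court", "year": 2020, "cases": [{"id": 1}, {"id": 2}]}'

data = json.loads(text)

# JSON objects become dictionaries, JSON arrays become lists
print(type(data))            # <class 'dict'>
print(type(data["cases"]))   # <class 'list'>

# Nested levels are reached by chaining keys and indexes
print(data["cases"][1]["id"])  # 2
```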
First, let’s look at how we can read JSON data from a local file. Here we read a file containing a few cases from ECHR-OD.
import json

def read_json_file(filename):
    with open(filename, 'r') as file:
        text_data = file.read()
    return json.loads(text_data)

cases = read_json_file('cases-5.json')
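A slightly shorter variant of the same local-file approach is sketched below. It uses json.load(), which parses an open file directly, so the intermediate string is not needed. The explicit encoding='utf-8' is an assumption about how the file was saved.

```python
import json

def read_json_file(filename):
    # json.load() reads and parses an open file in one step.
    # Assumption: the file is stored as UTF-8 text.
    with open(filename, 'r', encoding='utf-8') as file:
        return json.load(file)
```

The function is called in the same way as before, for example read_json_file('cases-5.json').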
However, this approach has some drawbacks. Firstly, we must manually download the data set. Secondly, we must keep the data set updated. Case law databases are updated regularly, and we probably want to include the latest data. Therefore, using online data directly is sometimes preferable. For example, if we are developing a mobile app, the full data set might be too large to fit on the device.
Reading JSON from a web API
To fetch data from the web, we can use a library called requests, which makes this task quite easy. We start by importing it:
import requests
First, we will look at the ECHR-OD API. This API has an endpoint that returns statistics about the number of cases. We need to specify its URL:
URL = 'https://echr-opendata.eu/api/v1/stats'
Now, we can get the data.
We use the requests library’s .get() function to fetch the data, and then the method .json() to parse the results into Python lists and dictionaries.
request = requests.get(URL)
data = request.json()
The result is a list of dictionaries with statistics about the numbers of violations and non-violations for different articles of the ECHR. We can display the first few articles:
display(data[:5])
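Network requests can fail or hang, so it can be useful to wrap the two steps above in a small helper. The sketch below is one possible way to do this, not part of either API. The names fetch_json and http are hypothetical. The helper relies on the timeout argument of .get() and on the response method .raise_for_status(), both of which the requests library provides.

```python
def fetch_json(http, url, params=None, timeout=10):
    """Fetch url and parse the JSON body, raising an error on HTTP failures.

    http is anything with a requests-style .get(), for example the
    requests module itself or a requests.Session (hypothetical parameter name).
    """
    response = http.get(url, params=params, timeout=timeout)
    # Raises an exception for 4xx and 5xx status codes
    response.raise_for_status()
    return response.json()

# Usage (needs network access):
# import requests
# data = fetch_json(requests, 'https://echr-opendata.eu/api/v1/stats')
```

Passing the HTTP client as an argument keeps the helper easy to try out without a network connection.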
Let’s try to get some cases. This query has a different URL:
URL = 'https://echr-opendata.eu/api/v1/cases'
This query would return all the cases in ECHR-OD, of which there are several thousand. Therefore, the query results are split into several “pages” of results. We must specify the page size, called limit, and the page number, called page. We usually start with page number 1. We specify both these parameters in a dictionary:
parameters = {'page': 1,
              'limit': 3}
Now, we can get the results.
We pass the parameters to the .get() function using its params argument.
cases = requests.get(URL, params=parameters).json()
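To collect more than one page, we can repeat the query with an increasing page number. The following is a minimal sketch of such a loop. It assumes that a page past the end of the data comes back as an empty list, which should be checked against the API documentation. The names fetch_all_pages, fetch_page and max_pages are hypothetical.

```python
def fetch_all_pages(fetch_page, limit=100, max_pages=10):
    """Collect items page by page until a page comes back empty.

    fetch_page(page, limit) must return a list of items for that page.
    max_pages is a safety cap so the loop always terminates.
    """
    items = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page, limit)
        if not batch:
            break  # assumed end-of-data signal: an empty page
        items.extend(batch)
    return items

# Usage with the ECHR-OD URL from above (needs network access):
# import requests
# def fetch_page(page, limit):
#     parameters = {'page': page, 'limit': limit}
#     return requests.get(URL, params=parameters).json()
# cases = fetch_all_pages(fetch_page, limit=50, max_pages=3)
```

The cap on the number of pages keeps the example from downloading the whole database by accident.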
Tip
We can fetch the results and parse the JSON in two steps or in one line. Both are fine, and this is a matter of preference.
Inspecting the Data
We can display the data, but this is a lot of text:
display(cases)
JSON data can be contained in a list or dictionary at the top level. Let’s check which type we got:
type(cases)
Our data is a list of cases. Let’s check the type of case 0:
type(cases[0])
The data about each case is in a dictionary.
We can print the keys using list():
keys = list(cases[0])
print(keys)
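Checking types and keys one level at a time becomes tedious for deeply nested data. As a rough sketch, the hypothetical helper describe() below walks the structure and lists the type found under each key. The sample data is made up, and for lists only the first item is inspected, on the assumption that all items share the same layout.

```python
def describe(value, depth=0, max_depth=3):
    """Return lines outlining the nested types in parsed JSON data."""
    pad = '  ' * depth
    if depth >= max_depth:
        return [pad + '...']
    if isinstance(value, dict):
        lines = []
        for key, item in value.items():
            lines.append(f"{pad}{key}: {type(item).__name__}")
            if isinstance(item, (dict, list)):
                lines.extend(describe(item, depth + 1, max_depth))
        return lines
    if isinstance(value, list):
        if not value:
            return [pad + '(empty list)']
        # Only the first item is inspected, assuming all items share a layout
        first = value[0]
        lines = [f"{pad}[0]: {type(first).__name__}"]
        if isinstance(first, (dict, list)):
            lines.extend(describe(first, depth + 1, max_depth))
        return lines
    return [pad + type(value).__name__]

# Made-up sample with the same overall shape as a list of cases
sample = [{"docname": "A v. B", "parties": ["A", "B"]}]
print("\n".join(describe(sample)))
```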
We can loop over the list to get the title of each case:
for case in cases:
    print(case['docname'])
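Each case has a decision date or a judgment date. We use the dictionary method .get(), which returns None instead of raising an error when a key is missing: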
for case in cases:
    print(case['docname'])
    print(case.get('decisiondate'))
    print(case.get('judgementdate'))
    print()
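If we only need one date per case, the two lookups can be combined. The sketch below uses a hypothetical helper, case_date(), that returns whichever of the two fields is present. The sample cases and their date strings are placeholders made up for illustration, and the real date format may differ.

```python
def case_date(case):
    """Return the decision date if present, otherwise the judgment date."""
    return case.get('decisiondate') or case.get('judgementdate')

# Made-up sample cases; the date strings are placeholders
sample_cases = [
    {'docname': 'A v. X', 'judgementdate': '2001-01-01'},
    {'docname': 'B v. Y', 'decisiondate': '2002-02-02'},
    {'docname': 'C v. Z'},
]

for case in sample_cases:
    print(case['docname'], '-', case_date(case) or 'no date found')
```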
Using Harvard’s CAP API
Again, we need to specify the URL to the data we want to fetch.
URL = "https://api.case.law/v1/cases/"
We include some parameters that specify which cases we want to load:
parameters = {'jurisdiction': 'ill',
              'full_case': 'true',
              'decision_date_min': '2011-01-01',
              'page_size': 3}
jurisdiction is Illinois in this example
full_case includes the full text of each case
decision_date_min is the minimum date; we only want decisions later than this date
page_size is the number of items
More parameters are listed in the CAP documentation.
Now, let’s fetch the data.
request = requests.get(URL, params=parameters)
data = request.json()
Inspecting the Data
JSON data can be contained in a list or dictionary at the top level. Let’s check which type we got:
type(data)
Since our data is in a dictionary, we can print the keys using list():
keys = list(data)
print(keys)
The field count contains the number of hits in the database. This is usually different from the number of items we requested. If the count is zero, we don’t have any results and need to check the URL and the parameters.
print(data["count"])
That looks good. Let’s fetch the list of cases, which are located in results:
cases = data["results"]
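The check on count can also be done in code rather than by eye. Here is a minimal sketch with a hypothetical helper, get_results(), that stops with a clear error message when the query has no hits. The sample response is made up and contains only the two fields used above.

```python
def get_results(data):
    """Return the list under 'results', or raise if the query had no hits."""
    if data.get('count', 0) == 0:
        raise ValueError('No hits: check the URL and the parameters')
    return data['results']

# Made-up response: 120 hits in the database, 3 items on this page
sample_response = {'count': 120, 'results': [{}, {}, {}]}

results = get_results(sample_response)
print(len(results), 'of', sample_response['count'], 'cases fetched')
```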
Now we can inspect each case. Let’s loop over the cases and get some of the information. The data contains various metadata about each case, such as the case name and the abbreviated case name.
for case in cases:
    print("Case name:", case["name_abbreviation"])
It’s often useful to look at the data in a web browser to get an overview. We can do that by opening the full URL, including the parameters:
print(request.url)
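To see how the parameters end up in the URL, we can build the query string ourselves. This sketch uses urlencode() from Python's standard library instead of requests, so it runs without a network connection. For simple values like these, the URL that requests builds should look the same, but that is an assumption worth checking against the printed request.url.

```python
from urllib.parse import urlencode

URL = "https://api.case.law/v1/cases/"
parameters = {'jurisdiction': 'ill',
              'full_case': 'true',
              'decision_date_min': '2011-01-01',
              'page_size': 3}

# urlencode() turns the dictionary into "key=value" pairs joined by "&"
query = urlencode(parameters)

# The query string is attached to the URL after a question mark
full_url = URL + '?' + query
print(full_url)
```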