Dictionaries

Dictionaries are lookup tables. We can use dictionaries to store and look up information. Rather than numerical indexes, a dictionary assigns a key to each stored value. For example, in an English dictionary, the keys are English words and the values are the meanings of the words.

We can say that a dictionary is a map from keys to values: given a key, we can find the corresponding value. Keys are usually strings, but other immutable values such as numbers can also be used as keys.

Like lists, dictionaries can store many different types of data. In Python code dictionaries are enclosed in curly brackets: { }.
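
As a small illustration of mixed data types, here is a sketch with made-up data (the field names, case numbers and room names are hypothetical and not used elsewhere in this chapter). Values can be of any type, and keys can be strings, numbers or tuples.

```python
# Hypothetical example data: the values have several different types.
client = {'name': 'Peder Ås',            # string value
          'phone': 5664,                 # integer value
          'active': True,                # boolean value
          'cases': ['A-101', 'B-202']}   # list value

# Keys do not have to be strings; numbers and tuples also work.
room_names = {101: 'Meeting room',
              (2, 'east'): 'Archive'}

print(type(client['cases']))     # prints: <class 'list'>
print(room_names[(2, 'east')])   # prints: Archive
```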

Creating Dictionaries

We can create empty dictionaries:

clients = {}
print(type(clients))
<class 'dict'>

We can also create a dictionary containing data with a dictionary literal. Here is a client list mapping names (the keys) to phone numbers (the values):

clients = {'Peder Ås': 5664,
           'Marte Kirkerud': 8952}

Getting Values

We can get dictionary values the same way we get list items, but with dictionaries we use keys instead of numerical indexes.

number = clients['Peder Ås']
print(number)
5664

Adding or Changing Values

We can add new values to our dictionary. Let’s add a new client:

clients['Ole Vold'] = 3009
print(clients)
{'Peder Ås': 5664, 'Marte Kirkerud': 8952, 'Ole Vold': 3009}

We can change existing values the same way:

clients['Ole Vold'] = 3131
print(clients)
{'Peder Ås': 5664, 'Marte Kirkerud': 8952, 'Ole Vold': 3131}
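
Because adding and changing use the same syntax, it is easy to overwrite a value by accident. The following sketch rebuilds the client list so that it runs on its own, and checks whether a key exists before assigning. It also shows the dictionary method update(), which adds or changes several entries in one call. The client 'Kari Holm' and her number are made up for illustration.

```python
clients = {'Peder Ås': 5664,
           'Marte Kirkerud': 8952}

# Assignment adds the key if it is new, and overwrites the value otherwise.
name = 'Ole Vold'
if name in clients:
    print(name, 'already has the number', clients[name])
else:
    clients[name] = 3009
    print('Added', name)          # prints: Added Ole Vold

# update() adds or changes several entries in one call.
# 'Kari Holm' is a made-up client used only in this example.
clients.update({'Ole Vold': 3131, 'Kari Holm': 4410})
print(clients)
```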

Removing Values

As with lists, we can use the method pop() to remove a value from a dictionary. We give pop() the key, and it returns the removed value.

Let’s remove our troublesome client ‘Peder Ås’.

number = clients.pop('Peder Ås')
print('Let Peder Ås know at his number', number)

print(clients)
Let Peder Ås know at his number 5664
{'Marte Kirkerud': 8952, 'Ole Vold': 3131}
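
pop() raises a KeyError if the key is missing, just as indexing does. As a sketch of two related options, the example below gives pop() a default value as a second argument, and uses the del statement to remove an entry without returning its value. The dictionary is rebuilt here so that the example runs on its own.

```python
clients = {'Marte Kirkerud': 8952,
           'Ole Vold': 3131}

# With a default value, pop() does not raise an error for a missing key.
number = clients.pop('Peder Ås', None)
print(number)     # prints: None

# del removes an entry without returning the value.
del clients['Ole Vold']
print(clients)    # prints: {'Marte Kirkerud': 8952}
```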

Is a Key in the Dictionary?

We can use the membership operator in to check whether a key is present in a dictionary. This is like checking for an item in a list, but with a dictionary, in only checks the keys, not the values.

if 'Peder Ås' in clients:
    print(clients['Peder Ås'])
else:
    print('Peder Ås is not a client')
Peder Ås is not a client
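
To make the difference between keys and values concrete, here is a small sketch. It rebuilds the client list so that it runs on its own, and uses the dictionary method values() to test the values instead of the keys.

```python
clients = {'Marte Kirkerud': 8952,
           'Ole Vold': 3131}

print('Ole Vold' in clients)       # a key: True
print(3131 in clients)             # a value is not a key: False
print(3131 in clients.values())    # tests the values instead: True
print('Peder Ås' not in clients)   # not in is the negation: True
```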

Getting Values with Default

If we try to get the value for a key that doesn’t exist, we get an error:

clients['Hannah Hanson']
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[8], line 1
----> 1 clients['Hannah Hanson']

KeyError: 'Hannah Hanson'

We can avoid this by using the method .get() instead:

print(clients.get('Hannah Hanson'))
None

We can also give .get() a default value to return if the key isn’t found, using the form .get(key, default):

clients.get('Hannah Hanson', '') # Default: empty string
''
clients.get('Hannah Hanson', 'No such client')
'No such client'

This is useful for processing data that can have some missing keys.
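
As a sketch of what this can look like, the example below looks up a list of names where one name is missing from the dictionary. The list of names is made up for illustration, and the dictionary is rebuilt so that the example runs on its own.

```python
clients = {'Marte Kirkerud': 8952,
           'Ole Vold': 3131}

# Made-up list of names to look up; 'Hannah Hanson' is not a client.
names = ['Marte Kirkerud', 'Hannah Hanson', 'Ole Vold']

for name in names:
    # get() returns 'unknown' instead of raising a KeyError.
    number = clients.get(name, 'unknown')
    print(name, 'has the phone number', number)
```

The loop runs to the end even though one of the names is missing, and prints 'unknown' as the number for that name.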

Iterating over Dictionaries

We can use a for loop to iterate over the dictionary, as we did with lists. With dictionaries, we iterate over the keys.

for name in clients:
    print(name, 'has the phone number', clients[name])
Marte Kirkerud has the phone number 8952
Ole Vold has the phone number 3131
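
An alternative, shown here as a sketch, is the dictionary method items(). It gives us each key together with its value, so we don't need to look up the value inside the loop.

```python
clients = {'Marte Kirkerud': 8952,
           'Ole Vold': 3131}

# items() yields (key, value) pairs, which we unpack into two variables.
for name, number in clients.items():
    print(name, 'has the phone number', number)
```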

Finding a Value

We saw above that the operator in only checks the keys of a dictionary. If we want to look for a value, we can use a for loop to examine all the values.

number = 3131

for name in clients:
    if clients[name] == number:
        print(name, 'has the phone number', clients[name])
Ole Vold has the phone number 3131
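
If we need to search by value many times, one option is to build a second dictionary that maps the other way, from phone numbers to names. The sketch below assumes that no two clients share a phone number, since a dictionary can only hold one value per key. The name number2name is chosen for this example.

```python
clients = {'Marte Kirkerud': 8952,
           'Ole Vold': 3131}

# Build the reverse map: phone number -> name.
# Assumption: each phone number belongs to only one client.
number2name = {}
for name in clients:
    number = clients[name]
    number2name[number] = name

print(number2name[3131])        # prints: Ole Vold
print(number2name.get(1234))    # prints: None
```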

Nested Dictionaries

We have seen previously that lists can contain other, nested lists. Likewise, dictionaries can contain both lists and nested dictionaries.

clients = {'Peder Ås': {'phone': 5664,
                        'address': 'Lillevik'},
           'Marte Kirkerud': {'phone': 8952,
                              'address': 'Lillevik'},
          }

We must use multiple indexes to access the deeper levels of nested data structures. This can be done stepwise:

marte_info = clients['Marte Kirkerud']
marte_phone = marte_info['phone']
print(marte_phone)
8952

We can also do multiple steps at a time:

marte_phone = clients['Marte Kirkerud']['phone']
print(marte_phone)
8952

This is a matter of style. Choose the variant you think makes sense and makes the code most readable.
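
Indexing several levels at once raises a KeyError if any of the keys is missing. As a sketch of one way to handle this, we can chain calls to get() and give an empty dictionary as the default for the outer lookup. The name 'Hannah Hanson' is not in the data.

```python
clients = {'Peder Ås': {'phone': 5664, 'address': 'Lillevik'},
           'Marte Kirkerud': {'phone': 8952, 'address': 'Lillevik'}}

# The outer get() returns {} for an unknown client,
# so the inner get() can still be called safely.
marte_phone = clients.get('Marte Kirkerud', {}).get('phone')
hannah_phone = clients.get('Hannah Hanson', {}).get('phone')

print(marte_phone)     # prints: 8952
print(hannah_phone)    # prints: None
```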

Examining Large Dictionaries

When we use large dictionaries, their content may not fit on the screen. Instead of printing everything, we can use the method keys() to get an overview of the content.

print(clients.keys())
dict_keys(['Peder Ås', 'Marte Kirkerud'])
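
Two related tools are sketched below. The built-in function len() tells us how many keys a dictionary has, and converting the dictionary to a list of keys lets us use slicing to look at only the first few. The dictionary is rebuilt here so that the example runs on its own.

```python
clients = {'Peder Ås': {'phone': 5664, 'address': 'Lillevik'},
           'Marte Kirkerud': {'phone': 8952, 'address': 'Lillevik'}}

# Number of keys in the dictionary.
print(len(clients))     # prints: 2

# list() gives the keys as a list, which we can slice.
first_keys = list(clients)[:1]
print(first_keys)       # prints: ['Peder Ås']
```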

Looping over Nested Dictionaries

We can use nested for loops to iterate over nested dictionaries. This is similar to looping over nested lists, but we must use the keys to get the values.

for name in clients:
    client_info = clients[name]
    for entry_name in client_info:
        value = client_info[entry_name]
        print(name, 'has', entry_name, value)
Peder Ås has phone 5664
Peder Ås has address Lillevik
Marte Kirkerud has phone 8952
Marte Kirkerud has address Lillevik
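
The same nested loop can be written with the dictionary method items(), which yields each key together with its value. This is a sketch of an alternative style that should print the same lines as the loop above. The dictionary is rebuilt here, and the printed lines are also collected in a list named lines, which is an addition for this example.

```python
clients = {'Peder Ås': {'phone': 5664, 'address': 'Lillevik'},
           'Marte Kirkerud': {'phone': 8952, 'address': 'Lillevik'}}

lines = []
# The outer loop gives each name and the nested dictionary for that client.
for name, client_info in clients.items():
    # The inner loop gives each entry in the nested dictionary.
    for entry_name, value in client_info.items():
        line = f'{name} has {entry_name} {value}'
        lines.append(line)
        print(line)
```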

Example: Counting Cases for Judges

In this example, we will collect statistics about judges from a data set of cases from the European Court of Human Rights (ECtHR). The data are an excerpt from ECHR-OD, which we will discuss further in Appendix: JSON and Web APIs.

We load the data from a file. We will learn how to read files in Files and Exceptions, so for now we won’t go into how the file is read.

If you want to run the code yourself, you can download the file cases-5-short.json.

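import json

def read_json_file(filename):
    """Read a JSON file and return its content as Python data."""
    # The names contain non-ASCII letters; the file is assumed to be UTF-8.
    with open(filename, 'r', encoding='utf-8') as file:
        text_data = file.read()
    # Convert the JSON text to Python lists and dictionaries.
    json_data = json.loads(text_data)
    return json_data

cases = read_json_file('cases-5-short.json')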
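
To see what json.loads() produces without needing the file, the sketch below parses a short, made-up JSON text. JSON objects become Python dictionaries and JSON arrays become lists, which is why we can process the cases with the dictionary operations from this chapter. The variable name cases_example and the document name in the text are invented for this example.

```python
import json

# A made-up JSON text with the same kind of structure as one case.
text_data = '[{"docname": "CASE OF A v. B", "parties": ["A", "B"]}]'

cases_example = json.loads(text_data)

print(type(cases_example))            # prints: <class 'list'>
print(type(cases_example[0]))         # prints: <class 'dict'>
print(cases_example[0]['parties'])    # prints: ['A', 'B']
```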

We can use a dictionary to count the number of cases each judge has participated in. To do this, we use a dictionary where the keys are the judges’ names, and the values are the number of cases we have found for that judge. The name of the dictionary should describe the contents, and possible names are for example cases_per_judge or judge2count. The latter is a common convention that emphasizes the dictionary’s function as a map from a judge’s name to a count.

judge2count = {}

We can start with a for loop, since we need to process each case. Each case is a dictionary, and converting a dictionary to a list gives us a list of its keys.

for case in cases:
    print(list(case))
['article', 'country', 'decision_body', 'decisiondate', 'docname', 'doctypebranch', 'importance', 'judgementdate', 'parties']
['article', 'country', 'decision_body', 'decisiondate', 'docname', 'doctypebranch', 'importance', 'judgementdate', 'parties']
['article', 'country', 'decision_body', 'decisiondate', 'docname', 'doctypebranch', 'importance', 'judgementdate', 'parties']
['article', 'country', 'decision_body', 'decisiondate', 'docname', 'doctypebranch', 'importance', 'judgementdate', 'parties']
['article', 'country', 'decision_body', 'decisiondate', 'docname', 'doctypebranch', 'importance', 'judgementdate', 'parties']

We can also examine the full contents of a single case.

display(cases[0])
{'article': ['3', '6'],
 'country': {'alpha2': 'ru', 'name': 'Russian Federation'},
 'decision_body': [{'name': 'Helena Jäderblom', 'role': 'president'},
  {'name': 'Branko Lubarda', 'role': 'judges'},
  {'name': 'Helen Keller', 'role': 'judges'},
  {'name': 'Dmitry Dedov', 'role': 'judges'},
  {'name': 'Pere Pastor Vilanova', 'role': 'judges'},
  {'name': 'Georgios A. Serghides', 'role': 'judges'},
  {'name': 'Jolien Schukking', 'role': 'judges'},
  {'name': 'Stephen Phillips', 'role': 'section registrar'}],
 'decisiondate': '',
 'docname': 'CASE OF SKLYAR v. RUSSIA',
 'doctypebranch': 'CHAMBER',
 'importance': '4',
 'judgementdate': '18/07/2017 00:00:00',
 'parties': ['SKLYAR', 'RUSSIA']}

The names of the judges are in the list stored under the key decision_body. We can get that list for each case:

for case in cases:
    decision_body = case['decision_body']
    print(decision_body)
[{'name': 'Helena Jäderblom', 'role': 'president'}, {'name': 'Branko Lubarda', 'role': 'judges'}, {'name': 'Helen Keller', 'role': 'judges'}, {'name': 'Dmitry Dedov', 'role': 'judges'}, {'name': 'Pere Pastor Vilanova', 'role': 'judges'}, {'name': 'Georgios A. Serghides', 'role': 'judges'}, {'name': 'Jolien Schukking', 'role': 'judges'}, {'name': 'Stephen Phillips', 'role': 'section registrar'}]
[{'name': 'Luis López Guerra', 'role': 'president'}, {'name': 'Helena Jäderblom', 'role': 'judges'}, {'name': 'Helen Keller', 'role': 'judges'}, {'name': 'Dmitry Dedov', 'role': 'judges'}, {'name': 'Branko Lubarda', 'role': 'judges'}, {'name': 'Pere Pastor Vilanova', 'role': 'judges'}, {'name': 'Georgios A. Serghides', 'role': 'judges'}, {'name': 'Stephen Phillips', 'role': 'section registrar'}]
[{'name': 'MrJ. Hedigan', 'role': 'president'}, {'name': 'MrB.M. Zupančič', 'role': 'judges'}, {'name': 'MrC. Bîrsan', 'role': 'judges'}, {'name': 'MrV. Zagrebelsky', 'role': 'judges'}, {'name': 'MrsA. Gyulumyan', 'role': 'judges'}, {'name': 'MrDavid Thór Björgvinsson', 'role': 'judges'}, {'name': 'MrsI. Ziemele', 'role': 'judges'}, {'name': 'Mr V. Berger', 'role': 'section registrar'}]
[{'name': 'Nina Vajić', 'role': 'president'}, {'name': 'Anatoly Kovler', 'role': 'judges'}, {'name': 'Khanlar Hajiyev', 'role': 'judges'}, {'name': 'Mirjana Lazarova Trajkovska', 'role': 'judges'}, {'name': 'Julia Laffranque', 'role': 'judges'}, {'name': 'Linos-Alexandre Sicilianos', 'role': 'judges'}, {'name': 'Erik Møse', 'role': 'judges'}, {'name': 'Søren Nielsen', 'role': 'section registrar'}]
[{'name': 'Françoise Tulkens', 'role': 'president'}, {'name': 'Ireneu Cabral Barreto', 'role': 'judges'}, {'name': 'Vladimiro Zagrebelsky', 'role': 'judges'}, {'name': 'Danutė Jočienė', 'role': 'judges'}, {'name': 'András Sajó', 'role': 'judges'}, {'name': 'Nona Tsotsoria', 'role': 'judges'}, {'name': 'Işıl Karakaş', 'role': 'judges'}, {'name': 'Sally Dollé', 'role': 'section registrar'}]

We can loop over the decision body to get the names of the judges.

for case in cases:
    decision_body = case['decision_body']
    for judge in decision_body:
        name = judge['name']
        print(name)
Helena Jäderblom
Branko Lubarda
Helen Keller
Dmitry Dedov
Pere Pastor Vilanova
Georgios A. Serghides
Jolien Schukking
Stephen Phillips
Luis López Guerra
Helena Jäderblom
Helen Keller
Dmitry Dedov
Branko Lubarda
Pere Pastor Vilanova
Georgios A. Serghides
Stephen Phillips
MrJ. Hedigan
MrB.M. Zupančič
MrC. Bîrsan
MrV. Zagrebelsky
MrsA. Gyulumyan
MrDavid Thór Björgvinsson
MrsI. Ziemele
Mr V. Berger
Nina Vajić
Anatoly Kovler
Khanlar Hajiyev
Mirjana Lazarova Trajkovska
Julia Laffranque
Linos-Alexandre Sicilianos
Erik Møse
Søren Nielsen
Françoise Tulkens
Ireneu Cabral Barreto
Vladimiro Zagrebelsky
Danutė Jočienė
András Sajó
Nona Tsotsoria
Işıl Karakaş
Sally Dollé
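
The decision body contains the section registrar as well as the judges, so the loop above prints the registrars too. If we only want the judges, one option is to check the role of each member. The sketch below uses a shortened copy of the first decision body so that it runs on its own. It assumes that the roles 'president' and 'judges' are the ones that identify judges, and the list judge_names is introduced only for this example.

```python
# A shortened copy of the decision body from the first case.
decision_body = [{'name': 'Helena Jäderblom', 'role': 'president'},
                 {'name': 'Branko Lubarda', 'role': 'judges'},
                 {'name': 'Stephen Phillips', 'role': 'section registrar'}]

judge_names = []
for member in decision_body:
    # Assumption: only these two roles denote judges.
    if member['role'] in ['president', 'judges']:
        judge_names.append(member['name'])

print(judge_names)    # prints: ['Helena Jäderblom', 'Branko Lubarda']
```

The counting code in this chapter keeps all members of the decision body, so the registrars are counted as well.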

Now we can count the names using our dictionary. If the name already exists in the dictionary, we increase the count by one. We can do this with the augmented assignment operator +=, which adds the value of the expression on the right-hand side to the variable or dictionary entry on the left-hand side.

for case in cases:
    decision_body = case['decision_body']
    for judge in decision_body:
        name = judge['name']
        if name in judge2count:
            judge2count[name] += 1

If the name is not already in the dictionary, we need to add it. We set the value to 1 since this is the first instance of that name.

for case in cases:
    decision_body = case['decision_body']
    for judge in decision_body:
        name = judge['name']
        if name in judge2count:
            judge2count[name] += 1
        else:
            judge2count[name] = 1

That’s it! We can print the results.

display(judge2count)
{'Helena Jäderblom': 2,
 'Branko Lubarda': 2,
 'Helen Keller': 2,
 'Dmitry Dedov': 2,
 'Pere Pastor Vilanova': 2,
 'Georgios A. Serghides': 2,
 'Jolien Schukking': 1,
 'Stephen Phillips': 2,
 'Luis López Guerra': 1,
 'MrJ. Hedigan': 1,
 'MrB.M. Zupančič': 1,
 'MrC. Bîrsan': 1,
 'MrV. Zagrebelsky': 1,
 'MrsA. Gyulumyan': 1,
 'MrDavid Thór Björgvinsson': 1,
 'MrsI. Ziemele': 1,
 'Mr V. Berger': 1,
 'Nina Vajić': 1,
 'Anatoly Kovler': 1,
 'Khanlar Hajiyev': 1,
 'Mirjana Lazarova Trajkovska': 1,
 'Julia Laffranque': 1,
 'Linos-Alexandre Sicilianos': 1,
 'Erik Møse': 1,
 'Søren Nielsen': 1,
 'Françoise Tulkens': 1,
 'Ireneu Cabral Barreto': 1,
 'Vladimiro Zagrebelsky': 1,
 'Danutė Jočienė': 1,
 'András Sajó': 1,
 'Nona Tsotsoria': 1,
 'Işıl Karakaş': 1,
 'Sally Dollé': 1}
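
The if/else test can also be written with get(), using 0 as the default count for names we have not seen before. The sketch below uses a short list of names, put together for this example, instead of the case data, so that it runs on its own.

```python
# A short list of names for illustration; one name occurs twice.
names = ['Helen Keller', 'Dmitry Dedov', 'Helen Keller']

judge2count = {}
for name in names:
    # get() returns 0 the first time we see a name.
    judge2count[name] = judge2count.get(name, 0) + 1

print(judge2count)    # prints: {'Helen Keller': 2, 'Dmitry Dedov': 1}
```

Both variants give the same counts. Which one to use is a matter of style.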

In Sorting, Filtering Data and Search, we will see how we can sort the results to present an ordered list of the judges with the most cases.