Tabular Data

Tabular Data#

Tabular data are data in a table format, such as an Excel spreadsheet. We often need to work with tabular data. Pandas is a powerful Python library for working with tabular data.

In this part, we will work with real-world data from the European Court of Human Rights OpenData project (ECHR-OD). ECHR-OD has data sets with cases from the European Court of Human Rights in different formats. Here, we will work with data from the tabular data file echr_2_0_0_structured_cases.csv, which is in CSV format. CSV is an abbreviation for “comma separated values”.

You can get the smaller subset from the semester page or the full data set from ECHR-OD.

pandas#

We must first import the pandas library.

import pandas as pd

Note

By convention, pandas is usually imported as pd.

Alternatively, we can import the functions we want to use.

from pandas import Series, DataFrame, read_csv

Pandas has two different data types: Series and DataFrame.

DataFrame#

A pandas DataFrame is a table. Tables are made up of rows and columns of data, we say they are two-dimensional. A DataFrame can store numbers and strings as well as other types of data, for example dates. We can make a DataFrame like this table with made-up data about some fictitious cases:

table = DataFrame([[2, 3, 3],
                   [5, 4, 0],
                   [1, 2, 1]]
                  )
display(table)

	0	1	2
0	2	3	3
1	5	4	0
2	1	2	1

By default, the rows are numbered. We can add an index to give the rows names instead.

table.index = ['Smith v. Jones',
               'Doe v. Doe',
               'Jones v. Doe']
display(table)

	0	1	2
Smith v. Jones	2	3	3
Doe v. Doe	5	4	0
Jones v. Doe	1	2	1

We can also give names to the columns.

table.columns = ['time_allotted', 'time_used', 'witnesses']
display(table)

	time_allotted	time_used	witnesses
Smith v. Jones	2	3	3
Doe v. Doe	5	4	0
Jones v. Doe	1	2	1

Reading CSV Files#

Pandas has functions for importing data in many different formats. Here, we will use CSV files. To read the data set, we use pandas read_csv() function.

cases = read_csv('cases-100.csv')

Examining Data#

Now, we can examine the data set. Pandas has a method head() which prints the first five rows of the table.

display(cases.head())

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	docname	doctypebranch	ecli	importance	...	ccl_article=p1-3
0	0	MS WORD	45498/11	ru	Russian Federation	NaN	CASE OF SKLYAR v. RUSSIA	CHAMBER	ECLI:CE:ECHR:2017:0718JUD004549811	4	...	0
1	1	MS WORD	18860/07	ru	Russian Federation	NaN	CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PART...	CHAMBER	ECLI:CE:ECHR:2016:1108JUD001886007	4	...	-1
2	2	MS WORD	75567/01	si	Slovenia	NaN	CASE OF OBERWALDER v. SLOVENIA	CHAMBER	ECLI:CE:ECHR:2007:0118JUD007556701	4	...	0
3	3	MS WORD	50031/11	ru	Russian Federation	NaN	CASE OF RAKHMONOV v. RUSSIA	CHAMBER	ECLI:CE:ECHR:2012:1016JUD005003111	4	...	0
4	4	MS WORD	42894/04	tr	Turkey	NaN	CASE OF ARAT AND OTHERSv. TURKEY	CHAMBER	ECLI:CE:ECHR:2009:0113JUD004289404	4	...	0

5 rows × 374 columns

Display Function

We can use the Jupyter Notebook function display() to show a nicer format than with print().

This is a large table with 374 columns. Therefore, pandas shows only the first and last columns. We can look at the column names.

There are several interesting columns. We will look at the conclusion columns in the section Relational Operators. Here are just a few examples:

Column	Content
`docname`	The name of the case.
`country.name`	The name of the country involved in the case.
`article=*`	Gives the relevant articles for each case. For example, the column `article=3` states whether Article 3 is mentioned. The value is 1 for `True` or 0 for `False`.
`ccl_article=*`	Conclusion for each article. For article 3, the conclusion is in the column `ccl_article=3`

We can list all the column names like this:

display(list(cases.columns))

Show code cell output

Hide code cell output

['Unnamed: 0',
 'application',
 'appno',
 'country.alpha2',
 'country.name',
 'decisiondate',
 'docname',
 'doctypebranch',
 'ecli',
 'importance',
 'introductiondate',
 'issue.0',
 'issue.1',
 'issue.10',
 'issue.11',
 'issue.12',
 'issue.13',
 'issue.14',
 'issue.15',
 'issue.16',
 'issue.17',
 'issue.18',
 'issue.19',
 'issue.2',
 'issue.20',
 'issue.21',
 'issue.22',
 'issue.23',
 'issue.24',
 'issue.25',
 'issue.26',
 'issue.3',
 'issue.4',
 'issue.5',
 'issue.6',
 'issue.7',
 'issue.8',
 'issue.9',
 'itemid',
 'judgementdate',
 'kpdate',
 'languageisocode',
 'originatingbody',
 'originatingbody_name',
 'originatingbody_type',
 'parties.0',
 'parties.1',
 'parties.2',
 'rank',
 'respondent.0',
 'respondent.1',
 'respondent.2',
 'respondent.3',
 'respondent.4',
 'respondentOrderEng',
 'separateopinion',
 'sharepointid',
 'typedescription',
 'article=3',
 'article=6',
 'article=p1-3',
 'article=5',
 'article=p1-1',
 'article=13',
 'article=8',
 'article=10',
 'article=2',
 'article=35',
 'article=34',
 'article=38',
 'article=41',
 'article=p4-2',
 'article=29',
 'article=4',
 'article=14',
 'article=p12-1',
 'article=9',
 'article=11',
 'article=1',
 'article=46',
 'article=17',
 'article=18',
 'article=7',
 'article=',
 'article=37',
 'article=p7-4',
 'article=57',
 'article=26',
 'article=27',
 'article=36',
 'article=p7-2',
 'article=30',
 'article=12',
 'article=p7-1',
 'article=p1-2',
 'article=56',
 'article=p1-4',
 'article=15',
 'article=39',
 'article=p4-4',
 'article=25',
 'article=p6-1',
 'article=p7-5',
 'article=19',
 'article=32',
 'article=p7-3',
 'article=33',
 'article=53',
 'article=28',
 'article=43',
 'article=p4-3',
 'article=52',
 'article=p13-1',
 'article=16',
 'article=p6-2',
 'documentcollectionid=CASELAW',
 'documentcollectionid=JUDGMENTS',
 'documentcollectionid=CHAMBER',
 'documentcollectionid=ENG',
 'documentcollectionid=COMMITTEE',
 'documentcollectionid=GRANDCHAMBER',
 'applicability=',
 'applicability=36',
 'applicability=43',
 'applicability=41',
 'applicability=55',
 'applicability=3',
 'applicability=22',
 'applicability=60',
 'applicability=58',
 'applicability=25',
 'applicability=47',
 'applicability=12',
 'applicability=38',
 'applicability=20',
 'applicability=18',
 'applicability=24',
 'applicability=62',
 'applicability=21',
 'applicability=23',
 'applicability=8',
 'applicability=26',
 'applicability=53',
 'applicability=15',
 'applicability=48',
 'applicability=14',
 'applicability=51',
 'applicability=13',
 'applicability=5',
 'applicability=7',
 'applicability=50',
 'applicability=52',
 'applicability=28',
 'applicability=29',
 'applicability=31',
 'applicability=6',
 'applicability=81',
 'applicability=66',
 'applicability=49',
 'applicability=33',
 'applicability=63',
 'applicability=68',
 'applicability=46',
 'applicability=19',
 'applicability=40',
 'applicability=17',
 'applicability=32',
 'applicability=72',
 'applicability=34',
 'applicability=35',
 'applicability=54',
 'applicability=27',
 'applicability=16',
 'applicability=64',
 'applicability=57',
 'applicability=56',
 'applicability=2',
 'applicability=4',
 'applicability=67',
 'applicability=77',
 'applicability=71',
 'applicability=59',
 'paragraphs=6-3-c',
 'paragraphs=3',
 'paragraphs=6',
 'paragraphs=6-1',
 'paragraphs=P1-3',
 'paragraphs=5-1',
 'paragraphs=5',
 'paragraphs=P1-1',
 'paragraphs=8-1',
 'paragraphs=13',
 'paragraphs=8',
 'paragraphs=10-1',
 'paragraphs=10-2',
 'paragraphs=10',
 'paragraphs=2',
 'paragraphs=35',
 'paragraphs=34',
 'paragraphs=35-1',
 'paragraphs=38',
 'paragraphs=41',
 'paragraphs=2-1',
 'paragraphs=P4-2-1',
 'paragraphs=35-3-a',
 'paragraphs=P4-2',
 'paragraphs=8-2',
 'paragraphs=29-3',
 'paragraphs=29',
 'paragraphs=4',
 'paragraphs=P1-1-1',
 'paragraphs=5-1-e',
 'paragraphs=5-1-c',
 'paragraphs=6-2',
 'paragraphs=14',
 'paragraphs=35-3',
 'paragraphs=6-3-d',
 'paragraphs=5-5',
 'paragraphs=5-1-b',
 'paragraphs=P12-1',
 'paragraphs=11-2',
 'paragraphs=9',
 'paragraphs=11',
 'paragraphs=11-1',
 'paragraphs=1',
 'paragraphs=9-1',
 'paragraphs=6-3',
 'paragraphs=5-3',
 'paragraphs=5-4',
 'paragraphs=46',
 'paragraphs=5-1-a',
 'paragraphs=46-2',
 'paragraphs=P4-2-2',
 'paragraphs=17',
 'paragraphs=18',
 'paragraphs=7',
 'paragraphs=7-1',
 'paragraphs=',
 'paragraphs=4-3-d',
 'paragraphs=4-2',
 'paragraphs=6-3-e',
 'paragraphs=9-2',
 'paragraphs=P1-1-2',
 'paragraphs=37-1-c',
 'paragraphs=37',
 'paragraphs=P7-4',
 'paragraphs=4-1',
 'paragraphs=57',
 'paragraphs=6-3-b',
 'paragraphs=35-3-b',
 'paragraphs=5-1-f',
 'paragraphs=35-4',
 'paragraphs=P4-2-3',
 'paragraphs=2-2',
 'paragraphs=26',
 'paragraphs=27',
 'paragraphs=27-2',
 'paragraphs=36-1',
 'paragraphs=36',
 'paragraphs=37-1',
 'paragraphs=P7-2',
 'paragraphs=35-2',
 'paragraphs=5-1-d',
 'paragraphs=6-3-a',
 'paragraphs=30',
 'paragraphs=46-1',
 'paragraphs=12',
 'paragraphs=P7-1',
 'paragraphs=P7-1-1',
 'paragraphs=36-2',
 'paragraphs=5-2',
 'paragraphs=P1-2',
 'paragraphs=56-3',
 'paragraphs=56',
 'paragraphs=P1-4',
 'paragraphs=56-1',
 'paragraphs=P7-1-2',
 'paragraphs=38-1-a',
 'paragraphs=37-1-b',
 'paragraphs=15-1',
 'paragraphs=15',
 'paragraphs=39',
 'paragraphs=7-2',
 'paragraphs=P4-4',
 'paragraphs=15-3',
 'paragraphs=25',
 'paragraphs=25-1',
 'paragraphs=37-1-a',
 'paragraphs=P6-1',
 'paragraphs=P7-5',
 'paragraphs=19',
 'paragraphs=32',
 'paragraphs=P7-3',
 'paragraphs=33',
 'paragraphs=53',
 'paragraphs=28-1-a',
 'paragraphs=28',
 'paragraphs=35-2-b',
 'paragraphs=43',
 'paragraphs=4-3-b',
 'paragraphs=4-3',
 'paragraphs=4-3-a',
 'paragraphs=P4-3',
 'paragraphs=28-3',
 'paragraphs=52',
 'paragraphs=P13-1',
 'paragraphs=P7-4-1',
 'paragraphs=16',
 'paragraphs=27-1-b',
 'paragraphs=29-1',
 'paragraphs=32-2',
 'paragraphs=28-1',
 'paragraphs=P6-2',
 'paragraphs=46-4',
 'ccl_article=.5-5',
 'ccl_article=10',
 'ccl_article=10-1',
 'ccl_article=11',
 'ccl_article=11-1',
 'ccl_article=12',
 'ccl_article=13',
 'ccl_article=14',
 'ccl_article=17',
 'ccl_article=18',
 'ccl_article=2',
 'ccl_article=2-1',
 'ccl_article=25',
 'ccl_article=25-1',
 'ccl_article=3',
 'ccl_article=34',
 'ccl_article=38',
 'ccl_article=4',
 'ccl_article=4-3-d',
 'ccl_article=46',
 'ccl_article=5',
 'ccl_article=5-1',
 'ccl_article=5-1-a',
 'ccl_article=5-1-b',
 'ccl_article=5-1-c',
 'ccl_article=5-1-e',
 'ccl_article=5-1-f',
 'ccl_article=5-2',
 'ccl_article=5-3',
 'ccl_article=5-4',
 'ccl_article=5-5',
 'ccl_article=6',
 'ccl_article=6-1',
 'ccl_article=6-2',
 'ccl_article=6-3',
 'ccl_article=6-3-a',
 'ccl_article=6-3-b',
 'ccl_article=6-3-c',
 'ccl_article=6-3-d',
 'ccl_article=6-3-e',
 'ccl_article=7',
 'ccl_article=7-1',
 'ccl_article=8',
 'ccl_article=8-1',
 'ccl_article=9',
 'ccl_article=9-1',
 'ccl_article=p1-1',
 'ccl_article=p1-1-1',
 'ccl_article=p1-1-2',
 'ccl_article=p1-2',
 'ccl_article=p1-3',
 'ccl_article=p12-1',
 'ccl_article=p4-2',
 'ccl_article=p4-2-1',
 'ccl_article=p4-4',
 'ccl_article=p6-1',
 'ccl_article=p7-1',
 'ccl_article=p7-2',
 'ccl_article=p7-3',
 'ccl_article=p7-4']

Specifying Index Column#

When reading the CSV file, we can specify which column to use as index. For the ECHR-OD data it makes sense to use docname as index.

cases = read_csv('cases-100.csv', index_col='docname')
display(cases.head())

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF SKLYAR v. RUSSIA	0	MS WORD	45498/11	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2017:0718JUD004549811	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA	1	MS WORD	18860/07	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2016:1108JUD001886007	4	NaN	...	-1	0	0	0	0	0	0	0	0	0
CASE OF OBERWALDER v. SLOVENIA	2	MS WORD	75567/01	si	Slovenia	NaN	CHAMBER	ECLI:CE:ECHR:2007:0118JUD007556701	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF RAKHMONOV v. RUSSIA	3	MS WORD	50031/11	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2012:1016JUD005003111	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF ARAT AND OTHERSv. TURKEY	4	MS WORD	42894/04	tr	Turkey	NaN	CHAMBER	ECLI:CE:ECHR:2009:0113JUD004289404	4	NaN	...	0	0	0	0	0	0	0	0	0	0

5 rows × 373 columns

Getting a Row#

We can use the locator .loc[] to select parts of the table. First, we select a row using the index.

selection = cases.loc['CASE OF SKLYAR v. RUSSIA']
display(selection)

Unnamed: 0                           0
application                    MS WORD
appno                         45498/11
country.alpha2                      ru
country.name        Russian Federation
                           ...        
ccl_article=p6-1                     0
ccl_article=p7-1                     0
ccl_article=p7-2                     0
ccl_article=p7-3                     0
ccl_article=p7-4                     0
Name: CASE OF SKLYAR v. RUSSIA, Length: 373, dtype: object

Note

Pandas only shows the first and last five columns. To show all the data, we can convert them to a list().

print(list(selection))

[np.int64(0), 'MS WORD', '45498/11', 'ru', 'Russian Federation', nan, 'CHAMBER', 'ECLI:CE:ECHR:2017:0718JUD004549811', np.int64(4), nan, nan, nan, np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), nan, np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), nan, nan, np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), np.float64(nan), '001-175680', '18/07/2017', '18/07/2017', 'ENG', np.int64(6), 'Third Section', 'Court', 'SKLYAR', 'RUSSIA', np.float64(nan), np.float64(1949.88940429688), 'RUS', nan, np.float64(nan), np.float64(nan), np.float64(nan), np.int64(38), np.False_, np.int64(466267), np.int64(15), np.int64(1), np.int64(1), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(0), np.int64(0), np.int64(1), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(1), np.int64(1), np.int64(1), np.int64(1), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(1), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(-1), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0), np.int64(0)]

Listing a Column#

We can use the subscript operator to get a column.

selection = cases['country.name']
display(selection)

docname
CASE OF SKLYAR v. RUSSIA                                                Russian Federation
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA    Russian Federation
CASE OF OBERWALDER v. SLOVENIA                                                    Slovenia
CASE OF RAKHMONOV v. RUSSIA                                             Russian Federation
CASE OF ARAT AND OTHERSv. TURKEY                                                    Turkey
                                                                               ...        
CASE OF ARTUR PAWLAK v. POLAND                                                      Poland
CASE OF PROKHOROV v. UKRAINE                                                       Ukraine
CASE OF ALAJOS KISS v. HUNGARY                                                     Hungary
CASE OF PSHEVECHERSKIY v. RUSSIA                                        Russian Federation
CASE OF TULESHOV AND OTHERS v. RUSSIA                                   Russian Federation
Name: country.name, Length: 100, dtype: object

We can also use the locator .loc[] for this, then we need to supply two indexes. If we want to get the entire country.name column, we leave out the indexes and just use a colon.

selection = cases.loc[:, 'country.name']
display(selection)

docname
CASE OF SKLYAR v. RUSSIA                                                Russian Federation
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA    Russian Federation
CASE OF OBERWALDER v. SLOVENIA                                                    Slovenia
CASE OF RAKHMONOV v. RUSSIA                                             Russian Federation
CASE OF ARAT AND OTHERSv. TURKEY                                                    Turkey
                                                                               ...        
CASE OF ARTUR PAWLAK v. POLAND                                                      Poland
CASE OF PROKHOROV v. UKRAINE                                                       Ukraine
CASE OF ALAJOS KISS v. HUNGARY                                                     Hungary
CASE OF PSHEVECHERSKIY v. RUSSIA                                        Russian Federation
CASE OF TULESHOV AND OTHERS v. RUSSIA                                   Russian Federation
Name: country.name, Length: 100, dtype: object

Note

We have seen the colon before with lists. With lists we can also leave out one or both of the indexes. Without any indexes, we get a copy of the entire list: list_copy = old_list[:]

Using Numeric Indexes#

Since we opened the table with the argument index_col='docname', it has string indexes like ‘CASE OF SKLYAR v. RUSSIA’. But we can also use numeric indexes with the integer locator .iloc[]. Here we get the first five rows using a range of indexes.

selection = cases.iloc[0:4]
display(selection)

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF SKLYAR v. RUSSIA	0	MS WORD	45498/11	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2017:0718JUD004549811	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA	1	MS WORD	18860/07	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2016:1108JUD001886007	4	NaN	...	-1	0	0	0	0	0	0	0	0	0
CASE OF OBERWALDER v. SLOVENIA	2	MS WORD	75567/01	si	Slovenia	NaN	CHAMBER	ECLI:CE:ECHR:2007:0118JUD007556701	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF RAKHMONOV v. RUSSIA	3	MS WORD	50031/11	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2012:1016JUD005003111	4	NaN	...	0	0	0	0	0	0	0	0	0	0

4 rows × 373 columns

Getting a Single Cell#

We can get a single cell by specifying both the row and column names. We can get the title in the row ‘CASE OF SKLYAR v. RUSSIA’.

selection = cases.loc['CASE OF SKLYAR v. RUSSIA', 'judgementdate']
print(selection)

18/07/2017

Note

Tables are two-dimensional; therefore, we must use two indexes.

Getting Multiple Columns#

We can get multiple columns by specifying their names in a list. Here, we use the subscript operator, not the locator .loc[].

selection = cases[['country.name', 'judgementdate']]
display(selection)

	country.name	judgementdate
docname
CASE OF SKLYAR v. RUSSIA	Russian Federation	18/07/2017
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA	Russian Federation	08/11/2016
CASE OF OBERWALDER v. SLOVENIA	Slovenia	18/01/2007
CASE OF RAKHMONOV v. RUSSIA	Russian Federation	16/10/2012
CASE OF ARAT AND OTHERSv. TURKEY	Turkey	13/01/2009
...	...	...
CASE OF ARTUR PAWLAK v. POLAND	Poland	05/10/2017
CASE OF PROKHOROV v. UKRAINE	Ukraine	30/11/2006
CASE OF ALAJOS KISS v. HUNGARY	Hungary	20/05/2010
CASE OF PSHEVECHERSKIY v. RUSSIA	Russian Federation	24/05/2007
CASE OF TULESHOV AND OTHERS v. RUSSIA	Russian Federation	24/05/2007

100 rows × 2 columns

Missing Data

Here, some of the dates are missing. Pandas uses NaN to indicate missing numeric data, short for Not a Number.

Matching Multiple Columns#

We can also select multiple columns using string matching on the column names. The matching uses regular expressions, which we will discuss in Regular Expressions.

selection = cases.filter(regex='date', axis='columns')
display(selection)

	decisiondate	introductiondate	judgementdate	kpdate
docname
CASE OF SKLYAR v. RUSSIA	NaN	NaN	18/07/2017	18/07/2017
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA	NaN	NaN	08/11/2016	08/11/2016
CASE OF OBERWALDER v. SLOVENIA	NaN	NaN	18/01/2007	18/01/2007
CASE OF RAKHMONOV v. RUSSIA	NaN	NaN	16/10/2012	16/10/2012
CASE OF ARAT AND OTHERSv. TURKEY	NaN	NaN	13/01/2009	13/01/2009
...	...	...	...	...
CASE OF ARTUR PAWLAK v. POLAND	NaN	NaN	05/10/2017	05/10/2017
CASE OF PROKHOROV v. UKRAINE	06/11/2006	02/03/2004	30/11/2006	30/11/2006
CASE OF ALAJOS KISS v. HUNGARY	NaN	NaN	20/05/2010	20/05/2010
CASE OF PSHEVECHERSKIY v. RUSSIA	NaN	NaN	24/05/2007	24/05/2007
CASE OF TULESHOV AND OTHERS v. RUSSIA	NaN	NaN	24/05/2007	24/05/2007

100 rows × 4 columns

Relational Operators#

Pandas can select data conditionally, using relational operators. The ECHR-OD case data has a column ccl_article=2. This column has the value 1 for cases where there was found a violation of Article 2 of the European Convention on Human Rights.

We can use this column in an expression with a relational operator.

cases['ccl_article=2'] == 1

docname
CASE OF SKLYAR v. RUSSIA                                                False
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA    False
CASE OF OBERWALDER v. SLOVENIA                                          False
CASE OF RAKHMONOV v. RUSSIA                                             False
CASE OF ARAT AND OTHERSv. TURKEY                                        False
                                                                        ...  
CASE OF ARTUR PAWLAK v. POLAND                                          False
CASE OF PROKHOROV v. UKRAINE                                            False
CASE OF ALAJOS KISS v. HUNGARY                                          False
CASE OF PSHEVECHERSKIY v. RUSSIA                                        False
CASE OF TULESHOV AND OTHERS v. RUSSIA                                   False
Name: ccl_article=2, Length: 100, dtype: bool

Boolean Columns

The result is a Boolean column. When we use them with pandas tables, relational operators return Boolean columns. This is just like regular relational operators, except that regular relational operators return a single value.

Relational Operators in Locators#

We can use the result of a relational operator as a filter when selecting cases. This will select the cases where the Boolean value is True.

selection = cases.loc[cases['ccl_article=2'] == 1]
display(selection)

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF KUKAYEV v. RUSSIA	10	MS WORD	29361/02	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2007:1115JUD002936102	3	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF MUDAYEVY v. RUSSIA	36	MS WORD	33105/05	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2010:0408JUD003310505	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF KALLIS AND ANDROULLA PANAYI v. TURKEY	62	MS WORD	45388/99	tr	Turkey	NaN	CHAMBER	ECLI:CE:ECHR:2009:1027JUD004538899	4	NaN	...	0	0	0	0	0	0	0	0	0	0

3 rows × 373 columns

Conclusion Columns

The conclusions of each case are listed in the columns starting with ccl_article. For Article N, the column name is ccl_article=N. These columns have 3 possible values:

Table 1 Conclusion Values#
Value	Meaning
1	violation
-1	no violation
0	no conclusion / not relevant

Likewise, we can select cases where there was found no violation of Article 5 § 3.

selection = cases.loc[cases['ccl_article=5-3'] == -1]
display(selection)

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF PAWLAK v. POLAND	47	MS WORD	39840/05	pl	Poland	NaN	CHAMBER	ECLI:CE:ECHR:2008:0115JUD003984005	4	NaN	...	0	0	0	0	0	0	0	0	0	0

1 rows × 373 columns

Finding Text#

We can’t use a relational operators with text strings. Instead, we use .str.contains():

cases.loc[cases['country.name'].str.contains('Germany')]

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF MARC BRAUER v. GERMANY	41	MS WORD	24062/13	de	Germany	NaN	CHAMBER	ECLI:CE:ECHR:2016:0901JUD002406213	3	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF BARTHOLD v. GERMANY (ARTICLE 50)	53	MS WORD	8734/79	de	Germany	NaN	CHAMBER	ECLI:CE:ECHR:1986:0131JUD000873479	2	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF KLINKENBUSS v. GERMANY	90	MS WORD	53157/11	de	Germany	NaN	CHAMBER	ECLI:CE:ECHR:2016:0225JUD005315711	3	NaN	...	0	0	0	0	0	0	0	0	0	0

3 rows × 373 columns

Combining Conditions#

With pandas, we can’t use and to combine expressions, instead we must use &:

cases.loc[cases['country.name'].str.contains('Germany') & (cases['importance'] >= 3)]

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF MARC BRAUER v. GERMANY	41	MS WORD	24062/13	de	Germany	NaN	CHAMBER	ECLI:CE:ECHR:2016:0901JUD002406213	3	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF KLINKENBUSS v. GERMANY	90	MS WORD	53157/11	de	Germany	NaN	CHAMBER	ECLI:CE:ECHR:2016:0225JUD005315711	3	NaN	...	0	0	0	0	0	0	0	0	0	0

2 rows × 373 columns

Important

Remember to use parentheses around the conditions with &. Otherwise, you will get unexpected results.

The isin() Method#

With basic Python data types, we can check if they occur in a list with the operator in, for example 5 in [0, 1] is False. Likewise, pandas has the method .isin() for checking if table values occur in a list. We can use this for example for finding cases where the conclusion is “violation” or “no violation”, filtering out cases where an article is not relevant.

As the table Conclusion Values shows, the value is 1 for “violation” or -1 for “no violation”. Again, we try with Article 5 § 3.

selection = cases.loc[cases['ccl_article=5-3'].isin([-1, 1])]
display(selection)

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF PAWLAK v. POLAND	47	MS WORD	39840/05	pl	Poland	NaN	CHAMBER	ECLI:CE:ECHR:2008:0115JUD003984005	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF POPLAWSKI v. POLAND	56	MS WORD	28633/02	pl	Poland	NaN	CHAMBER	ECLI:CE:ECHR:2008:0129JUD002863302	4	NaN	...	0	0	0	0	0	0	0	0	0	0
CASE OF PSHEVECHERSKIY v. RUSSIA	98	MS WORD	28957/02	ru	Russian Federation	NaN	CHAMBER	ECLI:CE:ECHR:2007:0524JUD002895702	4	NaN	...	0	0	0	0	0	0	0	0	0	0

3 rows × 373 columns

Selecting by Year#

We can select cases by the decision date. First, we must convert the decisiondate column to machine-readable format.

cases['decisiondate'] = pd.to_datetime(cases['decisiondate'], dayfirst = True)

Tip

We specify the date format with the argument dayfirst = True. There is also an argument yearfirst.

Note

Python has the built-in library datetime for handling time and dates. But when we use pandas, we usually only need pandas’ to_datetime() function.

Now, we can use the year from the decision date with a relational operator.

selection = cases.loc[cases['decisiondate'].dt.year == 2005]
display(selection)

	Unnamed: 0	application	appno	country.alpha2	country.name	decisiondate	doctypebranch	ecli	importance	introductiondate	...	ccl_article=p1-3	ccl_article=p12-1	ccl_article=p4-2	ccl_article=p4-2-1	ccl_article=p4-4	ccl_article=p6-1	ccl_article=p7-1	ccl_article=p7-2	ccl_article=p7-3	ccl_article=p7-4
docname
CASE OF KOTLYAROV v. UKRAINE	15	MS WORD	43593/02	ua	Ukraine	2005-11-22	CHAMBER	ECLI:CE:ECHR:2005:1213JUD004359302	4	25/10/2002	...	0	0	0	0	0	0	0	0	0	0
CASE OF FRYCKMAN v. FINLAND	20	MS WORD	36288/97	fi	Finland	2005-11-15	CHAMBER	ECLI:CE:ECHR:2006:1010JUD003628897	4	21/05/1997	...	0	0	0	0	0	0	0	0	0	0
CASE OF DEBONO v. MALTA	34	MS WORD	34539/02	mt	Malta	2005-05-03	CHAMBER	ECLI:CE:ECHR:2006:0207JUD003453902	4	06/09/2002	...	0	0	0	0	0	0	0	0	0	0
CASE OF GERASIMENKO v. RUSSIA	76	MS WORD	24657/03	ru	Russian Federation	2005-10-25	CHAMBER	ECLI:CE:ECHR:2005:1117JUD002465703	4	12/09/1999	...	0	0	0	0	0	0	0	0	0	0

4 rows × 373 columns

Aggregating Data#

Pandas has various functions for aggregating data. For example, we can count the number of occurrences of different values with value_counts().

The column doctypebranch contains the type of deciding body for the case.

selection = cases['doctypebranch'].value_counts()
print(selection)

doctypebranch
CHAMBER         76
COMMITTEE       22
GRANDCHAMBER     2
Name: count, dtype: int64

Visualizing Data#

We can also plot our results. Here we continue working with our selection.

selection.plot(kind='bar')

<Axes: xlabel='doctypebranch'>

../_images/bcdf531a328a33f8bb77dc672422046ea57619cdf1b14c5efce4ab6d0b2bc28d.png

Sums#

Pandas has a function for calculating the sum of columns or rows. Since our data uses the number 1 to indicate True, we can count matching cases simply by summing the ones.

We can find the number of cases where Article 3 applies by summing the column article=3.

result = cases.loc[:, 'article=3'].sum()
print(result)

We can also sum the values of multiple columns at once.

result = cases.loc[:, ['article=3', 'article=5', 'article=6', 'article=10']].sum()

And plot the result.

result.plot(kind='bar')

<Axes: >

../_images/7c2a51eb4350ec1bafca730cf26b17eb324f1b2298ac60b2c5c1a0491134642e.png

Choosing Axis#

As we have seen, the method .sum()without any parameters will calculate the sum downwards, along the index of the table. We are adding together the rows of the table.

result = cases.loc[:, ['article=3', 'article=5', 'article=6', 'article=10']].sum()
display(result)

article=3     17
article=5     17
article=6     54
article=10     9
dtype: int64

We get the same result if we specify the parameter axis = 'index'. This is because 'index' is the default value for the parameter axis.

result = cases.loc[:, ['article=3', 'article=5', 'article=6', 'article=10']].sum(axis='index')
display(result)

article=3     17
article=5     17
article=6     54
article=10     9
dtype: int64

We can also add together the columns of the table. To do this, we must change the direction to axis = 'column'.

result = cases.loc[:, ['article=3', 'article=5', 'article=6', 'article=10']].sum(axis='columns')
display(result)

docname
CASE OF SKLYAR v. RUSSIA                                                2
CASE OF YABLOKO RUSSIAN UNITED DEMOCRATIC PARTY AND OTHERS v. RUSSIA    0
CASE OF OBERWALDER v. SLOVENIA                                          1
CASE OF RAKHMONOV v. RUSSIA                                             1
CASE OF ARAT AND OTHERSv. TURKEY                                        1
                                                                       ..
CASE OF ARTUR PAWLAK v. POLAND                                          1
CASE OF PROKHOROV v. UKRAINE                                            1
CASE OF ALAJOS KISS v. HUNGARY                                          0
CASE OF PSHEVECHERSKIY v. RUSSIA                                        1
CASE OF TULESHOV AND OTHERS v. RUSSIA                                   0
Length: 100, dtype: int64

The result is a column with 100 items. These are the number of relevant articles (of the four we selected) for the 100 cases.

Series#

A pandas Series is a list of values, similar to a Python list. When we selected a single row or column above, the result was a Series.

We can select a single column from our small example DataFrame.

column = table['time_used']
print(column)

Smith v. Jones    3
Doe v. Doe        4
Jones v. Doe      2
Name: time_used, dtype: int64

We can select a row with .loc[ ].

row = table.loc['Doe v. Doe']
print(row)

time_allotted    5
time_used        4
witnesses        0
Name: Doe v. Doe, dtype: int64

If we need to, we can also make a Series from scratch:

row = Series([3, 1, 5])
print(row)

  3
  1
  5
dtype: int64

Tabular Data

Contents

Tabular Data#

pandas#

DataFrame#

Reading CSV Files#

Examining Data#

Specifying Index Column#

Getting a Row#

Listing a Column#

Using Numeric Indexes#

Getting a Single Cell#

Getting Multiple Columns#

Matching Multiple Columns#

Relational Operators#

Relational Operators in Locators#

Finding Text#

Combining Conditions#

The isin() Method#

Selecting by Year#

Aggregating Data#

Visualizing Data#

Sums#

Choosing Axis#

Series#