More Advanced Python Usage

Using shapefiles with GeoPandas

GeoPandas is a Python package designed to make working with spatial data in Python easier. It extends the data structures and functions of pandas so that they can handle spatial data.


We're going to work with two datasets: district boundaries and miscellaneous populated settlements.


First we'll read the dataset into a geodataframe. It's important to specify the encoding so that Cyrillic characters are displayed properly.

.head() shows the first few rows of the dataframe (five by default) so we can quickly inspect the dataset without printing all of it. Note the geometry field, which contains the coordinates that make up the polygons.
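As a quick illustration, the sketch below builds a made-up geodataframe of ten squares (the district_id column is hypothetical) and previews it with .head().

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical data: ten unit squares in a row.
districts = gpd.GeoDataFrame(
    {"district_id": list(range(10))},
    geometry=[box(i, 0, i + 1, 1) for i in range(10)],
    crs="EPSG:4326",
)

# head() returns the first five rows by default; head(3) would return three.
preview = districts.head()
print(preview)
```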

We can view the spatial data using .plot().
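A small sketch of plotting, using made-up polygons in place of the district boundaries. The "Agg" backend line is only an assumption so the example runs without a display; in a notebook you can leave it out.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the district boundaries.
districts = gpd.GeoDataFrame(
    {"district_id": [1, 2]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# plot() draws the geometries and returns the matplotlib axes.
ax = districts.plot()
```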

Coordinate systems are important when dealing with spatial data. Here we check the coordinate system of the dataset.
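For example, the .crs attribute holds the coordinate reference system. The sketch below uses a made-up geodataframe that is assumed to be in WGS 84 (EPSG:4326).

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical dataset, assumed to be stored in WGS 84.
districts = gpd.GeoDataFrame(
    {"district_id": [1]},
    geometry=[box(74.0, 42.0, 75.0, 43.0)],
    crs="EPSG:4326",
)

# .crs is a pyproj CRS object; to_epsg() gives the numeric code if one matches.
print(districts.crs)
epsg_code = districts.crs.to_epsg()
print(epsg_code)
```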

Most of the time you won't have to set a projection, as most data will include projection information. If it doesn't, you will need to specify it yourself using the EPSG code for the CRS. This website gives codes for many projections.
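A minimal sketch of setting a missing CRS with set_crs(). The data is made up, and EPSG:4326 is an assumption; use whichever code matches the coordinates in your own data.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical dataset created without any projection information.
districts = gpd.GeoDataFrame(
    {"district_id": [1]},
    geometry=[box(74.0, 42.0, 75.0, 43.0)],
)
crs_before = districts.crs  # None, because nothing was specified

# set_crs() labels the existing coordinates; it does not change them.
districts = districts.set_crs(epsg=4326)
print(districts.crs)
```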

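Note that this is different to reprojecting the data: setting the CRS only records which coordinate system the existing coordinates are in and does not change them. If we want to reproject the data into a different coordinate system we need to use .to_crs() and specify the coordinate system we want to convert the data to.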
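To illustrate the difference, this sketch reprojects a made-up polygon from WGS 84 to UTM zone 43N (EPSG:32643). The target CRS is only an assumption chosen for the example; pick the one that suits your area.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical dataset in geographic coordinates (degrees).
districts = gpd.GeoDataFrame(
    {"district_id": [1]},
    geometry=[box(74.0, 42.0, 75.0, 43.0)],
    crs="EPSG:4326",
)

# to_crs() returns a new geodataframe with transformed coordinates (metres here).
districts_utm = districts.to_crs(epsg=32643)

print(districts.total_bounds)      # still in degrees
print(districts_utm.total_bounds)  # now in metres
```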

Accessing columns can be done in the same way as pandas.

This is another way of specifying a column.
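Both styles can be sketched as follows, using a made-up geodataframe. Shape_Area is the column name used later in this section; the values are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import box

districts = gpd.GeoDataFrame(
    {"Shape_Area": [0.5, 0.9, 1.2]},  # hypothetical values
    geometry=[box(i, 0, i + 1, 1) for i in range(3)],
    crs="EPSG:4326",
)

# Bracket notation works for any column name.
areas_bracket = districts["Shape_Area"]

# Dot notation only works when the name is a valid Python identifier
# and does not clash with an existing attribute or method.
areas_dot = districts.Shape_Area

print(areas_bracket)
```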

We can select data using attributes. The example below finds features where the Shape_Area is greater than 0.7. Other comparison operators (such as <, >=, == and !=) can be used in the same way.
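A minimal sketch of this kind of selection, with made-up Shape_Area values standing in for the real district data.

```python
import geopandas as gpd
from shapely.geometry import box

districts = gpd.GeoDataFrame(
    {"Shape_Area": [0.5, 0.9, 1.2]},  # hypothetical values
    geometry=[box(i, 0, i + 1, 1) for i in range(3)],
    crs="EPSG:4326",
)

# The comparison gives a True/False value per row, which is used to filter the rows.
large_districts = districts[districts["Shape_Area"] > 0.7]
print(large_districts)
```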

Plot the selected data.

We can also read in csv files and create point data using the x, y coordinates. To do this we'll first use pandas to read the csv using read_csv.

Here we use geopandas.GeoDataFrame() to create a new geodataframe using the settlements dataframe and add the geometry. We include points_from_xy() when we specify the geometry. This takes the longitude (x) and latitude (y) columns, in that order, and uses them to create point objects that are stored in the geometry field of the geodataframe.

It's important to make sure that the coordinate system is set so that the geodataframe can be used in spatial operations.
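The steps above can be sketched as follows. The CSV content, the column names (name, latitude, longitude) and the choice of EPSG:4326 are assumptions for illustration; an in-memory string stands in for the real csv file.

```python
import io

import geopandas as gpd
import pandas as pd

# Hypothetical CSV content; with a real file you would pass its path to read_csv().
csv_text = "name,latitude,longitude\nA,42.87,74.59\nB,40.53,72.80\n"
settlements_df = pd.read_csv(io.StringIO(csv_text))

# points_from_xy() expects x (longitude) first, then y (latitude).
settlements = gpd.GeoDataFrame(
    settlements_df,
    geometry=gpd.points_from_xy(
        settlements_df["longitude"], settlements_df["latitude"]
    ),
    crs="EPSG:4326",  # assumed: plain latitude/longitude in WGS 84
)
print(settlements)
```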

To the right you can see the geometry column that has been created

Next, we'll write out the new settlements geodataframe to a shapefile using to_file().
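A small sketch of writing a shapefile. The points and the output file name are made up, and a temporary folder is used so the example cleans up after itself; in practice you would write to a folder of your choice.

```python
import tempfile
from pathlib import Path

import geopandas as gpd
from shapely.geometry import Point

# Hypothetical settlements.
settlements = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[Point(74.59, 42.87), Point(72.80, 40.53)],
    crs="EPSG:4326",
)

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "settlements.shp"  # hypothetical file name
    settlements.to_file(out_path)
    # A shapefile is really several files that share the same base name.
    written = sorted(p.name for p in Path(tmp).iterdir())

print(written)
```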

We can view the settlements using the district boundaries as a background.

First plot the district boundaries and save the axes to ax. Note that some keyword arguments are used to specify the fill colour (color='white') and the colour of the edges (edgecolor='black').

Then we can include ax in the arguments for the settlements so that they appear on the same plot.
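The two plotting steps can be sketched like this, with made-up districts and settlements. The marker colour and size are assumptions, and the "Agg" backend line is only there so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical data.
districts = gpd.GeoDataFrame(
    {"ADM2_PCODE": ["D1", "D2"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
settlements = gpd.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5)],
    crs="EPSG:4326",
)

# First layer: the district outlines; keep the returned axes.
ax = districts.plot(color="white", edgecolor="black")

# Second layer: draw the settlements on the same axes.
settlements.plot(ax=ax, color="red", markersize=10)
```

Both layers should be in the same coordinate system, otherwise they will not line up on the plot.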

We can also join data using a spatial join. A spatial join joins attributes from one set of features to another based on the spatial relationship.

In the example below, we use certain keyword arguments to find the district that each settlement is located in.
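A minimal sketch of such a join on made-up data. The choice of how="left" and predicate="within" is an assumption about which options suit this case, and the predicate keyword assumes GeoPandas 0.10 or later (older versions call it op). The district codes are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical districts and settlements.
districts = gpd.GeoDataFrame(
    {"ADM2_PCODE": ["D1", "D2"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)
settlements = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(0.2, 0.8)],
    crs="EPSG:4326",
)

# Keep every settlement (how="left") and attach the attributes of the
# district polygon that it falls within.
joined = gpd.sjoin(settlements, districts, how="left", predicate="within")
print(joined)
```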

More information about the options can be found in the documentation.

Scrolling to the right will show the fields that have been added.

We can analyse the data further to find the count of the settlements in each district.

First we prepare the dataset so that it contains only the columns we want to use. We do this by selecting those columns and saving that to the geodataframe object.

We then create a new column called count. When you assign to a column that doesn't exist, geopandas (like pandas) will create a new column in that dataframe with the name specified. We set the value to 1 so that the sum of the values gives the count.

The dissolve function is then used to aggregate the features.
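The three steps above can be sketched as follows. The joined geodataframe here is made up to look like the result of a spatial join, and grouping by ADM2_PCODE is an assumption about which column identifies a district.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical result of a spatial join: each settlement carries its district code.
joined = gpd.GeoDataFrame(
    {"name": ["A", "B", "C"], "ADM2_PCODE": ["D1", "D2", "D1"]},
    geometry=[Point(0.5, 0.5), Point(1.5, 0.5), Point(0.2, 0.8)],
    crs="EPSG:4326",
)

# 1. Keep only the columns we need (copy so we can safely add a column).
subset = joined[["ADM2_PCODE", "geometry"]].copy()

# 2. Add a count column set to 1 for every settlement.
subset["count"] = 1

# 3. Dissolve by district, summing the count column.
counts = subset.dissolve(by="ADM2_PCODE", aggfunc="sum")
print(counts)
```

After the dissolve, the district code becomes the index of the result and each row holds the combined points for that district.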

We can join this dissolved dataframe onto the district polygons.
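One way to do this is an attribute join with merge(), sketched below on made-up data. The counts table stands in for the dissolved result (with its geometry dropped), and filling missing values with 0 for districts without settlements is an assumption about what is wanted.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Hypothetical district polygons.
districts = gpd.GeoDataFrame(
    {"ADM2_PCODE": ["D1", "D2", "D3"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Hypothetical counts per district, as a plain table without geometry.
counts_table = pd.DataFrame({"ADM2_PCODE": ["D1", "D2"], "count": [2, 1]})

# Keep every district (how="left") and attach its count by district code.
districts_counts = districts.merge(counts_table, on="ADM2_PCODE", how="left")

# Districts with no settlements get a missing value; treat that as zero.
districts_counts["count"] = districts_counts["count"].fillna(0).astype(int)
print(districts_counts)
```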

We can then plot the resulting dataset using a few more options to get a better looking map. This example uses the package matplotlib to set some of the options.
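A sketch of a styled map on made-up data. The colour map, figure size and title are assumptions chosen for illustration, and the "Agg" backend line is only there so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical districts with a settlement count per district.
districts_counts = gpd.GeoDataFrame(
    {"ADM2_PCODE": ["D1", "D2", "D3"], "count": [2, 1, 0]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Create the figure with matplotlib so we can control its size.
fig, ax = plt.subplots(figsize=(8, 6))

# Colour each district by its count and add a legend.
districts_counts.plot(
    column="count", cmap="OrRd", edgecolor="black", legend=True, ax=ax
)

ax.set_title("Settlements per district")
ax.set_axis_off()  # hide the axis ticks and frame
```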

We can save the plot to an image file.
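A minimal sketch of saving a figure with matplotlib's savefig(). The data, file name and dpi are made up, and a temporary folder is used so the example cleans up after itself.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook

import geopandas as gpd
import matplotlib.pyplot as plt
from shapely.geometry import box

# Hypothetical districts.
districts = gpd.GeoDataFrame(
    {"ADM2_PCODE": ["D1", "D2"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

fig, ax = plt.subplots()
districts.plot(ax=ax, color="white", edgecolor="black")

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "districts_map.png"  # hypothetical file name
    # The file extension decides the image format; dpi sets the resolution.
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    saved_bytes = out_path.stat().st_size

print(saved_bytes)
```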


From the district geodataframe, select the district with the ADM2_PCODE 'KG04220000000'.