More Advanced Data Cube Usage¶

Timeseries Analysis¶

This notebook will attempt to show some of the long term time analysis processes that can be run using the Data Cube. The Data Cube is designed to allow for easy analysis of large volumes of Analysis Ready Data in both spatial and temporal dimensions. In this notebook, we will look at using Sentinel-2 data to analyse the Modified Soil Adjusted Vegetation Index (MSAVI) for a group of fields near Bishkek over time as well as find how the Normalised Difference Vegetation Index (NDVI) and Normalised Difference Water Index (NDWI) changes over the entire Naryn oblast in Kyrgyzstan for a whole year.

To start off with, we will need to load in the modules required to do this analysis, which can be done below:

In [ ]:

import sys
import datacube
import numpy as np
import matplotlib.pyplot as plt
sys.path.append('/eOsphere/bin')
import dcFunctions

The next thing to do is consider the area which will be analysed. We will be looking at some fields located to the West of Bishkek. Its latitude range is between 42.90 and 42.91 and its longitude range is between 74.39 and 74.91. Below is a picture of the area containing the fields, which is constrained by the red box.

For the purposes of this exercise, the bounding box we will use is the latitude and longitude extent range.

Now we have enough information to initialise a Data Cube instance, and create a query for the search. Try to do this below, with an answer for how to do this in the hidden solutions box.

In [ ]:

dc = datacube.Datacube()

query = {'x': [74.386, 74.391],
         'y': [42.9, 42.91],
         'measurements': ['red', 'nir', 'mask'],
         'output_crs': 'EPSG:32643',
         'resolution': [10, 10]
        }

Once the query has been done, it's time to load in the data. In this analysis, we will be using Sentinel-2 10m data. As we will be looking at the data for the entire Sentinel-2 archive currently in the Kyrgyz Data Cube, it is not necessary to set a time range, and so the data can be loaded in as shown below:

This data must be masked out, so that anything that isn't land or snow is ignored. This can be done the same way as shown in previous notebooks, bearing in mind that the values for land and snow in the mask DataArray are 5 and 7 respectively. Try masking the dataset below, looking at the solution in the box if you need help.

To find the median average pasture for this farmland area, we will need to take the median across the 'x' and 'y' dimensions which can be done in the following way:

Now that the exact coordinate of Station 246 has been identified in the data and the data has been cloud masked, it is time to create the MSAVI2 index. This index is similar to NDVI, in that it takes into account both the red and near-infrared bands from satellite imagery, but it accounts better for areas with large amounts of bare soil. The equation for implementing the MSAVI was derived by Qi et al. (1994) and is as follows:

$MSAVI = 2 \ \rho_{NIR} + 1 - \frac{\sqrt{(2 \ \rho_{NIR} + 1)^2 - 8 \ (\rho_{NIR} - \rho_{RED})}}{2} $

where $\rho_{NIR}$ is the near-infrared band surface reflectance and $\rho_{RED}$ is the red band surface reflectance. This equation can be calculated using Python as shown below:

This has created an xarray DataArray of MSAVI points. These points can be plotted using Matplotlib and the .scatter() function, using the the points created in the MSAVI array and lining them up with the time array given by ds.time.

Finding average index values in Naryn for 2019¶

Now we can do some timeseries averaging to find useful information for a larger area over the course of a year. In this instance, we will plot the average value for all of Naryn oblast for a number of different indices over the course of a year. This will be done using composite Sentinel-2/Landsat8 data at 100m resolution. This will require a change to the query used in the previous part of the notebook.

Naryn has the following bbox coordinates:

longitudes: 73.7128139, 77.9073308
latitudes: 40.2809711, 42.4530034

It is also important to remember that the pixel resolution is now 100m. This results in the following query:

In [ ]:

query = {'x': [73.7128139, 77.9073308],
         'y': [40.2809711, 42.4530034],
         'time': ['2019-01-01', '2019-12-31'],
         'measurements': ['ndvi', 'ndwi'],
         'output_crs': 'EPSG:32643',
         'resolution': [100, 100]
        }

This query can then be used to get all the 10 Day composite data in for Zavkhan in 2019. The product required to do this is modis_indices_250m and otherwise it is done in the usual way. Try to do this below, looking at the hidden solution if required.

The loaded in data must now be scaled from the uint16 values it is stored as to the familiar float values that are expected when for indices such as NDVI, NDWI and NDDI. This can be done using the function scal_2_tru in dcFunctions, as shown below:

Next we must find the median value for NDVI and NDWI at each time step. This can be done by once again using the .median() method on the dataset itself, but instead of specifying the dimension that the median will be taken over as the "time" dimension, we will instead specify this using just square brackets, to show that we want the median over both the "x" and "y" dimensions. This is done as shown below:

Next we can plot the average NDVI and NDWI over the course of a year for Naryn oblast. This is done using matplotlib, and to plot both lines on the same graph, a little more setup must be done compared to previous examples.

To do this we have to set up the plot figure before initialising the plot, using the plt.subplots() function. For each plot we specify the axes to plot the figure on, to ensure both plots overlap on the same axes. We can also use the "label" function, which when combined with plt.legend() creates an easy label box for both lines.

The below example illustrates just some of the things that can be customised in plotting with matplotlib, although it is possible to do a very wide variety of things indeed.

In [ ]:

fig, ax = plt.subplots(1, 1, figsize=[8,5])

ds['ndvi'].plot.line(ax=ax, label='NDVI')
ds['ndwi'].plot.line(ax=ax, label='NDWI')
plt.legend(loc='upper left')
plt.show()