# More Advanced Data Cube Usage

## Introduction to cropping products with shapefiles

In this notebook you will learn more about the Kyrgyzstan Data Cube (KDC) products and how to use geopandas to analyse or view the region that you want.

This notebook requires a number of external Python modules, which are imported at the top of the notebook, as is convention.

As shown in other notebooks, we need to initialise a Data Cube in order to access the KDC, which can be done as follows:

We use geopandas to load the raion shapefile, which contains one polygon for each raion.

Use the command print(sf) to see what's inside the shapefile.

As shown in previous notebooks, you can select a specific raion using the index of the geopandas dataframe.

### Let's look at the NDVI for this raion

Define a query to load data from the Data Cube.

In this case, we are loading the precomputed 10-day median NDVI for one raion (region_id=15), for the last 10-day period of June.
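The query needs the date range of the last 10-day period of June. Assuming the 10-day composites follow the common dekad convention (days 1-10, 11-20, and 21 to the end of the month), which the notebook does not state explicitly, a small helper can produce that range. The function name and the example year are hypothetical.

```python
import calendar
from typing import Tuple


def last_dekad(year: int, month: int) -> Tuple[str, str]:
    """Return (start, end) ISO dates of the third dekad: day 21 to the last day of the month."""
    last_day = calendar.monthrange(year, month)[1]  # second item = number of days in the month
    return (f"{year:04d}-{month:02d}-21", f"{year:04d}-{month:02d}-{last_day:02d}")


# Hypothetical example year; the notebook does not say which June is loaded.
time_range = last_dekad(2019, 6)
print(time_range)  # ('2019-06-21', '2019-06-30')
```

The resulting tuple could be used as the "time" entry of the query.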

Let's scale the data so that the NDVI ranges from -1 to +1.

Here we reproject the shapefile to make sure that the shapefile and the data use the same coordinate reference system (CRS).

Now we plot the NDVI with the borders of the selected raion.

### Let's cut the image using the shapefile

Plot the cropped data with the shapefile of the raions.

Plot the cropped data with the shapefile of the selected raion.
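In the notebook the cropping is done with the geopandas raion geometry. As a library-free illustration of what that step does, the sketch below builds a pixel mask with the even-odd (ray casting) point-in-polygon rule and sets every pixel outside the polygon to NaN. The function names, grid and square polygon are hypothetical, and the polygon vertices must be in the same CRS as the pixel coordinates, which is why the shapefile is reprojected first.

```python
import numpy as np


def polygon_mask(xs, ys, polygon):
    """Return a (len(ys), len(xs)) boolean mask, True where a pixel centre lies inside polygon.

    Uses the even-odd (ray casting) rule; polygon is a list of (x, y) vertices
    expressed in the same CRS as xs and ys.
    """
    px, py = np.meshgrid(np.asarray(xs, dtype=float), np.asarray(ys, dtype=float))
    inside = np.zeros(px.shape, dtype=bool)
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through each pixel centre?
        crosses = (y1 > py) != (y2 > py)
        # x position of that crossing; horizontal edges never cross, so their inf/NaN is harmless
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        # Toggle "inside" for every crossing to the right of the pixel centre
        inside ^= crosses & (px < x_cross)
    return inside


def crop_to_polygon(data, xs, ys, polygon):
    """Keep pixels inside the polygon and set everything else to NaN."""
    return np.where(polygon_mask(xs, ys, polygon), data, np.nan)


# Hypothetical 10 x 10 grid of pixel centres (0.5, 1.5, ..., 9.5) and a square "raion"
xs = np.arange(0.5, 10.0)
ys = np.arange(0.5, 10.0)
data = np.ones((10, 10))
square = [(2, 2), (6, 2), (6, 6), (2, 6)]

cropped = crop_to_polygon(data, xs, ys, square)
print(np.count_nonzero(~np.isnan(cropped)))  # 16 pixel centres fall inside the square
```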

### Multiple polygons

In the following cells we will use MODIS data to show how the data can be cropped using multiple polygons.

Plot an RGB image of the whole country with the raion shapefile overlaid.

Crop the image to only two raions (15 and 33 are their indices in the geopandas dataframe).
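With several polygons, a pixel is kept if it lies inside any of them, so the per-polygon masks are combined with a logical OR before cropping. The small masks below are hypothetical stand-ins for the masks of raions 15 and 33 on the data grid.

```python
import numpy as np

# Hypothetical per-raion masks on the data grid (True = pixel inside that raion)
mask_raion_a = np.array([[True, True, False],
                         [False, False, False]])
mask_raion_b = np.array([[False, False, False],
                         [False, True, True]])
data = np.arange(6, dtype=float).reshape(2, 3)

# A pixel survives if it falls inside ANY of the selected raions
combined = np.logical_or.reduce([mask_raion_a, mask_raion_b])
cropped = np.where(combined, data, np.nan)
print(cropped)
```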

Plot the cropped image.

The same approach can be applied to Sentinel data, which we load here from the RGB product.

Load the raion shapefile and select a specific raion.

Set the query for the Data Cube to load high-resolution data.

Load the data and make sure that our raion (gdf) has the same projection as our data (CRS = EPSG:32643).
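EPSG:32643 is the WGS 84 / UTM zone 43N projection, which covers longitudes 72°E to 78°E in the northern hemisphere. The hypothetical helper below shows how the zone, and hence the EPSG code, follows from a point's longitude and latitude. It uses the standard 6-degree zones and ignores the special zones around Norway and Svalbard.

```python
import math


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat), standard 6-degree zones only."""
    zone = int(math.floor((lon + 180.0) / 6.0)) + 1
    zone = min(zone, 60)                     # lon = 180 would otherwise give zone 61
    base = 32600 if lat >= 0 else 32700      # 326xx = northern, 327xx = southern hemisphere
    return base + zone


print(utm_epsg(74.6, 42.9))  # a point near Bishkek -> 32643
```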

Select the single image (time snapshot) available in this case.

Plot the data and the boundary of the selected raion.


## Timeseries Analysis

This notebook shows some of the long-term time series analyses that can be run using the Data Cube. The Data Cube is designed to make it easy to analyse large volumes of Analysis Ready Data in both the spatial and temporal dimensions. In this notebook, we will use Sentinel-2 data to analyse the Modified Soil Adjusted Vegetation Index (MSAVI) for a group of fields near Bishkek over time, and then examine how the Normalised Difference Vegetation Index (NDVI) and Normalised Difference Water Index (NDWI) change across the entire Naryn oblast of Kyrgyzstan over a whole year.

To start with, we need to import the modules required for this analysis, which can be done below:

The next thing to do is consider the area to be analysed. We will be looking at some fields located to the west of Bishkek, with a latitude range of 42.90 to 42.91 and a longitude range of 74.39 to 74.91. Below is a picture of the area containing the fields, outlined by the red box.

For the purposes of this exercise, the bounding box is simply these latitude and longitude ranges.

Now we have enough information to initialise a Data Cube instance and create a query for the search. Try to do this below; a solution is provided in the hidden solutions box.
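One possible query, sketched under assumptions: a plain dictionary of keyword arguments for Open Data Cube's dc.load, built from the bounding box above. The output CRS (UTM zone 43N) and the product placeholder are assumptions, not taken from the hidden solution.

```python
# Bounding box of the fields west of Bishkek, in geographic coordinates
query = {
    "x": (74.39, 74.91),         # longitude range
    "y": (42.90, 42.91),         # latitude range
    "crs": "EPSG:4326",          # CRS in which the x/y ranges above are given
    "output_crs": "EPSG:32643",  # assumption: UTM zone 43N for the 10 m Sentinel-2 data
    "resolution": (-10, 10),     # 10 m pixels; y is negative because rows run north to south
}

# With a Data Cube instance this could be used as, for example:
#   import datacube
#   dc = datacube.Datacube(app="timeseries")
#   ds = dc.load(product="<Sentinel-2 10m product name>", **query)  # product name is a placeholder
```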

Once the query has been created, it's time to load the data. In this analysis, we will be using Sentinel-2 10m data. As we will be looking at the entire Sentinel-2 archive currently in the Kyrgyz Data Cube, there is no need to set a time range, and so the data can be loaded as shown below:

These data must be masked so that any pixel that isn't land or snow is ignored. This can be done in the same way as in previous notebooks, bearing in mind that the values for land and snow in the mask DataArray are 5 and 7 respectively. Try masking the dataset below, looking at the solution in the box if you need help.
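The masking logic can be sketched with NumPy on small hypothetical arrays: keep a pixel only if its mask value is 5 (land) or 7 (snow), and set everything else to NaN. xarray's DataArray.isin and .where methods follow the same pattern on the real dataset.

```python
import numpy as np

LAND, SNOW = 5, 7  # mask values given in the text

# Hypothetical 2 x 3 reflectance band and its matching mask layer
red = np.array([[0.10, 0.12, 0.30],
                [0.11, 0.50, 0.09]])
mask = np.array([[5, 7, 3],
                 [5, 9, 5]])

valid = np.isin(mask, [LAND, SNOW])        # True where the pixel is land or snow
red_masked = np.where(valid, red, np.nan)  # every other pixel becomes NaN
print(red_masked)
```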

To find the median value over this farmland area at each time step, we take the median across the 'x' and 'y' dimensions, which can be done as follows:
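In xarray this is a median over the "x" and "y" dimensions, which skips NaN values by default for floating-point data. The NumPy sketch below does the same on a hypothetical (time, y, x) array whose masked pixels are NaN, giving one value per time step.

```python
import numpy as np

# Hypothetical masked cube with dimensions (time, y, x); NaN = masked pixel
cube = np.array([
    [[0.2, 0.4], [0.6, np.nan]],      # time step 0
    [[np.nan, np.nan], [0.3, 0.5]],   # time step 1
])

# Median over the spatial axes (y = axis 1, x = axis 2), ignoring NaNs
spatial_median = np.nanmedian(cube, axis=(1, 2))
print(spatial_median)  # one value per time step: 0.4 and 0.4
```

A time step in which every pixel is masked would give NaN (with a RuntimeWarning), so fully cloudy dates drop out naturally when plotted.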

Now that the data has been cloud masked and reduced to one median value per time step, it is time to create the MSAVI2 index. This index is similar to NDVI in that it uses both the red and near-infrared bands of satellite imagery, but it performs better in areas with large amounts of bare soil. The MSAVI2 equation was derived by Qi et al. (1994) and is as follows:

$MSAVI_2 = \frac{2 \ \rho_{NIR} + 1 - \sqrt{(2 \ \rho_{NIR} + 1)^2 - 8 \ (\rho_{NIR} - \rho_{RED})}}{2}$

where $\rho_{NIR}$ is the near-infrared band surface reflectance and $\rho_{RED}$ is the red band surface reflectance. This equation can be calculated using Python as shown below:
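A direct NumPy translation of the equation is sketched below. The function name is hypothetical, and it assumes reflectance has already been scaled to the range 0 to 1. Expanding the term under the square root gives $(2\rho_{NIR} - 1)^2 + 8\rho_{RED}$, so it never goes negative for non-negative red reflectance.

```python
import numpy as np


def msavi2(nir, red):
    """MSAVI2 (Qi et al., 1994) from NIR and red surface reflectance in the range 0-1."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    a = 2.0 * nir + 1.0
    return (a - np.sqrt(a ** 2 - 8.0 * (nir - red))) / 2.0


values = msavi2([0.5, 0.5], [0.5, 0.1])
print(values)  # approximately [0.0, 0.553]
```

With xarray DataArrays, the np.asarray calls can be dropped so that the time coordinate is kept, since np.sqrt works element-wise on DataArrays.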

This has created an xarray DataArray of MSAVI values, one per time step. These can be plotted with Matplotlib's .scatter() function, using the MSAVI values for the y-axis and lining them up with the time array given by ds.time on the x-axis.
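A minimal scatter plot of this kind, with hypothetical dates and MSAVI values standing in for ds.time and the MSAVI DataArray:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in Jupyter
import matplotlib.pyplot as plt

# Hypothetical time series: acquisition dates and spatial-median MSAVI2 values
times = [dt.date(2019, 5, 1), dt.date(2019, 5, 11), dt.date(2019, 5, 21)]
msavi_values = [0.21, 0.34, 0.42]

fig, ax = plt.subplots(figsize=(8, 4))
points = ax.scatter(times, msavi_values)  # one marker per time step
ax.set_xlabel("Date")
ax.set_ylabel("MSAVI2")
fig.autofmt_xdate()  # tilt the date labels so they don't overlap
```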

## Finding average index values in Naryn for 2019

Now we can use time series averaging to summarise a larger area over the course of a year. Here we will plot the average value of several indices across the whole of Naryn oblast over a year, using composite Sentinel-2/Landsat 8 data at 100m resolution. This requires a change to the query used in the previous part of the notebook.

Naryn has the following bbox coordinates:

- longitudes: 73.7128139, 77.9073308
- latitudes: 40.2809711, 42.4530034

It is also important to remember that the pixel resolution is now 100m. This results in the following query:
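A sketch of such a query, again as Open Data Cube dc.load keyword arguments. UTM zone 43N (72°E to 78°E) contains the whole Naryn longitude range, but the output CRS is still an assumption rather than the notebook's own choice.

```python
naryn_query = {
    "x": (73.7128139, 77.9073308),        # longitude range of Naryn oblast
    "y": (40.2809711, 42.4530034),        # latitude range of Naryn oblast
    "crs": "EPSG:4326",                   # CRS of the ranges above
    "time": ("2019-01-01", "2019-12-31"), # the whole of 2019
    "output_crs": "EPSG:32643",           # assumption: UTM zone 43N
    "resolution": (-100, 100),            # 100 m pixels
}
```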

This query can then be used to get all the 10-day composite data for Naryn in 2019. The product required to do this is modis_indices_250m; otherwise the data are loaded in the usual way. Try to do this below, looking at the hidden solution if required.

The loaded data must now be scaled from the uint16 values in which they are stored to the familiar float values expected for indices such as NDVI, NDWI and NDDI. This can be done using the function scal_2_tru in dcFunctions, as shown below:
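The exact behaviour of scal_2_tru is not shown here, so the following is only a sketch of the general idea under a hypothetical linear encoding (true value = stored / 10000 - 1, with 0 reserved for nodata). The constants and the function name are illustrative and are not the KDC's actual values. The stored values are cast to float first, because uint16 can hold neither negative index values nor NaN.

```python
import numpy as np

# HYPOTHETICAL encoding for illustration only: true = stored * SCALE + OFFSET, 0 = nodata.
# Replace these with the real scale, offset and nodata value of the KDC product.
SCALE = 1.0 / 10000.0
OFFSET = -1.0
NODATA = 0


def scale_to_true(stored):
    """Convert stored uint16 index values to floats, mapping nodata to NaN."""
    stored = np.asarray(stored)
    true = stored.astype(float) * SCALE + OFFSET
    return np.where(stored == NODATA, np.nan, true)


example = scale_to_true(np.array([0, 10000, 15000, 20000], dtype=np.uint16))
print(example)  # roughly nan, 0.0, 0.5, 1.0
```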

Next we must find the median value of NDVI and NDWI at each time step. This can again be done with the .median() method on the dataset itself, but instead of passing the "time" dimension, we pass a list of dimension names in square brackets, ["x", "y"], so that the median is taken over both the "x" and "y" dimensions. This is done as shown below:

Next we can plot the average NDVI and NDWI over the course of a year for Naryn oblast. This is done using matplotlib, and to plot both lines on the same graph, a little more setup must be done compared to previous examples.

To do this, we first create the figure and axes using the plt.subplots() function, and then pass the same axes object to each plotting call so that both lines are drawn on the same axes. We can also give each line a "label" keyword argument, which, combined with plt.legend(), creates a simple legend for both lines.

The example below illustrates just some of the things that can be customised when plotting with matplotlib; a much wider range of customisation is possible.
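A self-contained sketch of this plotting pattern. The seasonal curves are synthetic, made up only to have something to draw, and are not Naryn results; ax.legend() is the axes-level equivalent of plt.legend().

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; not needed in Jupyter
import matplotlib.pyplot as plt
import numpy as np

# Synthetic 10-day time steps through 2019 and made-up seasonal index curves
dates = [dt.date(2019, 1, 1) + dt.timedelta(days=10 * i) for i in range(37)]
season = np.sin(np.linspace(0.0, np.pi, len(dates)))
ndvi = 0.2 + 0.3 * season
ndwi = -0.1 - 0.2 * season

fig, ax = plt.subplots(figsize=(10, 5))              # create figure and axes first
ax.plot(dates, ndvi, label="NDVI", color="green")    # both lines share the same axes
ax.plot(dates, ndwi, label="NDWI", color="blue")
ax.set_title("Median index values over a year (synthetic data)")
ax.set_ylabel("Index value")
ax.grid(True, alpha=0.3)
legend = ax.legend()                                 # builds the legend from the labels
fig.autofmt_xdate()
```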