More Advanced Data Cube Usage

Plotting a Single Band of Data

In the previous notebook, the basics of accessing and using data from the Data Cube were explained. This notebook will expand on the concepts explained in that notebook and actually allow us to do useful things with data contained within the Data Cube.

First of all, we will look at plotting data, first just a single band of satellite data and then creating an RGB image. Run the following cell to initialise the Data Cube and load in some data:

Hopefully, the code contained in the previous cell should not be a surprise to you. As mentioned previously, the Data Cube returns data in an array structure enabled using the Xarray module. Xarray has useful and convenient plotting facilities built into it. It is as simple as calling a specific band and then calling a plot method to that loaded data. In practise this is done with a series of full stops, shown by running the code segment below:

Hopefully an image has been displayed above, using matplotlib to display it (the module matplotlib was also covered in the third notebook in this series). I will now describe the command to display the red band, ds.red.plot(). ds by itself is not able to be plotted, as it is a Dataset object, and could potentially contain multiple days worth of data, or multiple bands. However, to select just a single band as a DataArray, which is an object which can be plotted, you must enter the command ds."band_desired_to_be_selected". So if I just wanted to select the green band for example, I would enter ds.green.

Any DataArray object which is just a single timestep can in theory be plotted as an image using the .plot() method. This would also work if you had a DataArray object already, such as one saved as a variable. It is also possible to plot RGB images, either using the visible channels or to create a false colour composite image. This is a slightly more difficult task and we will go through it now.

RGB Image Creation

Creating an RGB Image requires slightly more work than just calling .plot() on a DataArray, as there are now three bands which need to be plotted. The ds object we plotted above contains four bands as the cloud mask is present in the Dataset, so we will change the query to only get three bands from the Data Cube, by running the code below:

The first thing to do in order to plot an RGB image is to turn the Dataset into a single timestep DataArray, but one which still contains the three bands required. To do this, we first select a single timestep using the .isel() method. This is required even in the instance below where there is only one timestep in the Dataset, as the .isel() removes it as an array dimension entirely. Then, the array can be transformed into a DataArray using the .to_array() command, and specifying the dimension along which the original bands in the Dataset will now be separated - in this instance we call this dimension the colour dimension.

In order to scale the image so all three bands are proportional, we also need to define a "fake saturation" parameter, which is the number where data in a pixel of the satellite band is assumed to be saturated. If in all three bands, a pixel is above this number then it will appear saturated and white. This can be done either by specifying a single integer or by analysing the image to set a certain percentage of pixels as saturated. Some images won't even require a fake saturation parameter which is lower than the natural values of the image. However, in images where there is one bright spot, including this value can distort the image and make all the useful information appear very dark.

These steps described are shown below:

The next few steps are quite complicated. Due to the way the DataArray is arranged, at the moment, the dimensions of the image are, in order, ("color", "x", "y"). In order to plot an RGB image with Python, the "color" dimension needs to be specified after the "y" and "x" dimensions in the call to plot. So we need to move it, and this can be done with the .transpose() method. The command shown below will rearrange the dimensions from ("color","y","x") to ("y","x","color"):

The next two steps describe scaling the data to ensure nothing exceeds the fake_saturation variable we have defined and also to ensure all data in the image bands is scaled between 0 and 1, as this is required for plotting an rgb image. To ensure no pixels are above the fake_saturation level, we mask the data so that any pixels above this value are made equal to the value. This is done using the .where(*insert condition*) method.

Then all pixels in the DataArray are divided by the fake_saturation value. As the maximum value of the array is now the fake_saturation value and the minimum value is 0, this means all values in the DataArray are now between 0 and 1. This is done using the /= operator, which divides the value on the left hand side by the value on the right hand side of the equation, and sets the result of that as the new left hand side value.

These two steps are shown below:

Now we have a DataArray which is ready to be plotted as an RGB image. This can be done in a similar way to how it has been done above, except that in this case, we use the method .plot.imshow() instead of just .plot(). This is because imshow has support for collecting together and plotting multi-band images.

Alternatively, you could use the keyword robust=True in imshow to show data. This will remove the anomalous pixels, it will clip the data at 2 and 98 percentile.

Creating indices from satellite imagery

It is possible to create indices from satellite imagery as well. The Kyrgyzstan Data Cube does already contain some median averaged multiday indices which are accessible by changing the product type to "indices" in when querying the Data Cube.

However, sometimes you might want to create your own indices - be that an index which is in the Data Cube already but for a different time period or perhaps an index which isn't currently in the Data Cube. We will look at creating an NDVI image in the Data Cube now, and for that we will need to change the query again to bring in the Near-Infrared band and the cloud mask by running the code below:

Now, it is an easy task to create an NDVI band from this data. The NDVI is calculated using the following equation: $NDVI = \frac{NIR - RED}{NIR + RED}$

Using the knowledge gained above that you can get bands out of a Dataset using the ds."band_desired_to_be_selected", it is possible to do this calculation for all pixels in a band in the following way:

The .astype(float) method on the end of the command ensured that the output NDVI array was of type float, as weird results can occur occasionally with Xarray when doing full array maths with integer arrays.

The NDVI array can now be plotted as a single band array, by calling the .plot() method on it, as done before. Run the code below to see the plot of NDVI.

This is useful, but the clouds and cloud shadow present in the image have resulted in areas where it is saying the NDVI is very high or very low. However, using the mask band, we can filter out everything which isn't land.

In the mask band, each pixel has a value, corresponding to a different type of mask value. For example, clear land is represented by 5, snow/ice by 7, cloud by 15 and so on. The full list of categories can be found in the shared_data folder.

For the NDVI, we are only interested in looking at clear land pixels, so we can use one of the methods seen previously .where() to mask everything out except for land. Once this is done, the NDVI can be recalculated and plotted, as seen below: