Abdishakur

Summary

The article provides an overview of the top open-source earth observation data science toolkits, emphasizing their integration with data science and machine learning ecosystems.

Abstract

The field of earth observation data science has seen significant growth, fueled by advancements in satellite technology and data collection. The article highlights the leading open-source tools available to geospatial and earth observation data scientists. It begins by acknowledging the rapid increase in satellite-based earth observation data and then introduces Google Earth Engine (GEE) as a comprehensive platform offering extensive data processing, analysis, and machine learning capabilities. The article also discusses EO-learn, a Python package that integrates with the machine learning ecosystem, facilitating the use of advanced deep learning models on satellite imagery. Radiant MLHub is presented as a valuable resource for accessing open-source training datasets compliant with the SpatioTemporal Asset Catalog (STAC) standard. The Open Data Cube platform is noted for its data management capabilities and ease of access to large-scale earth observation data. Additionally, the article recommends other essential tools and libraries such as Rasterio, WhiteBoxTools, EarthPy, GDAL, and PDAL, which are crucial for various geospatial data processing tasks. The article concludes by affirming the golden age of earth observation and the importance of these resources in studying climate, agriculture, and natural hazards.

Opinions

  • Google Earth Engine (GEE) is regarded as an indispensable tool for earth observation data scientists, offering end-to-end functionality with extensive satellite imagery sources, data processing tools, and machine learning algorithms.
  • EO-learn is recognized for its ability to scale earth observation pipelines and perform complex machine learning and deep learning tasks, making it accessible even to non-experts.
  • Radiant MLHub is praised for providing a growing catalog of ready-to-use machine learning-ready earth observation training datasets.
  • The Open Data Cube is valued for its high-performance data access and management, supporting large-scale geospatial data analysis.
  • The author expresses a preference for using the mentioned toolkits in conjunction with other geospatial and earth observation data science packages and libraries for optimal processing of datasets.
  • The article suggests that the current era is particularly advantageous for earth observation data scientists due to the availability of advanced tools and resources.

The Best Earth Observation Data Science Toolkits

Platforms, Tools and Packages for Geospatial/Earth Observation Data Scientists

Photo by USGS on Unsplash

Satellite-based earth observation data is increasing at a rapid pace, thanks to technological developments in remote sensing platforms and breakthroughs in data collection and storage. Today, we have more than 768 earth observation satellites in orbit, compared with only 150 in 2018.

As a geospatial or earth observation data scientist, you have a vast array of tools and resources to choose from. In this article, I highlight the best open-source tools on the market that are integrated into the data science ecosystem.

1. Google Earth Engine (GEE)

Your wish granted. GEE is an all-in-one package.

Google Earth Engine (GEE) is by far the most complete all-in-one package for earth observation data scientists. It not only offers geospatial data processing and analysis capabilities but also provides ready-to-use datasets, so you can focus on analysing data rather than downloading it.

With GEE, you can perform planetary-scale analysis with freely available satellite images from NASA/USGS (Landsat, MODIS) and the European Union (Sentinel-1 and Sentinel-2), as well as non-satellite or derived products such as elevation, climate data and land cover.

With a full-featured development environment and APIs in both JavaScript and Python, Google Earth Engine (GEE) is an essential part of the arsenal of any earth observation data scientist or analyst. It also comes with a Code Editor (JavaScript) where you can run your analysis and visualisation right in the browser.

Furthermore, you can build machine learning models within GEE and produce complete models and predictions right in the browser. The Python module integrates well with other Python packages, and you can run it in Jupyter notebooks or Google Colab.
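To give a feel for the Python module, here is a minimal sketch that builds a median Sentinel-2 composite and averages its NDVI around a point. It assumes the `earthengine-api` package is installed and that you have an authenticated Earth Engine account. The function name `mean_ndvi`, the `project` argument, the 20% cloud threshold and the 500 m buffer are my own illustrative choices, not something taken from the GEE documentation.

```python
def mean_ndvi(lon, lat, start, end, project=None):
    """Return the mean NDVI around a point from a median Sentinel-2 composite."""
    # Imported lazily so the function can be defined without the optional
    # dependency (pip install earthengine-api). An authenticated account is
    # still needed to actually call it.
    import ee

    # Assumption: recent client versions may require a Cloud project here.
    ee.Initialize(project=project)

    point = ee.Geometry.Point([lon, lat])
    composite = (
        ee.ImageCollection("COPERNICUS/S2_SR_HARMONIZED")
        .filterBounds(point)
        .filterDate(start, end)
        .filter(ee.Filter.lt("CLOUDY_PIXEL_PERCENTAGE", 20))
        .median()
    )

    # NDVI = (NIR - red) / (NIR + red); on Sentinel-2, B8 is NIR and B4 is red.
    ndvi = composite.normalizedDifference(["B8", "B4"]).rename("NDVI")

    # Nothing is downloaded until getInfo() asks the server for the result.
    stats = ndvi.reduceRegion(
        reducer=ee.Reducer.mean(),
        geometry=point.buffer(500),  # 500 m buffer around the point
        scale=10,                    # Sentinel-2 red/NIR resolution in metres
    )
    return stats.getInfo()
```

Calling something like `mean_ndvi(36.8, -1.3, "2020-01-01", "2020-12-31")` in a Jupyter notebook or Colab session would send the whole computation to Google's servers and bring back only a small dictionary.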

There you go: complete end-to-end functionality in GEE, with terabytes of satellite imagery sources, data processing tools and ML algorithms, right in your browser.

It could not have been better!

2. EO-learn

Earth observation pipelines at scale running on CPU/GPU.

eo-learn is a Python package that closely links the Python data science and machine learning ecosystem to the remote sensing and earth observation community. With eo-learn, even non-experts can extract and derive valuable information from satellite images.

It also enables earth observation experts to run state-of-the-art deep learning and computer vision models efficiently.

eo-learn is built on NumPy arrays and Shapely geometries, so geospatial data scientists feel right at home. Given bounding boxes defined with GeoPandas, eo-learn can download satellite images directly into your environment before you carry out any analysis.

Not only does eo-learn let you run ML and deep learning models, it also has modules for batch processing, masking, IO, geometric transformation and conversion between vector and raster data.

With eo-learn, you can create pipelines that process single or multiple batches and that can run on parallel CPUs/GPUs for heavy processing. For example, this pipeline runs a sequence of tasks for 12 hours over an area of 200,000 km² to perform land cover classification for an entire country.
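The pipeline idea itself is easy to sketch in plain Python: split a large area into tiles, define each step as a small task, and run the chain of tasks over all tiles in parallel. The code below is not the eo-learn API. Every name in it (`split_bbox`, `load_patch`, `clip_values`, `classify`, `run_all`) is hypothetical, and the pixel values are synthetic placeholders standing in for downloaded imagery.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_bbox(bbox, nx, ny):
    """Split (min_x, min_y, max_x, max_y) into nx * ny equal tiles."""
    min_x, min_y, max_x, max_y = bbox
    dx = (max_x - min_x) / nx
    dy = (max_y - min_y) / ny
    return [
        (min_x + i * dx, min_y + j * dy, min_x + (i + 1) * dx, min_y + (j + 1) * dy)
        for i in range(nx)
        for j in range(ny)
    ]


def load_patch(tile):
    # Placeholder for an image download: returns fixed synthetic reflectances.
    return {"bbox": tile, "pixels": [[0.1, 0.6], [0.7, 0.2]]}


def clip_values(patch):
    # Keep every pixel inside the valid [0, 1] reflectance range.
    patch["pixels"] = [[min(max(v, 0.0), 1.0) for v in row] for row in patch["pixels"]]
    return patch


def classify(patch, threshold=0.5):
    # Toy "land cover" classifier: class 1 above the threshold, else class 0.
    patch["classes"] = [[int(v > threshold) for v in row] for row in patch["pixels"]]
    return patch


def run_pipeline(tile, tasks):
    """Feed the output of each task into the next one."""
    return reduce(lambda data, task: task(data), tasks, tile)


def run_all(tiles, tasks, workers=4):
    """Run the same task chain over every tile, several tiles at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda tile: run_pipeline(tile, tasks), tiles))


tiles = split_bbox((0, 0, 4, 2), nx=2, ny=2)
results = run_all(tiles, [load_patch, clip_values, classify])
```

A thread pool keeps the sketch short. For CPU-heavy tasks you would more likely reach for a process pool, or let a library such as eo-learn handle the scheduling for you.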

3. Radiant MLHub

For earth observation and machine learning, Radiant MLHub offers ready-to-use, open-source earth observation training datasets. All the datasets in its catalogue are SpatioTemporal Asset Catalog (STAC) compliant, and the list is growing, with 14 datasets available so far.

The training datasets cover different machine learning applications including image classification, segmentation and object detection. Popular ML earth observation training datasets available here include SpaceNet, BigEarthNet and LandCoverNet.

Accessing the data is free and open to anyone via an API. Example Jupyter notebooks showing how to access different datasets are available in the MLHub Tutorials repository.
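STAC compliance means every catalogue entry is a JSON document with the same predictable layout, which is what makes the datasets easy to search and load. As a rough illustration, the sketch below checks a STAC Item for the top-level fields the STAC Item specification requires. The helper `missing_item_fields`, the item id and the asset URL are hypothetical placeholders, not real Radiant MLHub entries.

```python
# Top-level fields every STAC Item must carry.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def missing_item_fields(item):
    """Return the names of required STAC Item fields that are absent."""
    missing = [field for field in REQUIRED_ITEM_FIELDS if field not in item]
    # A bounding box is required whenever the item has a geometry.
    if item.get("geometry") is not None and "bbox" not in item:
        missing.append("bbox")
    # The acquisition time lives inside the properties object.
    if "datetime" not in item.get("properties", {}):
        missing.append("properties.datetime")
    return missing


# Hypothetical item, for illustration only.
example_item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-label-tile-001",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [36.0, -1.0], [36.1, -1.0], [36.1, -0.9], [36.0, -0.9], [36.0, -1.0],
        ]],
    },
    "bbox": [36.0, -1.0, 36.1, -0.9],
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "links": [],
    "assets": {"labels": {"href": "https://example.com/labels.tif"}},
}

problems = missing_item_fields(example_item)
```

This is only a presence check. A full validation against the published JSON schemas would be the job of a dedicated STAC library.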

4. Open Data Cube

As a platform and an open-source geospatial data management tool, Open Data Cube provides easy and open access to large amounts of indexed earth observation data. The Python API enables earth observation analysts to query and access data with high performance, allowing them to carry out country-level to continent-scale processing of stored data.
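A query through the Python API typically looks like the sketch below. It assumes the `datacube` package is installed and connected to an already indexed database, as in the sandbox. The function name `load_red_nir` and the app label are hypothetical, and the measurement names, output CRS and resolution all depend on the product you load.

```python
def load_red_nir(product, lon_range, lat_range, time_range):
    """Load the red and NIR bands of a product as an xarray Dataset."""
    # Imported lazily: needs `datacube` plus access to an indexed database.
    import datacube

    dc = datacube.Datacube(app="eo-toolkit-sketch")  # arbitrary app label
    return dc.load(
        product=product,              # a product name from your own index
        x=lon_range,                  # (min_lon, max_lon) in degrees
        y=lat_range,                  # (min_lat, max_lat) in degrees
        time=time_range,              # e.g. ("2020-01", "2020-03")
        measurements=["red", "nir"],  # assumption: band names vary by product
        output_crs="EPSG:6933",       # assumption: an equal-area CRS in metres
        resolution=(-30, 30),         # assumption: 30 m pixels
    )
```

The result is an xarray Dataset, so the rest of the analysis can use the usual xarray and NumPy tooling.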

Open Data Cube currently supports Digital Earth Australia and the Africa Regional Data Cube, and also provides tutorials, guides and documentation for its users.

They also offer a generous free ODC sandbox (16 GB RAM) with preconfigured Jupyter notebooks, which you can run in the cloud without any configuration or installation. With this feature, many earth observation analysts are finding it easy to start analysing petabytes of data without worrying about the software.

These platforms support thousands of earth observation data scientists and facilitate their day-to-day tasks. Although some of these toolkits can be used standalone, as a complete end-to-end pipeline, I tend to use them with other geospatial and earth data science packages and libraries.

So, I will conclude this article by listing the packages I use most to process earth observation datasets, which you will probably find useful.

  • Rasterio: An essential, lightweight and flexible Python package for remote sensing image reading and writing.
  • WhiteBoxTools: An advanced geospatial data analysis tool with an extensive set of image processing functions, such as image enhancement, filtering operations and clustering algorithms, as well as hydrological and geomorphometric analysis. It can handle LiDAR data effectively, enabling you to segment, tile or join raster LiDAR data and derive outputs.
  • EarthPy: A Python package that makes it easier to plot and work with spatial raster and vector data using open-source tools. EarthPy bridges the gap between raster and vector data so you can work effectively across the two data types.
  • GDAL: A much-loved tool used by most earth observation users. It is a translator library for raster and vector geospatial data formats and provides an extensive list of satellite image processing tools.
  • PDAL: The Point Data Abstraction Library. PDAL focuses on LiDAR data but offers other tools as well. It also provides simple Python bindings through NumPy, which let you work with point data from Python and tools like Jupyter notebooks.
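As a small taste of how these libraries fit together, here is a sketch that reads two bands with Rasterio, computes NDVI with NumPy and writes the result to a new file. It assumes both packages are installed and that the input is a GeoTIFF. The function name `write_ndvi` and the default band positions are hypothetical, so check the band order of your own imagery.

```python
def write_ndvi(src_path, dst_path, red_band=1, nir_band=2):
    """Read red and NIR bands, compute NDVI and save it as a one-band raster."""
    # Imported lazily: needs `numpy` and `rasterio` to be installed.
    import numpy as np
    import rasterio

    with rasterio.open(src_path) as src:
        red = src.read(red_band).astype("float32")
        nir = src.read(nir_band).astype("float32")
        profile = src.profile  # georeferencing and format metadata

    total = nir + red
    # NDVI = (NIR - red) / (NIR + red); pixels where the sum is zero stay 0.
    ndvi = np.divide(nir - red, total, out=np.zeros_like(total), where=total != 0)

    # The output keeps the source georeferencing but holds one float band.
    profile.update(count=1, dtype="float32")
    with rasterio.open(dst_path, "w", **profile) as dst:
        dst.write(ndvi, 1)
```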

Final thoughts

We live in the golden age of earth observation. Not only are we witnessing breakthroughs in space and airborne expeditions, but we also have incredible resources to analyse and study the earth and its environment. These platforms enable earth observation data scientists to study the climate, weather, agriculture, transportation and infrastructure, as well as natural hazards.

Although the list is not exhaustive, I believe these are the best resources out there. If you think I have left out some of your favourite tools, please let me know.

Data Science
Earth Observation
Remote Sensing
Machine Learning
Towards Data Science