Daniel Ellis Research

Summary

The web content describes the process of digitizing data from plots in scientific papers or images using WebPlotDigitizer (WPD), a web-based tool that simplifies the extraction of numerical data for analysis and manipulation.

Abstract

The article titled "Extracting (digitising) data from plots in scientific papers or images" outlines the importance of converting analogue figures into digital formats for further research and data visualization. It introduces WebPlotDigitizer (WPD) as a versatile tool capable of handling various types of plots, including 2D, bar, ternary, and polar diagrams. The WPD operates across different operating systems and offers an online interface for user convenience. The digitization process involves uploading an image, defining plot boundaries, entering known points, and configuring settings such as mask, foreground, background, and pixel sampling intervals. The article demonstrates the use of WPD with a historical figure depicting trade profits to the West Indies from 1700 to 1780, and it concludes by emphasizing the tool's simplicity, practicality, and value in retrieving data from publications where raw data may be lost.

Opinions

  • The author suggests that data extraction from images is a common requirement in academia and data visualization, where existing research and historical data need to be compared or built upon.
  • The article conveys that WebPlotDigitizer is user-friendly, with algorithmic functions running in the background, allowing users to focus on simple 'point and click' interface interactions.
  • It is implied that WebPlotDigitizer's ability to run on various operating systems or through an online interface adds to its accessibility and convenience for users.
  • The author expresses that the tool is particularly valuable for modern-day scientists, as it enables the recovery of data from publications where the original raw data has been lost.
  • The article positively endorses WebPlotDigitizer as an "invaluable tool" for scientists and researchers, highlighting its ease of use and practical applications.

Extracting (digitising) data from plots in scientific papers or images

Chart of exports and imports from the West Indies (1700–1780) — W. Playfair. Source: WikiCommons

Often we see a graphic of interest and want to use its data in our own analysis or designs. This frequently occurs in academia, where new research needs to be compared with results already published in scientific journals, and in data visualisation, where historic figures can be built upon and improved (with the addition of new data or designs).

How do we extract the data?

The extraction of data from images is called digitization: the conversion of an analogue figure into a quantized digital (numerical) format that can be used for manipulation and analysis.

The simplest process works by defining the range of data within a plot and calculating the value of the points on a plotted line within it. To do this we can make use of the WebPlotDigitizer (WPD).
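To make the idea concrete, here is a minimal sketch of the calibration step: two known points on each axis define a mapping from pixel position to data value. It assumes the axes are aligned with the image grid and is not WPD's actual implementation. The function name `make_axis_mapper` and all pixel positions and axis values below are hypothetical, chosen only for illustration.

```python
import math


def make_axis_mapper(pixel_a, value_a, pixel_b, value_b, log_scale=False):
    """Return a function that maps a pixel coordinate to a data value on one axis.

    (pixel_a, value_a) and (pixel_b, value_b) are two known calibration points.
    With log_scale=True the interpolation is done in log10 space.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    if log_scale:
        value_a, value_b = math.log10(value_a), math.log10(value_b)
    slope = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        value = value_a + slope * (pixel - pixel_a)
        return 10 ** value if log_scale else value

    return to_value


# Hypothetical calibration: pixel column 100 is the year 1700, column 900 is 1780.
x_map = make_axis_mapper(100, 1700, 900, 1780)
# Image rows grow downwards, so the larger y value sits at the smaller pixel row.
y_map = make_axis_mapper(600, 0, 100, 200)

# Convert one clicked pixel (column 500, row 350) into data coordinates.
print(x_map(500), y_map(350))
```

Once both mappers exist, every pixel found on the plotted line can be converted to an (x, y) data pair in the same way.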

Use Case for WebPlotDigitizer

WPD can be used with a range of plots, from 2D (X-Y) plots to bar plots, ternary diagrams and polar diagrams. The web-based nature of WPD means that it can run on a range of operating systems or even through an online interface. It can be accessed at:

https://automeris.io/WebPlotDigitizer/

In this article, we take a figure showing the import and export profits from trades to the West Indies between 1700 and 1780. This is a lovely figure which was hand-plotted by W. Playfair. In the sections below we describe the process of digitizing the figure and then provide a couple of quick plots of the extracted data in Python.

Digitization Process

The general digitization process is relatively simple using the WPD software. The algorithmic functions are hidden in the background, and the user only needs to change a handful of ‘point and click’ parameters on the interface. The procedure for this is outlined below:

  1. Click on File > Load Image, select the type of plot and upload your image.
  2. For an x-y plot, select your boundaries. These begin with the minimum known x value, then the maximum known x value, followed by the minimum and maximum known y values.
  3. Enter the values for the known points and select whether a log scale has been used. (You may add additional configuration points after pressing the confirm button.)
  4. Adjust settings such as mask (selecting the area to explore), foreground (line) and background colours, and the pixel sampling intervals. Note that a larger averaging area within the pixel sampling intervals sometimes gives a better result (in the example case, 30 pixels was used).
  5. Click run within the algorithm box.
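The kind of work hidden behind the foreground colour and pixel sampling settings can be illustrated with a rough sketch. This is an assumption about the general approach (colour matching followed by averaging within fixed-width windows), not WPD's actual algorithm. The function `extract_curve`, its parameters, and the synthetic test image are all hypothetical.

```python
import numpy as np


def extract_curve(image, line_rgb, tolerance=60.0, step=1):
    """Return (column_centre, mean_row) pairs for pixels close to the line colour.

    image     : H x W x 3 array of RGB values
    line_rgb  : the foreground (line) colour to look for
    tolerance : maximum Euclidean RGB distance counted as a match
    step      : width in pixels of each averaging window along x
    """
    diff = image.astype(float) - np.asarray(line_rgb, dtype=float)
    distance = np.sqrt((diff ** 2).sum(axis=-1))
    mask = distance <= tolerance  # True where a pixel looks like the line

    points = []
    width = mask.shape[1]
    for start in range(0, width, step):
        stop = min(start + step, width)
        rows, _ = np.nonzero(mask[:, start:stop])
        if rows.size:  # skip windows that contain no matching pixels
            points.append(((start + stop - 1) / 2, float(rows.mean())))
    return points


# Synthetic stand-in for a scanned plot: white background, 3-pixel-thick red line.
image = np.full((50, 100, 3), 255, dtype=np.uint8)
for col in range(100):
    row = 40 - col // 5
    image[row - 1:row + 2, col] = (200, 30, 30)

points = extract_curve(image, (200, 30, 30), tolerance=60.0, step=10)
print(points[:3])
```

A wider `step` averages over more pixels per window, which smooths out noise from a thick or hand-drawn line at the cost of fewer output points. The resulting pixel coordinates would still need to be converted to data values using the axis calibration.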

This produces a selection as shown below.

A screenshot of the WebPlotDigitizer online interface.

Data Munging

Finally, we can export the generated comma-separated values (CSV) file for plotting in another program (e.g. JS, Python, R …). A quick example in Python using pandas gives the following:

Left: Selection of the exports line. Middle: A line-plot for imports and exports. Right: A stacked area plot showing the difference between the two values.
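The original plotting code is not reproduced here, so the following is only a sketch of how plots like those described in the caption above might be produced. It assumes each digitized line was exported as a headerless two-column CSV (x, y). The numbers are PLACEHOLDER values, not Playfair's data, and the helper names `load_series` and `plot_trade` are hypothetical.

```python
import io

import numpy as np
import pandas as pd

# PLACEHOLDER values standing in for two exported CSV files (not real data).
EXPORTS_CSV = "1700,10\n1780,50\n"
IMPORTS_CSV = "1700,20\n1780,20\n"


def load_series(source, name):
    """Read a headerless two-column CSV of (year, value) points."""
    df = pd.read_csv(source, header=None, names=["year", name])
    return df.sort_values("year").drop_duplicates(subset="year")


exports = load_series(io.StringIO(EXPORTS_CSV), "exports")
imports = load_series(io.StringIO(IMPORTS_CSV), "imports")

# Digitized lines rarely share x positions, so resample both onto a common grid.
years = np.arange(1700, 1781, 10)
trade = pd.DataFrame(
    {
        "exports": np.interp(years, exports["year"].to_numpy(), exports["exports"].to_numpy()),
        "imports": np.interp(years, imports["year"].to_numpy(), imports["imports"].to_numpy()),
    },
    index=pd.Index(years, name="year"),
)
trade["difference"] = trade["exports"] - trade["imports"]
print(trade)


def plot_trade(trade):
    """Draw a line plot and a shaded-difference plot (requires matplotlib)."""
    import matplotlib.pyplot as plt

    fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
    trade[["imports", "exports"]].plot(ax=left, title="Imports and exports")
    right.fill_between(trade.index, trade["imports"], trade["exports"], alpha=0.5)
    right.set_title("Difference between exports and imports")
    return fig
```

With real exports from WPD, the `io.StringIO(...)` arguments would be replaced by file paths, and calling `plot_trade(trade)` followed by `matplotlib.pyplot.show()` would display the figures. The shaded region here uses `fill_between` rather than a true stacked area plot.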

Conclusions

WebPlotDigitizer is a very simple program to use, with many practical applications. Because it is built with HTML and JavaScript, it can run both online and on most popular operating systems. Finally, the ability to extract data from publications (especially ones where the raw data has been lost) makes it an invaluable tool for the modern-day scientist.
