Understanding Digital Image Encoding
I think it is important to understand how images are encoded, in order to fully understand deep learning applied to computer vision.
Digital image encoding is the process of converting a visual image into a form that a computer can understand and manipulate. This process involves several steps, including sampling, quantization, and encoding. Here’s a more detailed explanation of the process:
- Image Sampling: The first step in the digitization process is to divide the image into a grid of individual picture elements or pixels. This is called “sampling.” For instance, a 1920 x 1080 pixel image contains a grid of 1920 pixels across and 1080 pixels down, giving a total of 2,073,600 individual pixels.
- Color Representation: Each pixel in an image is represented by a combination of primary colors. The most common method uses Red, Green, and Blue (RGB). Each of these colors is assigned an intensity value, and their combination defines the pixel’s color. In a standard 24-bit color representation, each color gets 8 bits, resulting in 256 possible intensities per color, and more than 16.7 million possible color combinations per pixel.
- Quantization: Quantization maps each sampled value onto a finite set of discrete levels. Applied to color, it usually means reducing the number of distinct colors used in an image, which reduces the amount of data needed to represent it. For instance, an image might be quantized down to 256 colors, with each pixel’s color represented by a single 8-bit index into a color palette. Quantization leads to some loss of detail; with a well-chosen palette this is often hard to notice, although smooth gradients may show visible banding.
- Pixel Encoding: The quantized values are then encoded into a binary format. Each pixel value is represented as a binary number. For example, in an 8-bit grayscale image, each pixel can have a value from 0 (black) to 255 (white). These values are stored as a binary number, from 00000000 (0 in decimal) to 11111111 (255 in decimal).
- Compression: This step is optional but often used to reduce the amount of space needed to store the image. Compression can be lossless (no information is lost in the compression) or lossy (some information is lost, but the overall appearance of the image is preserved). Common image file formats like JPEG use lossy compression, which uses algorithms to reduce file size by discarding some data that is less noticeable to human perception.
- File Format: Lastly, the image data along with additional information such as the image’s dimensions, the color model (e.g., RGB), and any metadata are stored in a specific image file format like JPEG, PNG, GIF, BMP, etc. Each file format has a particular way of organizing this information, and may also include specific compression techniques.
In the end, a digitally encoded image is just a file on a computer, containing binary data that, when processed correctly, can reproduce a grid of colored pixels representing the original image.
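The arithmetic behind the steps above can be checked with a few lines of plain Python. This is only a minimal sketch: the helper names `raw_size_bytes` and `quantize_uniform` are made up for illustration, and the uniform quantizer shown here is just one simple way to reduce the number of levels (real tools often pick a palette adaptively).

```python
def raw_size_bytes(width, height, bits_per_pixel):
    """Uncompressed size of the pixel grid in bytes (hypothetical helper)."""
    return width * height * bits_per_pixel // 8


def quantize_uniform(value, levels):
    """Map an 8-bit value (0-255) onto `levels` evenly spaced values.

    Assumption for illustration: plain uniform buckets, no palette search.
    """
    step = 256 // levels                      # width of each bucket
    bucket = min(value // step, levels - 1)   # which bucket the value falls in
    return bucket * step                      # lowest value of that bucket


width, height = 1920, 1080
print(width * height)                     # 2073600 pixels
print(2 ** 24)                            # 16777216 colors with 24-bit RGB
print(raw_size_bytes(width, height, 24))  # 6220800 bytes before compression
print(quantize_uniform(200, 4))           # 192: 200 snapped to one of 4 levels
print(format(200, '08b'))                 # 11001000: 8-bit binary form of 200
```

The raw size of about 6.2 MB for a single 1920 x 1080 frame is the reason the optional compression step is so commonly applied.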
Converting Image Pixels to Binary and Saving to a Text File with Python
To view the binary representation of an image, you’ll need a Python library that can handle image data. One such library is Pillow, the maintained fork of PIL. Here's a simple example of how you can open an image file and write its RGB values in binary format:
from PIL import Image

# Open an image file
img = Image.open('image.jpg')

# Convert the image data to RGB (if it's not already)
rgb_img = img.convert('RGB')

# Get the size of the image
width, height = img.size

# Open a new text file in write mode
with open('binary_representation.txt', 'w') as f:
    # Iterate over the pixels of the image, row by row
    for y in range(height):
        for x in range(width):
            # Get the RGB values of the pixel
            r, g, b = rgb_img.getpixel((x, y))
            # Write the binary values of the RGB components into the text file
            f.write('R: ' + format(r, '08b') + ' G: ' + format(g, '08b') + ' B: ' + format(b, '08b') + '\n')

This script creates a new file named binary_representation.txt in the current working directory and writes the binary representation of the red, green, and blue components of each pixel, one pixel per line. Replace 'image.jpg' with the path of the image file you want to inspect.
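To confirm that nothing is lost in this text format, a line can be converted back into integer values. The sketch below assumes the exact line layout written by the script above; `parse_pixel_line` is a hypothetical helper name, not part of Pillow.

```python
def parse_pixel_line(line):
    """Turn a line such as 'R: 11111111 G: 00000000 B: 10000000' into (255, 0, 128)."""
    parts = line.split()
    # parts looks like ['R:', '11111111', 'G:', '00000000', 'B:', '10000000'],
    # so the binary strings sit at the odd positions.
    r, g, b = (int(bits, 2) for bits in parts[1::2])
    return r, g, b


# Build one line the same way the script does, then read it back.
line = 'R: ' + format(255, '08b') + ' G: ' + format(0, '08b') + ' B: ' + format(128, '08b')
print(line)
print(parse_pixel_line(line))
```

Each pixel takes 36 characters in this format (including the newline), so a 1920 x 1080 image would produce roughly 75 MB of text. The format is useful for inspection, not for storage.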
Image Compression and Decompression
One relatively simple and widely used data compression algorithm is Run-Length Encoding (RLE). RLE is especially suitable for images with large areas of single-color pixels (like black and white images) but isn’t as effective for more complex, colorful images.
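The difference between the two cases is easy to see on a toy example. The following sketch works on plain Python lists rather than real images and uses `itertools.groupby` from the standard library; `rle_encode` is an illustrative name, and the two rows are invented test data.

```python
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


flat_row = [255] * 12      # a solid white row: one long run
busy_row = [0, 255] * 6    # the value changes at every pixel

print(rle_encode(flat_row))        # [(255, 12)]
print(len(rle_encode(busy_row)))   # 12 runs for 12 pixels
```

The solid row collapses to a single pair, while the alternating row needs one pair per pixel and therefore ends up larger than the data it started from.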
Here is a basic way to implement RLE compression on an image using Python with the Pillow and NumPy libraries:
from PIL import Image
import numpy as np

def run_length_encoding(image):
    # Create an empty list to store the run-length data
    rle = []
    # Convert the image into a numpy array
    image_array = np.array(image)
    # Flatten the array
    flat_array = image_array.flatten()
    # Initialize the run-length variables
    run_value = flat_array[0]
    run_length = 1
    # Iterate over the pixel values in the flattened array
    for pixel_value in flat_array[1:]:
        if pixel_value == run_value:
            # If the pixel value is the same as the current run value, increase the run length
            run_length += 1
        else:
            # Otherwise, add the run value and length to the list and start a new run
            rle.append((run_value, run_length))
            run_value = pixel_value
            run_length = 1
    # Append the last run
    rle.append((run_value, run_length))
    return rle

# Load the image
image = Image.open('image.jpg').convert('L')  # Convert to grayscale for simplicity

# Apply the run-length encoding
rle = run_length_encoding(image)

# Write the RLE data to a text file
with open('rle_compressed.txt', 'w') as f:
    for run_value, run_length in rle:
        f.write(f'{run_value} {run_length}\n')

This script loads an image, converts it to grayscale, applies the RLE compression algorithm, and writes the compressed data to a text file. Note that this is a very basic implementation of RLE and may not provide good compression ratios for complex images.
Please replace 'image.jpg' with the path to the actual image file you want to compress. The compressed data will be saved in a file named 'rle_compressed.txt'. Each line in this file represents a run, with the pixel value and the number of times it repeats.
Note that the script above only compresses the image; it does not provide a way to turn the data back into an image file. To view the result, you also need a decompression function that converts the RLE data back into pixel data and reconstructs the image.
Here’s a very simple implementation of RLE decompression, and then saving the decompressed data back into an image file:
def run_length_decoding(rle, width, height):
    # Initialize an empty list for the pixel data
    pixel_data = []
    # Iterate over the runs in the RLE data
    for run_value, run_length in rle:
        # Extend the pixel data list with the run value repeated run_length times
        pixel_data.extend([run_value] * run_length)
    # Convert the pixel data list into a numpy array
    pixel_array = np.array(pixel_data)
    # Reshape the array into the original image dimensions
    image_array = np.reshape(pixel_array, (height, width))
    # Create a new image from the array and return it
    return Image.fromarray(np.uint8(image_array))

# Decompress the RLE data
decompressed_image = run_length_decoding(rle, image.width, image.height)

# Save the decompressed image
decompressed_image.save('decompressed_image.jpg')

# Open the decompressed image
decompressed_image.show()

This script takes the RLE data and the original image dimensions as input, constructs an array of pixel data from the RLE data, reshapes the array into the original image dimensions, and creates a new image from this data. The image is then saved to a new file and displayed.
Note that these run-length encoding and decoding functions are simple and apply only to grayscale images. For color images, you would need to handle the color channels appropriately. RLE itself is lossless: decoding reproduces the encoded pixel values exactly. In this example, information is lost only when the image is converted to grayscale and when the result is saved as a JPEG, which applies its own lossy compression. RLE may also give poor compression ratios for complex images. Widely used formats such as JPEG and PNG rely on more sophisticated mathematical and computational methods to reduce file size while preserving as much image quality as possible.
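One possible way to handle color, sketched here under simplifying assumptions, is to treat each (R, G, B) triple as a single value and run-length encode the sequence of triples. The example uses plain Python lists instead of Pillow images so it stays self-contained; `rle_encode`, `rle_decode`, and the tiny 4 x 2 "image" are all illustrative. Encoding each channel separately would be another reasonable choice.

```python
from itertools import groupby


def rle_encode(pixels):
    """Collapse consecutive equal pixels into (pixel, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(pixels)]


def rle_decode(runs):
    """Expand (pixel, run_length) pairs back into a flat list of pixels."""
    pixels = []
    for value, run_length in runs:
        pixels.extend([value] * run_length)
    return pixels


# A tiny 4 x 2 image stored row by row as (R, G, B) tuples.
red, blue = (255, 0, 0), (0, 0, 255)
pixels = [red, red, red, blue,
          blue, blue, blue, blue]

runs = rle_encode(pixels)
print(runs)                        # [((255, 0, 0), 3), ((0, 0, 255), 5)]
print(rle_decode(runs) == pixels)  # True: the round trip is exact
```

Because the decoded list equals the input, the sketch also shows that the encoding step by itself discards nothing.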
Applying Filters
This section demonstrates how to apply a few simple filters to an image using Python’s Pillow library.
First, let’s cover the steps to apply a grayscale filter, a blur filter, and a sharpening filter:
from PIL import Image, ImageFilter

# Load an image
image = Image.open('image.jpg')

# Apply a grayscale filter
gray_image = image.convert('L')
gray_image.save('gray_image.jpg')
gray_image.show()

# Apply a blur filter
blur_image = image.filter(ImageFilter.BLUR)
blur_image.save('blur_image.jpg')
blur_image.show()

# Apply a sharpening filter
sharp_image = image.filter(ImageFilter.SHARPEN)
sharp_image.save('sharp_image.jpg')
sharp_image.show()

In this script:
- The convert('L') method converts the image to grayscale. The 'L' stands for 'luminance', which is a fancy term for brightness in an image. In a grayscale image, brightness is the only information stored in each pixel (as opposed to RGB color images, where each pixel stores red, green, and blue intensities).
- The filter(ImageFilter.BLUR) method applies a blur filter to the image. This is done by convolving the image with a kernel that averages the surrounding pixels.
- The filter(ImageFilter.SHARPEN) method applies a sharpening filter to the image. This enhances the edges and other high-frequency components in the image.
Remember to replace 'image.jpg' with the path to your actual image. Each processed image is displayed and then saved in the current working directory as 'gray_image.jpg', 'blur_image.jpg', and 'sharp_image.jpg'.
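For the grayscale step, Pillow's documentation describes the 'L' conversion as the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below reproduces that weighting in plain Python. The function name `luma` is made up for illustration, and the integer division used here is an assumption, so results may differ from Pillow's by one level because of rounding.

```python
def luma(r, g, b):
    """Weighted brightness of an RGB pixel using the ITU-R 601-2 weights."""
    return (299 * r + 587 * g + 114 * b) // 1000


print(luma(255, 255, 255))  # 255: white stays white
print(luma(255, 0, 0))      # 76: pure red
print(luma(0, 255, 0))      # 149: pure green appears brighter than red
print(luma(0, 0, 255))      # 29: pure blue appears darkest
```

The unequal weights reflect that the eye is most sensitive to green and least sensitive to blue, which is why a simple average of the three channels is not used.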
There are many other filters available in the ImageFilter module of the Pillow library, such as edge detection, embossing, and more. You can experiment with these filters to achieve different effects.
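To make the idea of convolving with an averaging kernel concrete, here is a minimal pure-Python sketch of a 3 x 3 box blur on a small grid of grayscale values. It is not Pillow's BLUR kernel, only an illustration of the same principle; the function name `box_blur` and the choice to average only the neighbors that exist at the borders are assumptions made for simplicity.

```python
def box_blur(grid):
    """Replace each value with the mean of its 3 x 3 neighborhood.

    Border pixels average only the neighbors that actually exist.
    """
    height, width = len(grid), len(grid[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the pixel itself and its in-bounds neighbors.
            neighbors = [
                grid[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ]
            row.append(sum(neighbors) // len(neighbors))
        result.append(row)
    return result


# A single bright pixel on a black background.
image = [
    [0, 0, 0],
    [0, 90, 0],
    [0, 0, 0],
]
print(box_blur(image))  # [[22, 15, 22], [15, 10, 15], [22, 15, 22]]
```

The bright center pixel is spread over its neighbors, which is the smoothing effect a blur filter produces. A sharpening kernel does the opposite by weighting the center pixel more heavily than its surroundings.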






