
This node aims to capture the practical side of computer vision (CV), along with the tooling for the job. Expect theoretical depth to accompany this overview. OpenCV has bindings for several languages, but I'll be sticking to the Python ecosystem (and the relevant libraries around it, for pragmatic completeness).

1. Reads, Conversions, and Writes

Images are handled as arrays of numbers (usually NumPy arrays). A combination of red, green and blue (RGB) values paints each pixel as you see it. Popular image formats include JPEG and PNG, with the latter also allowing for richer information like an alpha channel.

Broadly, images can be loaded in three modes: color, grayscale, and unchanged (keeping the alpha channel if one exists).

It's good practice to explicitly specify the read flag (which decides the color mode) when loading images.

import cv2

img = cv2.imread("/path/to/image.extension", cv2.IMREAD_<FLAG>) # <FLAG> is a placeholder for one of the names below

# Possible values for FLAG
# - GRAYSCALE : enum of 0
# - COLOR     : enum of 1 (default)
# - UNCHANGED : enum of -1

Also remember to pass a colormap when plotting grayscale images: matplotlib applies its default colormap (viridis) to single-channel arrays, which yields colors you probably don't expect.

import matplotlib.pyplot as plt
import cv2

img = cv2.imread(path_to_grayscale, 0)
plt.imshow(img) # yields unusual colors
plt.imshow(img, cmap = "gray") # yields what you expect

Some helpful attributes for understanding the image at first glance:

img.dtype # uint8 means 8 bits per channel (24 bits total for 3 channels)
img.shape # (H, W, C): 3-tuple of height, width, channels; grayscale images are just (H, W)

Note that OpenCV stores images in BGR (blue, green, red) order whereas matplotlib expects RGB. The channels therefore need to be reversed to plot the image as it actually is.

c_reversed_img = img[:,:,::-1] # BGR -> RGB
plt.imshow(c_reversed_img) # cmap is ignored for 3-channel input, so none is needed

Channels can be split and merged using the following methods:

r,g,b = cv2.split(c_reversed_img)

assert (img == cv2.merge((b,g,r))).all() # expects inputs of similar shape

Once loaded in a format, images can be converted to other color formats using cv2.cvtColor

transformed = cv2.cvtColor(original, transform_code) # e.g. cv2.COLOR_BGR2GRAY

Check out all the transformation codes in the docs.

With this mechanism we can, say, map the input image into the Hue, Saturation, Value (HSV) format, alter the saturation to change the color intensity independently of the individual color channels' values, and then map back to the original BGR/RGB format.

Finally, one can write images in either grayscale (8-bit) or BGR (24-bit) format after conversion using cv2.imwrite. The file format is inferred from the extension of the save path.

cv2.imwrite(save_path, img_object)

2. Basic Manipulations

OpenCV represents images as matrices, so any matrix operation can be used to modify the image.

Cropping, for instance, is just slicing the matrix.

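import cv2

img_rgb = cv2.imread(imgpath)[:,:,::-1] # BGR -> RGB

# center cropping: keep a band of +/- frac * size around the center
width_dev_frac = 0.1
height_dev_frac = 0.1

h,w = img_rgb.shape[:2]

gen_slice = lambda f,t : slice(int(0.5*t - f*t),
                               int(0.5*t + f*t))

# rows are indexed by height first, then columns by width
center_cropped = img_rgb[gen_slice(height_dev_frac,h),
                         gen_slice(width_dev_frac,w)]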

Resizing needs an interpolation algorithm (bilinear by default) to fill in the intermediate pixel values when scaling, and is done with cv2.resize

So, image manipulations are covered by the set of matrix operations that can be performed on the matrix representation of the image.

3. Annotations

Basic geometry again boils down to drawing over the two-dimensional discrete Cartesian plane captured by the matrix representation of the image.

Any drawing function is completely defined by a set of parameters that describes the object being drawn. For instance, cv2.circle needs a center and a radius on top of the usual parameters (line color, thickness, line type, etc).

Other basic drawers include: cv2.line, cv2.rectangle, etc.

Text can also be written using cv2.putText, along with all the font styling information.

4. Enhancement

Brightness pertains to altering intensity values by addition/subtraction.

Contrast involves scaling all the intensities by a factor so as to increase/decrease the difference between them.

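Overflow issues arise because uint8 arithmetic wraps around past 255. They can be dealt with by computing in a wider dtype and clipping to the valid intensity range (0 to 255) via np.clip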

One can create binary images (two-valued, e.g. 0 and 255, from grayscale) by thresholding images, using

  • cv2.threshold
  • cv2.adaptiveThreshold

Masks pair well with bitwise (that is, pixel-wise boolean) operations like OR, AND and XOR. For instance, a circular alpha view can be emulated by a bitwise AND with a circular mask.
