From Photons to Pixels: Image Formation and Color Spaces, with OpenCV in Python and C++
How a camera turns light into an array of numbers — projection, sampling, quantization, the Bayer sensor — and the color spaces (RGB/BGR, grayscale, HSV, YCrCb, Lab) you convert between every day. The equations, plus runnable OpenCV in both Python and C++.
Before any model, filter, or detector touches an image, that image has already been through a whole pipeline: light bounced off a scene, was focused by a lens, landed on a sensor, got sampled and quantized into integers, and was arranged into a grid you call an array. Understanding that pipeline — and the color spaces you reshuffle those integers into — is the foundation everything else sits on. Here it is end to end, with the math and runnable OpenCV in Python and C++.
1. What a digital image actually is
A scene in front of a camera is continuous: at every point and every wavelength there’s some amount of light. Mathematically we can write the image reaching the sensor as a continuous function $E(x, y)$ of position on the image plane (one such function per color channel).
A computer can’t store a continuous function, so two things happen. Sampling reads $E$ only on a grid of points spaced $\Delta x$ and $\Delta y$ apart, and quantization rounds each reading to one of a finite set of $L = 2^b$ levels:
$$I[m, n] = Q\big(E(m\,\Delta x,\; n\,\Delta y)\big), \qquad Q : \mathbb{R} \to \{0, 1, \dots, L - 1\}$$
For a standard 8-bit image $b = 8$, so $L = 256$ and every pixel is an integer in $[0, 255]$. That’s the whole reason an image is a 2-D array of uint8: sampling gives it width and height, quantization gives it the integer values.
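To make the two steps concrete, here is a minimal pure-Python sketch (no OpenCV needed). The `scene` function, the grid spacing `step`, and the helper name `sample_and_quantize` are all hypothetical, chosen only for illustration; a real sensor integrates light over each photosite rather than reading a single point.

```python
def sample_and_quantize(f, width, height, step, bits=8):
    """Sample a continuous image f(x, y) in [0, 1] on a grid, then quantize it."""
    levels = 2 ** bits                      # 256 levels for 8 bits
    image = []
    for row in range(height):               # row index -> y coordinate
        y = row * step
        image.append([
            # Scale to [0, levels - 1], round half up, clamp to the valid range.
            min(levels - 1, max(0, int(f(col * step, y) * (levels - 1) + 0.5)))
            for col in range(width)
        ])
    return image


def scene(x, y):
    """Hypothetical continuous scene: a horizontal ramp from 0 (x=0) to 1 (x=1)."""
    return x


img = sample_and_quantize(scene, width=5, height=2, step=0.25)
print(img[0])  # [0, 64, 128, 191, 255]

coarse = sample_and_quantize(scene, width=5, height=1, step=0.25, bits=2)
print(coarse[0])  # [0, 1, 2, 2, 3]
```

With only 2 bits, two neighbouring samples collapse onto the same level, which is the banding effect in miniature.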
2. Image formation inside the camera
Projection: from 3-D scene to 2-D plane
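A lens (idealized as a pinhole) projects a 3-D point $(X, Y, Z)$ in camera coordinates onto the image plane at focal length $f$:
$$x = f\,\frac{X}{Z}, \qquad y = f\,\frac{Y}{Z}$$
Converting those metric coordinates to pixel indices $(u, v)$ adds the focal lengths in pixels $f_x, f_y$ and the principal point $(c_x, c_y)$, which together form the camera intrinsic matrix $K$:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \sim K \begin{bmatrix} X \\ Y \\ Z \end{bmatrix}, \qquad K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
Here $\sim$ means equality up to the scale factor $Z$, i.e. $u = f_x X / Z + c_x$ and $v = f_y Y / Z + c_y$.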
This is the matrix you recover from camera calibration, and it’s what lets you go back and forth between pixels and rays.
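The pixel-to-ray round trip can be sketched in a few lines of plain Python. The intrinsics below (`fx`, `fy`, `cx`, `cy`) are made-up values for a hypothetical 1920x1080 camera, and the model ignores lens distortion, which a real calibration would also estimate.

```python
def project(point, fx, fy, cx, cy):
    """Project a camera-frame point (X, Y, Z) to pixel coordinates (u, v)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    return fx * X / Z + cx, fy * Y / Z + cy


def back_project(u, v, fx, fy, cx, cy):
    """Direction (x, y, 1) of the ray through pixel (u, v); depth is not recoverable."""
    return (u - cx) / fx, (v - cy) / fy, 1.0


# Hypothetical intrinsics, for illustration only.
fx = fy = 1000.0
cx, cy = 960.0, 540.0

u, v = project((0.5, -0.25, 2.0), fx, fy, cx, cy)
print(u, v)  # 1210.0 415.0

ray = back_project(u, v, fx, fy, cx, cy)
print(ray)  # (0.25, -0.125, 1.0)
```

Scaling the ray by the true depth (here 2.0) gives back the original point; a single pixel on its own only pins down the direction.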
From light to numbers: the sensor
Each sensor pixel (“photosite”) collects photons over the exposure time and turns them into a charge. A simplified but useful model of the digital value $D$ is linear in the incident irradiance $E$:
$$D = Q\big(g\,(E\,t + \eta)\big)$$
where $t$ is exposure time, $g$ is the analog/ISO gain, $\eta$ is noise, and $Q$ is the analog-to-digital quantizer from §1. Two practical consequences fall out immediately: more gain amplifies noise along with signal, and the final rounding to a finite number of levels is where banding in smooth gradients comes from.
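A quick simulation shows the first consequence. This is only a sketch of the simplified model above: it assumes additive Gaussian noise applied before the gain, and the irradiance, exposure, and noise values are arbitrary illustrative numbers, not measurements from a real sensor.

```python
import random
import statistics


def sensor_value(irradiance, exposure, gain, noise_sigma, rng, bits=8):
    """Simplified model: D = Q(gain * (irradiance * exposure + noise))."""
    signal = irradiance * exposure                  # collected light, arbitrary units
    noisy = signal + rng.gauss(0.0, noise_sigma)    # noise enters before amplification
    amplified = gain * noisy
    top = 2 ** bits - 1
    return min(top, max(0, int(amplified + 0.5)))   # round, then clamp to the ADC range


def simulate(gain, n=2000, seed=0):
    """Read the same flat patch n times at a given gain."""
    rng = random.Random(seed)
    return [sensor_value(40.0, 0.5, gain, 2.0, rng) for _ in range(n)]


low = simulate(gain=1.0)
high = simulate(gain=4.0)
print("gain 1:", statistics.mean(low), statistics.pstdev(low))
print("gain 4:", statistics.mean(high), statistics.pstdev(high))
```

The mean and the spread both grow by roughly the gain factor, so the picture gets brighter but not cleaner.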
Color from a gray sensor: the Bayer filter
A silicon sensor only measures intensity — it’s colorblind. To capture color, manufacturers overlay a color filter array (CFA), most commonly the Bayer pattern: a mosaic of red, green, and blue filters with twice as many greens (your eye is most sensitive to green). Each photosite therefore records only one of R, G, or B; the missing two channels at every pixel are interpolated in a step called demosaicing. OpenCV does it for you:
import cv2

# `raw` is a single-channel Bayer mosaic from the sensor.
# OpenCV's "BayerBG" constants correspond to an RGGB tile (top-left photosite is red).
bgr = cv2.cvtColor(raw, cv2.COLOR_BayerBG2BGR)  # demosaic -> 3-channel BGR

#include <opencv2/opencv.hpp>

// `raw` is a single-channel Bayer mosaic from the sensor.
// OpenCV's "BayerBG" constants correspond to an RGGB tile (top-left photosite is red).
cv::Mat bgr;
cv::cvtColor(raw, bgr, cv::COLOR_BayerBG2BGR);  // demosaic -> 3-channel BGR

By the time you call imread, all of this has already happened, but it explains why every pixel carries three channels even though each photosite measured only one, why greens look cleanest, and where demosaicing artifacts near sharp edges come from.
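To see the mosaic layout and the simplest possible interpolation, here is a small pure-Python sketch. It assumes an RGGB tile, and `green_at` is a hypothetical helper that does plain bilinear averaging; production demosaicing (including whatever OpenCV does internally) is more elaborate and edge-aware.

```python
def bayer_color(row, col):
    """Color filter over photosite (row, col), assuming an RGGB 2x2 tile."""
    tile = (("R", "G"),
            ("G", "B"))
    return tile[row % 2][col % 2]


mosaic = [[bayer_color(r, c) for c in range(4)] for r in range(4)]
for line in mosaic:
    print(" ".join(line))
# R G R G
# G B G B
# R G R G
# G B G B

counts = {ch: sum(line.count(ch) for line in mosaic) for ch in "RGB"}
print(counts)  # {'R': 4, 'G': 8, 'B': 4}


def green_at(raw, row, col):
    """Bilinear green estimate at an interior photosite of a raw mosaic."""
    if bayer_color(row, col) == "G":
        return raw[row][col]                 # green was measured directly here
    # At red and blue sites the four direct neighbours are all green.
    neighbours = (raw[row - 1][col], raw[row + 1][col],
                  raw[row][col - 1], raw[row][col + 1])
    return sum(neighbours) / 4


raw = [[10, 20, 10, 20],
       [30, 40, 30, 40],
       [10, 20, 10, 20],
       [30, 40, 30, 40]]
print(green_at(raw, 2, 2))  # 25.0
```

Averaging across a sharp edge mixes values from both sides of it, which is one way to picture where color fringes near edges come from.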
3. The image as an array
Loading an image hands you that grid of integers. The one detail that trips up everyone new to OpenCV: channels are ordered B, G, R, not R, G, B.
import cv2

img = cv2.imread("street.jpg")   # BGR, dtype uint8
print(img.shape, img.dtype)      # (1080, 1920, 3) uint8
h, w, c = img.shape              # rows, cols, channels

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat img = cv::imread("street.jpg");  // BGR, type CV_8UC3
    std::cout << img.rows << "x" << img.cols
              << " channels=" << img.channels() << "\n";  // 1080x1920 channels=3
    int h = img.rows, w = img.cols, c = img.channels();
}

Reading and writing a single pixel
Indexing is (row, col) — i.e. (y, x) — and each pixel is a 3-vector in BGR
order:
b, g, r = img[100, 200]          # one pixel at row 100, col 200 (uint8 each)
print(int(b), int(g), int(r))
img[100, 200] = (0, 0, 255)      # paint it pure red (B=0, G=0, R=255)

cv::Vec3b px = img.at<cv::Vec3b>(100, 200);          // (row, col), BGR order
uchar b = px[0], g = px[1], r = px[2];
img.at<cv::Vec3b>(100, 200) = cv::Vec3b(0, 0, 255);  // pure red

4. Color spaces
A color space is just a choice of axes for the same color information. You
convert between them because some tasks are far easier in the right coordinate
system. In OpenCV every conversion goes through one function, cvtColor.
Grayscale (luma)
Dropping color collapses three channels to one. It isn’t a plain average; the weights match human luminance sensitivity (Rec. 601):
$$Y = 0.299\,R + 0.587\,G + 0.114\,B$$
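A worked example makes the weighting tangible. The helper `luma_601` below is a hypothetical name and a floating-point sketch of the Rec. 601 weights; OpenCV computes the same thing with fixed-point arithmetic, so isolated values could differ by one level.

```python
def luma_601(b, g, r):
    """Rec. 601 luma from 8-bit B, G, R values, rounded to the nearest integer."""
    return int(0.299 * r + 0.587 * g + 0.114 * b + 0.5)


print(luma_601(0, 0, 255))      # pure red   -> 76
print(luma_601(0, 255, 0))      # pure green -> 150
print(luma_601(255, 0, 0))      # pure blue  -> 29
print(luma_601(255, 255, 255))  # white      -> 255
```

A plain average would map all three primaries to 85; the weighted version keeps green bright and blue dark, which is closer to how they look.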
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # shape (H, W), single channel

cv::Mat gray;
cv::cvtColor(img, gray, cv::COLOR_BGR2GRAY);  // single-channel CV_8UC1

HSV: hue, saturation, value
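RGB mixes color and brightness together, which makes “find the red things” hard when lighting changes. HSV separates what the color is (hue) from how vivid (saturation) and how bright (value). With $R, G, B \in [0, 1]$, let $V = \max(R, G, B)$, $m = \min(R, G, B)$, and chroma $C = V - m$:
$$S = \begin{cases} C / V & V \neq 0 \\ 0 & V = 0 \end{cases} \qquad H = \begin{cases} 60^\circ \cdot \dfrac{G - B}{C} & V = R \\ 120^\circ + 60^\circ \cdot \dfrac{B - R}{C} & V = G \\ 240^\circ + 60^\circ \cdot \dfrac{R - G}{C} & V = B \end{cases}$$
with $360^\circ$ added when $H < 0$, and $H = 0$ by convention when $C = 0$.
A gotcha worth memorizing: in 8-bit OpenCV, hue is stored in $[0, 179]$ (degrees halved to fit a byte), while $S$ and $V$ use the full $[0, 255]$.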
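The 8-bit encoding is easy to check by hand with the standard library's `colorsys` module. The function name `bgr_to_hsv_8bit` is hypothetical, and this is an approximation of OpenCV's scaling rather than its exact code path, so rounding may differ by one on some inputs.

```python
import colorsys


def bgr_to_hsv_8bit(b, g, r):
    """Approximate OpenCV's 8-bit HSV encoding for a single BGR pixel."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # each in [0, 1]
    return (
        int(h * 180 + 0.5) % 180,   # degrees halved: 0..179, wrapping back to 0
        int(s * 255 + 0.5),         # saturation uses the full byte
        int(v * 255 + 0.5),         # so does value
    )


print(bgr_to_hsv_8bit(0, 0, 255))   # red      -> (0, 255, 255)
print(bgr_to_hsv_8bit(0, 255, 0))   # green    -> (60, 255, 255)
print(bgr_to_hsv_8bit(255, 0, 0))   # blue     -> (120, 255, 255)
print(bgr_to_hsv_8bit(0, 0, 128))   # dark red -> (0, 255, 128)
```

Bright red and dark red share the same hue and saturation and differ only in value, which is exactly the separation that makes HSV useful under changing lighting.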
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)  # H in [0,179], S,V in [0,255]

cv::Mat hsv;
cv::cvtColor(img, hsv, cv::COLOR_BGR2HSV);  // H in [0,179], S,V in [0,255]

YCrCb: luma plus chroma
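This is the space behind JPEG and most video. It keeps the luma $Y$ and stores two color-difference channels (Rec. 601, 8-bit, with offset $\delta = 128$):
$$Y = 0.299\,R + 0.587\,G + 0.114\,B, \qquad C_r = 0.713\,(R - Y) + \delta, \qquad C_b = 0.564\,(B - Y) + \delta$$
Because the eye is far more sensitive to luma than chroma, codecs subsample the two chroma channels (4:2:0 keeps one chroma sample per 2×2 block of pixels) and almost nobody notices: a direct, daily payoff of the camera→color-space chain.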
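Here is the conversion worked through for single pixels in plain Python. The coefficients 0.713 and 0.564 and the offset 128 are the ones OpenCV documents for its 8-bit YCrCb conversion; the function names are hypothetical and the floating-point rounding is an approximation of the library's fixed-point result.

```python
def clamp_u8(x):
    """Round to the nearest integer and clamp to the 8-bit range."""
    return min(255, max(0, int(x + 0.5)))


def bgr_to_ycrcb(b, g, r, delta=128):
    """Rec. 601 luma plus two offset color-difference channels (Y, Cr, Cb)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cr = (r - y) * 0.713 + delta     # how much redder than the luma
    cb = (b - y) * 0.564 + delta     # how much bluer than the luma
    return clamp_u8(y), clamp_u8(cr), clamp_u8(cb)


print(bgr_to_ycrcb(128, 128, 128))  # mid gray -> (128, 128, 128)
print(bgr_to_ycrcb(0, 0, 255))      # pure red -> (76, 255, 85)
```

Any neutral gray lands on Cr = Cb = 128, so all of its information sits in the Y channel; that is why chroma can be stored at lower resolution with little visible cost.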
ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)  # channels: Y, Cr, Cb

cv::Mat ycrcb;
cv::cvtColor(img, ycrcb, cv::COLOR_BGR2YCrCb);  // channels: Y, Cr, Cb

CIELAB: perceptually uniform
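Lab is designed so that equal numerical distances look like roughly equal color differences to a human, which is handy for color comparison and matching. It’s a nonlinear transform through CIE XYZ, with $(X_n, Y_n, Z_n)$ the reference white:
$$L^* = 116\,f(Y/Y_n) - 16, \qquad a^* = 500\,\big[f(X/X_n) - f(Y/Y_n)\big], \qquad b^* = 200\,\big[f(Y/Y_n) - f(Z/Z_n)\big]$$
$$f(t) = \begin{cases} t^{1/3} & t > (6/29)^3 \\ \dfrac{t}{3\,(6/29)^2} + \dfrac{4}{29} & \text{otherwise} \end{cases}$$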
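The full chain for one pixel can be sketched in plain Python. This assumes the input is sRGB with a D65 white point and uses the XYZ matrix coefficients from OpenCV's color-conversion documentation; the function names are hypothetical, and the library's 8-bit path may round slightly differently.

```python
import math


def srgb_to_linear(c):
    """Undo the sRGB transfer curve for a channel value in [0, 1]."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def lab_f(t):
    """Cube root with a linear segment near zero, as in the CIELAB definition."""
    d = 6 / 29
    return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29


def bgr_to_lab(b, g, r):
    """CIELAB (L*, a*, b*) for an 8-bit sRGB pixel, assuming a D65 white point."""
    rl, gl, bl = (srgb_to_linear(c / 255) for c in (r, g, b))
    x = 0.412453 * rl + 0.357580 * gl + 0.180423 * bl
    y = 0.212671 * rl + 0.715160 * gl + 0.072169 * bl
    z = 0.019334 * rl + 0.119193 * gl + 0.950227 * bl
    xn, yn, zn = 0.950456, 1.0, 1.088754          # D65 reference white
    fx, fy, fz = lab_f(x / xn), lab_f(y / yn), lab_f(z / zn)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def lab_to_8bit(L, a, b):
    """8-bit packing: L scaled by 255/100, a and b offset by 128."""
    return round(L * 255 / 100), round(a + 128), round(b + 128)


def delta_e(lab1, lab2):
    """CIE76 color difference: Euclidean distance in Lab."""
    return math.dist(lab1, lab2)


print(lab_to_8bit(*bgr_to_lab(255, 255, 255)))  # white -> (255, 128, 128)
print(lab_to_8bit(*bgr_to_lab(0, 0, 0)))        # black -> (0, 128, 128)
print(delta_e(bgr_to_lab(0, 0, 255), bgr_to_lab(0, 0, 200)))  # two reds
```

The Euclidean distance shown here is the original CIE76 difference; later formulas such as CIEDE2000 refine it, but the idea of measuring color difference as a distance in Lab is the same.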
lab = cv2.cvtColor(img, cv2.COLOR_BGR2Lab)  # L in [0,255], a,b offset by 128

cv::Mat lab;
cv::cvtColor(img, lab, cv::COLOR_BGR2Lab);  // L in [0,255], a,b offset by 128

Splitting and merging channels
Whatever space you’re in, you can pull it apart and put it back:
b, g, r = cv2.split(img)        # three single-channel images
merged = cv2.merge([b, g, r])   # back to one 3-channel image

std::vector<cv::Mat> ch;
cv::split(img, ch);             // ch[0]=B, ch[1]=G, ch[2]=R
cv::Mat merged;
cv::merge(ch, merged);

5. A practical payoff: segmenting by color in HSV
Here’s why all of this matters. Picking out red objects in RGB is fiddly; in HSV it’s a hue window. Red is the awkward case because its hue wraps around 0, so we union two ranges:
import cv2
import numpy as np

hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Red wraps around hue = 0, so combine the low and high ends.
mask1 = cv2.inRange(hsv, np.array([0, 120, 70]), np.array([10, 255, 255]))
mask2 = cv2.inRange(hsv, np.array([170, 120, 70]), np.array([179, 255, 255]))
mask = mask1 | mask2
result = cv2.bitwise_and(img, img, mask=mask)  # keep only the red pixels

cv::Mat hsv, mask1, mask2, mask, result;
cv::cvtColor(img, hsv, cv::COLOR_BGR2HSV);
// Red wraps around hue = 0, so combine the low and high ends.
cv::inRange(hsv, cv::Scalar(0, 120, 70), cv::Scalar(10, 255, 255), mask1);
cv::inRange(hsv, cv::Scalar(170, 120, 70), cv::Scalar(179, 255, 255), mask2);
cv::bitwise_or(mask1, mask2, mask);
cv::bitwise_and(img, img, result, mask);  // keep only the red pixels

The same five lines that would be brittle in RGB are robust in HSV, purely because we chose better axes for the question.
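The two-range trick is really a circular distance test on hue. A per-pixel sketch in plain Python shows the logic; `in_hue_window` and `is_red` are hypothetical helpers, and the thresholds simply mirror the ones used in the masks above.

```python
def in_hue_window(h, center, half_width):
    """True if 8-bit hue h (0..179) is within half_width of center, going around the circle."""
    diff = abs(h - center) % 180
    return min(diff, 180 - diff) <= half_width


def is_red(h, s, v, s_min=120, v_min=70):
    """Same decision as the two inRange masks: hue near 0, vivid enough, bright enough."""
    return in_hue_window(h, center=0, half_width=10) and s >= s_min and v >= v_min


print(is_red(5, 200, 200))     # True: low end of the red window
print(is_red(175, 200, 200))   # True: high end, wrapping past 179
print(is_red(60, 255, 255))    # False: green
print(is_red(0, 50, 200))      # False: right hue, but too washed out
```

Writing the window as a center plus a half-width makes it easy to retarget the same test at other colors, where no wraparound handling is needed but the function still works unchanged.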
Takeaways
- An image is sampling + quantization of continuous light; that’s why it’s a grid of integers in $[0, 255]$, and where aliasing and banding originate.
- The camera pipeline is lossy at every stage (projection, sensor noise, demosaicing, quantization); artifacts you fight later are born here.
- OpenCV is BGR, indexed (row, col). Internalize this once and stop fighting it.
- Color spaces are coordinate choices. Convert with cvtColor; reach for grayscale to drop color, HSV for color thresholding, YCrCb for compression, Lab for perceptual distance.
- Watch the ranges: 8-bit hue lives in $[0, 179]$, not $[0, 359]$.
Once light is a clean array of numbers, the fun starts — like quantizing the models that consume those arrays and running them on an integrated GPU. That’s exactly what we do in YOLO26-seg vs RF-DETR-Seg: INT8 instance segmentation on an Intel iGPU.