Understanding LeNet: The Pioneer of Image Recognition in Computer Vision

Overview of LeNet Architecture:
LeNet is one of the earliest convolutional neural networks (CNNs), designed for handwritten digit recognition on datasets such as MNIST. It processes grayscale images through a sequence of convolutional layers, pooling layers, and fully connected layers, ending in an output layer for classification. Below is a layer-by-layer breakdown of the architecture:

Key Features and Historical Context
- Year Introduced: 1998
- Author: Yann LeCun
- Dataset: MNIST (handwritten digits)
- Innovation: Introduced the use of CNNs in document recognition.

1. Input Layer
- Input: A grayscale image of size 32x32x1 (32 pixels wide, 32 pixels tall, 1 channel for grayscale).
- Purpose: Accepts simple black-and-white visuals, like digits, for processing (see the sketch below).
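
To make the input shape concrete, here is a minimal sketch that builds a dummy 32x32 grayscale input. PyTorch is assumed purely for illustration; the article itself does not prescribe a framework.

```python
import torch

# A batch of one grayscale image: (batch, channels, height, width) = (1, 1, 32, 32).
# MNIST digits are 28x28, so LeNet expects them padded/resized to 32x32 first.
x = torch.randn(1, 1, 32, 32)
print(x.shape)  # torch.Size([1, 1, 32, 32])
```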
2. Convolution Layer 1
- Operation: Applies six 5x5 filters to the input image.
- Output Size Calculation:
- Formula: (Input size - Filter size + 2 * Padding) / Stride + 1
- Here: (32 - 5 + 2 * 0) / 1 + 1 = 28
- Output: 28x28x6 (six feature maps of size 28x28).
- Purpose: Extracts basic features (e.g., edges) from the input image (see the sketch below).
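
The output size formula can be checked directly. The sketch below, again using PyTorch as an assumed framework, applies six 5x5 filters with stride 1 and no padding and confirms the 28x28x6 result:

```python
import torch
import torch.nn as nn

# Six 5x5 filters, stride 1, no padding: the configuration described above.
conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, stride=1, padding=0)

x = torch.randn(1, 1, 32, 32)   # the 32x32x1 input from the previous step
feature_maps = conv1(x)

# (32 - 5 + 2*0) / 1 + 1 = 28, so we expect six 28x28 feature maps.
print(feature_maps.shape)       # torch.Size([1, 6, 28, 28])
```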
3. Pooling Layer 1
- Operation: Applies a 2x2 average pooling filter with a stride of 2.
- Input: Six 28x28 feature maps.
- Output: 14x14x6 (downsizes each feature map to 14x14, depth remains 6).
- Purpose: Reduces spatial dimensions, making the network computationally efficient and less prone to overfitting (see the sketch below).
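
Continuing the same illustrative PyTorch sketch, a 2x2 average pool with stride 2 halves each spatial dimension while leaving the depth untouched:

```python
import torch
import torch.nn as nn

# A 2x2 average-pooling window that moves 2 pixels at a time.
pool1 = nn.AvgPool2d(kernel_size=2, stride=2)

feature_maps = torch.randn(1, 6, 28, 28)   # stands in for the output of Convolution Layer 1
pooled = pool1(feature_maps)

# Each 28x28 map shrinks to 14x14; the six channels are preserved.
print(pooled.shape)                         # torch.Size([1, 6, 14, 14])
```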