Blog Post 2 - Preprocessing
Alex Berry, Jason Chan, Hyunjoon Lee
Brown University Data Science Initiative
DATA 2040: Deep Learning
March 1, 2020
Bengali.AI Handwritten Grapheme Classification Kaggle Competition
This is the second blog post by our team, Berry Bengali, participating in the Bengali.AI Kaggle competition; the link can be found here.
This project involves classifying handwritten characters of the Bengali alphabet, similar to classifying digits in the MNIST data set. In particular, each Bengali grapheme, or character, is broken down into three components: 1) the grapheme root, 2) the vowel diacritic, and 3) the consonant diacritic, where a diacritic is similar to an accent. The goal is to create a classification model that can classify each of these three components of a handwritten grapheme, and the final result is evaluated using a recall metric, with double weight given to classification of the root.
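For concreteness, here is a minimal sketch of how we understand the scoring to work, assuming scikit-learn's recall_score; the label arrays below are made up purely for demonstration.
import numpy as np
from sklearn.metrics import recall_score

# Made-up true and predicted labels for the three components
y_true = {'root': [0, 3, 3, 7], 'vowel': [1, 0, 2, 1], 'consonant': [0, 0, 4, 0]}
y_pred = {'root': [0, 3, 5, 7], 'vowel': [1, 0, 2, 2], 'consonant': [0, 0, 4, 0]}

# Macro-averaged recall per component, then a weighted average
# with double weight on the grapheme root
scores = [recall_score(y_true[c], y_pred[c], average='macro')
          for c in ['root', 'vowel', 'consonant']]
final_score = np.average(scores, weights=[2, 1, 1])
print(final_score)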
Since our last blog post, we have focused our efforts on taking the given data and applying various preprocessing methods to both improve the performance of our model and decrease its training time.
Preprocessing
Motivated by a desire for fine-tuned control over the input data’s size and quality, we made the decision to apply preprocessing “outside” of the rest of the model pipeline. Our methodologies included cropping, resizing, denoising, and thresholding.
1. Cropping
The first preprocessing method applied to the image data was cropping. We applied it first because we believed it made the most sense to trim each image down before manipulating the pixels (denoising and thresholding) of what remains.
Cropping is used to segment and retain the most salient region of the image (the elements that stand out and attract the viewer's attention) while cutting out the non-salient region. The position of the graphemes varies by image: some graphemes are in the center, while others are in the top-left corner. The size of the graphemes varies by image as well: some fill the entire image, while others are barely bigger than a dot. Our objective was to segment the graphemes and crop out the white space while retaining as much information about the graphemes as possible.
We implemented the function below to crop our images.
from PIL import Image, ImageChops

def crop_surrounding_whitespace(image):
    """Remove surrounding empty space around an image.

    This implementation assumes that the surrounding empty space
    around the image has the same colour as the top leftmost pixel.

    :param image: PIL image
    :rtype: PIL image (cropped)
    """
    # Build a background image filled with the top-left pixel's colour,
    # then keep only the bounding box where the input differs from it.
    bg = Image.new(image.mode, image.size, image.getpixel((0, 0)))
    diff = ImageChops.difference(image, bg)
    diff = ImageChops.add(diff, diff, 2, -50)
    bbox = diff.getbbox()
    return image.crop(bbox)
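As a quick sanity check, below is a minimal usage sketch of the function on a synthetic image (the white canvas and dark block are made up for illustration, not competition data):
import numpy as np
from PIL import Image

# Synthetic example: a 137 x 236 white canvas with a dark block standing in for a grapheme
arr = np.full((137, 236), 255, dtype=np.uint8)
arr[50:90, 100:150] = 0
img = Image.fromarray(arr)

cropped = crop_surrounding_whitespace(img)
print(img.size, '->', cropped.size)  # (236, 137) -> roughly the size of the dark block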
2. Resizing
After cropping, we resized the images so that every image has the same dimensions. The images must be resized to a common size because every input sample to a CNN must have the same shape. We determined the dimensions of the resized images based on the distribution of the cropped images' dimensions. Below is the distribution of the row dimension (number of pixel rows) of the cropped images.
Below is the distribution of the column dimension (number of pixel columns) of the cropped images.
The distribution of the row dimension was slightly right-skewed, and the distribution of the column dimension was symmetric. However, both distributions were bimodal, with a second mode coming from images that were not cropped (the original images are 137 x 236). We therefore determined that resizing each row and column dimension to the median of its respective distribution was most appropriate. The resized dimension of the images was 106 x 87.
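For reference, below is a minimal sketch of how the target dimensions can be computed from the cropped images (cropped_images is an illustrative name for the output of the cropping step):
import numpy as np

# Collect the row and column pixel counts of every cropped image;
# np.array of a gray-scale PIL image has shape (rows, cols)
rows = [np.array(img).shape[0] for img in cropped_images]
cols = [np.array(img).shape[1] for img in cropped_images]

num_rows = int(np.median(rows))  # 106 on our cropped training images
num_cols = int(np.median(cols))  # 87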
3. Denoising
After we resized the images, we converted each image from a PIL Image to a numpy array so that we could use the OpenCV (cv2) library. The first tool we used was cv2.fastNlMeansDenoising(). According to the OpenCV documentation, this method performs image denoising using the Non-Local Means Denoising algorithm (http://www.ipol.im/pub/algo/bcm_non_local_means_denoising/) with several computational optimizations, and is intended to be applied to gray-scale images.
4. Thresholding
Lastly, we converted the cropped, resized, and denoised gray-scale images into black-and-white images by thresholding. Each pixel of a gray-scale image ranges from 0 to 255. We applied a binary threshold at 200: pixels with values above 200 were treated as grapheme and all other pixels as background. We chose 200 to capture only the pixels that were most definitely grapheme; even after denoising, the graphemes still had a smudged, light-gray margin that we wanted to remove. (https://opencv-python-tutroals.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_thresholding/py_thresholding.html)
Below is the code we used to perform resizing, denoising, and thresholding.
import numpy as np
import cv2
from PIL import Image

num_rows = int(np.median(rows))  # 106
num_cols = int(np.median(cols))  # 87
for i in range(len(images)):
    # PIL's resize expects (width, height), i.e. (columns, rows)
    images[i] = images[i].resize((num_cols, num_rows), Image.ANTIALIAS)
    images[i] = np.array(images[i])  # convert to numpy array for OpenCV
    images[i] = cv2.fastNlMeansDenoising(images[i], h=3)  # Non-Local Means denoising
    # binary threshold at 200: pixels above 200 become 1, all others 0
    images[i] = cv2.threshold(images[i], 200, 1, cv2.THRESH_BINARY)[1]
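Once every image shares the same dimensions, the list can be stacked into a single array for the CNN. Below is a minimal sketch; the channel axis and float conversion are our assumptions about the eventual model input rather than a final decision:
import numpy as np

# Stack the preprocessed 106 x 87 binary images and add a channel axis,
# giving shape (num_samples, 106, 87, 1) for a Keras-style CNN
X = np.stack(images).astype('float32')
X = X[..., np.newaxis]
print(X.shape)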
Below are the images before and after preprocessing.
Bottom 10 Grapheme Roots (Before)
Bottom 10 Grapheme Roots (After)
Top 5 Vowels (Before)
Top 5 Vowels (After)
All 7 Consonants (Before)
All 7 Consonants (After)
As the examples show, the images were preprocessed such that the salient regions were properly cropped, preserving the graphemes while cutting out the white space. The smudged margins around the graphemes were eliminated, resulting in cleaner strokes.
Next Steps
We believe our preprocessing was satisfactory, although as you can see from the examples above, not every image was preprocessed perfectly. We have one more preprocessing step left to do: augmentation. For that, we plan to use the data augmentation utilities provided by Keras, as sketched below. Our next step is to actually build a model with convolutional layers (a CNN) using the preprocessed data. Stay tuned for our next update!
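For reference, here is a minimal sketch of the kind of Keras augmentation we have in mind, using ImageDataGenerator; the specific transformation ranges are placeholders we have not tuned:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Small random rotations, shifts, and zooms; the ranges are placeholders for now
datagen = ImageDataGenerator(
    rotation_range=10,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
)
# X and y_root are illustrative: the preprocessed image array and the root labels
batches = datagen.flow(X, y_root, batch_size=128)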