The Use of Image Processing to Improve Perception
OpenCV provides inbuilt functionality for improving the perception of an input image. However, if the redundant background occupies a larger area than the ROI (Region of Interest), the perception algorithm may produce an incorrect image as output when the original image is supplied directly. The main purpose of this research is to improve the quality of image perception by extracting the corner coordinates that surround the ROI and supplying them to the perception algorithm provided by OpenCV. Various image processing techniques have been applied to a sample test image, and a combination of them produces markedly better results. A comparison chart also shows which combination of algorithms provides the optimum results under specified circumstances.
Introduction
As we move more and more towards digitalization, it becomes necessary to convert our paper documents into digital form. To simplify this, OCR tools are used, which convert text in images into typewritten format. However, this requires the documents to be scanned properly before they are supplied to the OCR tools. In practice, we cannot expect much precision from a user who scans a document with a phone camera and supplies it to the OCR for textual conversion. Perception is an important aspect of image correction, since the camera captures a 3D scene in a 2D frame. While correcting the perception of an image, it is important to ensure that the region of interest is not lost and, at the same time, to get rid of the redundant background.
The research begins by exploring what the perspective transformation requires in order to correct the input image. Drawing contours around the ROI helps us determine which portion of the image should be retained and which should be discarded. OpenCV's perspective transformation works with a 3x3 transformation matrix; straight lines remain straight after the transformation, while slanted edges are brought into alignment with the two-dimensional axes. It is mandatory to supply 4 coordinates to the algorithm, no three of which may be collinear; these coordinates correspond to the four corner points of our ROI. Manually entering the coordinates to correct the image is tedious, as it deals with pixel-level coordinates. This research aims at defining a process by which we can locate the contour of our ROI and thereby automatically generate the four coordinates to supply to the perception algorithm.
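As a minimal sketch of the final step this pipeline feeds into, the snippet below warps an image from four corner coordinates using OpenCV's getPerspectiveTransform and warpPerspective. The file name, corner values, and output size are illustrative placeholders, not values from this research; in the full pipeline the source corners would come from the contour-based steps described in the following sections.

    import cv2
    import numpy as np

    # Illustrative corner coordinates; in the full pipeline these would be the
    # four ROI corners produced by the contour-detection steps described below.
    src_pts = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])
    dst_pts = np.float32([[0, 0], [400, 0], [0, 400], [400, 400]])

    img = cv2.imread("document.jpg")                   # placeholder file name
    M = cv2.getPerspectiveTransform(src_pts, dst_pts)  # 3x3 transformation matrix
    corrected = cv2.warpPerspective(img, M, (400, 400))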
Gray Scaling
Gray scaling of images is done to reduce complexity. Colored (RGB) images do not always make edges prominent enough for algorithms such as Canny edge detection and masking. To reduce computational complexity, grayscale representations are therefore often used in place of operating on color images directly. In image processing, gray scaling improves efficiency by letting us focus on the actual application rather than dealing with the complexity of RGB colors. Interfaces such as MATLAB and Python (with OpenCV) provide simplified development in grayscale mode.
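A one-line grayscale conversion, assuming an OpenCV/Python setup; the input file name is a placeholder:

    import cv2

    img = cv2.imread("document.jpg")              # placeholder file name
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # single-channel grayscale image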
Adaptive Threshold
Adaptive thresholding binarizes an image using threshold values that vary across the image. A threshold value is computed, and pixels whose intensity exceeds it are set to white (the maximum value) while the rest are set to black; the inverse variant swaps the two. This method is useful for segmentation, as it fixes each pixel of the image depending on its intensity value and the threshold value. Adaptive thresholding is applied on the grayscale version of the original image. What sets adaptive thresholding apart from conventional thresholding is that, in the former, the threshold value changes dynamically over the image, since it is calculated separately for smaller regions of the image.
The maximum value that can be assigned to the output is 255; next comes the type of thresholding. The block size decides the size of the neighborhood area, in this case 51, and 2 is the constant subtracted from the computed threshold. The grayscale image produced above was converted to a binarized image with the help of adaptive thresholding.
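A hedged sketch of this thresholding call with the parameters given above (maximum value 255, block size 51, constant 2); the mean-based adaptive method and the input file name are assumptions, since the text does not specify them:

    import cv2

    gray = cv2.imread("document.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file name
    # 255 is the maximum output value, 51 the neighborhood (block) size,
    # and 2 the constant subtracted from the locally computed threshold.
    thresh = cv2.adaptiveThreshold(gray, 255,
                                   cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 51, 2)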
Contour Generation
A. Contours
After successfully binarizing the image, it is time to trace out the edges of our ROI. An image may also contain redundant data in the background. Contours are defined as curves that trace the boundary of objects having the same color or intensity, and they are mainly used to differentiate the distinct objects in a given image. Better results are generated on binary images, hence the previous step is recommended. During this process, contours are generated around every small complete object, many of which are not required. We only need to concentrate on the largest contour generated, which will by default be our ROI. Contour generation can be improved by removing noise from the image, for which filtering algorithms can be used.
Every detected contour is stored as a vector of points, i.e. a NumPy array of (x, y) coordinates. The topology information is stored in the hierarchy parameter, which records whether a generated contour has a parent or child. The input image is the thresholded output of the previous step, and the retrieval flag is set so that only the extreme outer contours are returned. This helps us get rid of minor redundant portions of the image that do not belong to our ROI or that form a separate region inside it. CHAIN_APPROX_NONE has been used to extract all the boundary points of the contours. Alternatively, CHAIN_APPROX_SIMPLE could be used, which compresses straight segments and, for a clean rectangle, could directly give the 4 corner coordinates of the boundary; but, as mentioned earlier, because our image has much redundant data in the background, the coordinates so generated fell far outside or inside the ROI, causing data loss.
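A sketch of this contour extraction, assuming the OpenCV 4.x signature of findContours (older 3.x versions also return the modified image as a first value):

    import cv2

    # thresh is the binarized image from the adaptive-threshold step.
    contours, hierarchy = cv2.findContours(thresh,
                                           cv2.RETR_EXTERNAL,      # outermost contours only
                                           cv2.CHAIN_APPROX_NONE)  # keep every boundary point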
B. Extracting the largest contour
Out of all the generated contours, we concentrate on the largest one, as that will be the ROI, i.e. the scanned document. largest_areas contains all the contours arranged in increasing order of their areas, out of which we extract the largest one, that is, the last element of the sorted list. cnt contains its collection of points in the form of a NumPy array.
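A possible implementation of this selection, following the variable names used in the text (largest_areas, cnt); sorting by cv2.contourArea is an assumption about how the areas are computed:

    # Sort the contours by area in increasing order and take the last (largest) one.
    largest_areas = sorted(contours, key=cv2.contourArea)
    cnt = largest_areas[-1]   # NumPy array of boundary points of the ROI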
Masking
In order to remove the actual contouring boundary and obtain the ROI as one continuous object, we apply a mask. The mask is used to recalculate each pixel's value in the image so that any inner contours that may still exist as separate entities merge with the major ROI, leaving us with a masked ROI according to a mask matrix. This mask holds values that determine how much influence neighboring pixels (and the current pixel) have on the new pixel value. We then crop out the mask and store the image in black-and-white format.
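A possible realization of this masking step, assuming the mask is built by filling the largest contour and the crop uses its bounding box; these details are not spelled out in the text:

    import cv2
    import numpy as np

    # Fill the largest contour on a blank mask, apply it, then crop to the ROI.
    mask = np.zeros(gray.shape, dtype=np.uint8)
    cv2.drawContours(mask, [cnt], -1, 255, thickness=cv2.FILLED)
    masked = cv2.bitwise_and(gray, gray, mask=mask)   # black outside the ROI

    x, y, w, h = cv2.boundingRect(cnt)
    cropped = masked[y:y + h, x:x + w]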
Improving the coordinate precision and plotting circles along the edges: in order to obtain more precise points along the contour, we use approxPolyDP with an epsilon value (0.01 gives better results than 0.1). approxPolyDP helps to approximate the points forming a near-to-perfect polygon and is used for contour approximation. Depending upon the precision we specify, it approximates a contour shape with another shape having fewer vertices. The maximum distance from the contour to the approximated contour is controlled by epsilon, which is supplied as the second parameter.
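A sketch of the contour approximation and of plotting circles at the resulting vertices; the circle radius and color are illustrative choices:

    import cv2

    # epsilon is a fraction of the contour perimeter; 0.01 keeps more detail than 0.1.
    epsilon = 0.01 * cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, epsilon, True)     # polygon with fewer vertices

    # Plot a small circle at every approximated vertex for visual inspection.
    for point in approx:
        x, y = point[0]
        cv2.circle(img, (int(x), int(y)), 5, (0, 0, 255), -1)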
Convex Hull and Convexity Defects
The convex hull is used to trace the set of points that forms a complete convex polygon around the ROI, which helps in determining the skeleton of the ROI. The algorithm then checks for convexity defects, i.e. deviations of the contour from its convex hull, and corrects them by reforming any bulges.
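A sketch of the convex hull and convexity-defect computation with OpenCV; marking the deepest defect points with circles is an illustrative addition, not a step prescribed by the text:

    import cv2

    # returnPoints=False yields hull point indices, as required by convexityDefects.
    hull = cv2.convexHull(cnt, returnPoints=False)
    defects = cv2.convexityDefects(cnt, hull)

    if defects is not None:
        for i in range(defects.shape[0]):
            start_idx, end_idx, far_idx, depth = defects[i, 0]
            far_point = tuple(int(v) for v in cnt[far_idx][0])  # deepest point of the bulge
            cv2.circle(img, far_point, 5, (255, 0, 0), -1)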
Conclusion
Thus, using image processing, this research has successfully produced an algorithm that accepts an image as input and marks the 4 corner points of the ROI, which can then be given to the perception algorithm. The steps described above form a series of image processing operations that were applied to 14 different images to study the influence of combining different techniques, as shown in the analysis chart. The block size in adaptive thresholding needs to be changed as per the requirements of the document, and the noise level in the image and the neighborhood pixels can affect the binarization of the image.