Tuesday, March 2, 2010

Thresholding is Difficult

So I took a slightly different approach to segmenting the letters and used Mean Shift to cluster colors together. The algorithm I am using is called EDISON, from Rutgers. A little searching turned up a really nice Matlab wrapper for it on a page by Shawn Lankton. The files required an end-of-line conversion to run on Windows, but other than that, it was pretty easy to get working.

Mean shift accomplishes more or less the same thing as k-means clustering, but in a different fashion. Instead of having to choose (or guess) k, mean shift iteratively reduces the number of colors in the image: each color is repeatedly shifted toward the average of its neighbors, so nearby colors converge to the same value. I haven't read up too much on the actual math behind it yet, though.
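To make the idea a bit more concrete, here is a minimal sketch of a single mean shift iteration over a set of color vectors, assuming a simple flat kernel of bandwidth h. This is just an illustration of the update rule, not EDISON's actual implementation, which is more sophisticated and also uses spatial coordinates.

```matlab
% One mean shift iteration over a set of color vectors, with a flat kernel
% of bandwidth h (illustrative sketch only).
function shifted = meanshift_step(colors, h)
    % colors: N-by-3 matrix of color vectors (e.g. LAB values)
    shifted = zeros(size(colors));
    for i = 1:size(colors, 1)
        d = sqrt(sum(bsxfun(@minus, colors, colors(i, :)).^2, 2));  % distance to every point
        nbrs = d < h;                                                % points within the bandwidth
        shifted(i, :) = mean(colors(nbrs, :), 1);                    % move toward the local mean
    end
end
```

Repeating this until convergence, colors that end up at the same mode belong to the same cluster, so the number of clusters falls out of the data rather than being picked up front.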

So with this new tool in hand, I set off to try and get automatic segmentation of letters.  My approach is the following:

I first take an input image and rescale it to 64x64. This reduces computation time, suppresses some noise, and helps standardize feature extraction. The next step is to convert the image into the LAB color space, which all subsequent processing operates on.
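In Matlab terms, the preprocessing looks something like this (a minimal sketch using Image Processing Toolbox functions; the filename is just a placeholder):

```matlab
rgb   = imread('letter.png');                % placeholder filename
small = imresize(rgb, [64 64]);              % standardize size, cut computation, smooth noise
cform = makecform('srgb2lab');               % sRGB -> CIELAB conversion structure
lab   = applycform(im2double(small), cform); % LAB image used for everything downstream
```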

Once we have the reduced image, it is fed through a few different parameterizations of mean shift (though we do not need to pick a K for mean shift, there are still several parameters to deal with). The EDISON code automatically labels the output images based upon their color. I take the area of the largest connected region found by EDISON and use it to threshold the image. My prior attempts began with trying to find the smallest region and assuming it was the letter, but there tended to be a lot of other small distractor regions - I found that the shape takes up a majority of the image in most cases.
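A rough sketch of this step is below. The call to the EDISON wrapper is an assumption about its interface (the actual function name, arguments, and outputs in the Matlab wrapper may differ), the bandwidth values are placeholders, and keeping the largest labeled region as the foreground mask is one way to read "use its area to threshold the image."

```matlab
% Run one parameterization of mean shift and keep the largest labeled region.
[fimg, labels] = edison_wrapper(lab, @(x) x, ...           % hypothetical wrapper call
    'SpatialBandWidth', 7, 'RangeBandWidth', 6.5);         % placeholder parameters

L     = labels + 1;                                        % shift to 1-based labels
stats = regionprops(L, 'Area');                            % area of each labeled region
[maxArea, bigIdx] = max([stats.Area]);                     % largest connected region
mask  = (L == bigIdx);                                     % binary mask built from it
```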

After we have this binary image, we can perform a few morphological operations on it to try and get just the letter.  Often at this stage we can expect two connected components - the letter and the background.  I then make the assumption that the letter will have a smaller area than the background, and use that information to remove all regions larger than the assumed letter.
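A sketch of that cleanup, assuming standard morphological functions; the speck size and the "larger than the letter" cutoff shown here are guessed placeholders, not values from my code:

```matlab
bw     = bwmorph(mask, 'open');               % light opening to break thin connections
bw     = bwareaopen(bw, 10);                  % drop tiny specks (< 10 px)
cc     = bwlabel(bw);                         % label the remaining components
stats  = regionprops(cc, 'Area');
cutoff = 0.5 * numel(bw);                     % assume the letter covers under half the image
keep   = find([stats.Area] <= cutoff);        % discard anything larger than that
letter = ismember(cc, keep);                  % binary image of (hopefully) just the letter
```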

I have also developed a fairly reliable way of extracting just the shape from the image. After the input image is fed through the first parameterization of mean shift, I take the number of regions it found and use that as K to perform k-means clustering on the original image. This is then further reduced by setting K=2, resulting in an image with just two colors - more often than not, this has the shape as one color and the background as the other. To handle cases where smaller components get assigned the same color as the background, similar morphological operations are performed to reduce the binary image to just a solid shape.
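Roughly, one way to chain the two k-means passes is to cluster the LAB pixels with K set to the number of mean-shift regions, then merge the resulting cluster centers down to two colors. A sketch under that assumption (it requires the Statistics Toolbox kmeans, and the exact chaining in my code may differ):

```matlab
pixels   = reshape(lab, [], 3);                           % 64*64-by-3 matrix of LAB values
k        = double(max(labels(:))) + 1;                    % number of regions from mean shift
[idx, C] = kmeans(pixels, k, 'EmptyAction', 'singleton'); % first pass: K = number of regions
group    = kmeans(C, 2, 'EmptyAction', 'singleton');      % second pass: merge centers to 2 colors
twoTone  = reshape(group(idx), 64, 64);                   % image with just two labels
shapeBW  = twoTone ~= twoTone(1, 1);                      % assume the corner pixel is background
```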

With that being said, here are some results. As can be seen, it is far from perfect, but it can work quite well at times. I think it is certainly a good start, but it may overfit the data I was using to create it. When it comes time to actually test this, I'll have a much better idea of how well it works.
