Wednesday, May 19, 2010
Tuesday, March 2, 2010
So I took a slightly different approach to segmenting the letters and used mean shift to cluster colors together. The implementation I am using is called EDISON, from Rutgers. I did a little searching and found a really nice Matlab wrapper written for it on a page by Shawn Lankton. The files required an end-of-line conversion to run on Windows, but other than that, it was pretty easy to get working.
Mean shift accomplishes more or less the same thing as k-means clustering, but in a different fashion. Instead of having to choose (or guess) k, mean shift iteratively shifts each color towards the average of the colors near it, so similar colors merge together and the number of colors in the image shrinks on its own. I haven't read up too much on the actual math behind it yet, though.
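To get an intuition for it, here's a toy 1D mean shift in Python. This is a flat-kernel sketch on scalar values, nothing to do with the actual EDISON code - the bandwidth parameter and convergence tolerance are my own choices for illustration:

```python
def mean_shift_1d(points, bandwidth, iters=50):
    """Flat-kernel mean shift on scalar values: each point repeatedly
    moves to the mean of its neighbors within `bandwidth`."""
    shifted = list(points)
    for _ in range(iters):
        new = []
        for x in shifted:
            neighbors = [y for y in shifted if abs(y - x) <= bandwidth]
            new.append(sum(neighbors) / len(neighbors))
        if all(abs(a - b) < 1e-6 for a, b in zip(new, shifted)):
            break  # everything has converged onto a mode
        shifted = new
    # Points that converged to (roughly) the same place share a cluster.
    modes = []
    for x in shifted:
        if not any(abs(x - m) <= bandwidth for m in modes):
            modes.append(x)
    return modes
```

With a bandwidth of 20, the values [10, 11, 12, 200, 201, 202] collapse to two modes near 11 and 201 - note that the only parameter chosen was the bandwidth, not the number of clusters.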
So with this new tool in hand, I set off to try and get automatic segmentation of letters. My approach is the following:
I first take an input image and rescale it to 64x64. This reduces computation time, reduces some noise, and helps standardize feature extraction. The next step is to convert the image into the LAB color space, which is used for all image processing.
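For illustration, the rescaling step could look something like this in Python (a nearest-neighbor sketch on a plain 2D array - the interpolation method and data layout in my actual Matlab code differ):

```python
def resize_nearest(img, out_h, out_w):
    """Nearest-neighbor rescale of a 2D image stored as a list of lists."""
    h, w = len(img), len(img[0])
    # Map each output pixel back to its nearest source pixel.
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]
```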
Once we have the reduced image, it is fed through a few different parameterizations of mean shift (though we do not need to pick a K for mean shift, there are still several parameters to deal with). The EDISON code automatically labels the output images based upon their color. I take the area of the largest connected region found by EDISON and use it to threshold the image. My prior attempts began with trying to find the smallest region and assuming it was the letter, but there tended to be a lot of other small distractor regions - I found that the shape takes up a majority of the image in most cases.
After we have this binary image, we can perform a few morphological operations on it to try and get just the letter. Often at this stage we can expect two connected components - the letter and the background. I then make the assumption that the letter will have a smaller area than the background, and use that information to remove all regions larger than the assumed letter.
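A rough Python sketch of the region-filtering idea - flood-fill connected components on a binary array, then keeping the smallest region as the assumed letter. The real code uses Matlab's morphological operations, so this only illustrates the concept:

```python
def connected_components(img):
    """4-connected component labeling via flood fill.
    img is a 2D list of 0/1; returns {label: [(y, x), ...]}."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    comps, next_label = {}, 0
    for i in range(h):
        for j in range(w):
            if img[i][j] and labels[i][j] == 0:
                next_label += 1
                labels[i][j] = next_label
                comps[next_label] = []
                stack = [(i, j)]
                while stack:
                    y, x = stack.pop()
                    comps[next_label].append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            stack.append((ny, nx))
    return comps

def keep_smallest_region(img):
    """Assume the letter is the smallest connected region; zero out the rest."""
    comps = connected_components(img)
    smallest = min(comps.values(), key=len)
    out = [[0] * len(img[0]) for _ in img]
    for y, x in smallest:
        out[y][x] = 1
    return out
```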
I have also developed a fairly reliable way of extracting just the shape from the image. After the input image is fed through the first parameterization of mean shift, I take the number of regions it found and use that as K to perform k-means clustering on the original image. The result is then further reduced by clustering with K=2, giving an image with just two colors - more often than not, the shape comes out as one color and the background as the other. To take care of cases where smaller components are classified to the same color as the background, similar morphological operations are performed to reduce the binary image to just a solid shape.
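The K=2 reduction boils down to ordinary k-means. Here's a minimal scalar version in Python - Lloyd's algorithm on grayscale values rather than the LAB colors the real pipeline uses, with a seeded random initialization purely for illustration:

```python
import random

def kmeans_1d(values, k, iters=100, seed=0):
    """Lloyd's algorithm on scalar values; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(sorted(set(values)), k)  # k distinct starting centers
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break  # assignments stopped changing
        centers = new_centers
    return centers, labels
```

With K=2 the pixel values split into two groups, which is exactly the two-color image described above.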
With that being said, here are some results. As can be seen, it is far from perfect, but it can work quite well at times. I think it is certainly a good start, but it may overfit the data I was using to create it. When it comes time to actually test this, I'll have a much better idea of how well it works.
Wednesday, February 24, 2010
Friday, February 19, 2010
I'm playing around with methods to perform automatic thresholding of our input images. I am manually cropping targets with bounding boxes that give a pretty decent amount of background information as well.
My initial thought was to look at color histograms to compute the three most prevalent colors and then look at their distributions in order to isolate the letter. I am using some K-means code to reduce every input image to just three colors - the hope is that we see the letter as one color, the target as another, and the background as the third color.
Restricting k to 3 is too strict a bound for the problem - in some cases I am seeing that the background gets assigned the same color as the alphanumeric. In addition, the code I have doing this uses Euclidean distance in RGB space, which isn't a very good way to group colors - I expect much better results using the LAB color space. Using LAB I will likely cluster on just the a* and b* channels and disregard L*, since the color of the object stays the same even when the lighting fluctuates.
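For reference, here's a Python sketch of the sRGB-to-LAB conversion along with an a*/b*-only distance. The conversion uses the standard D65 formulas; the helper names and the distance function are just my own sketch of the channel-weighting idea above:

```python
def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB to CIELAB (D65 white point)."""
    def linearize(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    rl, gl, bl = linearize(r), linearize(g), linearize(b)
    # Linear sRGB -> XYZ (D65 primaries).
    x = 0.4124 * rl + 0.3576 * gl + 0.1805 * bl
    y = 0.2126 * rl + 0.7152 * gl + 0.0722 * bl
    z = 0.0193 * rl + 0.1192 * gl + 0.9505 * bl
    # Normalize by the D65 reference white.
    x, y, z = x / 0.95047, y / 1.0, z / 1.08883
    def f(t):
        return t ** (1 / 3) if t > 0.008856 else 7.787 * t + 16 / 116
    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)  # L*, a*, b*

def ab_distance(c1, c2):
    """Distance on the a*/b* channels only, ignoring lightness L*."""
    _, a1, b1 = c1
    _, a2, b2 = c2
    return ((a1 - a2) ** 2 + (b1 - b2) ** 2) ** 0.5
```

The nice property is that a gray pixel and a white pixel - same chroma, very different lighting - come out essentially zero distance apart, while genuinely different hues stay far apart.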
I changed the Hu code to use Euclidean distance since I am still having trouble with the Mahalanobis distance code - it gives very undesirable results.
Wednesday, February 17, 2010
I wrote some code that computes Hu moments for labeled data and then gives a label to a new image based upon this data. Hu moments are rotation and scale invariant features of an image. Currently I am manually thresholding input images since I haven't written code to do it automatically.
Hu moments are calculated from the central moments of an image, which are the raw image moments computed about the centroid - subtracting the centroid off each coordinate is what makes them translation invariant. The central moments are then made scale invariant by normalizing them with a power of the 0th central moment, which corresponds to the area of the object in a binary image.
We use seven rotation-invariant moments (the so-called Hu moments), each of which is some (usually nonlinear) combination of these normalized, scale-invariant moments.
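To make the chain of moments concrete, here's a small Python sketch that computes the first two Hu invariants from a binary image. The full set has seven; φ1 and φ2 are enough to show the raw → central → normalized progression:

```python
def hu_moments(img):
    """First two Hu invariants of a binary image (2D list of 0/1)."""
    h, w = len(img), len(img[0])
    def raw(p, q):  # plain image moment m_pq
        return sum(img[y][x] * x**p * y**q for y in range(h) for x in range(w))
    m00 = raw(0, 0)                       # area of the object
    cx, cy = raw(1, 0) / m00, raw(0, 1) / m00  # centroid
    def mu(p, q):  # central moment: computed about the centroid -> translation invariant
        return sum(img[y][x] * (x - cx)**p * (y - cy)**q
                   for y in range(h) for x in range(w))
    def eta(p, q):  # normalized central moment: divide by a power of m00 -> scale invariant
        return mu(p, q) / m00 ** (1 + (p + q) / 2)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4 * eta(1, 1)**2
    return phi1, phi2
```

Rotating the image leaves φ1 and φ2 (essentially) unchanged, which is the behavior visible in the numbers below.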
The following are some of the images I've been playing with. The original is from an actual target and then the rotated versions are synthetically created.
The moments for the two previous figures are:
As you can see, even though the images themselves look very different, the features extracted are very similar.
The values for the following rotations of E (0, 22, 90, 132 degrees) are:
         0°       22°       90°      132°
φ1    1.3308   1.3185   1.3308   1.3336
φ2    0.6832   0.6780   0.6832   0.6911
φ3    0.1851   0.1730   0.1851   0.1905
φ4    0.1071   0.1059   0.1071   0.1228
φ5    0.0057   0.0057   0.0057   0.0090
φ6   -0.0003   0.0027  -0.0003   0.0153
φ7   -0.0017  -0.0109   0.0017  -0.0160
Once again you can see that the values are all very similar.
Once the algorithm has been given labeled data, we simply feed it a new image, calculate its Hu moments, and perform a nearest neighbor classification based upon Mahalanobis distance.
This distance metric differs from Euclidean distance in that it takes into account correlations within the data set and is scale-invariant.
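A simplified sketch of the classification step in Python. This version assumes a diagonal covariance (per-feature variances only), which is weaker than full Mahalanobis distance but shows the idea; the labels and numbers in the example are made up for illustration:

```python
def mahalanobis_diag(x, mean, var):
    """Mahalanobis distance under a diagonal covariance assumption:
    each squared feature difference is scaled by that feature's variance."""
    return sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, var)) ** 0.5

def nearest_label(features, labeled):
    """labeled maps label -> (mean_vector, var_vector) from training data;
    returns the label of the nearest class."""
    return min(labeled, key=lambda l: mahalanobis_diag(features, *labeled[l]))
```

Because each feature is divided by its own variance, a feature that naturally fluctuates a lot (like φ1 here) doesn't dominate one that is tightly clustered - which is exactly what plain Euclidean distance gets wrong.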
The code I have to do this is acting a little strangely right now so the results aren't very reliable. However, from manual inspection we can see that the Hu moments give a surprisingly good signature for a letter, even under rotation. I will have to include more letters and different versions of each letter to see whether it will be suitable enough for our purposes or whether we will require something more complex.
Thursday, February 11, 2010
We are starting to implement the methods found in a paper we posted a couple weeks ago, about invariant pattern recognition.
The paper outlines its methods as follows.
"1. Normalize the pattern of size n×n so that it is translation- and scale-invariant.
2. Discard all those pixels that are outside the surrounding circle with center (n/2, n/2) and radius n/2.
3. Project the pattern in 2n different orientations (θ) to get the radon transform coefficients.
4. Perform a 1D dual-tree complex wavelet transform on the radon coefficients along the radial direction.
5. Conduct a 1D Fourier transform on the resultant coefficients along the angle direction and get the Fourier spectrum magnitude.
6. Save these features into the feature database."
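As a rough illustration of step 3, here is a minimal nearest-bin Radon transform in Python. This is not the code we're actually using - just a sketch that bins each pixel by its projected coordinate r = x·cos θ + y·sin θ for a handful of angles:

```python
import math

def radon(img, n_angles):
    """Nearest-bin Radon transform: for each angle, sum pixel intensities
    into bins indexed by the projected coordinate r = x*cos(t) + y*sin(t),
    measured from the image center."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    n_bins = int(math.ceil(math.hypot(h, w))) + 2  # padded so r never overflows
    sino = []
    for a in range(n_angles):
        t = math.pi * a / n_angles
        row = [0.0] * n_bins
        for y in range(h):
            for x in range(w):
                if img[y][x]:
                    r = (x - cx) * math.cos(t) + (y - cy) * math.sin(t)
                    row[int(round(r)) + n_bins // 2] += img[y][x]
        sino.append(row)
    return sino
```

Each row of the output is one projection; the total mass in every row equals the number of "on" pixels, and a vertical line collapses into a single bright bin at θ = 0.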
I found code online that does a Radon transform and returns results in a fairly usable manner here. It only calculates using 8 angles, so the data isn't as precise as we would like, and we will likely modify it to use far more angles, then port it to CUDA (run on the GPU).
We have gotten to step 3, though we are not projecting in as many orientations as our source paper calls for, so this is a modification we will need to make to the code. Here are some preliminary results. Note that all pixels in the gray area are discarded, the angle is displayed on the vertical axis, and the translation is displayed on the horizontal axis. I discarded 40% of the far left and far right translations because they had no worthwhile information.
As you can see, with this information, the B and the A appear to be identical. It is possible that putting these results through the last two steps, or analyzing the specific numbers, would distinguish the two. I feel that the biggest problem here is the small number of orientations in the Radon transform. This is a problem with the current implementation, not the theory, and it's something we can fix fairly easily.
In unrelated news, I hate the blogger posting system.
Wednesday, February 3, 2010
Here are a bunch of pictures (The ones with a lot of green grass are from our flight last week - these are just frames taken from the videos):
The following are some examples of running the images through our saliency program running on the GPU. It currently only takes color channels into account and isn't using more complex filtering like Gabor right now: