Friday, February 19, 2010
I'm playing around with methods to perform automatic thresholding of our input images. I am manually cropping targets with bounding boxes that give a pretty decent amount of background information as well.
My initial thought was to look at color histograms to compute the three most prevalent colors and then look at their distributions in order to isolate the letter. I am using some K-means code to reduce every input image to just three colors - the hope is that we see the letter as one color, the target as another, and the background as the third color.
Restricting k to 3 is too strict a bound for the problem - in some cases I am seeing that the background gets assigned the same color as the alphanumeric. In addition, the code I have doing this is using Euclidean distance in RGB space, which isn't a very good way to group colors - I expect much better results using the LAB color space. Using LAB I will likely cluster on just the a* and b* channels and disregard the L channel, since the color of the object stays the same even though the lighting fluctuates.
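A minimal numpy-only sketch of the clustering step I have in mind: k-means on just the chromaticity channels, with the RGB-to-LAB conversion assumed to happen elsewhere (e.g. with an image library). The function name and the deterministic farthest-point initialization are my choices, not the code I'm actually running.

```python
import numpy as np

def kmeans_ab(points, k, iters=20):
    """Cluster pixel colors with plain k-means.

    points: (N, 2) array of a*/b* values, one row per pixel.
    Only the chromaticity channels are clustered, so lighting
    changes (the L channel) do not split a single color.
    """
    pts = np.asarray(points, dtype=float)
    # Deterministic farthest-point initialization avoids two
    # starting centers landing in the same color cluster.
    centers = [pts[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(pts[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(pts[d.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each pixel to its nearest center, then recompute centers.
        dists = np.linalg.norm(pts[:, None] - centers[None], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return labels, centers
```

With k = 3 the hope is that the three returned centers correspond to letter, target shape, and background; the failure mode above is when two of them collapse onto one region.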
I changed the Hu code to use Euclidean distance since I am still having trouble with the Mahalanobis distance code - it gives very undesirable results.
Wednesday, February 17, 2010
I wrote some code that computes Hu moments for labeled data and then gives a label to a new image based upon this data. Hu moments are rotation and scale invariant features of an image. Currently I am manually thresholding input images since I haven't written code to do it automatically.
Hu moments are calculated by first finding the central moments of an image, which are the raw image moments computed about the centroid; subtracting the centroid is what makes them translation invariant. The central moments are then made scale invariant by normalizing each one by an appropriately scaled 0th moment, which for a binary image corresponds to the area of the object.
We use seven rotation-invariant moments (the so-called Hu moments), which are based upon the scale-invariant moments discussed above. Each one is some (usually nonlinear) combination of the scale-invariant moments.
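The pipeline above (raw moments, centroid subtraction, scale normalization, then the seven combinations) can be sketched directly in numpy. This is an illustrative implementation of Hu's standard formulas, not my actual code:

```python
import numpy as np

def hu_moments(img):
    """Compute the seven Hu moments of a 2-D intensity (or binary) image."""
    y, x = np.mgrid[:img.shape[0], :img.shape[1]].astype(float)
    m = lambda p, q: (x ** p * y ** q * img).sum()      # raw moments
    m00 = m(0, 0)
    xb, yb = m(1, 0) / m00, m(0, 1) / m00               # centroid
    mu = lambda p, q: ((x - xb) ** p * (y - yb) ** q * img).sum()  # central
    eta = lambda p, q: mu(p, q) / m00 ** (1 + (p + q) / 2)         # scale-normalized
    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
        (n30 - 3 * n12) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          + (3 * n21 - n03) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
        (n20 - n02) * ((n30 + n12) ** 2 - (n21 + n03) ** 2)
          + 4 * n11 * (n30 + n12) * (n21 + n03),
        (3 * n21 - n03) * (n30 + n12) * ((n30 + n12) ** 2 - 3 * (n21 + n03) ** 2)
          - (n30 - 3 * n12) * (n21 + n03) * (3 * (n30 + n12) ** 2 - (n21 + n03) ** 2),
    ])
```

Rotating the input (e.g. with `np.rot90`) should leave all seven values essentially unchanged, which is exactly the behavior shown in the tables below.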
The following are some of the images I've been playing with. The original is from an actual target and then the rotated versions are synthetically created.
The moments for the two previous figures are:
As you can see, even though the images themselves look very different, the features extracted are very similar.
The values for the following rotations of E (0, 22, 90, 132 degrees) are:
        0°       22°      90°      132°
φ1    1.3308   1.3185   1.3308   1.3336
φ2    0.6832   0.6780   0.6832   0.6911
φ3    0.1851   0.1730   0.1851   0.1905
φ4    0.1071   0.1059   0.1071   0.1228
φ5    0.0057   0.0057   0.0057   0.0090
φ6   -0.0003   0.0027  -0.0003   0.0153
φ7   -0.0017  -0.0109   0.0017  -0.0160
Once again you can see that the values are all very similar.
Once the algorithm has been given labeled data, we simply feed it a new image, calculate its Hu moments, and perform a nearest neighbor classification based upon Mahalanobis distance.
This distance metric differs from Euclidean distance in that it takes into account correlations in the data set and is scale invariant.
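A sketch of that classification step, not my actual (currently misbehaving) code: the covariance is estimated from the pooled training features, and a pseudo-inverse guards against a singular covariance matrix. The function name and pooling choice are my assumptions.

```python
import numpy as np

def mahalanobis_nn(train_feats, train_labels, query):
    """1-nearest-neighbor classification under Mahalanobis distance.

    train_feats: (N, d) array of Hu-moment feature vectors.
    train_labels: list of N labels.
    query: (d,) feature vector of the new image.
    """
    feats = np.asarray(train_feats, dtype=float)
    # Inverse covariance of the pooled training set; pinv tolerates
    # a singular (e.g. under-sampled) covariance matrix.
    vi = np.linalg.pinv(np.cov(feats, rowvar=False))
    diffs = feats - query
    # Squared Mahalanobis distance to every training example;
    # argmin is unaffected by skipping the square root.
    dists = np.einsum('ij,jk,ik->i', diffs, vi, diffs)
    return train_labels[int(np.argmin(dists))]
```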
The code I have to do this is acting a little strangely right now so the results aren't very reliable. However, from manual inspection we can see that the Hu moments give a surprisingly good signature for a letter, even under rotation. I will have to include more letters and different versions of each letter to see whether it will be suitable enough for our purposes or whether we will require something more complex.
Thursday, February 11, 2010
We are starting to implement the methods from a paper on invariant pattern recognition that we posted a couple of weeks ago.
The paper outlines its methods as follows.
"1. Normalize the pattern of size n×n so that it is translation- and scale-invariant.
2. Discard all those pixels that are outside the surrounding circle with center (n/2, n/2) and radius n/2.
3. Project the pattern in 2n different orientations (θ) to get the radon transform coefficients.
4. Perform a 1D dual-tree complex wavelet transform on the radon coefficients along the radial direction.
5. Conduct a 1D Fourier transform on the resultant coefficients along the angle direction and get the Fourier spectrum magnitude.
6. Save these features into the feature database."
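Steps 2 and 3 of the list above can be sketched as a rotate-and-sum radon transform: mask everything outside the inscribed circle, then rotate the image and take column sums at each orientation. This is an illustration of the idea, not the code we found online (which is what we actually plan to modify and port to CUDA).

```python
import numpy as np
from scipy.ndimage import rotate

def radon_projections(img, n_angles):
    """Project a 2-D pattern at n_angles orientations over [0, 180)."""
    h, w = img.shape
    # Step 2: zero out pixels outside the inscribed circle, so every
    # projection sees exactly the same mass regardless of angle.
    yy, xx = np.ogrid[:h, :w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    masked = img * (((yy - cy) ** 2 + (xx - cx) ** 2) <= (min(h, w) / 2.0) ** 2)
    # Step 3: rotate and sum columns to get one projection per angle.
    sino = []
    for theta in np.linspace(0.0, 180.0, n_angles, endpoint=False):
        rot = rotate(masked, theta, reshape=False, order=1)
        sino.append(rot.sum(axis=0))
    return np.array(sino)  # rows: angle, columns: translation
```

The paper calls for 2n orientations on an n×n pattern, so `n_angles` would be far larger than the 8 angles the downloaded code currently uses.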
I found code online that does a radon transform and returns results in a fairly usable manner here. It only calculates using 8 angles, so the data isn't as precise as we would like, and we will likely modify it to do far more angles, then port it to CUDA (to run on the GPU).
We have gotten to step 3, though we are not projecting in as many orientations as our source paper calls for, so this is a modification we will need to make to the code. Here are some preliminary results. Note that all pixels in the gray area are discarded, the angle is displayed on the vertical axis, and the translation is displayed on the horizontal axis. I discarded 40% of the far left and far right translations because they had no worthwhile information.
As you can see, with this information, the B and the A appear to be identical. It is possible that putting these results through the last two steps, or analyzing the specific numbers, would distinguish the two. I feel that the biggest problem here is the small number of orientations in the radon transform. This is a problem with the current implementation, and not the theory, and it's something we can fix fairly easily.
In unrelated news, I hate the blogger posting system.
Wednesday, February 3, 2010
Here are a bunch of pictures (The ones with a lot of green grass are from our flight last week - these are just frames taken from the videos):
The following are some examples of running the images through our saliency program running on the GPU. It currently only takes color channels into account and isn't using more complex filtering like Gabor filters right now:
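For reference, a color-only saliency map of the kind described can be sketched on the CPU with Itti-style color opponency: red/green and blue/yellow opponent channels, each passed through a fine-minus-coarse center-surround difference. This is my framing of the idea, not our GPU implementation, and the sigma values are arbitrary choices.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def color_saliency(rgb):
    """Color-channel-only saliency map for an (H, W, 3) float RGB image."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Opponent color channels: red vs. green, blue vs. yellow.
    opponents = (r - g, b - (r + g) / 2.0)
    sal = np.zeros(rgb.shape[:2])
    for chan in opponents:
        # Center-surround: difference between a fine and a coarse blur,
        # so uniform regions score low and color pop-outs score high.
        sal += np.abs(gaussian_filter(chan, 1.0) - gaussian_filter(chan, 8.0))
    return sal / (sal.max() + 1e-9)
```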