Aerial Vision

Shane Grant

I'm using Virtualdub to preview a bunch of different methods for deinterlacing the content. Here are a few samples (all of this is on our more zoomed out footage - still working on fixing the other file, though it might not be possible to recover the data):

This is using "discard," which cuts the vertical resolution in half and only uses one set of fields:

Next up is "double," which takes either the even or odd scan lines and doubles them, ignoring one set of scan lines (discard with 2x the vertical resolution):

Next is "blend," which takes the average of both field lines:

Increasing in complexity, next is the "bob" algorithm:

Finally there is the "yadif" algorithm, which is similar to "bob" but slightly more complex (the information on these is rather sparse, however):

From looking at the stills alone, it seems that the bob and yadif methods produce the best results. It's difficult to judge them in actual video sequences since everything in the field of view is constantly moving. The yadif algorithm is apparently fairly good at dealing with static scenes, of which we currently have none.
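As a rough illustration of the simpler methods named above, here is a minimal Python sketch that works on a toy frame stored as a list of rows. It is not VirtualDub's code: the function names are made up for this example, "blend" is approximated as averaging each row with the row below it, and "bob" is a naive version that line-doubles each field into its own frame without the half-line offset correction a real implementation would apply.

```python
# Toy frame: a list of rows, each row a list of grey values.
# Assumption: even rows (0, 2, ...) form one field, odd rows the other,
# and the frame has an even number of rows.

def discard(frame, keep_odd=False):
    """Keep one field only; the output has half the rows."""
    start = 1 if keep_odd else 0
    return [list(row) for row in frame[start::2]]

def double(frame, keep_odd=False):
    """Keep one field and repeat each of its rows to restore the height."""
    out = []
    for row in discard(frame, keep_odd):
        out.append(list(row))
        out.append(list(row))
    return out

def blend(frame):
    """Average each row with the row below it, mixing the two fields."""
    last = len(frame) - 1
    out = []
    for y, row in enumerate(frame):
        below = frame[min(y + 1, last)]
        out.append([(a + b) / 2 for a, b in zip(row, below)])
    return out

def bob(frame):
    """Naive bob: one full-height frame per field (twice the frame rate)."""
    return [double(frame, keep_odd=False), double(frame, keep_odd=True)]

frame = [[0, 0], [10, 10], [20, 20], [30, 30]]
half_height = discard(frame)
line_doubled = double(frame)
blended = blend(frame)
two_frames = bob(frame)
```

Discard and double throw away half of the captured lines, blend keeps all lines but smears anything that moved between fields, and bob keeps all lines by spreading them over two output frames.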

Shane Grant

We finally got around to flying and recording video this past weekend. We put out targets and recorded two videos, one about eight minutes long and the other slightly longer. Unfortunately the second file seems to have become corrupted, so we can't view it for now. We've tried hex editing the header of the AVI file to fix it, but we haven't had any luck so far.

The flight tests were going very well up until one of the flight batteries that power the motor failed on us. This caused the motor to shut off and made for a rather rough emergency landing, so we won't fly for another one to two weeks while the plane is being repaired.

The video files are very large (several GB) and I haven't compressed them to upload yet.

We did realize one very important thing though: we need to deinterlace our video. We use a Sony FCB-H11 camera, which can capture HD or SD. Right now our transmitter only handles SD, so we have to use the NTSC format, which is interlaced video at 29.97 frames per second.

This video looks great when viewed on a device that is designed to display interlaced video, like an older television. However, when viewed on a progressive monitor, like a computer display, it looks terrible.
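The reason for the ugly result is that the two fields of an interlaced frame are captured at different moments (about 1/60 of a second apart for NTSC), so anything that moves ends up in a different place on alternating lines. The toy Python sketch below reproduces that "combing" effect; the scene, the box size, and the 2-pixel motion are all made up for illustration.

```python
def render(width, height, x_left, box=3):
    """Draw a bright box (1) on a dark background (0) starting at x_left."""
    return [[1 if x_left <= x < x_left + box else 0 for x in range(width)]
            for _ in range(height)]

def weave(first, second):
    """Build one interlaced frame: even rows from first, odd rows from second."""
    return [first[y] if y % 2 == 0 else second[y] for y in range(len(first))]

# Hypothetical motion: the box moves 2 pixels between the two field captures.
field_a = render(10, 4, x_left=2)
field_b = render(10, 4, x_left=4)
frame = weave(field_a, field_b)

# Show the woven frame as text; alternating rows are offset from each other.
for row in frame:
    print("".join("#" if v else "." for v in row))
```

An interlaced display draws the two fields one after the other, so the offset is never visible at once; a progressive display shows both sets of lines together and the edges of moving objects turn into a comb pattern.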

Here's an example of what interlaced video looks like when displayed progressively:

We can use video editing software to deinterlace the recorded footage, but we need a real time solution to use until we can acquire an HD transmitter and use 720p.

There is a lot of literature on the subject and many approaches ranging in complexity and effectiveness. This site has a good overview on the subject. Deinterlacing—An Overview is a great paper on many approaches to solving the problem.

Our first efforts will be to implement a simple deinterlacing scheme that takes either the odd or even field lines and interpolates the values in between them (code can be found to do this in both OpenCV and GLSL). Our videos have a lot of motion so likely we will need something more robust than this.
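A minimal Python sketch of that single-field interpolation idea is shown below. It is only an illustration on a list-of-rows frame, not the OpenCV or GLSL code mentioned above; the function name and the choice of plain linear averaging between the rows above and below are assumptions.

```python
def interpolate_field(frame, keep_odd=False):
    """Keep one field and rebuild the other rows from their vertical neighbours.

    Assumes the frame has at least two rows.
    """
    height = len(frame)
    parity = 1 if keep_odd else 0
    out = []
    for y in range(height):
        if y % 2 == parity:
            out.append(list(frame[y]))  # row belongs to the kept field
            continue
        # Neighbouring rows come from the kept field; mirror at the edges.
        above = y - 1 if y - 1 >= 0 else y + 1
        below = y + 1 if y + 1 < height else y - 1
        out.append([(a + b) / 2 for a, b in zip(frame[above], frame[below])])
    return out

# Rows 1 and 3 (value 99) belong to the discarded field and get replaced.
frame = [[0, 0], [99, 99], [20, 20], [99, 99]]
result = interpolate_field(frame)
```

Because the missing lines are rebuilt from a single field, there is no combing, but vertical detail is lost, which is one reason more robust methods may be needed for footage with a lot of motion.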

More advanced techniques use both temporal and motion information to deinterlace, such as those found in A Method of De-interlacing with Motion Compensated Interpolation, Motion and Edge Adaptive Interpolation De-interlacing Algorithm, or Deinterlacing with Motion-Compensated Anisotropic Diffusion.

Deinterlacing concerns aside, I was not too happy with the video of targets we did acquire. Currently we are using a fixed downward looking camera and have no control over zoom once the plane has taken off. In about a week we will have the capability to control the camera via a wireless serial interface, but actually pointing the camera is a ways off. We should also have access to telemetry data at the same time, which will allow us to perform image rectification.

At this point it is premature for us to discuss object/character recognition algorithms in detail since what our final imagery will look like is still in flux. Since we won't be flying for about two weeks, we will likely acquire "still" shots of the targets on campus from a high vantage point so we can make some progress on the recognition front.

Shane Grant

We attempted to acquire a bunch of data last weekend but faced a new challenge with regard to acquiring aerial images. We control the plane over a 2.4GHz wireless link, which is the same band as our video transmission. There were a few instances where the video was causing us to lose control of the aircraft, so we decided it was safest not to fly.

To address this we are moving the airplane to 72MHz which should get rid of the issue. On the upside, the video comes through very clearly in all of our tests so far.

We have written a program to capture the video data and save it to disk using OpenCV and are working on a more advanced interface that renders the video in OpenGL.

The weather should be fine to perform a flight this weekend. If the flight is delayed for any reason, we are going to lay the targets out and capture them from a high building on campus to simulate the airplane so we can get started working with the images.

The first thing we plan on doing, after labeling the images, is to compute L*a*b* histograms of the targets and see what types of observations we can make from that. For actually segmenting the data in a real image, we will compare something like a simple sliding window approach using color histograms versus a more complex saliency-based approach which we have already implemented to run on the GPU.
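For reference, a color histogram in L*a*b* space could be sketched in plain Python as below. The conversion uses the standard sRGB to CIE L*a*b* formulas with a D65 white point; the function names, the bin size of 16, and the choice to histogram only the a* and b* channels (ignoring lightness) are assumptions made for this example.

```python
from collections import Counter

def srgb_to_lab(r, g, b):
    """Convert 8-bit sRGB to CIE L*a*b* (D65 white point)."""
    def linear(c):
        c /= 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linear(r), linear(g), linear(b)
    # Linear RGB to XYZ, normalized by the D65 reference white.
    x = (0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl) / 0.95047
    y = (0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl) / 1.00000
    z = (0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl) / 1.08883

    def f(t):
        d = 6 / 29
        return t ** (1 / 3) if t > d ** 3 else t / (3 * d * d) + 4 / 29

    fx, fy, fz = f(x), f(y), f(z)
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)

def ab_histogram(pixels, bin_size=16):
    """Count pixels per (a*, b*) bin, ignoring lightness."""
    hist = Counter()
    for r, g, b in pixels:
        _, a, bb = srgb_to_lab(r, g, b)
        # Round to the nearest bin centre so neutral colors land in (0, 0).
        hist[(round(a / bin_size), round(bb / bin_size))] += 1
    return hist

# Hypothetical target crop: mostly red with a little white.
pixels = [(255, 0, 0)] * 3 + [(255, 255, 255)]
hist = ab_histogram(pixels)
```

Dropping L* makes the histogram less sensitive to shadows and exposure changes, at the cost of not separating black, grey, and white targets from each other.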

Lewis Anderson

Three researchers at Concordia University, in Canada, published a paper that Shane found, describing a very accurate method of character recognition. When run on a set of Chinese characters, it had a high accuracy rate even under various rotations and significant noise.

Chen, Bui, and Krzyzak (the authors) reported 98.8% accuracy or higher on the Chinese characters present in Figure 1. Since our set of possible characters is approximately a third of theirs, their method would be extremely useful for us. Additionally, their 98.8% accuracy was obtained with high white noise (see Figure 2, SNR = 0.5). With an SNR of 1.0 or higher, their accuracy was 100% under all rotational angles.

Their approach involves the Radon transform, the dual-tree complex wavelet transform, and the Fourier transform. Both the Radon transform and the Fourier transform benefit greatly from the GPU programming we will be using, and the dual-tree complex wavelet transform likely will as well.
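One reason a Radon-plus-Fourier combination can tolerate rotation: rotating an image shifts its Radon transform circularly along the angle axis (taken over a full 0 to 360 degree range), and the magnitude of a discrete Fourier transform does not change under circular shifts. The Python sketch below checks only that last property on a made-up 1-D profile. It is not the pipeline from the paper, and the names dft_magnitudes and circular_shift are illustrative.

```python
import cmath

def dft_magnitudes(signal):
    """Magnitudes of the discrete Fourier transform of a real sequence."""
    n = len(signal)
    return [abs(sum(v * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, v in enumerate(signal)))
            for k in range(n)]

def circular_shift(signal, k):
    """Rotate the sequence right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k]

# Hypothetical profile taken along the angle axis of a Radon transform.
profile = [0, 1, 4, 2, 0, 0, 3, 1]
shifted = circular_shift(profile, 3)  # stands in for a rotated input

original_mags = dft_magnitudes(profile)
shifted_mags = dft_magnitudes(shifted)
max_difference = max(abs(a - b) for a, b in zip(original_mags, shifted_mags))
```

The shift changes only the phase of each Fourier coefficient, so max_difference is zero up to floating point error, and features built from the magnitudes look the same for the rotated and unrotated input.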

I also found another student who had experience implementing a Radon transform in C++, though he did not use the GPU.

Figure 1:

Figure 2:

Shane Grant

The planned flight for today had to be delayed because of rain, so no sample imagery will be available until this weekend. One of our goals is to determine the color of targets. Since this project assumes that targets will be reasonably segmented (we can expect some noisy brush or grass in the background), one approach we are going to try as soon as we have imagery is to use color histograms to figure out the colors.

Targets have two colors, and the background color covers most of the area. The alphanumeric makes up only a small portion of any given target, so we can expect the color distribution to have its highest peak at the background color, followed by the alphanumeric color. We will have to check this theory once we have actual images.
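The two-peak idea can be sketched in a few lines of Python: quantize each pixel into a coarse color bin, count the bins, and take the two largest. The function name, the 4 levels per channel, and the sample pixel counts are assumptions for illustration; real crops would need a check that the stray background (grass, brush) does not outnumber the alphanumeric.

```python
from collections import Counter

def dominant_colors(pixels, levels=4, top=2):
    """Quantize RGB pixels to `levels` steps per channel; return the top bins."""
    step = 256 // levels
    hist = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return [color_bin for color_bin, _ in hist.most_common(top)]

# Hypothetical segmented target: red background, white letter, a little grass.
pixels = ([(200, 30, 30)] * 70      # target background
          + [(250, 250, 250)] * 25  # alphanumeric
          + [(40, 120, 40)] * 5)    # stray grass

background_bin, letter_bin = dominant_colors(pixels)
```

Here the largest bin is taken as the background color and the second largest as the alphanumeric color, following the reasoning above.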

Shane Grant

This past Sunday we created a lot of sample targets to use in acquiring images. We have almost every letter/number currently and should have the remainder finished next weekend. The targets are made out of plywood and painted solid colors.

Currently we plan to fly on Wednesday and acquire video of many of the targets from various angles and extract frames to create individual images.

For the Wednesday update we will have video and images of the sample targets.

I have also started to review some papers on shape/character recognition, specifically: Invariant pattern recognition using radon, dual-tree complex wavelet and Fourier transforms. It is a fairly complex paper that presents rotation-invariant recognition for even highly noisy images. I am currently in the process of becoming familiar with the Radon transform, which in some cases is very similar to the Hough transform.
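For readers new to it, the Radon transform sums image intensity along parallel lines at each of a set of angles, giving one projection per angle. Below is a minimal, unoptimized Python sketch that assigns each pixel to the nearest projection bin. It is an illustration only, not the method from the paper; the function name, the centering convention, and the nearest-bin accumulation are assumptions.

```python
import math

def radon(image, angles_deg):
    """Nearest-bin discrete Radon transform: one projection row per angle."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2
    half = int(math.ceil(math.hypot(w, h) / 2))  # largest possible offset
    sinogram = []
    for deg in angles_deg:
        c = math.cos(math.radians(deg))
        s = math.sin(math.radians(deg))
        proj = [0.0] * (2 * half + 1)
        for y in range(h):
            for x in range(w):
                # Signed distance of the pixel from the centre along the
                # direction given by the angle.
                offset = (x - cx) * c + (y - cy) * s
                proj[int(round(offset)) + half] += image[y][x]
        sinogram.append(proj)
    return sinogram

# 5x5 test image containing a single vertical line at x = 1.
image = [[1 if x == 1 else 0 for x in range(5)] for _ in range(5)]
at_0, at_90 = radon(image, [0, 90])
```

At 0 degrees the whole line falls into one bin and produces a sharp peak, while at 90 degrees it is spread evenly across five bins. That peak at a particular angle and offset is the same signal Hough line detection looks for, which is where the similarity between the two transforms comes from.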

Shane Grant

The aim of this project is to figure out a way to recognize characters and shapes in targets used in the 2010 AUVSI Student UAS competition. The targets consist of a solid colored alphanumeric character painted on a differently colored, solid background that is a "simple geometric" shape. Based upon past experience we take this to mean shapes in the realm of rectangles, triangles, crosses, stars, circles, and semi-circles.

More information on the target specifics can be found on page 5 of the rules for the competition.

Our first goal is to create replica targets and acquire imagery of them from our plane. We do have some prior footage to use but it will likely not be very useful since the imagery system we are using this year is more sophisticated. We will try to create enough targets to capture each possible character we could encounter, as well as cover the basic shapes we expect to see.

Images will be acquired by capturing frames from a Sony FCB-H11 camera, which we are currently using in standard definition mode.

We expect to have this footage within one to one and a half weeks.

Friday, January 29, 2010

Interlace Update

Monday, January 25, 2010

Flying and Interlace Problems

Friday, January 22, 2010

Data Delay

Wednesday, January 13, 2010

Pattern and Character Recognition

Rain on Our Parade

Monday, January 11, 2010

Target Making

Wednesday, January 6, 2010

Introduction
