Let's consider printers. How should we measure the quality of the images produced by a particular printer? The most often cited statistic is the printer's resolution, measured in dots per inch (DPI). However, nearly everyone would agree that a 300 DPI laser printer is superior in quality to a 300 DPI ink jet printer. Clearly there is more involved than just printer resolution. A couple of hypothetical examples may help illustrate this point.
These examples illustrate two important points. 1) Image quality assessments should correspond to assessments made by humans. After all, the customers buying these products are human. 2) Human evaluation is time-consuming, expensive (because it's time-consuming), and inconsistent. Ever since the beginning of the industrial revolution, people have been searching for ways to make machines do their work. We are no exception.
Okay, so I'm saying that my research project is to make a machine that will, given a bunch of images, tell us which one we would perceive as having the highest quality. To many this sounds a little fishy. Making a machine that screws bottle caps on Coke bottles or puts computer chips on a circuit board, while challenging, seems a bit more reasonable. However, we're not trying to build something or do anything physical. Instead, we're trying to make a machine act like it is thinking.
So how do we get a machine to act like it's thinking? First we need to define, very carefully and clearly, what we want the machine to do. Typically we are given a set of observations that we call our input. Our brain processes that information and reaches a conclusion. We can make our life easier by formulating the problem in such a way that the conclusion is either yes or no. Consider the simple example of a machine that indicates when it is necessary to refill your car with gasoline. In this example we must make certain assumptions about what input data we consider relevant (the current gas level in the tank) and what criterion we will use to make our decision (whether there is less than a gallon of gas left in the tank). In my research project we must make similar decisions.
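The gasoline example can be written down in a few lines. This is only a minimal sketch of the yes/no formulation: the function name and the constant are made up for illustration, and the one-gallon threshold is simply the criterion mentioned above.

```python
# Hypothetical names; the one-gallon threshold comes from the example in the text.
REFILL_THRESHOLD_GALLONS = 1.0


def needs_refill(gallons_in_tank: float,
                 threshold: float = REFILL_THRESHOLD_GALLONS) -> bool:
    """Answer the yes/no question: is it time to refill the tank?

    Input considered relevant: the current gas level in the tank.
    Decision criterion: less than `threshold` gallons remain.
    """
    return gallons_in_tank < threshold


if __name__ == "__main__":
    for level in (0.5, 1.0, 7.3):
        answer = "yes" if needs_refill(level) else "no"
        print(f"{level} gallons left: refill? {answer}")
```

The point is not the code itself but that both the relevant input and the decision rule had to be chosen explicitly before the machine could answer anything.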
We begin by making a fundamental simplifying assumption: that we have an ideal version of the test image available. For example, if we have a scanned photograph for our test image, we might use the original photographic print as our ideal version of the test image. Given such a situation, we could say that the printer whose output most closely matched the photographic print was the highest quality printer. For each printer we might ask, "Is the printed image identical to the ideal print (the original photograph)?" This looks promising since we now have a question that can be answered with a simple yes/no response. However, we still have a couple of problems.
As suggested in the above two links, we can overcome the above problems by changing our question to: "For a particular image block, what is the probability of noticing a difference between the printed image and the ideal print?" While this may seem like a very difficult question for a human to answer, we will see that we can acquire the necessary information about the human visual system by repetitively asking a yes/no question. For example, if we are shown the same two images 100 times, but only see a difference between them 14 times, we would conclude that the probability of seeing a difference between the two images is 14%.
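The arithmetic behind that estimate is just the fraction of "yes" answers. Here is a small sketch, assuming the responses are recorded as a list of True/False values (the function name is hypothetical):

```python
def detection_probability(responses):
    """Estimate the probability of noticing a difference.

    `responses` holds one yes/no answer (True/False) per showing of the
    same image pair. The estimate is the fraction of "yes" answers.
    """
    responses = list(responses)
    if not responses:
        raise ValueError("need at least one trial to estimate a probability")
    return sum(1 for saw_difference in responses if saw_difference) / len(responses)


# The example from the text: 100 showings, a difference seen 14 times.
trials = [True] * 14 + [False] * 86
p = detection_probability(trials)
print(f"Estimated probability of seeing a difference: {p:.0%}")
```

With more showings the estimate becomes more stable, which is one reason collecting this kind of data from humans takes so long.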
We now have a specific question (or set of questions) and we know the form of the output. The output will be a series of probabilities, each one corresponding to a particular block in the image. We will call this output a probability map. We can represent this as an image where light corresponds to a high probability of noticing a difference between the two images and dark corresponds to a low probability of noticing a difference. Here is an example:
The image on the left is what we will consider the ideal image. The image in the middle can be thought of as the output from the printer we are testing. It was created by blurring the ideal image. We are assessing the quality of this image. The image on the right is the output probability map. It is supposed to indicate (in light) areas where we would notice a difference between the two images. While this probably doesn't match perfectly with where you see differences between the two images, it does give you a rough idea of what it's supposed to do.
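To make the shape of the output concrete, here is a toy sketch that produces one value per image block. It is not the method used in this project: the mapping from pixel difference to probability (a saturating exponential with a made-up `scale`) is purely a placeholder for a real model of the human visual system, and the function name, block size, and tiny test images are all assumptions for illustration.

```python
import math


def block_probability_map(ideal, test, block=2, scale=20.0):
    """Return one value in [0, 1] per block of the image.

    `ideal` and `test` are equal-sized 2D lists of gray levels (0-255).
    Higher values stand for "a difference is more likely to be noticed".
    The difference-to-probability mapping below is a placeholder, not a
    model of human vision.
    """
    rows, cols = len(ideal), len(ideal[0])
    prob_map = []
    for r0 in range(0, rows, block):
        row_out = []
        for c0 in range(0, cols, block):
            diffs = [
                abs(ideal[r][c] - test[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            ]
            mean_diff = sum(diffs) / len(diffs)
            # Placeholder: larger average difference -> value closer to 1.
            row_out.append(1.0 - math.exp(-mean_diff / scale))
        prob_map.append(row_out)
    return prob_map


# A 4x4 "ideal" image with a sharp edge, and a "test" image with the edge softened.
ideal = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
test = [
    [0, 64, 191, 255],
    [0, 64, 191, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
pmap = block_probability_map(ideal, test)
for row in pmap:
    print(" ".join(f"{value:.2f}" for value in row))
```

Displayed as an image, the two top blocks (where the edge was softened) would be light and the two bottom blocks (where nothing changed) would be dark.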
You should now have a pretty good idea of what we are trying to do. The next step is to explain how we are attempting to do this. After that we can look at some of the results. I haven't gotten this stuff written yet, so you are going to have to check back for this. If you would like, I'll send you email when I get it done. Just send me a note with your request. In the meantime, feel free to look at a paper I presented at a recent conference.