Fitting parameters for a 2-D model

A big black box provides measurements of voltage. We find that most of the measurements are zero. Occasionally, however, the voltage increases for a short period, then decreases again, like this:

Let's choose to compare these measurements to a simple model: a gaussian function like this:

Moreover, after a bit of experimentation, we find out that the width of these pulses is very regular: the FWHM (Full Width at Half-Maximum) is always 1.0 seconds, which means that the parameter σ in the equation above is

That leaves us with exactly 2 free parameters: the peak amplitude V₀, and the time of the peak t₀. How can we find the combination of these two parameters, so that our model best matches the observations?

As a metric, we'll choose the sum-of-squares-of-differences quantity (which is similar to the chi-squared statistic):

Our job: find the values of V₀ and t₀ which minimize K².

Brute force: simple, stupid, slow

One approach is very simple: we can create a 2-D grid which covers some range of values of amplitude V₀, and some range of values of time t₀. We can compute the K² metric for all locations on the grid, and then choose the location which yields the smallest value. We might iterate several times to "zoom in" on the location.

Pseudocode for this procedure might look like this:


% select a starting point
guess_amp = 4.0;
guess_t = 7.0;

% select a range around the starting point
amp_range = 1.0;
t_range = 0.5;

% break each range into this many pieces
n = 100;

% iterate 5 times, zooming in on the best location
for i = 1 : 5
  
  % loop over all possible amplitudes
  for amp = min_amp to max_amp

     % loop over all possible times
     for time = min_time to max_time

         % compute the k-squared metric using this amp, time
         k_squared = compute_ksquared
         
         % is this value smaller than the best one found so far?
         if (k_squared < min_ksquared)
            min_ksquared = k_squared
            best_amp = amp
            best_time = time
         end

     end

  end

  % now recenter the range of amplitudes and times on best values
  %    and decrease the range of each

end

Will this work? Yes, probably. But it might take quite a while.



  Q:  How many times will we have to compute the K-squared
      metric in this particular example?

When there is just a single parameter to fit, this is often the best way to go: it's easy to write the code, and easy to understand. But if there are 3, or 5, or 10 free parameters, the time it takes to cover a reasonable range in each parameter with a fine grid can become much too long.

Scan the landscape, then decide

A different approach can save a lot of time. Instead of trying to find the very best spot by an exhaustive search -- which wastes lots of time looking in the wrong places -- let's make a quick scan of the region around some initial guess, figure out which way will lead us to the goal, and then head in that direction.

It may help to consider an analogy. Suppose we want to reach the very peak of a mountain which has a complicated shape. We start somewhere far from the peak.

Our method is simple: look for the direction which goes uphill most directly from our current location, and start walking in that direction.

Keep walking as long we continue to go uphill. But if we reach a point at which moving further ahead would take us DOWNhill, then stop.

Now, while standing here, once again scan the local landscape and identify the direction which goes uphill most directly.

Walk along this path until the path begins to head downhill, then stop.

Once more, scan the local landscape and find the uphill direction.

Walk forward, checking as always that we are still going uphill. When moving forward would cause us to go downhill, stop.

And, voila!, we have made it to the top of the hill.

This procedure is known as the method of steepest ascent. Make sense, doesn't it?

Telling a computer to implement the method of steepest ascent

Now, first of all, in our problem, we are seeking a minimum in the value of K², not a maximum, but that's easy to fix: just modify the procedure so that we scan the landscape and look for the direction which takes us DOWNhill most rapidly.

The real question here is -- how can we identify the "direction" to move? What does that even mean?

Let's define a function f(A, t) which takes two parameters -- a guess at the amplitude of the voltage spike and the time of its peak -- and uses them to compute some number (yes, this should look familiar).

If we take the derivative of f with respect to A, we'll find a measure of how quickly f is changing with respect to A; a positive value means that if we increase A, we also increase f. Likewise, if we take the derivative of f with respect to t, then we'll find out how quickly, and in which direction, the function f changes with respect to changes in t. The gradient of the function is simply a vector formed by these partial derivatives.

On a 2-D map (or a 3-D map, or a 9-D map), the gradient points toward the steepest local "uphill" direction.

If we want to go "downhill", we can just compute the gradient, then go in the opposite direction.

But -- wait! Our function f looks pretty complicated. How can we take the derivative with respect to the parameters?

Well, even if the function is complicated, we can still find an approximate derivative numerically. Suppose we have some starting values A₁ and t₁. We can compute a numerical derivative of the function with respect to each variable by adding and subtracting just a little bit to each parameter and computing the change in the function.

For example, let's go back to the example of the voltage coming out of a big black box. One set of measurements yields a single spike:

We begin with a guess that the spike has an amplitude of 4 volts and is centered at 7 seconds. At this starting point, we can compute K²: it has a value of 99. Surely we can find a pair of parameters which yields a smaller result.

If we calculate the K² statistic in this general area, we can see that we aren't too far from the minimum value. In the graph below, amplitude is on the horizontal axis and time on the vertical axis.

In order to use the method of steepest descent (remember, we want to go DOWNHILL in order to find the miminum value of the statistic), we must first compute the gradient at our initial location of (4 volts, 7 seconds). In order to compute the numerical derivative with respect to amplitude, let's add and subtract 0.5 volts. I'll do some of the math for you.



  Q:  What is the gradient of f at our starting point?



  Q:  Which direction should we move to go downhill most rapidly?

Okay, it's clear that we should move toward SMALLER values of amplitude, and LARGER values of time. We can also tell that changes in time are more important than changes in amplitude -- by about a factor of 10 or so. So, when we move, we want to move mostly in time, with just a relatively small change in amplitude. How can we do that?

One way to proceed is to create a vector which points in the right direction. If we define the gradient as

then we can compute a unit vector in its direction like so:

Now, if we want to move "forward", toward the local downhill direction, we can simply pick some stepsize and compute a new value of the parameters like this:


stepsize     amp       time       K²
-------------------------------------------
 0.0000    4.0000     7.0000    99.8941 
 0.1000    3.9941     7.0941    73.8937 
 0.2000    3.9881     7.1881    50.1717 
 0.3000    3.9822     7.2822    30.1702 
 0.4000    3.9763     7.3763    15.1884 
 0.5000    3.9704     7.4704     6.2411 
 0.6000    3.9644     7.5644     3.9424 
 0.7000    3.9585     7.6585     8.4330 
 0.8000    3.9526     7.7526    19.3635 
 0.9000    3.9467     7.8467    35.9358 
 1.0000    3.9407     7.9407    56.9969 

-------------------------------------------

Aha! This table indicates that eventually, moving in this direction stops going "downhill" and starts moving "uphill", toward larger values of K², and thus toward fits which are less good. That means that we should stop when we reach A = 3.96 volts and t = 7.56 seconds. We should then once again compute the local gradient, pick a new direction of steepest descent, and walk in THAT direction until we reach another minimum.

We can follow this procedure several times, moving closer and closer to the minimum value of K² and so closer and closer to the "best" values of amplitude and time.

Let's try this method on our example.


 iter     amp       time          K²
------------------------------------------
  0      4.0000    7.0000       99.8941 
  1      3.9644    7.5644        3.9424 
  2      4.0221    7.5221        3.5686 
  3      4.0416    7.5527        3.0769 
  4      4.3230    7.5091        1.6658 
  5      4.3318    7.5503        0.9144 
  6      4.6626    7.5404        0.0444 
  7      4.6629    7.5494        0.0024 
  8      4.6635    7.5503        0.0021 
  9      4.6644    7.5496        0.0019 
------------------------------------------

That's pretty impressive.

The important thing is that we didn't have to go through the long, time-consuming calculation of K² very frequently. In this example, I called a function to compute K² only 191 times. That may sound like a lot, but if I had tried the brute force method, it would have taken a LOT more time.



  Consider using the brute force method, 
  with a square grid ranging from 
      
          (3, 6)    to    (5, 8)

     
  Q:  In order to keep the number of calculations
      of K² under 200, what is the largest
      grid one could use?


  Q:  What would be the precision of a "best guess"
      at (amp, time) if you used that grid?

  
  Q:  Suppose you wanted to match the precision
      of this method of steepest descent;
      it found both amplitude and time to a 
      precision of about 0.001 volts and 0.001 seconds.

      How large a grid would you need?

      How many evaluations of K² would that require?

      How much longer would that take to compute?