Match: a program for matching star lists

Michael Richmond
May 25, 1996
updated July 30, 2000
updated Jan 21, 2001
updated Dec 14, 2001
updated Dec 31, 2001 (ver 0.5)
updated Jun 19, 2002 (ver 0.6)

Table of Contents:


INTRODUCTION:

This program is designed to match up items in two different lists, which may have two different systems of coordinates. The program allows the two sets of coordinates to be related by a linear, quadratic, or cubic transformation. It is an implementation of the algorithm described in Valdes et al., Publications of the Astronomical Society of the Pacific, vol 107, page 1119 (1995).

PREPARING THE INPUT:

The user must place the objects to be matched in ASCII text files, which have a fixed number of columns in each line. Each file must have contain two spatial coordinates, i.e. (row, col) or (x, y), and a "magnitude" for each item. This is a measure of brightness, which has the peculiar property that bright items have SMALLER magnitudes than faint ones. The user may optionally specify ID numbers for each star in a column. An example might look like this:

         5    368.77     3.93 11.369 0.027
         6     45.90     5.33 11.308 0.026
         7    407.27     5.05  6.650 0.001
         8    636.23     7.75 10.715 0.015
         9    338.60     8.90 11.314 0.026

In the example above, the first column contains an "ID" number for each star, the second column contains an "x" coordinate, the third column a "y" coordinate, and the fourth column a "magnitude" coordinate. Note that the brightest item is number 7, and the faintest is number 5.

Other columns may contain data of any type, as long as each column contains no white space (tabs, spaces). The "x", "y" and "magnitude" (and, optionally, "ID") values must appear in the first ten columns, but there may be additional columns beyond the tenth.

The "match" program adopts the C-like convention that the leftmost column in a file is "number 0", the next column is "number 1", and so forth. Thus, in the example above,

              column 0      has ID number
              column 1      has "x" coord
              column 2      has "y" coord
              column 3      has "magnitude" value
              column 4      has "magnitude error" value


USAGE:

Usage is as follows:

       match  fileA xcolA ycolA magcolA  fileB xcolB ycolB magcolB
                    [id1=] [id2=] [max_iter=] [halt_sigma=] 
                    [outfile=] [trirad=] [nobj=] [matchrad=] 
                    [scale=] [min_scale=] [max_scale=] [transonly] 
                    [recalc] [linear|quadratic|cubic]
                    [medtf] [medsigclip=]  [intrans=] [identity [xsh=] [ysh=]]
                    [--version] [--help] [help]

The first 8 arguments are required, and must appear in the order given above. The remaining arguments are optional, and may appear in any order. The arguments are:

     fileA          name of ASCII file containing items in the first list
     xcolA          column in fileA containing "x" value of items
     ycolA          column in fileA containing "y" value of items
     magcolA        column in fileA containing "mag" value of items
     fileB          name of ASCII file containing items in the second list
     xcolB          column in fileB containing "x" value of items
     ycolB          column in fileB containing "y" value of items
     magcolB        column in fileB containing "mag" value of items

     id1=N          column in fileA containing "ID" value of items
     id2=N          column in fileB containing "ID" value of items
                           (the values must be unique within each list;
                            the program will not check, and strange
                            behavior will result if duplicate IDs occur)

     outfile=abc    place output in files with base name "abc"
                           abc.mtA         those in A which do match
                           abc.mtB         those in A which do match
                           abc.unA         those in B which don't match
                           abc.unB         those in B which don't match

     trirad=0.001   critical value for counting triangles as a match
                          (default value 0.002; see below)
     nobj=30        use the brightest 'nobj' values in each list
                          during the matching process (default value 20)
     matchrad=4     after applying transformation to listA, so that both
                          lists are in coordinate system of listB, 
                          items from the two lists must be within this 
                          distance to count as 'matched'
                          (default value 5)

     scale=2.0      if present, only triangles with the given scale factor
                          are counted as matches (in this example, 
                          only triangles in coord system A which are
                          about 2.0 times larger than triangles in coord
                          system B are counted as matches)
     min_scale=1.5  if present, only triangles with ratios greater than
                          the given scale factor are counted as matches
     max_scale=2.5  if present, only triangles with ratios less than
                          the given scale factor are counted as matches

     max_iter=3     during the solution process, there is a loop in which
                          the code matches stars in the two lists, finds
                          the residuals for each pair, and discards pairs
                          with big residuals.  This argument sets a limit
                          on the number of times this loop can be entered.
                          Default value is given by AT_MATCH_MAXITER
                          parameter in atpmatch.h, currently 3.
     halt_sigma=1.0e-12
                          during the solution process, if the typical residual
                          between pairs of stars in the two lists drops
                          to this level, stop iterating and declare success.
                          The units are the square of the units in the
                          second list of stars; often, radians squared.
                          Default value is given by AT_MATCH_HALTSIGMA
                          parameter in atpmatch.h, currently 1.0e-12.

     transonly      if present, stop after having derived a transformation
                           which best takes the 'nobj' objects from list 
                           A into objects in list B.  Don't go on to
                           find the matching objects explicitly.
     recalc         if present, after having derived a transformation with
                           the brightest 'nobj' objects, and having applied
                           that transformation to ALL the objects in list A,
                           derive a transformation again, this time using
                           ALL the objects in list A and list B

     linear         use a linear transformation (the default)
     quadratic      use a quadratic transformation
     cubic          use a cubic transformation
	  
     medtf          calculate the MEDTF statistics (see below);
                           assumes that a simple translation connects
                           the two lists
     medsigclip=N   discard matched pairs more than N stdev from the
                           median offset before calculating MEDTF statistics

     identity       as an initial guess, use a linear TRANS in which there
                           is no rotation, change in scale, or translation
                           (unless xsh= and ysh= are specified)
     xsh=           in initial guess, use linear TRANS with the given 
                           translation in the 'x' direction
     ysh=           in initial guess, use linear TRANS with the given 
                           translation in the 'x' direction

     intrans=file   as an initial guess, use the TRANS which is given
                           in the ASCII text file 'file' (see below
                           for the format)

     --version      print the current version number and exit
     --help         print list of command-line options
     help           print list of command-line options

Example: given two files with the format shown above, called "starsA.dat" and "starsB.dat", one might call 'match' as follows:

    match starsA.dat 1 2 3  starsB.dat 1 2 3 outfile=matched \
              id1=0 id2=0 matchrad=3.0 trirad=0.001 nobj=40 

The argument "trirad" may need some extra explanation. For a full desciption, see the paper by Valdes et al. In short, stars are grouped into triangles, the sides of which are labelled "a", "b" and "c" in order of decreasing size. The normalized ratios

            b/a      and    c/a

are formed for each triangle. In order for a triangle, t1, from one list to be considered identical to a triangle, t2, in the other list, it must satisfy the equation

     sqrt[ (t1.ba - t2.ba)^2 + (t1.ca - t2.ca)^2 ] <= trirad

So the value of "trirad" has units of neither list; it exists in a two-dimensional "triangle-space" formed by the coordinates b/a and c/a (each of which ranges from 0.0 to 1.0).

One way the user can control the matching of triangles is to specify a "scale factor". Only triangles with a ratio of size falling with a particular window will be counted as matches. The user can control the "scale factor window" in two ways:

If the two lists are known to have the same scale and rotation, the user may use the identity keyword to force an initial guess at the TRANS: a linear model with

     a=0.0   b=1.0   c=0.0
     d=0.0   e=0.0   f=1.0
The program will use this initial TRANS structure to match up the items in the two lists, rather than trying to figure out the transformation itself. After looking for matched items with this fixed model, the program will use the matched pairs it finds to define an improved TRANS.

In a similar vein, if the user knows that the scale and rotation of the two sets of objects are identical, but there is a fixed and known translation of XX units in the x-direction and YY units in the y-direction, he can use the identity keyword with additional arguments

     xsh=XX  ysh=YY
which will create an initial TRANS with
     a=XX    b=1.0   c=0.0
     d=YY    e=0.0   f=1.0

Finally, for ultimate control, the user may specify the exact values of the coefficients for the initial TRANS via the intrans=filename option. The program will look for an ASCII text file with the given filename, which must have the following format:

In any case, if the user specifies the initial TRANS, either via the identity or intrans method, that initial model is used to convert the coordinates of objects in list A into the system of list B. The program then looks for matches within the matchrad units. It then uses only the matched pairs to generate an improved TRANS model.

On the other hand, if the user does NOT provide an initial TRANS model, the program searches for one itself. After its first guess, the code matches up candidate pairs of stars from the two lists. It then enters a loop in which it calculates a plate solution, applies the solution to the stars in list A, then compares the predicted positions from list A to the actual positions of stars in list B. Based on those residuals, the code marks some of the pairs as "bad", discards them, and goes back to the top of the loop. The "max_iter" and "halt_sigma" arguments can be used to control the number of times the code enters this loop and discards possible matches. Note that, if one uses the project_coords routine to turn (RA, Dec) positions from a catalog into an input list, the units will of this list will be radians; hence, the residuals from a solution against this list will be in radians squared. A typical residual of 1 arcsec corresponds to "halt_sigma" of 2.4e-11 radians squared.


OUTPUT: THE TRANS AND MEDTF STRUCTURES

This "match" can send information directly to the screen (as discussed in this section), or to auxiliary output files (as discussed in the next section). If any errors occur during execution, the program will print messages to the stdout stream.

TRANS structure

The only ordinary information sent to stdout is a description of the TRANS structure, which converts coordinates in the file A to those in file B. A TRANS structure describes the transformation between coordinates (x, y) and (x', y'):

If the user has chosen the 'linear' option (which is the default),

       x' = A + Bx + Cy
       y' = D + Ex + Fy

If the user has chosen the 'quadratic' option,

       x' =  A + Bx + Cy + Dxx + Exy + Fyy
       y' =  G + Hx + Iy + Jxx + Kxy + Lyy

If the user has chosen the 'cubic' option,

       x' =  A + Bx + Cy + Dxx + Exy + Fyy + Gx(xx+yy) + Hy(xx+yy)
       y' =  I + Jx + Ky + Lxx + Mxy + Nyy + Ox(xx+yy) + Py(xx+yy)

The coefficients A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P are the members of the TRANS structure. In the case of a linear solution, one can interpret these values as follows:

     - a, d: describe the translation between the two systems
     - b,c,d,e: describe rotation and magnification; if there
                is a good transformation, they should satisfy

                       b = +/-f      c = -/+e

                magnification = sqrt(b*b + c*c);
                rotation angle = atan(c/b);

Thus, in the example given above, in order to match the coordinates of the items in the first list to those in the second list, one must

  1. rotate by atan(-0.01/13.78) = 0.04 degrees
  2. multiply by sqrt(13.78*13.78 + 0.01*0.01) = 13.78
  3. add (-4543, 5918)

There are five additional members which describe the quality of the fit. The program prints out their values in a form like this (it prints out only the terms appropriate for the given order of the transformation):

TRANS: a=0.043180410     b=0.706908898     c=-0.706603954    d=0.053624825 
       e=0.706348660     f=0.708975359     sig=6.7655e-03    Nr=15    Nm=39
       sx=5.8321e-01     sy=2.3939e-01

The final five elements always appear. They are

Note the difference between sig, on the one hand, and sx and sy on the other hand: the former refers to (relatively few) pairs used to DEFINE the transformation, whereas the latter refer to the (many) pairs which appear after items in file A are transformed into coord system of file B.

MEDTF structure

This information is produced only if the user requests it specifically, via the medtf and/or medsigclip options. The MEDTF structure contains information relevant to a particular case of matching: two lists which have the same scale and rotation, and are simply translated. It arises so frequently that there are special routines to measure how well one can match up these lists.

Suppose that two lists can be registered by a simply shift. After registration, how many items in the two lists match --- and how well do their positions agree? We measure the following quantities and then set the corresponding elements of the MEDTF structure:

The MEDTF information is printed to stdout (if requested) in the following format (it all appears on a single line, which I have broken into two pieces for clarity):

MEDTF: mdx=1.000000000     mdy=0.000000000     adx=0.000000000     
       ady=0.052631579     sdx=0.973328527     sdy=0.686231832     n=19 
The MEDTF line appears before the TRANS line in the output.

If the user specifies the medsigclip option, like so:

       match filea.dat 1 2 3 fileb.dat linear medsigclip=3.0
then an extra set of calculations are made after the two sets have been matched:
  1. Calculate the median, average, stdev of differences between positions, after matching. If medsigclip is not given, this is the final step of processing.
  2. All pairs with distances more than the given number of standard deviations from the median (in either direction) are discarded.
  3. Re-calculate the median, average, stdev of differences for the surviving pairs.

Strictly speaking, the medtf step requires re-calculating the positions of all objects in list A, after a valid TRANS has been found. Therefore, the user should always specify recalc together with medtf. However, the code will recognize the problem if medtf alone is given on the command-line, and set the recalc option itself. The program will print a warning message to stdout explaining its actions.

Likewise, the medsigclip option presupposes medtf (and, hence, recalc); once again, the program will recognize the situation and set the required flags itself, giving warnings to stdout.


OUTPUT: LISTS OF MATCHED AND UNMATCHED ITEMS

The second type of output is the creation of 4 output files. The files have names with the same base, which may be specified by the user via the "outfile=" option, or defaults to "match". There are 4 files, each of which contains 4 columns of numerical information:

      column 1:   an ID number, unique to each list
      column 2:   "x" coordinate, in the system of list B
      column 3:   "y" coordinate, in the system of list B
      column 4:   "mag" coordinate

The are 4 files with this format. They are named (by default)

      matched.mtA    items from list A which had matches in list B
      matched.mtB    items from list B which had matches in list A
      matched.unA    items from list A which had no matches in list B
      matched.unB    items from list B which had no matches in list A

The files "matched.mtA" and "matched.mtB" ought to have the same number of entries. The two files of unmatched items need not have the same number of entries.

If the user has specified the "id1" and "id2" arguments, then the ID numbers in these output files are taken from the specified columns in the input files. Otherwise, the program creates ID numbers which start at 0 in list A, and increase monotonically to N-1, where "N" is the number of stars in list A; the ID numbers then start at N for the first star in list B, and continue to increase to the end of list B.


HELPFUL HINTS

If you know the relative scale of the two sets of objects, you should use the scale, or min_scale and max_scale, options to narrow down the possible matches.

In the very common case that the two sets of objects have (nearly) identical scale and rotation, and little translation, use the identity option. This will fail if there is an offset of more than the matchrad units between the lists; if you know the offset even roughly, use the xsh= and ysh= options to improve the changes of finding a good match.

If the input are lists of stars from astronomical FITS images, one might use the FITS header keywords to determine the relative scale, rotation and/or translation, create a file with the appropriate TRANS coefficients, and then run the program with the intrans= option.

The algorithm creates a set of all the triangles which can be formed from the 'nobj' brightest members of each list: this set grows as 'nobj' to the sixth power (!), so that even a small increase in 'nobj' can lead to large increases in run time. The 'recalc' option can help: one may create two very large lists of candidates -- say, 200 items in each -- and then set 'nobj' to some small subset of the total. If all goes well, the 'nobj' candidates will provide a rough transformation without taking too long. That transformation may be applied to all 200 objects in each list, and then a better plate solution can be calculated, quickly, using all the objects which match.

For reference, given lists of 525 (detected in an image), and 1338 (from the Guide Star Catalog) stars, and asked to use 40 stars from each list in the matching process, 'match' took the following times to run: MM:SS

   Compaq Aero    Intel 486/25           4 Meg RAM        10:17
   PC clone       Intel Pentium 90      32 Meg RAM         0:23
   PC clone       Intel PentiumII 233  128 Meg RAM         0:02
   Sun 4/50       4.2 MFlop ?           16 Meg RAM         0:09
   SGI            150 MHz IP22/R4400   192 Meg RAM         0:01

It is possible to get a lot of information on the program's progress by setting the "#define DEBUG" flag in the source code. Setting the "#define DEBUG2" flag yields a LOT of information.

Be careful to ensure that the two lists DO include stars in common, within the 'nobj' brightest in each. If one list includes stars much brighter than the other, the 20 brightest (for example) might not appear at all in the other list ... in which case the matching algorithm will fail.

The program does the best it can -- it may report a TRANS even if a good match does not exist. To verify that the match is a valid one, make sure that the output ".mtA" and ".mtB" files contain more than 1 or 2 entries. If the TRANS coefficients don't satisfy b=+/-f and c=+/-e, it is likely that any reported matches are spurious.

Look at the file "atpmatch.h" for some constants that affect program operation. The #define'd constant AT_MATCH_REQUIRE, for example, sets the minimum number of points in each list which must exist.

It may be possible to improve the program's performance on some particular sort of data by tweaking some of the #define'd values in "atpmatch.h". In particular,

#define AT_MATCH_PERCENTILE   0.65
#define AT_MATCH_NSIGMA       2.5
control an interative sigma-clipping loop which discards the worst stars in the current set of matches. A low value of AT_MATCH_NSIGMA will cause the loop to iterate many times, discarding lots of stars, even if they aren't much worse than the average. A low value of AT_MATCH_PERCENTILE will do the same, by lowering the percentile used as a stand-in for standard deviation and causing more pairs to qualify as "bad".

Another way to improve the performance on a particular dataset is to fiddle with the "max_iter" and "halt_sigma" arguments. It may be a good idea to start with very low values for "max_iter", and large values for "halt_sigma", and look at the solutions by #define'ing DEBUG in "atpmatch.c". One may then increase "max_iter", and decrease "halt_sigma", and see how the solutions improve.

The two lists of stars must have rectilinear coordinates for this program to find a transformation connecting them. Note that (RA, Dec) is, in general, NOT such a system. It is usually necessary to project (RA, Dec) onto a plane tangent to the sphere at the center of the field. However, for a field very close to the celestial equator, (RA, Dec) can be close enough to rectilinear for matching purposes.

The matching procedure works best if the magnitude of the two coordinate systems is not vastly different. That is, these two sets of objects, with a scale factor of 1.4, are easily matched:

          list A                       list B
       ( 50, 100)                   ( 70, 140)
       ( 20,  60)                   ( 28,  84)
       (-30,  40)                   (-42,  56)

whereas these two sets, with a scale factor of 140000, might fail to be matched:

          list A                       list B
     ( 0.00050, 0.00100)            ( 70, 140)
     ( 0.00020, 0.00060)            ( 28,  84)
     (-0.00030, 0.00040)            (-42,  56)

It may help to multiply one set of values by a constant, so that they are approximately the same magnitude as the other set. It may also help to add a constant to one set so that it has roughly the same origin as the other set.


project_coords

One often wants to match objects detected in an image with those listed in a catalog. The detected objects usually have coordinates from the detector, (row, col) or (x, y), which are really projections of the celestial position onto a plane tangent to the celestial sphere, centered near the middle of the image. The catalog objects, on the other hand, usually have coordinates in Right Ascension (RA) and Declination (Dec) which are in the spherical system. In order to match the two sets of coordinates, it helps to project the catalog positions onto a plane, tangent to the celestial sphere at the center of the image. The two sets of projected coordinates may then be fairly compared.

One can find the relationship between spherical coords (RA, Dec) and their projected equivalents (xi, eta) in any good book on spherical trigonometry, such as Smart's Textbook on Spherical Astronomy. One source for this information on the Web is a paper published a few years ago, now available from the Astrophysics Data Systems archive, Astrometry of Single-Chord Occultations: Application to the 1993 Triton Event by Olkin et al., Publications of the Astronomical Society of the Pacific, volume 108, page 202 (1996). See especially the material in section 3.2, on page 204 of the paper.

The project_coords program converts (RA, Dec) coordinates into the coordinates on a tangent plane, called (xi, eta) by convention. The input must be an ASCII list of objects, with one line per object. Each line must have columns of data, two of which must be the RA and Dec in decimal degrees (not sexigesimal HH:MM:SS notation!). The project_coords program will convert these columns of the input list, but leave the others as-is.

Usage is:

          project_coords    starfile racol deccol ra dec [outfile=]
where

So, for example, given a catalog file gsc.cat, which looks like this:

        GSC_0043_23388    33.87533  -15.22973    13.90  star
        GSC_0043_23390    33.88923  -15.12091    14.30  starlike_object
        GSC_0043_23395    33.90756  -15.33952    12.50  star

and given an image of the sky centered at (RA, Dec) = (33.89, -15.20), one might run

         project_coords  gsc.cat 1 2 33.89 -15.20 outfile=gsc.proj

Looking then at the file gsc.proj, we should find

        GSC_0043_23388 -2.470478e-04 -5.188947e-04 13.90 star
        GSC_0043_23390 -1.297375e-05  1.380382e-03 14.30 starlike_object
        GSC_0043_23395  2.955624e-04 -2.435100e-03 12.50 star

The values in the RA and Dec columns have been replaced by their xi and eta equivalents projected onto the tangent plane. Note also that the other columns of information are now separated by exactly one space, which might be different from the input spacing.

The (xi, eta) values have units which might be interpreted as radians from the tangent point of the projection. Astronomers might recall that one radian is about 206,265 arcseconds, and likewise one arcsecond is about 4.85 x 10^(-6) radians.

After project_coords has converted the coords of a catalog list to (xi, eta), one might run the match program on this converted list, and the list of objects detected in the image.


apply_match

As mentioned in the section on project_coords, above, one often projects catalog (RA, Dec) onto a tangent plane before matching against objects detected in an image. The match program can be used to find a good transformation which takes the (row, col) coordinates of detected objects into the (xi, eta) coordinates of catalog objects. Afterwards, one may wish to

  1. apply that transformation to the (row, col) coords of detected objects, bringing them into the (xi, eta) coords on the tangent plane

    and then

  2. de-project the (xi, eta) coords of detected objects to yield their (RA, Dec) values

The apply_match program performs both these steps, transforming a list of detected objects with (row, col) coordinates into a list with (RA, Dec) coordinates. The input must be an ASCII list of objects, with one line per object. Each line must have columns of data, two of which must be the position in (x, y) coordinates of the detector. The apply_match program will convert these columns of the input list, but leave the others as-is.

Usage is:

          apply_match    starfile xcol ycol ra dec 
                         linear|quadratic|cubic
                         a b c d e f [g h i j h k [l m n o]]
                         [outfile=]
where

So, for example, suppose we have an ASCII file of objects detected in an image called stars.cat:

    1    2.30  585.68   2462  10.63   14.575  0.033 
    2    1.71 1864.68   2462   9.51   15.917  0.108 
    3    3.94 1400.51   2462  10.84   16.024  0.119 
where the second and third columns contain the (row, col) positions of each object. We use this as an input to the match program, comparing it to a subset of a catalog extracted at (RA, Dec) = (33.89, -15.20), and find the following linear TRANS structure:
TRANS: a=0.043661422     b=0.707484673     c=-0.707367056    
       d=0.077633855     e=0.708066513     f=0.706880146 

We can then convert all the (row, col) coords of objects into (RA, Dec) by running

    apply_match  stars.cat 1 2  33.89 -15.20  linear   \
                      0.043661 0.707484 -0.70736 0.077633 0.708066 0.706880

which prints to stdout a list like this:

       1  318.81225   43.19368 2462 10.63 14.575 0.033
       2  318.63845   43.05146 2462 9.51 15.917 0.108
       3  318.70546   43.15008 2462 10.84 16.024 0.119

Now the second and third columns contain the transformed and de-projected coords of the each objects in (RA, Dec).

Note that the spacing of the columns may be modified: the program leaves a single space between columns.