Match: a program for matching star lists

Michael Richmond
May 25, 1996
updated July 30, 2000
updated Jan 21, 2001

Table of Contents:


INTRODUCTION:

This program is designed to match up items in two different lists, which may have two different systems of coordinates. The program allows the two sets of coordinates to be related by a linear, quadratic, or cubic transformation. It is an implementation of the algorithm described in Valdes et al., Publications of the Astronomical Society of the Pacific, vol 107, page 1119 (1995).

PREPARING THE INPUT:

The user must place the objects to be matched in ASCII text files, which have a fixed number of columns in each line. Each file must have contain two spatial coordinates, i.e. (row, col) or (x, y), and a "magnitude" for each item. This is a measure of brightness, which has the peculiar property that bright items have SMALLER magnitudes than faint ones. The user may optionally specify ID numbers for each star in a column. An example might look like this:

         5    368.77     3.93 11.369 0.027
         6     45.90     5.33 11.308 0.026
         7    407.27     5.05  6.650 0.001
         8    636.23     7.75 10.715 0.015
         9    338.60     8.90 11.314 0.026

In the example above, the first column contains an "ID" number for each star, the second column contains an "x" coordinate, the third column a "y" coordinate, and the fourth column a "magnitude" coordinate. Note that the brightest item is number 7, and the faintest is number 5.

Other columns may contain data of any type, as long as each column contains no white space (tabs, spaces). The "x", "y" and "magnitude" (and, optionally, "ID") values must appear in the first ten columns, but there may be additional columns beyond the tenth.

The "match" program adopts the C-like convention that the leftmost column in a file is "number 0", the next column is "number 1", and so forth. Thus, in the example above,

              column 0      has ID number
              column 1      has "x" coord
              column 2      has "y" coord
              column 3      has "magnitude" value
              column 4      has "magnitude error" value


USAGE:

Usage is as follows:

      match  fileA xcolA ycolA magcolA  fileB xcolB ycolB magcolB
                    [id1=]  [id2=] [max_iter=] [halt_sigma=]
                    [outfile=] [matchrad=] [trirad=] [nobj=] [scale=]
                    [transonly] [recalc] [linear|quadratic|cubic]

The first 8 arguments are required, and must appear in the order given above. The remaining arguments are optional, and may appear in any order. The arguments are:

     fileA          name of ASCII file containing items in the first list
     xcolA          column in fileA containing "x" value of items
     ycolA          column in fileA containing "y" value of items
     magcolA        column in fileA containing "mag" value of items
     fileB          name of ASCII file containing items in the second list
     xcolB          column in fileB containing "x" value of items
     ycolB          column in fileB containing "y" value of items
     magcolB        column in fileB containing "mag" value of items

     id1=0          column in fileA containing "ID" value of items
     id2=0          column in fileB containing "ID" value of items


     outfile=abc    place output in files with base name "abc"
                           abc.mtA         those in A which do match
                           abc.mtB         those in A which do match
                           abc.unA         those in B which don't match
                           abc.unB         those in B which don't match
     trirad=0.001   critical value for counting triangles as a match
                           (default value 0.002; see below)
     nobj=30        use the brightest 'nobj' values in each list
                           during the matching process (default value 20)
     matchrad=4     after applying transformation to listA, so that both
                           lists are in coordinate system of listB, 
                           items from the two lists must be within this 
                           distance to count as 'matched'
                           (default value 5)
     scale=2.0      if present, only triangles with the given scale factor
                           are counted as matches (in this example, 
                           only triangles in coord system A which are
                           about 2.0 times larger than triangles in coord
                           system B are counted as matches)

     max_iter=3     during the solution process, there is a loop in which
                           the code matches stars in the two lists, finds
			   the residuals for each pair, and discards pairs
			   with big residuals.  This argument sets a limit
			   on the number of times this loop can be entered.
			   Default value is given by AT_MATCH_MAXITER
			   parameter in atpmatch.h, currently 3.
     halt_sigma=1.0e-12
                    during the solution process, if the typical residual
		           between pairs of stars in the two lists drops
			   to this level, stop iterating and declare success.
			   The units are the square of the units in the
			   second list of stars; often, radians squared.
			   Default value is given by AT_MATCH_HALTSIGMA
			   parameter in atpmatch.h, currently 1.0e-12.

     transonly      if present, stop after having derived a transformation
                           which best takes the 'nobj' objects from list 
                           A into objects in list B.  Don't go on to
                           find the matching objects explicitly.
     recalc         if present, after having derived a transformation with
                           the brightest 'nobj' objects, and having applied
                           that transformation to ALL the objects in list A,
                           derive a transformation again, this time using
                           ALL the objects in list A and list B

     linear         use a linear transformation (the default)
     quadratic      use a quadratic transformation
     cubic          use a cubic transformation

Example: given two files with the format shown above, called "starsA.dat" and "starsB.dat", one might call 'match' as follows:

    match starsA.dat 1 2 3  starsB.dat 1 2 3 outfile=matched \
              id1=0 id2=0 matchrad=3.0 trirad=0.001 nobj=40 

The argument "trirad" may need some extra explanation. For a full desciption, see the paper by Valdes et al. In short, stars are grouped into triangles, the sides of which are labelled "a", "b" and "c" in order of decreasing size. The normalized ratios

            b/a      and    c/a

are formed for each triangle. In order for a triangle, t1, from one list to be considered identical to a triangle, t2, in the other list, it must satisfy the equation

     sqrt[ (t1.ba - t2.ba)^2 + (t1.ca - t2.ca)^2 ] <= trirad

So the value of "trirad" has units of neither list; it exists in a two-dimensional "triangle-space" formed by the coordinates b/a and c/a (each of which ranges from 0.0 to 1.0).

Near the start of the solution process, the code matches up candidate pairs of stars from the two lists. It then enters a loop in which it calculates a plate solution, applies the solution to the stars in list A, then compares the predicted positions from list A to the actual positions of stars in list B. Based on those residuals, the code marks some of the pairs as "bad", discards them, and goes back to the top of the loop. The "max_iter" and "halt_sigma" arguments can be used to control the number of times the code enters this loop and discards possible matches. Note that, if one uses the project_coords routine to turn (RA, Dec) positions from a catalog into an input list, the units will of this list will be radians; hence, the residuals from a solution against this list will be in radians squared. A typical residual of 1 arcsec corresponds to "halt_sigma" of 2.4e-11 radians squared.


OUTPUT: THE TRANS STRUCTURE

The output of "match" is of two kinds. First, the program can print information to the screen as it executes. If there are any errors, these will appear on the stdout stream. Normally, the only output is a description of the TRANS structure which converts coordinates in the file A to those in file B. A TRANS structure describes the transformation between coordinates (x, y) and (x', y'):

If the user has chosen the 'linear' option (which is the default),

       x' = A + Bx + Cy
       y' = D + Ex + Fy

If the user has chosen the 'quadratic' option,

       x' =  A + Bx + Cy + Dxx + Exy + Fyy
       y' =  G + Hx + Iy + Jxx + Kxy + Lyy

If the user has chosen the 'cubic' option,

       x' =  A + Bx + Cy + Dxx + Exy + Fyy + Gx(xx+yy) + Hy(xx+yy)
       y' =  I + Jx + Ky + Lxx + Mxy + Nyy + Ox(xx+yy) + Py(xx+yy)

The coefficients A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P are the members of the TRANS structure. The program prints out their values in a form like this (it prints out only the terms appropriate for the given order of the transformation):

TRANS: a=0.043180410     b=0.706908898     c=-0.706603954    d=0.053624825 
       e=0.706348660     f=0.708975359

In the case of a linear solution, one can interpret these values as follows:

     - a, d: describe the translation between the two systems
     - b,c,d,e: describe rotation and magnification; if there
                is a good transformation, they should satisfy

                       b = +/-f      c = -/+e

                magnification = sqrt(b*b + c*c);
                rotation angle = atan(c/b);

Thus, in the example given above, in order to match the coordinates of the items in the first list to those in the second list, one must

  1. rotate by atan(-0.01/13.78) = 0.04 degrees
  2. multiply by sqrt(13.78*13.78 + 0.01*0.01) = 13.78
  3. add (-4543, 5918)


OUTPUT: LISTS OF MATCHED AND UNMATCHED ITEMS

The second type of output is the creation of 4 output files. The files have names with the same base, which may be specified by the user via the "outfile=" option, or defaults to "match". There are 4 files, each of which contains 4 columns of numerical information:

      column 1:   an ID number, unique to each list
      column 2:   "x" coordinate, in the system of list B
      column 3:   "y" coordinate, in the system of list B
      column 4:   "mag" coordinate

The are 4 files with this format. They are named (by default)

      matched.mtA    items from list A which had matches in list B
      matched.mtB    items from list B which had matches in list A
      matched.unA    items from list A which had no matches in list B
      matched.unB    items from list B which had no matches in list A

The files "matched.mtA" and "matched.mtB" ought to have the same number of entries. The two files of unmatched items need not have the same number of entries.

If the user has specified the "id1" and "id2" arguments, then the ID numbers in these output files are taken from the specified columns in the input files. Otherwise, the program creates ID numbers which start at 0 in list A, and increase monotonically to N-1, where "N" is the number of stars in list A; the ID numbers then start at N for the first star in list B, and continue to increase to the end of list B.


HELPFUL HINTS

The algorithm creates a set of all the triangles which can be formed from the 'nobj' brightest members of each list: this set grows as 'nobj' to the sixth power (!), so that even a small increase in 'nobj' can lead to large increases in run time. The 'recalc' option can help: one may create two very large lists of candidates -- say, 200 items in each -- and then set 'nobj' to some small subset of the total. If all goes well, the 'nobj' candidates will provide a rough transformation without taking too long. That transformation may be applied to all 200 objects in each list, and then a better plate solution can be calculated, quickly, using all the objects which match.

For reference, given lists of 525 (detected in an image), and 1338 (from the Guide Star Catalog) stars, and asked to use 40 stars from each list in the matching process, 'match' took the following times to run: MM:SS

   Compaq Aero    Intel 486/25           4 Meg RAM        10:17
   PC clone       Intel Pentium 90      32 Meg RAM         0:23
   PC clone       Intel PentiumII 233  128 Meg RAM         0:02
   Sun 4/50       4.2 MFlop ?           16 Meg RAM         0:09
   SGI            150 MHz IP22/R4400   192 Meg RAM         0:01

It is possible to get a lot of information on the program's progress by setting the "#define DEBUG" flag in the source code. Setting the "#define DEBUG2" flag yields a LOT of information.

Be careful to ensure that the two lists DO include stars in common, within the 'nobj' brightest in each. If one list includes stars much brighter than the other, the 20 brightest (for example) might not appear at all in the other list ... in which case the matching algorithm will fail.

The program does the best it can -- it may report a TRANS even if a good match does not exist. To verify that the match is a valid one, make sure that the output ".mtA" and ".mtB" files contain more than 1 or 2 entries. If the TRANS coefficients don't satisfy b=+/-f and c=+/-e, it is likely that any reported matches are spurious.

Look at the file "atpmatch.h" for some constants that affect program operation. The #define'd constant AT_MATCH_REQUIRE, for example, sets the minimum number of points in each list which must exist.

It may be possible to improve the program's performance on some particular sort of data by tweaking some of the #define'd values in "atpmatch.h". In particular,

#define AT_MATCH_PERCENTILE   0.65
#define AT_MATCH_NSIGMA       2.5
control an interative sigma-clipping loop which discards the worst stars in the current set of matches. A low value of AT_MATCH_NSIGMA will cause the loop to iterate many times, discarding lots of stars, even if they aren't much worse than the average. A low value of AT_MATCH_PERCENTILE will do the same, by lowering the percentile used as a stand-in for standard deviation and causing more pairs to qualify as "bad".

Another way to improve the performance on a particular dataset is to fiddle with the "max_iter" and "halt_sigma" arguments. It may be a good idea to start with very low values for "max_iter", and large values for "halt_sigma", and look at the solutions by #define'ing DEBUG in "atpmatch.c". One may then increase "max_iter", and decrease "halt_sigma", and see how the solutions improve.

The two lists of stars must have rectilinear coordinates for this program to find a transformation connecting them. Note that (RA, Dec) is, in general, NOT such a system. It is usually necessary to project (RA, Dec) onto a plane tangent to the sphere at the center of the field. However, for a field very close to the celestial equator, (RA, Dec) can be close enough to rectilinear for matching purposes.

The matching procedure works best if the magnitude of the two coordinate systems is not vastly different. That is, these two sets of objects, with a scale factor of 1.4, are easily matched:

          list A                       list B
       ( 50, 100)                   ( 70, 140)
       ( 20,  60)                   ( 28,  84)
       (-30,  40)                   (-42,  56)

whereas these two sets, with a scale factor of 140000, might fail to be matched:

          list A                       list B
     ( 0.00050, 0.00100)            ( 70, 140)
     ( 0.00020, 0.00060)            ( 28,  84)
     (-0.00030, 0.00040)            (-42,  56)

It may help to multiply one set of values by a constant, so that they are approximately the same magnitude as the other set. It may also help to add a constant to one set so that it has roughly the same origin as the other set.


project_coords

One often wants to match objects detected in an image with those listed in a catalog. The detected objects usually have coordinates from the detector, (row, col) or (x, y), which are really projections of the celestial position onto a plane tangent to the celestial sphere, centered near the middle of the image. The catalog objects, on the other hand, usually have coordinates in Right Ascension (RA) and Declination (Dec) which are in the spherical system. In order to match the two sets of coordinates, it helps to project the catalog positions onto a plane, tangent to the celestial sphere at the center of the image. The two sets of projected coordinates may then be fairly compared.

One can find the relationship between spherical coords (RA, Dec) and their projected equivalents (xi, eta) in any good book on spherical trigonometry, such as Smart's Textbook on Spherical Astronomy. One source for this information on the Web is a paper published a few years ago, now available from the Astrophysics Data Systems archive, Astrometry of Single-Chord Occultations: Application to the 1993 Triton Event by Olkin et al., Publications of the Astronomical Society of the Pacific, volume 108, page 202 (1996). See especially the material in section 3.2, on page 204 of the paper.

The project_coords program converts (RA, Dec) coordinates into the coordinates on a tangent plane, called (xi, eta) by convention. The input must be an ASCII list of objects, with one line per object. Each line must have columns of data, two of which must be the RA and Dec in decimal degrees (not sexigesimal HH:MM:SS notation!). The project_coords program will convert these columns of the input list, but leave the others as-is.

Usage is:

          project_coords    starfile racol deccol ra dec [outfile=]
where

So, for example, given a catalog file gsc.cat, which looks like this:

        GSC_0043_23388    33.87533  -15.22973    13.90  star
        GSC_0043_23390    33.88923  -15.12091    14.30  starlike_object
        GSC_0043_23395    33.90756  -15.33952    12.50  star

and given an image of the sky centered at (RA, Dec) = (33.89, -15.20), one might run

         project_coords  gsc.cat 1 2 33.89 -15.20 outfile=gsc.proj

Looking then at the file gsc.proj, we should find

        GSC_0043_23388 -2.470478e-04 -5.188947e-04 13.90 star
        GSC_0043_23390 -1.297375e-05  1.380382e-03 14.30 starlike_object
        GSC_0043_23395  2.955624e-04 -2.435100e-03 12.50 star

The values in the RA and Dec columns have been replaced by their xi and eta equivalents projected onto the tangent plane. Note also that the other columns of information are now separated by exactly one space, which might be different from the input spacing.

The (xi, eta) values have units which might be interpreted as radians from the tangent point of the projection. Astronomers might recall that one radian is about 206,265 arcseconds, and likewise one arcsecond is about 4.85 x 10^(-6) radians.

After project_coords has converted the coords of a catalog list to (xi, eta), one might run the match program on this converted list, and the list of objects detected in the image.


apply_match

As mentioned in the section on project_coords, above, one often projects catalog (RA, Dec) onto a tangent plane before matching against objects detected in an image. The match program can be used to find a good transformation which takes the (row, col) coordinates of detected objects into the (xi, eta) coordinates of catalog objects. Afterwards, one may wish to

  1. apply that transformation to the (row, col) coords of detected objects, bringing them into the (xi, eta) coords on the tangent plane

    and then

  2. de-project the (xi, eta) coords of detected objects to yield their (RA, Dec) values

The apply_match program performs both these steps, transforming a list of detected objects with (row, col) coordinates into a list with (RA, Dec) coordinates. The input must be an ASCII list of objects, with one line per object. Each line must have columns of data, two of which must be the position in (x, y) coordinates of the detector. The apply_match program will convert these columns of the input list, but leave the others as-is.

Usage is:

          apply_match    starfile xcol ycol ra dec 
                         linear|quadratic|cubic
                         a b c d e f [g h i j h k [l m n o]]
                         [outfile=]
where

So, for example, suppose we have an ASCII file of objects detected in an image called stars.cat:

    1    2.30  585.68   2462  10.63   14.575  0.033 
    2    1.71 1864.68   2462   9.51   15.917  0.108 
    3    3.94 1400.51   2462  10.84   16.024  0.119 
where the second and third columns contain the (row, col) positions of each object. We use this as an input to the match program, comparing it to a subset of a catalog extracted at (RA, Dec) = (33.89, -15.20), and find the following linear TRANS structure:
TRANS: a=0.043661422     b=0.707484673     c=-0.707367056    
       d=0.077633855     e=0.708066513     f=0.706880146 

We can then convert all the (row, col) coords of objects into (RA, Dec) by running

    apply_match  stars.cat 1 2  33.89 -15.20  linear   \
                      0.043661 0.707484 -0.70736 0.077633 0.708066 0.706880

which prints to stdout a list like this:

       1  318.81225   43.19368 2462 10.63 14.575 0.033
       2  318.63845   43.05146 2462 9.51 15.917 0.108
       3  318.70546   43.15008 2462 10.84 16.024 0.119

Now the second and third columns contain the transformed and de-projected coords of the each objects in (RA, Dec).

Note that the spacing of the columns may be modified: the program leaves a single space between columns.