This document describes the steps necessary to set up the Tomoe pipeline, so that one can run it on data taken with the full Tomoe camera. These notes replace the ones in Tomoe Note 010. That document was written several years ago, and the pipeline has changed in some ways to deal with the new data.
Although it is not strictly necessary, it will make things easier if you create one "top" level directory for Tomoe analysis, and place all the software needed to run the pipeline under this "top." I suggest a structure like this:
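For example (the names below are only suggestions; create one sub-directory for each package you install from the list which follows):

    tomoe/                  <-- the "top" directory
        ensemble/           <-- the ensemble photometry package
        skeleton_2019/      <-- the Perl scripts which drive the pipeline
        ucac4/              <-- tools and a local copy of the UCAC4 catalog
        ...                 <-- one sub-directory per additional package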
These directories are all for the software used to process the data; the "ucac4" directory is a slight exception, as it consists of star catalog files, not code. Because it is larger than the other directories, about 2 GB in size, you might place it in some other location; but it is fine underneath the "top" directory as well.
What about directories for the data? There are two types: directories which hold the raw data, exactly as it comes from the camera, and working directories, in which the pipeline writes all of its output.
The pipeline will NOT MODIFY any of the raw data files in their own directory. So, it is safe to run the code, and run it again, and again, and again; the original raw data is only read, never written or modified.
The pipeline uses software from a number of different packages, so you must copy the code for each package into its directory. All of these are public, open-source packages. You can either copy the versions I've used in June 2019, or try to download more recent versions of some, if you wish. However, it's possible that more recent versions might break some of the pipeline code.
Below is a list of the packages, together with references to the original websites or authors.
Early versions of the Tomoe pipeline used this package to download sections of the UCAC4 stellar catalog, in order to perform astrometric and photometric calibration. The current version still has an option to do so, but using the network is slow; so the new version also has an option to use a local, modified version of the UCAC4 catalog instead. That is much faster and more reliable.
In theory, then, one might not need this package to run the pipeline.
The programs are written in the C language. They rely upon the GNU Scientific Library (GSL), so this package must be installed on the computer as well. See the GSL pages for directions on downloading and installing it. The README file in the ensemble package contains special instructions for building the package if the GSL libraries are not in the regular system directories, but in one of the user's own directories instead.
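For instance, suppose GSL has been installed under $HOME/local (a hypothetical location). Whatever the details in the README turn out to be, the general idea is to point the compiler and linker at your own copies of the headers and libraries; the command below is only an illustration, with "some_program" standing in for one of the package's programs:

    # illustration only -- real file names and build steps are in the README
    cc -I$HOME/local/include -o some_program some_program.c \
            -L$HOME/local/lib -lgsl -lgslcblas -lm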
An example of the format is shown below, for star UCAC4 568-000021. The four columns are the star's UCAC4 identifier, its RA and Dec (J2000, in decimal degrees), and its magnitude.
568-000021 0.17807 23.57824 17.159
Stars are collected in groups by Declination, with one datafile for each 0.2 degrees in Declination.
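The zone numbering appears to follow the standard UCAC4 convention: 900 zones of 0.2 degrees each, starting at Dec = -90. A minimal Perl sketch, assuming that convention, to figure out which zone covers a particular star:

    #!/usr/bin/perl
    # Compute the UCAC4 zone number for a given Declination, assuming
    # the standard convention of 900 zones, each 0.2 degrees tall,
    # starting at Dec = -90 degrees.
    use strict;
    use warnings;

    my $dec  = 23.57824;                       # Declination in degrees
    my $zone = int(($dec + 90.0) / 0.2) + 1;   # zones run 1 .. 900
    printf "Dec %9.5f falls in zone %03d\n", $dec, $zone;

For the example star above, this yields zone 568, matching the "568" in its identifier.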
In addition to the packages listed above, the computer on which one runs the pipeline must have the following software installed: a C compiler (to build the packages above), the GNU Scientific Library, and Perl (to run the pipeline scripts).
The "skeleton_2019" directory contains a set of Perl scripts which process Tomoe data, starting with the raw chunk files and producing -- if all goes well -- lists of stars with (RA, Dec) and calibrated V-band magnitudes.
The file run_scripts.pl holds all the basic pipeline functions within it. It is possible to call it directly from the command line, but it will process just one chunk file from a single sensor, such as TMQ1201905290012133543.fits, and then stop. That is not very useful, except for testing purposes.
Most of the time, the goal is to run the pipeline on many chunks, to reduce the data from multiple sensors and over many minutes or hours. For this purpose, the file run_fields_2019.pl is more useful. One can edit a few lines near the top of this script to choose the sensors, the range of chunks, and the location of the raw data, and then process the entire set with a single command.
Let me illustrate this procedure with an example:
cd /gwkiso/tomoesn/richmond/skeleton_2019
@chip_list = ("11", "12");Next, the list of chunks (file index numbers) to be analyzed for those sensors.
my $start_index = 121336; my $end_index = 121338;We need to tell the script where the raw datafiles are located, and the start of the raw datafile names. Note that the $raw_file_base contains just the first portions of the full filename: "TMQ1" means "quadrant 1 of the camera", and "20190529" is the date. The full file name then contains the chunk number, the sensor ID, and the ".fits" extension.
my $raw_data_dir = "/lustre/tomoesn/realraw/20190529"; my $raw_file_base = "TMQ120190529";
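Based on the example filename above, the script presumably combines these variables in a manner similar to the sketch below; this illustrates the naming scheme only, and is not the actual code inside run_fields_2019.pl:

    foreach my $chip (@chip_list) {
        for (my $index = $start_index; $index <= $end_index; $index++) {
            # e.g. "TMQ120190529" + "00121336" + "11" + ".fits"
            my $fits_file = sprintf("%s/%s%08d%s.fits",
                                $raw_data_dir, $raw_file_base, $index, $chip);
            # ... run the pipeline on $fits_file ...
        }
    }

Once these variables are set, create a working directory for this night's data, move into it, and run the script: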
mkdir /gwkiso/tomoesn/richmond/work/work_20190529
cd /gwkiso/tomoesn/richmond/work/work_20190529
perl /gwkiso/tomoesn/richmond/skeleton_2019/run_fields_2019.pl base=runa debug=1 >& runa.out
We will now look at the output of the pipeline. Although some of the items have changed slightly, Tomoe Note 010: Building the Tomoe pipeline provides a good deal of useful information on the output files. So, you might refer to it if the material below is not sufficient.
After several minutes, the script finishes. The output directory now contains two items:
runa runa.out
The first, "runa", is a directory which contains all the results of this processing. We'll look at it in a moment. The second, "runa.out", is a simple text file which contains any messages that the pipeline may have printed as it was performing the work. For example, any error messages will be saved on this text file; that is very useful for figuring out what went wrong.
Let's look in the "runa" sub-directory now. It looks like this:
    runa_11_00121335   runa_11_12133511.out   runa_12_00121335   runa_12_12133512.out
    runa_11_00121336   runa_11_12133611.out   runa_12_00121336   runa_12_12133612.out
    runa_11_00121337   runa_11_12133711.out   runa_12_00121337   runa_12_12133712.out
    runa_11_00121338   runa_11_12133811.out   runa_12_00121338   runa_12_12133812.out
Once again, there are pairs of items. Each chunk of raw data has a sub-directory which holds all the results of its processing, and a text file which records any messages printed during its analysis.
For example, the directory "runa_11_00121335" contains all the results for sensor 11 and chunk 121335; any error messages produced during its analysis can be found in "runa_11_12133511.out".
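In other words, the names appear to be built from the base name ("runa"), the sensor ID, and the chunk number. A hypothetical Perl sketch of the pattern (not code from the pipeline itself):

    my ($base, $chip, $chunk) = ("runa", "11", 121335);
    # sub-directory: base, chip, chunk padded to 8 digits
    my $subdir  = sprintf("%s_%s_%08d", $base, $chip, $chunk);            # runa_11_00121335
    # log file: base, chip, then chunk followed by chip again
    my $logfile = sprintf("%s_%s_%d%s.out", $base, $chip, $chunk, $chip); # runa_11_12133511.out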
Here's a picture which may help to illustrate the structure of the output:
    directory for      sub-directory      sub-directory and text file
    one night          one run            for each sensor and chunk
    ------------------------------------------------------------------------
                                          runa_11_00121335   runa_11_12133511.out
    work_20190529      runa               runa_11_00121336   runa_11_12133611.out
                                          runa_12_00121335   runa_12_12133511.out
                                          etc.
    ------------------------------------------------------------------------
Let's pick one sensor and chunk -- sensor 11, and chunk 121335 -- and look at the contents of its subdirectory. If we go to the directory "runa_11_00121335" and list the contents, we'll see:
    addheader.log              Q_00121335__0003_fits.pht  Q_00121335__0008.fits
    pff                        Q_00121335__0004.fits      Q_00121335__0008_fits.coo
    Q_00121335__0000.fits      Q_00121335__0004_fits.coo  Q_00121335__0008_fits.pht
    Q_00121335__0000_fits.coo  Q_00121335__0004_fits.pht  Q_00121335__0009.fits
    Q_00121335__0000_fits.pht  Q_00121335__0005.fits      Q_00121335__0009_fits.coo
    Q_00121335__0001.fits      Q_00121335__0005_fits.coo  Q_00121335__0009_fits.pht
    Q_00121335__0001_fits.coo  Q_00121335__0005_fits.pht  Q_00121335__0010.fits
    Q_00121335__0001_fits.pht  Q_00121335__0006.fits      Q_00121335__0010_fits.coo
    Q_00121335__0002.fits      Q_00121335__0006_fits.coo  Q_00121335__0010_fits.pht
    Q_00121335__0002_fits.coo  Q_00121335__0006_fits.pht  Q_00121335__0011.fits
    Q_00121335__0002_fits.pht  Q_00121335__0007.fits      Q_00121335__0011_fits.coo
    Q_00121335__0003.fits      Q_00121335__0007_fits.coo  Q_00121335__0011_fits.pht
    Q_00121335__0003_fits.coo  Q_00121335__0007_fits.pht
There are three files for each individual image in this chunk: the image itself (.fits), a list of the positions of stars detected in that image (_fits.coo), and instrumental photometry for those stars (_fits.pht).
There is also one last sub-directory, called "pff". Inside this subdirectory are all the calibrated results. Let's look at it:
    multi_match.out            Q_00121335__0004.ast  Q_00121335__0009.ast
    multi_Q_00121335__var.out  Q_00121335__0004.pht  Q_00121335__0009.pht
    Q_00121335__0000.ast       Q_00121335__0005.ast  Q_00121335__0010.ast
    Q_00121335__0000.pht       Q_00121335__0005.pht  Q_00121335__0010.pht
    Q_00121335__0001.ast       Q_00121335__0006.ast  Q_00121335__0011.ast
    Q_00121335__0001.pht       Q_00121335__0006.pht  Q_00121335__0011.pht
    Q_00121335__0002.ast       Q_00121335__0007.ast  solve_Q_00121335__var.cal
    Q_00121335__0002.pht       Q_00121335__0007.pht  solve_Q_00121335__var.img
    Q_00121335__0003.ast       Q_00121335__0008.ast  solve_Q_00121335__var.out
    Q_00121335__0003.pht       Q_00121335__0008.pht  solve_Q_00121335__var.sig
In this directory, there are two files per individual image: a .ast file, which lists stars with calibrated (RA, Dec) positions, and a .pht file, which lists calibrated magnitudes for those stars.
In addition, there are summary files containing a list of stars from ALL the images which were part of the ensemble for this set. The ensemble will only contain objects which appeared in at least 2 images. The interesting ensemble outputs are the multi_match.out, multi_Q_00121335__var.out, and solve_Q_00121335__var.* files; Tomoe Note 010 describes their contents in more detail.
Once the pipeline has run successfully, producing lists of many stars with calibrated positions and magnitudes, it is time to sift through the files to find objects which may have suddenly appeared and then disappeared; in other words, to search for transient objects.
The "skeleton" directory contains two scripts which carry out this search. The first is the important one, and the second simply runs the first on a large set of directories.
The first is transient_a.pl, which scans the output files created by the pipeline to look for objects which appear only briefly. One calls it in the following manner.
cd /gwkiso/tomoesn/richmond/work/work_20190529
perl /gwkiso/tomoesn/richmond/skeleton_2019/transient_a.pl prefix=runa_ runa_11_00121335 debug=1 >& transient_a.out
Usually, you will want to search a large number of chunks; perhaps all the chunks generated during a night. You can use wildcard characters in the argument for the chunk to match many chunks at once:
perl /gwkiso/tomoesn/richmond/skeleton_2019/transient_a.pl prefix=runa_ runa_11_???????? debug=1 >& transient_a.out
This procedure may take several minutes or more to finish, if it is analyzing a large number of chunks.
The most important section of this code, inside the file transient_a.pl, is the subroutine called find_transients. Inside this subroutine are lines which set the parameters defining the properties of "good" transients. For example, this section
    # parameters of the tests that star must pass to qualify as transient
    my $max_mag = $limiting_mag + 1.0;
    my $max_det = 20;
    my $window_extra = 3;
states that in order to qualify as a "good" transient, an object must be brighter than $max_mag (a limit one magnitude fainter than the limiting magnitude), and must be detected in no more than $max_det = 20 images; $window_extra pads the window of images examined around the detections by a few frames. Read the code in find_transients for the exact tests.
Since these and other lines of code within the find_transients routine set the properties of the candidates which will be produced, understanding this routine is very important. If you wish to search for a different type of transient object -- perhaps one which remains bright for a longer time -- then you should edit the code in this routine.
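For example, to accept candidates which remain visible over a longer stretch of images, one might simply raise the $max_det limit; the value below is hypothetical.

    # accept objects detected in up to 100 images, instead of 20
    my $max_det = 100;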
The output of the transient_a.pl script is a text file with one line per candidate object. Each line looks something like this:
# trans 1 chunk 0109500 star 226 798 2089 = 196.18406 -16.13120 16.184 0.74 6 349 354
The columns are, roughly: the candidate's sequence number, the chunk in which it was found, internal ID numbers for the star, its (RA, Dec) in decimal degrees, its magnitude, and several quantities describing the number and timing of its detections. Consult the find_transients routine in transient_a.pl for the exact output format.
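If you wish to feed the candidates into some other program, here is a minimal Perl sketch which extracts the (RA, Dec) and magnitude from each candidate line, assuming they are the three fields which follow the "=" sign:

    # read candidate lines (from transient_a.pl output) on STDIN
    while (my $line = <STDIN>) {
        next unless ($line =~ /^\#\s*trans/);
        my @fields = split(' ', $line);
        # find the "=" separator, then take the next three fields
        my ($eq) = grep { $fields[$_] eq "=" } 0 .. $#fields;
        next unless (defined($eq));
        my ($ra, $dec, $mag) = @fields[$eq + 1 .. $eq + 3];
        printf "RA %10.5f  Dec %10.5f  mag %6.3f\n", $ra, $dec, $mag;
    }

One might save this as, say, get_coords.pl (a hypothetical name) and run it as

    perl get_coords.pl < transient_a.out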
The second script, show_trans.pl, can help you to look through a long list of candidate transient objects quickly. It takes a list of candidates as input, and generates an HTML document containing all the information shown above, plus small image cutouts of each detection of every candidate.
The basic usage is
perl /gwkiso/tomoesn/richmond/skeleton_2019/show_trans.pl transient_a.out html=1 debug=1 >& show_trans.out
You can find an example of the documents created by this routine in Tech Note 9.
Examining this document by eye will often reveal that some candidates are not real transient objects, but simply faint stars which are usually just below the detection limit, or perhaps moving objects. See the discussion in Tech Note 9 for more details.