Tomoe Note 010: Building the Tomoe pipeline

Michael Richmond
Jun 19, 2017

This document describes how one can acquire all the software necessary to run a pipeline on a set of Tomoe data, set it up properly on one's computer, and then run the pipeline.


Acquiring the pieces

One must download the following software packages to one's computer:

  - xvista: image processing programs
  - match: matching star lists
  - ensemble: inhomogeneous ensemble photometry
  - wcstools: World Coordinate System tools
  - cdsclient: tools to grab star catalog data from CDS websites
  - skeleton: the scripts of the Tomoe pipeline itself
  - gnuplot: a graphing program
  - ImageMagick: provides the 'convert' image-conversion program

It might be convenient to place all the packages into a single directory. If you do so, after you have unpacked and built them all, that directory might look something like this:


$ /bin/ls
cdsclient-3.84  match-1.0     wcstools-3.9.5
ensemble-1.0    skeleton-0.1  xvista-0.1.13

Note that on my system, the "gnuplot" and "ImageMagick" packages had already been installed in another directory (/usr/bin), so they do not appear here.


Setting up some parameters

The pipeline needs to know where these external software programs are on the computer, so it is necessary to edit one of the "skeleton" scripts and insert this information.

Go to the "skeleton" directory and edit the file tomoe_config.pl. You should see a section like this:


#    XVista: image processing programs
xvista_dir     => "/w/tomoe/xvista",
#    match: matching star lists
match_dir      => "/w/tomoe/match",
#    ensemble: inhomogeneous ensemble photometry 
ensemble_dir   => "/w/tomoe/ensemble",
#    WCS: World Coordinate System tools
wcs_dir        => "/w/tomoe/wcstools/wcstools-3.9.4/bin",
#    CDS: tools to grab star catalog data from CDS websites
cdsclient_dir  => "/w/tomoe/cdsclient/cdsclient-3.83",
#    scripts: home of the scripts in the Tomoe pipeline
script_dir     => "/w/tomoe/skeleton-0.1",
#    gnuplot: home of the gnuplot executable program
gnuplot_dir     => "/usr/bin",
#    convert: home of the ImageMagick 'convert' executable program
convert_dir     => "/usr/bin",

Edit this section so that each line contains the full path to the location of the packages on your system. For example, when I was testing the pipeline, my file looked like this:


#    XVista: image processing programs
xvista_dir     => "/w/tomoe/temp/xvista-0.1.13",
#    match: matching star lists
match_dir      => "/w/tomoe/temp/match-1.0",
#    ensemble: inhomogeneous ensemble photometry 
ensemble_dir   => "/w/tomoe/temp/ensemble-1.0",
#    WCS: World Coordinate System tools
wcs_dir        => "/w/tomoe/temp/wcstools-3.9.5/bin",
#    CDS: tools to grab star catalog data from CDS websites
cdsclient_dir  => "/w/tomoe/temp/cdsclient-3.84",
#    scripts: home of the scripts in the Tomoe pipeline
script_dir     => "/w/tomoe/temp/skeleton-0.1",
#    gnuplot: home of the gnuplot executable program
gnuplot_dir     => "/usr/bin",
#    convert: home of the ImageMagick 'convert' executable program
convert_dir     => "/usr/bin",

The tomoe_config.pl file has some additional parameters which describe general properties of the images; for example, the location and number of overscan rows, or the plate scale in arcseconds per pixel. If the properties of the camera have changed since this software was written, it may be necessary to modify some of these parameters.


Input raw data

The Tomoe camera produces large FITS files, each of which contains a number of individual FITS images. I will use the term "composite" to describe these large, raw datafiles, and the word "chunk" to refer to all the measurements recorded in one composite file.

In this example, I'll use a single composite file as the input. On my computer, it sits in the directory /w/tomoe/temp/data:


$ /bin/ls -l /w/tomoe/temp/data
-rwxr-xr-x 1 richmond richmond 1866248640 Dec  1  2016 TMPM0109330.fits

The composite file is named TMPM0109330.fits. This standard name has a particular format:



                TMPM      010933       0     .fits
                 ^           ^         ^       ^
                 |           |         |       |
                 |           |         |       suffix
                 |           |         |
                 |           |        chip index (0-7)
                 |           |
                 |          frame index (always increasing)
                 |
                 prefix 

A typical composite file contains 360 images from a single chip. For images with an exposure time of 0.5 seconds, the entire file covers a span of 180 seconds = 3 minutes. So, during a night of length 8 hours = 480 minutes, each chip would produce (480 / 3) = 160 composite files; that means that the "frame index" number would increase by 160 during the course of the night.
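The arithmetic above can be checked with a few lines of shell; all of the numbers come directly from the text:

```shell
images_per_file=360        # individual images in one composite file
exposure_ms=500            # 0.5-second exposures, expressed in milliseconds
span_s=$(( images_per_file * exposure_ms / 1000 ))   # seconds covered by one file
night_min=480              # an 8-hour night
files_per_night=$(( night_min * 60 / span_s ))       # composite files per chip
echo "span = $span_s seconds, files per chip per night = $files_per_night"
```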

Each chip produces separate composite files. The prototype camera used during 2016 had 8 chips, so the chip values range from 0 to 7. When larger versions of the camera are built, the name convention will have to change; perhaps two characters will be used to denote the chip number.
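As a concrete illustration, one can pull apart the fields of a file name with bash parameter expansion; the character offsets simply follow the format described above:

```shell
name=TMPM0109330.fits
base=${name%.fits}     # strip the ".fits" suffix -> TMPM0109330
prefix=${base:0:4}     # first 4 characters  -> TMPM
frame=${base:4:6}      # next 6 characters   -> 010933 (frame index)
chip=${base:10:1}      # final character     -> 0      (chip index)
echo "prefix=$prefix frame=$frame chip=$chip"
```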


Running the pipeline

In order to run the pipeline, one should create a directory into which all temporary and permanent pipeline output will be placed. For this example, I'll create a directory called work for this purpose.


$ cd /w/tomoe/temp
$ mkdir work
$ /bin/ls
cdsclient-3.84  ensemble-1.0  skeleton-0.1    work
data            match-1.0     wcstools-3.9.5  xvista-0.1.13

Before running the pipeline, we can choose to "de-activate" some sub-sections of it. The entire pipeline has a number of "stages", which are carried out in sequence on every image. There may be times when you wish to perform only one or two of the stages; for example, if you have already run the stage which extracts individual FITS images from the composite files, and saved the FITS images, you can skip that stage.

To "activate" or "de-activate" particular stages, edit the run_scripts.pl file. Around line 148, there is a section which looks like this:


# which pieces of the pipeline do we want to activate?
#
# split composite FITS files into individual images?
$do_split = 1;
# subtract the bias from raw images?
$do_sub_bias = 1;
# find and measure stars in each clean image?
$do_stars = 1;
# add JD to the starlist files?
$do_addjd = 1;
# run the ensemble photometry programs?
$do_ensemble = 1;
# calibrate the ensemble output?
$do_calib_ensemble = 1;
# calibrate the individual .pht files?
$do_calib_pht = 1;
# create diagnostic graphs?
$do_make_graphs = 1;

The value "1" indicates that a stage is "active" and will run. By default, the pipeline will run every stage. In order to "de-activate" a stage, set the value of its line to 0. For example, to skip the stage which splits composite FITS files into individual images, I would edit the file so it looked like this:


# split composite FITS files into individual images?
$do_split = 0;
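
If you prefer not to open an editor, a stage flag can also be flipped with sed. The snippet below operates on a stand-in copy of the file so that the example is self-contained; the same sed command would work on run_scripts.pl itself:

```shell
# Create a stand-in copy with two stage flags (the real file has more).
printf '$do_split = 1;\n$do_sub_bias = 1;\n' > run_scripts_copy.pl
# Flip the splitting stage from active (1) to inactive (0).
sed -i 's/^\$do_split = 1;/$do_split = 0;/' run_scripts_copy.pl
grep -F '$do_split' run_scripts_copy.pl
```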

There are several ways to run the pipeline, but a simple one is as follows:

  1. cd to a directory into which all the pipeline output will go
    
    
        cd work
    
        
  2. invoke the run_scripts.pl script in something like the following manner:
    
    
        perl ../skeleton-0.1/run_scripts.pl  ../data/TMPM0109330.fits \
                  basedir=test_  config_dir=../skeleton-0.1  debug=1   >&  run_scripts.out
    
        

Let's look at the pieces of that rather long command.

    ../skeleton-0.1/run_scripts.pl   the main pipeline script
    ../data/TMPM0109330.fits         the input composite FITS file
    basedir=test_                    a prefix for the name of the output
                                       directory; the output for this file
                                       will appear in test_0109330
    config_dir=../skeleton-0.1       the directory containing tomoe_config.pl
    debug=1                          keep all temporary files after the run
    >& run_scripts.out               send all messages from the script
                                       into a log file

On my current machine, which runs on an Intel i7-4790 CPU @ 3.60GHz rated at 7183 bogomips, a single composite FITS file takes about 2.5 minutes to process completely.


Output of the pipeline

If you run the pipeline with a debug level of 1 or higher, then all the temporary files produced will still be present; there can be many hundreds of them for each composite FITS file. In general, these temporary files have names of the form do_something_XXXXX, where the XXXXX are five random alphanumeric characters. A quick way to delete all temporary files is to run the del_tempfiles.pl script:



    perl ../skeleton-0.1/del_tempfiles.pl  

    

If the pipeline is run with debug=0, then it will automatically delete all temporary files after they are no longer needed.

For each input composite FITS file, the pipeline will create a separate directory for all output files. In my example, the output directory for frame 010933, chip 0, is named test_0109330. Inside this directory, there should be five files per individual FITS image, plus one sub-directory with calibrated quantities.

Note the slight redundancy in this output. Identical measurements of most stars will appear in two places. First, every star, even one detected in only a single image, will have calibrated measurements in the .ast file corresponding to that image. Stars which are detected enough times to become members of the photometric ensemble will ALSO appear in the single solve_Q_xxxxxxx__var.out file for this chunk. Since all the measurements for a given star are collected together in the solve_Q_xxxxxxx__var.out file, it is usually a more convenient place to begin detailed analysis than the individual .ast files for each image.

A quick way to find out if a run of the pipeline has succeeded is to examine these output files. Check to see if the pff directory exists and contains 5 .png files. If not, some of the processing probably failed. If these .png files do exist, then examine each one briefly and see if it has the proper general form. Below are examples of graphs from a successful run.
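The check described above can be automated with a short shell loop. The snippet below fabricates one output directory containing five .png files so that it runs stand-alone; it assumes the pff directory sits inside each output directory, as in the example above:

```shell
# Fabricate one output directory with 5 .png files for demonstration.
mkdir -p test_0109330/pff
touch test_0109330/pff/{a,b,c,d,e}.png

# Check every output directory: does pff/ hold exactly 5 .png files?
for dir in test_*/; do
    n=$(ls "$dir"pff/*.png 2>/dev/null | wc -l)
    if [ "$n" -eq 5 ]; then
        echo "$dir ok ($n .png files)"
    else
        echo "$dir CHECK ($n .png files)"
    fi
done
```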


Processing an entire night of data

The method described above will process a single composite file, or several such files. But it can be awkward to use on an entire night of data. Therefore, there is another script, run_fields.pl, which is designed to call the basic run_scripts.pl repeatedly on a large number of composite FITS files.

My usual procedure is to consider the data from each chip of the detector as an independent set. So, for example, let's consider all the images taken with chip 0 on the night of 20160411 = April 11, 2016. On my system, these files can be found in the directory /media/root/tomoepm201604/20160411. There are 126 composite FITS files:


  TMPM0108830.fits
  TMPM0108840.fits
  TMPM0108850.fits
    ...
  TMPM0110060.fits
  TMPM0110070.fits
  TMPM0110080.fits

Note that the first file has index number 10883, and the last has index number 11008.
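One can verify that file count, and reconstruct the first and last file names from the naming convention described earlier, with a few lines of shell:

```shell
start=10883                                   # frame index of the first file
end=11008                                     # frame index of the last file
nfiles=$(( end - start + 1 ))                 # number of composite files
first=$(printf 'TMPM0%05d0.fits' "$start")    # fixed prefix + index + chip 0
last=$(printf 'TMPM0%05d0.fits' "$end")
echo "$nfiles files, from $first to $last"
```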

In order to run the pipeline on all these files, we can invoke the run_fields.pl script as follows:



   perl ../skeleton-0.1/run_fields.pl chip=0 base=testa_ \
       start=10883 end=11008 raw_file_base=TMPM0 \
       datadir=/media/root/tomoepm201604/20160411 \
       config_dir=../skeleton-0.1 debug=1  >&  run_fields_testa.out

The arguments to this command are

    chip=0                       process only images from chip 0
    base=testa_                  a prefix for the names of the output
                                   directories
    start=10883                  frame index of the first composite file
    end=11008                    frame index of the last composite file
    raw_file_base=TMPM0          the fixed portion of the raw file names;
                                   the frame index and chip number are
                                   appended to it
    datadir=/media/root/tomoepm201604/20160411
                                 the directory holding the raw composite files
    config_dir=../skeleton-0.1   the directory containing tomoe_config.pl
    debug=1                      keep all temporary files after each run


A quick way to check the quality of results

After processing an entire night's worth of data, just for one chip, one may have over 100 directories, each with hundreds of data files. Is there any easy way to find out if the results are good, or if they suffered from bad weather or equipment failure?

Yes. The check_weather.pl script looks at output of the pipeline, computes a number of summary statistics, and creates a single graph which shows several quantities as a function of time during the night. One can learn much about the conditions of the sky and the data with a quick glance at this graph. One can find detailed descriptions of this script and its graphical output at

To run the script, go to the directory in which the sub-directories for each chunk are located. In my example, this is the work directory.



     $ pwd
     /w/tomoe/temp/work
     $ /bin/ls -d testa_*
     testa_0109300  testa_0109360     testa_109310.out  testa_109370.out
     testa_0109310  testa_0109370     testa_109320.out  testa_109380.out
     testa_0109320  testa_0109380     testa_109330.out  testa_109390.out
     testa_0109330  testa_0109390     testa_109340.out  testa_109400.out
     testa_0109340  testa_0109400     testa_109350.out
     testa_0109350  testa_109300.out  testa_109360.out

I ran the pipeline on a dataset containing 11 chunks, numbers 010930 to 010940, all with chip 0 only. There are therefore 11 subdirectories, one for each chunk, with the prefix testa_. There are also 11 pipeline output files, one for each chunk, with the same prefix; the output files have suffix .out.

In order to check the output from these 11 chunks, I can run the check_weather.pl script like so:



     $ perl ../skeleton-0.1/check_weather.pl prefix=testa_ \
             testa_01093?0 testa_0109400 \
             config_dir=../skeleton-0.1 \
             debug=1 >& check_weather.out

The arguments here are:

    prefix=testa_                the prefix used for the output sub-directories
    testa_01093?0 testa_0109400  the sub-directories, one per chunk, to
                                   include in the analysis
    config_dir=../skeleton-0.1   the directory containing tomoe_config.pl
    debug=1                      run at debug level 1 (the diagnostic output
                                   is captured in check_weather.out)

After running the script, three new files are created. One is the check_weather.out file with diagnostic information, and the other two contain the graphs produced by this routine. In this case, they are

Note that the names include the (chunk index) + (chip) numbers for the first and last sub-directory included in the analysis.

To illustrate features of the graphs produced by this routine, I'll choose one from Tech Note 007 which contains results for an entire night.

The five panels, from top to bottom, illustrate various features in the reduced data.

In this particular instance, it is clear that conditions were best in the middle of the night.