Running 2 pipelines simultaneously on "pmdata" takes about the same time (6 hours) as running a single pipeline. But running 3 pipelines simultaneously takes more time (about 8 hours), so we may be starting to reach the limits of disk I/O.
Thanks to Ohsawa-san's work, we can now use the nice computer for processing Tomoe data. Here are some specs for the computer:
The lscpu command provides the following information:
pmdata
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Xeon(R) CPU E3-1230 v5 @ 3.40GHz
Stepping:              3
CPU MHz:               3748.898
BogoMIPS:              6816.00
There is 1 socket holding 4 cores, with 2 threads per core, for a total of 8 logical CPUs.
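As a quick check on how the lscpu fields relate to each other, here is a minimal sketch that multiplies them out. The dictionary and function names are made up for illustration; the values are copied from the lscpu output above.

```python
# Values copied by hand from the lscpu output for "pmdata".
lscpu_fields = {
    "Socket(s)": 1,
    "Core(s) per socket": 4,
    "Thread(s) per core": 2,
    "CPU(s)": 8,
}


def logical_cpus(fields):
    """Logical CPUs = sockets x cores per socket x threads per core."""
    return (
        fields["Socket(s)"]
        * fields["Core(s) per socket"]
        * fields["Thread(s) per core"]
    )


# Physical cores ignore hyperthreading.
physical_cores = lscpu_fields["Socket(s)"] * lscpu_fields["Core(s) per socket"]

print("physical cores:", physical_cores)
print("logical CPUs:  ", logical_cpus(lscpu_fields))
print("matches CPU(s) field:", logical_cpus(lscpu_fields) == lscpu_fields["CPU(s)"])
```

So the "8 CPUs" reported by lscpu are hardware threads sharing 4 physical cores, which may matter when deciding how many pipelines to run at once.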
We can use the free command to examine the memory:
              total        used        free      shared  buff/cache   available
Mem:       16215816      950492      334672       10272    14930652    14531088
Swap:       8191996      740272     7451724
So, that is 16 GB of memory.
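The unit conversion can be sketched as follows, assuming free was run with its default units (kibibytes, 1 KiB = 1024 bytes); that assumption is mine, since the command line is not shown. The function names are hypothetical.

```python
# Totals copied from the free output above.
# Assumption: free was run with default units, i.e. kibibytes.
MEM_TOTAL_KIB = 16215816
SWAP_TOTAL_KIB = 8191996


def kib_to_gib(kib):
    """Convert kibibytes to gibibytes (binary units, 2**30 bytes)."""
    return kib / 1024**2


def kib_to_gb(kib):
    """Convert kibibytes to gigabytes (decimal units, 10**9 bytes)."""
    return kib * 1024 / 1e9


print(f"memory: {kib_to_gib(MEM_TOTAL_KIB):.2f} GiB = {kib_to_gb(MEM_TOTAL_KIB):.2f} GB")
print(f"swap:   {kib_to_gib(SWAP_TOTAL_KIB):.2f} GiB = {kib_to_gb(SWAP_TOTAL_KIB):.2f} GB")
```

Under that assumption, the machine reports a little less than 16 GiB, which is typical for a system with 16 GiB installed, since some memory is reserved and not counted in the total.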
The data from the night of 20160411 = Apr 11, 2016, consists of measurements from 8 CMOS chips. Each chip generated 126 "chunks"; a "chunk" is 360 FITS images compressed into a single big file, representing about 3 minutes of data from the sky. The size of a raw "chunk" is about 1.8 TB, and the full set of 126 chunks represents observations covering a span of about 6.5 hours.
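A quick tally of the figures quoted above, as a sketch. It assumes each chunk covers exactly 3 minutes (the text says "about"), so the results are approximate; the variable names are my own.

```python
# Figures quoted in the text for the night of 20160411.
CHIPS = 8
CHUNKS_PER_CHIP = 126
IMAGES_PER_CHUNK = 360
MINUTES_PER_CHUNK = 3.0  # assumption: "about 3 minutes" taken as exactly 3

images_per_chip = CHUNKS_PER_CHIP * IMAGES_PER_CHUNK
images_all_chips = CHIPS * images_per_chip

# Time on the sky covered by all chunks from one chip.
span_hours = CHUNKS_PER_CHIP * MINUTES_PER_CHUNK / 60

# Time between consecutive images within a chunk.
seconds_per_image = MINUTES_PER_CHUNK * 60 / IMAGES_PER_CHUNK

print("images per chip:     ", images_per_chip)
print("images for all chips:", images_all_chips)
print(f"span per chip:        {span_hours:.1f} hours")
print(f"cadence:              {seconds_per_image:.1f} s per image")
```

With these round numbers, 126 chunks span a little over 6 hours, in line with the quoted span of about 6.5 hours, and each pipeline has roughly 45,000 images to process.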
The pipeline breaks up the "chunk" into individual FITS images, cleans them, finds and measures stars in each image, then calibrates the stars astrometrically and photometrically. The output is a set of calibrated star lists.
How long does it take the pipeline to run, and what happens if we try to run multiple pipelines in parallel?
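The Linux time command reports three values for a process:
real: the elapsed wall-clock time from start to finish
user: the CPU time spent running the process's own code (user mode)
sys: the CPU time spent in the kernel on behalf of the process (system calls, such as requests for disk I/O)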
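For readers who want the same three numbers from a script, here is a minimal sketch using Python's standard library on a Unix-like system. It is not how the timings in this report were taken (those come from the time command); time_command is a hypothetical helper, and the command it times in the demo is a stand-in for a real pipeline.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv and return its real, user, and sys times in seconds.

    user and sys are the CPU times accumulated by waited-for child
    processes while argv runs, taken from getrusage(RUSAGE_CHILDREN).
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "real": real,
        "user": after.ru_utime - before.ru_utime,
        "sys": after.ru_stime - before.ru_stime,
    }


if __name__ == "__main__":
    # Stand-in workload; replace with the real pipeline command.
    timing = time_command([sys.executable, "-c", "sum(range(10**6))"])
    for name in ("real", "user", "sys"):
        print(f"{name:5s} {timing[name]:8.3f} s")
```

If real is much larger than user + sys, the process spent most of its time waiting (for disk, for example) rather than computing.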
I have run the pipeline on the machine "pmdata" in several configurations: one pipeline at a time, 2 pipelines simultaneously, and 3 pipelines simultaneously. Let's look at the results.
One pipeline at a time
It took about 6:08 (6 hours, 8 minutes) to process the entire set for chip 0.
Two pipelines simultaneously
I tried small tests in which I explicitly forced one pipeline to run on one particular CPU, and the other pipeline to run on a different particular CPU. The time it took to run tasks in parallel appeared to be the same as if I allowed the OS to pick the CPUs for each job. So, these numbers represent the times when the OS is choosing the CPUs/cores for every process and sub-process.
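The text does not say which tool was used to force a pipeline onto a particular CPU. As one possible way to do it on Linux, here is a sketch that sets the CPU affinity of a child process before it starts; start_pinned is a hypothetical helper, and the child command is a stand-in that simply reports which CPUs it is allowed to use.

```python
import os
import subprocess
import sys


def start_pinned(argv, cpu):
    """Start argv restricted to one logical CPU (Linux only).

    The affinity is set in the child between fork and exec, so any
    sub-processes the command spawns inherit the same restriction.
    """
    return subprocess.Popen(
        argv,
        preexec_fn=lambda: os.sched_setaffinity(0, {cpu}),
        stdout=subprocess.PIPE,
        text=True,
    )


if __name__ == "__main__":
    # Stand-in for a pipeline: print the CPUs this process may run on.
    report = [sys.executable, "-c",
              "import os; print(sorted(os.sched_getaffinity(0)))"]

    # Pick up to two CPUs from those available to this process.
    available = sorted(os.sched_getaffinity(0))
    procs = [(cpu, start_pinned(report, cpu)) for cpu in available[:2]]

    for cpu, proc in procs:
        out, _ = proc.communicate()
        print(f"requested CPU {cpu}, child is allowed on {out.strip()}")
```

On a hyperthreaded machine, two logical CPUs may share one physical core, so the choice of CPU numbers could affect a test like this.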
Overnight on Apr 25/26
                   chip 1    chip 2
real (HH:MM)       06:57     06:54
user (seconds)     16,493    15,747
sys (seconds)      2,756     2,787
number of stars    4.6 M     4.5 M
Overnight on Apr 26/27. There may have been small changes in some of the pipeline code since the previous results shown above.
                   chip 6    chip 7
real (HH:MM)       05:27     05:29
user (seconds)     10,333    10,414
sys (seconds)      3,063     3,050
number of stars    4.8 M     4.4 M
Three pipelines simultaneously
In this case again, the OS is choosing which processors to use for each pipeline process and sub-process.
Late afternoon and evening, Apr 27
                   chip 3    chip 4    chip 5
real (HH:MM)       07:48     07:47     08:06
user (seconds)     10,892    10,969    11,571
sys (seconds)      3,575     3,571     3,606
number of stars    4.6 M     4.3 M     5.1 M
Running 2 pipelines simultaneously appears to be nearly as fast as running a single pipeline. That's good.
But running 3 pipelines simultaneously appears to slow down considerably, taking roughly 1.3 times as long as a single pipeline. Of course, the machine does process three entire chips during that time, so the net result is still faster than running 3 pipelines sequentially.
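To put numbers on the throughput argument, here is a small sketch that converts the "real" times from the tables into wall-clock hours per chip. It assumes a batch is finished when its slowest pipeline finishes; the labels and function names are my own.

```python
def hhmm_to_hours(text):
    """Convert an 'HH:MM' string to fractional hours."""
    hours, minutes = text.split(":")
    return int(hours) + int(minutes) / 60


# "real" times copied from the tables above.
runs = {
    "1 pipeline": ["06:08"],
    "2 pipelines (Apr 25/26)": ["06:57", "06:54"],
    "2 pipelines (Apr 26/27)": ["05:27", "05:29"],
    "3 pipelines (Apr 27)": ["07:48", "07:47", "08:06"],
}


def batch_hours(times):
    """Wall-clock hours until the slowest pipeline in the batch finishes."""
    return max(hhmm_to_hours(t) for t in times)


def hours_per_chip(times):
    """Batch wall-clock time divided by the number of chips processed."""
    return batch_hours(times) / len(times)


single = batch_hours(runs["1 pipeline"])
for label, times in runs.items():
    print(f"{label:26s} batch {batch_hours(times):5.2f} h, "
          f"{hours_per_chip(times):5.2f} h per chip, "
          f"{batch_hours(times) / single:4.2f} x single")
```

By this tally, three chips processed one after another would take about 18.4 hours, against roughly 8 hours when they are processed together.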
When running 3 pipelines simultaneously, the "sys" portion of the execution time increases. This is expected if the disk I/O is becoming a bottleneck for the processing. The system may have to wait for one pipeline to read data from a disk file before it can read data for a second pipeline, for example.
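One rough way to examine this is to compare CPU time with wall-clock time for each run. The sketch below computes (user + sys) / real and the share of CPU time spent in "sys", using the values from the tables. A low ratio only suggests that a pipeline spent time waiting; it does not by itself show that the disk is the cause. The function names are hypothetical.

```python
def real_seconds(hhmm):
    """Convert an 'HH:MM' wall-clock time to seconds."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 3600 + int(minutes) * 60


def cpu_busy_fraction(real, user, sys_s):
    """CPU seconds used per wall-clock second.

    This can exceed 1 if a pipeline keeps more than one CPU busy.
    """
    return (user + sys_s) / real_seconds(real)


def sys_share(user, sys_s):
    """Fraction of the CPU time that was spent in the kernel."""
    return sys_s / (user + sys_s)


# (label, real HH:MM, user seconds, sys seconds) from the tables above.
runs = [
    ("2 pipelines, chip 1", "06:57", 16493, 2756),
    ("2 pipelines, chip 2", "06:54", 15747, 2787),
    ("2 pipelines, chip 6", "05:27", 10333, 3063),
    ("2 pipelines, chip 7", "05:29", 10414, 3050),
    ("3 pipelines, chip 3", "07:48", 10892, 3575),
    ("3 pipelines, chip 4", "07:47", 10969, 3571),
    ("3 pipelines, chip 5", "08:06", 11571, 3606),
]

for label, real, user, sys_s in runs:
    print(f"{label:20s} busy {cpu_busy_fraction(real, user, sys_s):.2f}, "
          f"sys share {sys_share(user, sys_s):.2f}")
```

In these numbers, each of the three simultaneous pipelines used the CPU for about half of its wall-clock time, less than the pipelines in the two-at-a-time runs, which is consistent with more time spent waiting.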
I will run additional tests over the next few days, including running 4 pipelines simultaneously.