Commit
Merge pull request #5 from martinkilbinger/p3
P3
martinkilbinger authored Dec 26, 2023
2 parents ac0a339 + efb2799 commit 1f36bf5
Showing 17 changed files with 1,123 additions and 378 deletions.
80 changes: 16 additions & 64 deletions docs/source/post_processing.md
@@ -3,63 +3,14 @@
This page shows all required steps of post-processing the results from one or
more `ShapePipe` runs. Post-processing combines various individual `ShapePipe`
output files, and creates joint results, for example combining individual tile
catalogues in a large sky area. The output of post-processing is a joint _shape
catalogues into a large sky area. The output of post-processing is a joint _shape
catalogue_, containing all required information to create a calibrated shear
catalogue via _metacalibration_, a joint star catalogue, and PSF diagnostic plots.

Some of the following steps pertain specifically to runs carried out on [canfar](https://www.canfar.net/en),
but most are general.
If the main `ShapePipe` processing was carried out on the old canfar VM system (e.g. for CFIS v0 and v1), see
[here](vos_retrieve.md) for details on how to retrieve the `ShapePipe` output files.

1. Retrieve `ShapePipe` result files

For a local run on the same machine as the post-processing, nothing needs to be done.
In some cases, the run was carried out on a remote machine or cluster, and the resulting `ShapePipe`
output files need to be retrieved.

In the specific case of `canfar_avail_results`, this is done as follows.

A. Check availability of results

A `canfar` job can submit a large number of tiles, whose processing time can vary a lot.
We assume that the list of submitted tile IDs is available locally in the ASCII file `tile_numbers.txt`.
To check which tiles have finished running and whose results have been uploaded, use
```bash
canfar_avail_results -i tile_numbers.txt -v -p PSF --input_path INPUT_PATH
```
where `PSF` is one of `psfex` or `mccd`, and `INPUT_PATH` is the input path on `vos`, by default `vos:cfis/cosmostat/kilbinger/results`.
See `-h` for all options.

B. Download results

All results files will be downloaded with
```bash
canfar_download_results -i tile_numbers.txt -v -p PSF --input_vos INPUT_VOS
```
Use the same options as for `canfar_avail_results`.

This command can be run in the same directory at subsequent times, to complete an ongoing run: only newer files will be downloaded
from the `vos` directory. This also ensures that partially downloaded or corrupt files are replaced.

Checking the `vos` directory can be slow for large patches.
To only download files that are not yet present locally (in `.`), first write the missing ones to an ASCII file, again using the
script `canfar_avail_results`, but this time with `.` as input path:
```bash
canfar_avail_results -i tile_numbers.txt --input_path . -p PSF -v -o missing.txt
```
Then, download only the missing files with
```bash
canfar_download_results -i missing.txt --input_vos cosmostat/kilbinger/results_mccd_oc2 -p mccd -v
```
C. Un-tar results
```bash
untar_results -p PSF
```
On success, the `ShapePipe` output `fits` and `log` files will now be in various subdirectories of the `output` directory.
At this point, all required `ShapePipe` output files are available in the current working directory.
2. Optional: Split output in sub-samples
1. Optional: Split output into sub-samples

An optional intermediate step is to create directories for sub-samples, for example one directory
for each patch on the sky. This will create symbolic links to the results `.tgz` files downloaded in
@@ -70,33 +21,34 @@
```
The following steps will then be done in the directory `tiles_W3`.
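In shell terms, the sub-sample step amounts to linking the relevant result tarballs into a per-patch directory. Below is a toy sketch of the idea; the patch name, the tile IDs, and the `<ID>.tgz` file naming are illustrative assumptions, not `ShapePipe` conventions:

```shell
# Toy sketch: link per-patch result tarballs into a dedicated directory.
# Patch name, tile IDs, and the <ID>.tgz naming are illustrative assumptions.
mkdir -p tiles_W3
printf '271.281\n272.282\n' > tiles_W3.txt     # toy list of tile IDs in patch W3
while read -r tile; do
    touch "${tile}.tgz"                        # stand-in for a downloaded result file
    ln -sf "../${tile}.tgz" "tiles_W3/${tile}.tgz"
done < tiles_W3.txt
```

Since only symbolic links are created, the same downloaded `.tgz` files can be shared between several patch directories without duplicating disk space.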

3. Run PSF diagnostics, create merged catalogue
2. Run PSF diagnostics, create merged catalogue

Type
```bash
post_proc_sp -p PSF
```
to automatically perform a number of post-processing steps. Chose the PSF model with the option
to automatically perform a number of post-processing steps. Choose the PSF model with the option
`-p psfex|mccd`. In detail, these are (and can also be done individually
by hand):

A. Analyse psf validation files
1. Analyse psf validation files

```bash
prepare_star_cat -p PSF
combine_runs -t psf -p PSF
```
with options as for `post_proc_sp`.
This script identifies all psf validation files (from all processed tiles downloaded to `pwd`), creates symbolic links,
merges the catalogues, and creates plots of PSF ellipticity, size, and residuals over the focal plane.
This script creates a new combined psf run in the ShapePipe `output` directory, by identifying all psf validation files
and creating symbolic links. The run log file is updated.

B. Create plots of the PSF and their residuals in the focal plane, as a diagnostic of the overall PSF model.
As a scale-dependent test, which propagates directly to the shear correlation function, the rho statistics are computed,
see {cite:p}`rowe:10` and {cite:p}`jarvis:16`,
3. Merge individual psf validation files into one catalogue. Create plots of the PSF and their residuals in the focal plane,
as a diagnostic of the overall PSF model.
As a scale-dependent test, which propagates directly to the shear correlation function, the rho statistics are computed,
see {cite:p}`rowe:10` and {cite:p}`jarvis:16`,
```bash
shapepipe_run -c /path/to/shapepipe/example/cfis/config_MsPl_PSF.ini
```
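For reference, the rho statistics are auto- and cross-correlations of the PSF model ellipticity $e_{\rm P}$, its residual $\delta e_{\rm P}$ (star minus model), and the relative size residual $\delta T_{\rm P}/T_{\rm P}$. A sketch in the notation of {cite:p}`jarvis:16`; the exact conventions used by `ShapePipe` may differ:

$$
\begin{aligned}
\rho_1(\theta) &= \left\langle \delta e_{\rm P}^*(\boldsymbol{x})\, \delta e_{\rm P}(\boldsymbol{x}+\boldsymbol{\theta}) \right\rangle , \\
\rho_2(\theta) &= \left\langle e_{\rm P}^*(\boldsymbol{x})\, \delta e_{\rm P}(\boldsymbol{x}+\boldsymbol{\theta}) \right\rangle , \\
\rho_3(\theta) &= \left\langle \left(e_{\rm P}^* \frac{\delta T_{\rm P}}{T_{\rm P}}\right)\!(\boldsymbol{x}) \left(e_{\rm P} \frac{\delta T_{\rm P}}{T_{\rm P}}\right)\!(\boldsymbol{x}+\boldsymbol{\theta}) \right\rangle , \\
\rho_4(\theta) &= \left\langle \delta e_{\rm P}^*(\boldsymbol{x}) \left(e_{\rm P} \frac{\delta T_{\rm P}}{T_{\rm P}}\right)\!(\boldsymbol{x}+\boldsymbol{\theta}) \right\rangle , \\
\rho_5(\theta) &= \left\langle e_{\rm P}^*(\boldsymbol{x}) \left(e_{\rm P} \frac{\delta T_{\rm P}}{T_{\rm P}}\right)\!(\boldsymbol{x}+\boldsymbol{\theta}) \right\rangle .
\end{aligned}
$$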

C. Prepare output directory
4. Prepare output directory

Create links to all 'final_cat' result files with
```bash
@@ -105,7 +57,7 @@
The corresponding output directory that is created is `output/run_sp_combined/make_catalog_runner/output`.
On success, it contains links to all `final_cat` output catalogues.

D. Merge final output files
5. Merge final output files

Create a single main shape catalogue:
```bash
51 changes: 51 additions & 0 deletions docs/source/vos_retrieve.md
@@ -0,0 +1,51 @@
## Retrieve files from VOspace

This page describes how ShapePipe output files can be retrieved via the Virtual Observatory Space
on canfar. This system was used for the CFIS v0 and v1 runs, and is now obsolete.

1. Retrieve ShapePipe result files

For a local run on the same machine as the post-processing, nothing needs to be done. In some cases, the run was carried out on a remote machine or cluster, and the resulting ShapePipe output files
need to be retrieved.

In the specific case of `canfar_avail_results`, this is done as follows.

1. Check availability of results

A canfar job can submit a large number of tiles, whose processing time can vary a lot. We assume that the list of submitted tile IDs is available locally in the ASCII file `tile_numbers.txt`. To check
which tiles have finished running and whose results have been uploaded, use
```bash
canfar_avail_results -i tile_numbers.txt -v -p PSF --input_path INPUT_PATH
```
where `PSF` is one of `psfex` or `mccd`, and `INPUT_PATH` is the input path on `vos`, by default `vos:cfis/cosmostat/kilbinger/results`.
See `-h` for all options.

2. Download results

All results files will be downloaded with
```bash
canfar_download_results -i tile_numbers.txt -v -p PSF --input_vos INPUT_VOS
```
Use the same options as for `canfar_avail_results`.

This command can be run in the same directory at subsequent times, to complete an ongoing run: only newer files will be downloaded
from the `vos` directory. This also ensures that partially downloaded or corrupt files are replaced.

Checking the `vos` directory can be slow for large patches.
To only download files that are not yet present locally (in `.`), first write the missing ones to an ASCII file, again using the
script `canfar_avail_results`, but this time with `.` as input path:
```bash
canfar_avail_results -i tile_numbers.txt --input_path . -p PSF -v -o missing.txt
```
Then, download only the missing files with
```bash
canfar_download_results -i missing.txt --input_vos cosmostat/kilbinger/results_mccd_oc2 -p mccd -v
```
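The "only download what is missing" logic can be pictured with a self-contained toy sketch; the `<ID>.tgz` naming of result files is an assumption here, and in practice the comparison is done by `canfar_avail_results` as shown above:

```shell
# Toy sketch of the "download only missing files" logic: compare the submitted
# tile list against result tarballs already present locally.
# The <ID>.tgz naming is an assumption; the real check is canfar_avail_results.
printf '123.456\n789.012\n' > tile_numbers.txt   # toy tile ID list
touch 123.456.tgz                                # pretend this tile was already downloaded
: > missing.txt
while read -r tile; do
    [ -f "${tile}.tgz" ] || echo "$tile" >> missing.txt
done < tile_numbers.txt
cat missing.txt   # -> 789.012
```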

3. Un-tar results
```bash
untar_results -p PSF
```
On success, the `ShapePipe` output `fits` and `log` files will now be in various subdirectories of the `output` directory.
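What the un-tar step amounts to can be sketched as follows; the toy tarball stands in for a real result file, and the actual directory layout created by `untar_results` may differ:

```shell
# Toy sketch of the un-tar step: extract every result tarball into output/.
# The toy tarball below stands in for a real ShapePipe result file.
mkdir -p toy_run && echo done > toy_run/dummy.log
tar -czf 123.456.tgz toy_run
mkdir -p output
for tgz in *.tgz; do
    tar -xzf "$tgz" -C output
done
```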

At this point, all required `ShapePipe` output files are available in the current working directory.
4 changes: 2 additions & 2 deletions example/cfis/config_MsPl_psfex.ini
@@ -35,7 +35,7 @@ LOG_NAME = log_sp
RUN_LOG_NAME = log_run_sp

# Input directory, containing input files, single string or list of names
INPUT_DIR = $SP_RUN/psf_validation_ind
INPUT_DIR = $SP_RUN/output

# Output directory
OUTPUT_DIR = $SP_RUN/output
@@ -54,7 +54,7 @@ TIMEOUT = 96:00:00
## Module options
[MERGE_STARCAT_RUNNER]

INPUT_DIR = psf_validation_ind
INPUT_DIR = last:psfex_interp_runner

PSF_MODEL = psfex

68 changes: 68 additions & 0 deletions example/cfis/config_Ms_psfex.ini
@@ -0,0 +1,68 @@
# ShapePipe configuration file for post-processing.
# Merge star catalogues.


## Default ShapePipe options
[DEFAULT]

# verbose mode (optional), default: True, print messages on terminal
VERBOSE = True

# Name of run (optional) default: shapepipe_run
RUN_NAME = run_sp_Ms

# Add date and time to RUN_NAME, optional, default: False
RUN_DATETIME = False


## ShapePipe execution options
[EXECUTION]

# Module name, single string or comma-separated list of valid module runner names
MODULE = merge_starcat_runner

# Parallel processing mode, SMP or MPI
MODE = SMP


## ShapePipe file handling options
[FILE]

# Log file master name, optional, default: shapepipe
LOG_NAME = log_sp

# Runner log file name, optional, default: shapepipe_runs
RUN_LOG_NAME = log_run_sp

# Input directory, containing input files, single string or list of names
INPUT_DIR = $SP_RUN/output

# Output directory
OUTPUT_DIR = $SP_RUN/output


## ShapePipe job handling options
[JOB]

# Batch size of parallel processing (optional), default is 1, i.e. run all jobs in serial
SMP_BATCH_SIZE = 4

# Timeout value (optional), default is None, i.e. no timeout limit applied
TIMEOUT = 96:00:00


## Module options
[MERGE_STARCAT_RUNNER]

INPUT_DIR = last:psfex_interp_runner

PSF_MODEL = psfex

NUMBERING_SCHEME = -0000000-0

# Input file pattern(s), list of strings with length matching number of expected input file types
# Cannot contain wild cards
FILE_PATTERN = validation_psf

# FILE_EXT (optional) list of string extensions to identify input files
FILE_EXT = .fits
2 changes: 1 addition & 1 deletion example/cfis/config_exp_Pi.ini
@@ -46,7 +46,7 @@ OUTPUT_DIR = $SP_RUN/output
[JOB]

# Batch size of parallel processing (optional), default is 1, i.e. run all jobs in serial
SMP_BATCH_SIZE = 2
SMP_BATCH_SIZE = 1

# Timeout value (optional), default is None, i.e. no timeout limit applied
TIMEOUT = 96:00:00
2 changes: 1 addition & 1 deletion scripts/python/link_to_exp_for_tile.py
@@ -327,7 +327,7 @@ def main(argv=None):
patterns = ["run_sp_exp_SxSePsf", "run_sp_exp_Pi"]
for pattern in patterns:
paths, number = get_paths(exp_base_dir, exp_shdu_IDs, pattern)
print(number)
#print(number)

create_links_paths(tile_base_dir, tile_ID, paths, verbose=verbose)

14 changes: 12 additions & 2 deletions scripts/python/summary_run.py
@@ -24,7 +24,7 @@ def main(argv=None):
list_tile_IDs_dot = get_IDs_from_file(tile_ID_path)

# tile IDs with dashes
list_tile_IDs = replace_dot_dash(list_tile_IDs_dot)
list_tile_IDs = job_data.replace_dot_dash(list_tile_IDs_dot)
n_tile_IDs = len(list_tile_IDs)
n_CCD = 40

@@ -147,6 +147,16 @@ def main(argv=None):
verbose=verbose,
)


jobs["1024"] = job_data(
"1024",
"run_sp_combined_psf",
["psfex_interp_runner"],
"shdus",
path_left=f"{main_dir}/output",
verbose=verbose
)

job_data.print_stats_header()

for key in "1":
@@ -169,7 +179,7 @@
print_par_runtime(par_runtime, verbose=verbose)

#for key in ["2", "4", "8", "16", "32", "64", "128"]:
for key in ["128"]:
for key in ["1024"]:
job = jobs[key]
job.print_intro()
job.check_numbers(par_runtime=par_runtime)
