monitoring #673

Closed
wants to merge 132 commits into from
6934d27
added Dockerfile and GH action
sfarrens Nov 16, 2022
82bc72a
added make dependency
sfarrens Nov 16, 2022
29c609e
docker testing
martinkilbinger Nov 1, 2023
da18970
Merge remote-tracking branch 'upstream/develop' into docker_image
martinkilbinger Nov 3, 2023
5770367
Dockerfile tests
martinkilbinger Nov 8, 2023
3166e3c
Added cent-os to CI tests
martinkilbinger Nov 8, 2023
373722e
Added pyproject
martinkilbinger Nov 9, 2023
b025b6a
canfar pyproject update
martinkilbinger Nov 10, 2023
ba135f2
cfis vos configs: missing keywords, DR5 added
martinkilbinger Nov 10, 2023
07cafb4
Testing dockerfile with only conda env installed
martinkilbinger Nov 15, 2023
92a999c
Fixing numpy and astropy versions
martinkilbinger Nov 15, 2023
2b6600d
installation on canfar working: removed most versions from yml file; …
martinkilbinger Nov 17, 2023
b6e5c97
Dockerfile for image 0:7, 0:8
martinkilbinger Nov 17, 2023
bc66565
Installation of sextractor and psfex with conda
martinkilbinger Nov 19, 2023
0b3aac6
script to call curl with canfar contained updated, adding NCORE as ar…
martinkilbinger Nov 19, 2023
69317cb
added init scripts for canfar; job_sp added n_smp for further jobs
martinkilbinger Nov 19, 2023
d35bd69
script to call curl for canfar container: added resources
martinkilbinger Nov 19, 2023
d93a60e
testing input numbers
martinkilbinger Nov 19, 2023
fb83a27
removed debug msg
martinkilbinger Nov 20, 2023
e33972b
numpy error fixed with version upgrade
martinkilbinger Nov 20, 2023
edbec76
Merge branch 'science_portal_run' of github.com:martinkilbinger/shape…
martinkilbinger Nov 20, 2023
9aa16be
Getting ready for exclusive-one-tile processing
martinkilbinger Nov 22, 2023
e5d0335
Added command line option to specify exclusive ID for processing
martinkilbinger Nov 22, 2023
49d8880
Merge remote-tracking branch 'origin/exclusive' into science_portal_run
martinkilbinger Nov 22, 2023
d02347e
curl script exclusive ID
martinkilbinger Nov 24, 2023
12e6b63
Dockerfile conda -> source activate
martinkilbinger Nov 24, 2023
b05f2a5
comment added
martinkilbinger Nov 24, 2023
45942d5
Updated Dockerfile
martinkilbinger Nov 24, 2023
7d1891e
running with exclusive ID
martinkilbinger Nov 24, 2023
02affa8
exp runs
martinkilbinger Nov 25, 2023
d8ad328
local curl script NCORES -> 1
martinkilbinger Nov 25, 2023
60c3cd1
Renamed science-portal scripts (local/remote)
martinkilbinger Nov 28, 2023
0737c0f
trying to run Pi
martinkilbinger Dec 1, 2023
7dba136
Merge remote-tracking branch 'origin/science_portal_run' into exclusive
martinkilbinger Dec 1, 2023
c65d75a
numpy -> 1.22 to avoid asscalar bug
martinkilbinger Dec 1, 2023
1e7b633
aux script to create links for exposure output runs for tile
martinkilbinger Dec 3, 2023
b006368
Merge remote-tracking branch 'origin/science_portal_run' into exclusive
martinkilbinger Dec 3, 2023
85f0c99
Added aux script to update runs log file
martinkilbinger Dec 3, 2023
d0c3d9f
update runs log file script: deal with multiple runs of same module
martinkilbinger Dec 3, 2023
323d44a
run_log: added function get_all_dirs
martinkilbinger Dec 4, 2023
dedbc3f
Fixed new function get_all_dirs
martinkilbinger Dec 4, 2023
e67f549
Fixed (as for MCCD) FITS key bug
martinkilbinger Dec 4, 2023
55677e7
Merge branch 'exclusive' of github.com:martinkilbinger/shapepipe-1 in…
martinkilbinger Dec 4, 2023
b6cb27d
vignet makers: can use last and all in additional input inage directo…
martinkilbinger Dec 4, 2023
096af60
Dockerfile + jupyter, activate
martinkilbinger Dec 8, 2023
3c37250
file handler raises error if no process
martinkilbinger Dec 8, 2023
667eb7b
combine mask outputs
martinkilbinger Dec 8, 2023
bc20ab2
canfar curl command: added kind (tile, exp) as option
martinkilbinger Dec 13, 2023
1b4b3bf
curl remote job script init_run_exclusive_canfar: command line option…
martinkilbinger Dec 13, 2023
bd82f95
curl local command: added -k kind
martinkilbinger Dec 14, 2023
33ea4a1
summary missing ID 32 fixed (?)
martinkilbinger Dec 15, 2023
50df9a7
canfar scripts command line options
martinkilbinger Dec 15, 2023
ba17ebc
curl canfar local script added job, kind
martinkilbinger Dec 15, 2023
394e464
Merge branch 'exclusive' of github.com:martinkilbinger/shapepipe-1 in…
martinkilbinger Dec 15, 2023
94428b4
Merge pull request #4 from martinkilbinger/science_portal_run
martinkilbinger Dec 15, 2023
9ac9dc1
SP ngmxix (job 128) running on canfar
martinkilbinger Dec 17, 2023
1d25a8b
curl canfar local script updated
martinkilbinger Dec 17, 2023
3d2b127
Merge branch 'exclusive' of github.com:martinkilbinger/shapepipe-1 in…
martinkilbinger Dec 17, 2023
ca6b602
curl scripts updated
martinkilbinger Dec 19, 2023
cf69907
Merge branch 'exclusive' of github.com:martinkilbinger/shapepipe-1 in…
martinkilbinger Dec 19, 2023
c643d0c
run summary more OO
martinkilbinger Dec 19, 2023
523b10f
Merge branch 'exclusive' of github.com:martinkilbinger/shapepipe-1 in…
martinkilbinger Dec 19, 2023
53d9d35
update_runs_log_file script: fixed bug when run dir is empty
martinkilbinger Dec 21, 2023
ce1179e
improved canfar job scripts; fixed some make cat bugs
martinkilbinger Dec 21, 2023
a005119
added curl to Dockerimage
martinkilbinger Dec 21, 2023
64f04cf
Merge remote-tracking branch 'origin/exclusive' into exclusive
martinkilbinger Dec 21, 2023
22ee0f3
Remove temp hack from mask
martinkilbinger Dec 21, 2023
c62861d
Merge branch 'exclusive' of github.com:martinkilbinger/shapepipe-1 in…
martinkilbinger Dec 21, 2023
b682e7c
removed unused code from mask
martinkilbinger Dec 21, 2023
9b9dbf1
Merge pull request #3 from martinkilbinger/exclusive
martinkilbinger Dec 21, 2023
dd57fd8
Merge remote-tracking branch 'origin/develop' into develop
martinkilbinger Dec 21, 2023
7fc3645
curl scripts updated
martinkilbinger Dec 23, 2023
15a4b7b
combine psf validation files: preles now with prepare_tiles_for_final…
martinkilbinger Dec 25, 2023
0d56e23
Update post_processing.md
martinkilbinger Dec 25, 2023
fde5982
prepare tiles script loop tests
martinkilbinger Dec 25, 2023
9c5cad4
Merge branch 'p3' of github.com:martinkilbinger/shapepipe-1 into p3
martinkilbinger Dec 25, 2023
ac0a339
curl canfar local script minor change
martinkilbinger Dec 25, 2023
4048bf1
added vos doc md file
martinkilbinger Dec 25, 2023
0a3f80e
Update post_processing.md
martinkilbinger Dec 25, 2023
d81bdf2
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
672001d
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
cc96a0c
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
87bb2b7
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
76c08a6
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
faa0836
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
85f6d91
Update vos_retrieve.md
martinkilbinger Dec 25, 2023
4f81635
combine runs script renamed
martinkilbinger Dec 25, 2023
5574393
Merge branch 'p3' of github.com:martinkilbinger/shapepipe-1 into p3
martinkilbinger Dec 25, 2023
d4cb47c
Update post_processing.md
martinkilbinger Dec 25, 2023
2ffba3b
P3 proceesing to final cat
martinkilbinger Dec 26, 2023
67c966c
Merge branch 'p3' of github.com:martinkilbinger/shapepipe-1 into p3
martinkilbinger Dec 26, 2023
efb2799
Merge branch 'develop' into p3
martinkilbinger Dec 26, 2023
1f36bf5
Merge pull request #5 from martinkilbinger/p3
martinkilbinger Dec 26, 2023
9f81502
config files updated
martinkilbinger Jan 6, 2024
6eb11cf
PSFEx interp runner: allowing all: for ME_DOT_PSF_PDIRS
martinkilbinger Jan 15, 2024
485da58
init run exc script: added -d option; updates
martinkilbinger Jan 15, 2024
ef3a2c9
job sp canfar script: remove old vos upload code
martinkilbinger Jan 15, 2024
a20cb4f
summary create library and param files
martinkilbinger Jan 15, 2024
29b463b
minor changes
martinkilbinger Jan 15, 2024
cb67e8f
curl local script: now working with -e ID and -f file_IDs, in dry and…
martinkilbinger Jan 15, 2024
f0acd65
summary: fixed main path, verbose
martinkilbinger Jan 16, 2024
1bc602b
Started canfar howto
martinkilbinger Jan 16, 2024
d6ff437
Added summary run notebook
martinkilbinger Jan 16, 2024
1a81ae3
psfex_interp: continue instead of error if one of the .psf files not …
martinkilbinger Jan 17, 2024
93d6c6e
fixed symlink config files
martinkilbinger Jan 17, 2024
9c0b9a5
Updated summary run
martinkilbinger Jan 17, 2024
b19ad03
revert to main psfex in link exp for tiles script
martinkilbinger Jan 17, 2024
af38556
curl job script: remoging session logs
martinkilbinger Jan 17, 2024
8c9921f
comments
martinkilbinger Jan 17, 2024
18d104c
Merge pull request #6 from martinkilbinger/P7
martinkilbinger Jan 17, 2024
4276fea
Update canfar.md
martinkilbinger Jan 19, 2024
3f15533
Update canfar.md
martinkilbinger Jan 19, 2024
cfefcfb
Update canfar.md
martinkilbinger Jan 19, 2024
24480e2
Update canfar.md
martinkilbinger Jan 19, 2024
48e6114
Merge pull request #8 from martinkilbinger/martinkilbinger-patch-1
martinkilbinger Jan 19, 2024
cd7c61b
Merge pull request #7 from martinkilbinger/martinkilbinger-canfar-doc
martinkilbinger Jan 19, 2024
9a964c5
updated canfar doc
martinkilbinger Jan 26, 2024
e0100ca
script with akaha lib to count headlerss jobs
martinkilbinger Jan 26, 2024
984ec90
cleaned up curl submit script
martinkilbinger Jan 26, 2024
842e382
minor modifs to summary
martinkilbinger Jan 26, 2024
d9846e9
Removed VM_HOME; jon sp canfar cleaned up
martinkilbinger Jan 26, 2024
c5deb34
Dockerfile cleand up
martinkilbinger Jan 30, 2024
d507780
summary params minor bug fixed
martinkilbinger Feb 1, 2024
b4e0aa0
major bugx fixed: N_SMP was n_SMP, not propagated
martinkilbinger Feb 1, 2024
f8a07d7
major bugx fixed: N_SMP was n_SMP, not propagated
martinkilbinger Feb 1, 2024
30116b8
summary nb
martinkilbinger Feb 1, 2024
9e44fa6
sumamry updated
martinkilbinger Feb 11, 2024
3ab923f
terminal title from within curl script
martinkilbinger Feb 11, 2024
ee5b3ad
Update tiles_P7.txt
martinkilbinger Feb 11, 2024
79cb00a
Merge pull request #9 from martinkilbinger/P7
martinkilbinger Feb 11, 2024
92e4308
merged Dockerfile from docker_image
martinkilbinger Feb 11, 2024
1d928d4
monitoring and job handling scripts; small modifs, output; canfar pip…
martinkilbinger Feb 23, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ci-tests.yml
@@ -20,7 +20,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        os: [ubuntu-latest, macos-latest]
+        os: [ubuntu-latest, macos-latest, centos-latest]
         python-version: [3.9]
 
     steps:
41 changes: 41 additions & 0 deletions .github/workflows/deploy-image.yml
@@ -0,0 +1,41 @@
name: Create and publish a Docker image

on:
  push:
    branches: ['master']

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3

      - name: Log in to the Container registry
        uses: docker/login-action@f054a8b539a109f9f41c372932f1ae047eff08c9
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@98669ae865ea3cffbcbaa878cf57c20bbf1c6c38
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}

      - name: Build and push Docker image
        uses: docker/build-push-action@ad44023a93711e3deb337508980b4b5e9bcdc5dc
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
57 changes: 57 additions & 0 deletions Dockerfile
@@ -0,0 +1,57 @@
FROM continuumio/miniconda3

LABEL Description="ShapePipe Docker Image"
ENV SHELL /bin/bash

ARG CC=gcc-9
ARG CXX=g++-9

# gcc < 10 is required to compile ww
ENV CC=gcc-9
ENV CXX=g++-9

RUN apt-get update --allow-releaseinfo-change && \
    apt-get update && \
    apt-get upgrade -y && \
    apt-get install apt-utils -y && \
    apt-get install make -y && \
    apt-get install automake -y && \
    apt-get install autoconf -y && \
    apt-get install gcc-9 g++-9 -y && \
    apt-get install locales -y && \
    apt-get install libgl1-mesa-glx -y && \
    apt-get install xterm -y && \
    apt-get install cmake protobuf-compiler -y && \
    apt-get install libtool libtool-bin libtool-doc -y && \
    apt-get install libfftw3-bin libfftw3-dev -y && \
    apt-get install libatlas-base-dev liblapack-dev libblas-dev -y && \
    apt-get install vim -y && \
    apt-get install locate -y && \
    apt-get install curl -y && \
    apt-get install acl -y && \
    apt-get install sssd -y && \
    apt-get clean

ADD nsswitch.conf /etc/

RUN sed -i '/en_US.UTF-8/s/^# //g' /etc/locale.gen && \
    locale-gen
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8

SHELL ["/bin/bash", "--login", "-c"]

COPY ./environment.yml ./
COPY install_shapepipe README.rst setup.py setup.cfg ./
RUN touch ./README.md

RUN conda update -n base -c defaults conda -c defaults
RUN conda env create --file environment.yml

COPY shapepipe ./shapepipe
COPY scripts ./scripts

# Make RUN commands use the new environment:
SHELL ["conda", "run", "-n", "shapepipe", "/bin/bash", "-c"]
RUN pip install jupyter
1 change: 1 addition & 0 deletions auxdir/CFIS/tiles_202106/tiles_P7.txt
@@ -1,3 +1,4 @@
000.000
052.332
053.331
053.332
94 changes: 94 additions & 0 deletions docs/source/canfar.md
@@ -0,0 +1,94 @@
# Running `shapepipe` on the CANFAR science portal

## Introduction

## Steps from testing to parallel running

Before starting a batch remote session job on a large number of images (step 5.),
it is recommended to perform some or all of the testing steps (1. - 4.).


1. Run the basic `shapepipe` runner script to test one or several modules, as specified in a config file, on one image.
This step has to be run in the image run directory. The command is
```bash
shapepipe_run -c config.ini
```

2. Run the job script to test the job management, on one image.
This step has to be run in the image run directory. The command is
```bash
job_sp_canfar -j JOB [OPTIONS]
```

3. Run the pipeline script to test the processing step(s), on one image.
This step has to be run in the patch base directory.

1. First, run in dry mode:
```bash
init_run_exclusive_canfar.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n
```
2. Next, perform a real run with
```bash
init_run_exclusive_canfar.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp]
```

4. Run the remote session script to test job submission using Docker images, on one image.
This step has to be run in the patch base directory.
1. First, run with dry mode 2, which only displays the `curl` command:
```bash
curl_canfar_local.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n 2
```

2. Next, run with dry mode 1, which issues the `curl` command without processing:
```bash
curl_canfar_local.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp] -n 1
```
3. Then, perform a real run, which issues the `curl` command with processing:
```bash
curl_canfar_local.sh -j JOB -e ID -p [psfex|mccd] -k [tile|exp]
```

5. Full run: call the remote session script and Docker image with a collection of images:
```bash
curl_canfar_local.sh -j JOB -f path_IDs -p [psfex|mccd] -k [tile|exp]
```
with `path_IDs` being a text file with one image ID per line.
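
Before launching the full run, it can help to check the ID file. The following is a minimal sketch and not part of `shapepipe`: the function name `check_id_file` is hypothetical, and it relies only on what is stated above, namely one image ID per line. It flags a missing or empty file, empty lines, lines containing whitespace, and duplicate IDs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a shapepipe script): sanity-check an ID file
# before passing it to `curl_canfar_local.sh -f`.
check_id_file() {
    local path=$1
    local status=0

    if [ ! -s "$path" ]; then
        echo "ID file '$path' is missing or empty" >&2
        return 1
    fi

    # Empty lines or lines with embedded whitespace break "one ID per line".
    if grep -n -E '^$|[[:space:]]' "$path" >&2; then
        echo "Found empty lines or whitespace in '$path'" >&2
        status=1
    fi

    # Duplicate IDs would submit the same image twice.
    local dups
    dups=$(sort "$path" | uniq -d)
    if [ -n "$dups" ]; then
        echo "Duplicate IDs in '$path':" >&2
        echo "$dups" >&2
        status=1
    fi

    return $status
}
```

A possible use is `check_id_file path_IDs && curl_canfar_local.sh -j JOB -f path_IDs -p psfex -k tile`, so that the submission only starts if the file passes the checks.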

## Monitoring


### Status and output of submitted job

The currently active remote sessions can be monitored using the session IDs, which the
remote session script `curl_canfar_local.sh` writes to `session_IDs.txt`. In the patch main directory, run
```bash
curl_canfar_monitor.sh events
```
to display the status of the remotely started Docker image, and
```bash
curl_canfar_monitor.sh logs
```
to print `stdout` of the remotely run pipeline script.
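
To follow a run without retyping the command, the status call can be wrapped in a simple polling loop. This is a sketch only: `poll_status` is a hypothetical function, and the repeat count and interval are chosen by the caller rather than taken from `shapepipe`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: repeat a monitoring command a fixed number of times.
# Usage: poll_status N_REPEAT INTERVAL_SEC COMMAND [ARGS...]
poll_status() {
    local n_repeat=$1
    local interval=$2
    shift 2

    local i
    for ((i = 1; i <= n_repeat; i++)); do
        echo "--- poll $i/$n_repeat: $(date '+%H:%M:%S') ---"
        # Keep polling even if a single call fails (e.g. a network hiccup).
        "$@" || echo "command failed with exit code $?" >&2
        # No sleep needed after the last poll.
        if ((i < n_repeat)); then
            sleep "$interval"
        fi
    done
}
```

For example, `poll_status 10 60 curl_canfar_monitor.sh events` would query the session status ten times, one minute apart.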

### Number of submitted running jobs

The script
```bash
stats_headless_canfar.py
```
returns the number of actively running headless jobs.


## Post-hoc summary

In the patch main directory, run
```bash
summary_run PATCH
```
to print a summary with missing image IDs per job and module.
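
The summary can be condensed further into one count per job and module. The sketch below assumes that the missing IDs are written to files named `summary/missing_job_<JOB>_<module>.txt` with one ID per line; this naming is inferred from the file `summary/missing_job_32_sextractor.txt` used in `pipeline_canfar.md`, and the function name `count_missing` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the number of missing IDs per summary file.
# Assumes files named missing_job_<JOB>_<module>.txt, one ID per line.
count_missing() {
    local dir=${1:-summary}
    local f n

    for f in "$dir"/missing_job_*.txt; do
        # If the glob matches nothing, the literal pattern is returned.
        [ -e "$f" ] || { echo "no missing-ID files in '$dir'" >&2; return 1; }
        # Count non-empty lines only; grep exits non-zero on a zero count.
        n=$(grep -c . "$f" || true)
        printf '%6d  %s\n' "$n" "$(basename "$f" .txt)"
    done
}
```

Called without arguments in the patch main directory, `count_missing` reads from `summary/`; a different directory can be passed as the first argument.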

## Deleting jobs

```bash
for id in `cat session_IDs.txt`; do echo $id; curl -X DELETE -E /arc/home/kilbinger/.ssl/cadcproxy.pem https://ws-uv.canfar.net/skaha/v0/session/$id; done
```
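
A slightly more defensive variant of the loop above is sketched here. It reads the IDs line by line, takes the certificate path as an argument instead of hard-coding a home directory, and supports a dry mode that only prints the `curl` commands. The function name `delete_sessions` and its argument order are hypothetical; the endpoint URL and the `curl` options are the ones used above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: delete all sessions listed in an ID file.
# Usage: delete_sessions ID_FILE CERT [dry]
delete_sessions() {
    local id_file=$1
    local cert=$2
    local dry=${3:-}
    local url="https://ws-uv.canfar.net/skaha/v0/session"
    local id

    # The extra test handles a last line without a trailing newline.
    while IFS= read -r id || [ -n "$id" ]; do
        # Skip empty lines.
        [ -n "$id" ] || continue
        if [ "$dry" = "dry" ]; then
            # Dry mode: show the command without contacting the server.
            echo "curl -X DELETE -E $cert $url/$id"
        else
            echo "Deleting session $id"
            curl -X DELETE -E "$cert" "$url/$id"
        fi
    done < "$id_file"
}
```

Running `delete_sessions session_IDs.txt ~/.ssl/cadcproxy.pem dry` first shows what would be deleted; dropping the last argument performs the deletion.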
91 changes: 91 additions & 0 deletions docs/source/pipeline_canfar.md
@@ -0,0 +1,91 @@
patch="P7"
psf="psfex"
N_SMP=16

# Terminal title
echo -ne "\033]0;$patch\007"

# Run directory
dir=~/cosmostat/v2/pre_v2/$psf/$patch
cd $dir

# Get tile number list
ln -s ~/shapepipe/auxdir/CFIS/tiles_202106/tiles_$patch.txt tile_numbers.txt


# Get images

## Download and link separately

### Download
### Create and link to central image storage directory
mkdir -p ~/cosmostat/v2/data_tiles/$patch
ln -s ~/cosmostat/v2/data_tiles/$patch data_tiles

### Download and move tiles
ln -s ~/shapepipe/example/cfis
mkdir -p output
export SP_RUN=`pwd`

shapepipe_run -c cfis/config_Git_vos.ini
mv -i output/run_sp_Git_*/get_images_runner/output/CFIS.???.???.*fits* data_tiles
rm -rf output/run_sp_tiles_Git_*
update_run_log_file.py
# repeat the above block

### Find exposures; this run can be stopped after Fe
shapepipe_run -c cfis/config_GitFe_symlink.ini

### Download and move exposures

shapepipe_run -c cfis/config_Gie_vos.ini
mv -i output/run_sp_Gie_*/get_images_runner/output/*.fits*fz data_exp
rm -rf output/run_sp_Gie_*
update_run_log_file.py
# repeat the above

### Create links (and re-run Fe, not necessary)
job_sp_canfar.bash -p $psf `cat tile_numbers.txt` -j 1 -r symlink

# Uncompress weights, split exposures into single HDUs
job_sp_canfar.bash -p $psf -n $N_SMP -j 2

# Mask tiles
job_sp_canfar.bash -p $psf -n $N_SMP -j 4

# Mask exposures
job_sp_canfar.bash -p $psf -n $N_SMP -j 8


# Tile detection
curl_canfar_local.sh -j 16 -f tile_numbers.txt -k tile -p $psf -N $N_SMP


# Exposure detection
## Get single-HDU single-exposure IDs
~/shapepipe/scripts/python/summary_run.py

cp summary/missing_job_32_sextractor.txt all.txt
curl_canfar_local.sh -j 32 -f all.txt -k exp -p $psf -N $N_SMP

# Tile preparation
curl_canfar_local.sh -j 64 -f tile_numbers.txt -k tile -p $psf -N $N_SMP

# Tile shape measurement
curl_canfar_local.sh -j 128 -f tile_numbers.txt -k tile -p $psf -N 8

# Merge subcatalogues, and create final cat
job_sp_canfar.bash -p $psf -n 1 -j 256

# Combine all final cats in common output dir as links
combine_runs.bash -c final -p psfex

# Merge all final cats
# (use 192GB RAM)
merge_final_cat -i output/run_sp_combined_final/make_catalog_runner/output -p cfis/final_cat.param -v


# Delete jobs
SSL=~/.ssl/cadcproxy.pem
SESSION=https://ws-uv.canfar.net/skaha/v0/session
for ID in `cat session_IDs.txt`; do echo $ID; curl -X DELETE -E $SSL $SESSION/$ID; done
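
The job numbers passed with `-j` above are all powers of two. Assuming that they are bit flags which can be combined by addition (an assumption, not something stated in these notes), the sketch below decodes a combined job number into its components. The function name `decode_job` is hypothetical, and the labels paraphrase the comments in this file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the steps contained in a (possibly combined)
# job number, ASSUMING each power of two is an independent bit flag.
decode_job() {
    local job=$1
    local labels=(
        "create image links"                      # 1
        "uncompress weights, split exposures"     # 2
        "mask tiles"                              # 4
        "mask exposures"                          # 8
        "tile detection"                          # 16
        "exposure detection"                      # 32
        "tile preparation"                        # 64
        "tile shape measurement"                  # 128
        "merge subcatalogues, create final cat"   # 256
    )
    local i bit

    for i in "${!labels[@]}"; do
        bit=$((1 << i))
        # Print the step if its bit is set in the job number.
        if (( job & bit )); then
            printf '%4d  %s\n' "$bit" "${labels[i]}"
        fi
    done
}
```

For example, `decode_job 48` lists the two detection steps (16 and 32).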