A simple crawler that tracks the rate of data added to the PHOENIX file system.
- Create a config.ini file based on the sample.config.ini file.
- Copy the DAG in the crons directory into the dags directory of your Airflow installation, after making any modifications your setup needs.
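The settings in config.ini can be read with Python's standard configparser module. The sketch below is illustrative only: the section and key names (crawler, postgresql, slack, phoenix_root, webhook_url, and so on) are hypothetical placeholders, and the authoritative keys are whatever sample.config.ini defines.

```python
import configparser

# Hypothetical config.ini contents; the real keys come from sample.config.ini
# and may differ.
EXAMPLE_CONFIG = """
[crawler]
phoenix_root = /data/PHOENIX

[postgresql]
host = localhost
port = 5432
database = phoenix_crawler
user = crawler

[slack]
webhook_url = https://hooks.slack.com/services/XXX
"""


def load_settings(text: str) -> dict:
    """Parse INI text and return the (hypothetical) settings as a flat dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "phoenix_root": parser.get("crawler", "phoenix_root"),
        "db_host": parser.get("postgresql", "host"),
        "db_port": parser.getint("postgresql", "port"),
        "db_name": parser.get("postgresql", "database"),
        "slack_webhook": parser.get("slack", "webhook_url"),
    }


settings = load_settings(EXAMPLE_CONFIG)
print(settings["phoenix_root"], settings["db_port"])
```

For a real deployment, `parser.read("config.ini")` reads the file from disk instead of a string.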
1. Crawls the PHOENIX file system and adds all files in the system to a database. (daily cron)
   - The files are tagged with:
     - modification date
     - size
     - modalities (e.g. MRI, EEG, etc.)
     - associated subject ID / Study / Research Network
     - file type (e.g. CSV, zip, etc.)
2. Summarizes the data available per modality and per subject ID / Study / Research Network. (daily cron)
3. Sends out a daily Slack message with the delta between the previous day's data and the current day's data (based on step 2). (daily cron)
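The crawl-and-tag step and the summary step can be sketched as follows. This is a minimal sketch, not the tool's implementation: it assumes a hypothetical directory layout of `<network>/<study>/<subject>/<modality>/...` under the PHOENIX root, keeps records in memory instead of writing them to PostgreSQL, and sums file sizes per (network, study, subject, modality).

```python
from __future__ import annotations

import os
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class FileRecord:
    path: str
    size: int
    modified: datetime
    network: str
    study: str
    subject: str
    modality: str
    file_type: str


def tag_file(root: Path, path: Path) -> FileRecord | None:
    """Tag one file, assuming the hypothetical layout
    <network>/<study>/<subject>/<modality>/... relative to root."""
    parts = path.relative_to(root).parts
    if len(parts) < 5:  # file sits above the modality level; skip it
        return None
    network, study, subject, modality = parts[:4]
    stat = path.stat()
    return FileRecord(
        path=str(path),
        size=stat.st_size,
        modified=datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc),
        network=network,
        study=study,
        subject=subject,
        modality=modality,
        file_type=path.suffix.lstrip(".").lower() or "none",
    )


def crawl(root: Path) -> list[FileRecord]:
    """Walk the whole tree and tag every file that matches the layout."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            record = tag_file(root, Path(dirpath) / name)
            if record is not None:
                records.append(record)
    return records


def summarize(records: list[FileRecord]) -> Counter:
    """Total bytes per (network, study, subject, modality)."""
    totals: Counter = Counter()
    for r in records:
        totals[(r.network, r.study, r.subject, r.modality)] += r.size
    return totals
```

Summing bytes is one possible summary metric; counting files per group would work the same way with `+= 1`.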
This tool is intended to detect when no new data is being added to the PHOENIX file system and to alert the relevant parties.
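The day-over-day delta and the "no new data" check can be illustrated with two summary dictionaries keyed by group. This is a hedged sketch: the key shape, the rule that a delta of zero or less counts as stalled, and the use of a Slack incoming webhook (a JSON POST with a `text` field) are assumptions, not necessarily how this tool sends its alerts.

```python
from __future__ import annotations

import json
import urllib.request


def daily_delta(previous: dict, current: dict) -> dict:
    """Per-group change in total bytes; groups missing on one day count as 0."""
    keys = set(previous) | set(current)
    return {k: current.get(k, 0) - previous.get(k, 0) for k in keys}


def stalled_groups(delta: dict) -> list:
    """Groups with no net growth since the previous day (assumed alert rule)."""
    return sorted(k for k, v in delta.items() if v <= 0)


def format_message(delta: dict) -> str:
    stalled = stalled_groups(delta)
    if not stalled:
        return "All groups received new data since yesterday."
    lines = ["No new data since yesterday for:"]
    lines += [f"- {' / '.join(key)}" for key in stalled]
    return "\n".join(lines)


def post_to_slack(webhook_url: str, text: str) -> None:
    """Send a message through a Slack incoming webhook (assumed delivery method)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        response.read()
```

Because it compares net totals, this check would miss a day where files were replaced by files of identical size; tracking file counts or modification dates alongside bytes would catch that case.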
- Python 3
- Airflow
- PostgreSQL
- Slack