Merge pull request #1 from allenai/opensource
initial commit
pbeukema committed Jul 13, 2023
2 parents f2e0931 + b863022 commit 80ebf1a
Showing 3,425 changed files with 7,262 additions and 0 deletions.
The diff you're trying to view is too large. We only load the first 3000 changed files.
16 changes: 16 additions & 0 deletions .dockerignore
@@ -0,0 +1,16 @@
.git
.gitlab-ci.yml
.dockerignore
dist
Dockerfile
docker-compose.yml
**/.ipynb_checkpoints
**/.pytest_cache
**/__pycache__
**/*.ipynb
**/__pycache__
tests/test_outputs/
tests/test_files/dev
tests/test_files/full_moon
tests/test_files/new_moon
src/feedback_model/vvd_annotations/**
2 changes: 2 additions & 0 deletions .gitattributes
@@ -0,0 +1,2 @@
docs/**
example/sample_response.json
28 changes: 28 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,28 @@
name: Lint (type checking, security, code quality, ruff)

on:
  push:
    branches:
      - "main"
  pull_request:
    branches:
      - "main"

jobs:
  linting:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v3
        with:
          lfs: false

      - name: Set up Python 3.10
        uses: actions/setup-python@v4
        with:
          python-version: 3.10.10

      - name: Linting
        run: |
          pip install pre-commit interrogate
          pre-commit run --all-files
46 changes: 46 additions & 0 deletions .github/workflows/push_image.yml
@@ -0,0 +1,46 @@
name: Create and publish a Docker image

on:
  push:
    branches:

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  build-and-push-image:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          lfs: false

      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Extract metadata (tags, labels) for Docker
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          tags: |
            type=sha,format=short
            type=raw,value=latest,enable={{is_default_branch}}

      - name: Build and push Docker image
        uses: docker/build-push-action@v3
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
53 changes: 53 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,53 @@
name: Run tests

on:
  push:
    branches:
      - "main"
  pull_request:
    branches:
      - "main"

jobs:
  unit-tests:
    runs-on: ubuntu-latest
    env:
      COMPOSE_FILE: docker-compose.yml

    steps:
      - name: Checkout repository
        uses: actions/checkout@v3
        with:
          lfs: true

      - name: Checkout LFS objects for test cases
        run: git lfs checkout

      - name: Log in to the Container registry
        uses: docker/login-action@v2
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build docker images
        env:
          EARTHDATA_TOKEN: "${{ secrets.EARTHDATA_TOKEN }}"
        run: |
          COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker compose build

      - name: Run unit and integration tests
        env:
          EARTHDATA_TOKEN: "${{ secrets.EARTHDATA_TOKEN }}"
        run: |
          docker compose run test pytest --ignore=/src/tests/test_main.py -vv

      - name: Start server and test sample request
        env:
          EARTHDATA_TOKEN: "${{ secrets.EARTHDATA_TOKEN }}"
        run: |
          COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 docker compose -f docker-compose.yml up -d
          sleep 5
          docker ps -a
          docker compose -f docker-compose.yml exec -T test pytest tests/test_main.py -vv
22 changes: 22 additions & 0 deletions .gitignore
@@ -0,0 +1,22 @@
.DS_Store
*.ipynb
.ipynb_checkpoints/
__pycache__/
debug/
.venv/
.vscode/
tests/test_outputs
*nc.aux.xml
dataset/
**/wandb
src/feedback_model/viirs_classifier/**/*.jpeg
single_file_inference.py
creds.json
tests/test_files/dev/
.coverage
docs/data_card.md
tests/.coverage
tests/test_files/chips/
tests/test_files/full_moon/
tests/test_files/new_moon/
src/feedback_images/
77 changes: 77 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,77 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-yaml
      - id: end-of-file-fixer
      - id: trailing-whitespace
      - id: check-json
      - id: mixed-line-ending
      - id: requirements-txt-fixer
      - id: pretty-format-json
        args: ["--autofix"]
      - id: check-case-conflict
      - id: check-docstring-first
      - id: check-added-large-files
      - id: check-ast
      - id: check-byte-order-marker
      - id: check-executables-have-shebangs
      - id: check-merge-conflict
      - id: check-toml
      - id: debug-statements
      - id: detect-aws-credentials
        args: [--allow-missing-credentials]
      - id: detect-private-key
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.1.1
    hooks:
      - id: mypy
        args:
          [
            --install-types,
            --ignore-missing-imports,
            --disallow-untyped-defs,
            --ignore-missing-imports,
            --non-interactive,
          ]
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: detect-private-key
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: check-added-large-files
  - repo: https://github.com/PyCQA/bandit
    rev: "1.7.5"
    hooks:
      - id: bandit
        exclude: ^tests/
        args:
          - -s
          - B101
  - repo: local
    hooks:
      - id: interrogate
        name: interrogate
        language: system
        entry: interrogate
        types: [python]
        args:
          [
            --ignore-init-method,
            --ignore-init-module,
            -p,
            -vv,
            src,
            --exclude,
            src/feedback_model/wandb/,
            --exclude,
            .ipynb_checkpoints/,
            --fail-under=90,
          ]
  - repo: https://github.com/charliermarsh/ruff-pre-commit
    rev: "v0.0.257"
    hooks:
      - id: ruff
        exclude: docs/openapi.json
55 changes: 55 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,55 @@
# Contributing

We want to make contributing to this project as easy and transparent as possible. If you identify a bug or have a feature request, please open an issue. If you discover a new method or an improvement to the models, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement" if you spot a problem but don't have a solution.

## Issues

Please make sure your description is clear and includes enough information to reproduce the issue. The recommended issue format is:

---

#### To Reproduce

`How to reproduce the issue.`

#### Expected behavior

`Expected output.`

#### Environment

`Your environment.`

---

## Developer environment

Test the project:

```bash
pytest .
```

Lint the project:

Linting is required for PRs. Lint with ruff or use the provided pre-commit hooks.

### Pre-commit hooks

Hooks can be installed from `.pre-commit-config.yaml`. For example:

1. `$ pip install pre-commit`
2. `$ pre-commit install`
3. `$ pre-commit run --all-files` (runs every hook, as in the lint workflow)

## Pull Requests

We actively welcome your pull requests.

1. Fork the repo and create your branch from `main`.
2. If you've changed APIs, update the documentation.
3. Ensure the test suite passes (`pytest`); passing tests are required for PRs.
4. Make sure your code lints (`ruff`); this is also required for PRs.

## Coding Style

We use [ruff](https://github.com/astral-sh/ruff) for code style.
20 changes: 20 additions & 0 deletions Dockerfile
@@ -0,0 +1,20 @@
FROM ubuntu:22.04@sha256:67211c14fa74f070d27cc59d69a7fa9aeff8e28ea118ef3babc295a0428a6d21

RUN apt-get update -y
RUN apt-get install ffmpeg libsm6 libxext6 -y

RUN apt-get install libhdf5-serial-dev netcdf-bin libnetcdf-dev -y

RUN apt-get update && apt-get install -y \
python3-pip

COPY requirements/requirements.txt requirements.txt

RUN pip3 install --no-cache-dir --upgrade -r requirements.txt

WORKDIR /src

COPY ./src /src
COPY ./tests /src/tests

CMD ["python3", "main.py"]
54 changes: 54 additions & 0 deletions data.md
@@ -0,0 +1,54 @@

## Data for inference
There are two required datasets for inference: the light intensity data (*DNB_NRT) and supporting data including geolocation, moonlight illumination, and other files used during inference. In addition to these two data sources, there are several optional datasets that improve the quality of the detections: cloud masks (CLDMSK_NRT) and additional bands (MOD_NRT) used for gas flare identification and removal. The DNB and MOD datasets are provided in near real time through [earthdata](https://www.earthdata.nasa.gov/learn/find-data/near-real-time/viirs), and the cloud masks are provided in near real time through [sips-data](https://sips-data.ssec.wisc.edu/nrt/). The URLs for each dataset and satellite are listed below; SUOMI-NPP refers to the NOAA/NASA Suomi National Polar-orbiting Partnership satellite. Note that downloads via the API require a token; register and create one at [earthdata](https://urs.earthdata.nasa.gov/).

| File | SUOMI-NPP | NOAA-20 |
|-------------------------------|-----------------------------------------------------------------------|----------|
| Day/Night Band (DNB) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02DNB_NRT) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ102DNB_NRT) |
| Terrain Corrected Geolocation (DNB) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP03DNB_NRT)| [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ103DNB_NRT)|
| Clear sky confidence | [url](https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_SNPP_NRT) | [url](https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_NOAA20_NRT)|
| Gas Flares Band | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02MOD_NRT/) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ102MOD_NRT/)|
| Terrain Corrected Geolocation (MOD) | [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP03MOD_NRT/)| [url](https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ103DNB_NRT/)|
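
If you script the downloads, the required near-real-time endpoints from the table can be kept in a small lookup. The dictionary below is only an illustration (its layout is not part of this repo); the URLs are copied from the table above:

```python
# Required NRT endpoints per satellite (URLs taken from the table above).
NRT_ENDPOINTS = {
    "SUOMI-NPP": {
        "dnb": "https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02DNB_NRT",
        "geolocation_dnb": "https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP03DNB_NRT",
        "clear_sky_confidence": "https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_SNPP_NRT",
    },
    "NOAA-20": {
        "dnb": "https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ102DNB_NRT",
        "geolocation_dnb": "https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VJ103DNB_NRT",
        "clear_sky_confidence": "https://sips-data.ssec.wisc.edu/nrt/CLDMSK_L2_VIIRS_NOAA20_NRT",
    },
}
```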

## Downloading data
1. Register an account on earthdata and download a token: https://www.earthdata.nasa.gov/learn/find-data
2. Set this token in your environment, e.g. `export EARTHDATA_TOKEN=$DOWNLOADED_TOKEN`
3. Download data for each `img_path` (DNB, geolocation data, and cloud masks are required with the default configuration on and around full moons):
```python
import os
from src import utils

TOKEN = f"{os.environ.get('EARTHDATA_TOKEN')}"
with open(dnb_path, "w+b") as fh:  # dnb_path: local destination file
    utils.download_url(img_path, TOKEN, fh)  # img_path: remote granule URL
```
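
If you prefer not to use the repo's helper, a standalone sketch is below. It assumes the NRT archive accepts the Earthdata token as a standard bearer token and that `requests` is available; the granule URL and output filename are placeholders, not values from this repo:

```python
# Hedged sketch only: bearer-token auth is assumed; img_url and the output
# filename are placeholders to be filled in with a real granule path.
import os

import requests

img_url = "https://nrt3.modaps.eosdis.nasa.gov/archive/allData/5200/VNP02DNB_NRT/..."  # fill in a granule path
resp = requests.get(
    img_url,
    headers={"Authorization": f"Bearer {os.environ['EARTHDATA_TOKEN']}"},
    stream=True,
    timeout=120,
)
resp.raise_for_status()
with open("granule.nc", "wb") as fh:
    for chunk in resp.iter_content(chunk_size=1 << 20):  # 1 MiB chunks
        fh.write(chunk)
```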
Sample data can be found in the `test_files` directory. The example requests reference data within `test_files`.

## API documentation
The API schema is automatically generated from `src.utils.autogen_api_schema`. The schema is written to `docs/openapi.json` (open it in an OpenAPI editor such as Swagger: https://editor.swagger.io/). Documentation and additional examples are available at http://0.0.0.0:5555/redoc after starting the server. Example data is located in `test_files`.

To regenerate the schema:
```bash
python -c 'from src import utils; utils.autogen_api_schema()'
```

## Tuning the model
Parameters are defined in `src/config/config.yml`. Within that config, there are inline comments for the most important parameters, along with recommended ranges for tuning those values toward higher precision or higher recall.

By default, the model filters out a variety of light sources and image artifacts that cause false positive detections. These filters are defined in the pipeline section of the config and can be turned on or off there. Out of the box there are filters for aurora-lit clouds, moonlit clouds, image artifacts (bowtie/noise smiles, edge noise), near-shore detections, non-max suppression, lightning, and gas flares.
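
As an illustration only, toggling a filter programmatically might look like the sketch below; the key names (`pipeline`, `moonlit_clouds`) are assumptions and should be checked against the actual layout of `src/config/config.yml`:

```python
# Hedged sketch: key names are hypothetical, not taken from config.yml.
import yaml

with open("src/config/config.yml") as f:
    cfg = yaml.safe_load(f)

cfg["pipeline"]["moonlit_clouds"] = False  # hypothetical key for the moonlit-cloud filter

with open("src/config/config.yml", "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)
```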

## Generate a labeled dataset
There are two types of training datasets. The first contains bounding box annotations for each detection in a frame. The second contains image-level labels (crops of detected vessels) for training the supervised CNN referenced in `src/postprocessor`.

To generate a new object detection dataset:

1. Create account at https://nrt3.modaps.eosdis.nasa.gov/
2. Download earthdata token by clicking on profile icon and "Download token"
3. Build and run the docker container with an optional mounted volume:
```bash
docker run -d -m="50g" --cpus=120 --mount type=bind,source="$(pwd)"/target,target=/src/raw_data skylight-vvd-service:latest
```
4. Set this token in your environment, e.g. `export EARTHDATA_TOKEN=YOUR_DOWNLOADED_TOKEN_FROM_STEP_2`
5. Annotate the data from within the docker container using `python src/gen_object_detection_dataset.py`


To generate a new image label dataset:

1. Use `src/gen_image_labeled_dataset.py`. Sample imagery to train the feedback model is contained within the `feedback_model/viirs_classifier` folder.

Note that a sample dataset of ~1000 detections (<1 GB) has been provided within this repository.
14 changes: 14 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,14 @@
version: "3.9"
services:
  test:
    hostname: vvd-test
    environment:
      - EARTHDATA_TOKEN=${EARTHDATA_TOKEN}
    build:
      context: .
      dockerfile: Dockerfile
    extra_hosts:
      - "host.docker.internal:host-gateway"
    stdin_open: true
    ports:
      - 5555:5555