Aloception-oss is a set of packages for computer vision built on top of popular deep learning libraries: PyTorch and PyTorch Lightning.
Aloscene extends the use of tensors with Augmented Tensors, designed to facilitate the use of computer vision data (such as frames, 2D boxes, 3D boxes, optical flow, disparity, camera parameters...).
frame = aloscene.Frame("/path/to/image.jpg")
frame = frame.to("cpu")
frame.get_view().render()
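To give an intuition of what an augmented tensor brings, here is a minimal pure-Python sketch. It is not aloscene's implementation: `ToyFrame` and its fields are hypothetical names used only for illustration. It shows the core idea that labels (here 2D boxes) stay attached to the data, so one transformation updates both.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFrame:
    """Hypothetical stand-in for an augmented tensor: pixel data plus attached labels."""

    pixels: list  # nested list indexed as [row][column]
    names: tuple = ("H", "W")  # dimension names travel with the data
    boxes2d: list = field(default_factory=list)  # (x_min, y_min, x_max, y_max) in pixels

    @property
    def width(self) -> int:
        return len(self.pixels[0])

    def hflip(self) -> "ToyFrame":
        """Flip the image horizontally and mirror every attached box with it."""
        flipped_pixels = [row[::-1] for row in self.pixels]
        w = self.width
        flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in self.boxes2d]
        return ToyFrame(flipped_pixels, self.names, flipped_boxes)


frame = ToyFrame(pixels=[[0, 1, 2, 3], [4, 5, 6, 7]], boxes2d=[(0, 0, 1, 2)])
flipped = frame.hflip()
print(flipped.pixels)   # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(flipped.boxes2d)  # [(3, 0, 4, 2)]
```

With plain tensors, the boxes would have to be flipped by hand in a separate step; keeping them on the frame removes that bookkeeping.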
Alodataset implements ready-to-use datasets for computer vision with the help of aloscene and augmented tensors to make it easier to transform and display your vision data.
coco_dataset = alodataset.CocoBaseDataset(sample=True)
for frame in coco_dataset.stream_loader():
    frame.get_view().render()
Alonet integrates several promising computer vision architectures. You can use it for research purposes or to finetune and deploy your model using TensorRT. Alonet is mainly built on top of PyTorch Lightning with the help of aloscene and alodataset.
Training
# Init the training pipeline
detr = alonet.detr.LitDetr()
# Init the data module
coco_loader = alonet.detr.CocoDetection2Detr()
# Run the training using the two components
detr.run_train(data_loader=coco_loader, project="detr", expe_name="test_experiment")
Inference
# Load model
model = alonet.detr.DetrR50(num_classes=91, weights="detr-r50").eval()
# Open and normalize the frame
frame = aloscene.Frame("/path/to/image.jpg").norm_resnet()
# Run inference
pred_boxes = model.inference(model([frame]))
# Add and display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view().render()
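For readers wondering what happens between the raw model output and the displayed boxes, here is a rough pure-Python sketch of DETR-style post-processing. It is an illustration under assumptions, not the code behind `model.inference`: the function names, the 0.5 threshold and the convention that the last logit is the background ("no object") class are all hypothetical choices made for this example.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a normalized (cx, cy, w, h) box to absolute (x1, y1, x2, y2) pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * image_width,
        (cy - h / 2) * image_height,
        (cx + w / 2) * image_width,
        (cy + h / 2) * image_height,
    )


def keep_detections(logits_per_query, boxes_per_query, image_width, image_height, threshold=0.5):
    """Keep queries whose best class is not the background and scores above the threshold."""
    kept = []
    for logits, box in zip(logits_per_query, boxes_per_query):
        probs = softmax(logits)
        background_index = len(probs) - 1  # assumption: last logit means "no object"
        label = max(range(len(probs)), key=probs.__getitem__)
        if label != background_index and probs[label] >= threshold:
            kept.append((label, probs[label], cxcywh_to_xyxy(box, image_width, image_height)))
    return kept


# Three object queries: a confident object, a background query, a low-confidence object
logits = [[4.0, 0.0, 0.0], [0.0, 0.0, 5.0], [0.1, 0.0, 0.0]]
boxes = [(0.5, 0.5, 0.25, 0.5), (0.1, 0.1, 0.1, 0.1), (0.3, 0.3, 0.2, 0.2)]
detections = keep_detections(logits, boxes, image_width=100, image_height=200)
print(len(detections))   # 1
print(detections[0][2])  # (37.5, 50.0, 62.5, 150.0)
```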
You can use aloscene independently of the two other packages to handle computer vision data, or to improve your training pipelines with augmented tensors.
docker build -t aloception-oss:cuda-11.3-pytorch1.13.1-lightning1.9.3 .
docker run -e LOCAL_USER_ID=$(id -u) --gpus all -it -v /YOUR/WORKSPACE/:/workspace --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix aloception-oss:cuda-11.3-pytorch1.13.1-lightning1.9.3
Or, without building the image:
docker run -e LOCAL_USER_ID=$(id -u) --gpus all -it -v /YOUR/WORKSPACE/:/workspace --privileged -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix visualbehaviorofficial/aloception-oss:cuda-11.3-pytorch1.13.1-lightning1.9.3
You first need to install PyTorch 1.10.1 based on your hardware and environment configuration. Please refer to the pytorch website for this installation.
Once this is done, you can run:
pip install git+https://github.com/Visual-Behavior/aloception-oss/
Alternatively, you can clone the repository and use:
pip install -e aloception-oss/
Or set up the repository yourself in your environment and install the dependencies:
pip install -r requirements.txt
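After installing, a quick way to see which versions ended up in your environment is to query the package metadata. This is only a convenience sketch using the standard library; the distribution name `aloception` is an assumption and may differ from the name actually registered by the installer.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# "aloception" is an assumed distribution name; adjust it if pip reports another one
for name in ("torch", "pytorch-lightning", "aloception"):
    version = installed_version(name)
    print(f"{name}: {version if version is not None else 'not installed'}")
```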
- Getting started
- Aloscene: Computer vision with ease
- Alodataset: Loading your vision datasets
- Alonet: Loading & training your models
- About augmented tensors
- How to setup your data?
- Training Detr
- Finetuning DETR
- Training Panoptic Head
- Training Deformable DETR
- Finetuning Deformable DETR
- Exporting DETR / Deformable-DETR to TensorRT
Model name | Link | alonet location | Learn more |
---|---|---|---|
detr-r50 | https://arxiv.org/abs/2005.12872 | alonet.detr.DetrR50 | Detr |
deformable-detr | https://arxiv.org/abs/2010.04159 | alonet.deformable_detr.DeformableDETR | Deformable detr |
RAFT | https://arxiv.org/abs/2003.12039 | alonet.raft.RAFT | RAFT |
detr-r50-panoptic | https://arxiv.org/abs/2005.12872 | alonet.detr_panoptic.PanopticHead | DetrPanoptic |
Here is a simple example to get started with Detr and aloception. To learn more about Detr, you can check out the Tutorials or the detr README.
# Load model
model = alonet.detr.DetrR50(num_classes=91, weights="detr-r50").eval()
# Open and normalize the frame
frame = aloscene.Frame("/path/to/image.jpg").norm_resnet()
# Run inference
pred_boxes = model.inference(model([frame]))
# Add and display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view().render()
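The `norm_resnet()` call above prepares the frame for a ResNet backbone. As a rough sketch of what such a normalization usually means, the snippet below applies the standard ImageNet mean and standard deviation to a single RGB pixel. That aloscene uses exactly these statistics is an assumption, and `normalize_pixel` / `denormalize_pixel` are hypothetical helper names.

```python
# Standard ImageNet channel statistics used by torchvision ResNet weights
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then subtract the mean and divide by the std."""
    return tuple(
        (value / 255.0 - mean) / std
        for value, mean, std in zip(rgb_uint8, IMAGENET_MEAN, IMAGENET_STD)
    )


def denormalize_pixel(normalized):
    """Invert normalize_pixel and round back to 8-bit integer values."""
    return tuple(
        round((value * std + mean) * 255.0)
        for value, mean, std in zip(normalized, IMAGENET_MEAN, IMAGENET_STD)
    )


pixel = (124, 116, 104)  # close to the ImageNet mean color
normalized = normalize_pixel(pixel)
print(normalized)                     # three values close to 0
print(denormalize_pixel(normalized))  # (124, 116, 104)
```

Feeding a model frames that were normalized differently from its training data typically degrades predictions, which is why the normalization is applied explicitly before inference.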
Here is a simple example to get started with Deformable Detr and aloception. To learn more about Deformable Detr, you can check out the Tutorials or the deformable detr README.
# Loading Deformable model
model = alonet.deformable_detr.DeformableDetrR50(num_classes=91, weights="deformable-detr-r50").eval()
# Open and normalize the frame, then send it to the device
frame = aloscene.Frame("/path/to/image.jpg").norm_resnet().to(torch.device("cuda"))
# Run inference
pred_boxes = model.inference(model([frame]))
# Add and display the predicted boxes
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.get_view().render()
Here is a simple example to get started with RAFT and aloception. To learn more about RAFT, you can check out the raft README.
# Use the left frame from the Sintel Flow dataset and normalize the frame for the RAFT Model
frame = alodataset.SintelFlowDataset(sample=True).getitem(0)["left"].norm_minmax_sym()
# Load the model using the sintel weights
raft = alonet.raft.RAFT(weights="raft-sintel")
# Compute optical flow
padder = alonet.raft.utils.Padder()
flow = raft.inference(raft(padder.pad(frame[0:1]), padder.pad(frame[1:2])))
# Render the flow along with the first frame
flow[0].get_view().render()
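RAFT computes features at 1/8 of the input resolution, so frames are padded until both sides are divisible by 8. The sketch below shows one way to compute such padding. It is not the code of `alonet.raft.utils.Padder`: splitting the padding evenly on both sides is an assumption, and `pad_amounts` / `unpad_slices` are hypothetical helper names.

```python
def pad_amounts(height, width, multiple=8):
    """Return (left, right, top, bottom) padding making both sides divisible by `multiple`."""
    pad_h = (-height) % multiple
    pad_w = (-width) % multiple
    return (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2)


def unpad_slices(height, width, multiple=8):
    """Return the row and column slices that recover the original area from a padded frame."""
    left, _, top, _ = pad_amounts(height, width, multiple)
    return slice(top, top + height), slice(left, left + width)


# Sintel frames are 436 x 1024: only the height needs padding (436 -> 440)
print(pad_amounts(436, 1024))   # (0, 0, 2, 2)
print(unpad_slices(436, 1024))  # (slice(2, 438, None), slice(0, 1024, None))
```

The same slices can be applied to the predicted flow to crop it back to the original frame size.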
Here is a simple example to get started with PanopticHead and aloception. To learn more about PanopticHead, you can check out the panoptic README.
# Open and normalize the frame
frame = aloscene.Frame("/path/to/image.jpg").norm_resnet()
# Load the model using pre-trained weights
detr_model = alonet.detr.DetrR50(num_classes=250, background_class=250)
model = alonet.detr_panoptic.PanopticHead(DETR_module=detr_model, weights="detr-r50-panoptic")
# Run inference
pred_boxes, pred_masks = model.inference(model([frame]))
# Add and display the predicted boxes/masks
frame.append_boxes2d(pred_boxes[0], "pred_boxes")
frame.append_segmentation(pred_masks[0], "pred_masks")
frame.get_view().render()
Here is a list of all the datasets you can use with Aloception. If your dataset is not in the list but is important for computer vision, please let us know using the issues, or feel free to contribute.
Dataset name | alodataset location | To try |
---|---|---|
CocoDetection | alodataset.CocoBaseDataset | python alodataset/coco_base_dataset.py |
CocoPanoptic | alodataset.CocoPanopticDataset | python alodataset/coco_panoptic_dataset.py |
CrowdHuman | alodataset.CrowdHumanDataset | python alodataset/crowd_human_dataset.py |
Waymo | alodataset.WaymoDataset | python alodataset/waymo_dataset.py |
ChairsSDHom | alodataset.ChairsSDHomDataset | python alodataset/chairssdhom_dataset.py |
FlyingThings3DSubset | alodataset.FlyingThings3DSubsetDataset | python alodataset/flyingthings3D_subset_dataset.py |
FlyingChairs2 | alodataset.FlyingChairs2Dataset | python alodataset/flying_chairs2_dataset.py |
SintelDisparityDataset | alodataset.SintelDisparityDataset | python alodataset/sintel_disparity_dataset.py |
SintelFlowDataset | alodataset.SintelFlowDataset | python alodataset/sintel_flow_dataset.py |
MOT17 | alodataset.Mot17 | python alodataset/mot17.py |
python -m pytest
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.