From e4bc65b25d24ae82a8ff1d462aab9a809d704974 Mon Sep 17 00:00:00 2001
From: Niels
Date: Thu, 25 Jan 2024 13:04:25 +0100
Subject: [PATCH 1/6] Add resource

---
 docs/source/en/model_doc/depth_anything.md | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/docs/source/en/model_doc/depth_anything.md b/docs/source/en/model_doc/depth_anything.md
index adf1ca4639c583..39abb9a7b7921a 100644
--- a/docs/source/en/model_doc/depth_anything.md
+++ b/docs/source/en/model_doc/depth_anything.md
@@ -94,6 +94,17 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
 >>> depth = Image.fromarray(formatted)
 ```
 
+## Resources
+
+A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Depth Anything.
+
+<PipelineTag pipeline="depth-estimation"/>
+
+- A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
+- [Monocular depth estimation task guide](../tasks/depth_estimation)
+
+If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+
 ## DepthAnythingConfig
 
 [[autodoc]] DepthAnythingConfig

From fd1c346612322fa3138605a032a3a00fecd31c26 Mon Sep 17 00:00:00 2001
From: Niels
Date: Thu, 25 Jan 2024 13:12:53 +0100
Subject: [PATCH 2/6] Add more resources

---
 docs/source/en/model_doc/sam.md    | 14 +++++++++++---
 docs/source/en/model_doc/siglip.md | 11 +++++++++++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/docs/source/en/model_doc/sam.md b/docs/source/en/model_doc/sam.md
index e4ef59683be49f..feace522ef70be 100644
--- a/docs/source/en/model_doc/sam.md
+++ b/docs/source/en/model_doc/sam.md
@@ -94,12 +94,20 @@ masks = processor.image_processor.post_process_masks(
 scores = outputs.iou_scores
 ```
 
-Resources:
+## Resources
+
+A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SAM.
 
 - [Demo notebook](https://github.com/huggingface/notebooks/blob/main/examples/segment_anything.ipynb) for using the model.
 - [Demo notebook](https://github.com/huggingface/notebooks/blob/main/examples/automatic_mask_generation.ipynb) for using the automatic mask generation pipeline.
-- [Demo notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Run_inference_with_MedSAM_using_HuggingFace_Transformers.ipynb) for inference with MedSAM, a fine-tuned version of SAM on the medical domain.
-- [Demo notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Fine_tune_SAM_(segment_anything)_on_a_custom_dataset.ipynb) for fine-tuning the model on custom data.
+- [Demo notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Run_inference_with_MedSAM_using_HuggingFace_Transformers.ipynb) for inference with MedSAM, a version of SAM fine-tuned for the medical domain. 🌎
+- [Demo notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SAM/Fine_tune_SAM_(segment_anything)_on_a_custom_dataset.ipynb) for fine-tuning the model on custom data. 🌎
+
+## SlimSAM
+
+SlimSAM, a pruned version of SAM, was proposed in [0.1% Data Makes Segment Anything Slim](https://arxiv.org/abs/2312.05284) by Zigeng Chen et al. SlimSAM reduces the size of the SAM models considerably while maintaining the same performance.
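+
+Since SlimSAM keeps the SAM architecture, its checkpoints load through the existing SAM classes. A minimal sketch, assuming one of the SlimSAM checkpoints from the hub link below:
+
+```python
+from transformers import SamModel, SamProcessor
+
+# a SlimSAM checkpoint loads exactly like a regular SAM checkpoint;
+# the checkpoint name is illustrative, see the hub for available ones
+model = SamModel.from_pretrained("Zigeng/SlimSAM-uniform-50")
+processor = SamProcessor.from_pretrained("Zigeng/SlimSAM-uniform-50")
+```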
+
+Checkpoints can be found on the [hub](https://huggingface.co/models?other=slimsam), and they can be used as a drop-in replacement for SAM.
 
 ## SamConfig
 
diff --git a/docs/source/en/model_doc/siglip.md b/docs/source/en/model_doc/siglip.md
index 28f96b02f1faf2..cfbc9566517ef6 100644
--- a/docs/source/en/model_doc/siglip.md
+++ b/docs/source/en/model_doc/siglip.md
@@ -94,6 +94,17 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
 31.9% that image 0 is 'a photo of 2 cats'
 ```
 
+## Resources
+
+A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SigLIP.
+
+<PipelineTag pipeline="zero-shot-image-classification"/>
+
+- Demo notebooks for SigLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SigLIP). 🌎
+- [Zero-shot image classification task guide](../tasks/zero_shot_image_classification)
+
+If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
+
 ## SiglipConfig
 
 [[autodoc]] SiglipConfig

From c3e5868fca2d9bc1fe309eb76b95c179a5fca8f8 Mon Sep 17 00:00:00 2001
From: Niels
Date: Mon, 5 Feb 2024 14:39:04 +0100
Subject: [PATCH 3/6] Add resources

---
 docs/source/en/model_doc/patchtsmixer.md | 10 +++++++---
 docs/source/en/model_doc/patchtst.md     |  3 +++
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/docs/source/en/model_doc/patchtsmixer.md b/docs/source/en/model_doc/patchtsmixer.md
index fe1de509fd0000..a67138e533b71a 100644
--- a/docs/source/en/model_doc/patchtsmixer.md
+++ b/docs/source/en/model_doc/patchtsmixer.md
@@ -28,14 +28,14 @@ The abstract from the paper is the following:
 
 *TSMixer is a lightweight neural architecture exclusively composed of multi-layer perceptron (MLP) modules designed for multivariate forecasting and representation learning on patched time series. Our model draws inspiration from the success of MLP-Mixer models in computer vision. We demonstrate the challenges involved in adapting Vision MLP-Mixer for time series and introduce empirically validated components to enhance accuracy. This includes a novel design paradigm of attaching online reconciliation heads to the MLP-Mixer backbone, for explicitly modeling the time-series properties such as hierarchy and channel-correlations. We also propose a Hybrid channel modeling approach to effectively handle noisy channel interactions and generalization across diverse datasets, a common challenge in existing patch channel-mixing methods. Additionally, a simple gated attention mechanism is introduced in the backbone to prioritize important features. By incorporating these lightweight components, we significantly enhance the learning capability of simple MLP structures, outperforming complex Transformer models with minimal computing usage. Moreover, TSMixer's modular design enables compatibility with both supervised and masked self-supervised learning methods, making it a promising building block for time-series Foundation Models. TSMixer outperforms state-of-the-art MLP and Transformer models in forecasting by a considerable margin of 8-60%. It also outperforms the latest strong benchmarks of Patch-Transformer models (by 1-2%) with a significant reduction in memory and runtime (2-3X).*
 
-
-
 This model was contributed by [ajati](https://huggingface.co/ajati), [vijaye12](https://huggingface.co/vijaye12), [gsinthong](https://huggingface.co/gsinthong), [namctin](https://huggingface.co/namctin), [wmgifford](https://huggingface.co/wmgifford), [kashif](https://huggingface.co/kashif).
 
+## Usage example
+
+The code snippet below shows how to randomly initialize a PatchTSMixer model. The model is compatible with the [Trainer API](../trainer.md).
 
-## Sample usage
 
 ```python
 from transformers import PatchTSMixerConfig, PatchTSMixerForPrediction
 
@@ -55,6 +55,10 @@ results = trainer.evaluate(test_dataset)
 
 The model can also be used for time series classification and time series regression. See the respective
 [`PatchTSMixerForTimeSeriesClassification`] and [`PatchTSMixerForRegression`] classes.
 
+## Resources
+
+- A blog post explaining PatchTSMixer in depth can be found [here](https://huggingface.co/blog/patchtsmixer). The blog post can also be opened in Google Colab.
+
 ## PatchTSMixerConfig
 
 [[autodoc]] PatchTSMixerConfig
diff --git a/docs/source/en/model_doc/patchtst.md b/docs/source/en/model_doc/patchtst.md
index a6b8396a286b8c..544e4cb378c6df 100644
--- a/docs/source/en/model_doc/patchtst.md
+++ b/docs/source/en/model_doc/patchtst.md
@@ -34,6 +34,9 @@ This model was contributed by [namctin](https://huggingface.co/namctin), [gsinth
 The model can also be used for time series classification and time series regression. See the respective
 [`PatchTSTForClassification`] and [`PatchTSTForRegression`] classes.
 
+## Resources
+
+- A blog post explaining PatchTST in depth can be found [here](https://huggingface.co/blog/patchtst). The blog post can also be opened in Google Colab.
 
 ## PatchTSTConfig

From 5c28af38e433da8130891a53bbfc62417d7ad46c Mon Sep 17 00:00:00 2001
From: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Date: Mon, 5 Feb 2024 20:47:26 +0100
Subject: [PATCH 4/6] Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
---
 docs/source/en/model_doc/depth_anything.md | 2 +-
 docs/source/en/model_doc/siglip.md         | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/source/en/model_doc/depth_anything.md b/docs/source/en/model_doc/depth_anything.md
index 39abb9a7b7921a..d7a2f62f86255a 100644
--- a/docs/source/en/model_doc/depth_anything.md
+++ b/docs/source/en/model_doc/depth_anything.md
@@ -100,8 +100,8 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 <PipelineTag pipeline="depth-estimation"/>
 
-- A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
 - [Monocular depth estimation task guide](../tasks/depth_estimation)
+- A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
 
 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.
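+
+For a quick start without a notebook, Depth Anything also works through the [`pipeline`] API. A minimal sketch (the checkpoint name is one of the Depth Anything checkpoints on the hub, and the image path is a placeholder):
+
+```python
+from transformers import pipeline
+
+# the image path is a placeholder; a URL or PIL image works as well
+pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
+result = pipe("path/to/image.jpg")
+depth = result["depth"]  # PIL image containing the predicted depth map
+```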
diff --git a/docs/source/en/model_doc/siglip.md b/docs/source/en/model_doc/siglip.md
index cfbc9566517ef6..afc9476c3378d0 100644
--- a/docs/source/en/model_doc/siglip.md
+++ b/docs/source/en/model_doc/siglip.md
@@ -100,8 +100,8 @@ A list of official Hugging Face and community (indicated by 🌎) resources to h
 
 <PipelineTag pipeline="zero-shot-image-classification"/>
 
-- Demo notebooks for SigLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SigLIP). 🌎
 - [Zero-shot image classification task guide](../tasks/zero_shot_image_classification)
+- Demo notebooks for SigLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SigLIP). 🌎
 
 If you're interested in submitting a resource to be included here, please feel free to open a Pull Request and we'll review it! The resource should ideally demonstrate something new instead of duplicating an existing resource.

From acd18550d40ff95e4a514513b851080655cfbe31 Mon Sep 17 00:00:00 2001
From: Niels
Date: Mon, 5 Feb 2024 20:50:03 +0100
Subject: [PATCH 5/6] Remove mention

---
 docs/source/en/model_doc/whisper.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/docs/source/en/model_doc/whisper.md b/docs/source/en/model_doc/whisper.md
index e384d2be908c0b..138f2b374bf347 100644
--- a/docs/source/en/model_doc/whisper.md
+++ b/docs/source/en/model_doc/whisper.md
@@ -31,7 +31,6 @@ The original code can be found [here](https://github.com/openai/whisper).
 
 - The model usually performs well without requiring any finetuning.
 - The architecture follows a classic encoder-decoder architecture, which means that it relies on the [`~generation.GenerationMixin.generate`] function for inference.
-- Inference is currently only implemented for short-form i.e. audio is pre-segmented into <=30s segments. Long-form (including timestamps) will be implemented in a future release.
 - One can use [`WhisperProcessor`] to prepare audio for the model, and decode the predicted IDs back into text.
 - To convert the model and the processor, we recommend using the following:

From cc0510bdb014d5857144b7daa0b21c8934e3183f Mon Sep 17 00:00:00 2001
From: Niels
Date: Thu, 15 Feb 2024 08:50:50 +0100
Subject: [PATCH 6/6] Remove pipeline tags

---
 docs/source/en/model_doc/depth_anything.md | 2 --
 docs/source/en/model_doc/siglip.md         | 2 --
 2 files changed, 4 deletions(-)

diff --git a/docs/source/en/model_doc/depth_anything.md b/docs/source/en/model_doc/depth_anything.md
index d7a2f62f86255a..99332697b38ef2 100644
--- a/docs/source/en/model_doc/depth_anything.md
+++ b/docs/source/en/model_doc/depth_anything.md
@@ -98,8 +98,6 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
 
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with Depth Anything.
 
-<PipelineTag pipeline="depth-estimation"/>
-
 - [Monocular depth estimation task guide](../tasks/depth_estimation)
 - A notebook showcasing inference with [`DepthAnythingForDepthEstimation`] can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Depth%20Anything/Predicting_depth_in_an_image_with_Depth_Anything.ipynb). 🌎
 
diff --git a/docs/source/en/model_doc/siglip.md b/docs/source/en/model_doc/siglip.md
index 8b488f8fb27482..c6db0441e7a694 100644
--- a/docs/source/en/model_doc/siglip.md
+++ b/docs/source/en/model_doc/siglip.md
@@ -98,8 +98,6 @@ If you want to do the pre- and postprocessing yourself, here's how to do that:
 
 A list of official Hugging Face and community (indicated by 🌎) resources to help you get started with SigLIP.
 
-<PipelineTag pipeline="zero-shot-image-classification"/>
-
 - [Zero-shot image classification task guide](../tasks/zero_shot_image_classification)
 - Demo notebooks for SigLIP can be found [here](https://github.com/NielsRogge/Transformers-Tutorials/tree/master/SigLIP). 🌎