
Add Prompt Depth Anything Model #35401

Merged

merged 61 commits into huggingface:main on Mar 20, 2025

Conversation

haotongl
Contributor

What does this PR do?

This PR adds the Prompt Depth Anything Model. Prompt Depth Anything builds upon Depth Anything V2 and incorporates metric prompt depth to enable accurate and high-resolution metric depth estimation.

The implementation leverages Modular Transformers. The main file can be found here.

Before submitting

@haotongl
Contributor Author

haotongl commented Dec 24, 2024

@NielsRogge @qubvel @pcuenca Could you help review this PR when you have some time? Thanks so much in advance! Let me know if you have any questions or suggestions. 😊

@qubvel
Member

qubvel commented Dec 24, 2024

Hi @haotongl! Thanks for working on the model integration to transformers 🤗 I'm on holidays until Jan 3rd, and I'll do a review after that if it's still necessary.

@haotongl haotongl requested review from NielsRogge and xenova January 2, 2025 16:57
@haotongl
Contributor Author

haotongl commented Jan 3, 2025

Hi, @xenova @NielsRogge ! All suggestions have been addressed. Could you please take another look and provide any further suggestions, or go ahead and merge this PR? Thanks!

@qubvel
Member

qubvel commented Feb 4, 2025

Yes, only members can run slow tests


github-actions bot commented Feb 4, 2025

This comment contains run-slow, running the specified jobs: ['models/prompt_depth_anything'] ...

@qubvel
Member

qubvel commented Feb 4, 2025

run-slow: prompt_depth_anything


github-actions bot commented Feb 4, 2025

This comment contains run-slow, running the specified jobs: ['models/prompt_depth_anything'] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@haotongl
Contributor Author

haotongl commented Feb 4, 2025

run-slow: prompt_depth_anything

Thank you! All checks have passed. @qubvel

@qubvel
Member

qubvel commented Feb 4, 2025

Thanks for fixing the tests! Waiting for the final review from @ArthurZucker

@qubvel
Member

qubvel commented Feb 14, 2025

Friendly ping @ArthurZucker 🤗

@haotongl
Contributor Author

@ArthurZucker Hey, just a friendly ping to see if you’ve had a chance to look at the PR. Appreciate your help when you get a moment!

@ArthurZucker
Collaborator

Super sorry, I will get to this soon 😭

Collaborator

@ArthurZucker ArthurZucker left a comment

Kudos it's super clean 🤗
Thanks a lot @haotongl and @qubvel
Mostly missing a licence and good to go!

Comment on lines +67 to +71
>>> # visualize the prediction
>>> predicted_depth = post_processed_output[0]["predicted_depth"]
>>> depth = predicted_depth * 1000
>>> depth = depth.detach().cpu().numpy()
>>> depth = Image.fromarray(depth.astype("uint16")) # mm
Collaborator

should this go in image_processor.visualize_depth_estimation directly? 🤗
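As an aside, here is a minimal sketch of what such a helper could look like. The name `visualize_depth_estimation` is taken from the review comment and is hypothetical — it is not part of the transformers image processor API; the body just mirrors the docstring snippet quoted above:

```python
import numpy as np
import torch
from PIL import Image


def visualize_depth_estimation(predicted_depth: torch.Tensor, scale: float = 1000.0) -> Image.Image:
    """Hypothetical helper: convert a metric depth map (meters) into a
    16-bit image in millimeters, as in the docstring example."""
    depth_mm = (predicted_depth * scale).detach().cpu().numpy()
    return Image.fromarray(depth_mm.astype("uint16"))


# Synthetic 2x2 depth map in meters, just to show the conversion.
img = visualize_depth_estimation(torch.tensor([[1.0, 2.0], [0.5, 3.0]]))
print(np.array(img).tolist())  # [[1000, 2000], [500, 3000]]
```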

}


ORIGINAL_TO_CONVERTED_KEY_MAPPING = {
Collaborator

nice thanks! 🤗

Comment on lines 53 to 55
if is_vision_available():
pass

Collaborator

Suggested change
if is_vision_available():
pass

@@ -0,0 +1,379 @@
from typing import List, Optional, Tuple, Union
Collaborator

this file is missing a licence! 🤗

@ArthurZucker
Collaborator

A few small conflicts to fix

@haotongl
Contributor Author

Kudos it's super clean 🤗 Thanks a lot @haotongl and @qubvel Mostly missing a licence and good to go!

Thanks for your reviews! I am working on these suggestions.

@qubvel qubvel self-requested a review March 20, 2025 15:51
@qubvel
Member

qubvel commented Mar 20, 2025

run-slow: prompt_depth_anything


This comment contains run-slow, running the specified jobs:

models: ['models/prompt_depth_anything']
quantizations: [] ...

@qubvel qubvel merged commit 6515c25 into huggingface:main Mar 20, 2025
23 of 24 checks passed
@haotongl
Contributor Author

Thanks for your support! @qubvel @ArthurZucker

@qubvel
Member

qubvel commented Mar 20, 2025

Thanks for adding the model @haotongl! Can you please update model cards in HF repos to include the transformers code snippet?

@haotongl
Contributor Author

haotongl commented Mar 21, 2025

Thanks for adding the model @haotongl! Can you please update model cards in HF repos to include the transformers code snippet?

@qubvel Yeah, could you please provide some example repos for reference?

@qubvel
Member

qubvel commented Mar 21, 2025

Sure, here is an example model card for another depth estimation model: https://huggingface.co/apple/DepthPro-hf
You can make something similar to the model doc page: https://huggingface.co/docs/transformers/main/en/model_doc/prompt_depth_anything

@haotongl
Contributor Author

@qubvel Thanks for your help! Should I open a new PR to update the doc?

@qubvel
Member

qubvel commented Mar 21, 2025

I mean for these models:

https://huggingface.co/depth-anything/prompt-depth-anything-vits-hf
https://huggingface.co/depth-anything/prompt-depth-anything-vitl-hf
https://huggingface.co/depth-anything/prompt-depth-anything-vits-transparent-hf

The model cards don't include a transformers library code snippet yet, but it should be similar to the one in the docs:

import torch
import requests
import numpy as np

from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/image.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("depth-anything/prompt-depth-anything-vits-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/prompt-depth-anything-vits-hf")

prompt_depth_url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/arkit_depth.png?raw=true"
prompt_depth = Image.open(requests.get(prompt_depth_url, stream=True).raw)
# the prompt depth can be None, and the model will output a monocular relative depth.

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt", prompt_depth=prompt_depth)

with torch.no_grad():
    outputs = model(**inputs)

# interpolate to original size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

# visualize the prediction
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = predicted_depth * 1000 
depth = depth.detach().cpu().numpy()
depth = Image.fromarray(depth.astype("uint16")) # mm
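As a side note (not part of the PR), the uint16 millimeter convention in the last lines round-trips cleanly through a 16-bit PNG; the file name and values below are illustrative stand-ins for the model output:

```python
import numpy as np
from PIL import Image

# Stand-in for the post-processed depth, already scaled to millimeters.
depth_mm = np.array([[1500, 2000], [500, 4000]], dtype="uint16")

Image.fromarray(depth_mm).save("depth.png")  # saved as 16-bit grayscale PNG

# Load back and undo the mm scaling to recover depth in meters.
depth_m = np.array(Image.open("depth.png")).astype("float32") / 1000.0
print(depth_m.tolist())  # [[1.5, 2.0], [0.5, 4.0]]
```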

@haotongl
Contributor Author

@qubvel
Member

qubvel commented Mar 21, 2025

Yes, this one looks good, thanks!


7 participants