
Add Prompt Depth Anything Model #35401

Merged

merged 61 commits into huggingface:main on Mar 20, 2025

Conversation

haotongl
Contributor

What does this PR do?

This PR adds the Prompt Depth Anything Model. Prompt Depth Anything builds upon Depth Anything V2 and incorporates metric prompt depth to enable accurate and high-resolution metric depth estimation.

The implementation leverages Modular Transformers. The main file can be found here.

Before submitting

@haotongl
Contributor Author

haotongl commented Dec 24, 2024

@NielsRogge @qubvel @pcuenca Could you help review this PR when you have some time? Thanks so much in advance! Let me know if you have any questions or suggestions. 😊

@qubvel
Member

qubvel commented Dec 24, 2024

Hi @haotongl! Thanks for working on the model integration to transformers 🤗 I'm on holidays until Jan 3rd, and I'll do a review after that if it's still necessary.

@haotongl haotongl requested review from NielsRogge and xenova January 2, 2025 16:57
@haotongl
Contributor Author

haotongl commented Jan 3, 2025

Hi, @xenova @NielsRogge ! All suggestions have been addressed. Could you please take another look and provide any further suggestions, or go ahead and merge this PR? Thanks!

@qubvel
Member

qubvel commented Feb 4, 2025

Yes, only members can run slow tests


github-actions bot commented Feb 4, 2025

This comment contains run-slow, running the specified jobs: ['models/prompt_depth_anything'] ...

@qubvel
Member

qubvel commented Feb 4, 2025

run-slow: prompt_depth_anything


github-actions bot commented Feb 4, 2025

This comment contains run-slow, running the specified jobs: ['models/prompt_depth_anything'] ...

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@haotongl
Contributor Author

haotongl commented Feb 4, 2025

run-slow: prompt_depth_anything

Thank you! All checks have passed. @qubvel

@qubvel
Member

qubvel commented Feb 4, 2025

Thanks for fixing the tests! Waiting for the final review from @ArthurZucker

@qubvel
Member

qubvel commented Feb 14, 2025

Friendly ping @ArthurZucker 🤗

@haotongl
Contributor Author

@ArthurZucker Hey, just a friendly ping to see if you’ve had a chance to look at the PR. Appreciate your help when you get a moment!

@ArthurZucker
Collaborator

Super sorry, I will get to this soon 😭

Collaborator

@ArthurZucker ArthurZucker left a comment

Kudos it's super clean 🤗
Thanks a lot @haotongl and @qubvel
Mostly missing a licence and good to go!

Comment on lines +67 to +71
>>> # visualize the prediction
>>> predicted_depth = post_processed_output[0]["predicted_depth"]
>>> depth = predicted_depth * 1000
>>> depth = depth.detach().cpu().numpy()
>>> depth = Image.fromarray(depth.astype("uint16")) # mm
Collaborator

should this go in image_processor.visualize_depth_estimation directly? 🤗
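As an aside, here is a minimal sketch of what such a helper could look like. The name `visualize_depth_estimation` is taken from the review comment and is hypothetical — it is not part of the transformers image processor API; the body just mirrors the docstring snippet quoted above:

```python
import numpy as np
import torch
from PIL import Image


def visualize_depth_estimation(predicted_depth: torch.Tensor, scale: float = 1000.0) -> Image.Image:
    """Hypothetical helper: convert a metric depth map (meters) into a
    16-bit image in millimeters, as in the docstring example."""
    depth_mm = (predicted_depth * scale).detach().cpu().numpy()
    return Image.fromarray(depth_mm.astype("uint16"))


# Synthetic 2x2 depth map in meters, just to show the conversion.
img = visualize_depth_estimation(torch.tensor([[1.0, 2.0], [0.5, 3.0]]))
print(np.array(img).tolist())  # [[1000, 2000], [500, 3000]]
```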

}


ORIGINAL_TO_CONVERTED_KEY_MAPPING = {
Collaborator

nice thanks! 🤗

Comment on lines 53 to 55
if is_vision_available():
pass

Collaborator

Suggested change
if is_vision_available():
pass

@@ -0,0 +1,379 @@
from typing import List, Optional, Tuple, Union
Collaborator

this file is missing a licence! 🤗

@ArthurZucker
Collaborator

A few small conflicts to fix

@haotongl
Contributor Author

Kudos it's super clean 🤗 Thanks a lot @haotongl and @qubvel Mostly missing a licence and good to go!

Thanks for your reviews! I am working on these suggestions.

@qubvel qubvel self-requested a review March 20, 2025 15:51
@qubvel
Member

qubvel commented Mar 20, 2025

run-slow: prompt_depth_anything


This comment contains run-slow, running the specified jobs:

models: ['models/prompt_depth_anything']
quantizations: [] ...

@qubvel qubvel merged commit 6515c25 into huggingface:main Mar 20, 2025
23 of 24 checks passed
@haotongl
Contributor Author

Thanks for your support! @qubvel @ArthurZucker

@qubvel
Member

qubvel commented Mar 20, 2025

Thanks for adding the model @haotongl! Can you please update model cards in HF repos to include the transformers code snippet?

@haotongl
Contributor Author

haotongl commented Mar 21, 2025

Thanks for adding the model @haotongl! Can you please update model cards in HF repos to include the transformers code snippet?

@qubvel Yeah, could you please provide some example repos for reference?

@qubvel
Member

qubvel commented Mar 21, 2025

Sure, here is an example model card for another depth estimation model: https://huggingface.co/apple/DepthPro-hf
You can make something similar to the model doc page: https://huggingface.co/docs/transformers/main/en/model_doc/prompt_depth_anything

@haotongl
Contributor Author

@qubvel Thanks for your help! Should I open a new PR to update the doc?

@qubvel
Member

qubvel commented Mar 21, 2025

I mean for these models:

https://huggingface.co/depth-anything/prompt-depth-anything-vits-hf
https://huggingface.co/depth-anything/prompt-depth-anything-vitl-hf
https://huggingface.co/depth-anything/prompt-depth-anything-vits-transparent-hf

The model cards don't include a transformers library code snippet yet, but it should be similar to the one in the docs:

import torch
import requests
import numpy as np

from PIL import Image
from transformers import AutoImageProcessor, AutoModelForDepthEstimation

url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/image.jpg?raw=true"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("depth-anything/prompt-depth-anything-vits-hf")
model = AutoModelForDepthEstimation.from_pretrained("depth-anything/prompt-depth-anything-vits-hf")

prompt_depth_url = "https://github.com/DepthAnything/PromptDA/blob/main/assets/example_images/arkit_depth.png?raw=true"
prompt_depth = Image.open(requests.get(prompt_depth_url, stream=True).raw)
# the prompt depth can be None, and the model will output a monocular relative depth.

# prepare image for the model
inputs = image_processor(images=image, return_tensors="pt", prompt_depth=prompt_depth)

with torch.no_grad():
    outputs = model(**inputs)

# interpolate to original size
post_processed_output = image_processor.post_process_depth_estimation(
    outputs,
    target_sizes=[(image.height, image.width)],
)

# visualize the prediction
predicted_depth = post_processed_output[0]["predicted_depth"]
depth = predicted_depth * 1000 
depth = depth.detach().cpu().numpy()
depth = Image.fromarray(depth.astype("uint16")) # mm
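As a side note (not part of the PR), the uint16 millimeter convention in the last lines round-trips cleanly through a 16-bit PNG; the file name and values below are illustrative stand-ins for the model output:

```python
import numpy as np
from PIL import Image

# Stand-in for the post-processed depth, already scaled to millimeters.
depth_mm = np.array([[1500, 2000], [500, 4000]], dtype="uint16")

Image.fromarray(depth_mm).save("depth.png")  # saved as 16-bit grayscale PNG

# Load back and undo the mm scaling to recover depth in meters.
depth_m = np.array(Image.open("depth.png")).astype("float32") / 1000.0
print(depth_m.tolist())  # [[1.5, 2.0], [0.5, 4.0]]
```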

@haotongl
Contributor Author

@qubvel
Member

qubvel commented Mar 21, 2025

Yes, this one looks good, thanks!


7 participants