Visualization of cost map #40

Open
Kimsure opened this issue Oct 15, 2024 · 5 comments


Kimsure commented Oct 15, 2024

For the aggregated cost volume, we show the output of our model, which has a higher resolution of 96x96. We simply apply bilinear upsampling to overlay it with the image.

I don't have the code at the moment, but the visualized figures are min-max normalized with some scaling for visual clarity, as the model output does not necessarily match the scale of the initial cost volume. This should be enough to reproduce the figure, but please let me know if you need more details.

Originally posted by @hsshin98 in #6 (comment)
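For reference, a minimal sketch of that recipe (bilinear upsampling plus min-max normalization before overlaying) might look like the following. The function name, the scaling factor, and the assumption that `cost` is a 96x96 torch tensor are all illustrative, not from the authors' code:

```python
import cv2
import numpy as np
import torch.nn.functional as F

def overlay_cost_map(cost, image_bgr, alpha=0.5):
    """Bilinearly upsample a (96, 96) cost map and overlay it on the image."""
    h, w = image_bgr.shape[:2]
    # Bilinear upsampling to the image resolution, as described above
    cost = F.interpolate(cost.view(1, 1, *cost.shape), size=(h, w),
                         mode="bilinear", align_corners=False)[0, 0]
    cost = cost.detach().cpu().numpy()
    # Min-max normalization, with some scaling for visual clarity
    cost = (cost - cost.min()) / (cost.max() - cost.min() + 1e-8)
    cost = np.clip(cost * 1.2, 0.0, 1.0)  # illustrative scaling factor
    heat = cv2.applyColorMap(np.uint8(255 * cost), cv2.COLORMAP_JET)
    return cv2.addWeighted(image_bgr, 1 - alpha, heat, alpha, 0)
```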

Do you have the code to visualize the cost map now? How can I reproduce it?


Anglejuebi commented Nov 13, 2024

Hello, I have just started learning computer vision. I tried to write cost-volume visualization code myself, but the results are not very good. Have you managed to visualize the cost volume, or do you have any other ideas? Thank you.


Kimsure commented Nov 16, 2024

Hi, I referred to the visualization method for image-text similarity in CLIP-Surgery. Since the cost_map is the result of the cosine similarity between image and text features, I think overlaying the cost_map onto the original image gives the visualization.
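As a rough sketch of that idea (not the actual paper code): assuming per-patch image embeddings are already available, e.g. extracted the way CLIP-Surgery does, the cost_map is just the cosine similarity between normalized patch and text features:

```python
import clip
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)

@torch.no_grad()
def patch_text_cost_map(patch_features, text_prompt):
    """Cosine similarity between per-patch image features and a text prompt.

    patch_features: (N, D) tensor of patch embeddings; obtaining these is the
    CLIP-Surgery part and is assumed here. Returns an (N,) cost map that can
    be reshaped to the patch grid and overlaid on the image.
    """
    text_features = model.encode_text(clip.tokenize([text_prompt]).to(device))
    patch_features = F.normalize(patch_features.float(), dim=-1)
    text_features = F.normalize(text_features.float(), dim=-1)
    return (patch_features @ text_features.T).squeeze(-1)
```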

BTW, would you consider releasing your code so we can take a look together and see how it's done?


Anglejuebi commented Nov 16, 2024

Hello sir, thank you very much for your reply. My method is very simple: first, I crop the image; then, after passing the crops and the text through CLIP's image and text encoders, I compute the cosine similarity between each local patch and the text prompt to generate the heat map. However, this is easily affected by color words in the prompt; for example, a woman's hair will also be "hotter" when "Black" appears in the text prompt. Thank you for the reminder; I am going to study CLIP_Surgery's implementation next! Thanks again for your reply, and have a nice day!

```python
import torch
import clip
import numpy as np
import cv2
import matplotlib.pyplot as plt
from PIL import Image
from tqdm import tqdm  # Progress bar library
import torch.nn.functional as F  # For cosine similarity
import os
import datetime

# Load the CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Load and preprocess the image
def load_image(image_path, target_size=(336, 336)):
    image = Image.open(image_path).convert("RGB")
    # Image.ANTIALIAS was removed in Pillow 10; LANCZOS is the replacement
    image = image.resize(target_size, Image.LANCZOS)  # Resize to 336x336
    image = preprocess(image).unsqueeze(0).to(device)
    return image

# Generate heatmap
def generate_heatmap(image_path, text_prompt, grid_size=8):
    # Load and resize the image to 336x336
    image = Image.open(image_path).convert("RGB")
    target_size = (336, 336)
    image = image.resize(target_size, Image.LANCZOS)
    image_width, image_height = image.size

    # Tokenize the text prompt
    text = clip.tokenize([text_prompt]).to(device)

    # Encode the text once; no_grad avoids building autograd graphs
    with torch.no_grad():
        text_features = model.encode_text(text)

    # Initialize the heatmap matrix (one cell per grid block)
    heatmap = np.zeros((image_height // grid_size, image_width // grid_size))

    # Slide over the image block by block with a tqdm progress bar
    for i in tqdm(range(0, image_height, grid_size), desc="Processing Rows"):
        for j in range(0, image_width, grid_size):
            # Row and column indices of this block in the heatmap
            row_idx = i // grid_size
            col_idx = j // grid_size

            # Skip blocks that fall outside the heatmap
            if row_idx >= heatmap.shape[0] or col_idx >= heatmap.shape[1]:
                continue

            # Crop each small block of the image
            cropped_image = image.crop((j, i, min(j + grid_size, image_width), min(i + grid_size, image_height)))
            cropped_image_tensor = preprocess(cropped_image).unsqueeze(0).to(device)

            with torch.no_grad():
                # Encode the image block
                image_features = model.encode_image(cropped_image_tensor)
                # Cosine similarity between the block and the text prompt
                similarity = F.cosine_similarity(image_features, text_features)

            # Fill the heatmap matrix
            heatmap[row_idx, col_idx] = similarity.item()

    # Print min and max values
    print("Heatmap min:", heatmap.min())
    print("Heatmap max:", heatmap.max())

    # Shift to non-negative before the log transform: cosine similarity can be
    # negative, and log(1 + x) is undefined for x <= -1
    heatmap = np.log1p(heatmap - heatmap.min())

    # Normalize to [0, 1]; the epsilon guards against a constant heatmap
    heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)

    print("Normalized heatmap min:", heatmap.min())
    print("Normalized heatmap max:", heatmap.max())

    # Map the heatmap to the [0, 255] range and convert to uint8 type
    heatmap = np.uint8(255 * heatmap)

    return heatmap, image  # Return the heatmap and the resized original image

# Visualize the heatmap
def visualize_heatmap(image_path, heatmap, original_image):
    # Ensure the output folder exists
    output_folder = "output"
    os.makedirs(output_folder, exist_ok=True)

    # Timestamp for the output filenames
    current_time = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")

    # Save the heatmap image
    heatmap_path = os.path.join(output_folder, f"heatmap_{current_time}.png")
    plt.figure(figsize=(10, 10))
    plt.imshow(heatmap, cmap='jet', alpha=0.6)
    plt.axis('off')
    plt.title('Heatmap')
    plt.savefig(heatmap_path, bbox_inches='tight', pad_inches=0)
    plt.close()

    # Overlay the heatmap on the original image
    image = cv2.cvtColor(np.array(original_image), cv2.COLOR_RGB2BGR)  # RGB -> BGR for OpenCV
    heatmap_resized = cv2.resize(heatmap, (image.shape[1], image.shape[0]))  # Resize to image size
    heatmap_colored = cv2.applyColorMap(heatmap_resized, cv2.COLORMAP_JET)

    # Blend: 70% image, 30% heatmap
    superimposed_img = cv2.addWeighted(image, 0.7, heatmap_colored, 0.3, 0)

    # Save the superimposed image
    superimposed_path = os.path.join(output_folder, f"superimposed_{current_time}.png")
    cv2.imwrite(superimposed_path, superimposed_img)

# Main function
if __name__ == "__main__":
    image_path = "images/img.png"  # Replace with the actual image path
    text_prompt = "A photo of a Black Cat in the scene"  # Replace with your text prompt

    heatmap, original_image = generate_heatmap(image_path, text_prompt)
    visualize_heatmap(image_path, heatmap, original_image)
```

Here is the original image:

[image: img]

Here is the heat map:

[image: heatmap_20241116_145736]


Kimsure commented Nov 18, 2024

Hi, thanks for your released code. Regarding your questions:

  1. Several studies have observed that directly applying CLIP to dense prediction tasks (e.g., segmentation) often yields suboptimal object grounding results.
  2. I'm unsure about the size of your input image, but the blocky artifacts in the heat map could potentially be caused by the upsampling; a sketch of a smoother upsampling step follows below.
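For what it's worth, one way to test the upsampling hypothesis in the script above (parameter values are illustrative): resize the coarse heat map with an explicit bicubic interpolation and a light Gaussian blur before applying the color map, in place of the plain cv2.resize call in visualize_heatmap:

```python
import cv2

def smooth_upsample(heatmap_u8, size, sigma=3):
    """Upsample a coarse heat map and soften grid artifacts.

    size is (width, height); sigma controls the Gaussian blur strength.
    """
    up = cv2.resize(heatmap_u8, size, interpolation=cv2.INTER_CUBIC)
    return cv2.GaussianBlur(up, (0, 0), sigma)
```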

@Anglejuebi

Thank you for your reply and answer. I am still in the process of learning. Have a nice day🌹🌹
