
@podarok commented Dec 30, 2025

Summary

Adds progress_format support to huggingface_hub, enabling machine-readable JSON progress output similar to huggingface/tokenizers#1921 and huggingface/datasets#7920.

Motivation

When using huggingface_hub in automated pipelines, web backends, or UI applications, it is useful to emit machine-readable progress instead of ANSI progress bars. This PR adds the same progress_format option that was implemented in tokenizers and datasets.

Changes

New Functions

  • set_progress_format(format: str): Set global progress format
  • get_progress_format() -> str: Get current progress format

Supported Formats

  1. "tqdm" (default): Interactive progress bars (current behavior, unchanged)
  2. "json": Machine-readable JSON lines to stderr
  3. "silent": No output
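The PR does not show how the setting is stored. The following is a minimal sketch assuming a module-level variable and a ValueError for unrecognized names. Both choices, and the helper constant _SUPPORTED_FORMATS, are assumptions rather than the PR's code:

```python
# Hypothetical sketch of the state behind set_progress_format /
# get_progress_format; the PR's actual implementation may differ.
_SUPPORTED_FORMATS = ("tqdm", "json", "silent")
_progress_format = "tqdm"  # default keeps the current progress-bar behavior


def set_progress_format(format: str) -> None:
    """Set the global progress format; unknown values raise ValueError (assumed)."""
    global _progress_format
    if format not in _SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported progress format {format!r}; "
            f"expected one of {', '.join(_SUPPORTED_FORMATS)}"
        )
    _progress_format = format


def get_progress_format() -> str:
    """Return the currently configured progress format."""
    return _progress_format
```

A single global is the simplest way to make the option opt-in and process-wide. The trade-off is that every thread shares the same setting.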

JSON Format

When progress_format is "json", a JSON line is emitted each time progress advances by another 5%, and once more at completion:

{"stage": "Downloading model.safetensors", "current": 1024, "total": 4096, "percent": 25.0}
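One way to build a line in this shape with the standard library is sketched below. The helper name format_progress_line, the one-decimal rounding of percent, and the 0.0 fallback for a zero total are hypothetical choices and do not come from the PR:

```python
import json


def format_progress_line(stage: str, current: int, total: int, compact: bool = False) -> str:
    """Serialize one progress record with the keys stage, current, total, percent.

    Hypothetical helper: percent is rounded to one decimal place, and a zero
    or missing total reports 0.0 instead of dividing by zero.
    """
    percent = round(100.0 * current / total, 1) if total else 0.0
    record = {"stage": stage, "current": current, "total": total, "percent": percent}
    # json.dumps preserves dict insertion order, so keys appear in this order.
    separators = (",", ":") if compact else (", ", ": ")
    return json.dumps(record, separators=separators)
```

The spaced form matches the record shown above. The usage example's output comment uses the compact form, which compact=True reproduces.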

Usage Example

from huggingface_hub import hf_hub_download
from huggingface_hub.utils import set_progress_format

# Enable JSON output
set_progress_format("json")

# Progress will now be emitted as JSON lines to stderr
hf_hub_download("gpt2", "config.json")
# Outputs: {"stage":"config.json","current":665,"total":665,"percent":100.0}
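On the consuming side, a web backend could read a child process's stderr line by line and keep only the progress records. The sketch below assumes that stderr may also carry ordinary warnings and log lines, so it skips anything that is not a JSON object with the four keys shown above. iter_progress_events is a hypothetical name:

```python
import json
from typing import Iterable, Iterator

# Keys expected in every progress record.
_REQUIRED_KEYS = ("stage", "current", "total", "percent")


def iter_progress_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield progress records from stderr lines, skipping anything else.

    Hypothetical consumer: stderr can also carry warnings and log output, so
    lines that are not JSON objects with the expected keys are ignored.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("{"):
            continue  # cheap filter for plain-text log lines
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # looked like JSON but was not
        if isinstance(event, dict) and all(key in event for key in _REQUIRED_KEYS):
            yield event


# Example wiring (not executed here): read a child process's stderr as text.
#   proc = subprocess.Popen(cmd, stderr=subprocess.PIPE, text=True)
#   for event in iter_progress_events(proc.stderr):
#       print(f"{event['stage']}: {event['percent']:.1f}%")
```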

Implementation Details

  • Suppresses the visual progress bar by writing it to an io.StringIO() buffer when the format is "json"
  • Keeps progress tracking active (unlike disable=True)
  • Emits a JSON line to stderr at each 5% step of progress
  • Exports new functions from huggingface_hub.utils
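The behavior listed above is: always count, never draw, and write one JSON line per 5% step plus a final line at completion. It can be sketched without tqdm as a small tracker exposing tqdm-style desc, total, n, and update(). JsonProgress and its stream and step parameters are hypothetical; the PR's version instead routes the bar's output to io.StringIO():

```python
import json
import sys
from typing import Optional, TextIO


class JsonProgress:
    """Minimal tqdm-like tracker that reports progress as JSON lines.

    Hypothetical sketch: the count is always tracked (unlike a disabled bar),
    nothing is drawn, and a line is written whenever an update moves progress
    into a new `step`-percent band, plus once at completion.
    """

    def __init__(
        self,
        desc: str,
        total: int,
        stream: Optional[TextIO] = None,
        step: float = 5.0,
    ) -> None:
        self.desc = desc
        self.total = total
        self.n = 0  # counted even though nothing is rendered
        self.step = step
        self.stream = stream if stream is not None else sys.stderr
        self._last_bucket = 0  # first line appears once progress reaches `step` percent
        self._done = False

    def update(self, n: int = 1) -> None:
        """Advance the counter by n and emit a JSON line if a threshold was crossed."""
        self.n += n
        if not self.total:
            return  # unknown total: keep counting, but there is no percentage to report
        percent = min(100.0, 100.0 * self.n / self.total)
        bucket = int(percent // self.step)
        finished = self.n >= self.total
        # Completion is checked separately because 100% can share a band with
        # an earlier emission when `step` does not divide 100.
        if bucket > self._last_bucket or (finished and not self._done):
            self._last_bucket = bucket
            self._done = self._done or finished
            self._emit(percent)

    def _emit(self, percent: float) -> None:
        record = {
            "stage": self.desc,
            "current": self.n,
            "total": self.total,
            "percent": round(percent, 1),
        }
        self.stream.write(json.dumps(record) + "\n")
        self.stream.flush()
```

Passing stream=io.StringIO() captures the lines in memory, which is convenient in tests. With the default stream, the lines go to stderr, as described above.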

Backward Compatibility

  • Default behavior is "tqdm" - identical to current behavior
  • Existing code works without any changes
  • New option is opt-in only

Cross-Reference

This implementation mirrors the approach from huggingface/tokenizers#1921 and huggingface/datasets#7920.

@Wauplin (Contributor) commented Jan 5, 2026

Hi @podarok, do you have a specific use case for this? If so, have you looked at the tqdm_class argument of hf_hub_download and snapshot_download, which lets you pass a custom tqdm class? This is usually what people use when they want to integrate into a specific UI/app (see #1110).
