[build] Add DocFX updater script#16980

Merged
titusfortner merged 6 commits into trunk from docfx_updater on Jan 23, 2026

Conversation

@titusfortner
Member

@titusfortner titusfortner commented Jan 22, 2026

User description

Making sure dependencies we're pinning generally have a way to auto-update.

💥 What does this PR do?

Adds automated DocFX version updating for .NET documentation builds:

  • Add scripts/update_docfx.py to automatically fetch the latest DocFX release from GitHub
  • Integrate updater into dotnet/update-deps.sh
  • Bump DocFX to latest version

🔧 Implementation Notes

The script fetches release information from the GitHub API and updates dotnet/private/docfx_repo.bzl
with the latest version and SHA256 hash.

🔄 Types of changes

  • New feature (non-breaking change which adds functionality and tests!)

PR Type

Enhancement


Description

  • Add automated DocFX version updater script for NuGet packages

  • Fetch latest DocFX release and compute SHA256 hash automatically

  • Integrate updater into dotnet/update-deps.sh build workflow

  • Bump DocFX from 2.78.2 to 2.78.4


Diagram Walkthrough

flowchart LR
  A["NuGet API"] -->|fetch versions| B["update_docfx.py"]
  B -->|compute SHA256| C["docfx_repo.bzl"]
  D["update-deps.sh"] -->|invoke| B
  B -->|update| C

File Walkthrough

Relevant files
Enhancement
update_docfx.py
DocFX version updater script implementation                           

scripts/update_docfx.py

  • New Python script to fetch latest DocFX version from NuGet API
  • Computes SHA256 hash of downloaded nupkg file
  • Supports explicit version selection and prerelease filtering
  • Generates updated docfx_repo.bzl with version and hash
+141/-0 
Dependencies
docfx_repo.bzl
Update DocFX to version 2.78.4                                                     

dotnet/private/docfx_repo.bzl

  • Bump DocFX version from 2.78.2 to 2.78.4
  • Update SHA256 hash to match new version
+2/-2     
Configuration changes
update-deps.sh
Integrate DocFX updater into build workflow                           

dotnet/update-deps.sh

  • Add invocation of bazel run //scripts:update_docfx at end of script
  • Integrates DocFX updater into automated dependency update workflow
+2/-0     
BUILD.bazel
Add build target for DocFX updater                                             

scripts/BUILD.bazel

  • Add new py_binary target for update_docfx script
  • Declare dependency on packaging library for version parsing
+8/-0     
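
The hashing step listed for scripts/update_docfx.py can be sketched as a chunked read, so the whole package never has to sit in memory. This is a minimal illustration, not the script's actual code: `sha256_of_stream` is a hypothetical helper, and an in-memory buffer stands in for the downloaded nupkg.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary file-like object in fixed-size chunks."""
    digest = hashlib.sha256()
    # iter() with a sentinel stops once read() returns b"" (end of stream).
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Stand-in for a downloaded package body; not real DocFX data.
fake_package = io.BytesIO(b"example bytes" * 1000)
print(sha256_of_stream(fake_package, chunk_size=4096))
```

The chunk size only affects memory use; the digest is the same as hashing the bytes in one call.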

@titusfortner titusfortner requested a review from Copilot January 22, 2026 22:05
@selenium-ci selenium-ci added C-dotnet .NET Bindings B-build Includes scripting, bazel and CI integrations labels Jan 22, 2026
@titusfortner titusfortner changed the title Add DocFX updater script [build] Add DocFX updater script Jan 22, 2026
@qodo-code-review
Contributor

qodo-code-review bot commented Jan 22, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Supply chain update

Description: The new updater script performs network fetches and derives the version automatically from
remote NuGet index data (or user input) before downloading and hashing the corresponding
package, which introduces a potential supply-chain risk if the upstream metadata or
package source is compromised and the script is run automatically (e.g., via
dotnet/update-deps.sh).
update_docfx.py [21-135]

Referred Code
def fetch_json(url):
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())


def choose_version(versions, allow_prerelease, explicit_version=None):
    if explicit_version:
        return explicit_version

    parsed = []
    for v in versions:
        try:
            pv = Version(v)
        except InvalidVersion:
            continue
        if not allow_prerelease and pv.is_prerelease:
            continue
        parsed.append((pv, v))

    if not parsed:
        # Fall back to any parseable version.


 ... (clipped 94 lines)
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Comprehensive Audit Trails

Objective: To create a detailed and reliable record of critical system actions for security analysis
and compliance.

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status: Passed

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Network error handling: The new script makes external HTTP requests via urllib.request.urlopen() without timeouts
or contextual error handling, so transient failures may raise unhandled exceptions with
limited actionable context.

Referred Code
def fetch_json(url):
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read())


def choose_version(versions, allow_prerelease, explicit_version=None):
    if explicit_version:
        return explicit_version

    parsed = []
    for v in versions:
        try:
            pv = Version(v)
        except InvalidVersion:
            continue
        if not allow_prerelease and pv.is_prerelease:
            continue
        parsed.append((pv, v))

    if not parsed:
        # Fall back to any parseable version.


 ... (clipped 21 lines)
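
One way to address the missing timeout and error context, sketched with the standard library only. The `timeout` default and the injectable `opener` parameter are assumptions added for illustration and testing; they are not part of the script under review.

```python
import json
import urllib.request


def fetch_json(url, timeout=30.0, opener=urllib.request.urlopen):
    """Fetch and decode JSON, attaching the URL to any failure."""
    try:
        with opener(url, timeout=timeout) as response:
            body = response.read()
    except OSError as e:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        raise RuntimeError(f"Failed to fetch {url}: {e}") from e
    try:
        return json.loads(body)
    except ValueError as e:
        # JSONDecodeError and UnicodeDecodeError both derive from ValueError.
        raise RuntimeError(f"Invalid JSON from {url}: {e}") from e
```

Separating the network failure from the decode failure keeps the two error messages distinct, which is the "actionable context" the check asks for.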

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Output path validation: The --output argument is written directly via output_path.write_text(...) without
validation or safeguards, which could overwrite arbitrary files if the script is run with
an unexpected path.

Referred Code
parser.add_argument(
    "--output",
    default="dotnet/private/docfx_repo.bzl",
    help="Output file path (default: dotnet/private/docfx_repo.bzl)",
)
args = parser.parse_args()

index = fetch_json(NUGET_INDEX_URL)
versions = index.get("versions", [])
if not versions:
    raise ValueError("NuGet index returned no versions for DocFX")

version = choose_version(versions, args.allow_prerelease, args.version)
nupkg_url = NUGET_NUPKG_URL.format(version=version)
sha256 = sha256_of_url(nupkg_url)

output_path = Path(args.output)
if not output_path.is_absolute():
    workspace_dir = os.environ.get("BUILD_WORKSPACE_DIRECTORY")
    if workspace_dir:
        output_path = Path(workspace_dir) / output_path


 ... (clipped 2 lines)
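
A possible safeguard for the output-path concern, as a sketch rather than what the PR implements: resolve the path against the workspace and refuse anything that lands outside it. `resolve_output` is a hypothetical helper, and restricting writes to the workspace is an assumed policy.

```python
from pathlib import Path


def resolve_output(output, workspace_dir):
    """Resolve `output` under `workspace_dir`, rejecting paths outside it."""
    workspace = Path(workspace_dir).resolve()
    candidate = Path(output)
    if not candidate.is_absolute():
        candidate = workspace / candidate
    # resolve() collapses ".." segments and symlinks before the check.
    candidate = candidate.resolve()
    try:
        candidate.relative_to(workspace)
    except ValueError:
        raise ValueError(
            f"--output {output!r} resolves outside the workspace {workspace}"
        ) from None
    return candidate
```

With this check, `resolve_output("dotnet/private/docfx_repo.bzl", workspace)` returns a path inside the workspace, while `resolve_output("../elsewhere.bzl", workspace)` raises `ValueError`.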

Compliance status legend
🟢 - Fully Compliant
🟡 - Partially Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review
Contributor

qodo-code-review bot commented Jan 22, 2026

PR Code Suggestions ✨

Latest suggestions up to 406d8a8

Category | Suggestion | Impact
Possible issue
Verify download status and cleanup

In sha256_of_url, verify the HTTP status is 200 before hashing and use a
try...finally block to ensure release_conn() is always called.

scripts/update_docfx.py [53-59]

 def sha256_of_url(url):
     digest = hashlib.sha256()
-    r = http.request("GET", url, preload_content=False)
-    for chunk in r.stream(1024 * 1024):
-        digest.update(chunk)
-    r.release_conn()
-    return digest.hexdigest()
+    r = http.request(
+        "GET",
+        url,
+        preload_content=False,
+        timeout=urllib3.Timeout(connect=5.0, read=60.0),
+    )
+    try:
+        if r.status != 200:
+            raise RuntimeError(f"Failed to download {url} (HTTP {r.status})")
+        for chunk in r.stream(1024 * 1024):
+            digest.update(chunk)
+        return digest.hexdigest()
+    finally:
+        r.release_conn()
Suggestion importance[1-10]: 8

Why: This suggestion fixes a potential bug where an incorrect SHA256 hash of an error page could be generated and written to a build file, and also prevents connection leaks.

Medium
Validate HTTP status for JSON

Add HTTP status code validation and a timeout to the fetch_json function to
handle network failures gracefully.

scripts/update_docfx.py [23-25]

 def fetch_json(url):
-    r = http.request("GET", url)
-    return json.loads(r.data)
+    r = http.request("GET", url, timeout=urllib3.Timeout(connect=5.0, read=30.0))
+    if r.status != 200:
+        raise RuntimeError(f"Failed to fetch JSON from {url} (HTTP {r.status})")
+    try:
+        return json.loads(r.data)
+    except json.JSONDecodeError as e:
+        raise RuntimeError(f"Invalid JSON from {url} (HTTP {r.status})") from e
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies that the script lacks robustness by not checking HTTP status codes, which could lead to unhelpful errors if the NuGet API fails.

Medium
Make file writes atomic

Implement an atomic write operation by writing to a temporary file first and
then replacing the original file to prevent corruption.

scripts/update_docfx.py [132]

-output_path.write_text(render_docfx_repo(version, sha256))
+contents = render_docfx_repo(version, sha256)
+tmp_path = output_path.with_suffix(output_path.suffix + ".tmp")
+tmp_path.write_text(contents, encoding="utf-8")
+tmp_path.replace(output_path)
Suggestion importance[1-10]: 6

Why: The suggestion improves the script's robustness by proposing an atomic file write, which prevents file corruption if the script is interrupted.

Low
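
A self-contained variant of the atomic-write idea, sketched with `tempfile.mkstemp` so the temporary name cannot collide with an existing file. `write_text_atomic` is a hypothetical helper, not code from the PR.

```python
import os
import tempfile
from pathlib import Path


def write_text_atomic(path, contents):
    """Write `contents` to `path` via a temp file in the same directory."""
    path = Path(path)
    # Same directory keeps the final rename on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(contents)
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

One trade-off to be aware of: `mkstemp` creates the file with owner-only permissions, so the replaced file takes on those permissions rather than keeping the original's.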

Previous suggestions

✅ Suggestions up to commit 894dde4
Category | Suggestion | Impact
High-level
Modify bzl file instead of overwriting

Instead of overwriting docfx_repo.bzl with a hardcoded template, the script
should read the file and use regular expressions to replace only the version and
sha256 values. This preserves other content and makes the update process more
robust.

Examples:

scripts/update_docfx.py [65-100]
def render_docfx_repo(version, sha256):
    return f'''\
"""Repository rule to download the docfx NuGet package."""

_BUILD = """
package(default_visibility = ["//visibility:public"])
exports_files(glob(["**/*"]))
filegroup(name = "docfx_dll", srcs = ["tools/net8.0/any/docfx.dll"])
"""


 ... (clipped 26 lines)

Solution Walkthrough:

Before:

# scripts/update_docfx.py

def render_docfx_repo(version, sha256):
    # Hardcoded template for the entire file
    return f'''\
...
def _docfx_extension_impl(module_ctx):
    docfx_repo(
        name = "docfx",
        version = "{version}",
        sha256 = "{sha256}",
    )
...
'''

def main():
    # ... fetch version and sha
    # Overwrites the entire file
    output_path.write_text(render_docfx_repo(version, sha256))

After:

# scripts/update_docfx.py
import re

def update_bzl_file(path, version, sha256):
    content = path.read_text()
    
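    # \g<1> is used instead of \1: a replacement such as "\1" + "2.78.4"
    # would be parsed as a reference to group 12 and raise re.error.
    content = re.sub(
        r'(version = ")[^"]*(")',
        rf'\g<1>{version}\g<2>',
        content
    )

    content = re.sub(
        r'(sha256 = ")[^"]*(")',
        rf'\g<1>{sha256}\g<2>',
        content
    )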
    
    path.write_text(content)

def main():
    # ... fetch version and sha
    update_bzl_file(output_path, version, sha256)
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a significant design flaw where overwriting docfx_repo.bzl is brittle; modifying only the necessary values would make the script more robust and maintainable.

Medium
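
A runnable sketch of the in-place approach, with two details worth calling out. The replacement uses `\g<1>` so that a value starting with a digit is not read as part of the group number, and `re.subn` is used to confirm that exactly one pin was rewritten. `update_pins` and the sample file contents are hypothetical; the real docfx_repo.bzl may differ.

```python
import re


def update_pins(content, version, sha256):
    """Replace the pinned version and sha256 values, leaving the rest intact."""
    for field, value in (("version", version), ("sha256", sha256)):
        pattern = rf'({field} = ")[^"]*(")'
        # \g<1> avoids ambiguity when `value` begins with a digit.
        content, count = re.subn(pattern, rf"\g<1>{value}\g<2>", content)
        if count != 1:
            raise ValueError(f"Expected exactly one {field} pin, found {count}")
    return content


# Placeholder contents for illustration; the hash is not a real digest.
SAMPLE = '''\
docfx_repo(
    name = "docfx",
    version = "2.78.2",
    sha256 = "0000",
)
'''

print(update_pins(SAMPLE, "2.78.4", "abc123"))
```

Failing when the match count is not exactly one turns a silent no-op (for example after the file is restructured) into a visible error.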
Possible issue
Fix version selection to respect prerelease flag
Suggestion Impact:The commit removed the fallback that parsed "any version" when no stable versions were found, and replaced it with conditional ValueErrors that respect the allow_prerelease flag (stable-only error vs parseable-versions error). This prevents unintended prerelease selection.

code diff:

 def choose_version(versions, allow_prerelease, explicit_version=None):
     if explicit_version:
+        if explicit_version not in versions:
+            raise ValueError(f"Requested DocFX version {explicit_version!r} not found in NuGet index")
         return explicit_version
 
     parsed = []
@@ -38,27 +42,20 @@
         parsed.append((pv, v))
 
     if not parsed:
-        # Fall back to any parseable version.
-        for v in versions:
-            try:
-                parsed.append((Version(v), v))
-            except InvalidVersion:
-                continue
-
-    if not parsed:
-        raise ValueError("No parseable DocFX versions found in NuGet index")
+        if allow_prerelease:
+            raise ValueError("No parseable DocFX versions found in NuGet index")
+        else:
+            raise ValueError("No stable DocFX versions found. Use --allow-prerelease to include prereleases.")
 
     return max(parsed, key=lambda item: item[0])[1]

In the choose_version function, remove the fallback logic that ignores the
allow_prerelease flag to prevent unintentional selection of prerelease versions.
Instead, raise an error if no suitable versions are found.

scripts/update_docfx.py [26-51]

 def choose_version(versions, allow_prerelease, explicit_version=None):
     if explicit_version:
         return explicit_version
 
     parsed = []
     for v in versions:
         try:
             pv = Version(v)
         except InvalidVersion:
             continue
         if not allow_prerelease and pv.is_prerelease:
             continue
         parsed.append((pv, v))
 
     if not parsed:
-        # Fall back to any parseable version.
-        for v in versions:
-            try:
-                parsed.append((Version(v), v))
-            except InvalidVersion:
-                continue
-
-    if not parsed:
-        raise ValueError("No parseable DocFX versions found in NuGet index")
+        if allow_prerelease:
+            raise ValueError("No parseable DocFX versions found in NuGet index")
+        else:
+            raise ValueError("No stable DocFX versions found in NuGet index. Use --allow-prerelease to include them.")
 
     return max(parsed, key=lambda item: item[0])[1]

[Suggestion processed]

Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a logic flaw where the allow_prerelease flag is ignored in the fallback path, potentially leading to an unintended prerelease version being selected.

Medium
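
The selection rules after this change can be exercised with a dependency-free sketch. Note the swap: the script uses `packaging.version.Version`, while this example substitutes a simplified SemVer-style regex (an assumption made so it runs with the standard library alone). It treats any `-suffix` as a prerelease and does not order prereleases of the same release correctly.

```python
import re

# Simplified stand-in for packaging.version.Version; not the script's parser.
_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


def _sort_key(version):
    match = _SEMVER.match(version)
    if not match:
        return None
    major, minor, patch, pre = match.groups()
    # `pre is None` sorts a release above its own prereleases.
    return (int(major), int(minor), int(patch), pre is None, pre or "")


def choose_version(versions, allow_prerelease, explicit_version=None):
    if explicit_version:
        if explicit_version not in versions:
            raise ValueError(f"Requested version {explicit_version!r} not found")
        return explicit_version

    candidates = []
    for v in versions:
        key = _sort_key(v)
        if key is None:
            continue  # skip unparseable entries
        is_prerelease = not key[3]
        if is_prerelease and not allow_prerelease:
            continue
        candidates.append((key, v))

    if not candidates:
        if allow_prerelease:
            raise ValueError("No parseable versions found")
        raise ValueError("No stable versions found; allow prereleases to include them")
    return max(candidates)[1]
```

With a list such as `["2.78.2", "2.78.4", "2.79.0-beta.1"]`, the stable-only call picks `2.78.4`, and the prerelease is chosen only when `allow_prerelease` is true.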
Check explicit version validity
Suggestion Impact:The commit added an explicit_version validity check in choose_version(): if the requested version is not present in the NuGet index versions list, it raises a ValueError with a clear message before returning.

code diff:

 def choose_version(versions, allow_prerelease, explicit_version=None):
     if explicit_version:
+        if explicit_version not in versions:
+            raise ValueError(f"Requested DocFX version {explicit_version!r} not found in NuGet index")
         return explicit_version

Before returning an explicit_version, validate that it exists in the list of
available versions from the NuGet index to fail early with a clear error.

scripts/update_docfx.py [27-28]

 if explicit_version:
-    return explicit_version
+    if explicit_version in versions:
+        return explicit_version
+    else:
+        raise ValueError(f"Explicit version {explicit_version!r} not found in NuGet index")

[Suggestion processed]

Suggestion importance[1-10]: 7

Why: This is a valuable suggestion for improving robustness by validating user input early, which provides a clearer error message than letting the script fail later during the download phase.

Medium
General
Handle HTTP errors during file download

In sha256_of_url, add a try...except block to handle potential HTTPError
exceptions during file download and provide a more informative error message.

scripts/update_docfx.py [54-62]

 def sha256_of_url(url):
     digest = hashlib.sha256()
-    with urllib.request.urlopen(url) as response:
-        while True:
-            chunk = response.read(1024 * 1024)
-            if not chunk:
-                break
-            digest.update(chunk)
+    try:
+        with urllib.request.urlopen(url) as response:
+            while True:
+                chunk = response.read(1024 * 1024)
+                if not chunk:
+                    break
+                digest.update(chunk)
+    except urllib.error.HTTPError as e:
+        raise ValueError(f"Failed to download from {url}: {e}") from e
     return digest.hexdigest()
Suggestion importance[1-10]: 6

Why: This suggestion improves the script's robustness by adding error handling for network download failures, which provides better feedback to the user than an unhandled exception.

Low
Handle HTTP errors gracefully

In the fetch_json function, wrap the urlopen call in a try...except block to
handle network failures and raise a more informative error.

scripts/update_docfx.py [21-23]

 def fetch_json(url):
-    with urllib.request.urlopen(url) as response:
-        return json.loads(response.read())
+    try:
+        with urllib.request.urlopen(url) as response:
+            return json.loads(response.read())
+    except Exception as e:
+        raise RuntimeError(f"Failed to fetch JSON from {url}: {e}") from e
Suggestion importance[1-10]: 6

Why: This suggestion improves the script's reliability by handling network errors when fetching the NuGet index, preventing the script from crashing and providing a clear error message.

Low
Learned best practice
Validate CLI/env inputs before use
Suggestion Impact:The commit added stricter validation around explicit DocFX version selection by rejecting an explicit_version that is not present in the NuGet index, and improved error handling when no suitable (stable vs prerelease) versions are available. However, it did not implement the suggested CLI/env trimming/validation for --output or BUILD_WORKSPACE_DIRECTORY handling.

code diff:

 def choose_version(versions, allow_prerelease, explicit_version=None):
     if explicit_version:
+        if explicit_version not in versions:
+            raise ValueError(f"Requested DocFX version {explicit_version!r} not found in NuGet index")
         return explicit_version
 
     parsed = []
@@ -38,27 +42,20 @@
         parsed.append((pv, v))
 
     if not parsed:
-        # Fall back to any parseable version.
-        for v in versions:
-            try:
-                parsed.append((Version(v), v))
-            except InvalidVersion:
-                continue
-
-    if not parsed:
-        raise ValueError("No parseable DocFX versions found in NuGet index")
+        if allow_prerelease:
+            raise ValueError("No parseable DocFX versions found in NuGet index")
+        else:
+            raise ValueError("No stable DocFX versions found. Use --allow-prerelease to include prereleases.")
 

Trim and validate --version/--output and require/validate
BUILD_WORKSPACE_DIRECTORY (or explicitly define a fallback) so the script
doesn't write to an unexpected relative path or accept invalid versions.

scripts/update_docfx.py [126-135]

-version = choose_version(versions, args.allow_prerelease, args.version)
+explicit_version = args.version.strip() if args.version else None
+if explicit_version:
+    try:
+        Version(explicit_version)
+    except InvalidVersion as e:
+        raise ValueError(f"Invalid --version: {explicit_version}") from e
+
+version = choose_version(versions, args.allow_prerelease, explicit_version)
 nupkg_url = NUGET_NUPKG_URL.format(version=version)
 sha256 = sha256_of_url(nupkg_url)
 
-output_path = Path(args.output)
+output_arg = (args.output or "").strip()
+if not output_arg:
+    raise ValueError("--output must be a non-empty path")
+output_path = Path(output_arg)
 if not output_path.is_absolute():
-    workspace_dir = os.environ.get("BUILD_WORKSPACE_DIRECTORY")
-    if workspace_dir:
-        output_path = Path(workspace_dir) / output_path
+    workspace_dir = (os.environ.get("BUILD_WORKSPACE_DIRECTORY") or "").strip()
+    if not workspace_dir:
+        raise EnvironmentError("BUILD_WORKSPACE_DIRECTORY is required when --output is a relative path")
+    output_path = Path(workspace_dir) / output_path
 output_path.write_text(render_docfx_repo(version, sha256))

[Suggestion processed]

Suggestion importance[1-10]: 6

Why:
Relevant best practice - Add explicit validation and availability guards at integration boundaries (e.g., environment variables, CLI inputs, network calls) before use.

Low

Contributor

Copilot AI left a comment

Pull request overview

This PR adds automated DocFX version updating for .NET documentation builds. The script fetches the latest DocFX release from NuGet (not GitHub as stated in the description), computes its SHA256 hash, and updates the Bazel repository rule configuration file.

Changes:

  • Add scripts/update_docfx.py to automatically fetch and update DocFX versions from NuGet
  • Add py_binary target in scripts/BUILD.bazel for the new script
  • Integrate the updater into dotnet/update-deps.sh workflow
  • Update DocFX from version 2.78.2 to 2.78.4 with corresponding SHA256

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

File | Description
scripts/update_docfx.py | New Python script that fetches DocFX version info from NuGet API and generates updated Bazel configuration
scripts/BUILD.bazel | Adds py_binary target for update_docfx with packaging dependency
dotnet/update-deps.sh | Integrates DocFX updater into the .NET dependency update workflow
dotnet/private/docfx_repo.bzl | Updates DocFX version from 2.78.2 to 2.78.4 with new SHA256 hash

titusfortner and others added 2 commits January 22, 2026 18:20
- Validate explicit version exists in NuGet index before use
- Remove fallback that ignored --allow-prerelease flag

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Validate explicit version exists in NuGet index before use
- Remove fallback that ignored --allow-prerelease flag
- Switch from urllib.request to urllib3 for consistency with other scripts

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@qodo-code-review
Contributor

Persistent suggestions updated to latest commit 406d8a8

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

This was referenced Feb 22, 2026

Labels

B-build Includes scripting, bazel and CI integrations C-dotnet .NET Bindings Review effort 2/5

3 participants