AnacondaRecipes · JamesRobertsonGames · Aug 13, 2024 · Jun 14, 2024 · Jun 14, 2024 · Jun 18, 2024
diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 # repodata-hotfixes
 ## Changes to package metadata to fix behavior
 
-When packages are created, authors do their best to specify constraints that make their package work.  Sometimes things change, and their constraints are not accurate for making things work.  This results in broken environments.  People need to be able to patch the package metadata long after the packages are built, so that we can prevent conda from creating broken environments.  This repository holds python scripts that generate JSON files, which are then applied on top of the repodata.json index files that are generated from the original package content.
+When packages are created, authors do their best to specify constraints that make their package work. Sometimes things change and those constraints are no longer accurate, which results in broken environments. We need to be able to patch package metadata long after the packages are built so that conda does not create broken environments. This repository holds Python scripts that generate JSON files, which are then applied on top of the repodata.json index files generated from the original package content.
 
 ## Things that may require a metadata hotfix:
 
@@ -13,25 +13,25 @@ When packages are created, authors do their best to specify constraints that mak
 
 ### Dependency and Constraint updates
 
-Changing dependencies and constraints is the primary reason hotfixes are applied.  Their
+Changing dependencies and constraints is the primary reason hotfixes are applied. There
 may be reasons why you need to change a longstanding package but rebuilding may not be
-feasible or perhaps not worth the time.  By changing dependencies and constraints,
+feasible or perhaps not worth the time. By changing dependencies and constraints,
 the data used to solve for dependencies can be modified and leave the larger ecosystem
 unharmed.
 
-NOTE: Hotfixes are applied in a overwrite manner.  So any changes are implemented
+NOTE: Hotfixes are applied in an overwrite manner, so any changes that are implemented
 will effect the the entire dependency or constraint list (i.e. If someone
 changes one out of the ten dependency for a single package, all ten will still should be in the
 "patch-instructions" as patching is an overwriting operation).
 
 ### Removal
 
-Adding a package to the removal list will remove the entire entry from the repodata.json.  It will no longer be searchable by conda search.
+Adding a package to the removal list will remove the entire entry from repodata.json, so it will no longer show up in `conda search`.
 
 We should put things on the remove list when:
 - We need a quick fix to stop consumers from downloading a bad package.
 
-Another approach might be to move the package into broken package directory (see directions in perseverance-skills).  This will cause it not to be indexed in the first place.
+Another approach is to move the package into the broken package directory (see directions in perseverance-skills). This prevents it from being indexed in the first place.
 
 ### Revoked
 
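To make the overwrite and removal rules concrete, here is a minimal sketch of how generated instructions could be applied to a repodata dict. It assumes an instruction layout with a `packages` mapping (filename to replacement fields) and a `remove` list, modeled on the patch-instructions files consumed by conda's channel indexer; `apply_instructions` and the example filenames are hypothetical, not code from this repository.

```python
import copy


def apply_instructions(repodata, instructions):
    """Return a patched copy of `repodata` (hypothetical helper, for illustration)."""
    patched = copy.deepcopy(repodata)
    packages = patched.setdefault("packages", {})

    # Field-level overwrite: every key in a patch replaces the stored value
    # wholesale, so a patched "depends" list must repeat every dependency.
    for fn, fields in instructions.get("packages", {}).items():
        if fn in packages:  # patches never create new entries
            packages[fn].update(fields)

    # Removal drops the whole entry, so conda can no longer find the package.
    for fn in instructions.get("remove", []):
        packages.pop(fn, None)

    return patched


repodata = {
    "packages": {
        "foo-1.0-0.tar.bz2": {"depends": ["python >=3.8", "numpy >=1.21"]},
        "bar-2.0-0.tar.bz2": {"depends": ["python >=3.8"]},
    }
}
instructions = {
    # Only the numpy spec was meant to change, but the whole list is replaced.
    "packages": {"foo-1.0-0.tar.bz2": {"depends": ["numpy >=1.21,<2.0a0"]}},
    "remove": ["bar-2.0-0.tar.bz2"],
}

patched = apply_instructions(repodata, instructions)
print(patched["packages"])
# {'foo-1.0-0.tar.bz2': {'depends': ['numpy >=1.21,<2.0a0']}}
```

Because `python >=3.8` was left out of the patch, it disappears from `foo`'s metadata; this is why the full dependency list has to be written out in the patch instructions.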
@@ -45,12 +45,76 @@ We should put things on the revoke list when:
 - We feel we want a customer to still have access but not the whole consumer population by default
 - ?
 
+## Numpy 2.0 Compatibility Checks and Updates
+
+### Running numpy2.py
+
+The `numpy2.py` script checks package dependencies for compatibility with numpy 2.0 and proposes updates where needed. To run the script, use the following command:
+
+```
+python numpy2.py
+```
+
+### What numpy2.py does
+
+`numpy2.py` performs the following tasks:
+1. Scans through the repodata for packages depending on numpy.
+2. Checks if these dependencies need updates to ensure compatibility with numpy 2.0.
+3. Proposes changes to add upper bounds to numpy dependencies where necessary.
+4. Generates a `numpy2_patch.json` file containing all proposed changes.
+
+### When to use numpy2.py
+
+Use `numpy2.py` when:
+- Preparing for a major numpy version update (e.g., transitioning to numpy 2.0).
+- You need to audit and update numpy dependencies across many packages.
+- You want to ensure compatibility of the ecosystem with upcoming numpy versions.
+
+### Running main.py with numpy2_patch.json
+
+After running `numpy2.py`, you'll have a `numpy2_patch.json` file. To apply these changes:
+
+1. Ensure `numpy2_patch.json` is in the same directory as `main.py`.
+2. Run `main.py` as usual:
+
+```
+python main.py
+```
+
+`main.py` loads `numpy2_patch.json` when it starts and applies its changes as part of the hotfix process. If the file is missing, `main.py` logs an error and exits without applying any hotfixes.
+
+## Reviewing CSV Updates
+
+After running `numpy2.py` or `main.py`, CSV files are generated containing detailed information about the proposed changes. To review these updates:
+
+1. Locate the generated CSV files. `main.py` writes them to an `updates/` directory in your working directory, one file per change type, e.g., `updates/dep_numpy2_updates.csv` and `updates/constr_numpy2_updates.csv`.
+
+2. For a quick review, you can open these files with any spreadsheet application on your local machine.
+
+3. For a more collaborative review or to share the updates with your team, you can upload the CSV files to a cloud-based service:
+
+   - Google Sheets:
+     1. Go to [Google Sheets](https://sheets.google.com).
+     2. Click on "Blank" to create a new spreadsheet.
+     3. Go to File > Import > Upload and select your CSV file.
+     4. Choose your import options and click "Import data".
+
+4. Once uploaded, you can easily sort, filter, and analyze the proposed changes. Look for:
+   - Packages affected
+   - Types of changes (e.g., adding upper bounds, modifying existing bounds)
+   - Reasons for changes
+
+5. Use this review to make informed decisions about which changes to approve or modify before applying the hotfixes.
+
+Remember to handle these CSVs securely, especially if they contain sensitive package information.
+
 ## Utility scripts:
+
 ### Seeing current hotfixes with `gen-current-hotfix-report.py`:
 
-It can be quite difficult to grok what the hotfix scripts are doing.  The script, `gen-current-hotfix-report.py`, attempts to make it easier to see what the current state of the applied hotfixes looks like.
+It can be quite difficult to grok what the hotfix scripts are doing. The script, `gen-current-hotfix-report.py`, attempts to make it easier to see what the current state of the applied hotfixes looks like.
 
-The script downloads the current repodata.  It then shows you a diff.  Example usage of this script:
+The script downloads the current repodata. It then shows you a diff. Example usage of this script:
 
 ```
 python gen-current-hotfix-report.py main --subdir linux-64 osx-64 win-64 osx-arm64 linux-ppc64le linux-aarch64 linux-s390x noarch
@@ -60,16 +124,15 @@ For repeated runs add `--use-cache` to avoid downloading the repodata files.
 
 ### Testing hotfixes with `test-hotfix.py`:
 
-The script, `test-hotfix.py`, downloads the current repodata and runs your instructions against it. It then shows you a diff.
-
-This useful for testing out changes before they are committed and deployed. This will show differences in current state of hotfixes
+The script, `test-hotfix.py`, downloads the current repodata and runs your instructions against it. It then shows you a diff.
+This is useful for testing changes before they are committed and deployed. It shows the differences between the current state of the hotfixes
 and the ones you are working on.
 
 Example usage of this script:
+
 ```
 python test-hotfix.py main --subdir linux-64 osx-64 win-64 osx-arm64 linux-ppc64le linux-aarch64 linux-s390x noarch
 ```
 
 Use the `--color` or `--show-pkgs` options for different outputs.
-
-For repeated runs add `--use-cache` to avoid downloading the repodata files.
+For repeated runs add `--use-cache` to avoid downloading the repodata files.
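`numpy2.py` is not included in this diff, so the following is only a sketch of what its "propose upper bounds" step could look like. The helper names, the `<2.0a0` bound, and the simplified spec parsing are assumptions; the change-dict keys (`type`, `original`, `updated`, `reason`), the `Upper bound added` reason, and the nesting by subdir and filename are what `main.py` reads from `numpy2_patch.json`.

```python
def add_numpy_upper_bound(spec, upper="2.0a0"):
    """Return `spec` with a numpy upper bound added, or None to leave it alone.

    Only the simple "name [version]" form is handled; specs with a build string
    or "|" alternatives are skipped so they can be reviewed by hand.
    """
    parts = spec.split()
    if not parts or parts[0] != "numpy" or len(parts) > 2:
        return None
    if len(parts) == 1:
        return f"numpy <{upper}"
    version = parts[1]
    if "|" in version or "<" in version:
        return None  # alternatives, or already bounded above
    return f"numpy {version},<{upper}"


def propose_changes(subdir, filename, record):
    """Collect change dicts in the {subdir: {filename: [change, ...]}} shape."""
    patch = {}
    for change_type, key in (("dep", "depends"), ("constr", "constrains")):
        for spec in record.get(key, []):
            updated = add_numpy_upper_bound(spec)
            if updated is not None:
                patch.setdefault(subdir, {}).setdefault(filename, []).append({
                    "type": change_type,
                    "original": spec,
                    "updated": updated,
                    "reason": "Upper bound added",
                })
    return patch


record = {"depends": ["numpy >=1.21.5", "python >=3.11,<3.12.0a0"],
          "constrains": ["numpy <2.0a0"]}
print(propose_changes("linux-64", "example-1.0-py311_0.tar.bz2", record))
```

Skipping anything that already has an upper bound keeps the proposal conservative: existing bounds are left for a human to judge rather than rewritten automatically.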
diff --git a/main.py b/main.py
@@ -7,10 +7,21 @@
 import sys
 from collections import defaultdict
 from os.path import dirname, isdir, isfile, join
-
 from conda.models.version import VersionOrder
-
+import csv
 import requests
+import logging
+
+# Global dictionary to store data for CSV output
+csv_data = defaultdict(list)
+
+# Configure the logging
+logging.basicConfig(level=logging.DEBUG,
+                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+                    handlers=[logging.FileHandler('hotfixes.log', mode='w'),
+                              logging.StreamHandler()])
+# Create a logger object
+logger = logging.getLogger(__name__)
 
 CHANNEL_NAME = "main"
 CHANNEL_ALIAS = "https://repo.anaconda.com/pkgs"
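The numpy2 helpers added in the next hunk read `numpy2_patch.json` as a mapping from subdir to filename to a list of change dicts, rewrite matching specs in place, and collect rows for the CSV report. The condensed sketch below restates that flow on in-memory data so it can be tried without touching a real channel; `PATCH`, `FIELDS`, `apply_changes`, and the example record are hypothetical.

```python
from collections import defaultdict

# Hypothetical patch data in the {subdir: {filename: [change, ...]}} shape.
# Filenames and specs are made up for illustration.
PATCH = {
    "linux-64": {
        "example-1.0-py311_0.tar.bz2": [
            {"type": "dep", "original": "numpy >=1.21.5",
             "updated": "numpy >=1.21.5,<2.0a0", "reason": "Upper bound added"},
            {"type": "constr", "original": "numpy >=1.21",
             "updated": "numpy >=1.21,<2.0a0", "reason": "Upper bound added"},
        ]
    }
}

# Change type -> record field holding the matching spec list.
FIELDS = {"dep": "depends", "constr": "constrains"}


def apply_changes(record, subdir, filename, patch, rows_by_type):
    """Rewrite matching specs in place and collect one CSV-style row per edit."""
    for change in patch.get(subdir, {}).get(filename, []):
        field = FIELDS.get(change["type"])
        specs = record.get(field) if field else None
        if specs is None:  # unknown change type, or the record has no such field
            continue
        for i, spec in enumerate(specs):
            if spec == change["original"]:
                specs[i] = change["updated"]
                rows_by_type[change["type"]].append(
                    [record["name"], change["original"], change["updated"], change["reason"]]
                )


record = {"name": "example", "depends": ["numpy >=1.21.5", "python >=3.11,<3.12.0a0"]}
rows = defaultdict(list)
apply_changes(record, "linux-64", "example-1.0-py311_0.tar.bz2", PATCH, rows)
print(record["depends"])  # ['numpy >=1.21.5,<2.0a0', 'python >=3.11,<3.12.0a0']
print(dict(rows))
```

Because the example record has no `constrains` field, the `constr` change finds nothing to replace and produces no row, which matches the outcome of looking up a missing field with an empty-list default.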
@@ -261,6 +272,95 @@
 ]
 
 
+def load_numpy2_changes():
+    try:
+        with open(join(dirname(__file__), 'numpy2_patch.json'), 'r') as f:
+            return json.load(f)
+    except FileNotFoundError:
+        logger.error("numpy2_patch.json not found. Aborting hotfixes.")
+        sys.exit(1)
+
+
+NUMPY_2_CHANGES = load_numpy2_changes()
+
+
+def apply_numpy2_changes(record, subdir, filename):
+    """
+    Applies predefined numpy changes to a record based on its subdir and filename.
+
+    Parameters:
+    - record: The record to update.
+    - subdir: The platform subdirectory of the record (e.g. linux-64).
+    - filename: The filename of the record.
+    """
+    if subdir not in NUMPY_2_CHANGES or filename not in NUMPY_2_CHANGES[subdir]:
+        return
+    changes = NUMPY_2_CHANGES[subdir][filename]
+    for change in changes:
+        depends = _get_dependency_list(record, change['type'])
+        if depends is None:
+            continue
+        _apply_changes_to_dependencies(depends, change, record, filename, 'type')
+
+
+def _get_dependency_list(record, change_type):
+    """
+    Returns the appropriate dependency list based on the change type.
+
+    Parameters:
+    - record (dict): The record containing dependency information.
+    - change_type (str): The type of change ('dep' for dependencies, 'constr' for constraints).
+
+    Returns:
+    - list: The list of dependencies or constraints based on the change type, None if the change type is unrecognized.
+    """
+    if change_type == 'dep':
+        return record['depends']
+    elif change_type == 'constr':
+        return record.get('constrains', [])
+    return None
+
+
+def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):
+    """
+    Applies a change to a dependency or constraint list in place and logs it.
+
+    Parameters:
+    - depends (list): The list of dependencies or constraints to modify in place.
+    - change (dict): A dict with the original spec ('original'), the updated spec ('updated'), the reason ('reason'), and the change type ('type').
+    - record (dict): The record to which the changes apply.
+    - filename (str): The name of the file being processed.
+    - sort_type (str, optional): The key in `change` used to group rows for CSV export; write_csv() writes one file per group. Defaults to 'reason'.
+    """
+    for i, dep in enumerate(depends):
+        if dep == change['original']:
+            depends[i] = change['updated']
+            if change['reason'] == 'Upper bound added':
+                logger.info(f"Applied numpy change for {filename}: {change['original']} -> {change['updated']}")
+            # Add to csv_data for later CSV export
+            csv_data[change[sort_type]].append([
+                record['name'], record['version'], record['build'],
+                record['build_number'], change['original'],
+                change['updated'], change['reason']
+            ])
+
+
+def write_csv():
+    """
+    Writes update data to CSV files in the 'updates' directory.
+    """
+    os.makedirs("updates", exist_ok=True)
+
+    for issue_type, data in csv_data.items():
+        with open(f"updates/{issue_type}_numpy2_updates.csv", 'w', newline='') as csvfile:
+            writer = csv.writer(csvfile)
+            writer.writerow(['Package', 'Version',
+                             'Build', 'Build Number',
+                             'Original Dependency', 'Updated Dependency',
+                             'Reason'])
+            writer.writerows(data)
+
+
 def _replace_vc_features_with_vc_pkg_deps(name, record, depends):
  python_vc_deps = {
  "2.6": "vc 9.*",
@@ -671,6 +771,9 @@ def patch_record_in_place(fn, record, subdir):
  depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,")
  break
 
+    if NUMPY_2_CHANGES:
+        apply_numpy2_changes(record, subdir, fn)
+
  ###########
  # pytorch #
  ###########
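Guarding the numpy step with a truthiness test matters: an identity comparison against a dict literal, such as `x is not {}`, is always true because `{}` builds a new object every time it is evaluated, so it never detects an empty mapping. A minimal demonstration (the function names are illustrative only):

```python
def has_changes_identity(changes):
    # `{}` creates a brand-new dict on every evaluation, so this identity
    # test can never be False; it does not check for emptiness at all.
    return changes is not {}


def has_changes(changes):
    # Truthiness: empty dicts are falsy, non-empty dicts are truthy.
    return bool(changes)


print(has_changes_identity({}))  # True, even though there are no changes
print(has_changes({}))  # False
print(has_changes({"linux-64": {}}))  # True
```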
@@ -734,7 +837,6 @@ def patch_record_in_place(fn, record, subdir):
  ######################
  # scipy dependencies #
  ######################
-
  # scipy 1.8 and 1.9 introduce breaking API changes impacting these packages
  if name == "theano":
  if version in ["1.0.4", "1.0.5"]:
@@ -969,7 +1071,6 @@ def patch_record_in_place(fn, record, subdir):
 
  # kealib 1.4.8 changed sonames, add new upper bound to existing packages
  replace_dep(depends, "kealib >=1.4.7,<1.5.0a0", "kealib >=1.4.7,<1.4.8.0a0")
-
  # Other broad replacements
  for i, dep in enumerate(depends):
  # glib is compatible up to the major version
@@ -1534,6 +1635,8 @@ def do_hotfixes(base_dir):
 def main():
     base_dir = join(dirname(__file__), CHANNEL_NAME)
     do_hotfixes(base_dir)
+    if NUMPY_2_CHANGES:
+        write_csv()
 
 
 if __name__ == "__main__":
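For a quick local review without a spreadsheet, the CSV files written by `write_csv()` can be summarized with the standard library. This is a minimal sketch: `summarize_updates` is a hypothetical helper, and it relies only on the header row that `write_csv()` emits (`Package`, ..., `Reason`).

```python
import csv
from collections import Counter


def summarize_updates(lines, reason=None):
    """Count proposed changes per package in one *_numpy2_updates.csv file.

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    Column names match the header row written by write_csv().
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        if reason is None or row["Reason"] == reason:
            counts[row["Package"]] += 1
    return counts


# Usage (assumes main.py has already written the file):
# with open("updates/dep_numpy2_updates.csv", newline="") as f:
#     for package, n in summarize_updates(f, "Upper bound added").most_common(20):
#         print(f"{package}: {n}")
```

Taking an iterable of lines rather than a path keeps the helper easy to test and lets it read from an open file, a pipe, or an in-memory list alike.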