add numpy 2 hotfix to main #227

Merged
merged 48 commits into from
Aug 13, 2024
Changes from 35 commits
Commits
7f0f367
add numpy 2 hotfix to main
JamesRobertsonGames Jun 14, 2024
39ec561
ensure the lint passes
JamesRobertsonGames Jun 14, 2024
013d769
Apply suggestions from code review
JamesRobertsonGames Jun 18, 2024
b011dcc
re-add numpy old hotfix
JamesRobertsonGames Jun 18, 2024
e38027f
ensure find packages with lock without ==
JamesRobertsonGames Jun 18, 2024
67417d8
propogate find changes to be made
JamesRobertsonGames Jun 18, 2024
0a93cab
ensure correct application of upper bound hotfix
JamesRobertsonGames Jun 19, 2024
d0d073b
avoid anaconda_depends for repin of numpy
JamesRobertsonGames Jun 19, 2024
d35eab8
add protection list and logic to the n2 hotfix
JamesRobertsonGames Jun 25, 2024
513846a
lint issues
JamesRobertsonGames Jun 25, 2024
9e1ab7b
lint issue
JamesRobertsonGames Jun 25, 2024
b0fd63e
pyyaml added
JamesRobertsonGames Jun 25, 2024
71306c3
pyyaml in testenv
JamesRobertsonGames Jun 25, 2024
ce7e20b
modify yaml and ensure upto date filtering of edge cases
JamesRobertsonGames Jul 16, 2024
0863898
remove need for yaml
JamesRobertsonGames Jul 16, 2024
c979b12
remove pyyaml
JamesRobertsonGames Jul 16, 2024
881cf75
add numpy 2 hotfix rework
JamesRobertsonGames Jul 25, 2024
18e3c4d
formatting changes
JamesRobertsonGames Jul 25, 2024
e3cb04b
main trimmed whitespace
JamesRobertsonGames Jul 25, 2024
86c54a1
repair lint issues
JamesRobertsonGames Jul 25, 2024
0027dab
remove all the lint errors from flake
JamesRobertsonGames Jul 25, 2024
571832b
revert test-hotfix
JamesRobertsonGames Jul 25, 2024
34f2d32
py-rattler adding
JamesRobertsonGames Jul 25, 2024
fec2680
add to readme the new processes
JamesRobertsonGames Jul 25, 2024
ed2659a
remove dep none remover
JamesRobertsonGames Jul 25, 2024
4d654e5
remove yaml
JamesRobertsonGames Jul 25, 2024
c26a1fd
upload proposed changes
JamesRobertsonGames Jul 25, 2024
71e30cd
Apply suggestions from code review
JamesRobertsonGames Jul 26, 2024
d75edb5
Update numpy2.py
JamesRobertsonGames Jul 26, 2024
411223c
change code for clearer changes for numpy 2
JamesRobertsonGames Jul 26, 2024
6716a8f
remove rattler goodness
JamesRobertsonGames Jul 26, 2024
8dd4c11
change to n2
JamesRobertsonGames Jul 26, 2024
c896b1b
flake8
JamesRobertsonGames Jul 26, 2024
d2ab4c9
remove logging for items not being updated
JamesRobertsonGames Jul 26, 2024
2d59ddb
regenerate n2 patch
JamesRobertsonGames Jul 26, 2024
7f237ee
Merge branch 'master' into numpy2-hotfix
ryanskeith Jul 27, 2024
c1b7795
make changes suggested in previous review
JamesRobertsonGames Aug 1, 2024
7639c1a
linting errors fixed
JamesRobertsonGames Aug 1, 2024
c09fa2f
delete numpy2 config and lint modifications
JamesRobertsonGames Aug 1, 2024
4326c27
speed up patching process with better data handling
JamesRobertsonGames Aug 1, 2024
1eb8cd5
README updated with changes
JamesRobertsonGames Aug 1, 2024
e5bb64f
make changes for legibility to per review
JamesRobertsonGames Aug 8, 2024
3b6625c
linting correctors
JamesRobertsonGames Aug 8, 2024
c54945f
remove correction code
JamesRobertsonGames Aug 13, 2024
36222d0
revert to force git to pick up the change
JamesRobertsonGames Aug 13, 2024
4d8e248
revert depends and contrains
JamesRobertsonGames Aug 13, 2024
b6595af
ensure return case is not needed
JamesRobertsonGames Aug 13, 2024
4d468b1
remove protect dict to await issues
JamesRobertsonGames Aug 13, 2024
89 changes: 76 additions & 13 deletions README.md
Review comment (Contributor):
Looks like you need to update this to reflect current code changes.

@@ -1,7 +1,7 @@
# repodata-hotfixes
## Changes to package metadata to fix behavior

When packages are created, authors do their best to specify constraints that make their package work. Sometimes things change, and their constraints are no longer accurate for making things work. This results in broken environments. People need to be able to patch the package metadata long after the packages are built, so that we can prevent conda from creating broken environments. This repository holds Python scripts that generate JSON files, which are then applied on top of the repodata.json index files that are generated from the original package content.

## Things that may require a metadata hotfix:

@@ -13,25 +13,25 @@ When packages are created, authors do their best to specify constraints that mak

### Dependency and Constraint updates

Changing dependencies and constraints is the primary reason hotfixes are applied. There
may be reasons why you need to change a longstanding package, but rebuilding may not be
feasible or perhaps not worth the time. By changing dependencies and constraints,
the data used to solve for dependencies can be modified while leaving the larger ecosystem
unharmed.

NOTE: Hotfixes are applied in an overwrite manner, so any change
affects the entire dependency or constraint list (i.e., if someone
changes one of the ten dependencies for a single package, all ten should still be in the
"patch-instructions", as patching is an overwriting operation).

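To illustrate the overwrite behavior described in the note, here is a minimal sketch. It assumes a simplified model in which each patch entry is a dictionary whose keys replace the same keys in the record; `apply_patch` and the sample record are hypothetical and are not this repository's code.

```python
def apply_patch(record, patch):
    """Return a copy of record with every key in patch overwritten wholesale."""
    patched = dict(record)
    patched.update(patch)  # list values are replaced, not merged
    return patched


record = {
    "name": "example-pkg",  # hypothetical package
    "depends": ["python >=3.9", "numpy >=1.21", "scipy"],
}

# Listing only the changed dependency drops the other two entries.
incomplete = apply_patch(record, {"depends": ["numpy >=1.21,<2.0a0"]})

# Repeating the full list with the one entry changed keeps all three.
complete = apply_patch(
    record,
    {"depends": ["python >=3.9", "numpy >=1.21,<2.0a0", "scipy"]},
)

print(incomplete["depends"])
print(complete["depends"])
```

The second call is the shape a patch entry needs: the whole list, with only the intended entry modified.
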
### Removal

Adding a package to the removal list will remove the entire entry from the repodata.json. It will no longer be searchable by conda search.

We should put things on the remove list when:
- We need a quick fix to stop consumers from downloading a bad package.

Another approach might be to move the package into the broken package directory (see directions in perseverance-skills). This will cause it not to be indexed in the first place.

### Revoked

@@ -45,12 +45,76 @@ We should put things on the revoke list when:
- We feel we want a customer to still have access but not the whole consumer population by default
- ?

## Numpy 2.0 Compatibility Checks and Updates

### Running numpy2.py

The `numpy2.py` script is used to check and update package dependencies for compatibility with numpy 2.0. To run the script, use the following command:

```
python numpy2.py
```

### What numpy2.py does

`numpy2.py` performs the following tasks:
1. Scans through the repodata for packages depending on numpy.
2. Checks if these dependencies need updates to ensure compatibility with numpy 2.0.
3. Proposes changes to add upper bounds to numpy dependencies where necessary.
4. Generates a `numpy2_patch.json` file containing all proposed changes.
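
The upper-bound step in the list above could look roughly like the following sketch. It is not the logic of `numpy2.py`: the function name `add_numpy_upper_bound`, the default bound `<2.0a0`, and the rule "only touch specs that have a lower bound and no upper bound" are all assumptions made for illustration.

```python
def add_numpy_upper_bound(dep, upper="<2.0a0"):
    """Return dep with an upper bound added if it is an unbounded numpy spec."""
    parts = dep.split()
    if not parts or parts[0] != "numpy":
        return dep  # not a numpy dependency (e.g. "numpy-base" is left alone)
    if len(parts) == 1:
        return f"numpy {upper}"  # bare "numpy" has no version constraint at all
    version = parts[1]
    # Only lower-bound-only specs are changed; pins and "|" specs are skipped.
    if version.startswith(">") and "<" not in version and "|" not in version:
        parts[1] = f"{version},{upper}"
    return " ".join(parts)


for spec in ["numpy >=1.21.5", "numpy", "numpy >=1.21,<1.27", "python >=3.9"]:
    print(spec, "->", add_numpy_upper_bound(spec))
```
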

### When to use numpy2.py

Use `numpy2.py` when:
- Preparing for a major numpy version update (e.g., transitioning to numpy 2.0).
- You need to audit and update numpy dependencies across many packages.
- You want to ensure compatibility of the ecosystem with upcoming numpy versions.

### Running main.py with numpy2_patch.json

After running `numpy2.py`, you'll have a `numpy2_patch.json` file. To apply these changes:

1. Ensure `numpy2_patch.json` is in the same directory as `main.py`.
2. Run `main.py` as usual:

```
python main.py
```

`main.py` will automatically detect and incorporate the changes from `numpy2_patch.json` into the hotfix process.

## Reviewing CSV Updates

After running `numpy2.py` or `main.py`, CSV files are generated containing detailed information about the proposed changes. To review these updates:

1. Locate the generated CSV files in the `updates` directory under your working directory. They are named according to the type of update, e.g., `dep_numpy2_updates.csv`, `constr_numpy2_updates.csv`.

2. For a quick review, you can open these files with any spreadsheet application on your local machine.

3. For a more collaborative review or to share the updates with your team, you can upload the CSV files to a cloud-based service:

   - Google Sheets:
     1. Go to [Google Sheets](https://sheets.google.com).
     2. Click on "Blank" to create a new spreadsheet.
     3. Go to File > Import > Upload and select your CSV file.
     4. Choose your import options and click "Import data".

4. Once uploaded, you can easily sort, filter, and analyze the proposed changes. Look for:
   - Packages affected
   - Types of changes (e.g., adding upper bounds, modifying existing bounds)
   - Reasons for changes

5. Use this review to make informed decisions about which changes to approve or modify before applying the hotfixes.

Remember to handle these CSVs securely, especially if they contain sensitive package information.
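
For a quick look without a spreadsheet, a CSV can also be summarized with Python's standard `csv` module. The sketch below assumes the file has a `Reason` column, as in the header row written by the CSV writer in this pull request's `main.py`; the helper name `summarize_updates` is hypothetical.

```python
import csv
from collections import Counter


def summarize_updates(csv_path):
    """Count proposed changes per 'Reason' value in one updates CSV."""
    with open(csv_path, newline="") as csvfile:
        return Counter(row["Reason"] for row in csv.DictReader(csvfile))
```

For example, `summarize_updates("updates/dep_numpy2_updates.csv")` returns a mapping from each reason to the number of rows that carry it.
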

## Utility scripts:

### Seeing current hotfixes with `gen-current-hotfix-report.py`:

It can be quite difficult to grok what the hotfix scripts are doing. The script, `gen-current-hotfix-report.py`, attempts to make it easier to see what the current state of the applied hotfixes looks like.

The script downloads the current repodata. It then shows you a diff. Example usage of this script:

```
python gen-current-hotfix-report.py main --subdir linux-64 osx-64 win-64 osx-arm64 linux-ppc64le linux-aarch64 linux-s390x noarch
@@ -60,16 +124,15 @@ For repeated runs add `--use-cache` to avoid downloading the repodata files.

### Testing hotfixes with `test-hotfix.py`:

The script, `test-hotfix.py`, downloads the current repodata and runs your instructions against it. It then shows you a diff.
This is useful for testing out changes before they are committed and deployed. It shows the differences between the current state of the hotfixes
and the ones you are working on.

Example usage of this script:

```
python test-hotfix.py main --subdir linux-64 osx-64 win-64 osx-arm64 linux-ppc64le linux-aarch64 linux-s390x noarch
```

Use the `--color` or `--show-pkgs` options for different outputs.

For repeated runs add `--use-cache` to avoid downloading the repodata files.
111 changes: 107 additions & 4 deletions main.py
@@ -7,10 +7,21 @@
import sys
from collections import defaultdict
from os.path import dirname, isdir, isfile, join

from conda.models.version import VersionOrder

import csv
import requests
import logging
Review comment (Contributor):

I don't think it is a good idea to add logging to this script. This script is fed into a conda-index process.


# Global dictionary to store data for CSV output
csv_data = defaultdict(list)

# Configure the logging
logging.basicConfig(level=logging.DEBUG,
                    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
                    handlers=[logging.FileHandler('hotfixes.log', mode='w'),
                              logging.StreamHandler()])
# Create a logger object
logger = logging.getLogger(__name__)

CHANNEL_NAME = "main"
CHANNEL_ALIAS = "https://repo.anaconda.com/pkgs"
@@ -261,6 +272,95 @@
]


def load_numpy2_changes():
    try:
        with open('numpy2_patch.json', 'r') as f:
            return json.load(f)
    except FileNotFoundError:
        logger.error("numpy2_patch.json not found. Aborting hotfixes.")
        sys.exit(1)
Review comment (Contributor):
I would just remove this block and use my oneliner below.



NUMPY_2_CHANGES = load_numpy2_changes()
Review comment (Contributor):

Suggested change
- NUMPY_2_CHANGES = load_numpy2_changes()
+ NUMPY_2_CHANGES = json.loads(Path("numpy2_patch.json").read_text())

Of course, add in from pathlib import Path above.
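
As a sketch of the `pathlib` approach in this suggestion: the one-liner raises `FileNotFoundError` when the file is absent. The variant below adds a fallback to an empty dict, which is an assumption about the desired behavior and not something the reviewer asked for; `load_patch` is a hypothetical helper name.

```python
import json
from pathlib import Path


def load_patch(path="numpy2_patch.json"):
    """Load the patch file, or return an empty dict if it does not exist."""
    patch_file = Path(path)
    if not patch_file.is_file():
        return {}  # assumed fallback: no patch file means nothing to apply
    return json.loads(patch_file.read_text())
```

With this shape, `NUMPY_2_CHANGES = load_patch()` stays a single assignment at module level, and an empty result can be skipped with an ordinary truthiness check.
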



def apply_numpy2_changes(record, subdir, filename):
    """
    Applies predefined numpy changes to a record based on its directory and filename.

    Parameters:
    - record: The record to update.
    - subdir: The subdirectory of the record.
    - filename: The filename of the record.
    """
    if subdir not in NUMPY_2_CHANGES or filename not in NUMPY_2_CHANGES[subdir]:
Review comment (Contributor):

Suggested change
-     if subdir not in NUMPY_2_CHANGES or filename not in NUMPY_2_CHANGES[subdir]:
+     subdir_changes = NUMPY_2_CHANGES.get(subdir, set())
+     if filename not in subdir_changes:

Small optimization.

        return
    changes = NUMPY_2_CHANGES[subdir][filename]
Review comment (Contributor):
You can then refer to subdir_changes from above.
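
A minimal sketch of the lookup the reviewer describes, written as a standalone function. The nested layout (subdir, then filename, then a list of change dicts) is inferred from how `NUMPY_2_CHANGES` is indexed in this diff; `changes_for` and the sample data are hypothetical.

```python
def changes_for(numpy2_changes, subdir, filename):
    """Return the list of changes for one package file, or [] if there are none."""
    subdir_changes = numpy2_changes.get(subdir, {})  # one lookup per level
    return subdir_changes.get(filename, [])


sample = {
    "linux-64": {
        "example-1.0-py_0.tar.bz2": [
            {
                "type": "dep",
                "original": "numpy >=1.21",
                "updated": "numpy >=1.21,<2.0a0",
                "reason": "Upper bound added",
            }
        ]
    }
}

print(changes_for(sample, "linux-64", "example-1.0-py_0.tar.bz2"))
print(changes_for(sample, "win-64", "example-1.0-py_0.tar.bz2"))
```

Returning an empty list for a miss lets the caller loop over the result without a separate early return.
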

    for change in changes:
        depends = _get_dependency_list(record, change['type'])
        if depends is None:
            continue
        _apply_changes_to_dependencies(depends, change, record, filename, 'type')


def _get_dependency_list(record, change_type):
Review comment (Contributor):

I don't think this warrants its own function. If you store "depends" and "constrains" directly in the numpy2_patch.json file, the replace_dep line becomes:

replace_dep(record[change["change_type"]], change["original"], change["updated"])

Your process already did all the cleaning so we shouldn't have any weird third cases.

Review comment (Contributor):
This also gets rid of the unnecessary for loop.

    """
    Returns the appropriate dependency list based on the change type.

    Parameters:
    - record (dict): The record containing dependency information.
    - change_type (str): The type of change ('dep' for dependencies, 'constr' for constraints).

    Returns:
    - list: The list of dependencies or constraints based on the change type, None if the change type is unrecognized.
    """
    if change_type == 'dep':
        return record['depends']
    elif change_type == 'constr':
        return record.get('constrains', [])
    return None


def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):
Review comment (Contributor):

Suggested change
- def _apply_changes_to_dependencies(depends, change, record, filename, sort_type='reason'):
+ def _apply_changes_to_dependencies(depends_or_constraints, change, record, filename, sort_type='reason'):

    """
    Applies changes to dependencies and logs the changes.

    Parameters:
    - depends (list): The list of dependencies to be modified.
    - change (dict): A dict containing the original dependency, the updated dependency, the reason for the change.
    - record (dict): The record to which the changes apply.
    - filename (str): The name of the file being processed.
    - sort_type (str, optional): The key in the 'change' dictionary to sort the CSV data by. Defaults to 'reason'.
    """
    for i, dep in enumerate(depends):
Review comment (Contributor):

Suggested change
-     for i, dep in enumerate(depends):
+     replace_dep(depends, change["original"], change["updated"])

replace_dep is the standard usage for replacing deps in main.py.
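
For readers unfamiliar with the helper, the following stand-in shows the kind of in-place, exact-match replacement being suggested. It is not the `replace_dep` defined in `main.py`, whose signature and matching rules may differ; `replace_dep_sketch` is a hypothetical name used only for illustration.

```python
def replace_dep_sketch(depends, old, new):
    """Replace every exact occurrence of old in depends with new, in place.

    Returns True if at least one entry was replaced.
    """
    replaced = False
    for i, dep in enumerate(depends):
        if dep == old:
            depends[i] = new
            replaced = True
    return replaced


depends = ["python >=3.9", "numpy >=1.21"]
replace_dep_sketch(depends, "numpy >=1.21", "numpy >=1.21,<2.0a0")
print(depends)
```
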

        if dep == change['original']:
            depends[i] = change['updated']
            if change['reason'] == 'Upper bound added':
                logger.info(f"Applied numpy change for {filename}: {change['original']} -> {change['updated']}")
            # Add to csv_data for later CSV export
            csv_data[change[sort_type]].append([
                record['name'], record['version'], record['build'],
                record['build_number'], change['original'],
                change['updated'], change['reason']
            ])


def write_csv():
    """
    Writes update data to CSV files in the 'updates' directory.
    """
    if not os.path.exists("updates"):
        os.makedirs("updates")

    for issue_type, data in csv_data.items():
        with open(f"updates/{issue_type}_numpy2_updates.csv", 'w', newline='') as csvfile:
            csv.writer(csvfile).writerow(['Package', 'Version',
                                          'Build', 'Build Number',
                                          'Original Dependency', 'Updated Dependency',
                                          'Reason'])
            csv.writer(csvfile).writerows(data)


def _replace_vc_features_with_vc_pkg_deps(name, record, depends):
    python_vc_deps = {
        "2.6": "vc 9.*",
@@ -671,6 +771,9 @@ def patch_record_in_place(fn, record, subdir):
depends[i] = depends[i].replace(">=1.21.5,", ">=1.21.2,")
break

if NUMPY_2_CHANGES is not {}:
Review comment (Contributor):

Suggested change
-     if NUMPY_2_CHANGES is not {}:
+     if NUMPY_2_CHANGES:

Use Python truthiness here. Also, depending on how you generated the patch, we should apply all the numpy patches either at the very beginning or at the very end. There are other places in main where numpy is touched and gets an upper bound. We don't want to overwrite them or redo them.
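
The difference between the two conditions can be seen in a short, self-contained sketch. The function names are hypothetical; the behavior shown is standard Python semantics for identity comparison and dict truthiness.

```python
def identity_check(changes):
    # Compares against a brand-new dict object, so this is always True.
    return changes is not {}


def truthiness_check(changes):
    # An empty dict is falsy; a non-empty one is truthy.
    return bool(changes)


print(identity_check({}))                  # True, even though the dict is empty
print(truthiness_check({}))                # False
print(truthiness_check({"linux-64": {}}))  # True
```
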

apply_numpy2_changes(record, subdir, fn)

###########
# pytorch #
###########
@@ -734,7 +837,6 @@ def patch_record_in_place(fn, record, subdir):
######################
# scipy dependencies #
######################

# scipy 1.8 and 1.9 introduce breaking API changes impacting these packages
if name == "theano":
if version in ["1.0.4", "1.0.5"]:
@@ -969,7 +1071,6 @@ def patch_record_in_place(fn, record, subdir):

# kealib 1.4.8 changed sonames, add new upper bound to existing packages
replace_dep(depends, "kealib >=1.4.7,<1.5.0a0", "kealib >=1.4.7,<1.4.8.0a0")

# Other broad replacements
for i, dep in enumerate(depends):
# glib is compatible up to the major version
@@ -1534,6 +1635,8 @@ def do_hotfixes(base_dir):
def main():
    base_dir = join(dirname(__file__), CHANNEL_NAME)
    do_hotfixes(base_dir)
    if NUMPY_2_CHANGES != {}:
Review comment (Contributor):

We shouldn't be creating CSV files as a byproduct of the hotfixing process.

        write_csv()


if __name__ == "__main__":