Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split convert observed data #7334

Merged

Conversation

michaelosthege
Copy link
Member

@michaelosthege michaelosthege commented May 23, 2024

Description

convert_observed_data is a messy do-it-all conversion function that was recently refactored in #7299.

This PR takes the next step and splits it such that generator data is treated separately.

Related Issue

Checklist

Type of change

  • New feature / enhancement
  • Bug fix
  • Documentation
  • Maintenance
  • Other (please specify):
    • Generator data is no longer supported with pm.Data.
    • Generator data of integer type is no longer converted to floatX, but only to intX.

📚 Documentation preview 📚: https://pymc--7334.org.readthedocs.build/en/7334/

@michaelosthege michaelosthege added maintenance VI Variational Inference labels May 23, 2024
@michaelosthege michaelosthege self-assigned this May 23, 2024
@michaelosthege michaelosthege force-pushed the split-convert-observed-data branch from 6b8e8f3 to f8f554e Compare May 23, 2024 23:12
@ricardoV94
Copy link
Member

ricardoV94 commented May 24, 2024

I'm still of the opinion we should drop generators altogether. I don't find any documentation on how to use it, and I can't really think of all the implications this Op means inside PyTensor, which is supposed to be side-effects free unless told otherwise (it's not being told so here). It's also only applicable in the default backend / and even here only if you have models without Deterministics (or any workflow where separate functions are compiled from the same underlying graph).

If this is useful in VI, the machinery that creates and call functions can use regular Python code to work with generators (such as setting a shared variable to the next value in every iteration of the loop). I don't see why we need this magic at the PyTensor level.

@michaelosthege
Copy link
Member Author

I'll add a depreaction warning to the conversion of generator data, asking people to get in touch if this is relevant.

michaelosthege and others added 3 commits May 24, 2024 17:01
This will be used to fix the `GeneratorAdapter` when applied
to generators producing int-valued data.
Previously, the `GeneratorAdapter` applied `floatX` to float data,
but kept the original integer dtypes.
`floatX` was then applied to everything by `convert_observed_data`.

This refactor changes the handling of integer-valued generator data,
such that `intX` is applied, and no `floatX` conversion takes place.
@michaelosthege michaelosthege force-pushed the split-convert-observed-data branch from f8f554e to 5a73475 Compare May 24, 2024 16:55
@michaelosthege
Copy link
Member Author

michaelosthege commented May 24, 2024

  • Rebased
  • Added deprecation warning
  • Hopefully fixed the tests by defaulting to floatX conversion when user-provided data does not carry dtype information.

@@ -68,22 +68,38 @@
"floatX",
"intX",
"smartfloatX",
"smarttypeX",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather not add these new methods to all as I prefer we to get rid of them in the future, so the less discoverable they are the better.

For reference the whole intX is pretty questionable: #7331

@michaelosthege michaelosthege force-pushed the split-convert-observed-data branch from 876efc3 to d9bd20c Compare May 26, 2024 21:31
Copy link

codecov bot commented May 26, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.45%. Comparing base (a197b19) to head (d9bd20c).
Report is 135 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #7334      +/-   ##
==========================================
- Coverage   92.47%   92.45%   -0.03%     
==========================================
  Files         102      102              
  Lines       17188    17180       -8     
==========================================
- Hits        15894    15883      -11     
- Misses       1294     1297       +3     
Files Coverage Δ
pymc/data.py 89.44% <100.00%> (+0.13%) ⬆️
pymc/pytensorf.py 90.38% <100.00%> (-0.53%) ⬇️

... and 20 files with indirect coverage changes

@michaelosthege michaelosthege merged commit e5c3030 into pymc-devs:main May 27, 2024
22 checks passed
@michaelosthege michaelosthege deleted the split-convert-observed-data branch June 17, 2024 05:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance VI Variational Inference
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants