-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Awkward v2 transition #736
Conversation
f6d6d30
to
18765f6
Compare
@kratsg hey - what does a form key that looks like:
mean? In particular the |
Checking, but I don't recall an |
Gotcha - things broke in interesting ways - there's only so much of this code nick/myself actually own and know how to maintain! Thanks for looking. |
3147db7
to
3927dd3
Compare
@gordonwatts You should be aware that things have changed a lot with forms under the hood in awkward2. I have gotten your tests to pass again for the auto schema but there is not much confirmation that they work beyond that. @'ing you since it would be worth your time to work the rest of this out. I'm not sure if the servicex tests are awkward2 compatible yet either. |
@tejinc awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (pdune) to pass tests but you may want to check it to ensure proper functionality is kept. |
@kpedro88 awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (treemaker) to pass your tests but you may want to check it to ensure proper functionality is kept. |
|
@nikoladze awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (physlite) to pass your tests but you may want to check it to ensure proper functionality is kept. |
@kratsg awkward2 brings quite some changes to the inner workings of coffea. I have gotten your nanoevents schema (delphes) to pass your tests but you may want to check it to ensure proper functionality is kept. |
Checked it and it seemed ok out of the box? It at least works for me for the quick tests I ran so I think it's ok. EDIT: works on some other older Delphes files I had too that crashed with coffea before... so I think the changes are going to give us better support(?!). Thanks for all the work! |
events = NanoEventsFactory.from_root("file:mytestfile.root", entry_start=0, entry_stop=100, schemaclass=TreeMakerSchema, treepath='TreeMaker2/PreSelection').events() # This works fine
events = NanoEventsFactory.from_root("file:mytestfile.root", entry_start=0, entry_stop=1000, schemaclass=TreeMakerSchema, treepath='TreeMaker2/PreSelection').events() # This errors Traceback dump:
|
@yimuchen thanks! what's the size of the file you're testing against? |
This is a 2GB file containing ~60K events. So about 35KB (?) per event. |
@yimuchen can you try seeing if it's a particular event that causes problems, or if it's just really the size of the event range you want to pick up? |
2103840
to
2bde16c
Compare
Note to self: python 3.11 is a limitation of at least numba |
f347ada
to
38dd522
Compare
Tried a few points, it looks like if the event size fails for some slice size (seems fail at about ~400 events in this case), it would consistently fail. So it isn't a case of a single very large event. |
@yimuchen so it's really just the absolute size of the slice? or where you are in the file? or both? It may be useful to scan through the file 100 events at a time to make sure there's not some pattern that's bugging out. |
form["contents"][field], prefix + f",{field},!item" | ||
) | ||
elif field.startswith("@"): | ||
# workaround uproot5 bug |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug is fixed in scikit-hep/uproot5#808
Tests will be broken for a bit as I get things daskified. |
@nikoladze I could use your help in daskifying PHYSLITESchema. Something is going wrong in zipping the forms but everything looks more or less OK. I think if that can be fixed up it'll more or less work. You'll need to use uproot main to get it to run in the first place. |
@kratsg I think I did this correctly. Can you please check the usage of NanoEvents Delphes with dask_awkward? I have no idea what correct use really looks like. |
Thanks a lot for putting all this effort into this! See #776 for a fix for the form zipping. |
fix form zipping for physlite schema
@nikoladze there's still some problems in the rendered arrays with the trackParticleLinks - look at the test failures. :-/ Ah - I see it's passing around some dask_awkward stuff within all your offsets functions. I'll get to it in time but it's a fair amount of rather specific code, it would be best for you to figure out where to put in the map_partitions calls and such. Basically the big change with dask_awkward is that we can only use the factory-reference-through-behavior trick on the client side, so your arrays have to render to something that doesn't require reindexing later. |
These latest commits require to be-released features in dask-awkward. |
OK so what we're gonna do is merge this puppy and start release candidates. There is a very short list of things that need doing to get a full release on the coffea side. @nikoladze @kratsg I will need your help to fix your schemas. Parquet nanoevents will have to wait for a better implementation of form-remapping in dask-awkward (which should also tidy up some code in uproot). @martindurant @douglasdavis I will reformulate the task list from this PR into the remaining tasks that need doing, and release an RC for people to start exploring what's new. |
Get all tests working again regardless of laziness in nanoevents (I should break this out into sub-bullets)
base schema passes tests
nanoaod schema passes tests
protodune schema passes tests
physlite schema passes tests
delphes schema passes tests
treemaker schema passes tests
local executor tests pass
dask executor passes tests
parsl executor passes tests
spark executor passes tests (requires spark upgrade and new data ingestion step!)
lookup tools passes tests
jetmet tools - individual correctors
jetmet tools - CorrectedJetsFactory (heavily lazy)
the above, again, except with laziness implemented in dask_awkward