-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant-memory IArray packing #52
base: master
Are you sure you want to change the base?
Conversation
The IArray serializer previously used, effectively, `putListOf . elems` to serialize immutable arrays. Since the bounds uniquely define the length of the list, eliminating one length call allows the list to be partially GCd during serialization, so large lists take no additional space.
The Travis failure reminds me that I have really internalized the base-4.8 changes to Prelude. I'll get that fixed up! |
Thanks for the PR! I'm not opposed to breaking changes, especially when they come with such nice performance benefits! |
@elliottt Looks like we're green now. As for the breaking change, totally up to you but if backwards compatibility is important, we have the possibility to attain it by adding that (unnecessary) length field back in. Thanks for the quick response! |
I like the suggestion to add the length field back in, even if it's not used. If you make that change, I'll happily merge this :) |
You know what, I'm digging a bit further into this and I'm not sure I've slain the beast on the Get side (Put is fine as far as I can tell). I'll add both to my list and update this PR. |
@elliottt A brief status update on this work... it's been interesting but the rabbit hole goes deep here. This is half keeping you updated and half trying to rubber-duck debug the situation I'm in by explaining it. The fundamental problem I'm seeing with the So I've reduced the problem to eliminating the list construction from the Array deserializer. My current angle of attack is to stream the entries into an array incrementally under I had success with this approach using the So that leaves the option of trying to turn Do you have any thoughts, positive or negative, about this approach? Am I overthinking this? Thanks! |
I'm a little reluctant to make Get into a transformer, but if you've got a branch that implements this and doesn't induce performance regressions, I'd consider it. It's been a requested feature in the past, so that parsing could be done in IO, for example. Do you have thoughts on what the changes to the Serialize class would look like? Would you need to turn Put into a transformer as well, to allow putting IOArrays, and so forth? |
Yeah, I was concerned about performance regressions too... it's too early for me to tell whether this is going to be a good idea. I have some changes to Get.hs that implement parts of this change, but I haven't gotten all of the types to fit together yet. I'm working through all of the downstream implications of what this would mean, and unfortunately I am working at the edge of (and stretching) my intuitive understanding of monad transformers now. So far, the big change is that My hope is that when this is done, GHC will be smart enough to optimize away the monadic binds when running under I didn't think much about the desire for a parallel Thanks again for your time! I'll let you know if/when I have further results to show on the |
Kind of hijacking this PR for the discussion in it. @bradediger as I understand it, you have done some work on that, but didn't get it "ready"? Can you share your current work? |
@Ongy Unfortunately I have nothing else to show on the GetT approach. I had worked around this problem in my own code for the time being (my original problem was My approach was a fairly heavy-handed one; it scrapped the This was the skeleton of the interface I started with -- note that data Result m a =
Fail Text ByteString
| Partial (ByteString -> m (Result m a))
| Done (m a) ByteString
newtype GetT m a = GetT {
-- | Feed a chunk of data and get a partial result.
runPartial :: ByteString -> m (Result m a)
} Unfortunately, I basically only got as far as chasing these type definitions through Get.hs before giving up on this approach. I still think it's promising, but it ended up surpassing the time I had available. I think the main hard question that needs to be answered is whether the Happy to answer additional questions about my approach, but this is basically what I remember at this point. @elliottt Happy to move this to a mailing list or somewhere else if this thread isn't the appropriate place for this discussion. Thanks! |
I don't mind the discussion here, it's a nice record for anyone else that's interested in the transformer approach :) |
Currently, the
IArray
serializer delegates toelems
and the list serializer. Unfortunately, this results in a second full copy of the array, as the list serializer needs to walk the list (holding the head) to find its length. This is necessary for arbitrary lists, but not for arrays where we know the dimensions beforehand.This change does away with the extra length field; it outputs the bounds and then spits elements one by one, in array (
Ix
) order. The parser was similarly changed. In the meantime, I also added a small round-trip test for the array serialization functionality.As I was writing this, I just realized that this is a breaking change to the IArray serialization format, as it removes the superfluous
Word64
length. If backwards compatibility is desired, we could calculate and add that field back in so the format is identical.Here is the test script I used to verify the before/after memory behavior: