Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update go-ipld-git to a go-ipld-prime codec #46
Update go-ipld-git to a go-ipld-prime codec #46
Changes from 6 commits
716ba8d
dae2706
9889aa2
fc1615b
d2d95b3
d0f7f53
8a28576
e3c68b0
5966572
59d8b00
a4c429b
67a7768
cced9d6
fc09011
91ee580
a3ace0d
d811145
b2b169a
4e0e59b
5cc00cf
9218898
6059d7a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since you're not keeping dates in a human-friendly single string format, I wonder if you could make this unix timestamp an Int too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the idea here that this is easier to work with an a single object? It doesn't seem to match how the Git spec handles things internally. Although perhaps it's fair for us to try and make things easier for our users here.
I was talking with @Stebalien about this and as far as we can tell there hasn't been much tooling developed around this codec so making a breaking change that helps people out and seems sane is probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(delayed reply to this) the main idea here is to keep the data lossless and not do too many tricks to make it palatable to the user.
Encoded form looks something like:
But there's even some flexibility in what those final two fields can contain and apparently whether we get two, one or zero of them (I'm not sure about that detail though). So we treat them as strings and present them to the user as they are.
The current version of this codec keeps them internally but only presents a nice human-readable form as a DAG node, which isn't going to work when we're treating DAG traversal as traversal over the data model like we're doing with ipld-prime.
We did discuss (in this thread and other places) some options for making the human-readable form emerge out of the data to be available, but @willscott made the case that since this codec is a bit of a reference codec that we'd want to point others to for how to implement one that we'd better keep it simple for now.
So, lossless data, pure data model, no trickery, user gets to interpret the nodes as they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd strongly advise you to use something like strings.Builder instead; each
+=
on a string is an allocation and copy of the entire thing. Just twenty of these operations could already add up to showing up on CPU profiles.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is copied verbatim from https://github.com/ipfs/go-ipld-git/blob/master/git.go#L310
I'll file as an issue to optimize if we end up caring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you think this buffer could get large, you could also use a bufio.Writer, which will write to the final writer in chunks. But if we think it will never be more than a few kilobytes, it's likely irrelevant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can simplify these by using %x instead :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'll leave as is since copied verbatim
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change this to a map rather than a stringjoin?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's not being used as a stringjoin in any way? We have a custom
PersonInfo#GitString()
that's used to encode it in the git format andparsePersonInfo()
to parse the input string for decoding. Stringjoin here is only going to mean that when it's encoded with dag-cbor or dag-json that it comes out as<date> <timezone> <email> <name>
, which IMO is lossy and probably not what we're wanting out of this. Unless I'm missing something?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<date> <timezone> <email> <name>
is the git format though. I would hope that wouldn't be lossy.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's
$name <$email> $maybedate $maybetimezone
- it's lossy because the email has a<>
around it and we can use the<
to delineate the name which itself can contain any number of spaces, so simple stringjoin isn't enoughThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with Rod's take here: this seems better to do at the codec layer and yield up to data model as a map.
If I was going to transliterate some git data to dag-json, I'd probably want this to become a map in dag-json. So that means the codec should do this parse/transform immediately.