Update go-ipld-git to a go-ipld-prime codec #46

Merged Aug 12, 2021 (22 commits)
Changes from 21 commits
1 change: 0 additions & 1 deletion .gx/lastpubver

This file was deleted.

9 changes: 9 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,9 @@
# CHANGELOG

## v0.1.0

This release includes BREAKING CHANGES

* go-ipld-git is now a [go-ipld-prime](https://github.com/ipld/go-ipld-prime) IPLD codec. Use `Decode(na ipld.NodeAssembler, r io.Reader) error` and `Encode(n ipld.Node, w io.Writer) error` for direct use if required.
* There is now only one `Tag` type; `MergeTag`, which had a `text` property, has been removed. Use `Tag`'s `message` property instead to retrieve the tag message from a commit's `mergetag`, i.e. `<commit>/mergetag/message` instead of `<commit>/mergetag/text`.
* `PersonInfo` no longer exposes the human-readable RFC3339 format `date` field as a DAG node. The `date` and `timezone` fields are kept as their original string forms (to enable precise round-trips) as they exist in encoded Git data. e.g. `<commit>/author/date` now returns seconds in string form rather than an RFC3339 date string. Use this value and `<commit>/author/timezone` to reconstruct the original if needed.
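The changelog entry above says the original timestamp can be rebuilt from the `date` and `timezone` strings. A minimal sketch of that reconstruction follows, using only the Go standard library. The helper name `personRFC3339` is hypothetical and not part of go-ipld-git, and it assumes `date` holds decimal Unix seconds and `timezone` holds a `+hhmm` or `-hhmm` offset, which the review discussion notes is not guaranteed for all Git data.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// personRFC3339 is a hypothetical helper (not part of go-ipld-git) that turns
// the string forms of PersonInfo's `date` and `timezone` fields into an
// RFC3339 timestamp. It assumes `date` is decimal Unix seconds and `timezone`
// is a "+hhmm"/"-hhmm" offset.
func personRFC3339(date, timezone string) (string, error) {
	secs, err := strconv.ParseInt(date, 10, 64)
	if err != nil {
		return "", fmt.Errorf("parsing date %q: %w", date, err)
	}
	// "-0700" is Go's reference layout for a numeric zone offset.
	tz, err := time.Parse("-0700", timezone)
	if err != nil {
		return "", fmt.Errorf("parsing timezone %q: %w", timezone, err)
	}
	_, offset := tz.Zone()
	loc := time.FixedZone(timezone, offset)
	return time.Unix(secs, 0).In(loc).Format(time.RFC3339), nil
}

func main() {
	s, err := personRFC3339("1503667703", "+0200")
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Because the codec keeps both fields as strings, callers that meet unusual values can still fall back to the raw data when this kind of parsing fails.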
90 changes: 73 additions & 17 deletions README.md
@@ -3,17 +3,8 @@ Git ipld format

[![](https://img.shields.io/badge/made%20by-Protocol%20Labs-blue.svg?style=flat-square)](http://ipn.io)
[![](https://img.shields.io/badge/project-IPFS-blue.svg?style=flat-square)](http://ipfs.io/)
[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](http://webchat.freenode.net/?channels=%23ipfs)
[![Coverage Status](https://codecov.io/gh/ipfs/go-ipld-git/branch/master/graph/badge.svg)](https://codecov.io/gh/ipfs/go-ipld-git/branch/master)
[![Travis CI](https://travis-ci.org/ipfs/go-ipld-git.svg?branch=master)](https://travis-ci.org/ipfs/go-ipld-git)

> An ipld codec for git objects allowing path traversals across the git graph!

Note: This is WIP and may not be an entirely correct parser.

## Lead Maintainer

[Łukasz Magiera](https://github.com/magik6k)
> An IPLD codec for git objects allowing path traversals across the git graph.

## Table of Contents

@@ -29,19 +20,49 @@ go get github.com/ipfs/go-ipld-git
```

## About

This is an IPLD codec which handles git objects. Objects are transformed
into IPLD graph in the following way:
into IPLD graph as detailed below. Objects are demonstrated here using both
[IPLD Schemas](https://ipld.io/docs/schemas/) and example JSON forms.

### Commit

```ipldsch
type GpgSig string

type PersonInfo struct {
date String
Review comment (Contributor): Since you're not keeping dates in a human-friendly single-string format, I wonder if you could make this Unix timestamp an Int too.

timezone String
email String
name String
}

type Commit struct {
tree &Tree # see "Tree" section below
parents [&Commit]
message String
author optional PersonInfo
committer optional PersonInfo
encoding optional String
signature optional GpgSig
mergetag [Tag]
other [String]
}
```

As JSON, real data would look something like:

* Commit:
```json
{
"author": {
"date": "1503667703 +0200",
"date": "1503667703",
"timezone": "+0200",
"email": "author@mail",
"name": "Author Name"
},
"committer": {
"date": "1503667703 +0200",
"date": "1503667703",
"timezone": "+0200",
Comment on lines +64 to +65

Is the idea here that this is easier to work with as a single object? It doesn't seem to match how the Git spec handles things internally. Although perhaps it's fair for us to try and make things easier for our users here.

I was talking with @Stebalien about this and as far as we can tell there hasn't been much tooling developed around this codec so making a breaking change that helps people out and seems sane is probably fine.

Reply (Member): (Delayed reply to this.) The main idea here is to keep the data lossless and not do too many tricks to make it palatable to the user.

Encoded form looks something like:

author A U Thor <author@example.com> 1465981137 +0000

But there's even some flexibility in what those final two fields can contain and apparently whether we get two, one or zero of them (I'm not sure about that detail though). So we treat them as strings and present them to the user as they are.

The current version of this codec keeps them internally but only presents a nice human-readable form as a DAG node, which isn't going to work when we're treating DAG traversal as traversal over the data model like we're doing with ipld-prime.

We did discuss (in this thread and other places) some options for making the human-readable form emerge out of the data, but @willscott made the case that, since this is something of a reference codec that we'd want to point others to as an example of how to implement one, we'd better keep it simple for now.

So, lossless data, pure data model, no trickery, user gets to interpret the nodes as they want.

"email": "author@mail",
"name": "Author Name"
},
@@ -51,10 +72,22 @@ into IPLD graph in the following way:
],
"tree": <LINK>
}
```
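The review discussion above describes the encoded form `author A U Thor <author@example.com> 1465981137 +0000` and the decision to keep the trailing fields as strings. The sketch below shows one way such a line could be split into the four `PersonInfo` fields. The type and function names are hypothetical, this is not the parser used by go-ipld-git, and it assumes the leading `author ` keyword has already been stripped by the caller.

```go
package main

import (
	"fmt"
	"strings"
)

// personInfo mirrors the four string fields of the PersonInfo schema.
// Hypothetical type for illustration only.
type personInfo struct {
	Name, Email, Date, Timezone string
}

// parsePersonInfo splits "Name <email> date timezone" into its parts.
// The date and timezone are optional and are kept as uninterpreted strings.
func parsePersonInfo(s string) (personInfo, error) {
	start := strings.Index(s, "<")
	end := strings.LastIndex(s, ">")
	if start < 0 || end < start {
		return personInfo{}, fmt.Errorf("no <email> section in %q", s)
	}
	pi := personInfo{
		Name:  strings.TrimSpace(s[:start]),
		Email: s[start+1 : end],
	}
	// Whatever follows the email is taken as zero, one or two fields.
	rest := strings.Fields(s[end+1:])
	if len(rest) > 0 {
		pi.Date = rest[0]
	}
	if len(rest) > 1 {
		pi.Timezone = rest[1]
	}
	return pi, nil
}

func main() {
	pi, err := parsePersonInfo("A U Thor <author@example.com> 1465981137 +0000")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", pi)
}
```

Note that `strings.Fields` collapses runs of whitespace, so this sketch would not by itself guarantee a byte-exact round-trip for unusually spaced input.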

### Tag

```ipldsch
type Tag struct {
object &Any
type String
tag String
tagger PersonInfo
message String
}
```

* Tag:
As JSON, real data would look something like:

```json
{
"message": "message\n",
@@ -69,10 +102,21 @@ into IPLD graph in the following way:
},
"type": "commit"
}
```

### Tree

```ipldsch
type Tree {String:TreeEntry}

type TreeEntry struct {
mode String
hash &Any
}
```

* Tree:
As JSON, real data would look something like:

```json
{
"file.name": {
@@ -87,11 +131,23 @@ into IPLD graph in the following way:
}
```

### Blob

```ipldsch
type Blob bytes
```

As JSON, real data would look something like:

* Blob:
```json
"<base64 of 'blob <size>\0<data>'>"
```
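The JSON form above refers to the byte layout `blob <size>\0<data>`. As a rough illustration of that layout, the following standard-library sketch builds the bytes for a blob and hashes them with SHA-1, which is how Git derives a blob's object ID. The function names are hypothetical and are not part of go-ipld-git.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// gitBlobBytes prepends the "blob <size>\x00" header to the content.
// Hypothetical helper for illustration.
func gitBlobBytes(data []byte) []byte {
	header := fmt.Sprintf("blob %d\x00", len(data))
	return append([]byte(header), data...)
}

// gitBlobSha returns the hex SHA-1 of the header plus content, which is the
// object ID Git assigns to the blob.
func gitBlobSha(data []byte) string {
	sum := sha1.Sum(gitBlobBytes(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Println(gitBlobSha([]byte("hello\n")))
}
```

The result for `"hello\n"` can be compared against `echo hello | git hash-object --stdin` in any Git installation.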

## Lead Maintainers

* [Will Scott](https://github.com/willscott)
* [Rod Vagg](https://github.com/rvagg)

## Contribute

PRs are welcome!
83 changes: 27 additions & 56 deletions blob.go
@@ -1,71 +1,42 @@
package ipldgit

import (
"encoding/json"
"errors"
"bufio"
"fmt"
"io"

cid "github.com/ipfs/go-cid"
node "github.com/ipfs/go-ipld-format"
"github.com/ipld/go-ipld-prime"
)

type Blob struct {
rawData []byte
cid cid.Cid
}

func (b *Blob) Cid() cid.Cid {
return b.cid
}

func (b *Blob) Copy() node.Node {
nb := *b
return &nb
}

func (b *Blob) Links() []*node.Link {
return nil
}

func (b *Blob) Resolve(_ []string) (interface{}, []string, error) {
return nil, nil, errors.New("no such link")
}

func (b *Blob) ResolveLink(_ []string) (*node.Link, []string, error) {
return nil, nil, errors.New("no such link")
}

func (b *Blob) Loggable() map[string]interface{} {
return map[string]interface{}{
"type": "git_blob",
// DecodeBlob fills a NodeAssembler (from `Type.Blob__Repr.NewBuilder()`) from a stream of bytes
func DecodeBlob(na ipld.NodeAssembler, rd *bufio.Reader) error {
sizen, err := readNullTerminatedNumber(rd)
if err != nil {
return err
}
}

func (b *Blob) MarshalJSON() ([]byte, error) {
return json.Marshal(b.rawData)
}

func (b *Blob) RawData() []byte {
return []byte(b.rawData)
}
prefix := fmt.Sprintf("blob %d\x00", sizen)
buf := make([]byte, len(prefix)+sizen)
copy(buf, prefix)

func (b *Blob) Size() (uint64, error) {
return uint64(len(b.rawData)), nil
}
n, err := io.ReadFull(rd, buf[len(prefix):])
if err != nil {
return err
}

func (b *Blob) Stat() (*node.NodeStat, error) {
return &node.NodeStat{}, nil
}
if n != sizen {
return fmt.Errorf("blob size was not accurate")
}

func (b *Blob) String() string {
return "[git blob]"
return na.AssignBytes(buf)
}

func (b *Blob) Tree(p string, depth int) []string {
return nil
}
func encodeBlob(n ipld.Node, w io.Writer) error {
b, err := n.AsBytes()
if err != nil {
return err
}

func (b *Blob) GitSha() []byte {
return cidToSha(b.Cid())
_, err = w.Write(b)
return err
}

var _ node.Node = (*Blob)(nil)
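The new `DecodeBlob` in this diff reads the size, rebuilds the `blob <size>\0` prefix, and then reads exactly that many content bytes. The sketch below reproduces that flow with only the standard library so it can be run on its own. `readSize` is a hypothetical stand-in for the repository's `readNullTerminatedNumber` (whose implementation is not shown in this diff), and this sketch also consumes the leading `blob ` keyword itself, whereas the diff's function appears to expect the reader to be positioned at the size.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// readSize reads ASCII digits up to and including a NUL byte.
// Hypothetical stand-in for readNullTerminatedNumber.
func readSize(rd *bufio.Reader) (int, error) {
	s, err := rd.ReadString(0)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(s[:len(s)-1])
}

// readBlob returns the full "blob <size>\x00<data>" bytes from the stream.
func readBlob(rd *bufio.Reader) ([]byte, error) {
	typ, err := rd.ReadString(' ')
	if err != nil {
		return nil, err
	}
	if typ != "blob " {
		return nil, fmt.Errorf("unexpected object type %q", typ)
	}
	size, err := readSize(rd)
	if err != nil {
		return nil, err
	}
	if size < 0 {
		return nil, fmt.Errorf("negative blob size %d", size)
	}
	prefix := fmt.Sprintf("blob %d\x00", size)
	buf := make([]byte, len(prefix)+size)
	copy(buf, prefix)
	// io.ReadFull fails with io.ErrUnexpectedEOF if the stream is short.
	if _, err := io.ReadFull(rd, buf[len(prefix):]); err != nil {
		return nil, err
	}
	return buf, nil
}

// readBlobString is a small convenience wrapper for string input.
func readBlobString(s string) (string, error) {
	buf, err := readBlob(bufio.NewReader(strings.NewReader(s)))
	return string(buf), err
}

func main() {
	out, err := readBlobString("blob 5\x00hello")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", out)
}
```

Since `io.ReadFull` only returns a nil error when the whole buffer was filled, a separate length comparison after it (like the `n != sizen` check in the diff) should never trigger; it is harmless but may be redundant.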
1 change: 0 additions & 1 deletion ci/Jenkinsfile

This file was deleted.
