Offer a ReadableStream/WritableStream interface to work getRawObject blobs... #4
Comments
Heya. Interesting you should bring this up, because the latest version of libgit2, v0.11.0 (released yesterday), actually offers readable streams for objects now. I'm still in the process of getting on top of these changes (there are quite a few), and a new version of gitteh is a few days off yet. I'll see if I can fit in the time to expose the libgit2 object streams to Node.js!
Was looking into the possibility of streaming tonight, and I noticed a potential snag: opening a stream on an object in libgit2 is only possible if the object is currently stored as a "loose" object. A loose object is one stored as a flat file in the objects/ directory. If the git repository has been compacted at any point, then objects are more than likely going to be residing in a packfile instead, which uses zlib, deltas, and all sorts of magic that make streaming a file close to impossible. If I do implement this feature, I suppose as a user of the library you'd have to try streaming first, and then fall back to a standard object read if the requested blob has been packed.
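The "try streaming first, then fall back" flow described above might look roughly like the sketch below. Everything named here is an assumption for illustration: `openBlobStream`, `readBlob`, the `ENOTLOOSE` error code, and the `fakeRepo` stand-in are hypothetical and are not gitteh's or libgit2's actual API.

```javascript
const { Readable } = require('node:stream');

// Hypothetical error code a binding could use to say "object is packed, cannot stream".
const E_NOT_LOOSE = 'ENOTLOOSE';

// Returns a Readable for the blob. Tries a real stream first and falls back
// to a full in-memory read when the object is not loose.
// `repo.openBlobStream` and `repo.readBlob` are assumed method names.
async function blobStream(repo, sha) {
  try {
    return await repo.openBlobStream(sha);
  } catch (err) {
    if (err.code !== E_NOT_LOOSE) throw err; // unrelated failure, do not mask it
    const buffer = await repo.readBlob(sha);
    return Readable.from([buffer]); // one chunk holding the whole blob
  }
}

// In-memory stand-in for a repository, used only to exercise the sketch.
const fakeRepo = {
  loose: new Map([['aaa', Buffer.from('loose blob')]]),
  packed: new Map([['bbb', Buffer.from('packed blob')]]),
  async openBlobStream(sha) {
    if (!this.loose.has(sha)) {
      throw Object.assign(new Error('object is not loose'), { code: E_NOT_LOOSE });
    }
    return Readable.from([this.loose.get(sha)]);
  },
  async readBlob(sha) {
    const data = this.loose.get(sha) ?? this.packed.get(sha);
    if (!data) throw new Error(`unknown object ${sha}`);
    return data;
  },
};

// Drains a stream into a single Buffer.
async function readAll(stream) {
  const chunks = [];
  for await (const chunk of stream) chunks.push(chunk);
  return Buffer.concat(chunks);
}
```

With this shape the caller always gets a stream back and does not need to know whether the blob was loose or packed; only the memory profile differs between the two paths.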
Well, how does "getRawObject" work with packed blobs then? It does the magic. Wouldn't something like that be possible?
Alas, no. Currently the way raw objects are being delivered is simply a request to libgit2 that returns a void* pointer containing the data. There is no data until the method is called, and the method does not return until the data buffer has been fully populated. Welcome to the world of low-level C libraries (: Look, the thing is, pack files are only built by git.git when you run "git gc" in the repo, or when you push/pull a repo remotely. If you're using gitteh to serve git resources directly, then you can just require that the blob resources being served are loose. I think there's actually a git.git command that forces all pack files to be decompressed into loose files anyway. Libgit2 doesn't have this right now, but it will eventually.
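Since the blocking read hands back one fully populated buffer, one option is to wrap that buffer in a Readable after the fact. The sketch below is an assumption about how that could be done in Node.js, not something gitteh provides; `BufferReadStream` and its `chunkSize` option are hypothetical names. It does not reduce memory use, because the whole blob is already in memory, but it does give consumers a stream interface with backpressure.

```javascript
const { Readable } = require('node:stream');

// Exposes an already-loaded Buffer as a Readable that emits fixed-size chunks.
class BufferReadStream extends Readable {
  constructor(source, { chunkSize = 64 * 1024 } = {}) {
    super();
    this._source = source;       // the fully populated blob data
    this._offset = 0;            // next byte to emit
    this._chunkSize = chunkSize; // upper bound on bytes per 'data' event
  }

  _read() {
    if (this._offset >= this._source.length) {
      this.push(null); // signal end of stream
      return;
    }
    const end = Math.min(this._offset + this._chunkSize, this._source.length);
    // subarray shares memory with the source, so no bytes are copied here
    this.push(this._source.subarray(this._offset, end));
    this._offset = end;
  }
}
```

A consumer could then write `new BufferReadStream(blobData).pipe(response)`, where `blobData` and `response` are whatever buffer and writable the caller already has.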
Thanks for all the work on gitteh thus far Sam, v0.1.0 looks exciting! This one still irks me though. It seems that libgit2 needs to provide some more low-level functions. Why couldn't the function you're talking about (the one that gets the raw object's data) be modified (in libgit2) to instead return immediately and begin filling the void* from another thread? It would be ideal to work with some sort of readiness API, so that perhaps the internal node Because it occurred to me that, even though it's extremely lame, I could spawn a
Hey man, I see your point, however I still need to stress that this situation is a little different. That serial port example you linked, if you check out the binding code, is just opening /dev/stty or something. Either way it's opening a stream on the local filesystem. The difference with libgit2 is that the blob you're trying to open might be a simple zlib-compressed object on the filesystem, OR it might be a delta stored in a pack file, which in turn is a differencing blob from another blob in ANOTHER pack file ;) Modifying libgit2 is a possibility, however I'm not really involved in libgit2 development at all. Regarding the way the git CLI works, even though you think it's streaming on stdout, I think you'd find if you timed it that there's a short delay while git unpacks the file the blob is contained in. I'm going to do a couple of tests when I get into the office shortly to demonstrate this. Oh and btw, I should note that everything I'm talking about right now is about getting blobs from pack files. I'm going to implement a streaming method for loose objects in the next release. In the end though, we should probably just run some timing tests and see how long it takes to get a Buffer of packed blob data from libgit2, and compare it to piping from the git CLI, for example. If the difference is severe enough, we can investigate ways to improve it. The other thing is that caching could just be the solution in the context of projects like gitProvider that are surfacing git data directly to a client.
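On the caching idea: git objects are addressed by a hash of their content, so the bytes behind a given object ID never change, which makes them safe to cache without invalidation. A minimal sketch of a small LRU cache in front of a slow packed-blob read follows. `BlobCache` and the `readBlob` callback are hypothetical names, not part of gitteh, and concurrent misses for the same ID are not de-duplicated here.

```javascript
// Small LRU cache keyed by object ID. A Map iterates in insertion order,
// so the first key is always the least recently used entry.
class BlobCache {
  constructor(readBlob, maxEntries = 100) {
    this.readBlob = readBlob;     // assumed: async (sha) => Buffer
    this.maxEntries = maxEntries; // upper bound on cached blobs
    this.entries = new Map();
  }

  async get(sha) {
    if (this.entries.has(sha)) {
      const hit = this.entries.get(sha);
      // Re-insert to mark this entry as most recently used.
      this.entries.delete(sha);
      this.entries.set(sha, hit);
      return hit;
    }
    const data = await this.readBlob(sha);
    this.entries.set(sha, data);
    if (this.entries.size > this.maxEntries) {
      // Evict the least recently used entry.
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
    return data;
  }
}
```

Bounding by entry count is the simplest choice; for very large blobs a byte-size budget would probably be a better fit.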
Recently, I was also looking for a stream interface for blobs, since I'm trying to read large blobs from the repository. In your comment above, you said that you were going to implement a stream interface for unpacked files, but I can't seem to find it, either in the source or the documentation. Was this ever implemented?
@ all of us Yes, this is implemented. However, this would require implementation of an ODB class.
@samcday Please, mark this as
This is more of a feature request / API suggestion:
Ideally, especially for very large files (or in the case of git, blobs), HTTP responses should be streamed back to the client. In node, this is ideally done with `Stream#pipe()`. Currently with node-gitteh, streaming a revision of a file over a socket or to a file is impossible. I suggest that you offer a way to create both a `ReadableStream` and `WritableStream` that would work with a Repository instance's RawObjects. These streams would be very similar to their `fs` module counterparts.
Perhaps something like:
for a read stream would be cool. The stream would periodically emit `data` events. Then we could easily pipe the contents of a blob to a file, or socket, or http response, or whatever. Let me know what you think. Thanks!
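A rough sketch of the kind of usage described above, using only Node's built-in stream module. `createBlobReadStream` is a hypothetical stand-in fed from an array of chunks, not a gitteh function, and the sink stands in for a file, socket, or HTTP response.

```javascript
const { Readable, Writable, pipeline } = require('node:stream');

// Hypothetical stand-in for the requested API: yields the blob as Buffer chunks.
function createBlobReadStream(chunks) {
  return Readable.from(chunks.map((chunk) => Buffer.from(chunk)));
}

// Collects everything written to it, standing in for any writable destination.
function createSink() {
  const received = [];
  const sink = new Writable({
    write(chunk, _encoding, callback) {
      received.push(chunk);
      callback();
    },
  });
  sink.received = received;
  return sink;
}

// Pipes source into destination and resolves once the destination has finished.
function pipeBlob(source, destination) {
  return new Promise((resolve, reject) => {
    pipeline(source, destination, (err) => (err ? reject(err) : resolve()));
  });
}
```

In a server, the destination would simply be the response object, for example `pipeBlob(createBlobReadStream(chunks), res)`, so the blob is written out as chunks arrive rather than after a full read.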