Skip to content
This repository has been archived by the owner on Aug 16, 2024. It is now read-only.

cmd/doctree: Add automatic re-indexing #28

Merged
merged 8 commits into from
May 17, 2022

Conversation

KShivendu
Copy link
Contributor

  • By selecting this checkbox, I agree to license my contributions to this project under the license(s) described in the LICENSE file, and I have the right to do so or have received
    permission to do so by an employer or client I am producing work for whom has this right.

@KShivendu KShivendu force-pushed the auto-indexing branch 2 times, most recently from 3a95fa2 to 99d0235 Compare May 12, 2022 11:12
@KShivendu KShivendu marked this pull request as ready for review May 12, 2022 19:03
@KShivendu
Copy link
Contributor Author

KShivendu commented May 12, 2022

Hey @slimsag

Here's how automatic re-indexing works at the moment:

  • User registers the project using doctree add .
  • Store the project metadata in ~/.doctree/autoindex in the format: [{projectName: ..., path: ..., hash: ...}]
  • Calculate the hash using the command tar -cf - directory | md5sum

Whenever the user stars the server using doctree serve. The following happens:

  • Start the HTTP server as expected
  • Recompute hashes of all the registered projects and reindex if the hash has changed. (because changes can happen when the server was down)
  • Initialize a fsnotify.watcher and recursively register all the dirs mentioned in ~/.doctree/autoindex (excludes hidden dirs)
  • If the watcher says that a file has been altered, find its corresponding project and re-index the project.

I haven't implemented partial re-indexing in this PR. I'm planning to cover it in another PR. Please review this one and suggest changes.

Please note that Go isn't my first language. So I'm not aware of all the best practices and might have made some mistakes (esp. related to error handling and channels). So please be careful with the review :)

doctree/schema/schema.go Outdated Show resolved Hide resolved
//
// Reference: https://unix.stackexchange.com/questions/35832/how-do-i-get-the-md5-sum-of-a-directorys-contents-as-one-sum
func GetDirHash(dir string) string {
tarCmd := exec.Command("tar", "-cf", "-", dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be OK for now, but we'll need to come up with something smarter than this I think. I'm not totally sure I follow the needed for a directory hash, maybe you can explain the idea behind that a bit more for me?

The problem with this is:

  1. tar isn't available on Windows, might not be available in Docker images, etc. Not nice to depend on an external tool in general
  2. It can be quite slow - I have a large project that this takes almost a minute to complete on:
$ time sh -c 'tar -cf - . | md5sum'
cd434e78961999ed1a7e56cfcd29cdf7  -

real	0m57.865s
user	0m14.312s
sys	0m24.263s

May be good enough for now, though! Happy to land this as-is and revisit the idea later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Will try to look for something better than this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

additionally I'm unsure if it is required for tar to have deterministic output, ie does it sort directory entries before adding them? It may just rely on the order of readdir which can change even if the underlying files have not changed.

cmd/doctree/serve.go Outdated Show resolved Hide resolved
cmd/doctree/serve.go Outdated Show resolved Hide resolved
cmd/doctree/index.go Outdated Show resolved Hide resolved
cmd/doctree/add.go Outdated Show resolved Hide resolved
Copy link
Member

@slimsag slimsag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great, thanks for working on this! I'm very excited to have this functionality :D

I left some suggestions & questions inline, but looks great to me otherwise!

@KShivendu
Copy link
Contributor Author

KShivendu commented May 13, 2022

Apart from your suggestions, I also did the following:

  1. Set projectFlag based on the value of dir. (Previously it always stored the URL of the current directory irrespective of the dir value)
  2. Use map instead of arrays in ~/.doctree/autoindex to avoid adding the same dir (project) multiple times.
    Before: [{"name": "github.com/user/repo", "path": "/path/to/project"}]
    After: {"/path/to/project": {"name": "github.com/user/repo"}}
  3. Create ~/.doctree/autoindex if it doesn't exist when running doctree serve (I feel that its code could be cleaner. Would be great if you give suggestions to improve it)

@KShivendu KShivendu requested a review from slimsag May 13, 2022 20:53
Signed-off-by: Stephen Gutekanst <stephen@sourcegraph.com>
@slimsag
Copy link
Member

slimsag commented May 17, 2022

Nice work! I've added an EXPERIMENTAL warning to the helper menu while we sort out the remaining issues here, but otherwise looks like a great starting point for this functionality & happy to merge this as-is! Thanks for doing this :)


Some notes for myself to follow up on later:

  • macOS: I had to run ulimit -n 10000, or else I'd get bad file descriptor errors.
  • On Linux we may need to suggest something like this
  • Probably should query for the above at runtime, then suggest the user run those commands if needed. May be able to steal code for this from the sg tool.
  • Apparently fsevents on macOS is nicer than fsnotify: file descriptor limit on macOS fsnotify/fsnotify#8 FSEvents on macOS fsnotify/fsnotify#11 - I should look into integrating this for macOS maybe.
  • Will need to fix the tar issue.
  • Need to add a "settle" period for event handling, when ran in a project that has had a ton of changes it can end up reindexing quite a lot.
  • Need to sort out how to ignore .git, node_modules, etc. events (and exclude those from doc generation/indexing in general!)

@slimsag slimsag merged commit 6616fb5 into sourcegraph:main May 17, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants