option to unbundle docstrings and potential consequences #1411

Open
llmII opened this issue Feb 21, 2024 · 8 comments

@llmII
Contributor

llmII commented Feb 21, 2024

I saw in 5de889419ff26b710b706958bf99e180d084f564 that docstrings take up RAM. This thought never crossed my mind until I saw that.

Further thought revealed there is a huge dependence on the website for more extensive documentation.

As I gave this more thought, along with @sogaiu, an idea came to mind.

What if, in a production build (with jpm), docstrings were stripped out of the build completely? What if docstrings were stored in a separate file somewhere, and doc knew how to look them up and retrieve them from it? What if the documentation system knew about mendoza-related things and (pretending for the moment that mendoza has tags) a docstring could reference a tag? doc could then be extended to understand tags, and one could use doc to bring in information from another set of files (say, the entire website, stored as mendoza documents).

The idea sums up as having a way to store the documentation associated with a build of Janet (both the site and the docstrings) separately from the Janet executable, saving RAM and making extensive documentation accessible offline.

The key thing to note is that this is an idea. It may or may not be worth pursuing. I think the RAM benefits might be good, and the offline documentation benefits phenomenal. I also think it's a lot of work which means someone (be it myself or another) would have to give it a considerable amount of thought and bring it to fruition.

Finally, I realized that man pages and info pages are entirely unsuitable: Janet is cross-platform and must work under Windows too, and those are Unix-centric documentation bundles.

If this idea has enough merit, perhaps this should be a discussion of how such a thing should work.

NOTE:
I believe the first step would be to try unbundling docstrings from libraries (like spork or similar), before trying to figure out how to remove docstrings from the VM in the janet executable itself while keeping them in a file that the doc function can access. Next would possibly be a mendoza reader for the site (and if mendoza has no tag capabilities, either find a different way to go about it or add them). Then somehow include the site, as mendoza documents, in a build of Janet (with the correct build options) so that it is available offline and can be searched by doc or referenced by docstrings. After that would come getting this to work with the janet executable, so that the docstrings that are normally part of the Janet VM live in another file when compiled (but remain inside the source files).

@llmII llmII changed the title Janet documentation, doc strings, and reference between. (Also, why not man/info pages) Janet documentation, doc strings, and reference between. (Also, why not man/info pages, and unbundling/RAM Saving) Feb 21, 2024
@llmII llmII changed the title Janet documentation, doc strings, and reference between. (Also, why not man/info pages, and unbundling/RAM Saving) option to unbundle docstrings and potential consequences Feb 21, 2024
@llmII
Contributor Author

llmII commented Feb 21, 2024

Further thought suggests that perhaps the builtin doc for Janet should remain tight and focused on docstrings. Maybe there can be a way to get these out into separate files to offload the memory burden that docstrings entail. I think exploring in that direction is, at the very least, not a bad idea. The idea of having offline site docs, searching through tags, and doc strings referencing them is still a fine idea. It might still be good for docstrings to carry tag references even if the builtin doc doesn't support anything to do with those tags.

With doc within Janet remaining tightly focused, I'm thinking that a different library (perhaps spork, or another) could provide functions extending doc (possibly by shadowing it) to support further features, such as offline site search against tags and the extended documentation that comes with it.

I believe this spot in specials.c#L331 might be the key point where, with some compile-time options, docs could be written out to files instead of being kept in RAM. I also think it's a bit more difficult than that, since there needs to be a way to avoid regenerating them on each load of a file, except where they are stale or regeneration is forcibly requested. I'm not yet sure how to do this. I'm also not yet certain how one would determine where a docs directory should be (both for janet itself and for jpm-installed libraries), or whether there should be a way to detect that something was not loaded from a standard path, in which case caching should be skipped and the docs regenerated completely. There would probably need to be a dynamic that allows one to build an executable with jpm that strips all docstrings completely and refuses to generate them. There are a bunch of points to consider for this idea, and it probably needs more thought than me just thinking "I'll take a hack at that". That said... I'll give it some more thought, and might proceed to do just that.
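For the "regenerate only when stale or forced" concern, one conventional approach is to compare modification times. The sketch below is in Python purely for illustration (the actual work would be in C or Janet), and the function name, parameters, and the choice of mtime as the staleness signal are all assumptions; nothing in the thread settles on this.

```python
import os


def needs_regeneration(source_path: str, doc_path: str, force: bool = False) -> bool:
    """Hypothetical check: should the doc file for a source file be rewritten?"""
    if force:
        # Regeneration was forcibly requested.
        return True
    try:
        doc_mtime = os.path.getmtime(doc_path)
    except FileNotFoundError:
        # No doc file has been written yet.
        return True
    # Stale when the source was modified after the doc file was written.
    return os.path.getmtime(source_path) > doc_mtime
```

A check like this does not by itself answer where the docs directory lives or how non-standard load paths should be treated; those questions remain open.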

@pepe
Member

pepe commented Feb 21, 2024

I may be mistaken, I never used it, but isn't this https://github.com/janet-lang/janet/blob/master/src%2Fconf%2Fjanetconf.h#L23 for that purpose?

@llmII
Contributor Author

llmII commented Feb 21, 2024

@pepe

It depends more on your purpose. I believe that configuration option disables all docstrings entirely?

The idea here doesn't leave docstrings disabled, but separates them out from the janet executable (and the loaded source files), eventually.

EDIT:
I'm hoping the way I find to exclude them from RAM does not mean emitting *.janet files that are stripped, or similar. If my hunch about how it would be done at specials.c#L331 is right, the source would continue to have the docstrings; they'd just never make it to RAM. I still need to determine, however, whether janet unloads the source buffer after compiling it to bytecode (I think it does, and I'm going along with that assumption for the time being).

@sogaiu
Contributor

sogaiu commented Feb 22, 2024

> The idea of having offline site docs, searching through tags, and doc strings referencing them is still a fine idea.

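Some possibly related tidbits:

* [Zeal / Dash Docset for Janet #1357](https://github.com/janet-lang/janet/discussions/1357)

Perhaps a bit on the hasty side, but I think if there's a way to integrate some examples (including those that cover PEG usage) that'd be nice too:

* janet-lang.org's repository has [this directory](https://github.com/janet-lang/janet-lang.org/tree/master/examples)

* janet's repository has [this directory](https://github.com/janet-lang/janet/tree/master/examples)

* spork's repository has [this directory](https://github.com/janet-lang/spork/tree/master/examples)

* janetdocs content can be fetched from [here](https://janetdocs.com/export.json) in JSON format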

@llmII
Contributor Author

llmII commented Feb 22, 2024

> The idea of having offline site docs, searching through tags, and doc strings referencing them is still a fine idea.
>
> Some possibly related tidbits:
>
> * [Zeal / Dash Docset for Janet #1357](https://github.com/janet-lang/janet/discussions/1357)

The key would be if the document browser worked across all Janet platforms primarily. Figuring out how to output things capable of being dealt with from Janet's doc function and consumed by other software might be a good target to go for. Converting from Janet's doc files to something another program supports would more likely be a separate side project, unless this project is okay with blessing another project as the de facto documentation browser outside of the REPL.

> Perhaps a bit on the hasty side, but I think if there's a way to integrate some examples (including those that cover PEG usage) that'd be nice too:
>
> * janet-lang.org's repository has [this directory](https://github.com/janet-lang/janet-lang.org/tree/master/examples)
>
> * janet's repository has [this directory](https://github.com/janet-lang/janet/tree/master/examples)
>
> * spork's repository has [this directory](https://github.com/janet-lang/spork/tree/master/examples)
>
> * janetdocs content can be fetched from [here](https://janetdocs.com/export.json) in JSON format

That sounds like a good idea in the long run, a way to link towards examples such that they could be shown in line (at least, with an extended documentation/help function that isn't tied down to just handling docstrings).

Perhaps the idea would be "look in all jpm directories and if there is an examples folder map them to $project/examples/$path and evaluate links that reference it like such" or something.

@sogaiu
Contributor

sogaiu commented Feb 22, 2024

> The key would be if the document browser worked across all Janet platforms primarily. Figuring out how to output things capable of being dealt with from Janet's doc function and consumed by other software might be a good target to go for.

I think that makes sense.

@llmII
Contributor Author

llmII commented Feb 23, 2024

In case someone wants to track progress, or provide feedback, or provide hints towards doing something better, I've started work on this idea here.

It is absolutely awful code and I do know that. I just wanted to get some files written out and see how well the idea might work. I've not yet implemented the Janet side, and right now both the path and the option that enables this are hard coded. It will need a good deal of effort to get into a working and possibly acceptable state.

I'll also note that creating (at this time) 402 files (originally reported as 1562) whilst building with make does increase build times, though not by much.

Edit: Note... ls -l's total shows wrong amounts.

To detail what the progress is currently:

  • Writes a file per binding that has a docstring. (1)
  • doc can reference and read docs from files.
  • Handles stale docs by updating them automatically when a module is loaded.

Further details:

  1. This at the moment uses a specialized (for lack of a better word) base64 encoder to encode the file names into something filesystems will find acceptable.

Caveats:

I believe the way I'm working on it currently would give unfortunate results if someone used it to generate docs for, say, spork. If someone loaded spork with (use spork), it would probably put the docstrings somewhere other than if they had used (import spork), so there would sadly be two copies of the same docstring in the filesystem under different names. I need ideas on how to remedy this. I'm unsure whether making handleattr (the function I've modified) module aware would make sense, and equally unsure whether I should target another function for this task instead.

The problem is a bit more intricate. Generating filenames based on the symbol alone will lead to collisions in the long run. In general, the Janet VM has no idea of modules at all; that's all on the Janet side. That's actually a good thing. The problem here is that the symbol the VM sees is whatever it is in that file. For instance, doing (use spork) places an argparse/argparse symbol in the environment, but the VM only ever sees argparse, which is defined in an argparse.janet file.

I think the solution is to make use of the source map to build a string including the file name, line, column, whatever, and combine it with the symbol name, and then base64 encode that to get the name for the file the document would be written to. DONE
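The naming scheme described above can be sketched roughly as follows. This is a Python illustration only: the key layout (`path:line:column:symbol`), the function names, and the use of the URL-safe base64 alphabet as a stand-in for the branch's specialized encoder are all assumptions, not what the branch actually does.

```python
import base64


def doc_file_name(source_path: str, line: int, column: int, symbol: str) -> str:
    """Hypothetical: derive a doc file name from source-map data plus the symbol."""
    key = f"{source_path}:{line}:{column}:{symbol}"
    # The URL-safe alphabet uses '-' and '_' in place of '+' and '/',
    # so the result never contains a path separator.
    return base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")


def decode_doc_file_name(name: str) -> str:
    """Recover the original key, e.g. for listing or debugging doc files."""
    return base64.urlsafe_b64decode(name.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    # Made-up source location, for illustration only.
    name = doc_file_name("spork/argparse.janet", 12, 1, "argparse")
    print(name)
    print(decode_doc_file_name(name))
```

Because the key is built from where the binding is defined rather than from the prefix it is imported under, (use spork) and (import spork) would map to the same file name. One thing such a scheme would still need to watch is file-name length limits when source paths are long.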

I also have what I think may be an "ok" solution for the problem of stale documentation files. Check whether the document exists in the "$JANET_DOC" directory (or whatever env var we decide to use) and, if not, only ever generate it in a temporary directory (say... "/tmp/janet-docs"). There would be a switch (say -install-docs) for the janet executable to install the documentation, and perhaps jpm would know to use it, so that the actual documentation directory doesn't grow stale. jpm might also need a part that knows how to look through everything installed, determine the valid set of documentation files, and remove old ones. This needs further thought, because how will jpm know which *.janet files to load to generate documentation anyway?

I think the way jpm builds an executable would keep docs from being generated into files when that executable is run. I know the janet executable built by my branch only emits docs at build time, so it isn't internally reloading boot.janet at startup and emitting a ton of files. That said, I believe a -production-mode type of switch would be useful, so that building executables avoids emitting doc files. I also think the equivalent of that switch needs to be set within the built executable as well, to keep it from generating documents. This is where further thought is definitely needed.
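The lookup order proposed for stale docs (installed directory first, otherwise generate only into a temporary directory, with a separate install step) could look roughly like this. It is a Python sketch under stated assumptions: both function names are hypothetical, and the caller is assumed to pass in the directories (for example one derived from "$JANET_DOC" and one like "/tmp/janet-docs").

```python
import shutil
from pathlib import Path


def find_or_stage_doc(name: str, docstring: str,
                      installed_dir: Path, scratch_dir: Path) -> Path:
    """Hypothetical: prefer an installed doc file, else write one to scratch."""
    installed = installed_dir / name
    if installed.is_file():
        # An installed copy exists; never overwrite it from a normal load.
        return installed
    scratch_dir.mkdir(parents=True, exist_ok=True)
    staged = scratch_dir / name
    staged.write_text(docstring, encoding="utf-8")
    return staged


def install_docs(scratch_dir: Path, installed_dir: Path) -> int:
    """Hypothetical install step: copy staged doc files into the installed dir."""
    installed_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for staged in scratch_dir.iterdir():
        if staged.is_file():
            shutil.copy2(staged, installed_dir / staged.name)
            count += 1
    return count
```

This keeps ordinary module loads from touching the installed directory at all, which is the point of the proposal; pruning doc files for bindings that no longer exist is not covered here.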

The last issue would be that this has no effect on modules for Janet written in C. This exists as a problem for the janet executable generated by the branch, and for anything installed that is a C module. A separate technique needs to be created for that.

These are mainly ideas of what problems there are, and probable solutions. It's likely that for some of these we'll arrive at a solution different from the one I've thought up in this post so far. This is the post I'll update when I settle on a solution or make progress on achieving the overall encompassing idea.

@sogaiu
Contributor

sogaiu commented Feb 24, 2024

Some notes for folks who might want to try things out:

  • I edited janet/src/conf/janetconf.h to have #define JANET_UNBUNDLED_DOCS before running make.

  • I found making the /tmp/docs subdirectory (i.e. docs) before running make was necessary (otherwise fopen seems to fail every time (^^; ).

  • There seem to be issues creating files for the following symbols:

    • fiber?
    • function?
    • table?
    • array?
    • tuple?
    • false?
    • empty?
    • as?->
    • every?
    • ul?
    • ol?
    • nl?

I guess the last point suggests that at least ? needs different handling?
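One possible explanation, offered as a guess rather than something confirmed in the thread: in the standard base64 alphabet, a `?` byte (0x3F) that falls in the third position of a 3-byte group encodes to `/`, which filesystems treat as a path separator. Every symbol in the list above has its `?` at such a position when the symbol alone is encoded, while a name like `nil?` does not. Whether the branch's specialized encoder uses the standard alphabet, and exactly which string it encodes, are assumptions here. The Python sketch below (helper names are hypothetical) shows the effect and how the URL-safe alphabet avoids it.

```python
import base64

# Symbols reported as failing in the comment above.
FAILING = ["fiber?", "function?", "table?", "array?", "tuple?", "false?",
           "empty?", "as?->", "every?", "ul?", "ol?", "nl?"]


def standard_name(symbol: str) -> str:
    """Encode with the standard base64 alphabet (may contain '/')."""
    return base64.b64encode(symbol.encode("utf-8")).decode("ascii")


def urlsafe_name(symbol: str) -> str:
    """Encode with the URL-safe alphabet ('-' and '_' replace '+' and '/')."""
    return base64.urlsafe_b64encode(symbol.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    for symbol in FAILING + ["nil?"]:
        print(symbol, standard_name(symbol), urlsafe_name(symbol))
```

For example, `ul?` encodes to `dWw/` with the standard alphabet, so an fopen on that name would be looking for a file inside a directory named `dWw`.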
