Provide access as a library? #3

Open
CAD97 opened this issue Jun 2, 2018 · 4 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@CAD97
Contributor

CAD97 commented Jun 2, 2018

I've just done work to add Unicode Property matchers to The Elegant Parser pest. This uses ucd-generate-generated tables and works great!

It would be very nice, however, if I could write a reusable cargo script to regenerate the tables. This would be enabled by being able to use ucd-generate as a library.

Would this be a use case you'd like to support? (Long-term, using UNIC as the tables would probably be better, but I've made sure it's encapsulated and at least until I get around to finishing the first pass of table optimization I feel more confident using ucd-trie than UNIC.)

@BurntSushi
Owner

BurntSushi commented Jun 2, 2018

Philosophically speaking, this sounds great.

Maintenance-wise... I'm not sure. I guess it depends on what the nature of the library is. For example, I can think of two different paths that have vastly different maintenance implications:

  1. Build out a nice library API that turns the CLI interface into a convenient assortment of types and methods.
  2. Create a nominal library that has ~one function which takes an argv list and executes the command.

My suspicion is that I don't have the bandwidth for (1). It's too much work for a use case that I don't personally use.

I'd probably be OK with (2) though, since there's basically no API design involved, and the extent of the refactoring should just be shuffling some code around.
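
For illustration, option (2) could look roughly like the sketch below. Everything here is hypothetical: the `run` function, the `RunError` type, and the stubbed `general-category` dispatch are stand-ins invented for this example, not ucd-generate's actual code. A real version would hand the arguments to the existing CLI logic instead of returning a string.

```rust
use std::fmt;

/// Hypothetical error type for the single library entry point.
#[derive(Debug, PartialEq)]
pub enum RunError {
    MissingSubcommand,
    UnknownSubcommand(String),
}

impl fmt::Display for RunError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            RunError::MissingSubcommand => write!(f, "no subcommand given"),
            RunError::UnknownSubcommand(name) => write!(f, "unknown subcommand: {}", name),
        }
    }
}

impl std::error::Error for RunError {}

/// Hypothetical entry point: the entire public API is this one function.
/// `argv` follows the usual convention that the first element is the
/// program name and the second selects the subcommand.
pub fn run<I, S>(argv: I) -> Result<String, RunError>
where
    I: IntoIterator<Item = S>,
    S: Into<String>,
{
    let args: Vec<String> = argv.into_iter().map(Into::into).collect();
    let subcommand = args.get(1).ok_or(RunError::MissingSubcommand)?;
    let rest = &args[2..];
    match subcommand.as_str() {
        // Stub: a real implementation would call the existing CLI code here.
        "general-category" => Ok(format!("general-category with {} argument(s)", rest.len())),
        other => Err(RunError::UnknownSubcommand(other.to_string())),
    }
}
```

Under this shape the binary would shrink to passing `std::env::args()` to `run`, and a regeneration script would pass a hand-built list of strings instead.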

> if I could write a reusable cargo script to regenerate the tables

Could you say more about this? What's stopping you from doing it today? That is, why can't this script invoke ucd-generate as a command? For my use cases, I just run the ucd-generate commands directly, which are included in the generated files, so they should be easy to come up with. If I wanted more sophistication beyond that, I'd probably just write a small shell script. So I guess I'd like to hear more about what specifically you're trying to do.

@CAD97
Contributor Author

CAD97 commented Jun 2, 2018

Mainly that it's a lot easier to declare a dependency on a library than on a binary. It's a lot easier to say "run cargo run --package regenerate_unicode_tables or cargo script regenerate_unicode_tables.rs" (disclaimer: I haven't actually used cargo-script yet) than to document somewhere the multiple commands required to do so (download and unzip the UCD, then run ucd-generate multiple times).

That, and I'm on Windows so I have a natural aversion to "just write a shell script". cargo-script is awesome because we get cross platform scripts without requiring another language installation.
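
As a rough sketch of what such a cross-platform script could do without any library support, the code below shells out to an installed ucd-generate binary through `std::process::Command`. The `TableJob` type and both function names are made up for this example, and it assumes the binary is on `PATH` and writes the generated table to stdout. The arguments are left to the caller; the exact invocation should be copied from the header of each generated file. Downloading and unzipping the UCD is not covered.

```rust
use std::process::Command;

/// Hypothetical description of one table to regenerate.
pub struct TableJob {
    /// Arguments passed to `ucd-generate` (subcommand, UCD directory, flags).
    pub args: Vec<String>,
    /// File that receives the generated table.
    pub output_file: String,
}

/// Builds the command without running it.
/// Assumes a `ucd-generate` binary is installed and on PATH.
pub fn build_command(job: &TableJob) -> Command {
    let mut cmd = Command::new("ucd-generate");
    cmd.args(&job.args);
    cmd
}

/// Runs the command and writes its stdout to `job.output_file`.
/// Assumes the generated table is emitted on stdout.
pub fn regenerate(job: &TableJob) -> std::io::Result<()> {
    let output = build_command(job).output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            format!("ucd-generate failed with {}", output.status),
        ));
    }
    std::fs::write(&job.output_file, &output.stdout)
}
```

This still leaves the binary dependency undeclared in Cargo.toml, which is the part a library would solve.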

Of course, cargo-make does actually support steps that depend on a cargo-installed binary, but that's rather heavy-handed for a project not already using it.


Disclaimer: I haven't looked at how you've structured the code, but:

I was thinking that the simple way to structure the library would be to use structopt options structures. The library exports a function for each subcommand that takes a single argument of a structopt struct. Then the binary just builds the structopt enum and parses it from argv, whereas library users can construct the struct manually.
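
A dependency-free sketch of that shape follows. All names are hypothetical (`GeneralCategoryOpts`, its fields, and both functions), and the hand-written `parse_general_category` stands in for what a structopt derive would generate, so this is not structopt itself. The `--trie-set` flag is used purely as an illustration.

```rust
use std::path::PathBuf;

/// Hypothetical options for one subcommand. With structopt, this struct
/// would carry the derive; here the parsing is written by hand.
#[derive(Debug, Clone, PartialEq)]
pub struct GeneralCategoryOpts {
    pub ucd_dir: PathBuf,
    pub trie_set: bool,
}

/// Library entry point for the subcommand. Stubbed to return a description;
/// a real version would generate the table.
pub fn general_category(opts: &GeneralCategoryOpts) -> String {
    format!(
        "tables from {} (trie_set = {})",
        opts.ucd_dir.display(),
        opts.trie_set
    )
}

/// Stand-in for derive-generated parsing: the binary would call this with
/// the arguments that follow the subcommand name.
pub fn parse_general_category(args: &[&str]) -> Result<GeneralCategoryOpts, String> {
    let mut ucd_dir = None;
    let mut trie_set = false;
    for arg in args {
        match *arg {
            "--trie-set" => trie_set = true,
            flag if flag.starts_with("--") => return Err(format!("unknown flag: {}", flag)),
            dir => ucd_dir = Some(PathBuf::from(dir)),
        }
    }
    let ucd_dir = ucd_dir.ok_or_else(|| "missing UCD directory".to_string())?;
    Ok(GeneralCategoryOpts { ucd_dir, trie_set })
}
```

The binary goes through `parse_general_category`, while a library user fills in `GeneralCategoryOpts` directly and calls `general_category`, so both paths end up in the same function.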

@BurntSushi
Owner

All righty, understood.

> I was thinking that the simple way to structure the library would be to use structopt options structures. The library exports a function for each subcommand that takes a single argument of a structopt struct. Then the binary just builds the structopt enum and parses it from argv, whereas library users can construct the struct manually.

This seems like a middle ground between 1 and 2 in my previous comment, but still a bit too close to 1 for my taste. I don't really want to maintain a full library API. Can you make do with a library that exports a function that accepts the argv?

Two other points:

  1. I would love to have a way to express binary dependencies via Cargo. I think if we had that, then that would resolve your use case. IIRC, I've heard others express a similar desire.
  2. I think structopt is probably a red herring. I don't see it as something that really impacts my maintenance burden.

BurntSushi added the enhancement and question labels on Jul 21, 2019
@thomcc
Contributor

thomcc commented May 13, 2020

> I think structopt is probably a red herring. I don't see it as something that really impacts my maintenance burden.

One benefit of it is that it supports stuff like #[structopt(flatten)] which can help handle groups of options in a more structured way -- I often have trouble figuring out all the places I need to add / not add a new flag.

I don't really love the current setup for CLI args here. In practice, I usually get it wrong a few times. But I don't really know if structopt is actually a solution there (I do suspect it could be an improvement... But really, only a slight one).

I think the problem is more likely that the backend of ucd-generate is very ad hoc and organically grown. A more principled approach to the things we generate would make it clearer how to handle the different ways to configure it. I don't know what this looks like, though.


On the subject of libraries... There are a few parts of ucd-generate's internal code that would be nice to have exposed. Not as a ucd-generate library itself, but maybe in ucd-parse or ucd-util, depending. Specifically:

  • A lot of the code in util.rs, even if it's not exactly hard to write.
  • Parts of general_category.rs such as parsing and building the category sets...
  • Parts of property_bool that produce the property sets...
  • ...

... Really, any of the code which can be described as parsing or performing standard processing of the data read from the UCD directory. There's a bunch of this, and it would be nice if it were in ucd-util or ucd-parse.
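
To give a sense of the kind of helper meant here, below is a minimal sketch that parses one data line in the `range ; value # comment` layout used by UCD files such as DerivedGeneralCategory.txt. The `PropertyRange` type and `parse_property_line` function are invented for this example; they are not the API of ucd-parse or ucd-util, and they ignore files that carry more than two fields.

```rust
/// Hypothetical parsed line: an inclusive codepoint range and its property value.
#[derive(Debug, PartialEq)]
pub struct PropertyRange {
    pub start: u32,
    pub end: u32,
    pub value: String,
}

/// Parses a line such as `0041..005A ; Lu # comment`.
/// Returns `None` for blank lines, comment-only lines, and malformed input.
pub fn parse_property_line(line: &str) -> Option<PropertyRange> {
    // Drop the trailing comment, if any.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let mut fields = data.split(';').map(str::trim);
    let range = fields.next()?;
    let value = fields.next()?.to_string();
    // The first field is either a single codepoint or `start..end`, in hex.
    let (start, end) = match range.split_once("..") {
        Some((lo, hi)) => (
            u32::from_str_radix(lo, 16).ok()?,
            u32::from_str_radix(hi, 16).ok()?,
        ),
        None => {
            let cp = u32::from_str_radix(range, 16).ok()?;
            (cp, cp)
        }
    };
    Some(PropertyRange { start, end, value })
}
```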

Then, if people needed to generate their own tables, the fact that ucd-generate isn't a library would probably be less painful.

Maybe this comment is off topic for this issue, though.


3 participants