Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reporting indexes #84

Closed
bakkot opened this issue Mar 19, 2022 · 21 comments · Fixed by #129
Closed

Reporting indexes #84

bakkot opened this issue Mar 19, 2022 · 21 comments · Fixed by #129
Labels
enhancement New feature or request

Comments

@bakkot
Copy link
Collaborator

bakkot commented Mar 19, 2022

This came up here, but I wanted to split it out to track it explicitly.

I think that reporting the indexes of arguments within args would be a helpful addition, because it lets you build a bunch of more complicated logic on top of parseArgs without needing to do a full re-parse.

For example, this would make it straightforward to:

  • resolve conflicts between --foo and --no-foo as last-wins
  • throw on duplicate flags (check if the flag occurs at multiple indexes)
  • if someday defaults are added, check to see if an option was provided by a user or came from the default (Restructure result conventions (again) #70 (comment))
  • enforce that value-taking arguments are all specified using --foo=bar rather than --foo bar (check each to see if args[index].includes('='))
  • enforce that two flags are specified in a specific order (--enable-experimental-options before --some-unstable-option, e.g.)
@ljharb
Copy link
Member

ljharb commented Mar 19, 2022

enforce that value-taking arguments are all specified using --foo=bar rather than --foo bar (check each to see if args[index].includes('='))

i am very interested in enforcing this!

@shadowspawn
Copy link
Collaborator

Playing with this idea (and #70 (comment)):

  parseDetails: {
    options: {
      foo: { argIndex: 1, argIndices: [1] },
    },
    positionals: [
       { argIndex: 3 }
    ]
    }

@shadowspawn
Copy link
Collaborator

I was thinking about how to implement and use this, and a couple of complications:

  • the returned index is into an array of args that might have been determined by parseArgs internally. The "parseDetails" could include a property for (say) "originalArgs".
  • short option groups mean multiple options could have the same index (which affects two ordering use cases of index: enforcing order of options, and last overlapping option wins)

@bakkot
Copy link
Collaborator Author

bakkot commented Mar 27, 2022

might have been determined by parseArgs internally

It's either the author-provided thing or process.mainArgs, right? As long as it's no more complicated than that, it seems like the script author is unlikely to have trouble figuring out the appropriate array to index. Though I also don't see a problem with returning originalArgs.

short option groups mean multiple options could have the same index (which affects two ordering use cases of index: enforcing order of options and last overlapping option wins)

Hm, yeah, the author would need to write slightly more complicated logic to handle these cases. It's especially annoying with the awful -xyzSTRING style, because then you have to figure out where the "flags" part stops and the "string value" part starts, and can't just use lastIndexOf.

If we want to make this case easy, in the case of short options there could be an additional shortOptionsGroupIndex property or something, which is an index within that string. So { index: 3, shortOptionsGroupIndex: 2 } would correspond to something like the f in --a --b --c -xf.

(I'm not sure if the shortOptionsGroupIndex should count from the - or the character after it, hm. Probably from the -? For the main use cases, of comparing locations, it doesn't end up mattering.)

@shadowspawn shadowspawn added enhancement New feature or request discussion and removed discussion labels Mar 27, 2022
@shadowspawn
Copy link
Collaborator

🔨 Moving towards MVP Milestone 1 (#87)

I propose post-MVP.

@shadowspawn
Copy link
Collaborator

Rather than returning a structure indexed by parsed options, I am thinking about returning an array for each thing the parser discovers. e.g. option, option-terminator, positional. The option elements might have information like whether the option was specified with a short or long name, whether the option values was embedded or in a separate argument, whether the option was auto-discovered or a known option.

@bcoe and @bakkot have previously mentioned AST (Abstract Syntax Tree) as concept. I don't have experience with AST so coming at it from first principles and CLI experience.

@darcyclarke is trying something similar in minargs: darcyclarke/minargs@f7f4bd4

@bakkot
Copy link
Collaborator Author

bakkot commented May 14, 2022

I think it'll be hard to evaluate the difference between that (index -> option) and the originally proposed inverse of that (option -> indices) without writing them out in a little more detail and seeing what different use cases look like with each style.

@shadowspawn
Copy link
Collaborator

What got me thinking in the array direction was disliking needing two indices to tell the order of short options from a short option group. I am imagining a short option group would generate multiple (flat) elements in the array, and not a tree.

I can imagine the current parseArgs results being generated from the AST-ish array with a simple loop calling storeOption on each option, adding each positional to positionals. Eating our own dog food.

But only in my head. Not written any details. :-)

@shadowspawn
Copy link
Collaborator

Playing with a prototype. I'm not trying to make decisions, so feel free to ignore until after the upstream PR lands!

% node index.js -ab --foo=bar -- --pos 
{
  originalArgs: [ '-ab', '--foo=bar', '--', '--pos' ],
  ast: [
    {
      symbol: 'option',
      optionName: 'a',
      short: 'a',
      argIndex: 0,
      value: undefined,
      valueIndex: undefined
    },
    {
      symbol: 'option',
      optionName: 'b',
      short: 'b',
      argIndex: 0,
      value: undefined,
      valueIndex: undefined
    },
    {
      symbol: 'option',
      optionName: 'foo',
      short: null,
      argIndex: 1,
      value: 'bar',
      valueIndex: 1
    },
    { symbol: 'option-terminator', argIndex: 2 },
    { symbol: 'positional', value: '--pos', argIndex: 3 }
  ]
}

@bakkot
Copy link
Collaborator Author

bakkot commented May 23, 2022

Some thoughts on that prototype:

  • When aliases are defined, I think that the optionName should be the long option. That is, if you have foo: { alias: 'f' }, then -f should be represented as { optionName: 'foo' }. This is so a script author can do ast.findLastIndex(x => x.symbol === 'option' && x.optionName === 'foo'), without having to worry about handling aliases explicitly.
  • Why make short be the character itself instead of a boolean? You can look up the character from the config, if you need to.
  • For distinguishing --foo=bar from --foo bar (assuming a type: 'string' option), it may make more sense to have a boolean along the lines of inlineValue, rather than having a valueIndex which is always either the current index or the next index.
  • From your example it's not clear if --foo bar (assuming a type: 'string' option) would end up as two elements, or just one where valueIndex is different from argIndex. I'd hope just a single element.
  • The convention is to name the field which identifies the type of thing as type, not symbol.
  • Maybe name the field elements or parseElements? "AST" is kind of technical term, not something most users would know.

And here's what the use cases in the OP would look like, given those tweaks:

resolve conflicts between --foo and --no-foo as last-wins

let { values, positionals, elements } = parseArgs(config);
let getLastIndex = name => elements.findLastIndex(x => x.type === 'option' && x.optionName === name);
if (values.foo && values['no-foo']) {
  values.foo = getLastIndex('foo') > getLastIndex('no-foo');
}

throw on duplicate flags

let { values, elements } = parseArgs(config);
let dupes = Object.keys(value)
  .map(name => elements.filter(e => e.type === 'option' && e.optionName === name))
  .filter(e => e.length > 1);
if (dupes.length > 0) {
  throw new Error(`duplicate flag${dupes.length > 1 ? 's' : ''} ${dupes.join(', ')}`);
}

enforce that value-taking arguments are all specified using --foo=bar rather than --foo bar

let { values, elements } = parseArgs(config);
for (let element of elements) {
  if (element.type === 'option' && !element.short && element.valueIndex != null && element.valueIndex > element.argIndex) {
    throw new Error(`argument ${element.optionName} must specify its argument as \`--foo=bar\`, not \`--foo bar\``);
  }
}

enforce that two flags are specified in a specific order

let { values, elements } = parseArgs(config);
let getFirstIndex = name => elements.findIndex(x => x.type === 'option' && x.optionName === name);
let getLastIndex = name => elements.findLastIndex(x => x.type === 'option' && x.optionName === name);
if (values.first && value.second && getFirstIndex('first') > getLastIndex('second')) {
  throw new Error(`"first" must be specified before "second"`);
}

These are all ok, I guess?

I notice a lot of type === 'option' filtering, but maybe that's unavoidable. (Actually it's not necessary in these examples, because optionName is only present on type: 'option' elements, but it feels kind of gross not to have that guard.)

Also I'm a bit worried that people are going to conflate "index in this array" with "index in the original array", which works for positional --foo --bar but fails when there's short options or options with values. Again, maybe that's unavoidable, though.

@shadowspawn
Copy link
Collaborator

When aliases are defined, I think that the optionName should be the long option.

Yes. It is the thing to use for lookups into the supplied config and returned values.

Why make short be the character itself instead of a boolean?

I was thinking about supplying enough information to display an error message without looking up option in config. But a bit subtle to use, can look it up as you say, and I think a boolean might be clearer.

For distinguishing --foo=bar from --foo bar (assuming a type: 'string' option), it may make more sense to have a boolean along the lines of inlineValue, rather than having a valueIndex which is always either the current index or the next index.

I actually started with that but tried the index, and it wasn't compelling. I agree, inlineValue would be better.

From your example it's not clear if --foo bar (assuming a type: 'string' option) would end up as two elements, or just one where valueIndex is different from argIndex. I'd hope just a single element.

Yes, single.

The convention is to name the field which identifies the type of thing as type, not symbol.

I avoided "type" in first instance because we use that in input configuration. Would "elementType" be ok? (Per next suggestion.)

Maybe name the field elements or parseElements? "AST" is kind of technical term, not something most users would know.

Yes. (I was over-reaching with AST! 😊 )

@bakkot
Copy link
Collaborator Author

bakkot commented May 23, 2022

I was thinking about supplying enough information to display an error message without looking up option in config

Yeah, I don't think "you don't have to look at the config" should be a goal. It's necessarily on hand, so it's not going to be hard to get to.

I avoided "type" in first instance because we use that in input configuration. Would "elementType" be ok? (Per next suggestion.)

Ehhhhh that's gonna be painful to write out every time. kind is the name I generally use if type doesn't work. This annoys the type theorists, but that's ok.

@shadowspawn
Copy link
Collaborator

Updated prototype per suggestions:

% node index.js -ab --foo=bar -- --pos 
{
  values: [Object: null prototype] { a: true, b: true, foo: 'bar' },
  positionals: [ '--pos' ],
  originalArgs: [ '-ab', '--foo=bar', '--', '--pos' ],
  parseElements: [
    {
      kind: 'option',
      optionName: 'a',
      short: true,
      argIndex: 0,
      value: undefined,
      inlineValue: undefined
    },
    {
      kind: 'option',
      optionName: 'b',
      short: true,
      argIndex: 0,
      value: undefined,
      inlineValue: undefined
    },
    {
      kind: 'option',
      optionName: 'foo',
      short: false,
      argIndex: 1,
      value: 'bar',
      inlineValue: true
    },
    { kind: 'option-terminator', argIndex: 2 },
    { kind: 'positional', value: '--pos', argIndex: 3 }
  ]
}

@bakkot
Copy link
Collaborator Author

bakkot commented May 25, 2022

@shadowspawn Now that valueIndex has become inlineValue, I think it would make sense to replace argIndex with just index. Otherwise, that's looking pretty good, I think.

@bakkot
Copy link
Collaborator Author

bakkot commented May 25, 2022

I'm still a little worried about the potential for people assuming that elements and args are 1-to-1, which is not true in the above design. So here's a possible alternative which occurred to me, where we maintain a closer relationship between the two lists. It seems like it would be a bit harder to use than the design above, so this is mostly presented for the sake of exploring the design space.

Assuming config is

{
  'ex': { type: 'boolean', short: 'x' },
  'why': { type: 'boolean', short: 'x' },
  'see': { type: 'string', short: 'C' },
  'foo': { type: 'string' },
}

then running node index.js -xy -C4 - xC5 -xC 6 --ex --foo=bar --foo bar -- pos would give

elements: [
  { kind: 'short', options: ['ex', 'why'], value: undefined, inlineValue: undefined },
  { kind: 'short', options: ['see'], value: '4', inlineValue: true },
  // in short option groups, `value` if defined is for the last element of `options`:
  { kind: 'short', options: ['ex', 'see'], value: '5', inlineValue: true },
  { kind: 'short', options: ['ex', 'see'], value: '6', inlineValue: false },
  // the value for non-inline values is present both under the `value` key in the option's element and as its own element
  { kind: 'value', value: '6' },
  { kind: 'long', options: ['ex'], value: undefined, inlineValue: undefined },
  { kind: 'long', options: ['foo'], value: 'bar', inlineValue: true },
  { kind: 'long', options: ['foo'], value: 'bar', inlineValue: false },
  { kind: 'value', value: 'bar' },
  { kind: 'option-terminator' },
  { kind: 'positional', value: '--pos' },
]

Note that this preserves elements.length === args.length.

And just for example, here's what "resolve conflicts between --foo and --no-foo as last-wins" looks like with this design:

let { values, positionals, elements } = parseArgs(config);
let optElements = elements.flatMap(e => e.options ?? []);
if (values.foo && values['no-foo']) {
  values.foo = optElements.lastIndexOf('foo') > optElements.lastIndexOf('no-foo');
}

... ok actually that's not as bad as I thought it was going to be. flatMap is great but I guess this would be a lot trickier if you're not comfortable with flatMap.

@shadowspawn
Copy link
Collaborator

shadowspawn commented May 26, 2022

Now that valueIndex has become inlineValue, I think it would make sense to replace argIndex with just index.

I still prefer argIndex for a bit more context about what it is? I know you like short names, but none of your earlier examples actually needed it (after switch to inlineValue) so not heavily used.

I'm still a little worried about the potential for people assuming that elements and args are 1-to-1, which is not true in the above design.

Hmm, I'm not expecting this to be a problem. I'm hopeful either confusion won't occur, or it won't come up in use as working with just elements and not referencing back to args.

So here's a possible alternative which occurred to me, where we maintain a closer relationship between the two lists. It seems like it would be a bit harder to use than the design above, so this is mostly presented for the sake of exploring the design space.

I agree the 1:1 is a bit harder to use. (Thanks for working an example.)

Edit: albeit flatMap made it similarly concise for that example, but I have never used flatMap!

@shadowspawn
Copy link
Collaborator

A minor musing on name of returned originalArgs. I prefer to avoid overlap of property names on input configuration and output result, but args might be an exception since actually same thing. If you supply args with configuration, then you don't need the returned args, so less potential for name clash of variables?

const { values, positionals, elements } = parseArgs({ args, strict });
// or
const { values, positionals, elements, args } = parseArgs({ strict });
// in either case, args is what got parsed

@bakkot
Copy link
Collaborator Author

bakkot commented May 26, 2022

I still prefer argIndex for a bit more context about what it is?

I don't think the extra context is useful, personally. There's only one thing it could really be. And odds are you're going to find out about it by reading existing code or documentation in any case, which should make it very clear.

Still, I don't feel super strongly. It just feels redundant to me.

Hmm, I'm not expecting this to be a problem.

Mostly I mention it because I made exactly that mistake when writing up this snippet despite having already known they were different. It's just very easy to think "i want the index" and reach for .findIndex() or .indexOf() rather than .find().argIndex.

I prefer to avoid overlap of property names on input configuration and output result, but args might be an exception since actually same thing.

Yeah, if we're going to return args at all then it is fine to name it args. (Though I really really do not think is necessary - most functions do not return their input and there is nothing in particular about this function which should make us deviate from that. The only reason to do so is that the default value for args isn't readily available, but that should be fixed by adding process.mainArgs, not by returning the input.)

@shadowspawn
Copy link
Collaborator

shadowspawn commented May 27, 2022

(1:1) Mostly I mention it because I made exactly that mistake

Oh, your experience trumps my speculation! Something to be aware of with naming and documentation.

(args) Though I really really do not think is necessary ... but that should be fixed by adding process.mainArgs

All fair points. I am not aware of any activity on mainArgs. (There was a brief PR, albeit less sophisticated than our placeholder: https://github.com/nodejs/node/pull/29990/files)

@shadowspawn shadowspawn changed the title Reporting indexes (maybe post-MVP) Reporting indexes May 27, 2022
@bakkot
Copy link
Collaborator Author

bakkot commented May 28, 2022

@shadowspawn Having thought about it some more, I think the usability benefits of having at most one option per element probably outweighs the benefits of having the indices line up, so probably we should stick with your original design. People will definitely make the same mistake I did, but I don't see a way to avoid that without taking too large of a hit to usability.

Do you want to make a PR? I can put one together at some point if not.

@shadowspawn
Copy link
Collaborator

I would like to make a PR. I can open a draft now if anyone would like code to run against for experimenting. Functional, but some distance from final form (Refactor, primordials, et al.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants