Is it possible to specify style? #13

Open
mustafa0x opened this issue Nov 3, 2023 · 13 comments
Labels
enhancement New feature or request feedback wanted Additional feedback is needed

Comments

@mustafa0x

Eg, specify that a field should be a multiline string, or an inline dict.

@cyyynthia
Member

It isn't possible to specify anything related to style when stringifying at this time, unfortunately. I don't think some sort of hacky trick would make it work either, at least not without post-processing outside of the lib...

It would be great to have options for the output style; I definitely agree, and it'd be a great addition to the lib! I'm just a bit unsure how the style for a given field could be passed. Since smol-toml works with plain JS objects, it'd have to be passed as an option to the stringify function, and I can't think of anything that wouldn't be a bit of a pain to use... If you have any ideas, feel free to share!

@mustafa0x
Author

@cyyynthia
Member

Hm, part of me finds it a bit sad to be re-creating the object instead of using plain objects, as it makes the whole thing much less transparent 🤔

Ideally, I wanted smol-toml to have a way to keep the document's format by storing extra metadata under a Symbol key, but the problem is that adding new properties makes it difficult to update the metadata at the same time... and I'd love to avoid having to resort to special objects like { type: 'multiline-string', value: '...' } that'd break the transparent TOML -> JS object -> TOML conversion chain 😔

The other idea I had was to let stringify accept a schema-like thing that'd define how things are stringified, something like:

stringify(obj, {
  keysStyle: {
    myMultilineString: { multiline: true },
    table: {
      otherMultilineString: { multiline: true, literal: true },
      dotted: { dotted: true },
    },
  },
})
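A stringifier could resolve a nested `keysStyle` schema like this by walking it one key at a time alongside the value being written. The sketch below uses hypothetical names (`KeyStyle`, `StyleSchema`, `isStyleLeaf`, `lookupStyle`) and assumes one possible rule for telling a style apart from a nested table: an object whose values are all booleans is treated as a style. That rule is fragile, since it breaks as soon as a style option takes a non-boolean value, which hints at why mixing styles and nested keys in one object is awkward. It is not smol-toml's API.

```typescript
// Hypothetical style flags, mirroring the ones used in the example above.
interface KeyStyle {
  multiline?: boolean;
  literal?: boolean;
  dotted?: boolean;
}

// A schema node is either a style or a nested table of schema nodes.
type StyleSchema = { [key: string]: KeyStyle | StyleSchema };

// Assumption: an object whose values are all booleans is a style, not a table.
function isStyleLeaf(node: object): boolean {
  return Object.values(node).every((v) => typeof v === "boolean");
}

// Walk the schema one key at a time; return the style for the full path, if any.
function lookupStyle(schema: StyleSchema, path: string[]): KeyStyle | undefined {
  let node: KeyStyle | StyleSchema = schema;
  for (const key of path) {
    // Stop at styles (they have no children) and ignore inherited keys like "toString".
    if (isStyleLeaf(node) || !Object.prototype.hasOwnProperty.call(node, key)) {
      return undefined;
    }
    const next: KeyStyle | StyleSchema | undefined = (node as StyleSchema)[key];
    if (next === undefined) return undefined;
    node = next;
  }
  return isStyleLeaf(node) ? (node as KeyStyle) : undefined;
}
```

With the schema from the example, `lookupStyle(schema, ["table", "otherMultilineString"])` would yield `{ multiline: true, literal: true }`, while `["table"]` yields `undefined` because it names a table rather than a style.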

But I have no idea if this is good or bad 🤔

@mustafa0x
Author

Yes, agreed, it isn't too clean, and a schema-like approach would be pretty nice.

Maybe it should also accept deep key access (e.g. 'data.name.bio': {multiline: true}).
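A flat map keyed by dotted paths needs one detail to be unambiguous: a key that itself contains a dot has to be quoted, as in TOML, or 'a.b' could mean either one key or two. A minimal sketch with hypothetical helpers `toDottedKey` and `styleForPath`, assuming `JSON.stringify` is an acceptable stand-in for TOML basic-string quoting (it is close, but not identical, e.g. for lone surrogates):

```typescript
// TOML bare keys may only contain ASCII letters, ASCII digits, underscores and dashes.
const BARE_KEY = /^[A-Za-z0-9_-]+$/;

// Join a key path into a TOML-style dotted key, quoting segments that are not bare.
// Assumption: JSON.stringify approximates TOML basic-string quoting.
function toDottedKey(path: string[]): string {
  return path.map((k) => (BARE_KEY.test(k) ? k : JSON.stringify(k))).join(".");
}

// Hypothetical flat style map keyed by dotted keys, e.g. { "data.name.bio": { multiline: true } }.
type FlatStyles = Record<string, { multiline?: boolean }>;

function styleForPath(styles: FlatStyles, path: string[]): { multiline?: boolean } | undefined {
  const key = toDottedKey(path);
  // hasOwnProperty avoids matching inherited names such as "constructor".
  return Object.prototype.hasOwnProperty.call(styles, key) ? styles[key] : undefined;
}
```

Under this scheme, a key named google.com inside a site table would be addressed as site."google.com", not site.google.com.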

@cyyynthia cyyynthia added the enhancement New feature or request label Nov 4, 2023
@cyyynthia cyyynthia added the feedback wanted Additional feedback is needed label Apr 25, 2024
@uncenter

I'd go with the simplest approach: a single option, prefer-multiline-table (or similar), on a configuration object passed to stringify.

@innermatrix

innermatrix commented Sep 13, 2024

@cyyynthia I am running into this myself and am considering contributing code to implement a solution, but I see that no clear direction has been chosen. I understand that you want to use plain objects rather than wrappers, but I am not sure what you meant by

the problem is that adding new properties makes it difficult to update metadata

when talking about the option of adding a Symbol prop. Can you clarify that?

That said, there are really only three options here:

  • TOML formatting options are stored inline (inside the object, such as with Symbols or object wrappers)
  • TOML formatting options are stored out-of-line (passed in as a separate config object, such as a flat config, a flat config with dot-separated keys, or a config as a deep object)
  • TOML formatting options are supplied by a callback

My personal preference is for inline with Symbols, out-of-line with deep object config, or a callback, because those choices offer maximum flexibility (for example, all of them can assign different format settings to different elements inside an array, which would be a pain to do with either flat config or dot-separated keys).
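The callback option can be sketched as a function that receives the path to each value (table keys as strings, array indices as numbers) and returns options for it; having the index in the path is what makes per-element array settings straightforward. `FormatOptions`, `FormatCallback` and `collectFormats` are hypothetical names, and the sketch only records which options would apply where instead of producing TOML.

```typescript
// Hypothetical per-value options; the field names are illustrative only.
interface FormatOptions {
  multiline?: boolean;
  inline?: boolean;
}

// A path step is a table key (string) or an array index (number).
type Path = ReadonlyArray<string | number>;
type FormatCallback = (path: Path, value: unknown) => FormatOptions | undefined;

// Visit every value, ask the callback for its options and record them by path.
// A real stringifier would use the options while emitting text instead.
function collectFormats(root: unknown, cb: FormatCallback): Map<string, FormatOptions> {
  const out = new Map<string, FormatOptions>();
  const visit = (value: unknown, path: Path): void => {
    const opts = cb(path, value);
    if (opts) out.set(JSON.stringify(path), opts);
    if (Array.isArray(value)) {
      value.forEach((item, i) => visit(item, [...path, i]));
    } else if (value !== null && typeof value === "object") {
      for (const [key, child] of Object.entries(value)) visit(child, [...path, key]);
    }
  };
  visit(root, []);
  return out;
}
```

For example, a callback such as `(p) => (p[0] === "b" && p[1] === 1 ? { inline: true } : undefined)` would mark only the second element of array b as inline.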

For clarity, when I say out-of-line with deep object config, I mean something like

stringify(
  {a: 1, b: {c: 3}},
  {a: {[format]: {multiline: true}}, b: {[format]: {multiline: true}}}
)

where format is a Symbol.
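Reading such a config means following it in parallel with the data. A minimal sketch, assuming the Symbol is called `format` as above and that array elements are addressed in the config by their index as a string key; `ConfigNode` and `formatAt` are hypothetical names:

```typescript
// The Symbol used as the metadata key, so it can never clash with a data key.
const format = Symbol("format");

interface FormatOptions {
  multiline?: boolean;
}

// A config node mirrors the data: child keys (or array indices) plus an optional [format].
interface ConfigNode {
  [format]?: FormatOptions;
  [key: string]: ConfigNode | undefined;
}

// Follow the config along a data path and return the options stored at its end.
function formatAt(config: ConfigNode | undefined, path: Array<string | number>): FormatOptions | undefined {
  let node = config;
  for (const step of path) {
    node = node?.[String(step)]; // array indices become "0", "1", ...
  }
  return node?.[format];
}
```

Because each array element can have its own config node, `items: { "1": { [format]: { multiline: false } } }` would affect only the second element.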

@cyyynthia
Member

cyyynthia commented Sep 14, 2024

The problem is that it works for objects (with the risk that metadata gets wiped on assignment), but for things like strings or numbers it's not that simple.

Given the following TOML document:

a = { a = 1, b = 2 }
b = [ { a = 1 }, { a = 2 } ]
c = """
I am a multiline string
"""
num = 1e13

import TOML from 'smol-toml' // TOML.metadata and TOML.Types below are proposals from this discussion, not existing exports

const obj = TOML.parse(doc) // `doc` holds the TOML document above

obj.a = { ... } // This would create a full table instead of the inline one. Desired?
obj.a = { ..., [TOML.metadata]: { inline: true } } // Can get annoying; may be awkward to type
obj.a = TOML.Types.InlineTable({ ... }) // Long identifiers, may tree-shake poorly, does not handle non-object values

obj.b = [ {}, {} ] // Does it create a full array of tables? Can't specify TOML.metadata directly!

obj.c = 'I don\'t want to be multiline anymore' // Changing the style requires updating the parent's data :/
obj[TOML.metadata].c ??= {} // That's super annoying, but required for type safety
obj[TOML.metadata].c.multiline = false // Type-checking issues again, maybe? Not really convenient
obj[TOML.metadata].c = { multiline: false } // May erase other metadata, e.g. comment information

obj.num = 1000 // How do I tell it to use underscores if I want to? How do I tell it to use `1e3` if I want to?
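The metadata-loss risk follows directly from plain JavaScript semantics. The sketch below uses a hypothetical `metadata` Symbol standing in for the proposed `TOML.metadata`; the behaviours it shows are standard JavaScript: object spread copies own enumerable symbol-keyed properties, `JSON.stringify` skips them, and reassignment replaces the whole object.

```typescript
// Hypothetical stand-in for the proposed TOML.metadata Symbol.
const metadata = Symbol("metadata");

interface Meta {
  inline?: boolean;
}
interface Table {
  [key: string]: unknown;
  [metadata]?: Meta;
}

const obj: { a: Table } = { a: { x: 1, [metadata]: { inline: true } } };

// Object spread copies own enumerable symbol-keyed properties, so the metadata survives...
const spread: Table = { ...obj.a, x: 2 };

// ...a JSON round-trip silently drops symbol-keyed properties...
const roundTripped: Table = JSON.parse(JSON.stringify(obj.a));

// ...and assigning a fresh object replaces the table, metadata included.
obj.a = { x: 3 };
```

After these lines, only `spread` still carries `{ inline: true }`; both `roundTripped` and the reassigned `obj.a` have lost it.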

It can also get quite repetitive. Taking Cargo.toml files as an example, we don't want to specify for every single dependency that it should be an inline table; we'd most likely want to express that as a global rule.

Maybe a concept of global and scoped basic rules could do the trick when it comes to formatting:

stringify(doc, {
  arrayTrailingComma: 'multiline', // true or 'singleline' or 'multiline'
  arrayMultiline: 3, // < 3 items: single-line ; 3+ items: multi-line
  scoped: [
    {
      targets: [ 'dependencies', 'dev-dependencies', 'build-dependencies' ],
      inlineTables: true, // true or 'array' or 'value'
      arrayMultiline: false,
    },
  ]
})

The con is that this is quite verbose, but it's probably good enough for most use cases 🤔. Hopefully it's simple enough to keep the stringify function fast.
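Global and scoped rules could be merged for a given key path as sketched below, assuming that later matching rules override earlier ones and that a rule also covers the descendants of its targets. `FormattingOptions`, `ScopedRule`, `DocumentOptions`, `coversPath` and `optionsFor` are hypothetical names. Targets are split on "." naively here, so quoted keys that contain dots are not handled.

```typescript
// A subset of the options sketched above; the names follow the example.
interface FormattingOptions {
  arrayTrailingComma?: boolean | "singleline" | "multiline";
  arrayMultiline?: boolean | number;
  inlineTables?: boolean | "array" | "value";
}

interface ScopedRule extends FormattingOptions {
  targets: string[];
}

interface DocumentOptions extends FormattingOptions {
  scoped?: ScopedRule[];
}

// True if the dotted key `target` names `path` itself or one of its ancestors.
// Assumption: a naive split on "."; quoted keys containing dots are not handled.
function coversPath(target: string, path: string[]): boolean {
  const segs = target.split(".");
  return segs.length <= path.length && segs.every((s, i) => s === path[i]);
}

// Start from the global options, then layer each matching scoped rule on top, in order.
function optionsFor(doc: DocumentOptions, path: string[]): FormattingOptions {
  const { scoped = [], ...base } = doc;
  let result: FormattingOptions = { ...base };
  for (const { targets, ...rule } of scoped) {
    if (targets.some((t) => coversPath(t, path))) result = { ...result, ...rule };
  }
  return result;
}
```

With the Cargo.toml example, a dependency table such as dependencies.serde would resolve to inline tables with single-line arrays, while package would keep only the global options.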

@innermatrix

All great points! Thank you!

Here's my proposal based on what I am hearing from you and my own thoughts:

  1. Define a low-level formatting API. Its purpose would be to present a generic way to format a JSON data structure in TOML, not to define a convenient and concise way to supply formatting instructions. It would look something like the following (note: this is definitely just a sketch and doesn't cover several important areas):
// This part is trivial
function* tomlFormatStringValue(val: string, opts: StringFormatOptions) {
  // Format string based on opts
  yield formatted;
}
function* tomlFormatNumberValue(val: number, opts: NumberFormatOptions) {
  // Format number based on opts
  yield formatted;
}
// Etc for other atoms

// This part is more interesting
interface ArrayFormatOptions {
  singleLine: boolean;
  childOpts: (elt: unknown, idx: number, arr: unknown[]) => AnyFormatOptions;
}


function* tomlFormatArrayValue(val: unknown[], opts: ArrayFormatOptions) {
  if (opts.singleLine) {
    yield '[ ';
    // A plain loop, since `yield` cannot be used inside a forEach callback
    for (let idx = 0; idx < val.length; idx++) {
      yield* tomlFormatAnyValue(val[idx], opts.childOpts(val[idx], idx, val));
      if (idx < val.length - 1) {
        yield ', ';
      }
    }
    yield ' ]';
  } else {
    // …
  }
}

interface TableFormatOptions {
  inline: boolean;
  childOpts: (key: string, value: unknown, table: Record<string, unknown>) => AnyFormatOptions;
}

function* tomlFormatTable(table: Record<string, unknown>, opts: TableFormatOptions) {
  // I am definitely skipping a lot of nuance about table formatting here
  for (const [k, v] of Object.entries(table)) {
    if (!opts.inline && !isLeafTable) {
      yield `[${tomlFormatKey(k)}]`;
      yield* tomlFormatTable(v, opts.childOpts(k, v, table));
    } else {
      // …
    }
  }
}

// And finally
function* tomlFormatAnyValue(val: unknown, opts: AnyFormatOptions) {
  yield* Array.isArray(val) ? tomlFormatArrayValue(val, opts) : …;
}
  2. Define a high-level API that accepts whatever convenient/concise format specifications you like. For example:
interface TOMLFormatOptions {
  arrayMultiline: boolean
}

function makeValueOpts(val: unknown, opts: TOMLFormatOptions) {
  function arrayOpts() {
    return {
      singleLine: !opts.arrayMultiline,
      childOpts(val, _idx, _arr) { return makeValueOpts(val, opts) },
    };
  }

  function tableOpts() {
    return {
      childOpts(_key, val, _table) { return makeValueOpts(val, opts) },
    }
  }

  return Array.isArray(val) ? arrayOpts() : …
}

function* tomlFormat(val: unknown, opts: TOMLFormatOptions) {
  yield* tomlFormatAnyValue(val, makeValueOpts(val, opts));
}

So now if you want to have other ways of specifying formatting options, all you need to do is implement a different version of makeValueOpts that (lazily) converts whatever formatting specification you like into options expected by the low-level API.

This leaves you flexibility to implement additional formatting specifications later if you find new use cases / change your mind, and it allows clients of the API to call the low-level API directly if they need to express something that your high-level API left out in order to attain simplicity.

@cyyynthia
Member

Generators are painfully slow, so they're a no-no for the final implementation. I'm not sure a function to get the formatting to use is really necessary; I quite like my declarative approach. 🤔
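Without generators, the single-line array branch from the proposal above can be written as an ordinary function that collects the parts and joins them once. This is a sketch under assumptions, not smol-toml's code, and no performance claim is made for it: `formatInlineArray`, `InlineArrayOptions` and the `formatValue` callback are hypothetical, and the two options loosely mirror the `arrayBracketSpacing` and `arrayTrailingComma` options proposed below.

```typescript
interface InlineArrayOptions {
  bracketSpacing: boolean; // "[ 1, 2 ]" vs "[1, 2]"
  trailingComma: boolean; // "[1, 2,]" vs "[1, 2]"; TOML allows a trailing comma
}

// Build the output with one map and one join instead of yielding fragments.
// `formatValue` stands in for the per-value formatter (an assumption of this sketch).
function formatInlineArray(
  values: unknown[],
  formatValue: (v: unknown) => string,
  opts: InlineArrayOptions,
): string {
  if (values.length === 0) return "[]";
  const body = values.map((v) => formatValue(v)).join(", ") + (opts.trailingComma ? "," : "");
  return opts.bracketSpacing ? `[ ${body} ]` : `[${body}]`;
}
```

For instance, `formatInlineArray([1, 2, 3], String, { bracketSpacing: true, trailingComma: false })` returns `"[ 1, 2, 3 ]"`.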

Also, sadly, formatting options aren't this separable: there are global formatting options that apply to all structures, e.g. indentation. Indentation is purely stylistic but quite common in some places like Minecraft modding; some find it easier to read, as it's a bit more YAML-ish. So all stringify routines will likely receive the full options object and pull out what they're interested in directly.

The defaults I've picked below differ from the current style. They were picked to produce good-looking, readable documents OOTB - which aligns with the goal of TOML imho.

type FormattingOptions = {
  indent?: number | string | undefined // number = n spaces; default = '\t'
  indentTables?: boolean | undefined // default: false

  ignoreNull?: boolean | undefined // default: false (aka throws)
  arrayIgnoreNull?: boolean | undefined // default: false (aka throws)
  arrayIgnoreUndefined?: boolean | undefined // default: false (aka throws)

  // Whether to generate [x] [x.y] or just [x.y]
  ignoreEmptyParentTables?: boolean | undefined // default: true
  subTableDefinition?: 'full' | 'inline' | 'inline-nested' | 'dotted' | undefined // default: full -- most likely useful in scoped styles
  inlineTableBracketSpacing?: boolean | undefined // default: true

  arrayOfInlineTables?: boolean | undefined // default: false
  multilineArrays?: boolean | number | undefined // default: 3
  arrayTrailingComma?: boolean | 'multiline' | 'singleline' | undefined // default: 'multiline'
  arrayBracketSpacing?: boolean | undefined // default: true

  // With this option, only bigint will serialize as integer. Useful when int vs float semantic matter.
  numbersAlwaysFloat?: boolean | undefined // default: false -- integer when no decimal part, float otherwise.
  numbersGrouping?: boolean | 'indian' | number | undefined // default: false, true = 3 (western)
  numbersGroupingThreshold?: number | undefined // numbers below this won't have separators. default: 0 (always group if enabled)
  numbersFormat?: 'dec' | 'hex' | 'bin' | 'oct' | undefined // default: 'dec' -- most likely useful in scoped styles

  literalStrings?: boolean | undefined // default: false -- most likely useful in scoped styles?
  multilineStrings?: boolean | 'multiline' | undefined // multiline = iff the string has newlines. default = false
  longStringsAsMultiline?: boolean | number | undefined // number = max length before wrap. true = 120; default: false

  datetimeSpaceSeparator?: boolean | undefined // default: true; false = "T"
  datetimeZone?: 'keep' | 'utc' | 'host' | undefined // default: 'keep'
}

type ScopedFormattingOptions =
  | ({ target: string } & FormattingOptions)
  | ({ targets: string[] } & FormattingOptions)

type DocumentFormattingOptions = FormattingOptions & {
  scoped?: Array<ScopedFormattingOptions> | undefined
  comments?: never // reserved for future use
}
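A few of these options reduce to small pure functions. The helpers below (`resolveIndent`, `isMultilineArray`, `groupDigits`) are hypothetical, and their semantics are read from the comments in the type above: `indent` as a number means that many spaces with a tab by default; `multilineArrays` as a number is the item count from which arrays break across lines (default 3); and `numbersGrouping` inserts `_` separators, which TOML allows between digits, using groups of 3 for `true`, a custom size for a number, and a 3-then-2 pattern for `'indian'`. `numbersGroupingThreshold` is not covered.

```typescript
// `indent`: a number means that many spaces; the default is a tab.
function resolveIndent(indent: number | string | undefined): string {
  if (indent === undefined) return "\t";
  return typeof indent === "number" ? " ".repeat(indent) : indent;
}

// `multilineArrays`: true/false force a style; a number is the item threshold (default 3).
function isMultilineArray(length: number, opt: boolean | number | undefined): boolean {
  const setting = opt ?? 3;
  return typeof setting === "number" ? length >= setting : setting;
}

// `numbersGrouping`: insert "_" separators into a run of decimal digits.
function groupDigits(digits: string, grouping: boolean | "indian" | number | undefined): string {
  if (!grouping || (typeof grouping === "number" && grouping < 1)) return digits;
  const size = grouping === true || grouping === "indian" ? 3 : grouping;
  const groups: string[] = [];
  let end = digits.length;
  let step = size;
  while (end > step) {
    groups.unshift(digits.slice(end - step, end));
    end -= step;
    if (grouping === "indian") step = 2; // after the last 3 digits, Indian grouping uses 2s
  }
  groups.unshift(digits.slice(0, end));
  return groups.join("_");
}
```

With these rules, 1234567 would be written as 1_234_567 with `numbersGrouping: true` and as 12_34_567 with `'indian'`.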

I think this covers most formatting needs, though some things can't be expressed (e.g. all the float variants such as 1e10), and I don't know if that's a big deal or not... It may be annoying when using the lib to programmatically edit a TOML document. While the lib will never preserve the previous formatting, expanding 1e19 into 10000000000000000000.0 isn't quite pretty. Since the value is printed using the native toString, this expansion applies to all values whose magnitude is >= 1e-6 and < 1e21; numbers outside that range are serialized in true scientific notation with an explicit exponent sign (which again might be unwanted: 0.195e23 -> 1.95e+22; 11.95e22 -> 1.195e+23).
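The behaviour described above comes from `Number.prototype.toString`, which uses plain decimal notation for magnitudes from 1e-6 up to (but not including) 1e21 and exponent notation otherwise. A minimal sketch of a float serializer built on it, assuming that `.0` is appended to keep the value a float and that non-finite values map to TOML's `inf`, `-inf` and `nan`; `toTomlFloat` is a hypothetical name.

```typescript
// Serialize a JS number as a TOML float using the native Number#toString.
function toTomlFloat(n: number): string {
  if (Number.isNaN(n)) return "nan";
  if (!Number.isFinite(n)) return n > 0 ? "inf" : "-inf";
  if (Object.is(n, -0)) return "-0.0"; // toString would drop the sign of negative zero
  const s = n.toString();
  // toString yields exponent notation for |n| >= 1e21 or |n| < 1e-6, e.g. "1.95e+22",
  // which TOML accepts as-is; otherwise make sure a decimal point is present.
  return /[.e]/.test(s) ? s : `${s}.0`;
}
```

This reproduces both effects mentioned above: `toTomlFloat(1e19)` gives `"10000000000000000000.0"` and `toTomlFloat(0.195e23)` gives `"1.95e+22"`.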

@innermatrix

Nice! What are the semantics of the target string in your scoped options? Specifically, how would I use them to target only the inner a or only the outer a in the following two scenarios:

{a: {b: { a: { … }}}}
{a: {b: [ { a: { … }}]}}

@cyyynthia
Member

I was thinking of using TOML key semantics: a.b.a for the inner a, and a for the outer a.

Note that affecting only the top-level a isn't something I necessarily had in mind; to me, applying a style to a table means it also applies to all its descendants unless specified otherwise.
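Matching with TOML key semantics can be sketched in two steps: split the target into key segments, then check whether those segments prefix the value's path once array indices are dropped (TOML keys never contain indices). Dropping indices is why a.b.a also reaches the a inside each element of an array of tables at a.b, and why such a target cannot single out one element. smol-toml already has its own key parser; `splitDottedKey` and `targetApplies` below are hypothetical stand-ins that do not handle escape sequences inside quoted keys.

```typescript
// Split a dotted TOML key such as `a.b."c.d"` into segments.
// Simplified: bare keys, 'literal' and "basic" quoted keys, whitespace around dots;
// escape sequences inside basic strings are not handled.
function splitDottedKey(key: string): string[] {
  const segs: string[] = [];
  let i = 0;
  const skipWs = () => {
    while (key[i] === " " || key[i] === "\t") i++;
  };
  while (true) {
    skipWs();
    const q = key[i];
    if (q === '"' || q === "'") {
      const end = key.indexOf(q, i + 1);
      if (end < 0) throw new Error(`Unterminated quoted key in ${key}`);
      segs.push(key.slice(i + 1, end));
      i = end + 1;
    } else {
      const m = /^[A-Za-z0-9_-]+/.exec(key.slice(i));
      if (!m) throw new Error(`Invalid key at index ${i} in ${key}`);
      segs.push(m[0]);
      i += m[0].length;
    }
    skipWs();
    if (i >= key.length) return segs;
    if (key[i] !== ".") throw new Error(`Expected "." at index ${i} in ${key}`);
    i++;
  }
}

// A target applies to a value and its descendants when its segments prefix the
// value's key path; numeric array indices are skipped before comparing.
function targetApplies(target: string, path: Array<string | number>): boolean {
  const keys = path.filter((p): p is string => typeof p === "string");
  const segs = splitDottedKey(target);
  return segs.length <= keys.length && segs.every((s, i) => s === keys[i]);
}
```

For {a: {b: [ {a: { … }}]}}, the inner a sits at path ["a", "b", 0, "a"], so `targetApplies("a.b.a", ...)` matches it.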

@maliknajjar

Why don't you use JSONPath?
Just as CSS selectors are used to select elements in HTML (which is XML-like) and apply rules to them, the equivalent for JSON (and JS objects) is JSONPath.
It could look something like this:

stringify(obj, {
  "$.phoneNumbers[*].type": {
    multiline: true
  },
})

@cyyynthia
Member

smol-toml is already capable of parsing TOML keys, which is not the case for JSONPath. Supporting it would mean adding a lot of code to parse these paths, when it's unlikely to ever be needed (or would be too advanced to be relevant for this lib).

While the features planned here are becoming quite a lot more than basic stringification (which is what the library supports today), I don't want the lib to become much larger for the sake of supporting every single case; I am focused on keeping the library smol and fast.

Projects
None yet
Development

No branches or pull requests

5 participants