Added -Xi to include new fields in JSON output#7838

Merged
wilzbach merged 1 commit intodlang:masterfrom
marler8997:jsonQuery
Feb 15, 2018

@marler8997
Contributor

@marler8997 marler8997 commented Feb 4, 2018

Adds a new option (hidden for now) that lets an application select which fields it wants in the JSON output, e.g.

dmd -Xi=compilerInfo -Xi=semantics

This would produce a JSON file looking like this:

{
    "compilerInfo" : {
        ...
    },
    "semantics": {
        ...
    }
}

The option is hidden because the only currently known users of it will be rdmd and dub.

NOTE: the -probe equivalent would be:

dmd -Xf=- -Xi=compilerInfo

** JSON output now works even with no input files.

@dlang-bot
Contributor

dlang-bot commented Feb 4, 2018

Thanks for your pull request and interest in making D better, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension)
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Auto-close Bugzilla Severity Description
18367 normal dmd should not segfault on -X with libraries, but no source files

@marler8997
Contributor Author

NOTE: I also plan to add the extra fields that were added in #7521; however, I thought that should be its own set of changes.

Hopefully we can get this one merged quickly so I can add the new fields from -probe well before the next release.

Contributor

@wilzbach wilzbach left a comment

Nice. That looks a lot better than your previous approach.
I'm fully in favor of this PR and hope we can get it in as soon as possible ;-)

src/dmd/json.d Outdated
}

/**
Returns true if `c` is a valid character for the start of a name..
Contributor

double .

src/dmd/json.d Outdated
json.arrayEnd();
json.removeComma();

if (global.params.jsonQuery is null)
Contributor

@timotheecour timotheecour Feb 4, 2018

  • Could something like this work? It would be cleaner (fewer special cases):
if (global.params.jsonQuery is null)
    global.params.jsonQuery = "modules"; // or whatever we want the default query to be; maybe put all the queries here

// then the rest can assume we always have `jsonQuery`
  • Can the default (with no jsonQuery) output the array of modules nested under a root JSON object (as done with jsonQuery) instead of at the top level, or would that be a breaking change we want to avoid? It kind of sucks to have two different formats (one with a top-level array and one where it's nested).

Contributor Author

It's definitely better not to break backwards compatibility. If at some point in the future someone determines it's OK in this case, it's simple to remove the special case.

src/dmd/json.d Outdated

json.objectStart();
auto parser = QueryParser(global.params.jsonQuery);
for (;;)
Contributor

while(true) seems more idiomatic?

Contributor

FYI: for (;;) is the "idiomatic" way in Phobos. We should probably put such trivia in the style guide so that they don't constantly pop up...

src/dmd/json.d Outdated
{
//json.generateError(format("invalid query: expected name but got '%s' (0x%02x)",
// *parser.next, cast(ubyte)*parser.next));
json.generateError("invalid query: expected name or EOF but got something else".ptr);
Contributor

The most useful diagnostic would be to show the byte offset from the start of the string where the parser got confused.

Contributor

(also this is uncovered)

Contributor Author

working on this one

Contributor Author

ok, I ended up using vsnprintf, also added tests.

if (global.params.jsonQuery)
{
generateJson(null);
return EXIT_SUCCESS;
Contributor

Ah nice. I didn't see this initially.

This is more or less what I am trying to achieve in #7840

{
// Write to stdout; assume it succeeds
size_t n = fwrite(buf.data, 1, buf.offset, stdout);
assert(n == buf.offset); // keep gcc happy about return values
Contributor

Needs a test.
Example:

#!/usr/bin/env bash

set -ueo pipefail

name="$(basename "$0" .sh)"
dir="${RESULTS_DIR}/compilable/"
out="$dir/${name}.json.out"

"$DMD" -X | ${RESULTS_DIR}/sanitize_json > "$out"
diff "$out" compilable/extra-files/$name.json
rm "$out"

"$DMD" -Xf=- | ${RESULTS_DIR}/sanitize_json > "$out"
diff "$out" compilable/extra-files/$name.json
rm "$out"

(sanitize_json currently doesn't accept stdin)

Contributor

diff can take -: https://stackoverflow.com/a/9847527/1426932

"$DMD" -X | ${RESULTS_DIR}/sanitize_json  | diff - compilable/extra-files/$name.json

Contributor Author

@marler8997 marler8997 Feb 5, 2018

This is existing code; I think any changes to it should be a separate PR.

ditto

break;
case 'q':
if (!p[3])
goto Lnoarg;
Contributor

test

Contributor Author

done


// Write buf to file
const(char)* name = global.params.jsonfilename;
if (name && name[0] == '-' && name[1] == 0)
Contributor

@timotheecour timotheecour Feb 4, 2018

  • Use strcmp?
  • How about factoring out this use case (probably useful for other flags; also more self-documenting):
enum stdout_alias = "-";
if (name && !strcmp(name, stdout_alias)) {...}

Contributor Author

This is existing code; I think any changes to it should be a separate PR.

src/dmd/json.d Outdated
// of modules representing their syntax.

json.arrayStart();
for (size_t i = 0; i < modules.dim; i++)
Contributor

@timotheecour timotheecour Feb 4, 2018

Use foreach (+ everywhere relevant).

src/dmd/json.d Outdated
auto fieldID = tryParseJsonField(fieldName);
if (!fieldID.hasValue)
{
json.generateError(("invalid query: unknown field name '" ~ fieldName ~ "'\0").ptr);
Contributor

@timotheecour timotheecour Feb 4, 2018

Aren't backticks (`) used instead of single quotes (') in messages? (for syntax highlighting)

src/dmd/json.d Outdated
auto fieldID = tryParseJsonField(fieldName);
if (!fieldID.hasValue)
{
json.generateError(("invalid query: unknown field name '" ~ fieldName ~ "'\0").ptr);
Contributor

.ptr is not needed; D string literals are null-terminated and implicitly convert to const(char)*
+everywhere relevant

Contributor Author

@marler8997 marler8997 Feb 5, 2018

It doesn't appear to work in this case; it says it can't convert char[] to char*.

Contributor

looks like it did :) (not seeing .ptr after string literals in your latest commit)
if not, i'm curious in what case it didn't work? i thought string literals would convert?

Contributor Author

The latest version doesn't use ~ to append any strings. It looks like appending strings breaks the implicit conversion from string to const(char)*.
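A minimal D sketch of why the `.ptr` was needed (the helper names are hypothetical, not from json.d): a string literal implicitly converts to `const(char)*` because the compiler places a trailing `'\0'` after it. Concatenating the literal with a runtime `const(char)[]` produces an ordinary dynamic array, which neither converts implicitly nor is guaranteed to be zero-terminated. `std.string.toStringz` is one way to get a C string back; the PR's explicit `"\0"` plus `.ptr` is another.

```d
import core.stdc.string : strlen;
import std.string : toStringz;

// A string literal converts implicitly to a C string; the compiler
// stores a '\0' right after the literal's characters.
const(char)* literalMessage()
{
    return "invalid query";
}

// Hypothetical helper: builds an error message from a runtime field name.
const(char)* unknownFieldMessage(const(char)[] fieldName)
{
    // This concatenation happens at run time and yields a dynamic array,
    // so assigning it directly to a const(char)* would not compile.
    // toStringz copies the characters and appends the terminating '\0'.
    return ("invalid query: unknown field name `" ~ fieldName ~ "`").toStringz;
}
```

The copy made by `toStringz` is the price of not having to remember the manual `"\0"`.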

src/dmd/json.d Outdated
JsonField value;
}
OptionalJsonField tryParseJsonField(const(char)[] fieldName)
{
Contributor

DRY:

foreach(a; __traits(allMembers, JsonField))
  if(fieldName == a) return OptionalJsonField(true, mixin(`JsonField.`~a));

Contributor Author

Done. I was thinking I needed to use Phobos's EnumMembers... I forgot I could do this with __traits.

else
{
// Generate json file name from first obj name
const(char)* n = global.params.objfiles[0];
Contributor

@timotheecour timotheecour Feb 4, 2018

Check for global.params.objfiles.length > 0 (to avoid things like https://github.com/dlang/dmd/pull/7839/files).

Contributor Author

This is existing code; I think any changes to it should be a separate PR.

ditto

}
else
{
/* The filename generation code here should be harmonized with Module::setOutfile()
Contributor

yup!

"cwd": "VALUE_REMOVED_FOR_TEST",
"importPaths": [
"compilable",
"..\/..\/druntime\/import",
Contributor

How about making sanitize replace \/ with /?

Contributor Author

This is the JSON generated by std.json (not dmd). Dmd doesn't generate these weird escaped slashes. To fix this we would need to post-process the JSON after it is sanitized, basically another JSON parser that sanitizes after std.json...lol :)

Contributor

@timotheecour timotheecour Feb 5, 2018

lol indeed... std.json really sux; i guess a JsonPolicy to not escape / would work here (yes, unrelated to this PR)

Contributor Author

wait, there already is an option for this! https://dlang.org/phobos/std_json.html#.JSONOptions.doNotEscapeSlashes

I'll update the PR to use it :)
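A small sketch of the difference, assuming a Phobos version where `JSONValue.toString` accepts a `JSONOptions` argument; the helper names are hypothetical and not part of sanitize_json:

```d
import std.json : JSONOptions, JSONValue;

// Default serialization: std.json escapes '/' as "\/",
// which is where the odd paths in the expected output come from.
string escapedPath(string path)
{
    return JSONValue(path).toString();
}

// With doNotEscapeSlashes, '/' is written as-is.
string plainPath(string path)
{
    return JSONValue(path).toString(JSONOptions.doNotEscapeSlashes);
}
```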

foreach (ref obj; root.array)
{
auto kind = obj.object["kind"].str;
if (kind == "compilerInfo")
Contributor

@timotheecour timotheecour Feb 4, 2018

DRY + type safety:
s/"compilerInfo"/JsonField.compilerInfo.stringof/
+everywhere relevant

Contributor Author

This is the test suite, not dmd. It doesn't have access to the JsonField type defined in the compiler.

Contributor

import dmd.json : JsonField; ?

Contributor Author

That would add a new dependency, making the sanitizer depend on the compiler. The sanitizer is akin to an external tool like rdmd or dub. Like the sanitizer, they read the JSON file through a pre-defined interface, a public interface that needs to be well-defined. These field names are a part of that interface, and a tool shouldn't need to import the compiler source code to use it.

Contributor

Yeah, the testsuite shouldn't (directly) depend on what it is testing.

src/dmd/mars.d Outdated
case 'q':
if (!p[3])
goto Lnoarg;
params.jsonQuery = p + 3 + (p[3] == '=');
Contributor

@timotheecour timotheecour Feb 4, 2018

Check p[3] == '=' and set params.jsonQuery = p + 4.
We really should not allow things like -Xqfoo in new flags; this is old baggage from the dmd command line (e.g. -offile instead of -of=file). The downsides of these have been discussed several times: it's not standard, it's hard to read visually, and it disallows future new flags (e.g. -Xquery=baz).

Also, -Xf- (as mentioned in the top-level message) reads ambiguously: -o- means no output, but -Xf- as you wrote it means output to stdout. With what I suggest there is no ambiguity: -Xf=- means stdout.

Contributor Author

Yes, I agree with this; I'll remove support for omitting =.
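A slice-based sketch of the stricter rule, assuming the switch arrives as a D slice. dmd's own parser in mars.d works on a `const(char)*` instead, so this is illustrative only, and `parseXiSwitch` is a hypothetical name:

```d
// Hypothetical helper: returns the field name from a "-Xi=<name>" switch,
// or null if the switch is malformed. The '=' is mandatory, so "-Xifoo"
// is rejected instead of being read as "-Xi=foo".
const(char)[] parseXiSwitch(const(char)[] arg)
{
    enum prefix = "-Xi=";
    // Reject a missing prefix and an empty name ("-Xi=") alike.
    if (arg.length <= prefix.length || arg[0 .. prefix.length] != prefix)
        return null;
    return arg[prefix.length .. $];
}
```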

compilerInfo = 0x01,
buildInfo = 0x02,
modules = 0x04,
semantics = 0x08,
Contributor

@wilzbach wilzbach Feb 11, 2018

Typically binary shifts are used (e.g. 1 << 3 instead of 0x08), so that it's easier for reviewers when this gets bumped.
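A sketch of the suggested style, reusing the member names from the PR (the `: uint` base type and the `isRequested` helper are assumptions). Each `-Xi=<name>` switch would OR one bit into the requested set:

```d
// Flags written with shifts: the next field is obviously 1 << 4.
enum JsonFieldFlags : uint
{
    none         = 0,
    compilerInfo = 1 << 0,
    buildInfo    = 1 << 1,
    modules      = 1 << 2,
    semantics    = 1 << 3,
}

// Hypothetical helper: true if `field` is set in the requested bit set.
bool isRequested(uint requested, JsonFieldFlags field)
{
    return (requested & field) != 0;
}
```

The values are identical to the hex literals in the PR; only the notation changes.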

src/dmd/json.d Outdated
string prefix = "";
foreach (enumName; __traits(allMembers, JsonFieldFlags))
{
if (enumName != "none")
Contributor

How about:

foreach (idx, enumName; __traits(allMembers, JsonFieldFlags))
{
    if (idx > 0)
    {
        s ~= ", " ~ enumName;
    }
}

Contributor Author

Hmmm, that produces a different string ", compilerInfo, ..." instead of "compilerInfo, ..."

Contributor

Ah it's too late here. I meant:

if (idx > 0)
    s ~= ", ";
s ~= enumName;

(the main point was about not hard-coding that the first enumName is named "none")

Contributor Author

Oh...but now you're hardcoding that the first member should be ignored...is that better than hardcoding that the value named "none" should be ignored?

Contributor Author

Eh...I suppose I don't have a preference for either...so we'll go with your preference :)

src/dmd/json.d Outdated
auto fieldNameString = fieldName[0 .. strlen(fieldName)];
foreach (enumName; __traits(allMembers, JsonFieldFlags))
{
if (enumName != "none" && fieldNameString == enumName)
Contributor

You could move if (enumName != "none") out of the loop.
That saves a few duplicated comparisons.
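One way to do that, as a sketch assuming the shift-based enum below. The return type differs from the PR: this version returns `none` for an unknown name rather than an optional. Because `foreach` over `__traits(allMembers, ...)` is unrolled at compile time, `static if` can drop `none` from the generated code instead of comparing against it at run time:

```d
enum JsonFieldFlags : uint
{
    none         = 0,
    compilerInfo = 1 << 0,
    buildInfo    = 1 << 1,
    modules      = 1 << 2,
    semantics    = 1 << 3,
}

// Sketch: maps a field name to its flag, or `none` if it is unknown.
JsonFieldFlags tryParseJsonField(const(char)[] fieldName)
{
    // Unrolled at compile time: one `if` per member name.
    foreach (enumName; __traits(allMembers, JsonFieldFlags))
    {
        // Decided at compile time, so no run-time check for "none" remains.
        static if (enumName != "none")
        {
            if (fieldName == enumName)
                return __traits(getMember, JsonFieldFlags, enumName);
        }
    }
    return JsonFieldFlags.none;
}
```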

{
sanitizeSemantics(semantics.object);
}
}
Contributor

DRY idea:

import std.typecons : tuple;

static foreach (e; [tuple("compilerInfo", "sanitizeCompilerInfo"),
                    tuple("buildInfo", "sanitizeBuildInfo")])
{{
    auto info = rootObject.get(e[0], JSONValue.init);
    if (info.type != JSON_TYPE.NULL)
        mixin(e[1] ~ "(info.object);");
}}

It's okay to use newer compiler features here as DMD + DRuntime + Phobos are just freshly built from master.

@marler8997 marler8997 force-pushed the jsonQuery branch 6 times, most recently from 6fa64f8 to aec1bb7 Compare February 13, 2018 06:19
if (s.length == 0 || (s[0] < 'a' || s[0] > 'z'))
return s;
return (cast(char)(s[0] - ('a' - 'A'))) ~ s[1 .. $];
}
Contributor

@wilzbach wilzbach Feb 13, 2018

Contributor Author

Looks like that one won't work, it converts the rest to lower-case...which is not what is wanted.

Contributor

Ah that's unexpected. I thought there's a function for this in Phobos :/
Well, you could do this:

import std.range : chain, drop, take;
import std.uni : asCapitalized;
return s.take(1).asCapitalized.chain(s.drop(1));

It would be more idiomatic, but put more stress on the CTFE (though it's a small standalone script for which this is probably not a concern).

Contributor Author

Ugh...I can't get it to work. I added .array to the end but it ends up returning dchar[]...

Contributor

Interesting, it worked for me (https://run.dlang.io/is/xP19Bv).

the end but it ends up returning dchar[]

You can use .byCodeUnit to opt-out of auto-decoding, but to!string or text from std.conv are both smart enough (e.g. https://run.dlang.io/is/zv8RlX)

For pure ASCII, there's also std.ascii.toUpper:

s.enumerate.map!(a => a.index == 0 ? a.value.toUpper : a.value).text;

https://run.dlang.io/is/rHnADh

But this is definitely nit-picking here ;-)

Contributor Author

Looks like std.conv.text did the trick. My std.range/std.algorithm-fu could use some work.

Contributor

@wilzbach wilzbach left a comment

I still like it 👍

@JinShil
Contributor

JinShil commented Feb 14, 2018

This PR could use a statement regarding motivation (at least if you need my review). I don't have an intimate understanding of the problem this solves or the use case. Also, why do we need multiple options? Is there a problem with just generating everything?

Also, I don't see why it shouldn't be documented. What's the motivation for keeping it hidden? Are we not sure if it's a good idea?

The code is not 100% covered by tests. Can the lack of coverage be justified?

Overall, though, I don't have any issue with this PR, and I'm not objecting to it; I'm actually trying to move it forward by adding my support. But I need some understanding (answers to the questions above) before I can support it.

@marler8997
Contributor Author

marler8997 commented Feb 14, 2018

This PR could use a statement regarding motivation (at least if you need my review)

This PR is the result of a suggestion from @WalterBright. We needed a way to capture information from the compiler without redirecting stdout/stderr. I suggested providing a way to redirect verbose output to a file, but Walter suggested we add the information to the JSON file, so this PR is the result.

Also, why do we need multiple options? Is there a problem with just generating everything?

Generating everything is fine, but the output is meant for tools that process the JSON, so it makes sense to let the tool specify what it wants. This also means a tool doesn't pay the overhead of generating everything if it doesn't need it. For example, dub only needs compilerInfo, and generating it quickly will likely be a requirement. It also gives a tool a way to run the compiler with no source files, something dub will do to query the compiler information. It also maintains backwards compatibility: existing use cases where JSON is generated are unaffected and still get the original format, but if a tool uses -Xi, it gets the new format with the requested fields. And it will remain forward compatible, because any fields added later won't be included unless they are requested via -Xi.
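To illustrate the compatibility point, a minimal tool-side sketch in D, assuming a recent Phobos (`JSONType` replaced the older `JSON_TYPE` names). The helper names are hypothetical; the only facts taken from this PR are that the legacy output is a top-level array of modules and the `-Xi` output is an object keyed by the requested field names:

```d
import std.json : JSONType, parseJSON;

// True if `text` is the legacy (no -Xi) format: a top-level array of modules.
bool isLegacyFormat(string text)
{
    return parseJSON(text).type == JSONType.array;
}

// True if `text` is -Xi output that contains the requested section.
bool hasSection(string text, string field)
{
    auto root = parseJSON(text);
    return root.type == JSONType.object && (field in root.object) !is null;
}
```

A tool that only ever passes `-Xi` can skip the legacy check; it is there to show that both shapes are distinguishable from the top-level type alone.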

Also, I don't see why it shouldn't be documented. What's the motivation for keeping it hidden? Are we not sure if it's a good idea?

The only reason for not documenting it is that no one has presented a good use case for this feature out in the wild. The only current use cases are dub and rdmd, both of which are internal tools. If it is deemed useful outside of internal use, then documenting it makes sense. I'm basically taking the conservative approach of leaving out anything that can always be decided later.

The code is not 100% covered by tests. Can the lack of coverage be justified?

I haven't checked the code coverage yet. It depends on which code paths aren't taken; I'll take a look.

@JinShil
Contributor

JinShil commented Feb 14, 2018

Thanks @marler8997

Who are the primary consumers of this JSON output and for what purpose? You mentioned dub and rdmd, so what do they need with this information?

What about feature overlap? Isn't this information something 3rd party tools can obtain for themselves using DMD-as-a-library?

@marler8997
Contributor Author

marler8997 commented Feb 14, 2018

rdmd will use it to output all the files used during a compilation. It uses this for dependency analysis. It will also use compilerInfo to know what features a particular compiler binary supports. Same with dub.

Dmd as a library really isn't applicable for these use cases. The JSON output is used to get information about a particular compiler binary or a particular build, whereas dmd as a library enables an application to have access to compiler functionality.

src/dmd/mars.d Outdated
{
if (global.params.objfiles.dim == 0)
{
error(Loc.initial, "cannot determine JSON filename, use -Xf=<file> or provide a source file");
Contributor

Tick marks: e.g. `-Xf=`

REQUIRED_ARGS: -Xi=UNKNOWN_FIELD_NAME
TEST_OUTPUT:
---
Error: unknown JSON field `-Xi=UNKNOWN_FIELD_NAME`, expected one of compilerInfo, buildInfo, modules, semantics
Contributor

Please modify the code to surround each expected option in tick marks, e.g. `compilerInfo`, `buildInfo`, `modules`, `semantics`

Contributor Author

done, thanks for the suggestion
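The list of expected names in that message can be derived from the enum rather than hard-coded. A sketch, assuming the shift-based `JsonFieldFlags` with `none` as its first member, as discussed earlier in the review (the helper name `expectedFieldNames` is hypothetical):

```d
import std.algorithm.iteration : map;
import std.array : join;

enum JsonFieldFlags : uint
{
    none         = 0,
    compilerInfo = 1 << 0,
    buildInfo    = 1 << 1,
    modules      = 1 << 2,
    semantics    = 1 << 3,
}

// Lists the valid -Xi field names, each wrapped in backticks.
string expectedFieldNames()
{
    // allMembers yields names in declaration order; drop the first (`none`).
    enum string[] names = [__traits(allMembers, JsonFieldFlags)][1 .. $];
    return names.map!(n => "`" ~ n ~ "`").join(", ");
}
```

Adding a new field to the enum then updates the error message automatically.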

@JinShil JinShil removed the Merge:72h no objection -> merge label Feb 14, 2018
@wilzbach
Contributor

This also has the advantage that the tool won't need to pay the overhead to generate everything if they don't need it. For example, dub only needs to generate compilerInfo and generating it quickly will likely be a requirement.

Yes, at the moment Dub creates and compiles a probing file and invokes DMD on it to parse the pragma(msg) output every time before it builds something (see #7521).

For example,

There are many other potential use cases: run.dlang.io is another, as is basically any build tool that needs more information about the currently installed D compiler.

Also, I don't see why it shouldn't be documented. What's the motivation for keeping it hidden? Are we not sure if it's a good idea?

Yep, this was done so that we have one or two releases to experiment with before we set it in stone.

@JinShil
Contributor

JinShil commented Feb 14, 2018

Thank you @marler8997 and @wilzbach. I see where this is going now. I'm just waiting on coverage.

@marler8997 marler8997 force-pushed the jsonQuery branch 6 times, most recently from 161662f to 7c09995 Compare February 14, 2018 21:27
6 participants