Skip to content

Add customizable diagnostic reporter for the lexer and parser#9350

Merged
dlang-bot merged 1 commit intodlang:masterfrom
jacob-carlborg:diagnostic-reporter
Feb 13, 2019
Merged

Add customizable diagnostic reporter for the lexer and parser#9350
dlang-bot merged 1 commit intodlang:masterfrom
jacob-carlborg:diagnostic-reporter

Conversation

@jacob-carlborg
Copy link
Contributor

This replace the existing ErrorHandler with the new DiagnosticReporter for the lexer and parser. This will use the diagnostic reporter not just for reporting errors but for all types of diagnostic reporting. Since the diagnostic reporter is passed to the lexer and parser it is possible to customize what happens when a diagnostic is reported. This is useful for testing and when using the compiler as a library.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @jacob-carlborg! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + dmd#9350"

@jacob-carlborg
Copy link
Contributor Author

jacob-carlborg commented Feb 10, 2019

Depends on: #8528. #8528 has been merged.

@jacob-carlborg jacob-carlborg added the Review:WIP Work In Progress - not ready for review or pulling label Feb 10, 2019
@jacob-carlborg jacob-carlborg force-pushed the diagnostic-reporter branch 3 times, most recently from 8f57973 to 5706b55 Compare February 11, 2019 12:20
@jacob-carlborg jacob-carlborg removed the Review:WIP Work In Progress - not ready for review or pulling label Feb 11, 2019
Copy link
Contributor

@thewilsonator thewilsonator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM. Theres something wrong with the docs somewhere though.

@jacob-carlborg jacob-carlborg force-pushed the diagnostic-reporter branch 3 times, most recently from c1a7d83 to 6263228 Compare February 13, 2019 10:53
@jacob-carlborg jacob-carlborg added the Review:Blocking Other Work review and pulling should be a priority label Feb 13, 2019
@jacob-carlborg
Copy link
Contributor Author

Blocks #9359.

This replace the existing `ErrorHandler` with the new
`DiagnosticReporter` for the lexer and parser. This will use the
diagnostic reporter not just for reporting errors but for all types of
diagnostic reporting. Since the diagnostic reporter is passed to the
lexer and parser it is possible to customize what happens when a
diagnostic is reported. This is useful for testing and when using the
compiler as a library.
@dlang-bot dlang-bot merged commit c58cd93 into dlang:master Feb 13, 2019
@jacob-carlborg
Copy link
Contributor Author

I don't understand the point of this. The lexer already had an error reporting mechanism. Any "compiler as library" would use that. It's a great deal of code added that appears to do nothing.

It allows to configure where the output goes and how it's formatted and even if an error/warning/deprecation should be reported at all.

@WalterBright
Copy link
Member

Why does it need to be configured?

@jacob-carlborg
Copy link
Contributor Author

To be able to write the diagnostic message to other places than stderr, like a text area or store in a variable. Or not print at all. It's used here #9359. A passing test shouldn't have any output.

@PetarKirov
Copy link
Member

So the internals of the compiler can be unit tested. For example, unit test may parse a snippet of the D code, assert that the resulting AST is the expected one and also assert that emitted diagnostic messages are as expected. You can't do this with the old end to end compiler testing.

@PetarKirov
Copy link
Member

The old error reporting is suitable only in a very narrow number of use cases - when standard dmd compiler driver is used. When dmd is used as a library the old error reporting mechanism is completely unusable as the custom applications need to drive the dmd library in a custom manner.

@WalterBright
Copy link
Member

  1. Error messages are currently tested in test/fail_compilation. No need to test them again.
  2. The compiler-as-library can include errors.d
  3. The user can simply provide their own versions of the functions in errors.d and not include errors.d.
  4. Providing a sink to errors.d for the output is far, far simpler.
  5. The internals of the compiler are very far from being unit testable, mainly because of very poor encapsulation. This is starting at the wrong end of the problem. Actually rather few functions generate error messages.

@PetarKirov
Copy link
Member

PetarKirov commented Mar 28, 2019

  1. test/fail_compilation provides no information exactly which part of the compiler produced the message. Sometimes there are more than one code path that produces the same message. There's no clean way to verify which without unit tests.
  2. It's easy to do, yet useless.
  3. This like designing a map function operating on ranges, where the user can't specify the mapping function because the map function internally imports one from a hard-coded module name. And the documentation saying: "If the default mapping function is not suitable, you can always put a custom mapping function in a module named std.algorithm.map_function and fiddle with the import paths in your build system until it works as needed." Hopefully, it's obvious why such design would be highly counter-productive.
  4. I agree that in general output range based designs are superior, though outputting strings is not useful if you need to immediately parse them again.
  5. Adding unit test infrastructure is the first step to unit testing and further improvements to the encapsulation.

@jacob-carlborg
Copy link
Contributor Author

jacob-carlborg commented Mar 28, 2019

Error messages are currently tested in test/fail_compilation. No need to test them again.

test/fail_compilation is a slow and complicated way to run the tests. I want to move all those test to test/unit/fail_compilation.

The compiler-as-library can include errors.d

That's not suitable for customizing.

The user can simply provide their own versions of the functions in errors.d and not include errors.d.

Can you please show an example of how to do this with the Dub package?

Providing a sink to errors.d for the output is far, far simpler.

That would require changing the error handling in hundreds of places. You don't like big pull requests.

Actually rather few functions generate error messages.

The error function is called from other 1000 places through out the compiler.

@WalterBright
Copy link
Member

provides no information exactly which part of the compiler produced the message.

grep (I'm not being flip here, I use grep all the time for exactly this purpose.)

Sometimes there are more than one code path that produces the same message.

True. Which is why I have been modifying them so the messages are different.

That's not suitable for customizing.

Sure it is. Let me explain how libraries work. The linker is presented with an undefined symbol. It first looks in the object files for a resolution. If not found, then it looks in the library for a resolution. If found, that and only that object file is pulled out of the library. Repeat for any new undefined symbols. Hence, to "customize" a function that's in a library, simply provide your own function in your object file, with the same signature.

That would require changing the error handling in hundreds of places.

No it doesn't, because the default sink would be what it does now. Setting a sink should be only one function call upon startup, not on every call.

Adding unit test infrastructure is the first step to unit testing and further improvements to the encapsulation.

Adding "unit test infrastructure" is a failure right there. D has unit tests, just add them. Note that Phobos has a massive amount of unit tests, and no unit test infrastructure.

@Geod24
Copy link
Member

Geod24 commented Mar 29, 2019

Sure it is. Let me explain how libraries work. The linker is presented with an undefined symbol. It first looks in the object files for a resolution. If not found, then it looks in the library for a resolution. If found, that and only that object file is pulled out of the library. Repeat for any new undefined symbols. Hence, to "customize" a function that's in a library, simply provide your own function in your object file, with the same signature.

I don't know where to start. Yes it works, but you can't be serious suggesting it as a long term solution.
So, a non-exhaustive list:

  1. This approach is not usually possible in D, it is only possible in this specific case because we're using extern(C++). So not only is it alien to users, it also forces us to stay with extern(C++);
  2. Changing the API of those functions would break code in a silent way;
  3. We can't use deprecated on those functions;
  4. We can't compose or reuse code: there is only 1 such symbol in the program. Can't have a wrapper around it that logs the call stack then calls the right function, can't have a library have its own independent function. There's no isolation.

Using the linker to override the symbol is neat when you have to get something to work quickly. It should be regarded as a hack, an exception, not something that is required for a library to do.
To make an analogy, it's similar to you suggesting someone to use goto instead of a while.

Now, when it comes to the unittests, this is what @jacob-carlborg was talking about, I believe:

unittest
{
    // The code we wish to analyze
    immutable moduleString = q{
        pragma(msg, int, double, real);
        pragma(msg, int.stringof, ':', " ", ulong.max);
        pragma(msg, "Integer expr: ", 4242_424242);
        pragma(msg, "With hex: ", 0x2_A);
    };

    // DMD is an object that represent the compiler
    // `BufferDiagnosticReporter` is a diagostic reporter that just buffers everything
    // in a "string[] messages" variable
    scope dmd = new DMD(new BufferDiagnosticReporter);
    dmd.initialize();
    dmd.runSemanticOnModule(moduleString);
    auto diag = dmd.getDiagnosticReporter();

    assert(diag.messages.length == 4);
    assert(diag.messages[0].text == "intdoublereal", diag.messages[0].text);
    assert(diag.messages[1].text == "int: 18446744073709551615", diag.messages[1].text);
    assert(diag.messages[2].text == "Integer expr: 4242_424242", diag.messages[2].text);
    assert(diag.messages[3].text == "With hex: 0x2_A", diag.messages[3].text);
}

If we want the compiler as a library to be more than a student project, and actually something people use to build tools for dlang, which we so desperately need, this should be trivial to do.

@jacob-carlborg
Copy link
Contributor Author

Sure it is. Let me explain how libraries work. The linker is presented with an undefined symbol. It first looks in the object files for a resolution. If not found, then it looks in the library for a resolution. If found, that and only that object file is pulled out of the library. Repeat for any new undefined symbols. Hence, to "customize" a function that's in a library, simply provide your own function in your object file, with the same signature.

That won't work for unit tests where usually all the files are compiled into a single binary without any library involved.

No it doesn't, because the default sink would be what it does now. Setting a sink should be only one function call upon startup, not on every call.

DiagnosticReporter is a sink. After the move to dmd.errors I plan to have all the existing error functions forward to the DiagnosticReporter. You always require to us to make small pull requests, then there will be intermediate code that is not ideal and not the final structure. Creating many separate PRs will also make it more difficult to see the whole picture.

Adding "unit test infrastructure" is a failure right there. D has unit tests, just add them. Note that Phobos has a massive amount of unit tests, and no unit test infrastructure.

Yes, I'm already using the unittest blocks:

unittest

But without any further customizations that requires you to run all of the unit tests even if you're only interested in only running a single unittest block. Part of this infrastructure is to provide helpers and utilities to implement the unit tests. I'm not making up anything new. Phobos contains helpers and utilities as well:

https://github.com/dlang/phobos/blob/2f030ebbd947c3fed874b3884ca40c793f766dfe/std/exception.d#L139

@WalterBright
Copy link
Member

This approach is not usually possible in D, it is only possible in this specific case because we're using extern(C++). So not only is it alien to users, it also forces us to stay with extern(C++);

I do not see how extern(C++) has anything to do with it.

Changing the API of those functions would break code in a silent way;
We can't use deprecated on those functions;

It's literally ONE function we're talking about - errors.verrorPrint() (I know there are a couple other functions that write to stderr, but those can be pushed into errors.verrorPrint() or equivalent.)

We can't compose or reuse code: there is only 1 such symbol in the program. Can't have a wrapper around it that logs the call stack then calls the right function, can't have a library have its own independent function. There's no isolation.

You can override it with a function that has multiple dispatch.

It should be regarded as a hack, an exception, not something that is required for a library to do.

There's nothing wrong with it, it's another tool in the toolbox. The only thing about it is it appears to be a forgotten technique. It has an advantage that not only is the call direct rather than indirect, it can be inlined.

when it comes to the unittests, this is what @jacob-carlborg was talking about

As shown, it offers nothing over what Seb and others have already well developed in the autotester. Even so, encouraging people to use error message texts in their unit tests will prove disastrous. It will introduce a whole new category of things we cannot improve due to backwards compatibility problems.

This goes back to what I wrote about the dependency of dmd on global.errors being a problem. Whether a parse or semantic pass succeeds or fails should be apparent from its returned state, not the state of a global variable, and certainly not whether an error message text string matches.

@WalterBright
Copy link
Member

What all these changes do is address overriding one function - verrorPrint() - with a thousand lines of new code and thousands of lines of changes all through dmd. Going down this path as a proper way to do things will end in disaster for DMD.

@WalterBright
Copy link
Member

Suggestion:

  1. Add a function in errors.d: void errorMsgSinkStderr(string s) that writes s to stderr.
  2. Add a function pointer in errors.d: void function(string) errorMsgSink = &errorMsgSinkStderr;
  3. Refactor verrorPrint() and other code that writes to stderr to instead sent the text to errorMsgSink().
  4. Add a function:
void function(string) setErrorMsgSink(void function(string) fp) {
    auto prev = errorMsgSink;
    errorMsgSink = fp;
    return prev;
}

Then, user does this:

auto save = setErrorMsgSink(&mySink);
scope (exit) setErrorMsgSink(save);
... my marvelous code ...

It's not pure, but it gets the job done without upending the code base.

@Geod24
Copy link
Member

Geod24 commented Mar 31, 2019

I do not see how extern(C++) has anything to do with it.

Indeed, I thought the symbol was extern(C++). But verrorPrint is not, indeed.
So if I understand correctly, your suggestion was for people that want to use dmd as a library to do:

pragma(mangle, "_D3dmd6errors11verrorPrintFKxSQBc7globals3LocEQBs7console5ColorPxaQdPS4core4stdc6stdarg13__va_list_tagQBnQBqZv")
private void verrorPrint(const ref Loc loc, Color headerColor, const(char)* header,
        const(char)* format, va_list ap, const(char)* p1 = null, const(char)* p2 = null)
{
  // Some code
}

In order to override the default behavior ?

It's literally ONE function we're talking about

It doesn't make the solution more appealing. It might be one function, but it's a very basic need to be able to capture messages. Is there a library out there that uses this approach ?

You can override it with a function that has multiple dispatch.

Could you provide a code example ? I have one in mind but don't want to assume too much.

There's nothing wrong with it, it's another tool in the toolbox. The only thing about it is it appears to be a forgotten technique. It has an advantage that not only is the call direct rather than indirect, it can be inlined.

There's nothing wrong with having many tools in hands, but most tools are simply better at achieving what we want to achieve, with less downsides. And the call is not direct if you use a function that does multiple dispatch. Also, to have inlining, since it's done at the linker level, you need LTO.

Even so, encouraging people to use error message texts in their unit tests will prove disastrous.

That was an example of an unittest you could put in DMD's test suite.

An example of an actual application of using DMD as a library, and capturing messages would a web server like tour.dlang.io. At the moment tour.dlang.io uses a Docker container to compile the code, having a library would allow to simply write a Vibe.d server.
Then there is IDE integration, and other tooling for static analysis. The limit of what can be done is really up to one's imagination, but at the moment the state of DMD as a library is so dire that the topic that needs focus are the same for everyone : reducing global state, making the code reusable, thread-safe, etc...

Suggestion: [...]

It works, and is already much, much better than the linker hack you were previously suggesting. It is not pure, and it goes against one of the current goal of reducing global state in DMD, but IMO it's an acceptable middle ground. But I'd like to hear @jacob-carlborg 's opinion on this.

@WalterBright
Copy link
Member

So if I understand correctly, your suggestion was for people that want to use dmd as a library to do:

There's no reason to use pragma(mangle). Just declare it the same way. And the function signature is no harder than writing void errorMsgSink(string); That's all we're talking about.

Could you provide a code example ?

void myErrorMsgSink(string s) {
    if (gohere) myErrorMsgSinkHere(s);
    else myErrorMsgSinkThere(s);
]

I don't understand the question?

Also, to have inlining, since it's done at the linker level, you need LTO.

If the overriding function is known to the caller, which it would be if it was included in the caller's source code, it can be inlined by the compiler. (I've written librarians, linkers, compilers, and inliners. I should know how this works.)

A thousand lines of code to intercept one function call is not desirable. Multiply that by the number of globals dmd has, and we've burned down the house to kill a few termites. Getting rid of globals is a goal, not an imperative overriding all other considerations. Replacing 5 lines of code with a thousand, plus a thousand more lines of disruption, requires enormous justification, and I'm not seeing that.

Please don't use unittesting error messages as a justification for this. It just legitimizes a place we don't want to go. Use justifications that are worthwhile. I don't want to see any unittesting of error messages, because that will prevent us from improving the error messages. The messages must be considered to be ephemeral. The only place they should be tested is in test/fail_compilation, where we already have a well-developed testing apparatus for them.

@Geod24
Copy link
Member

Geod24 commented Mar 31, 2019

Just declare it the same way.

Which requires you to modify DMD's source code in order to use it as a library. Absolutely unacceptable.

void myErrorMsgSink(string s) {
    if (gohere) myErrorMsgSinkHere(s);
    else myErrorMsgSinkThere(s);
}

Where is "gohere" defined ? What does it correspond to ?

If the overriding function is known to the caller, which it would be if it was included in the caller's source code, it can be inlined by the compiler. (I've written librarians, linkers, compilers, and inliners. I should know how this works.)

It is not included in the caller's source code.
You said it yourself:

Sure it is. Let me explain how libraries work. The linker is presented with an undefined symbol. It first looks in the object files for a resolution. If not found, then it looks in the library for a resolution. If found, that and only that object file is pulled out of the library. Repeat for any new undefined symbols. Hence, to "customize" a function that's in a library, simply provide your own function in your object file, with the same signature.


Please don't use unittesting error messages as a justification for this. It just legitimizes a place we don't want to go. Use justifications that are worthwhile. I don't want to see any unittesting of error messages, because that will prevent us from improving the error messages. The messages must be considered to be ephemeral. The only place they should be tested is in test/fail_compilation, where we already have a well-developed testing apparatus for them.

Please read my message. The example was to show the expected API. No one should tests the error messages but the compiler. There is no use case I can think of, and I am not trying to encourage it. I provided you with other use cases for using the messages, why did you choose to ignore it ?

@jacob-carlborg
Copy link
Contributor Author

jacob-carlborg commented Mar 31, 2019

In theory I'm fine with doing something like errorMsgSinkStderr. The problem is that the functions that call verrorPrint are performing additional logic that should be configurable as well. For example, verror is counting errors in global.errors and is deciding when something is fatal or not.

The only place they should be tested is in test/fail_compilation, where we already have a well-developed testing apparatus for them.

The problem with test/fail_compilation is that it requires one file per kind of error message to test. It also requires to invoke the compiler once per file resulting in over 1000 invocations of the compiler. With my approach the compiler is only invoked once. To clarify I don't want to test error message in multiple places, I want to move test/fail_compilation to test/unit/fail_compilation.

@jacob-carlborg
Copy link
Contributor Author

@WalterBright do you prefer errorMsgSink to be defined in dmd.errors or in dmd.globals.Global?

@WalterBright
Copy link
Member

Which requires you to modify DMD's source code in order to use it as a library. Absolutely unacceptable.

I think there's some terrible misunderstanding here. If I want to override strlen() in the C library and use my own, I just write my own strlen() in my code and the one in the C library will never be linked in. I don't have to modify the C library source code at all. There's nothing clever or hackish about it. Perhaps at DConf we can sit down and I can show you how libraries and linking work, as this back-and-forth is clearly not working. It should take about 5 minutes. I invite anyone else to join us.

@WalterBright
Copy link
Member

It belongs as part of dmd.globals.Global. Also, it should probably be a delegate, not a mere function pointer, so the user can supply their own context. And lastly, verrorPrint() and all the code that writes to stderr should be refactored to write to errorMsgSink() instead.

@Geod24
Copy link
Member

Geod24 commented Apr 1, 2019

I think there's some terrible misunderstanding here. If I want to override strlen() in the C library and use my own, I just write my own strlen() in my code and the one in the C library will never be linked in. I don't have to modify the C library source code at all. There's nothing clever or hackish about it. Perhaps at DConf we can sit down and I can show you how libraries and linking work, as this back-and-forth is clearly not working. It should take about 5 minutes. I invite anyone else to join us.

I very well understand, thanks. The point was that verrorPrint mangling includes the module name, so there is no way you can override it without pragma(mangle) as you would in C or C++.

@WalterBright
Copy link
Member

1000 invocations of the compiler

That's because there are 1000 files. Those files can be concatenated into fewer files.

@WalterBright
Copy link
Member

The point was that verrorPrint mangling includes the module name, so there is no way you can override it without pragma(mangle) as you would in C or C++.

You can:

  1. declare it as extern (C)
  2. declare it as extern(C++)
  3. name the module in your source code the same as it is in the library

@Geod24
Copy link
Member

Geod24 commented Apr 1, 2019

For 1 & 2, I already mentioned the cons here, to which you replied you "do not see how extern(C++) has anything to do with it."...
For 3, the modules will just conflict and the compiler will error out.

Now we're just going in circles.

@adamdruppe
Copy link
Contributor

so point on the module thing, the compiler won't necessarily error out. If one module is passed on command line and the other found in the -I path, the compiler will prefer the command line one and not even look at the import path, thereby bypassing the error condition and letting the linker do its thing.

@WalterBright
Copy link
Member

@adamdruppe that's right.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Merge:auto-merge Review:Blocking Other Work review and pulling should be a priority

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants