(Option 1) Improve logging accuracy and reduce exists && isFile to 1 file query by marler8997 · Pull Request #295 · dlang/tools

marler8997 · 2018-01-22T17:56:32Z

No description provided.

dlang-bot · 2018-01-22T17:56:32Z

Thanks for your pull request, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

smaller, focused PRs are easier to review than big ones
try not to mix up refactoring or style changes with bug fixes or feature enhancements
provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

WebDrake · 2018-01-22T22:38:28Z

I'd like to hear more detail about the motivation behind these changes, because on the face of it this PR seems quite problematic. Let's break it down:

the original code uses standard library functions that are easy to understand even if one is not familiar with them (e.g. compilerPath.exists and compilerPath.isFile are super obvious in their meaning)
this PR replaces these calls with custom written functions that are less intuitive to understand at the call site (e.g. my intuitive reaction to yapExists(fullExeTemp) was "what's yap and why do I need to know if it exists for fullExeTemp?)
the PR introduces a new custom-written struct and 3 new functions which will need to be maintained in rdmd if this patch is accepted, as opposed to just relying on well defined standard library functions
one of these functions introduces a hard constraint on OS support (Windows and Posix only). While the same constraint might exist elsewhere in rdmd or in the standard library functions (I haven't checked), any new introduction of such a constraint brings an extra maintenance burden. If this is an entirely new constraint not previously imposed, this may even be a breaking change.
the micro-optimization introduced by this PR hasn't been benchmarked and seems very unlikely to be a performance bottleneck for rdmd.

In summary, this patch seems to make rdmd's code longer, harder to understand and harder to maintain, for the sake of a micro-optimization that likely is not even noticeable to the user.

Absent a very good reason (e.g. benchmarks showing significant improvement to real-world use cases) this PR should probably just be closed.

marler8997 · 2018-01-22T22:50:40Z

Your evaluation seems a bit dramatic to me and also very one sided. You attempted to find everything wrong with this PR but didn't mention any of the benefits. Would you mind listing all the benefits as well?

WebDrake · 2018-01-22T23:10:32Z

@marler8997 I'm afraid I don't see what those benefits are. This is one reason why I asked about the motivation behind the PR.

Neither the patch nor the PR have any description other than the title, and this says only what has changed -- it says nothing about why this change was deemed necessary or what criteria were used to do so.

This is information that presumably as author you already know, so it would be good to include it in the patch description, to help reviwers understand things more readily.

As things stand, though, this patch looks like it decreases the clarity and maintainability of rdmd for no clear benefit.

marler8997 · 2018-01-22T23:20:37Z

Based on your initial comment it's clear that you have a strong bias against this change. You've included alot of detail about why you don't like this change but spent no time analyzing what is good about it. I've found that when someone is in this state of mind it is impossible for them to discuss the subject objectively.

It's been found that when people are in this state, the more evidence you present against their opinion, the more they will dig in their heels and cling to their initial opinion. Addressing your concerns and explaining the motivation isn't a problem for me, however, this would only cause you to find more a more wrong with the PR and be counter-productive.

Unless you can re-evaluate this PR objectively and take just as much time to evaluate the pros as you did the cons, it's better that we don't start an argument over this.

WebDrake · 2018-01-22T23:43:33Z

@marler8997 I've offered you a detailed set of feedback with clearly spelled out criteria behind it. Do you really feel it's either appropriate or helpful to respond with ad hominem about what you perceive as my state of mind?

I'm prepared to listen to evidence. But have you considered that it's possible that the lack of objectivity could be on your side, not mine?

What makes you think that this PR is a good contribution?

marler8997 · 2018-01-22T23:49:47Z

I'm not going to argue with someone who is biased, it doesn't do anyone any good. The only way to prove to me that you aren't biased is to evaluate the opposing side to your viewpoint. Do that and then we can talk.

jacob-carlborg · 2018-01-23T07:57:58Z

this PR replaces these calls with custom written functions that are less intuitive to understand at the call site (e.g. my intuitive reaction to yapExists(fullExeTemp) was "what's yap and why do I need to know if it exists for fullExeTemp?)

I agree that the names are not that good. Turns out yap is Andrei's funny word for "verbose".

one of these functions introduces a hard constraint on OS support (Windows and Posix only). While the same constraint might exist elsewhere in rdmd or in the standard library functions (I haven't checked), any new introduction of such a constraint brings an extra maintenance burden. If this is an entirely new constraint not previously imposed, this may even be a breaking change.

The Phobos implementations of these functions have the same constraints.

the micro-optimization introduced by this PR hasn't been benchmarked and seems very unlikely to be a performance bottleneck for rdmd.

I agree that benchmarks that shows the need for this should be provided.

DmitryOlshansky · 2018-01-23T08:51:59Z

rdmd.d

        fullExe = fullExe.defaultExtension(".exe");

    // Delete the old executable before we start building.
-    yap("stat ", fullExe);


So yap was a precedent already

yap appears to be a utility function to report commands issued directly by rdmd. It's responsible for verbose output. The name makes sense for this usage (see the dictionary definition:-), albeit maybe not so obvious to a non-native speaker.

It just seems odd (and misleading) to use as a prefix for the newly introduced functions.

DmitryOlshansky · 2018-01-23T08:57:12Z

the micro-optimization introduced by this PR hasn't been benchmarked and seems very unlikely to be a performance bottleneck for rdmd.

This seems to be the key point. Any data on the improvement on some example?
Basically if these stat calls are not the bottleneck then indeed the patch is just more stuff to maintain.

I have a hunch that stat calls could be measurable in the case where we don't need to rebuild the project that consists of a multitude of files, use that as the baseline with --build-only mode so that you avoid measuring the actual application.

Basically it should be:

avgtime -r 100 rdmd --build-only some_project_entry_point.d

avgtime is nice D utility program available here:
https://github.com/jmcabo/avgtime

WebDrake · 2018-01-23T09:22:27Z

Turns out yap is Andrei's funny word for "verbose".

It's not a bad choice for the actual yap function (where the usage matches my intuitive understanding of the word). But given that usage it makes even less sense as a prefix for the new function names.

The Phobos implementations of these functions have the same constraints.

I'm not surprised, but that wasn't the concern. If the constraints are only in phobos, then rdmd's support automatically gets better whenever phobos' does. If the constraints are in rdmd then they affect rdmd even when phobos does not impose these constraints.

So, the objection was to introducing new code in rdmd that carries these constraints.

(There is already a similar version block where file suffixes are determined, but that's easier to maintain and could probably be fixed anyway. Ideally we shouldn't add more such rdmd-side limitations.)

marler8997 · 2018-01-23T12:58:29Z

I'd be surprised if this PR had in any measurable performance improvement. The reason for it is logging accuracy. Since I've been working very closely with rdmd recently, the logging inaccuracy has been a pain. Currently rdmd is missing a fair amount of logging because it accesses the file system and logs it independently.

yap("operation ", name);
operation(name);

If you go through the code, you'll notice that a fair amount of file system operations are missing their corresponding yap statement. By combining both the logging and the operation itself into a single function (i.e. yapOperation), this prevents the mistake in the future (maybe a better name is operationWithYap?). Once I did this, I noticed that rdmd was doing multiple stat operations one after the other on the same file. Anyone looking at the logs would immediately question why it was doing the exact same thing twice. The reason is because the phobos API throws exceptions when you are trying to get attributes on a file that doesn't exist. We could use this API and swallow the exception, but this may have a measurable performance degradation, and I've never been a fan of using exceptions for normal control flow. In this case it makes sense to just call the "non-exception throwing" functions directly (GetFileAttributes and `stat). Calling 4 lines of lines of platform specific code a "maintenance burden" is dramatic in my opinion. Especially when the code is calling functions as old/common as these 2.

WebDrake · 2018-01-23T16:33:50Z

@marler8997 OK, so the goal here is not improved performance, but ensuring that verbose output accurately reports every single output with the filesystem?

That's a reasonable concern per se (although one might distinguish between querying the file system, versus modifying it, in terms of logging priority). But it's probably a good idea to try to spell out in detail exactly what's wanted and why so that reviewers can best advise on how to achieve this, and whether it really is a good idea.

I'm AFK right now (writing from my phone) so it's hard to offer detailed thoughts and advice. I'll try to write more this evening or tomorrow, but in the meantime, a few pieces of feedback:

it's a good idea to be clear about your goals and desired outcomes in your commit messages and PR descriptions. The only description of the current commit/PR is the title, which refers to reducing the number of file system interactions. The logical conclusion is that this is the intent behind the PR, and hence that these changes were meant to improve performance (not logging). If the commit title had been, for example, Ensure that all filesystem operations are reported by verbose output, we'd have had a very different conversation.
- TL;DR: commit messages should say why the changes are being made, not what has changed (we can already see what's changed from the diff)
what exactly has been painful about the logging, i.e. what things have been problematic for you when working with rdmd?
are these problems related to running rdmd (as a user) or working on it as a developer, or both?
when looking at the file system interactions: is the problem from your POV the double stat calls for existence and isFile checks, or is it that rdmd code relies on the developer to always check both in the correct order (e.g. would it suffice to have a function bool isExistingFile(string filename) { return filename.exists && filename.isFile; } ...?)
is it reasonable or necessary to report filesystem operations that are not directly requested from rdmd (e.g. those issued from phobos functionality called by rdmd)? If so, why?

I'll follow up with more thoughts when I'm back in front of a computer, but in the meantime it would be good to have feedback on these questions and remarks.

jacob-carlborg · 2018-01-23T17:09:04Z

I'd be surprised if this PR had in any measurable performance improvement. The reason for it is logging accuracy.

Mentioning that from the beginning and ideally in the commit message would have saved a lot of discussions here.

WebDrake · 2018-01-24T16:44:28Z

On further consideration, two things spring to mind that as relevant:

(1) We should recognize that the usage of yap is inherently abstraction-breaking. Consider:

yap("stat ", fullExe);
if (exists(fullExe))
{
    ...
}

The correctness of this yap call rests on the developer's knowledge that std.file.exists uses stat under the hood. But in fact this is only true for POSIX systems.

(2) Given that the total number of filesystem-related functions used by rdmd is quite small, one way to guarantee reporting would be to implement simple local verbose wrappers:

bool exists(string path)
{
    import std.file: exists;
    yap("stat ", path);
    return std.file.exists(path);
}

... but the correctness of the verbose output depends on the abstraction-breaking knowledge mentioned above.

If we really, really want to report file system interactions correctly, we might reasonably ask if a manual solution like yap is the right way to go about it in the first place.

marler8997 · 2018-01-24T17:59:04Z

I've made a separate PR that only improves logging accuracy without reducing the 2 file system calls to 1 in the exists && isFile case: #299

WebDrake · 2018-01-24T18:18:44Z

One further complicating factor in addition to those mentioned above: the --dry-run flag. This automatically means --chatty and outputs the commands that rdmd would have issued if the --dry-run flag were not selected.

So, even if it were possible to somehow track the commands invoked by rdmd and pipe that information back to stderr, this would not suffice for the --dry-run case (where no commands are issued).

stefan-koch-sociomantic · 2018-01-29T15:42:03Z

isn't that one the same as #291 ?

marler8997 · 2018-01-29T17:17:41Z

This one was an attempt to improve logging and reduce the 2 file system call case (exists and isFile). @WebDrake didn't like the version-specific code so I created #291 to just fix the logging on it's own without the file system call consolidation.

marler8997 · 2018-01-29T17:29:28Z

I'm sorry, this is completely unrelated to #291, did you actually mean #299 ?

wilzbach · 2018-02-03T11:08:14Z

rdmd.d

+bool yapExists(const(char)[] name)
+{
+    return yapGetFileAttributes(name).exists();
+}


How about putting all these into one scope?

struct Yap { ... }

marler8997 · 2018-02-03T15:40:22Z

I've updated the names of this PR and #299 to reflect that they are 2 different options of the same change. #299 is basically the same thing but without the extra reduction of filesystem query in the exists && isFile case. I would prefer this one over #299, however, I mostly want the logging accuracy fixed, so I am ok with either.

WebDrake · 2018-02-03T17:51:42Z

The objection is not (solely) to the version dependencies. It's rather that, for a variety of reasons, both the PRs on offer introduce increased complexity of rdmd.d code while not really solving the problem.

The existing code has a pretty simple and consistent design principle: use standard library functions to interact with the filesystem, and manually add yap calls if there's a need for --chatty output.

The downside of that is that the developer (and reviewers) have to take care of making sure the yap statements are correctly placed. But the (big) upside is that it's very easy to read and understand what is being done, and why.

Neither of the current PRs on offer provides a comprehensive replacement for this approach. Both PRs mix up verbose wrapping of std.file functionality and wrapping of at least one dryRun check, while leaving plenty of other manual yap calls and dryRun checks, and plenty of direct uses of std.file hanging around in other parts of rdmd.d outside of the wrapper functions. So, instead of one simple approach that needs some care from developers and reviewers, we now have at least two different ways of doing things (that are both used).

This means the code is inconsistent in style and design, and that makes the code less intuitive to read and reason about. This is before we even get to stuff like rdmd.d copy-pasting internals from phobos in order to save one stat call, or introducing hard-coded version constraints that don't need to exist.

This is why, as suggested multiple times now, it would be better to address the immediate problem (i.e. add missing yap statements to the existing code), and leave a larger-scale reworking of things for a later, more considered approach that makes rdmd.d easier to understand, not harder.

marler8997 · 2018-02-03T17:56:35Z

@WebDrake you've shared your opinion multiple times now, and I've told you multiple times that I disagree, please limit the repetition and allow someone else to chime in.

CyberShadow

Thanks! I think this is a good general direction for improvement.

I would put the stat optimization as a separate commit or PR, but I wouldn't mind it either way. rdmd has generally tried to reduce the number of stat calls, which can be slow with a cold disk cache or a network filesystem, but stat-ing the same file twice in a row probably won't make much of a difference unless it's on an uncached network filesystem or something like that.

CyberShadow · 2018-02-06T02:05:44Z

rdmd.d

+    }
+}
+
+struct Yap


yap wasn't a great name start with, and this is not making it better. This is a namespace for wrapping filesystem operations, so let's just name it Filesystem or such.

Please also add a comment describing the motivation of having this structure.

Added a comment in #299. I like the name Yap for the structure, let's come back to this if you still feel the same after this set of changes.

CyberShadow · 2018-02-06T02:06:13Z

rdmd.d

+
+struct Yap
+{
+    static FileAttributes getFileAttributes(const(char)[] name)


Use static: to make it more clear that this is a namespace and has no non-static members.

Ok (changed in #299)

CyberShadow · 2018-02-06T02:06:47Z

rdmd.d

+}
+
+struct Yap
+{


Move std.file imports here. That way, it will not be possible to accidentally use std.file functions bypassing logging.

There's still a fair amount of std.file functions used in the rest of the file

thisExePath tempDir remove FileException DirEntry dirEntries

Please use selective imports for that.

remove is sticking out: should it be called via the wrapper instead? If not, then it should be a static or fully-qualified import to make it explicit.

The remove case is awkward. It has a try/catch block with logic to handle what happens if the file is being used. It's in the first part of the rebuild function.

I can switch them to selective imports if you think that's better, but none of the other modules use selective imports, so I wasn't sure if I should follow convention or do the "safer" thing.

Yes, please use selective imports. Using selective (or static) imports provides half of the justification for the filesystem operations wrapper: to prevent accidentally bypassing it. Without that, it's more difficult to justify this entire pull request.

The remove call is already accompanied by a yap, so with a bit of refactoring of the !dryRun part it looks like a good candidate of being converted to a call to the wrapper.

dirEntries should also probably be moved to the wrapper, since it's not much different from stat in principle.

ok, done except moving the remove case to a wrapper. have another look at the current set of changes in #299 and let me know what you think

CyberShadow · 2018-02-06T02:07:39Z

rdmd.d

+    static auto timeLastModified(R)(R name)
+    {
+        yap("stat ", name);
+        return std.file.timeLastModified(name);


This could also be written as simply return .timeLastModified(name);, but I guess it's probably better to be more explicit here.

CyberShadow · 2018-02-06T02:09:28Z

rdmd.d

    else stderr.writeln(stuff);
 }
+
+struct FileAttributes


Please document what this is used for.

Ok (changed in #299)

marler8997 · 2018-02-06T04:18:13Z

Closing in favor of #299. Might add the file system call consolidation in a separate PR.

marler8997 force-pushed the rdmdOneFileCheck branch from 899b2b7 to 9f5b1a2 Compare January 22, 2018 18:00

DmitryOlshansky reviewed Jan 23, 2018

View reviewed changes

wilzbach reviewed Feb 3, 2018

View reviewed changes

marler8997 force-pushed the rdmdOneFileCheck branch from 9f5b1a2 to 83de295 Compare February 3, 2018 15:37

marler8997 changed the title ~~Reduce 2 file system queries to one when checking if something is a file.~~ (Option 1) Improve logging accuracy and reduce exists && isFile to 1 file query Feb 3, 2018

marler8997 force-pushed the rdmdOneFileCheck branch from 83de295 to 223145a Compare February 3, 2018 15:45

Improve logging accuracy and reduce exists && isFile to 1 file query.

93a79b8

marler8997 force-pushed the rdmdOneFileCheck branch from 223145a to 93a79b8 Compare February 3, 2018 15:54

CyberShadow reviewed Feb 6, 2018

View reviewed changes

CyberShadow mentioned this pull request Feb 6, 2018

Improve logging accuracy #299

Merged

marler8997 closed this Feb 6, 2018

Uh oh!

Conversation

marler8997 commented Jan 22, 2018

Uh oh!

dlang-bot commented Jan 22, 2018

Bugzilla references

Uh oh!

WebDrake commented Jan 22, 2018

Uh oh!

marler8997 commented Jan 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WebDrake commented Jan 22, 2018

Uh oh!

marler8997 commented Jan 22, 2018

Uh oh!

WebDrake commented Jan 22, 2018

Uh oh!

marler8997 commented Jan 22, 2018

Uh oh!

jacob-carlborg commented Jan 23, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

DmitryOlshansky commented Jan 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WebDrake commented Jan 23, 2018

Uh oh!

marler8997 commented Jan 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WebDrake commented Jan 23, 2018

Uh oh!

jacob-carlborg commented Jan 23, 2018

Uh oh!

WebDrake commented Jan 24, 2018

Uh oh!

marler8997 commented Jan 24, 2018

Uh oh!

WebDrake commented Jan 24, 2018

Uh oh!

stefan-koch-sociomantic commented Jan 29, 2018

Uh oh!

marler8997 commented Jan 29, 2018

Uh oh!

marler8997 commented Jan 29, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marler8997 commented Feb 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

WebDrake commented Feb 3, 2018

Uh oh!

marler8997 commented Feb 3, 2018

Uh oh!

CyberShadow left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

marler8997 commented Jan 22, 2018 •

edited

Loading

DmitryOlshansky commented Jan 23, 2018 •

edited

Loading

marler8997 commented Jan 23, 2018 •

edited

Loading

marler8997 commented Feb 3, 2018 •

edited

Loading