Skip to content

(Option 1) Improve logging accuracy and reduce exists && isFile to 1 file query#295

Closed
marler8997 wants to merge 1 commit intodlang:masterfrom
marler8997:rdmdOneFileCheck
Closed

(Option 1) Improve logging accuracy and reduce exists && isFile to 1 file query#295
marler8997 wants to merge 1 commit intodlang:masterfrom
marler8997:rdmdOneFileCheck

Conversation

@marler8997
Copy link
Contributor

No description provided.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@WebDrake
Copy link
Contributor

I'd like to hear more detail about the motivation behind these changes, because on the face of it this PR seems quite problematic. Let's break it down:

  • the original code uses standard library functions that are easy to understand even if one is not familiar with them (e.g. compilerPath.exists and compilerPath.isFile are super obvious in their meaning)

  • this PR replaces these calls with custom written functions that are less intuitive to understand at the call site (e.g. my intuitive reaction to yapExists(fullExeTemp) was "what's yap and why do I need to know if it exists for fullExeTemp?)

  • the PR introduces a new custom-written struct and 3 new functions which will need to be maintained in rdmd if this patch is accepted, as opposed to just relying on well defined standard library functions

  • one of these functions introduces a hard constraint on OS support (Windows and Posix only). While the same constraint might exist elsewhere in rdmd or in the standard library functions (I haven't checked), any new introduction of such a constraint brings an extra maintenance burden. If this is an entirely new constraint not previously imposed, this may even be a breaking change.

  • the micro-optimization introduced by this PR hasn't been benchmarked and seems very unlikely to be a performance bottleneck for rdmd.

In summary, this patch seems to make rdmd's code longer, harder to understand and harder to maintain, for the sake of a micro-optimization that likely is not even noticeable to the user.

Absent a very good reason (e.g. benchmarks showing significant improvement to real-world use cases) this PR should probably just be closed.

@marler8997
Copy link
Contributor Author

marler8997 commented Jan 22, 2018

Your evaluation seems a bit dramatic to me and also very one sided. You attempted to find everything wrong with this PR but didn't mention any of the benefits. Would you mind listing all the benefits as well?

@WebDrake
Copy link
Contributor

@marler8997 I'm afraid I don't see what those benefits are. This is one reason why I asked about the motivation behind the PR.

Neither the patch nor the PR have any description other than the title, and this says only what has changed -- it says nothing about why this change was deemed necessary or what criteria were used to do so.

This is information that presumably as author you already know, so it would be good to include it in the patch description, to help reviwers understand things more readily.

As things stand, though, this patch looks like it decreases the clarity and maintainability of rdmd for no clear benefit.

@marler8997
Copy link
Contributor Author

Based on your initial comment it's clear that you have a strong bias against this change. You've included alot of detail about why you don't like this change but spent no time analyzing what is good about it. I've found that when someone is in this state of mind it is impossible for them to discuss the subject objectively.

It's been found that when people are in this state, the more evidence you present against their opinion, the more they will dig in their heels and cling to their initial opinion. Addressing your concerns and explaining the motivation isn't a problem for me, however, this would only cause you to find more a more wrong with the PR and be counter-productive.

Unless you can re-evaluate this PR objectively and take just as much time to evaluate the pros as you did the cons, it's better that we don't start an argument over this.

@WebDrake
Copy link
Contributor

@marler8997 I've offered you a detailed set of feedback with clearly spelled out criteria behind it. Do you really feel it's either appropriate or helpful to respond with ad hominem about what you perceive as my state of mind?

I'm prepared to listen to evidence. But have you considered that it's possible that the lack of objectivity could be on your side, not mine?

What makes you think that this PR is a good contribution?

@marler8997
Copy link
Contributor Author

I'm not going to argue with someone who is biased, it doesn't do anyone any good. The only way to prove to me that you aren't biased is to evaluate the opposing side to your viewpoint. Do that and then we can talk.

@jacob-carlborg
Copy link

this PR replaces these calls with custom written functions that are less intuitive to understand at the call site (e.g. my intuitive reaction to yapExists(fullExeTemp) was "what's yap and why do I need to know if it exists for fullExeTemp?)

I agree that the names are not that good. Turns out yap is Andrei's funny word for "verbose".

one of these functions introduces a hard constraint on OS support (Windows and Posix only). While the same constraint might exist elsewhere in rdmd or in the standard library functions (I haven't checked), any new introduction of such a constraint brings an extra maintenance burden. If this is an entirely new constraint not previously imposed, this may even be a breaking change.

The Phobos implementations of these functions have the same constraints.

the micro-optimization introduced by this PR hasn't been benchmarked and seems very unlikely to be a performance bottleneck for rdmd.

I agree that benchmarks that shows the need for this should be provided.

fullExe = fullExe.defaultExtension(".exe");

// Delete the old executable before we start building.
yap("stat ", fullExe);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yap was a precedent already

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yap appears to be a utility function to report commands issued directly by rdmd. It's responsible for verbose output. The name makes sense for this usage (see the dictionary definition:-), albeit maybe not so obvious to a non-native speaker.

It just seems odd (and misleading) to use as a prefix for the newly introduced functions.

@DmitryOlshansky
Copy link
Member

DmitryOlshansky commented Jan 23, 2018

the micro-optimization introduced by this PR hasn't been benchmarked and seems very unlikely to be a performance bottleneck for rdmd.

This seems to be the key point. Any data on the improvement on some example?
Basically if these stat calls are not the bottleneck then indeed the patch is just more stuff to maintain.

I have a hunch that stat calls could be measurable in the case where we don't need to rebuild the project that consists of a multitude of files, use that as the baseline with --build-only mode so that you avoid measuring the actual application.

Basically it should be:

avgtime -r 100 rdmd --build-only some_project_entry_point.d 

avgtime is nice D utility program available here:
https://github.com/jmcabo/avgtime

@WebDrake
Copy link
Contributor

Turns out yap is Andrei's funny word for "verbose".

It's not a bad choice for the actual yap function (where the usage matches my intuitive understanding of the word). But given that usage it makes even less sense as a prefix for the new function names.

The Phobos implementations of these functions have the same constraints.

I'm not surprised, but that wasn't the concern. If the constraints are only in phobos, then rdmd's support automatically gets better whenever phobos' does. If the constraints are in rdmd then they affect rdmd even when phobos does not impose these constraints.

So, the objection was to introducing new code in rdmd that carries these constraints.

(There is already a similar version block where file suffixes are determined, but that's easier to maintain and could probably be fixed anyway. Ideally we shouldn't add more such rdmd-side limitations.)

@marler8997
Copy link
Contributor Author

marler8997 commented Jan 23, 2018

I'd be surprised if this PR had in any measurable performance improvement. The reason for it is logging accuracy. Since I've been working very closely with rdmd recently, the logging inaccuracy has been a pain. Currently rdmd is missing a fair amount of logging because it accesses the file system and logs it independently.

yap("operation ", name);
operation(name);

If you go through the code, you'll notice that a fair amount of file system operations are missing their corresponding yap statement. By combining both the logging and the operation itself into a single function (i.e. yapOperation), this prevents the mistake in the future (maybe a better name is operationWithYap?). Once I did this, I noticed that rdmd was doing multiple stat operations one after the other on the same file. Anyone looking at the logs would immediately question why it was doing the exact same thing twice. The reason is because the phobos API throws exceptions when you are trying to get attributes on a file that doesn't exist. We could use this API and swallow the exception, but this may have a measurable performance degradation, and I've never been a fan of using exceptions for normal control flow. In this case it makes sense to just call the "non-exception throwing" functions directly (GetFileAttributes and `stat). Calling 4 lines of lines of platform specific code a "maintenance burden" is dramatic in my opinion. Especially when the code is calling functions as old/common as these 2.

@WebDrake
Copy link
Contributor

@marler8997 OK, so the goal here is not improved performance, but ensuring that verbose output accurately reports every single output with the filesystem?

That's a reasonable concern per se (although one might distinguish between querying the file system, versus modifying it, in terms of logging priority). But it's probably a good idea to try to spell out in detail exactly what's wanted and why so that reviewers can best advise on how to achieve this, and whether it really is a good idea.

I'm AFK right now (writing from my phone) so it's hard to offer detailed thoughts and advice. I'll try to write more this evening or tomorrow, but in the meantime, a few pieces of feedback:

  • it's a good idea to be clear about your goals and desired outcomes in your commit messages and PR descriptions. The only description of the current commit/PR is the title, which refers to reducing the number of file system interactions. The logical conclusion is that this is the intent behind the PR, and hence that these changes were meant to improve performance (not logging). If the commit title had been, for example, Ensure that all filesystem operations are reported by verbose output, we'd have had a very different conversation.

    • TL;DR: commit messages should say why the changes are being made, not what has changed (we can already see what's changed from the diff)
  • what exactly has been painful about the logging, i.e. what things have been problematic for you when working with rdmd?

  • are these problems related to running rdmd (as a user) or working on it as a developer, or both?

  • when looking at the file system interactions: is the problem from your POV the double stat calls for existence and isFile checks, or is it that rdmd code relies on the developer to always check both in the correct order (e.g. would it suffice to have a function bool isExistingFile(string filename) { return filename.exists && filename.isFile; } ...?)

  • is it reasonable or necessary to report filesystem operations that are not directly requested from rdmd (e.g. those issued from phobos functionality called by rdmd)? If so, why?

I'll follow up with more thoughts when I'm back in front of a computer, but in the meantime it would be good to have feedback on these questions and remarks.

@jacob-carlborg
Copy link

I'd be surprised if this PR had in any measurable performance improvement. The reason for it is logging accuracy.

Mentioning that from the beginning and ideally in the commit message would have saved a lot of discussions here.

@WebDrake
Copy link
Contributor

On further consideration, two things spring to mind that as relevant:

(1) We should recognize that the usage of yap is inherently abstraction-breaking. Consider:

yap("stat ", fullExe);
if (exists(fullExe))
{
    ...
}

The correctness of this yap call rests on the developer's knowledge that std.file.exists uses stat under the hood. But in fact this is only true for POSIX systems.

(2) Given that the total number of filesystem-related functions used by rdmd is quite small, one way to guarantee reporting would be to implement simple local verbose wrappers:

bool exists(string path)
{
    import std.file: exists;
    yap("stat ", path);
    return std.file.exists(path);
}

... but the correctness of the verbose output depends on the abstraction-breaking knowledge mentioned above.

If we really, really want to report file system interactions correctly, we might reasonably ask if a manual solution like yap is the right way to go about it in the first place.

@marler8997
Copy link
Contributor Author

I've made a separate PR that only improves logging accuracy without reducing the 2 file system calls to 1 in the exists && isFile case: #299

@WebDrake
Copy link
Contributor

One further complicating factor in addition to those mentioned above: the --dry-run flag. This automatically means --chatty and outputs the commands that rdmd would have issued if the --dry-run flag were not selected.

So, even if it were possible to somehow track the commands invoked by rdmd and pipe that information back to stderr, this would not suffice for the --dry-run case (where no commands are issued).

@stefan-koch-sociomantic

isn't that one the same as #291 ?

@marler8997
Copy link
Contributor Author

This one was an attempt to improve logging and reduce the 2 file system call case (exists and isFile). @WebDrake didn't like the version-specific code so I created #291 to just fix the logging on it's own without the file system call consolidation.

@marler8997
Copy link
Contributor Author

I'm sorry, this is completely unrelated to #291, did you actually mean #299 ?

bool yapExists(const(char)[] name)
{
return yapGetFileAttributes(name).exists();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about putting all these into one scope?

struct Yap { ... }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@marler8997 marler8997 changed the title Reduce 2 file system queries to one when checking if something is a file. (Option 1) Improve logging accuracy and reduce exists && isFile to 1 file query Feb 3, 2018
@marler8997
Copy link
Contributor Author

marler8997 commented Feb 3, 2018

I've updated the names of this PR and #299 to reflect that they are 2 different options of the same change. #299 is basically the same thing but without the extra reduction of filesystem query in the exists && isFile case. I would prefer this one over #299, however, I mostly want the logging accuracy fixed, so I am ok with either.

@WebDrake
Copy link
Contributor

WebDrake commented Feb 3, 2018

The objection is not (solely) to the version dependencies. It's rather that, for a variety of reasons, both the PRs on offer introduce increased complexity of rdmd.d code while not really solving the problem.

The existing code has a pretty simple and consistent design principle: use standard library functions to interact with the filesystem, and manually add yap calls if there's a need for --chatty output.

The downside of that is that the developer (and reviewers) have to take care of making sure the yap statements are correctly placed. But the (big) upside is that it's very easy to read and understand what is being done, and why.

Neither of the current PRs on offer provides a comprehensive replacement for this approach. Both PRs mix up verbose wrapping of std.file functionality and wrapping of at least one dryRun check, while leaving plenty of other manual yap calls and dryRun checks, and plenty of direct uses of std.file hanging around in other parts of rdmd.d outside of the wrapper functions. So, instead of one simple approach that needs some care from developers and reviewers, we now have at least two different ways of doing things (that are both used).

This means the code is inconsistent in style and design, and that makes the code less intuitive to read and reason about. This is before we even get to stuff like rdmd.d copy-pasting internals from phobos in order to save one stat call, or introducing hard-coded version constraints that don't need to exist.

This is why, as suggested multiple times now, it would be better to address the immediate problem (i.e. add missing yap statements to the existing code), and leave a larger-scale reworking of things for a later, more considered approach that makes rdmd.d easier to understand, not harder.

@marler8997
Copy link
Contributor Author

@WebDrake you've shared your opinion multiple times now, and I've told you multiple times that I disagree, please limit the repetition and allow someone else to chime in.

Copy link
Member

@CyberShadow CyberShadow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think this is a good general direction for improvement.

I would put the stat optimization as a separate commit or PR, but I wouldn't mind it either way. rdmd has generally tried to reduce the number of stat calls, which can be slow with a cold disk cache or a network filesystem, but stat-ing the same file twice in a row probably won't make much of a difference unless it's on an uncached network filesystem or something like that.

}
}

struct Yap
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yap wasn't a great name start with, and this is not making it better. This is a namespace for wrapping filesystem operations, so let's just name it Filesystem or such.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add a comment describing the motivation of having this structure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment in #299. I like the name Yap for the structure, let's come back to this if you still feel the same after this set of changes.


struct Yap
{
static FileAttributes getFileAttributes(const(char)[] name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use static: to make it more clear that this is a namespace and has no non-static members.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok (changed in #299)

}

struct Yap
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move std.file imports here. That way, it will not be possible to accidentally use std.file functions bypassing logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's still a fair amount of std.file functions used in the rest of the file

thisExePath
tempDir
remove
FileException
DirEntry
dirEntries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use selective imports for that.

remove is sticking out: should it be called via the wrapper instead? If not, then it should be a static or fully-qualified import to make it explicit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remove case is awkward. It has a try/catch block with logic to handle what happens if the file is being used. It's in the first part of the rebuild function.

I can switch them to selective imports if you think that's better, but none of the other modules use selective imports, so I wasn't sure if I should follow convention or do the "safer" thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please use selective imports. Using selective (or static) imports provides half of the justification for the filesystem operations wrapper: to prevent accidentally bypassing it. Without that, it's more difficult to justify this entire pull request.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The remove call is already accompanied by a yap, so with a bit of refactoring of the !dryRun part it looks like a good candidate of being converted to a call to the wrapper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dirEntries should also probably be moved to the wrapper, since it's not much different from stat in principle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, done except moving the remove case to a wrapper. have another look at the current set of changes in #299 and let me know what you think

static auto timeLastModified(R)(R name)
{
yap("stat ", name);
return std.file.timeLastModified(name);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also be written as simply return .timeLastModified(name);, but I guess it's probably better to be more explicit here.

else stderr.writeln(stuff);
}

struct FileAttributes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document what this is used for.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok (changed in #299)

@marler8997
Copy link
Contributor Author

Closing in favor of #299. Might add the file system call consolidation in a separate PR.

@marler8997 marler8997 closed this Feb 6, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants