Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rework record logic #308

Closed
wants to merge 10 commits into from
Closed

rework record logic #308

wants to merge 10 commits into from

Conversation

e2
Copy link
Contributor

@e2 e2 commented May 5, 2015

Fixed tests and Record#build logic for #307

@coveralls
Copy link

Coverage Status

Coverage decreased (-5.77%) to 92.72% when pulling d3499fd on e2-rework_record_logic into cba1e51 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-5.7%) to 92.79% when pulling d3499fd on e2-rework_record_logic into cba1e51 on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-5.84%) to 92.65% when pulling d3499fd on e2-rework_record_logic into cba1e51 on master.

if [nil, '', '.'].include?(dirname) && basename == '.'
@paths[dir.to_s]['.'] = Hash[children.map { |child| [child, {} ] } ]
else
@paths[dir.to_s][rel_path] ||= {}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think the children here is needed because they'll be iterated during _fast_build anyway.
Also Directory::scan would touch each entry populating new directories and their contents.

Also might be a mistake here either way as it only does the children for the root folder?

Lastly the bug with the moving of a folder not showing the old folder in the removed list is still present as the folder itself is not added to its parent directory's hash. This was missing from my patch too. I did a test where in add_dir it adds the directory hash as well as adding itself to the hash of the parent directory and that seemed to work.

So in essencing adding directory takes two actions:

  1. Set up hash for contents
  2. Add it to its parent directory's contents hash

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_fast_build should set things up so that '.' actually contains entries when using dir_entries - that's what add_dir is for here. This is for the previous to return something in Directory.scan on the first call.

Also might be a mistake here either way as it only does the children for the root folder?

That's why I marked it with TODOs - it should be part of the initial build only.

Lastly the bug with the moving of a folder not showing the old folder in the removed list is still present

This needs a separate acceptance test.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above I feel there's separation of thought between '.' and other directories that I don't understand why need treating differently.

Other folders are not populated here, they are populated as each entry is examined in _fast_try_file for example. So why does '.' need populating here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@driskell - I'm just on the fence about it, because so many other things are wrong here. It simplifies the code, but complicates the internal structure - which can get in the way if Record is to handle directories. The Record tests need rework anyway.

Just assume that '.' is fine for now - and I take back what I said about it.

@e2 e2 force-pushed the e2-rework_record_logic branch from d3499fd to 8768e19 Compare May 6, 2015 09:59
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 92.39% when pulling 8768e19 on e2-rework_record_logic into e639493 on master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 92.39% when pulling 8768e19 on e2-rework_record_logic into e639493 on master.

@e2 e2 force-pushed the e2-rework_record_logic branch from 8768e19 to 2f8ad95 Compare May 6, 2015 13:10
@coveralls
Copy link

Coverage Status

Coverage increased (+0.44%) to 92.78% when pulling 2f8ad95 on e2-rework_record_logic into e639493 on master.

@e2 e2 force-pushed the e2-rework_record_logic branch from 73bacf3 to a9e8bb9 Compare May 30, 2015 11:55
@coveralls
Copy link

Coverage Status

Coverage increased (+0.26%) to 92.6% when pulling a9e8bb9 on e2-rework_record_logic into e639493 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.4%) to 92.75% when pulling a9e8bb9 on e2-rework_record_logic into e639493 on master.

@e2 e2 force-pushed the e2-rework_record_logic branch from a9e8bb9 to 9281b11 Compare May 30, 2015 13:49
@coveralls
Copy link

Coverage Status

Coverage increased (+0.87%) to 93.22% when pulling 9281b11 on e2-rework_record_logic into e639493 on master.

@e2 e2 force-pushed the e2-rework_record_logic branch from 120f26d to 67f50e7 Compare May 30, 2015 21:28
@coveralls
Copy link

Coverage Status

Coverage increased (+0.88%) to 93.23% when pulling 120f26d on e2-rework_record_logic into e639493 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.81%) to 93.16% when pulling 120f26d on e2-rework_record_logic into e639493 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.59%) to 92.93% when pulling 67f50e7 on e2-rework_record_logic into e639493 on master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.59%) to 92.93% when pulling 67f50e7 on e2-rework_record_logic into e639493 on master.

@driskell
Copy link

I haven't forgotten entirely about this. Just very busy and vacation time! I hope to find some time to look through this and give it a trial run soon. Thanks for all the hard work.

@e2
Copy link
Contributor Author

e2 commented Jul 14, 2015

@driskell - I think the next step would be adding directory timestamps. And with TCP out of the picture, it should be easier to make any patches you need (like the ones you already have).

root['.'] = entries
else
root[rel_path] ||= {}
end

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fails to add the directory name to the parent folder's entry list.
For example, if we have a simple directory containing one folder, path, and within that, folder, and within that a file called README, the resulting tree would be:

'.' => {'path' => {}}
'path' => {}
'path/folder' => {'README' => {<file-data>}}

It should be:

'.' => {'path' => {}}
'path' => {'folder' => {}}
'path/folder' => {'README' => {<file-data>}}

Otherwise when Directory::scan is processing deletion for path, it will never recurse folder and not actually raise any deletion events, which it should do for README

This seems to work, and I'll look at making a test unless you're able to find it easier?

    def add_dir(dir, rel_path)
      rel_path = '.' if [nil, '', '.'].include? rel_path
      dirname, basename = Pathname(rel_path).split.map(&:to_s)
      basename = '.' if [nil, '', '.'].include? basename
      root = (@paths[dir.to_s] ||= {})
      dirname = '.' if [nil, '', '.'].include?(dirname)
      entries = (root[dirname] || {})
      entries.merge!(basename => {}) if basename != '.'
      root[dirname] = entries
    end

@driskell
Copy link

I'll leave you with the above for the moment while I test out the recursion processing. I've pulled my own recursion descoping onto the top of your changes and so far so good. Just needs tests I guess!

@driskell
Copy link

I have one question though: Should events for empty directories that are created and removed be produced? I'm not 100% sure if the linux side of things does or if it is meant to. If it's not meant to - then that's another feature for another day.

@driskell
Copy link

Would you advise starting this again on the new 3.x master branch?

@e2
Copy link
Contributor Author

e2 commented Jul 15, 2015

Would you advise starting this again on the new 3.x master branch?

Absolutely! I pretty much want to abandon all active work on 2.x - the upgrade for users should be painless (except for TCP, which has to be redesigned anyway), so there's nothing to lose.

Also, refactoring and fixing (including adding specs) should be easier, since each Record object represents a single directory (e.g. add_dir has only one parameter, not two). Specs are important, because currently you're probably the only person on this planet who can work out what FSEvent events occur in which scenarios. Let me know if you need anything or if you want me to add specs, etc.

Should events for empty directories that are created and removed be produced?

Ultimately, I'd like to create a separate gem to handle dealing with events in a generic way.

The idea is: the backend adapter generates an "invalidation" event, saying e.g. "directory foo has changed and everything inside it" or "something was added to directory foo" or more specific, e.g. "file bar's content was changed". And based on the invalidation, the memory "snapshot" of the filesystem would be invalidated and updated (better backends will have more specific events, which means less invalidating, which means faster updates).

So ideally, the Record object should always represent what's physically on the disk - and events alone should be sufficient enough for the Record to at least know what to scan and update (so if at any time the internal Record is different from what's on the disk - and there are no pending events - then there's a bug somewhere).

The sequence would be:

  1. queuing adapter events
  2. translating adapter events into "invalidation requests" (e.g. FSEvent on a dir - to e.g. :invalidate_tree or something)
  3. scanning/updating the Record (processing invalidation requests, reading file/dir data from filesystem, comparing, md5sums, etc.)
  4. queuing "changes" for Listen to process
  5. continue to step 3 until there are no more pending invalidations
  6. if there are no more events, the Record should match exactly what's on the filesystem

The idea is for Listen to respond just to added/removed/changed files currently. Whether it detects added/removed/changed directories is more in the area of optimization. Ideally, the end result should reflect what's on the filesystem, so even if the directories aren't "reported" yet, the closer to the filesystem things end up, the better (because it allows optimizing and bugfixing).

In short - once Listen finishes handling adapter events, the Record should match what's on the filesystem, or it's a bug.

Let me know if you need anything or you have something working for Listen 3 - anything that can be considered a bug will have high priority anyway (I'll want to add tests and release ASAP).

Oh, and since Celluloid is no longer a dependency, you can refactor as much as you like (the _fast prefixes in method names no longer make sense, since no locking is necessary and processing is sequential too).

And since TCP is gone, you can access the filesystem at any time you want for any reason (the goal is correctness first - then we can use this correctness to properly optimize later).

@driskell
Copy link

Nice. I'll start some work on merging in these changes into the master branch and I'll formulate a list of specs that need creating to prevent regression on the cases I know about, and ultimately document them!

For the invalidation method - that sounds perfect - though I would ensure there is a :invalidate_directory as well as :invalidate_tree. Though I believe I would be accurate in saying the only backend that would need :invalidate_tree is the polling backend and all other backends correctly state where in a tree the change is.

I'll put something forward when I can. Though holidays are coming so may be a couple weeks! I'll keep a watch on the repository too in case any other changes occur that might conflict or supersede the plan we've outlined here.

@e2
Copy link
Contributor Author

e2 commented Jul 15, 2015

though I would ensure there is a :invalidate_directory as well as :invalidate_tree

Yes - very much so. In fact, the types of invalidation events depend entirely on whatever the backend provides. E.g. Polling is pretty much "invalidate everything". FSEvent has a "files" mode, which I believe isn't the default because is spams with a mass of events - but if it was enabled, the invalidations would be much more fine-grained than otherwise.

Invalidations would abstract away the adapters - and it would be up to the Record to decide whether it's worth micro-optimizing or just scanning the whole tree. They would also help avoiding breaking things, since I wouldn't have to care how OSX adapter works, as long as it provide an invalidation with enough "hints" on what could've changed.

And they would help understand other drivers too, e.g. an fsevent on a dir doesn't say much, but saying that a directories contents (children - files or dirs) have changed allows me to simulate the same scenario under Linux - without needing to understand how the OSX adapter works in specific scenarios.

So invalidations would be a "common language" among drivers - so adapters would benefit from each other's optimizations (but without the adapter-specific complexity).

I don't plan any drastic changes in Listen (3.x was already too much work), so take your time with this - I'll probably only be fixing bugs if they are discovered.

@e2 e2 added the ✨ Feature Adds a new feature label Apr 30, 2016
@e2
Copy link
Contributor Author

e2 commented Apr 30, 2016

A lot of work has been put into this. The current code is way too difficult to test, so it would be a waste of your time to improve this.

I need to switch to a new architure in Listen first: #381

But the outcome should be: much easier to optimize and maintain things on OSX.

The upside is: if you're even still interested in this, the new architecture will allow new features, such as:

  • "continuing after restart" (so files changed between listen invocations are treated as new changes to process)
  • "backup-level consistency", so that Listen can become a backup tool
  • reusing parts of listen for other purposes (filesystem snapshots, filesystem diffs, filesystem changelogs)
  • easy support for "fake" filesystems, such as "TCP", "Amazon S3" and other "plugins"

@ioquatix
Copy link
Member

@e2 please rebase on master if you think this work is worth continuing :)

@ioquatix ioquatix closed this Aug 23, 2020
@pjmartorell
Copy link

@e2 could you revive this PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
✨ Feature Adds a new feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants