Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

darwin: use simpleHandleMap that never reuses handle numbers #205

Closed
wants to merge 1 commit into from

Conversation

rfjakob
Copy link
Contributor

@rfjakob rfjakob commented Mar 13, 2018

Implement simpleHandleMap that never reuses handle numbers and
uses a simple map to track handle numbers.

This seems to get rid of the

osxfuse: vnode changed generation

errors we are seeing on darwin, while sacrificing
some performance.

I have tested simpleHandleMap on darwin and linux,
seems to work fine.

rfjakob/gocryptfs#213
#204
macfuse/macfuse#482

Implement simpleHandleMap that never reuses handle numbers and
uses a simple map to track handle numbers.

This seems to get rid of the

	osxfuse: vnode changed generation

errors we are seeing on darwin, while sacrificing
some performance.

I have tested simpleHandleMap on darwin and linux,
seems to work fine.

rfjakob/gocryptfs#213
hanwen#204
macfuse/macfuse#482
t.Log("expected recovery from: ", r)
} else {
panic(s)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The markSeen function was not used anywhere

@hanwen
Copy link
Owner

hanwen commented Mar 14, 2018

I have one concern with this, which is that the other FUSE library (rsc's library and bazil.org which was derived from it.) uses the same approach here. Can you confirm that they too see the vnode changed messages?

@rfjakob
Copy link
Contributor Author

rfjakob commented Mar 14, 2018

I'll try to find out.

Even with simpleHandleMap I see (different) osxfuse messages (something about disappearing nodes), but I don't see how we can code this even more defensively to keep osxfuse happy

@rfjakob
Copy link
Contributor Author

rfjakob commented Mar 14, 2018

I could not reproduce the problem the bazil-based keybase/loopback. Still getting lots of

kernel: (osxfuse) osxfuse: disappearing vnode <private> (name=asm type=2 action=80000002)

but no "vnode changed generation" and no errors returned to user-space.

Also, I have a user report that simpleHandleMap makes the problem appear less often but does not fix it, so this seems to be a dead end.

@rfjakob rfjakob closed this Mar 14, 2018
@rfjakob
Copy link
Contributor Author

rfjakob commented Mar 16, 2018

Plot twist: I had the user re-test the change, and it does fix the problem for him:

Ouch, my bad! I was still using the executable installed by homebrew. I copied 20GB and used the mount all day for all sorts of stuff and I do not have any issues anymore!

This matches what I see on the Mac box I managed to get my hands on.

While it is curious why it does not happen for bazil, MacOS kernel module development seems to be a world of pain - I gave up when I had to edit an sqlite file in recovery mode to get my self-compiled osxfuse.kext to load.

I suggest to merge this workaround that fixes the problem with no obvious downsides.

@rfjakob rfjakob reopened this Mar 16, 2018
@hanwen
Copy link
Owner

hanwen commented Mar 16, 2018

see

m.generation++
here

we get the generation number from the handleMap, and we increase it when we hand out a new node, but the effect is persisted, ie. in the sequence

lookup -> new node A, gen = 1
lookup -> new node B, gen = 2
lookup -> existing node A, gen = 2

A will change generation numbers.

Rather than papering the problem over without us understanding what is going on, can we fix the real bug here?

@rfjakob
Copy link
Contributor Author

rfjakob commented Mar 16, 2018

Oh, now I see it. I'll prepare a patch.

@rfjakob
Copy link
Contributor Author

rfjakob commented Mar 16, 2018

And bazil does not have this bug because it returns the saved generation number:

https://github.com/bazil/fuse/blob/371fbbdaa8987b715bdd21d6adc4c9b20155f748/fs/serve.go#L489

	if id, ok := c.nodeRef[node]; ok {
		sn := c.node[id]
		sn.refs++
		return id, sn.generation
	}

rfjakob added a commit to rfjakob/go-fuse that referenced this pull request Mar 16, 2018
We used to hand out a new generation number even
for already-known handles. This does not seem to
cause problems on Linux, but osxfuse throws errors
to userspace with

	osxfuse: vnode changed generation

showing up in the kernel log.

Introduce a per-handle generation number that stays
constant until the handle is forgotten.

Fixes hanwen#204
See also hanwen#205
rfjakob added a commit to rfjakob/go-fuse that referenced this pull request Mar 16, 2018
We used to hand out a new generation number even
for already-known handles. This does not seem to
cause problems on Linux, but osxfuse throws errors
to userspace with

	osxfuse: vnode changed generation

showing up in the kernel log.

Introduce a per-handle generation number that stays
constant until the handle is forgotten.

Tested on Linux and MacOS via gocryptfs.

Fixes hanwen#204
See also hanwen#205
rfjakob added a commit to rfjakob/go-fuse that referenced this pull request Mar 16, 2018
We used to hand out a new generation number even
for already-known handles. This does not seem to
cause problems on Linux, but osxfuse throws errors
to userspace with

	osxfuse: vnode changed generation

showing up in the kernel log.

Introduce a per-handle generation number that stays
constant until the handle is forgotten.

Tested on Linux and MacOS via gocryptfs.

Fixes hanwen#204
See also hanwen#205
@rfjakob rfjakob closed this Mar 16, 2018
hanwen pushed a commit that referenced this pull request Mar 18, 2018
We used to hand out a new generation number even
for already-known handles. This does not seem to
cause problems on Linux, but osxfuse throws errors
to userspace with

        osxfuse: vnode changed generation

showing up in the kernel log.

Introduce a per-handle generation number that stays
constant until the handle is forgotten.

Tested on Linux and MacOS via gocryptfs. Add test for handle map
behavior.

Fixes #204
See also #205
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants