disk cache #82
Ideally I want to see a generic FUSE cache. Implementing all this caching each time someone writes a FUSE filesystem doesn't make a lot of sense to me. I am not aware of anyone trying a caching HTTP proxy; it may work if you make it transparent at the network level. Alternatively, you could write a middleware for https://github.com/andrewgaul/s3proxy that does caching.
Could you describe the caching use cases a bit?
High-read cases where you don't want or need to re-fetch from S3. In my case I use s3fs with … I agree that a generic cache at the FUSE level is the correct place to solve this, but I don't know how feasible that is.
My use case is the same. I'm currently using s3fs and depend on cached reads being fast. One set of servers does read and write operations on the filesystem (video editing), but the most important set of servers, which do strictly video rendering, mount S3 read-only and only read from the filesystem. These really need the cache. I agree that a generalized caching solution would be neat, but if that involves backwards-incompatible changes to the kernel's FUSE interface, it is years down the road, if ever. Come to think of it, maybe it is doable to create a new caching FUSE filesystem that wraps goofys?
And you'd be reading from the same videos many times a day when you render? How fast do you need this to be? How big are these files? Do you read the entire files end to end? How much do you care about cache invalidation? For writes, do you write through or write back? A caching FUSE fs is definitely doable (with the caveat of an extra memory copy); goofys could have some built-in support to mount the additional layer. If only …
To be exact, we render videos from basically a definition file and accompanying files (image files, movie files, fonts, tracking files, etc.). In 95% of cases we read the entire file; it's rare that we actually seek in the files. We need some kind of cache invalidation. Currently we're fine with the configurable metadata cache in s3fs; I think it's something like 5 or 30 seconds after which s3fs will recheck with S3 whether the file/object has changed. I don't understand what write-through vs. write-back would mean in this context. I would be fine with the write ending up in S3 only after the file has been closed, but the file should then be uploaded to S3 immediately.
When you say you need a cache, how much speedup is it providing? I.e., what's the read performance for goofys (without cache) vs. s3fs (with cache)?
I'll try to come up with some numbers in the next few days.
Ok, this was interesting. I did some measurements on one of our production servers.
s3fs
Size of data:
s3fs uncached read:
then, cached read:
trying goofys now
first run
subsequent runs
aha, goofys works with the filesystem cache
not the latest Linux, by any means:
Reading from a c3.2xlarge EC2 instance in eu-west-1 from a bucket in the same region.
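For anyone repeating this comparison, here is a minimal sketch of how one might separate cold reads from page-cache hits; the mountpoint and file name are placeholders:
```sh
# First read is cold (data comes over the FUSE mount); the second can be
# served from the kernel page cache if the filesystem allows caching.
time cat /mnt/mybucket/clip.mp4 > /dev/null
time cat /mnt/mybucket/clip.mp4 > /dev/null

# Drop cached pages, dentries and inodes to force another cold read.
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time cat /mnt/mybucket/clip.mp4 > /dev/null
```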
Unfortunately, the filesystem cache does not do the job for us. The applications on the host use almost all the available memory for their own caches, and we need reads to be fast even (and especially) if the data has not been accessed for minutes or hours. If the data is "hot", we have it in our own memory cache anyway. Funny to see that goofys works with the Linux buffer cache while s3fs does not.
I suspect this is just a case of metadata operations taking longer in s3fs.
So I've implemented a generic caching filesystem in FUSE: https://github.com/kahing/catfs/. Happy to hear feedback from people who try to use it with goofys!
@kahing Thanks for adding caching! Any ideas how to pass the --uid= and --gid= flags on to catfs? For some reason, when --cache is used from fstab, the uid and gid are not respected: I'm getting a "Permission denied" error when listing objects using 'ls -lah' (but perhaps it's my own ignorance). With 'sudo' the listing works. Also, the additional catfs flags are unclear to me in an fstab context. I have translated your example --cache "--free:10%:$HOME/cache" as follows: … (which doesn't throw an error, but the mount is unsuccessful). When omitting the '--free:10%:' flag, as in: …, all is well.
You probably need to plumb -o allow_other through to catfs, something like:
uid and gid shouldn't be necessary for catfs in this context.
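As a rough sketch of what that can look like on the command line (the bucket name, cache directory, and mountpoint below are placeholders, and the colon-separated --cache syntax follows the examples quoted in this thread):
```sh
# Pass allow_other both to goofys itself and to the catfs layer it spawns;
# the last colon-separated element is the cache directory, as in the
# --free example above.
goofys -o allow_other \
       --cache "-oallow_other:--free:10%:/var/cache/goofys" \
       mybucket /mnt/mybucket
```
Translating this into an fstab entry depends on how mount options get quoted on your system, so it may be easier to verify the flags on the command line first.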
hi @kahing,
and here is the command that I used:
@kahidna could you try to enable RUST_LOG=debug and see what catfs outputs?
@kahing if I run catfs after mounting the bucket (with or without --cache), I get this error message:
and if I run catfs first, then mount the bucket, here is the output:
and here is the goofys log:
mounting works fine, but the folder …
If you want to mount catfs over the same goofys mountpoint, you need to use -ononempty like the error message suggested.
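For illustration, a manual invocation could look roughly like this, assuming catfs takes the source directory, cache directory, and mountpoint in that order (paths are placeholders; check catfs --help for the exact syntax):
```sh
# goofys is already mounted at /mnt/mybucket; overlay catfs on the same path.
# -o nonempty is needed because the mountpoint is not empty.
catfs -o nonempty /mnt/mybucket /var/cache/catfs /mnt/mybucket
```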
Further to the above, I'm getting the following catfs exit message:
Using https://github.com/kahing/catfs/releases/download/v0.4.0/catfs
Any ideas?
As I suggested above, could you set RUST_LOG=debug to see why catfs is exiting?
If you are launching catfs with goofys, you don't need to explicitly set -ononempty. Also, there's an extra space in "-oallow_other:-ononempty:--free:10%: /tmp/mount/point".
@kahing my bad... the extra space was a typo (it's not there in my script). I tried enabling RUST_LOG=debug as follows:
And yes, I'm launching catfs with goofys. Any ideas why catfs exited?
sudo is probably masking RUST_LOG; could you add …
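One way to make sure the variable survives sudo's environment reset is to set it on the sudo side of the command; the bucket, cache directory, and mountpoint below are placeholders, and catfs should inherit the variable when goofys launches it:
```sh
# sudo normally resets the environment, so pass RUST_LOG through env.
sudo env RUST_LOG=debug goofys --cache "--free:10%:/var/cache/goofys" \
    mybucket /mnt/mybucket
```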
@valentijnvenus @kahidna should be fixed, thanks for reporting!
…82) Both in S3 and Azure, it is possible that when we call listPrefix with limit=N, we get a result whose size is smaller than N but that still has a continuation token. This behaviour does not hurt when we are listing the FUSE files under a directory, but when doing directory checks, i.e. 1) testing whether a given path is a directory, or 2) testing whether a given directory is empty, it can make goofys wrongly think a directory is empty or that a given prefix is not a directory. Add a wrapper in list.go that does this: if the backend returns fewer items than requested and has a continuation token, it uses the continuation token to fetch more items.
You write that goofys does not have a disk cache. Is that a design decision, or has it just not been implemented yet? Would you possibly accept a patch that implements it?
I'm asking because we are currently using s3fs, but we have run into stability problems and would perhaps switch to an alternative. Some kind of cache is an absolute requirement, though.
Have you thought about goofys => caching HTTP proxy => S3 as an alternative to a disk cache? Has this been tried by anybody yet?