You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I hit this while debugging my biorq patch, I think it may be a bug in the code already:
[1355953451:002213 msfsthr20:7fff13fff700:def main.c mslfsop_write 2316] fid: 0x048c00000005969e, size from 2560 to 6062
[1355953451:002226 msfsthr20:7fff13fff700:fcmh main.c mslfsop_write 2338] fcmh@0x8b2820 f+g=0x048c00000005969e:0 flg=0x63104:BAS+ ref=4 sz=6062 bsz=262144 mode=0100664 : write: buf=0x7fffec6e9060 rc=0 sz=3502 off=2560
[1355953451:004472 mseqpollthr:7fff8effd700:rpc events.c pscrpc_reply_in_callback 281] req@0x7ffed8001570 x196539/t0 cb=(nil) c0 o38->@54321-128.182.99.29@tcp10:10 lens 304/304 ref 1 res 0 ret 0 fl Rpc:/0/0 replyc 0 rc 0/0 to=60 sent=1355953450 :: type 1, status 0 initiator ;54321-128.182.99.29@tcp10;
[1355953451:004513 msfsthr13:7ffef17fa700:rpc rpcclient.c pscrpc_check_reply 595] req@0x7ffed8001570 x196539/t0 cb=(nil) c0 o38->@54321-128.182.99.29@tcp10:10 lens 304/304 ref 1 res 0 ret 0 fl Rpc:R/0/0 replyc 0 rc 0/0 to=60 sent=1355953450 :: REPLIED:
[1355953451:004599 msfsthr13:7ffef17fa700:rpc rpcclient.c pscrpc_check_reply 623] req@0x7ffed8001570 x196539/t0 cb=(nil) c0 o38->@54321-128.182.99.29@tcp10:10 lens 304/304 ref 1 res 0 ret 0 fl Rpc:R/0/0 replyc 0 rc 0/0 to=60 sent=1355953450 :: rc=1
[1355953451:004620 msfsthr13:7ffef17fa700:rpc rpcclient.c pscrpc_queue_wait 805] req@0x7ffed8001570 x196539/t0 cb=(nil) c0 o38->@54321-128.182.99.29@tcp10:10 lens 304/304 ref 1 res 0 ret 0 fl Rpc:R/0/0 replyc 0 rc 0/0 to=60 sent=1355953450 :: completed (rc=0), replied=1
[1355953451:004641 msfsthr13:7ffef17fa700:rpc rpcclient.c pscrpc_unregister_reply 637] req@0x7ffed8001570 x196539/t0 cb=(nil) c0 o38->@54321-128.182.99.29@tcp10:10 lens 304/304 ref 1 res 0 ret 0 fl Rpc:R/0/0 replyc 0 rc 0/0 to=60 sent=1355953450 :: receving_reply=0
[1355953451:004674 msfsthr13:7ffef17fa700:def main.c mslfsop_setattr 2201] sstb@0x8b2820 fg:0x048c00000005969e:0 dev:0 mode:0100664 nlink:1 uid:1000 gid:1000 rdev:0 sz:2560 blksz:2728 blkcnt:0 atime:1355953452:689921000 mtime:1355771694:0 ctime:1355953452:702393000 fcmh 0x8b2820 post setattr
[1355953451:004703 msfsthr13:7ffef17fa700:def fuse.c pscfs_reply_setattr 1314] returned item 0x11e1a90 to pool fsrq
[1355953451:004719 msfsthr13:7ffef17fa700:rpc niobuf.c _pscrpc_free_req 698] req@0x7ffed8001570 x196539/t0 cb=(nil) c0 o38->@54321-128.182.99.29@tcp10:10 lens 304/304 ref 0 res 0 ret 0 fl Interpret:R/0/0 replyc 0 rc 0/0 to=60 sent=1355953450 :: freeing (rqcomp_compcnt=-999)
[1355953451:004738 msfsthr13:7ffef17fa700:def main.c mslfsop_setattr 2245] csvc@0x7fff30000d10 fl=0x2:O ref:0 decref
[1355953451:004888 msfsthr20:7fff13fff700:def fuse.c pscfs_fuse_handle_flush 772] got item 0x11e1a40 from pool fsrq
[1355953451:004916 msfsthr20:7fff13fff700:fcmh main.c mslfsop_flush 1497] fcmh@0x8b2820 f+g=0x048c00000005969e:0 flg=0x62104:BA+ ref=3 sz=2560 bsz=2728 mode=0100664 : flushing (mfh=0x7ffee4001e90)
[1355953451:004938 msfsthr20:7fff13fff700:fcmh main.c mslfsop_flush 1503] fcmh@0x8b2820 f+g=0x048c00000005969e:0 flg=0x62104:BA+ ref=3 sz=2560 bsz=2728 mode=0100664 : done flushing (mfh=0x7ffee4001e90)
On 2012-12-20 12:37:29 -0500, Zhihui Zhang wrote:
post setattr -----> sz:2560
Before that: size from 2560 to 6062
Maybe there are some callback intween not captured by the trace
Anyway, I beef up the trace so we can find more evidence next time.
Right now I suspect MDS.
msfsthr13: a setattr gets in while the write is still in progress. I suspect it uses the premature size value to set it.
On 2013-08-05 12:57:25 -0400, Jared Yanovich wrote:
If an application is by nature racing a nonzero truncate(2) with write(2) activity, this may be possible. I think such behavior would be undefined according to POSIX.
Since even mslfsop_setattr() does this:
fcmh_setattrf(c, &mp->attr, FCMH_SETATTRF_SAVELOCAL |
FCMH_SETATTRF_HAVELOCK);
Meaning local values will be preserved.
Further, we probably need "pre setattr" and "post setattr" psclogs to figure out when the SETATTR gets issued.
My suspicion is that something is doing SETSIZE unbeknownst to us that is wiping out the SAVELOCAL. We specifically prohibit nonzero truncate(2) right now so it cannot be an application truncate(2) call. This should be easy to trace if I have the steps to reproduce it.
The text was updated successfully, but these errors were encountered:
On 2012-12-20 12:35:00 -0500, Zhihui Zhang wrote:
The text was updated successfully, but these errors were encountered: