Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fseek() shouldn't try to seek to the same file position #58

Open
mikrosk opened this issue Aug 13, 2023 · 4 comments
Open

fseek() shouldn't try to seek to the same file position #58

mikrosk opened this issue Aug 13, 2023 · 4 comments

Comments

@mikrosk
Copy link
Member

mikrosk commented Aug 13, 2023

I have found out that apparently some apps rely on the fact that code like fseek(f, 0, SEEK_CUR); and even fseek(f, <current_offset>, SEEK_SET); doesn't take any cycles.

In fseek I was able to hack and verify an early return of SEEK_CUR but handling SEEK_SET seems to be above my paycheck. I'd assume that comparing offset and stream->__offset should be enough but apparently isn't as the performance hit was still present.

@th-otto
Copy link
Contributor

th-otto commented Aug 14, 2023

What hack do you mean? Even fseek(f, 0, SEEK_CUR) has to do some work, like resetting the eof condition.

@mikrosk
Copy link
Member Author

mikrosk commented Aug 14, 2023

The use case is simple: there is a code which reads byte-by-byte a 16 KB stream with fseek(f, <current_offset>, SEEK_SET) after every read. That makes the reading crippled as hell (3m30s vs. 5s when fseek() is omitted) while nothing even remotely similar can be observed on Linux.

So it's possible that Linux/glibc does set the EOF condition but I'm sure that it doesn't access disk. (and yes, my hack was just to early return without setting anything so perhaps wasn't correct anyway).

@th-otto
Copy link
Contributor

th-otto commented Aug 14, 2023

Then i would say fix the "a code", rather than breaking fseek for all others. I've never seen such nonsense of seeking for any byte.

Looking at our implementation of fseek, all this needs to be done. Only thing that might be worth to take a look at the call to __stdio_check_offset, but even that seems IMHO be needed, if the application did a lseek() on the file-descriptor.

@mikrosk
Copy link
Member Author

mikrosk commented Aug 14, 2023

Actually the use case isn't total rubbish: imagine you have a file which contains a packed set of other files. And you want to access this file using one stream (i.e. not always open, seek, read and close). So what do you do is that you keep that file open and seek to the right position every time you need to read that "subfile".

Of course, the proper implementation is to read the "subfile" (i.e. seek & read only once) but that is not always possible (imagine a huge archive which consists of other huge "subfiles"). In such case you have to first make sure that the first read is properly seek'ed and then the subsequent reads aren't.

Which is much more complex than a simple readByte() function which wraps seek & read and always works.

I'm not saying they shouldn't make the effort to do it properly, just showing that this is real production code and while I was able to fix this particular case, there may be others slowing things down on our platform so it's worth fixing in mintlib, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants