Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Null-terminated PosixString ByteArray# for performance #168

Open
hasufell opened this issue Sep 28, 2022 · 2 comments
Open

Null-terminated PosixString ByteArray# for performance #168

hasufell opened this issue Sep 28, 2022 · 2 comments

Comments

@hasufell
Copy link
Member

In GitLab by @alt-romes on Sep 29, 2022, 24:12

Hi!

This is issue is meant to discuss some performance tuning and ideas.

I was recently trying to write a programming using directory and filepath for filesystem manipulation.
Something initially in the spirit of du from coreutils.

The problem is the performance is not great.

In writing a recursive directory traversal I must check constantly whether a path is a directory or not to know whether to recurse into it. The function doesDirectoryExist takes 80% of runtime -- which might be right due to having to read the metadata.

I attempted to write a faster fastIsDir by addressing benchmark results.

Here's an attempt:

fastIsDir :: OsPath -> IO Bool
fastIsDir (getPosixString . getOsString -> SBS path) = do
  fp <- mallocForeignPtrBytes (144)
  withForeignPtr fp $ \p -> useAsCString (SBS path) $ \s -> c_stat s p
  return (fileTypeIsDirectory $ fileTypeFromMetadata $ FileStatus fp)

This version ignores errno to go faster, but it clearly shows the use of useAsCString, which copies O(n) the path into a null-terminated bytearray. Since I want to run this on all the files in the filesystem, I think the cost has some impact.

Now, I think that c_stat doesn't need a copy of the string (it's read only, right?), so ideally we would pass path directly to c_stat:

fastIsDir :: OsPath -> IO Bool
fastIsDir (getPosixString . getOsString -> SBS path) = do
  fp <- mallocForeignPtrBytes (144)
  let ptr = Ptr (byteArrayContents# path)
  withForeignPtr fp $ \p -> c_stat ptr p
  return (fileTypeIsDirectory $ fileTypeFromMetadata $ FileStatus fp)

The issue is that path, I believe, is not null-terminated, which makes this code wrong. The only way I see to add a NULL at the end of the string is copying it into a length+1 array and terminate it manually (which is what useAsCString does).

What I want to discuss in this issue is:

Is it possible, since PosixString is opaque up to Internal, to have PosixString be under the hood a null-terminated ByteArray# such that libraries like directory can directly pass it to read-only functions without the need for copying it into the stack to add a null terminator?

I might be able to attempt an implementation, given some pointers.

Thanks,
Romes

@hasufell
Copy link
Member Author

In GitLab by @maerwald on Oct 28, 2022, 10:41

I'm not involved in directory package and try to avoid it usually.

You might be interested in:

Wrt filepaths itself: if you read thousands of files via pinned memory, you will possibly cause very high memory fragmentation. The point of the new type is to avoid memory fragmentation.

See:

  1. https://hasufell.github.io/posts/2022-06-29-fixing-haskell-filepaths.html
  2. https://github.com/hasufell/filepath-debug/blob/master/result.txt

Adding null termination ad-hoc doesn't seem like a good idea for such a specific use case. It will complicate all the code.

I'm also not convinced that you will have better performance with pinned ByteString. Do you have proof?

@hasufell
Copy link
Member Author

hasufell commented Nov 3, 2022

In GitLab by @alt-romes on Nov 3, 2022, 20:31

Okay, I'm in agreement, and indeed, it is a very specific use case. The links you provided are helpful and might be a better alternative.

I don't have concrete proof, I thought it might be a cool idea and hence the issue for discussion.

I think meanwhile we can close this and if I take the chance to think about this again I can reopen it.

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant