-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster $driver$get_hash() for the RDS driver #96
Comments
Hmm... the speed differences from https://gist.github.com/hadley/6353939 do seem really nice at first glance. path <- file.path(R.home("doc"), "COPYING")
size <- file.info(path)$size
microbenchmark::microbenchmark(
old = readLines(path),
new = readChar(path, size, useBytes = TRUE)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> old 224.137 226.2785 243.32306 232.1250 255.6135 302.260 100 b
#> new 61.374 64.3230 70.43012 67.8185 72.3140 124.196 100 a However, at least for writeLines(digest::digest(NA, "md5"), "key")
path <- "key"
microbenchmark::microbenchmark(
old = readLines(path),
new = readChar(path, nchars = 32, useBytes = FALSE)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> old 32.795 35.1310 37.86379 36.4545 37.7230 59.820 100 b
#> new 24.267 25.8445 28.56502 26.5890 27.6045 171.191 100 a I will still open a PR in case you think it is worth the change. |
The first C++ solution from https://gist.github.com/hadley/6353939 is much faster. writeLines(digest::digest(NA, "md5"), "key")
path <- "key"
Rcpp::sourceCpp("~/read-file.cpp")
microbenchmark::microbenchmark(
old = readLines(path),
new = readChar(path, nchars = 32, useBytes = FALSE),
cpp1 = read_file_cpp1(path),
cpp2 = read_file_cpp1(path)
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> old 31.072 33.9355 36.03173 35.1105 36.6300 112.651 100 a
#> new 22.445 23.9945 25.72207 25.5470 26.4130 44.078 100 a
#> cpp1 6.665 7.1135 9.42972 7.8900 11.1230 42.210 100 a
#> cpp2 6.679 7.3545 139.04901 10.5105 11.8915 12931.184 100 a |
One thing I wondered: is it possible to go any faster if we drop even farther down to pure C? Overly optimistic C function: // read.c
void readkey_c() {
char buf[32];
FILE *fp;
fp = fopen("key", "rb");
fread(buf, 32, 1, fp);
// printf("%s", buf);
fclose(fp);
return;
} Benchmarks: writeLines(digest::digest(NA, "md5"), "key")
path <- "key"
Rcpp::sourceCpp("~/read-file.cpp")
dyn.load("~/read.so") # R CMD SHLIB read.c -i read.so
microbenchmark::microbenchmark(
old = readLines(path),
new = readChar(path, nchars = 32, useBytes = FALSE),
cpp1 = read_file_cpp1(path),
cpp2 = read_file_cpp1(path),
c = .C("readkey_c")
)
#> Unit: microseconds
#> expr min lq mean median uq max neval cld
#> old 35.464 38.750 43.38168 40.7825 43.3590 129.943 100 a
#> new 25.399 29.286 31.87871 30.7565 31.9520 56.339 100 a
#> cpp1 7.128 8.148 161.75054 11.0595 12.4700 15076.276 100 a
#> cpp2 7.196 8.290 11.45663 10.9160 12.4450 81.264 100 a
#> c 6.968 9.398 10.89305 10.4880 11.1485 32.054 100 a Created on 2019-01-08 by the reprex package (v0.2.1) The performance bump from dropping down to C is negligible. I think the best we can do is this: #include <fstream>
#include <sstream>
#include <string>
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
CharacterVector read_file_cpp1(std::string path) {
std::ifstream t(path.c_str());
std::stringstream ss;
ss << t.rdbuf();
return ss.str();
} |
Fixed in #98 |
I am trying to speed up
drake
(ref: ropensci/drake#647, ropensci/drake#655). Note whereself_hash()
appears in the middle of this flame graph. Candidate alternatives toreadLines()
: https://gist.github.com/hadley/6353939.Would you be open to a PR that dropped down into C?
The text was updated successfully, but these errors were encountered: