Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: Trace file shows Source tree scan stage takes 38+ seconds #10880

Closed
jchavarri opened this issue Sep 4, 2024 · 23 comments · Fixed by #10884
Closed

Performance: Trace file shows Source tree scan stage takes 38+ seconds #10880

jchavarri opened this issue Sep 4, 2024 · 23 comments · Fixed by #10884

Comments

@jchavarri
Copy link
Collaborator

We continue trying to figure out in which ways we could optimize our Dune build setup at Ahrefs.

One of the things that quickly stands out is that when enabling --trace-file, we can see that the "Source tree scan" step of folder . takes more than 38s.

One of the instances of the Source tree scan events shows the folder backend/api/src, which takes ~37s. But after that event, there's only very short lived events which don't add much information.

Is there any way in which we could modify Dune to get more granular information about where the time is being spent? For example, if would be interesting to know if this time is spent parsing rules (of which dune files), executing commands that generate sexp files, etc. If those things is something that can happen during the Source tree scan stage, of course.

Please find below the trace file (obtained with dune build @fe.all @aa --trace-file fe.all-aa.txt) and a screenshot loading the trace in Brave (Chrome-like browser):

dune-build-fe.all-aa-trace.txt

image

Specifications

  • Version of dune: 3.15.0
  • Version of ocaml: 4.14.0
  • Operating system (distribution and version): Debian 12, kernel 6.5.0
@ejgallego
Copy link
Collaborator

@jchavarri , are your sources in this case stored on some kind of networked file system?

@jchavarri
Copy link
Collaborator Author

No, they are local. The only non-conventional thing of this setup is that the build is running in a remote machine that I access through ssh.

@ejgallego
Copy link
Collaborator

I see, thanks. stracing dune could also provide some interesting information on where the time is being spent.

@jchavarri
Copy link
Collaborator Author

I can share strace report, but looking at how time is being spent:

$ time dune build @fe.all @aa
                                               
real    0m40.724s
user    0m35.698s
sys     0m4.171s

I doubt it will give us a lot of insights. It seems time is mostly spent on user CPU- or memory-bound tasks.

@nojb
Copy link
Collaborator

nojb commented Sep 4, 2024

From memory I remember that the "Source tree scan" event actually englobes the whole build; ie the actual build commands are sub-events of "Source tree scan", so the 38+ s that you are observing may just be the build doing its thing.

Does it make sense?

@jchavarri
Copy link
Collaborator Author

@ejgallego Here's the output of strace, shows very small times as expected from previous time command:

$ strace -c dune build @fe.all @aa
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 27.59    0.239930           1    224380       411 newfstatat
 16.07    0.139802         198       705           write
 13.46    0.117093           0    131842    131841 mkdir
 13.17    0.114519           1     94422     40419 openat
  9.05    0.078707           0    135781         4 lseek
  8.64    0.075177           0     84601           read
  5.81    0.050530           3     16614           getdents64
  5.31    0.046152           0     54011           close
  0.48    0.004218           1      3188       284 futex
  0.21    0.001869         233         8           munmap
  0.16    0.001396           0      2864           rt_sigprocmask
  0.02    0.000152           2        57           mmap
  0.01    0.000110           1        77           brk
  0.00    0.000029          29         1           readlink
  0.00    0.000019           1        12           mprotect
  0.00    0.000014           3         4           vfork
  0.00    0.000011           1         6           unlink
  0.00    0.000002           0         5           rt_sigaction
  0.00    0.000002           0         3           ioctl
  0.00    0.000002           1         2           sigaltstack
  0.00    0.000002           2         1           prlimit64
  0.00    0.000002           2         1           getrandom
  0.00    0.000001           0         6           pipe2
  0.00    0.000000           0         2           pread64
  0.00    0.000000           0         1         1 access
  0.00    0.000000           0         2           getpid
  0.00    0.000000           0         2           socket
  0.00    0.000000           0         2         2 connect
  0.00    0.000000           0         1           execve
  0.00    0.000000           0         1           wait4
  0.00    0.000000           0         2           fcntl
  0.00    0.000000           0         1           flock
  0.00    0.000000           0         5           getcwd
  0.00    0.000000           0         1           chdir
  0.00    0.000000           0         2         1 rmdir
  0.00    0.000000           0         1           chmod
  0.00    0.000000           0         1           arch_prctl
  0.00    0.000000           0         1           set_tid_address
  0.00    0.000000           0         1           set_robust_list
  0.00    0.000000           0         1           rseq
  0.00    0.000000           0         8           clone3
------ ----------- ----------- --------- --------- ----------------
100.00    0.869739           1    748626    172963 total

ie the actual build commands are sub-events of "Source tree scan"

@nojb This is a "noop" build (all build commands already happened in previous runs of dune build, and Dune doesn't have any rules that need executing). I suspect that Source tree scan includes both the actual source tree scan and the rules generation phase. Is that what you mean?

(thanks both for your input!)

@nojb
Copy link
Collaborator

nojb commented Sep 4, 2024

I suspect that Source tree scan includes both the actual source tree scan and the rules generation phase. Is that what you mean?

Yes, in fact Source tree scan includes pretty much everything under it (if memory serves).

@rgrinberg
Copy link
Member

It does not generate the rules, but it does initialize all the projects, constructs the scopes, and parses every dune file.

@rgrinberg
Copy link
Member

Some perf output would be useful here

@jchavarri
Copy link
Collaborator Author

Here it is:

dune_build_feall_aa.perf.zip

@jchavarri
Copy link
Collaborator Author

it does initialize all the projects, constructs the scopes, and parses every dune file

Maybe the result of this issue could be to rename this stage to "Project initialization, scope construction, dune parsing" instead of "Source tree scan". Does that make sense?

@rgrinberg
Copy link
Member

It would be more accurate. I can't read your perf file unfortunately

@jchavarri
Copy link
Collaborator Author

Try loading it in https://profiler.firefox.com/

@rgrinberg
Copy link
Member

Thanks, that works. I do see a lot of rule generation in that trace. Are you using dynamic includes perhaps?

@jchavarri
Copy link
Collaborator Author

We use atd a lot, so we have a bunch of .sexp files with Dune rules, I can see a few of them with 900+ rules, a couple of them with 1100 rules. In total, there are almost 5000 rules in these files:

$ find backend -type f -name "*.sexp" -exec grep -o "(rule" {} + | wc -l
4887

Do you think this could be causing trouble? I can look into optimizing these to try to reduce the number of rules.

@jchavarri
Copy link
Collaborator Author

I do see a lot of rule generation in that trace.

Side note: curious what kind of things you look at on the Firefox profiler to reach this kind of conclusion. What kind of views do you use? And what led you realize rule generation could be the issue? Thanks!

@jchavarri
Copy link
Collaborator Author

jchavarri commented Sep 4, 2024

Edit: ignore this comment, I was measuring wrongly 😕

I created some synthetic dune files to compare times between different configs at scale.

One interesting realization is that using write-file is about 2x slower than using with-stdout-to + echo. In my case, a zero-ops builds using a dune file with 50k rules of this kind takes 10s for the former approach vs 5s with the latter.

Here are python scripts to generate these files:

with open("dune", "w") as dune_file:
    for i in range(1, 50001):
        dune_file.write("""
(rule
 (targets {0}.txt)
 (action (write-file {0}.txt "This is file {0}")))
""".format(i))

print("dune file generated with 50000 rules.")
with open("dune", "w") as dune_file:
    for i in range(1, 50001):
        dune_file.write("""
(rule
 (targets {0}.txt)
 (action (with-stdout-to {0}.txt (echo "This is file {0}"))))
""".format(i))

print("dune file generated with 50000 rules.")

@rgrinberg
Copy link
Member

I saw the load_rules function in the trace which invokes the various rule generation callbacks.

Even though your measurement is incorrect, we have in fact sped up reading and writing of files and that let to a noticeable speed up in dune internally. Don't get your hopes too high though, I'd imagine this will shave off a few seconds - not much more.

@jchavarri
Copy link
Collaborator Author

In case it helps, I managed to get a profile including src lines for the function calls to the actual Dune code. For backtracking these, the commit used is a054882.

Unfortunately Firefox Profiler doesn't support srclines, it has to be opened with https://www.speedscope.app/. Make sure to choose Left Heavy view and thread id (tid) 245288.

dune_build_feall_aa_srclines.perf.zip

image

@rgrinberg
Copy link
Member

Thanks for the updated trace. It mostly points to things that we've already known to be problematic. We've fixed them in our fork, but the truth is that that they might make your build at most 2x faster. So in total, we're looking at 19 seconds instead of 38. I doubt that will make you happy. Of course more optimizations are possible, and you're welcome to try to implement them. There's not much low hanging fruit though, so it will take real engineering work to shave off more seconds beyond that.

Feel free to open tickets with traces for more concrete bottlenecks that you've detected. This issue is too general to be actionable. Source scanning isn't slow, it's just doing a bunch of work in your specific build. I don't have any concrete ideas on how to make it faster or delay some of the work. Suggestions are very much welcome.

@nojb
Copy link
Collaborator

nojb commented Sep 5, 2024

we have in fact sped up reading and writing of files and that let to a noticeable speed up in dune internally.

We've fixed them in our fork,

@rgrinberg out of curiosity, whose fork is this? Jane Street? Tarides? Are these improvements going to be merged back upstream eventually? Thanks!

@rgrinberg
Copy link
Member

JS. We're planning to merge them bit by bit starting this year.

@jchavarri
Copy link
Collaborator Author

So in total, we're looking at 19 seconds instead of 38. I doubt that will make you happy.

It definitely does make me happy :) That would be a great improvement. I think if we combine that with some specific internal actions as well (mostly simplify the amount of explicit rule stanzas we generate) it'd be great if we can reach times ~10s eventually.

Looking forward to test any of the improvements as they get merged upstream. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants