Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Profile non-cpu time? #72

Closed
javierluraschi opened this issue Nov 8, 2016 · 7 comments · Fixed by #154
Closed

Profile non-cpu time? #72

javierluraschi opened this issue Nov 8, 2016 · 7 comments · Fixed by #154

Comments

@javierluraschi
Copy link
Contributor

Feature Request: It would be really helpful to support a mode where CPU and non-CPU time is measured.

Details: From Rprof: "How time is measured varies by platform: on a Unix-alike it is the CPU time of the R process, so for example excludes time when R is waiting for input or for processes run by system to return."

Therefore, consider:

system.time(download.file(url = "https://raw.githubusercontent.com/rstudio/sparklyr/master/inst/extdata/install_spark.csv", destfile = tempfile()))

with output:

trying URL 'https://raw.githubusercontent.com/rstudio/sparklyr/master/inst/extdata/install_spark.csv'
Content type 'text/plain; charset=utf-8' length 2534 bytes
==================================================
downloaded 2534 bytes

   user  system elapsed 
  0.017   0.004   0.423 

However, profiling:

profvis::profvis(download.file(url = "https://raw.githubusercontent.com/rstudio/sparklyr/master/inst/extdata/install_spark.csv", destfile = tempfile()))

yields, 20ms:

screen shot 2016-11-08 at 11 30 22 am

while this is correct from profvis use of Rprof, R is often used to proxy to other processes that users would want to be able to profile as well. Including an option to include the whole execution time without excluding IO calls, would be really helpful. That said, this probably requires a change in Rprof and R itself. Never the less, opening this feature request to discuss this further.

@wch
Copy link
Member

wch commented Nov 10, 2016

Yes, that would be really useful. The same thing happens if you call Sys.sleep() -- the time spent in the system call isn't recorded in the profiler output. If my memory serves, we discussed this a while back and concluded that the R profiler would need to be modified include timestamp information in the output.

If the data does include timestamps for each sample, we would need to figure out how to reconcile it with the assumption that each sample is the exact same interval. For example, if the sampling interval is 20ms, we currently assume that each sample is exactly 20ms apart; however, the timestamps (if taken from the system clock) could have different intervals like 23, 30, or even 1000ms, and we would need to decide how to display those. I suppose that if timestamp information is present in the data, the flamegraph could use those intervals instead of fixed 20ms intervals, although I think this would require some nontrivial work because the code assumes that the intervals are all the same.

@lionel-
Copy link
Member

lionel- commented Dec 7, 2020

Including timestamps in the output would not help with this problem because the profiler would still sample in CPU intervals. This means that if you have:

{
  sys.sleep(1)
  f()
}

We would incorrectly infer from the timestamps that f() takes a long time. See #96 for a similar problem.

To avoid this we need to sample the stacks at real-time intervals rather than CPU-time. Doing so is a simple change to Rprof(): lionel-/r-source@be4d4201

f <- function() {
  io()
  cpu()
}
io <- function() Sys.sleep(1)
cpu <- function() pause(0.05)

vctrs::vec_unrep(rprof_lines(f()))
#>                 key times
#> 1 "pause" "cpu" "f"     4

vctrs::vec_unrep(rprof_lines(f(), real.time = TRUE))
#>                    key times
#> 1 "Sys.sleep" "io" "f"   100
#> 2    "pause" "cpu" "f"     4

Unfortunately, it also means we can't provide a toggle in the UI unless we create two samples. Or perhaps we can sample with both kinds of timers at the same time? This is not what gperftools does though, but I'll try this out.

@hadley
Copy link
Member

hadley commented Dec 7, 2020

If we had to choose only one, I think defaulting to real-time would be the right decision for profvis.

@wch
Copy link
Member

wch commented Dec 9, 2020

I agree that real-time is a better default.

@tomasruizt
Copy link

Is there atm any alternative to get real-time profiling with R? In a different package or a custom function?

@lionel-
Copy link
Member

lionel- commented Mar 18, 2021

@tomasruizt If you use profvis on a Windows machine you will get real-time profiling.

@Houdanait
Copy link

@lionel- @hadley is there any update on this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants