Skip to content
Mikolaj edited this page Nov 16, 2012 · 8 revisions

Outline

BTW, which post to publish first?

A hint from Bernie: Do we have an example use case where something interesting is indicated by the perf profile information?

The post about the perf stuff specifically

  • short intro
  • using linux-perf and related tools (a tiny version here and the full version in the other post, or the other way around? may depend on how long the technical stuff is, probably long enough that this should go to the other post)
  • with screenshots 600px wide max
  • describe the library and show a tiny program that uses it
  • summarize design decisions from the emails
  • summarize lore from the perf list
  • mention branches
    • sync-by-syscall
    • control-execution
    • pid-filter: display scheduler event arguments, as described, but not started yet on the branch
    • say the perf lib is for parsing and manipulating raw perf data, so it's useful if one wants to do that, but at this point, for profiling Haskell programs with perf, perf-script has bugs and limitations, but has insider's knowledge and turned out best for this release
  • mention changes in GHC, ghc-events and TS and how each contributes to perf profiling
  • future work, ask for feedback and contributions

The post about this release of all the 3 packages, in general

  • short intro
  • using linux-perf and related tools (see above)
  • with screenshots 600px wide max
  • using the other new user-visible features in threadscope (how far back? how detailed?)
  • shortly mention changes in GHC, ghc-events (merge) and TS needed for the to perf profiling, but not visible or not obviously useful in other contexts
  • ask for feedback and contributions

Texts to use

short intro

sponsors? authors? history? why important?

linux-perf, a library for parsing, representing in Haskell and pretty printing the data file output of the Linux 'perf' command (Linux performance counters); plus a set of tools to automate and a set of patches elsewhere to visualize.

see install instructions at https://github.com/bjpop/haskell-linux-perf/blob/master/README.txt

pick something general from the Bernie's text:

This library is for parsing, representing in Haskell and pretty printing the data file output of the Linux @perf@ command. The @perf@ command provides performance profiling information for applications running under the Linux operating system. This information includes hardware performance counters and kernel tracepoints.

Modern CPUs can provide information about the runtime behaviour of software through so-called hardware performance counters http://en.wikipedia.org/wiki/Hardware_performance_counter. Recent versions of the Linux kernel (since 2.6.31) provide a generic interface to low-level events for running processes. This includes access to hardware counters but also a wide array of software events such as page faults, scheduling activity and system calls. A userspace tool called 'perf' is built on top of the kernel interface, which provides a convenient way to record and view events for running processes.

The @perf@ tool has many sub-commands which do a variety of things, but in general it has two main purposes:

  1. Recording events.
  2. Displaying events.

The @perf record@ command records information about performance events in a file called (by default) @perf.data@. It is a binary file format which is basically a memory dump of the data structures used to record event information. The file has two main parts:

  1. A header which describes the layout of information in the file (section sizes, etcetera) and common information about events in the second part of the file (an encoding of event types and their names).
  2. The payload of the file which is a sequence of event records.

Each event field has a header which says what general type of event it is plus information about the size of its body.

There are nine types of event:

  1. @PERF_RECORD_MMAP@: memory map event.
  2. @PERF_RECORD_LOST@: an unknown event.
  3. @PERF_RECORD_COMM@: maps a command name string to a process and thread ID.
  4. @PERF_RECORD_EXIT@: process exit.
  5. @PERF_RECORD_THROTTLE@:
  6. @PERF_RECORD_UNTHROTTLE@:
  7. @PERF_RECORD_FORK@: process creation.
  8. @PERF_RECORD_READ@:
  9. @PERF_RECORD_SAMPLE@: a sample of an actual hardware counter or a software event.

The @PERF_RECORD_SAMPLE@ events (samples) are the most interesting ones in terms of program profiling. The other events seem to be mostly useful for keeping track of process technicalities. Samples are timestamped with an unsigned 64 bit word, which records elapsed nanoseconds since some point in time (system running time, based on the kernel scheduler clock). Samples have themselves a type which is defined in the file header and linked to the sample by an integer identifier

using linux-perf and related tools

Crash-course of the ghc-events-perf tool:

ghc --make -eventlog -rtsopts -threaded MyProgram.hs

ghc-events-perf record MyProgram --RTS +RTS -N2 -l-g-p

ghc-events-perf convert MyProgram

threadscope MyProgram.total.eventlog

A longer example, pointing out some common pitfalls and using the test data files from the test/ directory of the library's github repository. The Haskell program test/ParFib.hs is compiled with GHC >= 7.8 as follows

ghc --make -eventlog -rtsopts -threaded ParFib.hs

Test data files corresponding to test/ParFib.perf.data and test/ParFib.eventlog can be created with

sudo path-to/ghc-events-perf record +GhcEventsPerf -o ParFib.perf.data -GhcEventsPerf ./ParFib --RTS +RTS -N2 -la

At this point, one can change the owner or permissions of ParFib.perf.data and parse and view it with the standard 'perf script' command or with the 'dump-perf' tool that uses our Haskell library for parsing perf data.

The perf data can be transformed to an eventlog and synchronized and merged with the standard eventlog using

ghc-events-perf convert ParFib ParFib.total.eventlog ParFib.perf.data ParFib.eventlog

The resulting big eventlog can be displayed with

ghc-events show ParFib.total.eventlog | less

but it's best viewed in ThreadScope with

threadscope ParFib.total.eventlog

in the Timeline main pane, with Instant Events selected in the Traces tab and with the View/Event_Labels option on. If no perf events show up in the Instant Events traces, your GHC is probably 7.6 or older.

Note: in this version of haskell-linux-perf one can obtain the perf data just as well by running 'perf record' or 'perf script record' by hand instead of by running 'ghc-events-perf record'. In other versions (to be found on the sync-by-syscall and control-execution branches) 'ghc-events-perf record' is mandatory, since it inserts special synchronizing events.

with screenshots 600px wide max

describe the library and show a tiny program that uses it

Bernie's example. There are others, but this one is elegant.

Below is an example program which reads a @perf.data@ file and prints out the number of events that it contains.

      module Main where
      
      import Profiling.Linux.Perf (readPerfData)
      import Profiling.Linux.Perf.Types (PerfData (..))
      import System.Environment (getArgs)
      
      main :: IO ()
      main = do
         args <- getArgs
         case args of
            [] -> return ()
            (file:_) -> do
               perfData <- readPerfData file
               print $ length $ perfData_events perfData

summarize design decisions from the emails

summarize lore from the perf list

We greatly benefited from the help of the perf mailing list linux-perf-users@vger.kernel.org. Here is a condensed summary.

I'm writing a tool to parse the perf.data output of perf and was wondering about the source of the timestamps found in perf_sample events? Specifically I mean the u64 time field in the perf_sample struct in util/event.h. I believe it is in nanoseconds from a particular event (system start?), but I can't find any documentation about where the time comes from.

I'm also interested in reading from this time source from a userspace program, is that possible?

Code wise, you'll need to follow perf_clock(). I believe for x86 with a stable TSC you end up in native_sched_clock() in arch/x86/kernel/tsc.c.

As for the corresponding userspace call I believe it is clock_gettime(CLOCK_MONOTONIC, ...).

But later it does not prove to be the case:

Now our main problem is relating the perf timestamps to a public clock API (like posix_gettime, etc.) in order to chronologically merge the perf log with our external event logs. We've so far determined experimentally that even the clock_gettime(CLOCK_MONOTONIC, ...) values are not identical to the perf timestamps at the same places, though they are quite close.

We're working around right now by syncing with a timestamp of a known event, but that leaves us vulnerable to a time drift. If there's no way currently to generate externally meaningful timestamps, could we file a feature request for the sake of the future?

You mean like this: https://lkml.org/lkml/2011/6/7/638?

I'm still waiting for certain changes to core perf before I try again to get it committed. I haven't forgotten; it's just not ready yet.

And using the time-of-day patch against a newer kernel (tried 3.2, 3.4 and 3.6) I do see the offset you mentioned.

Here's a fresh discussion of timestamps vs. CLOCK_MONOTONIC:

http://thread.gmane.org/gmane.linux.kernel.perf.user/1076

which points to the following plan of action:

https://lkml.org/lkml/2012/10/16/173

Another issue:

Are we missing anything? Is the "--timestamp" option of perf-record related in any way? Any other tips?

That option adds the timestamps to the samples if it were not otherwise added.

Yet another:

Could someone clarify the behaviour of the -p PID flag of perf-record? In our experiments (3.2.0-27 #43-Ubuntu SMP x86_64), it ignores events on threads spawned after perf-record is started. Is this the intended behaviour, and is there any work-around?

Generally, what is the best (vs CPU/IO, distortion of profiling results) way to record only events of a given OS process, with all its threads?

The version of perf-record that we use already contains the separate -p PID and -t TID options, so the initial PID patch is already in. It's just that the -p PID option ignores all threads spawned after the process is started, AFAICT.

AFAIK perf events aren't inherited when -p, -t or -u option is used due to a performance reason. But if you start the process on the command line with perf record, it'll be inherited and counted properly.

I guess your problem might be solved by using a cgroup:

 mkdir ${cgroup_root}/mygroup
 echo NNNN > ${cgroup_root}/mygroup/tasks
 perf record -a -e cycles -G mygroup

Another answer:

I see the problem now. The perf tool gets FORK events as threads are created, but it does not respond to them and create new counters. Some refactoring is needed to make that happen.

sync-by-syscall

Note: in this version of haskell-linux-perf we can only obtain the perf data by running 'ghc-events-perf record'. Neither 'perf record' nor 'perf script record' suffice, because we need to insert special synchronizing events into the perf data.

Note: this version of the code uses the linux-perf library to parse the perf data file and an artificially inserted nanosleep syscall to synchronize perf events and the standard eventlog events. This approach has its benefits and drawbacks compared to the usage of perf-script for both tasks. Which of the approaches proves better depends on the evolution of the perf toolset (we are waiting for a patchset that makes perf clock trustworthy and externally meaningful) and on the future needs of Haskell users of perf events. For now, we retain this version on a branch.

control-execution

Note: this version of the code controls the execution of the Haskell program to be profiled and of the perf-record tool by launching them, passing around a PID and waiting for termination or terminating, as appropriate. If needed, this can be extended to temporarily suspending the Haskell program, e.g., to make a synchronizing syscall more often than just at the start of execution.

Apart of that, the code uses the linux-perf library to parse the perf data file and the nanosleep syscall to synchronize perf events with the standard eventlog. (See the start of the ghc-events-perf-record.hs file for more details.) This approach has its benefits and drawbacks compared to the usage of perf-script and to starting the Haskell program by the perf-record tool. Which of the approaches proves better depends on the evolution of the perf toolset (we are waiting for a patchset that makes perf clock trustworthy and externally meaningful, as well as for a patch that fixes the '-p' option). It also depends on the future needs of Haskell users of perf events. In particular, the '-p' option seems to produce much more perf events, even though in its present state it ignores all tasks spawned after the Haskell program is started. Whether the extra events are useful requires further investigation. For now, we retain this version on a branch.

mention changes in GHC, ghc-events and TS and how each contributes to perf profiling

the other new user-visible features in threadscope (how far back? how detailed?)

A blurb from PP Project News wiki section:

We have been continuing our work to make ThreadScope more helpful and informative in tracking down your parallel and concurrent Haskell performance problems. We now have the ability to collect heap (and some other) statistics from the GHC runtime system and present them in ThreadScope for a selected runtime interval. These features are available for users of GHC 7.6 or newer. On feedback from users, we have improved support for user events of different granularity. We have released a preliminary version of tools for collecting information from hardware performance counters, more specifically the Linux Perf Events. This can be useful for studying IO-heavy programs, the idea being to visualise system calls as being distinct from actual execution of Haskell code. The perf events support will be available for users of a recent development GHC (7.7.x) or the eventual 7.8 release.

46ffdcb Present summary numbers as e.g. 35.3 GiB as well as number of bytes
56baacb Change the SummaryView to use widgets rather than a big text layout Also do a bit of tidying up of the stats collection code. The old view is not yet removed so we can compare and check it's ok. When we're happy we can remove the old one. The new presentation still needs some improvements, like formatting using Mb and with ',' for 1000 separators etc.
1652b5c Support the new UserMarker event, converting them to bookmarks Previously we tried the experiment of visualising all the user events generated by Debug.Trace.traceEvent as bookmarks. This was ok but traceEvent is a bit too general, and it doesn't work for high frequency events to visualise them as bookmarks. But since it really is useful, we've got a new dedicated marker event generated by Debug.Trace.traceMarker and we now visualise these markers as bookmarks. See also ticket #27.
2153227 Fix #23 (black labels drawn over dark green hardly visible)
62f8622 Show SummaryView for the selected interval
ce418ce Add tooltips to the key view
bc3e43e Change the startup info panel to display the info in a structured way Using a grid with label and list widgets rather than a blob of text.
bdd5c23 Simplistic program run information tab; needs a scrollbar for env
fe662b4 Link to TS guided tour in 2 places
468d6c6 Overhaul the display of keys
3cfd7ce Point scale ticks outward and related tweaks
1bd1182 Allow mouse-wheel scroll when it points at the scales
434bf21 Enable histogram by default now that it does not need extra packages
f941485 Rewrite Charts, step 24: draw logarithmic X scale
c0288af Add a prototype of instant events trace (mostly for user events)
0c63375 Disable traceEvents as bookmarks for now
5d817aa Change axes colour from blue to black
deb138f Mark event text lines that are within the selected timeline interval
d8d7220 Make the green and orange colours a bit darker Better contrast
66401c4 Move X scale outside, step 5: simplify and unify X and Y scale drawing
74e379b Add labels and Y scales to PDF/PNG; TODO: wrong when scrolled
e8e321b Move Y spark scale outside; step 4: move numbers from right to left
aba7298 Connect up the histogram to the timeline range selection feature
99cae5c add first draft of spark duration histogram drawing code
f083146 Handle zooming when there is a selection
1c29e77 Allow adding bookmarks for range selections
a8a8f5f Allow grabbing the timeline view and dragging it around
8a5a44e When zooming with scroll wheel, center on mouse pointer
9fc5159 Add keyboard shortcuts for open,quit,reload menu items
da07d46 Make the visibility toggle on groups in the trace view work Lets you toggle all the HECs in one go.
976f0de Use the 'traceEvent' user messages as initial bookmarks
a7f9e03 Add an editable bookmark label column
c825c2b Allow activating bookmarks
ccf7951 Show bookmarks in seconds too
e932232 Do the image file export properly Use a save-as dialog and let the user choose the file name and format.
f63d962 Bump version to 0.2.0