Saturday 28 April 2018

Profiling C code with clang using "fprofile-instr-generate"

Target platform is x86-64 (works well with MIPS64 as well) with Clang.
Clang version is clang-3.8.

Using profiler to detect hot-spots in code.

Test code:

File: profile-coverage.c

#include <stdio.h>
#include <stdlib.h>
#define CTR 10

int
main()
{
    int i, j, k;
    for(i=0; i < CTR; ++i) {
        printf("3: %d", i);
    }
    for(i=0; i < CTR*10; ++i) {
        printf("3: %d", i);
    }
    for(i=0; i < CTR*100; ++i) {
        printf("3: %d", i);
    }
    //  exit(0);
    return 0;
}

Build Flags

Compiler

-g -fprofile-instr-generate -fcoverage-mapping

Linker

-fprofile-instr-generate

On MIPS following extra flags might be needed to make the process dump valid data:
-O0 -mstackrealign -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer


Note: I ran into trouble when instrumenting multiple libraries within a process. Instrumenting just one library, or instrumenting just the main process, always worked for me.

Collect Data

1. Environment variables

To control the location and name of dumped profile file:

// dump to current directory with name - llvm.prof
export LLVM_PROFILE_FILE=./llvm.prof 

// dump to current directory with name - llvm_<pid>.prof
export LLVM_PROFILE_FILE=./llvm_%p.prof 

For more supported flags: llvm documentation


2. Data dump


To dump the data from process, the process has to exit. If the process doesn't exits, then attach GDB to force it it dump data:

gdb## call exit(0)

Or else, register a signal handler to exit(0) the process.
Assume above steps dumps the raw profile data file: test_1212.prof


Extract data

llvm-profdata merge

Pass in the raw dumped profile file for further processing:

llvm-profdata merge -output=test_1212.merge -instr test_1212.prof

llvm-profdata show

To view raw block counters, run following command

llvm-profdata show -all-functions -counts -ic-targets  test_1212.merge > test_1212.log

Output will look like:

  1 Counters:
  2   main:
  3     Hash: 0x0000000000004104
  4     Counters: 4
  5     Function count: 1
  6     Indirect Call Site Count: 0
  7     Block counts: [10, 100, 1000]
  8     Indirect Target Results:
  9 Functions shown: 1
 10 Total functions: 1
 11 Maximum function count: 1
 12 Maximum internal block count: 1000

Above output doesn't make much sense, to make the output more human friendly use the llvm-cov tool.


llvm-cov show


Get counters instrumented in source code (more meaningful data). This requires passing in the built binary or shared library:

llvm-cov show test.bin -instr-profile=merge.out


output:

       |    1|#include <stdio.h>
       |    2|#include <stdlib.h>
  1.11k|    3|#define CTR 10
       |    4|
       |    5|int
       |    6|main()
      1|    7|{
      1|    8|    int i, j, k;
     11|    9|    for(i=0; i < CTR; ++i) {
     10|   10|        printf("3: %d", i);
     10|   11|    }
    101|   12|    for(i=0; i < CTR*10; ++i) {
    100|   13|        printf("3: %d", i);
    100|   14|    }
  1.00k|   15|    for(i=0; i < CTR*100; ++i) {
  1.00k|   16|        printf("3: %d", i);
  1.00k|   17|    }
      1|   18|    //  exit(0);
      1|   19|    return 0;

      1|   20|}


Reference:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
https://johanengelen.github.io/ldc/2016/04/13/PGO-in-LDC-virtual-calls.html


Harry

Author & Editor

A technology enthusiast and addictive blogger who likes to hacking tricks and wish to be the best White Hacket Hacker of the World.

0 comments:

Post a Comment

Note: only a member of this blog may post a comment.