The target platform is x86-64 with Clang; the same steps also work on MIPS64.
The Clang version is clang-3.8.
The goal is to use the profiler to detect hot spots in the code.
Test code:
File: profile-coverage.c
#include <stdio.h>
#include <stdlib.h>
#define CTR 10

int
main()
{
    int i, j, k;
    for(i=0; i < CTR; ++i) {
        printf("3: %d", i);
    }
    for(i=0; i < CTR*10; ++i) {
        printf("3: %d", i);
    }
    for(i=0; i < CTR*100; ++i) {
        printf("3: %d", i);
    }
    // exit(0);
    return 0;
}
Build Flags
Compiler
-g -fprofile-instr-generate -fcoverage-mapping
Linker
-fprofile-instr-generate
On MIPS, the following extra flags might be needed to make the process dump valid data:
-O0 -mstackrealign -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
Note: I ran into trouble when instrumenting multiple libraries within a process. Instrumenting just one library, or instrumenting just the main process, always worked for me.
Collect Data
1. Environment variables
To control the location and name of the dumped profile file:
# dump to the current directory as llvm.prof
export LLVM_PROFILE_FILE=./llvm.prof
# dump to the current directory as llvm_<pid>.prof
export LLVM_PROFILE_FILE=./llvm_%p.prof
For other supported patterns, see the LLVM documentation (listed under Reference).
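To make the effect of the %p pattern concrete, here is a minimal sketch of the substitution. The helper expand_profile_pattern is hypothetical and is not part of the LLVM runtime; it only handles %p and copies every other character verbatim, whereas the real runtime supports more patterns.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (NOT the LLVM runtime's code): replaces each "%p"
 * in `pattern` with the decimal `pid` and copies everything else as-is.
 * Returns 0 on success, -1 if `out` is too small for the result. */
int expand_profile_pattern(const char *pattern, long pid,
                           char *out, size_t out_size)
{
    size_t used = 0;

    assert(pattern != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    while (*pattern != '\0') {
        int written;
        if (pattern[0] == '%' && pattern[1] == 'p') {
            written = snprintf(out + used, out_size - used, "%ld", pid);
            pattern += 2;
        } else {
            written = snprintf(out + used, out_size - used, "%c", *pattern);
            pattern += 1;
        }
        /* snprintf reports the length it wanted; reject any truncation. */
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
    }
    out[used] = '\0';
    return 0;
}
```

With the pattern ./llvm_%p.prof and a process ID of 1212, this sketch produces ./llvm_1212.prof, so each process writes to its own file instead of overwriting a shared one.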
2. Data dump
The profile data is only dumped when the process exits. If the process does not exit on its own, attach GDB and force it to exit so that it dumps the data:
(gdb) call exit(0)
Alternatively, register a signal handler that makes the process call exit(0).
Assume the steps above dump the raw profile data to the file test_1212.prof.
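The signal-handler option might look like the sketch below. It is an assumption about how one could structure it, not code from the original test: the names exit_requested, install_exit_handler and should_exit are made up for illustration. The handler only sets a flag, because calling exit() directly inside a signal handler is not async-signal-safe; the main loop polls the flag and exits from normal context.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Set by the handler, polled by the main loop (hypothetical name). */
static volatile sig_atomic_t exit_requested = 0;

/* Signal handler: only records that an exit was requested. */
static void request_exit(int signo)
{
    (void)signo;
    exit_requested = 1;
}

/* Installs the handler for `signo` (for example SIGTERM).
 * Returns 0 on success, -1 on failure. */
int install_exit_handler(int signo)
{
    assert(signo > 0);
    return signal(signo, request_exit) == SIG_ERR ? -1 : 0;
}

/* Returns nonzero once the signal has been received. */
int should_exit(void)
{
    return exit_requested != 0;
}
```

A long-running loop would call install_exit_handler(SIGTERM) once at startup and then check should_exit() on each iteration, calling exit(0) when it returns nonzero. Sending the signal (for example with kill) then leads to a normal exit, which is the condition under which the profile data is dumped.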
Extract data
llvm-profdata merge
Pass in the raw dumped profile file for further processing:
llvm-profdata merge -output=test_1212.merge -instr test_1212.prof
llvm-profdata show
To view the raw block counters, run the following command:
llvm-profdata show -all-functions -counts -ic-targets test_1212.merge > test_1212.log
Output will look like:
Counters:
  main:
    Hash: 0x0000000000004104
    Counters: 4
    Function count: 1
    Indirect Call Site Count: 0
    Block counts: [10, 100, 1000]
    Indirect Target Results:
Functions shown: 1
Total functions: 1
Maximum function count: 1
Maximum internal block count: 1000
The raw counters above are hard to relate to the source code. For a more human-friendly view, use the llvm-cov tool.
llvm-cov show
llvm-cov show maps the counters back onto the source code, which gives more meaningful data. It requires the built binary or shared library as input, together with the merged profile from the previous step:
llvm-cov show test.bin -instr-profile=test_1212.merge
Output:
     |    1|#include <stdio.h>
     |    2|#include <stdlib.h>
1.11k|    3|#define CTR 10
     |    4|
     |    5|int
     |    6|main()
    1|    7|{
    1|    8|    int i, j, k;
   11|    9|    for(i=0; i < CTR; ++i) {
   10|   10|        printf("3: %d", i);
   10|   11|    }
  101|   12|    for(i=0; i < CTR*10; ++i) {
  100|   13|        printf("3: %d", i);
  100|   14|    }
1.00k|   15|    for(i=0; i < CTR*100; ++i) {
1.00k|   16|        printf("3: %d", i);
1.00k|   17|    }
    1|   18|    // exit(0);
    1|   19|    return 0;
    1|   20|}
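The counts on the loop header lines are one higher than the counts on the loop bodies because the condition is evaluated one extra time, when it finally fails. The 1.11k on the #define line appears to be the sum of those condition evaluations (11 + 101 + 1001 = 1113), since CTR is expanded in each loop condition; that reading is an interpretation of the report, not something the tool states. The sketch below reproduces the arithmetic with hand-written counters; struct loop_counts, count_loop and total_condition_evals are hypothetical names used only for this illustration.

```c
#include <assert.h>

/* Hand-written counters that mimic what the coverage report shows. */
struct loop_counts {
    long cond_evals;  /* times the loop condition was evaluated */
    long body_runs;   /* times the loop body executed */
};

/* Runs `for (i = 0; i < limit; ++i)` and counts both events. */
struct loop_counts count_loop(int limit)
{
    struct loop_counts c = {0, 0};
    int i;

    assert(limit >= 0);
    /* The comma operator bumps the counter on every condition check,
     * including the last one that evaluates to false. */
    for (i = 0; (c.cond_evals++, i < limit); ++i) {
        c.body_runs++;
    }
    return c;
}

/* Sum of condition evaluations across the three loops of the test code. */
long total_condition_evals(int ctr)
{
    return count_loop(ctr).cond_evals
         + count_loop(ctr * 10).cond_evals
         + count_loop(ctr * 100).cond_evals;
}
```

For CTR equal to 10, count_loop gives 11 condition checks and 10 body runs for the first loop, matching lines 9 and 10 of the report, and total_condition_evals(10) gives 1113, which the report rounds to 1.11k.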
Reference:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html
https://johanengelen.github.io/ldc/2016/04/13/PGO-in-LDC-virtual-calls.html