Friday, 3 January 2020

I have been looking for a tool or compiler option that generates a call graph for a user-space C program. Some time back I wrote a tool that uses gdb to collect stack frames and builds a call graph from them with Python:
https://github.com/tarun27sh/gdb_graphs

The problem is that it is very slow. One Stack Overflow user was kind enough to try it, only to complain about the speed. So it's time to explore other options:

1. Use gcc -finstrument-functions.
2. Use LLVM to write a transform pass that adds profiling instructions to each function.


In this post I'll cover #1, and will try to cover #2 in a future post.


How to work with gcc -finstrument-functions?

Let's start with a simple hello world.

Take hw.c:
$ cat hw.c
#include <stdio.h>
int main()
{
    printf("Hello World\n");
    return 0;
}

Compile with:
gcc hw.c    // generates a.out

and dump assembly for main:
objdump -S a.out
. . .
0000000000400526 <main>:
  400526:       55                      push   %rbp
  400527:       48 89 e5                mov    %rsp,%rbp
  40052a:       bf c4 05 40 00          mov    $0x4005c4,%edi
  40052f:       e8 cc fe ff ff          callq  400400 <puts@plt>
  400534:       b8 00 00 00 00          mov    $0x0,%eax
  400539:       5d                      pop    %rbp
  40053a:       c3                      retq
  40053b:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
. . .

This is the normal assembly gcc generates for our main function.

Now let's compile with -finstrument-functions and see what gets added:

gcc -finstrument-functions hw.c    // generates a.out

and dump assembly for main:
objdump -S a.out
. . .
00000000004005d6 <main>:
  4005d6:       55                      push   %rbp
  4005d7:       48 89 e5                mov    %rsp,%rbp
  4005da:       53                      push   %rbx
  4005db:       48 83 ec 08             sub    $0x8,%rsp
  4005df:       48 8b 45 08             mov    0x8(%rbp),%rax
  4005e3:       48 89 c6                mov    %rax,%rsi
  4005e6:       bf d6 05 40 00          mov    $0x4005d6,%edi
  4005eb:       e8 d0 fe ff ff          callq  4004c0 <__cyg_profile_func_enter@plt>
  4005f0:       bf a4 06 40 00          mov    $0x4006a4,%edi
  4005f5:       e8 96 fe ff ff          callq  400490 <puts@plt>
  4005fa:       bb 00 00 00 00          mov    $0x0,%ebx
  4005ff:       48 8b 45 08             mov    0x8(%rbp),%rax
  400603:       48 89 c6                mov    %rax,%rsi
  400606:       bf d6 05 40 00          mov    $0x4005d6,%edi
  40060b:       e8 a0 fe ff ff          callq  4004b0 <__cyg_profile_func_exit@plt>
  400610:       89 d8                   mov    %ebx,%eax
  400612:       48 83 c4 08             add    $0x8,%rsp
  400616:       5b                      pop    %rbx
  400617:       5d                      pop    %rbp
  400618:       c3                      retq
  400619:       0f 1f 80 00 00 00 00    nopl   0x0(%rax)
. . .

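Now we see that gcc has injected two additional function calls:
1. __cyg_profile_func_enter@plt
2. __cyg_profile_func_exit@plt

@plt means the call goes through the procedure linkage table, so the dynamic linker resolves the function's address at run time.

If the program doesn't define these functions, the calls resolve to a default definition that just returns. Note the address in the gdb output: it is in the shared-library range, so this stub is not part of a.out: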
(gdb) disas __cyg_profile_func_enter
Dump of assembler code for function __cyg_profile_func_enter:
   0x00007ffff7b23200 <+0>:     repz retq
End of assembler dump.
(gdb) disas __cyg_profile_func_exit
Dump of assembler code for function __cyg_profile_func_enter:
   0x00007ffff7b23200 <+0>:     repz retq
End of assembler dump.
(gdb)

It turns out repz retq is an interesting topic in itself.

By defining these two functions ourselves, we can override the default do-nothing behavior.

Define Entry/Exit hook

  1 #include<stdio.h>
  2
  3 static void __attribute__((no_instrument_function))
  4 __cyg_profile_func_enter (void *this_fn,
  5                                void *call_site)
  6 {
  7       printf("[+]\n");
  8 }
  9
 10 static void __attribute__((no_instrument_function))
 11 __cyg_profile_func_exit  (void *this_fn,
 12                                void *call_site)
 13 {
 14     printf("[-]\n");
 15 }
 16
 17 int main()
 18 {
 19     printf("HW\n");
 20     return 0;
 21 }

- lines #3,10 - the no_instrument_function attribute tells gcc not to inject enter/exit calls into the hook functions themselves; otherwise each hook would call itself recursively

Compile and run
$ gcc -finstrument-functions hw.c  // generates a.out

$ ./a.out
[+]
HW
[-]

Great!
Now we can make use of the injected calls. The next step is to print the name of the function each hook was called from and generate some kind of command-line function graph, similar to what ftrace does.

But first, where are these functions declared and defined?

I searched the gcc source code and found the following references to the enter function, but none of them points to the definition that contains the repz retq instruction.

Text string: __cyg_profile_func_enter

  File                                       Line
0 gcc/testsuite/g++.dg/pr49718.C                 5 /* { dg-final { scan-assembler-times "__cyg_profile_func_enter" 1 { target { ! { hppa*-*-hpux* } } } } } */
1 gcc/testsuite/g++.dg/pr49718.C                 6 /* { dg-final { scan-assembler-times "__cyg_profile_func_enter,%r" 1 { target hppa*-*-hpux* } } } */
2 testsuite/gcc.c-torture/execute/eeprof-1.c    65 void __cyg_profile_func_enter (void*, void*) NOCHK;
3 testsuite/gcc.c-torture/execute/eeprof-1.c    69 void __cyg_profile_func_enter (void *fn, void *parent)
4 gcc/testsuite/gcc.dg/20001117-1.c             31 __cyg_profile_func_enter(void *this_fn, void *call_site)
5 gcc/testsuite/gcc.dg/instrument-1.c            6 /* { dg-final { scan-assembler "__cyg_profile_func_enter" } } */
6 gcc/testsuite/gcc.dg/instrument-2.c            6 /* { dg-final { scan-assembler-not "__cyg_profile_func_enter" } } */
7 gcc/testsuite/gcc.dg/instrument-3.c            6 /* { dg-final { scan-assembler-not "__cyg_profile_func_enter" } } */
8 gcc/testsuite/gcc.dg/pr78333.c                 4 /* Add empty implementations of __cyg_profile_func_enter() and
9 gcc/testsuite/gcc.dg/pr78333.c                 8 __cyg_profile_func_enter(void *this_fn, void *call_site)
a gcc/tree.c                                 10683 local_define_builtin ("__cyg_profile_func_enter", ftype,
b gcc/tree.c                                 10685 "__cyg_profile_func_enter", 0);

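Maybe this is something arch dependent. Since the stub's address in gdb is in the shared-library range rather than in a.out, the definition probably comes from a library such as libc rather than from gcc itself.
Let me know if you know exactly where it is defined.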

How to add code to generate call stacks?

Now the only thing left is to define what these hooks do when called:


  1 #define _GNU_SOURCE
  2 #include <dlfcn.h>
  3 #include <stdio.h>
  4 static void __attribute__((no_instrument_function))
  5 __cyg_profile_func_enter (void *this_fn,
  6                                void *call_site)
  7 {
  8     Dl_info info;
  9     dladdr(__builtin_return_address(0), &info);
 10     printf("[+] %s\n", info.dli_sname);
 11 }
 12
 13 static void __attribute__((no_instrument_function))
 14 __cyg_profile_func_exit  (void *this_fn,
 15                                void *call_site)
 16 {
 17     Dl_info info;
 18     dladdr(__builtin_return_address(0), &info);
 19     printf("[-] %s\n", info.dli_sname);
 20 }



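- lines #1,2 - define _GNU_SOURCE and include <dlfcn.h>, which together expose dladdr and Dl_info, needed to get a symbol name from an address

- lines #4,13 - tell gcc not to inject enter/exit calls into the hook functions themselves

- line #9 - get the hook's own return address, which lies inside the instrumented function that called the hook, and pass it to dladdr to get the symbol name. From gcc docs: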
Built-in Function: void * __builtin_return_address (unsigned int level)
This function returns the return address of the current function, or of one of its callers. The level argument is number of frames to scan up the call stack. A value of 0 yields the return address of the current function, a value of 1 yields the return address of the caller of the current function, and so forth. 
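The hooks also receive both addresses directly as arguments: this_fn is the start of the instrumented function and call_site is the address it was called from. As a minimal sketch (not the code used above), the variant below resolves both with dladdr and falls back to "?" when no symbol is found, since dli_sname can be NULL. symbol_name is a hypothetical helper name, and the hooks are non-static on the assumption that they may live in their own source file.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper (not from the post): map an address to the nearest
 * symbol name, or "?" when dladdr() cannot resolve it. */
__attribute__((no_instrument_function))
const char *symbol_name(void *addr)
{
    Dl_info info;

    if (addr != NULL && dladdr(addr, &info) != 0 && info.dli_sname != NULL)
        return info.dli_sname;
    return "?";
}

/* this_fn: address of the function being entered.
 * call_site: address inside the caller where the call was made. */
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    printf("[+] %s (called from %s)\n",
           symbol_name(this_fn), symbol_name(call_site));
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    printf("[-] %s (returning to %s)\n",
           symbol_name(this_fn), symbol_name(call_site));
}
```

Each [+] line then carries one caller/callee pair, which is the raw material for a call graph. The caller of main will most likely show up as a libc startup function or as ?, depending on which symbols are exported.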


Now compile with:
$ gcc -finstrument-functions hw.c -ldl -rdynamic // generates a.out

or to compile and link separately:
$ gcc -finstrument-functions -c hw.c -o hw.o   // generates hw.o
$ gcc  hw.o  -ldl -rdynamic                             // generates a.out


Finally run the executable:

$ ./a.out
[+] main
HW
[-] main

Now we get symbols too, and the overhead is much lower than with the gdb-based tool :)
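To get closer to the ftrace-style output mentioned earlier, the hooks can keep a nesting counter and indent each line by it. This is only a sketch under a few assumptions: the program is single-threaded, format_event and name_of are hypothetical helper names, and unresolved symbols are printed as ?.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

/* Nesting level of the call currently being traced. A plain global keeps
 * the sketch short; a multithreaded program would need one per thread. */
static int depth;

/* Hypothetical helper: write "<indent><tag> <name>" into buf, using two
 * spaces of indent per level, and return what snprintf() reports. */
__attribute__((no_instrument_function))
int format_event(char *buf, size_t len, int level, const char *tag,
                 const char *name)
{
    return snprintf(buf, len, "%*s%s %s", level * 2, "", tag, name);
}

/* Hypothetical helper: resolve fn to a symbol name, or "?" if unknown. */
__attribute__((no_instrument_function))
static const char *name_of(void *fn)
{
    Dl_info info;

    if (fn != NULL && dladdr(fn, &info) != 0 && info.dli_sname != NULL)
        return info.dli_sname;
    return "?";
}

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *this_fn, void *call_site)
{
    char line[256];

    (void)call_site;
    /* Print at the current level, then go one level deeper. */
    format_event(line, sizeof line, depth++, "[+]", name_of(this_fn));
    puts(line);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *this_fn, void *call_site)
{
    char line[256];

    (void)call_site;
    /* Come back up one level before printing the matching exit line. */
    if (depth > 0)
        depth--;
    format_event(line, sizeof line, depth, "[-]", name_of(this_fn));
    puts(line);
}
```

With these hooks, a function called from main would be printed two spaces to the right of main's own [+] and [-] lines, so nested calls read as a tree.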

I added some sample code on how to add code for main, shared objects - check it out at:

References:

1. https://lwn.net/Articles/370423/



Harry

Author & Editor
