Article URL: https://gitee.com/kiraskyler/Articles/blob/master/eBPF/eBPF%20CO-RE%20%E5%AE%9E%E7%8E%B0%E6%96%B9%E5%BC%8F.md
Contents
- Introduction
- The `CORE_READ` macro in bpf.c
- `__builtin_preserve_access_index`
- The CO-RE process
- core_relo entries
- .BTF.ext
- btf_ext_header
- btf_ext_info_sec
- bpf_core_relo
- btf_header->str
- Patching the instruction: bpf_core_patch_insn
Introduction
The analysis below uses the .o file built from the simple_bpf repository on gitee. The environment is openEuler 24.03 LTS SP2.
Related article:
elf.o 文件内容.md
The CORE_READ macro in bpf.c
It is built on `__builtin_preserve_access_index`.
__builtin_preserve_access_index
The bpf headers contain a short explanation:
/usr/include/bpf/bpf_core_read.h: 419

```c
/*
 * BPF_CORE_READ() is used to simplify BPF CO-RE relocatable read, especially
 * when there are few pointer chasing steps.
 * E.g., what in non-BPF world (or in BPF w/ BCC) would be something like:
 *	int x = s->a.b.c->d.e->f->g;
 * can be succinctly achieved using BPF_CORE_READ as:
 *	int x = BPF_CORE_READ(s, a.b.c, d.e, f, g);
 *
 * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF
 * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically
 * equivalent to:
 * 1. const void *__t = s->a.b.c;
 * 2. __t = __t->d.e;
 * 3. __t = __t->f;
 * 4. return __t->g;
 *
 * Equivalence is logical, because there is a heavy type casting/preservation
 * involved, as well as all the reads are happening through
 * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to
 * emit CO-RE relocations.
 *
 * N.B. Only up to 9 "field accessors" are supported, which should be more
 * than enough for any practical purpose.
 */
#define BPF_CORE_READ(src, a, ...) ({					    \
	___type((src), a, ##__VA_ARGS__) __r;				    \
	BPF_CORE_READ_INTO(&__r, (src), a, ##__VA_ARGS__);		    \
	__r;								    \
})
```

The comment explains what this macro does. It emits CO-RE relocation records that describe the field offsets inside a struct/union.
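The four-step decomposition can be made concrete with a userspace sketch. The types `S`, `A`, `B`, `C`, `D`, `E` and `F` are hypothetical stand-ins for `s->a.b.c->d.e->f->g`. `mock_read()` is a hypothetical memcpy-based stand-in for `bpf_probe_read_kernel()`. In a real BPF program, each address would also pass through `__builtin_preserve_access_index()` so that the compiler records a CO-RE relocation. This sketch only shows the read sequence, not the relocation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical types for the chain s->a.b.c->d.e->f->g. */
struct F { int g; };
struct E { struct F *f; };
struct D { struct E *e; };
struct C { struct D d; };
struct B { struct C *c; };
struct A { struct B b; };
struct S { struct A a; };

/* Hypothetical userspace stand-in for bpf_probe_read_kernel(): a plain memcpy,
 * with no fault handling and no CO-RE relocation. */
static long mock_read(void *dst, unsigned int size, const void *src)
{
	memcpy(dst, src, size);
	return 0;
}

/* Logical equivalent of BPF_CORE_READ(s, a.b.c, d.e, f, g): one read per
 * pointer hop, each hop starting from the pointer fetched by the previous one. */
static int core_read_chain(const struct S *s)
{
	struct C *c;
	struct E *e;
	struct F *f;
	int g;

	mock_read(&c, sizeof(c), &s->a.b.c); /* 1. __t = s->a.b.c */
	mock_read(&e, sizeof(e), &c->d.e);   /* 2. __t = __t->d.e */
	mock_read(&f, sizeof(f), &e->f);     /* 3. __t = __t->f   */
	mock_read(&g, sizeof(g), &f->g);     /* 4. return __t->g  */
	return g;
}
```

Build a chain where `f.g = 42`. Calling `core_read_chain(&s)` on it returns 42, the same value as the direct expression `s.a.b.c->d.e->f->g`.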
/usr/include/bpf/bpf_core_read.h: 230

```c
/*
 * bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures
 * offset relocation for source address using __builtin_preserve_access_index()
 * built-in, provided by Clang.
 *
 * __builtin_preserve_access_index() takes as an argument an expression of
 * taking an address of a field within struct/union. It makes compiler emit
 * a relocation, which records BTF type ID describing root struct/union and an
 * accessor string which describes exact embedded field that was used to take
 * an address. See detailed description of this relocation format and
 * semantics in comments to struct bpf_field_reloc in libbpf_internal.h.
 *
 * This relocation allows libbpf to adjust BPF instruction to use correct
 * actual field offset, based on target kernel BTF type that matches original
 * (local) BTF, used to record relocation.
 */
#define bpf_core_read(dst, sz, src)					    \
	bpf_probe_read_kernel(dst, sz, (const void *)__builtin_preserve_access_index(src))
```

GCC documents `__builtin_preserve_access_index` as follows:
https://gcc.gnu.org/onlinedocs/gcc/BPF-Built-in-Functions.html

> Built-in Function: type __builtin_preserve_access_index (type expr)
>
> BPF Compile Once-Run Everywhere (CO-RE) support. Instruct GCC to generate CO-RE relocation records for any accesses to aggregate data structures (struct, union, array types) in expr. This builtin is otherwise transparent; expr may have any type and its value is returned. This builtin has no effect if -mco-re is not in effect (either specified or implied).

The CO-RE process
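At the heart of the CO-RE implementation is the function `bpf_object__relocate_core`. The gdb backtrace below shows how the skeleton's load call reaches it: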
```
#0  bpf_object__relocate_core (targ_btf_path=<optimized out>, obj=0x4fbf70) at libbpf.c:5828
#1  bpf_object__relocate (targ_btf_path=<optimized out>, obj=0x4fbf70) at libbpf.c:6476
#2  bpf_object_load (extra_log_level=0, target_btf_path=0x0, obj=<optimized out>) at libbpf.c:7885
#3  bpf_object__load (obj=<optimized out>) at libbpf.c:7934
#4  0x00007ffff7f942d5 in bpf_object__load_skeleton (s=0x4fbee0) at libbpf.c:12751
#5  0x00000000004024a5 in trace_bpf__load (obj=0x4fbeb0) at /root/simple_bpf/build/src/trace.bpf.skel.h:84
#6  0x00000000004024d2 in trace_bpf__open_and_load () at /root/simple_bpf/build/src/trace.bpf.skel.h:96
#7  0x0000000000402898 in main (args=1, argv=0x7fffffffde38) at /root/simple_bpf/src/trace.cpp:41
```

/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/libbpf.c: 5732 (excerpt)

```c
static int bpf_object__relocate_core(struct bpf_object *obj, const char *targ_btf_path)
{
	const struct btf_ext_info_sec *sec;
	struct bpf_core_relo_res targ_res;
	const struct bpf_core_relo *rec;  // which instruction to relocate and which field it accesses (e.g. 0:1)
	const struct btf_ext_info *seg;
	struct hashmap *cand_cache = NULL;
	struct bpf_program *prog;
	struct bpf_insn *insn;
	const char *sec_name;
	int i, err = 0, insn_idx, sec_idx, sec_num;

	// caches candidates found in vmlinux, keyed by rec->type_id from the loop below
	cand_cache = hashmap__new(bpf_core_hash_fn, bpf_core_equal_fn, NULL);
	seg = &obj->btf_ext->core_relo_info;  // the core_relo part of .BTF.ext
	sec_num = 0;
	for_each_btf_ext_sec(seg, sec) {      // iterate over the sections in core_relo
		sec_idx = seg->sec_idxs[sec_num];
		sec_num++;
		for_each_btf_ext_rec(seg, sec, i, rec) {  // iterate over the sec->num_info records
			if (rec->insn_off % BPF_INSN_SZ)
				return -EINVAL;
			insn_idx = rec->insn_off / BPF_INSN_SZ;
			prog = find_prog_by_sec_insn(obj, sec_idx, insn_idx);
			/* adjust insn_idx from section frame of reference to the local
			 * program's frame of reference; (sub-)program code is not yet
			 * relocated, so it's enough to just subtract in-section offset
			 */
			insn_idx = insn_idx - prog->sec_insn_off;
			if (insn_idx >= prog->insns_cnt)
				return -EINVAL;
			insn = &prog->insns[insn_idx];
			// resolve against the local .BTF and the vmlinux BTF; the result goes into targ_res
			err = bpf_core_resolve_relo(prog, rec, i, obj->btf, cand_cache, &targ_res);
			err = bpf_core_patch_insn(prog->name, insn, insn_idx, rec, i, &targ_res);
			/* ... */
		}
	}
	/* ... */
}
```

core_relo entries
This part explains the line `seg = &obj->btf_ext->core_relo_info;`, which selects the core_relo part of .BTF.ext.
.BTF.ext
The CO-RE information is stored in the .BTF.ext section:
```
# readelf -S -W build/src/trace.bpf.o
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [15] .BTF.ext          PROGBITS        0000000000000000 08b1c8 00019c 00      0   0  4
```

btf_ext_header
The data of .BTF.ext begins with a `struct btf_ext_header`. CO-RE uses the core_relo part that this header points to.
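Start of core_relo data = file offset of .BTF.ext + sizeof(struct btf_ext_header) + btf_ext_header->core_relo_off

Strictly, libbpf adds hdr_len rather than sizeof. Here hdr_len equals sizeof(struct btf_ext_header), which is 0x20.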
```c
struct btf_ext_header {
	__u16	magic;
	__u8	version;
	__u8	flags;
	__u32	hdr_len;

	/* All offsets are in bytes relative to the end of this header */
	__u32	func_info_off;
	__u32	func_info_len;
	__u32	line_info_off;
	__u32	line_info_len;

	/* optional part of .BTF.ext header */
	__u32	core_relo_off;
	__u32	core_relo_len;
};
```

core_relo = 0x8b1c8 + 0x20 + 0x130 = 0x8b318
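You can check the same arithmetic by decoding the 32 header bytes from the hexdump that follows. This is a minimal sketch that assumes a little-endian object (x86_64, as here). `parse_ext_hdr()` and `core_relo_file_off()` are hypothetical helpers, not libbpf functions. hexdump prints 16-bit little-endian words, so the word `eb9f` stands for the bytes `9f eb`.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian readers (the object file here is little-endian). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | p[1] << 8); }
static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Hand-decoded view of struct btf_ext_header. */
struct ext_hdr {
	uint16_t magic;
	uint8_t version, flags;
	uint32_t hdr_len;
	uint32_t func_info_off, func_info_len;
	uint32_t line_info_off, line_info_len;
	uint32_t core_relo_off, core_relo_len;
};

static struct ext_hdr parse_ext_hdr(const uint8_t *p)
{
	struct ext_hdr h = {
		.magic = rd16(p), .version = p[2], .flags = p[3],
		.hdr_len = rd32(p + 4),
		.func_info_off = rd32(p + 8), .func_info_len = rd32(p + 12),
		.line_info_off = rd32(p + 16), .line_info_len = rd32(p + 20),
		.core_relo_off = rd32(p + 24), .core_relo_len = rd32(p + 28),
	};
	return h;
}

/* File offset of core_relo: section offset + hdr_len + core_relo_off. */
static uint32_t core_relo_file_off(uint32_t sec_off, const struct ext_hdr *h)
{
	return sec_off + h->hdr_len + h->core_relo_off;
}

/* The bytes shown by `hexdump -s 0x8b1c8 -n 32`, written out in file order. */
static const uint8_t ext_hdr_bytes[32] = {
	0x9f, 0xeb, 0x01, 0x00, 0x20, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x14, 0x00, 0x00, 0x00,
	0x14, 0x00, 0x00, 0x00, 0x1c, 0x01, 0x00, 0x00,
	0x30, 0x01, 0x00, 0x00, 0x4c, 0x00, 0x00, 0x00,
};
```

The decoded values are self-consistent:
- line_info_off + line_info_len = 0x14 + 0x11c = 0x130 = core_relo_off, so core_relo starts right after the line_info records.
- hdr_len + core_relo_off + core_relo_len = 0x20 + 0x130 + 0x4c = 0x19c, which is the section size reported by readelf.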
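```
# hexdump -s 0x8b1c8 -n 32 build/src/trace.bpf.o
008b1c8 eb9f 0001 0020 0000 0000 0000 0014 0000
008b1d8 0014 0000 011c 0000 0130 0000 004c 0000
```

hexdump prints little-endian 16-bit words, so the dump decodes as:
- magic = 0xeb9f, version = 1, flags = 0, hdr_len = 0x20
- func_info_off = 0, func_info_len = 0x14
- line_info_off = 0x14, line_info_len = 0x11c
- core_relo_off = 0x130, core_relo_len = 0x4c

btf_ext_info_sec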
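The core_relo area referenced by btf_ext_header starts with a single u32, record_size. It gives the size of each record (struct bpf_core_relo) and is followed by `struct btf_ext_info_sec` entries.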
first struct btf_ext_info_sec = core_relo + sizeof(__u32); // skip the record_size field
/usr/src/debug/bpftool-7.2.0-1.x86_64/libbpf/src/libbpf_internal.h: 451

```c
struct btf_ext_info_sec {
	__u32	sec_name_off;  // offset into the string section of .BTF (not .BTF.ext; see btf__name_by_offset)
	__u32	num_info;      // number of records in this section
	/* Followed by num_info * record_size number of bytes */
	__u8	data[];        // the records themselves
};
```

record_size = 0x10
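libbpf's `for_each_btf_ext_sec` / `for_each_btf_ext_rec` walk this layout, and the walk can be sketched with plain offset arithmetic. The helpers below are hypothetical. They use this article's values: core_relo at 0x8b318, record_size = 0x10, and num_info = 4 as shown just below. They assume an 8-byte btf_ext_info_sec header made of sec_name_off and num_info.

```c
#include <assert.h>
#include <stdint.h>

#define RECORD_SIZE_FIELD 4u /* leading __u32 record_size of core_relo */
#define SEC_HDR_SIZE      8u /* sec_name_off + num_info of btf_ext_info_sec */

/* Offset of the first btf_ext_info_sec: skip record_size. */
static uint32_t first_sec(uint32_t core_relo)
{
	return core_relo + RECORD_SIZE_FIELD;
}

/* Offset of record i inside the section that starts at sec. */
static uint32_t rec_at(uint32_t sec, uint32_t rec_size, uint32_t i)
{
	return sec + SEC_HDR_SIZE + i * rec_size;
}

/* Offset of the section that follows sec, num_info records later. */
static uint32_t next_sec(uint32_t sec, uint32_t rec_size, uint32_t num_info)
{
	return sec + SEC_HDR_SIZE + num_info * rec_size;
}
```

With these numbers, the section after the first one would begin at 0x8b364. That is exactly core_relo + core_relo_len (0x8b318 + 0x4c), since 4 + 8 + 4 * 16 = 0x4c. So core_relo in this object holds a single btf_ext_info_sec.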
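```
# hexdump -s 0x8b308 -n 4 build-debug/src/trace.bpf.o
008b308 01b8 0000
```

record_size is the first u32 of core_relo, so it sits at 0x8b318. The dump above starts 0x10 bytes earlier, at 0x8b308. According to the header decoded above, that address lies in the line_info area (0x8b1fc to 0x8b318), so 0x01b8 is not record_size. The value record_size = 0x10 matches sizeof(struct bpf_core_relo).

first struct btf_ext_info_sec = 0x8b318 + 4 = 0x8b31c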
num_info= 4
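```
# hexdump -s 0x8b31c -n 16 build-debug/src/trace.bpf.o
008b31c 004b 0000 0004 0000 0018 0000 000a 0000
```

This decodes as sec_name_off = 0x4b and num_info = 4. The next 8 bytes are the start of the first record.

bpf_core_relo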
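Each btf_ext_info_sec in turn holds btf_ext_info_sec->num_info `struct bpf_core_relo` records.
These records carry the CO-RE information: which instruction (insn_off) needs relocating, and which field it accesses (access_str_off).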
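/usr/include/linux/bpf.h: 7338

```c
struct bpf_core_relo {
	__u32 insn_off;        // instruction offset in bytes; / 8 gives the instruction index
	__u32 type_id;         // BTF type ID (index into the types of .BTF)
	__u32 access_str_off;  // offset into the .BTF string section; the string is the raw accessor spec, e.g. 0:1:2:3
	enum bpf_core_relo_kind kind;
};
```

The bpf_core_relo records start at btf_ext_info_sec->data. Each record is record_size bytes, and there are btf_ext_info_sec->num_info of them.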
/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/libbpf_internal.h: 397

```c
#define for_each_btf_ext_rec(seg, sec, i, rec)				\
	for (i = 0, rec = (void *)&(sec)->data;				\
	     i < (sec)->num_info;					\
	     i++, rec = (void *)rec + (seg)->rec_size)
```

- insn_off = 0x18 bytes. Each instruction is 8 bytes, so this is instruction index 3 (0x18 / 8 = 3).
- type_id = 0x0a
- access_str_off = 0x2ac3
- kind = 0 (BPF_CORE_FIELD_BYTE_OFFSET)
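Decoding one record and turning insn_off into an instruction index can be sketched as follows.
- The 16 bytes are the first record from the dump at 0x8b324. Only that record appears in this article.
- `get_rec()` mimics the `rec_size` stride of `for_each_btf_ext_rec`.
- `insn_index()` mirrors the `% BPF_INSN_SZ` check in `bpf_object__relocate_core`.

Both helpers are hypothetical. The sketch also skips the `prog->sec_insn_off` adjustment that the real loop performs.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_INSN_SZ 8u /* sizeof(struct bpf_insn) */

/* Same field order and sizes as struct bpf_core_relo (kind is a 4-byte enum). */
struct core_relo {
	uint32_t insn_off, type_id, access_str_off, kind;
};

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode record i of a btf_ext_info_sec data[] area, stepping by rec_size. */
static struct core_relo get_rec(const uint8_t *data, uint32_t rec_size, uint32_t i)
{
	const uint8_t *p = data + i * rec_size;
	struct core_relo r = { rd32(p), rd32(p + 4), rd32(p + 8), rd32(p + 12) };
	return r;
}

/* insn_off is in bytes: return the instruction index, or -1 if misaligned. */
static int insn_index(uint32_t insn_off)
{
	if (insn_off % BPF_INSN_SZ)
		return -1;
	return (int)(insn_off / BPF_INSN_SZ);
}

/* First record from `hexdump -s 0x8b324`, in file byte order. */
static const uint8_t first_rec[16] = {
	0x18, 0x00, 0x00, 0x00, 0x0a, 0x00, 0x00, 0x00,
	0xc3, 0x2a, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
};
```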
```
# hexdump -s 0x8b324 -n 32 build-debug/src/trace.bpf.o
008b324 0018 0000 000a 0000 2ac3 0000 0000 0000
```

The corresponding instruction 3:
```
# llvm-objdump -d build/src/trace.bpf.o
       3:	b7 01 00 00 28 00 00 00	r1 = 0x28
```

btf_header->str
access_str_off is an offset into the string section of .BTF.
```
# readelf -S -W build/src/trace.bpf.o
  [Nr] Name              Type            Address          Off    Size   ES Flg Lk Inf Al
  [13] .BTF              PROGBITS        0000000000000000 083c20 0075a6 00      0   0  4
```

/usr/include/linux/btf.h: 11

```c
struct btf_header {
	__u16	magic;     // 0xeB9F; /sys/kernel/btf/vmlinux as a whole is laid out like one .BTF section
	__u8	version;
	__u8	flags;
	__u32	hdr_len;

	/* All offsets are in bytes relative to the end of this header */
	__u32	type_off;	/* offset of type section	*/
	__u32	type_len;	/* length of type section	*/
	__u32	str_off;	/* offset of string section, relative to the end of the header */
	__u32	str_len;	/* length of string section	*/
};
```

btf_header->str = offset of struct btf_header + sizeof(struct btf_header) + btf_header->str_off
0x83c20 + 24 + 0x4640 = 0x88278
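```
# hexdump -s 0x83c20 -n 24 build-debug/src/trace.bpf.o
0083c20 eb9f 0001 0018 0000 0000 0000 4640 0000
0083c30 4640 0000 2f4e 0000
```

This decodes as hdr_len = 0x18 (24), type_off = 0, type_len = 0x4640, str_off = 0x4640 and str_len = 0x2f4e.

Recall that bpf_core_relo->access_str_off = 0x2ac3.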
str = 0x88278 + bpf_core_relo->access_str_off = 0x8ad3b
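The string-section arithmetic can be sketched by decoding the 24-byte btf_header from the dump above. `btf_str_sec()` and `btf_str_at()` are hypothetical helpers that assume a little-endian file. The check of `name_off` against `str_len` is a bounds check added for the sketch.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* File offset of the .BTF string section:
 * section offset + hdr_len + str_off (str_off is relative to the header end). */
static uint32_t btf_str_sec(uint32_t sec_off, const uint8_t *hdr)
{
	uint32_t hdr_len = rd32(hdr + 4);
	uint32_t str_off = rd32(hdr + 16);
	return sec_off + hdr_len + str_off;
}

/* File offset of the string at name_off (e.g. access_str_off), 0 if out of range. */
static uint32_t btf_str_at(uint32_t sec_off, const uint8_t *hdr, uint32_t name_off)
{
	uint32_t str_len = rd32(hdr + 20);

	if (name_off >= str_len)
		return 0;
	return btf_str_sec(sec_off, hdr) + name_off;
}

/* The bytes shown by `hexdump -s 0x83c20 -n 24`, in file byte order. */
static const uint8_t btf_hdr_bytes[24] = {
	0x9f, 0xeb, 0x01, 0x00, 0x18, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x40, 0x46, 0x00, 0x00,
	0x40, 0x46, 0x00, 0x00, 0x4e, 0x2f, 0x00, 0x00,
};
```

libbpf's `btf__name_by_offset()` does the equivalent lookup on BTF data that is already loaded in memory.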
So the stored string is 0:1:
```
# hexdump -C -s 0x8ad3b -n 4 build-debug/src/trace.bpf.o
0008ad3b  30 3a 31 00                                       |0:1.|
```

Locating the information
BTF type 10 (type_id = 0x0a) is struct task_struct.
0:1:
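- The first 0 is the offset obtained by treating the task_struct pointer as an array: offset = val (here 0) * sizeof(task_struct) = 0.
- The next 1 selects member 1 (members are numbered from 0), i.e. `__state`.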
To see how 0:1 is evaluated, read the bpf_core_calc_relo_insn function at /usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/relo_core.c:1296.
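Turning the raw access string into indices can be sketched as a `sscanf("%d%n")` loop, loosely modeled on how libbpf's relo_core.c parses raw specs. `parse_access_str()` is a hypothetical helper. libbpf also caps the spec length and validates each index against the BTF types, which this sketch does not do.

```c
#include <assert.h>
#include <stdio.h>

/* Split a raw CO-RE access string such as "0:1" or "0:1:2:3" into indices.
 * Returns the number of indices, or -1 on malformed input or overflow of idx. */
static int parse_access_str(const char *s, int *idx, int max)
{
	int n = 0, consumed;

	while (*s) {
		if (*s == ':')
			s++; /* separator between indices */
		if (n == max || sscanf(s, "%d%n", &idx[n], &consumed) != 1)
			return -1;
		s += consumed;
		n++;
	}
	return n;
}
```

For "0:1", this returns 2 with `idx[0] == 0` (array index on the root type) and `idx[1] == 1` (member 1 of task_struct). A trailing ':' is rejected.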
```
[10] STRUCT 'task_struct' size=10048 vlen=269
	'thread_info' type_id=11 bits_offset=0
	'__state' type_id=15 bits_offset=320
```

- task_struct
  - size=10048: the size of task_struct in bytes
  - vlen=269: 269 members follow (thread_info, `__state`, ...); see elf.o 文件内容.md
- `__state`
  - type_id=15: `__state` is described by entry 15 of .BTF; `# bpftool btf dump file build-debug/src/trace.bpf.o` shows `[15] INT 'unsigned int' size=4 bits_offset=0 nr_bits=32 encoding=(none)`
  - bits_offset=320: the offset of `__state` within task_struct, in bits, i.e. 40 bytes

0:1 = 0 * sizeof(task_struct) + offset of member 1 of task_struct, `__state` (bits_offset 320 / 8 = 40) = 40
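For this simple two-level case, the offset computation reduces to `idx0 * sizeof(root) + bits_offset / 8`. The sketch hard-codes only the two task_struct members quoted above and omits the other 267. `task_struct_btf` is therefore a hypothetical table, not real BTF parsing. libbpf's general code (`bpf_core_calc_field_relo` in relo_core.c) also handles nested members, arrays and bitfields.

```c
#include <assert.h>
#include <stdint.h>

/* A tiny stand-in for a BTF struct: only the members quoted from bpftool. */
struct member { const char *name; uint32_t bits_offset; };
struct struct_type { const char *name; uint32_t size; const struct member *m; int vlen; };

static const struct member task_members[] = {
	{ "thread_info", 0 },
	{ "__state", 320 },
	/* the remaining 267 members are omitted in this sketch */
};
static const struct struct_type task_struct_btf = { "task_struct", 10048, task_members, 2 };

/* Byte offset for a two-level access spec "a:b":
 * a * sizeof(struct) + bits_offset(member b) / 8.
 * Returns -1 for an out-of-range member or a non-byte-aligned (bitfield) offset. */
static long field_byte_off(const struct struct_type *t, int arr_idx, int mem_idx)
{
	if (mem_idx < 0 || mem_idx >= t->vlen || t->m[mem_idx].bits_offset % 8)
		return -1;
	return (long)arr_idx * t->size + t->m[mem_idx].bits_offset / 8;
}
```

`field_byte_off(&task_struct_btf, 0, 1)` gives 40, the value that appears as `0x28` in instruction 3.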
Patching the instruction: bpf_core_patch_insn
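The steps above located instruction 3, which corresponds to task_struct->__state with the original offset 40. All that remains is to look up task_struct->__state in the vmlinux BTF and replace 40 with the offset found there.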
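On the target side, the field is matched by name. The sketch below looks `__state` up in a member table. `target_task` is a made-up layout, with a hypothetical field and `__state` moved to bit 192, used only to show the offset changing. `find_member_off()` is a hypothetical helper. libbpf's real matching also considers type compatibility and anonymous/nested members.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct member { const char *name; uint32_t bits_offset; };

/* Look a member up by name; return its byte offset, or -1 if it is absent
 * or not byte-aligned. */
static long find_member_off(const struct member *m, int vlen, const char *name)
{
	for (int i = 0; i < vlen; i++)
		if (strcmp(m[i].name, name) == 0)
			return m[i].bits_offset % 8 ? -1 : (long)(m[i].bits_offset / 8);
	return -1;
}

/* HYPOTHETICAL target-kernel layout, not taken from any real kernel. */
static const struct member target_task[] = {
	{ "thread_info", 0 },
	{ "hypothetical_field", 128 },
	{ "__state", 192 },
};
```

With this hypothetical layout, the new offset is 24. Instruction 3's immediate would therefore change from 40 to 24.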
```
# llvm-objdump -d build/src/trace.bpf.o
       3:	b7 01 00 00 28 00 00 00	r1 = 0x28
```

/usr/src/debug/libbpf-1.2.2-11.oe2403sp2.x86_64/src/relo_core.c: 1023 (excerpt)

```c
/*
 * Patch relocatable BPF instruction.
 *
 * Patched value is determined by relocation kind and target specification.
 * For existence relocations target spec will be NULL if field/type is not found.
 * Expected insn->imm value is determined using relocation kind and local
 * spec, and is checked before patching instruction. If actual insn->imm value
 * is wrong, bail out with error.
 *
 * Currently supported classes of BPF instruction are:
 * 1. rX = <imm> (assignment with immediate operand);
 * 2. rX += <imm> (arithmetic operations with immediate operand);
 * 3. rX = <imm64> (load with 64-bit immediate value);
 * 4. rX = *(T *)(rY + <off>), where T is one of {u8, u16, u32, u64};
 * 5. *(T *)(rX + <off>) = rY, where T is one of {u8, u16, u32, u64};
 * 6. *(T *)(rX + <off>) = <imm>, where T is one of {u8, u16, u32, u64}.
 */
int bpf_core_patch_insn(const char *prog_name, struct bpf_insn *insn,
			int insn_idx, const struct bpf_core_relo *relo,
			int relo_idx, const struct bpf_core_relo_res *res)
{
	__u64 orig_val, new_val;
	__u8 class;

	class = BPF_CLASS(insn->code);
	new_val = res->new_val;

	switch (class) {
	case BPF_ALU:
	case BPF_ALU64:
		insn->imm = new_val;  // patch the immediate
		break;
	/* ... remaining cases omitted ... */
	}
	/* ... */
}
```

Instruction 3 has opcode 0xb7 (BPF_ALU64 | BPF_MOV | BPF_K), so it takes the BPF_ALU64 case. Its 32-bit immediate 0x28 (40) is overwritten with the offset of `__state` in the target kernel.
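The ALU branch of the patch step can be sketched as two operations: decode instruction 3 from its objdump bytes, then rewrite the immediate.
- The macros mirror the values in linux/bpf_common.h (class mask 0x07, BPF_ALU = 0x04, BPF_ALU64 = 0x07, source mask 0x08, BPF_K = 0x00) but are redefined locally.
- `decode()` and `patch_alu_imm()` are hypothetical helpers.
- The decoding assumes a little-endian object.
- The new offset 24 used in the example is an assumed target value.

```c
#include <assert.h>
#include <stdint.h>

#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SRC(code)   ((code) & 0x08)
#define BPF_ALU   0x04
#define BPF_ALU64 0x07
#define BPF_K     0x00

/* Decoded view of an 8-byte struct bpf_insn (little-endian layout). */
struct insn {
	uint8_t code, dst_reg, src_reg;
	int16_t off;
	int32_t imm;
};

static struct insn decode(const uint8_t b[8])
{
	struct insn i = {
		.code = b[0], .dst_reg = b[1] & 0x0f, .src_reg = b[1] >> 4,
		.off = (int16_t)(b[2] | b[3] << 8),
		.imm = (int32_t)((uint32_t)b[4] | (uint32_t)b[5] << 8 |
				 (uint32_t)b[6] << 16 | (uint32_t)b[7] << 24),
	};
	return i;
}

/* Simplified ALU/ALU64 case: only "op with immediate" forms are patchable, and
 * the current immediate must equal the expected local value. Returns 0 or -1. */
static int patch_alu_imm(struct insn *in, int32_t orig_val, int32_t new_val)
{
	uint8_t cls = BPF_CLASS(in->code);

	if ((cls != BPF_ALU && cls != BPF_ALU64) || BPF_SRC(in->code) != BPF_K)
		return -1;
	if (in->imm != orig_val)
		return -1; /* unexpected immediate: refuse to patch */
	in->imm = new_val;
	return 0;
}

/* Bytes of instruction 3 from llvm-objdump: r1 = 0x28 */
static const uint8_t insn3[8] = { 0xb7, 0x01, 0x00, 0x00, 0x28, 0x00, 0x00, 0x00 };
```

`patch_alu_imm(&in, 40, 24)` succeeds and leaves `in.imm == 24`. A second call that still expects 40 is refused because the immediate no longer matches. This is similar in spirit to the `insn->imm` check described in the comment above `bpf_core_patch_insn`.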