Fortunately, the powerful Valgrind can help us detect these problems.
Case 1: Memory Leak
The most common mistake is allocating memory, forgetting to free it, and then losing every pointer to it. Here is a simple example, checked with valgrind:
#include <stdlib.h>

void func (void)
{
    char *buff = malloc(10);    // never freed: the block leaks when func() returns
}

int main (void)
{
    func();
    return 0;
}
Compile and run it (remember to compile with -g, so that the valgrind report can show the offending line numbers):
# gcc -g -o mem_test mem_test.c
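Valgrind is easy to use and requires no changes to your code: launch your program through valgrind, append the program's own arguments if it takes any, and a report is printed when the program exits.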
valgrind [valgrind-options] ./my_program [program-arguments]
# valgrind --leak-check=full ./mem_test
==59095== Memcheck, a memory error detector
==59095== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==59095== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==59095== Command: ./mem_test
==59095==
==59095==
==59095== HEAP SUMMARY:
==59095== in use at exit: 10 bytes in 1 blocks
==59095== total heap usage: 1 allocs, 0 frees, 10 bytes allocated
==59095==
==59095== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==59095== at 0x4C2C857: malloc (vg_replace_malloc.c:291)
==59095== by 0x400505: func (mem_test.c:5)
==59095== by 0x400514: main (mem_test.c:10)
==59095==
==59095== LEAK SUMMARY:
==59095== definitely lost: 10 bytes in 1 blocks
==59095== indirectly lost: 0 bytes in 0 blocks
==59095== possibly lost: 0 bytes in 0 blocks
==59095== still reachable: 0 bytes in 0 blocks
==59095== suppressed: 0 bytes in 0 blocks
==59095==
==59095== For counts of detected and suppressed errors, rerun with: -v
==59095== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
The report shows one 10-byte block classified as "definitely lost", allocated by malloc() in func() at line 5 of mem_test.c.
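A minimal sketch of two ways to remove this leak, assuming the buffer is only needed briefly: free it before the last pointer to it goes out of scope, or return it so the caller owns it and frees it. The names func_fixed and make_buffer are hypothetical and not part of the original program.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical fixed version of func(): the 10-byte buffer is released
// before buff, the only pointer to it, goes out of scope.
void func_fixed(void)
{
    char *buff = malloc(10);
    if (buff == NULL)
        return;                 // allocation failed: nothing to free
    memset(buff, 0, 10);        // ... use the buffer ...
    free(buff);                 // no "definitely lost" block at exit
}

// Hypothetical alternative: hand the pointer back, so the caller owns
// the memory and is responsible for calling free() on it.
char *make_buffer(size_t size)
{
    char *buff = malloc(size);
    if (buff != NULL)
        memset(buff, 0, size);  // zero-fill so the contents are defined
    return buff;
}
```

Running valgrind --leak-check=full on a program that calls func_fixed(), or that frees the pointer returned by make_buffer(), should no longer report a definitely lost block.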
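As you can see, lost memory is classified into several kinds:
- definitely lost: a genuine memory leak.
- indirectly lost: an indirect leak. When a structure itself leaks, any member that was allocated separately (and is pointed to only by that structure) leaks along with it. Once the first leak is fixed, this one is fixed as well.
- possibly lost: a block was allocated and its address stored in a pointer ptr, but ptr was later changed to point into the middle of that block, so only an interior pointer remains. (I'm not entirely clear on this one myself; I suggest reading the original Valgrind documentation.)
- still reachable: memory that was not freed when the program exited but is still pointed to, typically by a global variable.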
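To see these categories side by side, the sketch below deliberately leaks one block of each kind; create_leak_examples, struct record, still_reachable and interior_ptr are hypothetical names. Built with -g and no optimisation, called from main() and run under valgrind --leak-check=full, the struct should typically be reported as definitely lost, its name member as indirectly lost, the block reachable only through interior_ptr as possibly lost, and the block held by still_reachable as still reachable. The exact report may vary between Valgrind versions and compiler settings.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical demo: every allocation below is intentionally left
// unfreed so that valgrind --leak-check=full can classify it.

struct record {
    char *name;                 // separately allocated member
};

static char *still_reachable;   // global that still points at the start of its block
static char *interior_ptr;      // global that points into the middle of a block

void create_leak_examples(void)
{
    // definitely lost: the only pointer to the struct is the local rec,
    // which disappears when this function returns.
    // indirectly lost: rec->name is reachable only through that lost struct.
    struct record *rec = malloc(sizeof *rec);
    if (rec != NULL) {
        rec->name = malloc(16);
        if (rec->name != NULL)
            strcpy(rec->name, "Brian");
    }

    // possibly lost: only a pointer into the middle of the block survives.
    char *block = malloc(32);
    if (block != NULL)
        interior_ptr = block + 5;

    // still reachable: a global still holds the start address at exit.
    still_reachable = malloc(64);
}
```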
Case 2: Invalid Memory Access
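An invalid memory access does not always cause an immediate segmentation fault, so there may be no core dump to inspect; a tool such as valgrind is needed to detect it. Typical cases are accessing memory outside an allocated block, or using memory that has already been freed. Consider the following examples: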
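#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
    // 1. Invalid write
    char *str = malloc(4);
    strcpy(str, "Brian");   // needs 6 bytes ("Brian" + '\0'), only 4 allocated
    free(str);

    // 2. Invalid read
    int *arr = malloc(3);   // 3 bytes, not 3 ints
    printf("%d", arr[4]);   // reads far past the end of the block
    free(arr);

    // 3. Invalid read
    printf("%d", arr[0]);   // use after free

    // 4. Invalid free
    free(arr);              // double free
    return 0;
}
Compile and run it: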
==60019== Memcheck, a memory error detector
==60019== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==60019== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==60019== Command: ./mem_test
==60019==
==60019== Invalid write of size 2
==60019== at 0x4005AB: main (mem_test.c:9)
==60019== Address 0x51f4044 is 0 bytes after a block of size 4 alloc'd
==60019== at 0x4C2C857: malloc (vg_replace_malloc.c:291)
==60019== by 0x400595: main (mem_test.c:8)
==60019==
==60019== Invalid read of size 4
==60019== at 0x4005D1: main (mem_test.c:14)
==60019== Address 0x51f40a0 is 13 bytes after a block of size 3 alloc'd
==60019== at 0x4C2C857: malloc (vg_replace_malloc.c:291)
==60019== by 0x4005C4: main (mem_test.c:13)
==60019==
==60019== Invalid read of size 4
==60019== at 0x4005F7: main (mem_test.c:18)
==60019== Address 0x51f4090 is 0 bytes inside a block of size 3 free'd
==60019== at 0x4C2B75D: free (vg_replace_malloc.c:468)
==60019== by 0x4005F2: main (mem_test.c:15)
==60019==
==60019== Invalid free() / delete / delete[] / realloc()
==60019== at 0x4C2B75D: free (vg_replace_malloc.c:468)
==60019== by 0x400618: main (mem_test.c:21)
==60019== Address 0x51f4090 is 0 bytes inside a block of size 3 free'd
==60019== at 0x4C2B75D: free (vg_replace_malloc.c:468)
==60019== by 0x4005F2: main (mem_test.c:15)
==60019==
00==60019==
==60019== HEAP SUMMARY:
==60019== in use at exit: 0 bytes in 0 blocks
==60019== total heap usage: 2 allocs, 3 frees, 7 bytes allocated
==60019==
==60019== All heap blocks were freed -- no leaks are possible
==60019==
==60019== For counts of detected and suppressed errors, rerun with: -v
==60019== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
Error 1: Invalid write of size 2, an attempt to write to an invalid region. Valgrind helpfully tells you that the address is 0 bytes after (that is, immediately past the end of) the 4-byte block allocated at mem_test.c:8. This usually happens when a buffer is used without first checking its size; here strcpy() copies "Brian" plus its terminating '\0', 6 bytes in total, into a 4-byte buffer.
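Error 2: Invalid read of size 4, an attempt to read from an invalid region: arr[4] lies 13 bytes past the end of the 3-byte block allocated at mem_test.c:13.
Error 3: Invalid read of size 4, reading a region that has already been freed. Valgrind also points out where it was freed: mem_test.c:15.
Error 4: Invalid free, i.e. freeing an address that is not a live heap block, or a double free. Here it is a double free, since arr was already freed at mem_test.c:15.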
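For comparison, here is a sketch of how the four errors could be avoided. It is not the author's code, and dup_string and sum_first_n are hypothetical helpers: size the string buffer from strlen() plus one byte for the terminator, allocate n * sizeof(int) bytes and index only 0..n-1, and never touch a pointer after passing it to free().

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Fix for error 1: size the buffer from the source string, +1 for '\0'.
char *dup_string(const char *src)
{
    size_t len = strlen(src) + 1;
    char *dst = malloc(len);
    if (dst != NULL)
        memcpy(dst, src, len);  // copies the terminator too
    return dst;
}

// Fixes for errors 2-4: fills an array of n ints with 0..n-1 and
// returns their sum, or -1 if the allocation fails.
int sum_first_n(size_t n)
{
    int *arr = malloc(n * sizeof *arr);   // n ints, not n bytes
    if (arr == NULL)
        return -1;

    int sum = 0;
    for (size_t i = 0; i < n; i++) {      // only indices 0..n-1
        arr[i] = (int)i;
        sum += arr[i];
    }
    free(arr);
    // No access to arr after this point. Clearing the pointer makes an
    // accidental second free(arr) harmless, because free(NULL) does nothing.
    arr = NULL;
    return sum;
}
```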
Other notes
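A "Conditional jump or move depends on uninitialised value(s)" error means the program branches on a value that was never initialised; one possible cause is using a string that lacks its terminating null character (i.e. is not null-terminated).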
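One way to avoid unterminated strings is to write the terminator explicitly whenever bytes are copied with memcpy() or strncpy(); note that strncpy() does not add one when it truncates. A minimal sketch, with copy_prefix as a hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a heap copy of at most max bytes of src,
// always terminated with '\0'. Filling a malloc() buffer with memcpy()
// but forgetting the terminator is a typical way to make a later
// strlen() or printf("%s") read uninitialised bytes.
// The caller must free() the result.
char *copy_prefix(const char *src, size_t max)
{
    size_t n = strlen(src);
    if (n > max)
        n = max;                    // truncate to at most max characters
    char *dst = malloc(n + 1);      // +1 for the terminator
    if (dst == NULL)
        return NULL;
    memcpy(dst, src, n);
    dst[n] = '\0';                  // without this, dst is not a C string
    return dst;
}
```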
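One kind of bug is hard to track down: for example, function A uses memory produced by function B. Nothing about this is illegal from the system's point of view, yet it causes the program to end up with unexpected values.
Out-of-bounds access to a local variable can also corrupt the stack and produce an error like this: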
*** stack smashing detected ***: ./mem_test terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x64f8f47]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x64f8f10]
./mem_test(func+0x2db)[0x407f22]
======= Memory map: ========
00400000-00420000 r-xp 00000000 08:01 280804 /tmp/mem_test
0061f000-00620000 r--p 0001f000 08:01 280804 /tmp/mem_test
00620000-00622000 rw-p 00020000 08:01 280804 /tmp/mem_test
00622000-00623000 rw-p 00000000 00:00 0
04000000-04022000 r-xp 00000000 08:01 160861 /lib/x86_64-linux-gnu/ld-2.15.so
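A sketch of a bounded alternative for a fixed-size local buffer; greeting_length is a hypothetical function. Whether an overflow is caught as "stack smashing" at all depends on the compiler's stack protector (for example GCC's -fstack-protector options); without it the overwrite may go unnoticed.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical example: buf lives on the stack, so an unchecked
// strcpy(buf, name) or sprintf(buf, ...) with a long name would write
// past the array and overwrite neighbouring stack data.
// snprintf() writes at most sizeof buf bytes, including the '\0'.
size_t greeting_length(const char *name)
{
    char buf[16];
    int needed = snprintf(buf, sizeof buf, "Hi, %s", name);
    if (needed < 0)
        return 0;               // output error: nothing usable in buf
    if ((size_t)needed >= sizeof buf)
        fprintf(stderr, "greeting truncated (%d chars needed)\n", needed);
    return strlen(buf);         // length of what actually fit
}
```

With this version, a long name is truncated to 15 characters instead of overrunning the 16-byte array.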
If you don't want possibly lost blocks to be shown, add the following option:
Valgrind 3.9: --show-leak-kinds=definite
Valgrind 3.7: --show-possibly-lost=no
File descriptors that are opened but never closed can also be detected; just add --track-fds=yes.
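As an illustration of what --track-fds=yes checks, the sketch below opens a file with fopen() and closes it on every path where the open succeeded; count_lines is a hypothetical helper. If the fclose() call were removed, the descriptor would still be open at exit and should be listed among the descriptors Valgrind reports as open at exit.

```c
#include <stdio.h>

// Hypothetical example: counts '\n' characters in a file.
// Returns -1 if the file cannot be opened.
long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;              // nothing was opened, nothing to close

    long lines = 0;
    int c;
    while ((c = fgetc(fp)) != EOF) {
        if (c == '\n')
            lines++;
    }
    fclose(fp);                 // omitting this leaks the file descriptor
    return lines;
}
```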
I once ran into valgrind printing error messages over and over on Ubuntu 12.04; the fix was to install an extra package: sudo apt-get install libc6-dbg
12.04 - Valgrind does debug error - Ask Ubuntu