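Writing C demands precision about memory management; unlike higher-level languages, there is no garbage collection. Good habits reduce mistakes: make every malloc() appear with a matching free(), written at the same level. Still, oversights are inevitable. I write daemons, long-running service programs, and when one of them has a memory leak you watch the system's memory get eaten away bit by bit until a restart is unavoidable. On a production server, that means interrupting the service!
Fortunately, the powerful Valgrind can help us detect these problems.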
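As a minimal sketch of the pairing habit described above, the function below allocates and releases its buffer in the same function and at the same nesting level. The name greet() and its return convention are made up for illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: builds a greeting in a temporary heap buffer,
 * prints it, and returns its length, or -1 on allocation failure. */
int greet(const char *name)
{
    size_t len = strlen("Hello, ") + strlen(name) + 1; /* +1 for '\0' */
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;                 /* nothing allocated, nothing to free */

    snprintf(buf, len, "Hello, %s", name);
    puts(buf);
    int written = (int)strlen(buf);

    free(buf);                     /* same function, same level as malloc() */
    return written;
}
```

Keeping the two calls visually close makes a missing free() easier to spot in review.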
Case 1: Memory Leak
The most common mistake is allocating memory and forgetting to free it, with no pointer left pointing to it.
Here is a simple example, using valgrind to help detect it:
#include <stdlib.h>

void func (void)
{
    char *buff = malloc(10);
}

int main (void)
{
    func();
    return 0;
}
Compile and run (remember to compile with -g so that valgrind's report can point to the offending line numbers):
# gcc -g -o mem_test mem_test.c
Valgrind is simple to use and needs no code changes: just launch your program under valgrind, append any program arguments after it, and a report is produced when the program exits.
valgrind [valgrind options] ./my_program [program arguments]
# valgrind --leak-check=full ./mem_test
==59095== Memcheck, a memory error detector
==59095== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==59095== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==59095== Command: ./mem_test
==59095==
==59095==
==59095== HEAP SUMMARY:
==59095== in use at exit: 10 bytes in 1 blocks
==59095== total heap usage: 1 allocs, 0 frees, 10 bytes allocated
==59095==
==59095== 10 bytes in 1 blocks are definitely lost in loss record 1 of 1
==59095== at 0x4C2C857: malloc (vg_replace_malloc.c:291)
==59095== by 0x400505: func (mem_test.c:5)
==59095== by 0x400514: main (mem_test.c:10)
==59095==
==59095== LEAK SUMMARY:
==59095== definitely lost: 10 bytes in 1 blocks
==59095== indirectly lost: 0 bytes in 0 blocks
==59095== possibly lost: 0 bytes in 0 blocks
==59095== still reachable: 0 bytes in 0 blocks
==59095== suppressed: 0 bytes in 0 blocks
==59095==
==59095== For counts of detected and suppressed errors, rerun with: -v
==59095== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
The report shows one 10-byte block detected as "definitely lost", allocated at line 5 of mem_test.c.
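A possible fix, sketched under the assumption that the buffer is only needed inside the function: release it before returning. The name func_fixed(), the int return value and the memset() call are additions for illustration and are not part of the original example.

```c
#include <stdlib.h>
#include <string.h>

/* Same shape as func() above, but the buffer is released before returning.
 * Returns 0 on success, -1 if the allocation fails. */
int func_fixed(void)
{
    char *buff = malloc(10);
    if (buff == NULL)
        return -1;

    memset(buff, 0, 10);   /* stand-in for real work on the buffer */

    free(buff);            /* the call missing from the original func() */
    return 0;
}
```

With a change along these lines, the "definitely lost" record for this block should no longer appear in the report.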
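As you can see, lost memory is divided into several kinds:
- definitely lost: a real memory leak.
- indirectly lost: an indirect leak. The structure itself has leaked, and any members inside it that were separately allocated leak along with it. Once the first problem is fixed, these go away too.
- possibly lost: memory was allocated and stored in a pointer ptr, but ptr was later changed to point into the middle of that block (I am not entirely clear on this one myself; see the original documentation).
- still reachable: memory that was not freed when the program ended but is still pointed to by some pointer. This usually involves global variables.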
Even memory that a library allocates for you, for example through evbuffer_new(), can be detected.
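To make the leak kinds listed above more concrete, here is a deliberately leaky sketch. All names (struct record, g_cache, g_cursor, make_leaks) are hypothetical, and the categories in the comments are what Memcheck would typically be expected to report; the exact result can vary with the compiler and optimisation level.

```c
#include <stdlib.h>

struct record {
    char *name;       /* separately allocated member */
};

char *g_cache;        /* global pointer to the start of a block */
char *g_cursor;       /* global pointer into the middle of a block */

/* Hypothetical function that sets up one block of each kind.
 * Returns 0 on success, -1 if any allocation fails. */
int make_leaks(void)
{
    struct record *rec = malloc(sizeof *rec);
    if (rec == NULL)
        return -1;
    rec->name = malloc(16);
    if (rec->name == NULL) {
        free(rec);
        return -1;
    }
    /* rec is dropped without free(): the struct would typically be
     * reported as definitely lost, and rec->name as indirectly lost. */

    g_cache = malloc(32);
    if (g_cache == NULL)
        return -1;
    /* Never freed, but a global still points to it: still reachable. */

    char *block = malloc(64);
    if (block == NULL)
        return -1;
    g_cursor = block + 8;
    /* Only a pointer into the middle survives: possibly lost. */

    return 0;
}
```

Calling make_leaks() from main() and running the program under valgrind --leak-check=full is one way to see the categories side by side.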
Case 2: Invalid Memory Access
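An invalid memory access does not always cause an immediate segmentation fault, so there may be no core dump to examine; you need a tool like valgrind to detect it.
Typical cases are touching memory outside the allocated block, or using memory that has already been freed. Consider the following examples: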
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

int main (void)
{
    // 1. Invalid write
    char *str = malloc(4);
    strcpy(str, "Brian");
    free(str);

    // 2. Invalid read
    int *arr = malloc(3);
    printf("%d", arr[4]);
    free(arr);

    // 3. Invalid read
    printf("%d", arr[0]);

    // 4. Invalid free
    free(arr);

    return 0;
}
Compile and run:
==60019== Memcheck, a memory error detector
==60019== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==60019== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==60019== Command: ./mem_test
==60019==
==60019== Invalid write of size 2
==60019== at 0x4005AB: main (mem_test.c:9)
==60019== Address 0x51f4044 is 0 bytes after a block of size 4 alloc'd
==60019== at 0x4C2C857: malloc (vg_replace_malloc.c:291)
==60019== by 0x400595: main (mem_test.c:8)
==60019==
==60019== Invalid read of size 4
==60019== at 0x4005D1: main (mem_test.c:14)
==60019== Address 0x51f40a0 is 13 bytes after a block of size 3 alloc'd
==60019== at 0x4C2C857: malloc (vg_replace_malloc.c:291)
==60019== by 0x4005C4: main (mem_test.c:13)
==60019==
==60019== Invalid read of size 4
==60019== at 0x4005F7: main (mem_test.c:18)
==60019== Address 0x51f4090 is 0 bytes inside a block of size 3 free'd
==60019== at 0x4C2B75D: free (vg_replace_malloc.c:468)
==60019== by 0x4005F2: main (mem_test.c:15)
==60019==
==60019== Invalid free() / delete / delete[] / realloc()
==60019== at 0x4C2B75D: free (vg_replace_malloc.c:468)
==60019== by 0x400618: main (mem_test.c:21)
==60019== Address 0x51f4090 is 0 bytes inside a block of size 3 free'd
==60019== at 0x4C2B75D: free (vg_replace_malloc.c:468)
==60019== by 0x4005F2: main (mem_test.c:15)
==60019==
00==60019==
==60019== HEAP SUMMARY:
==60019== in use at exit: 0 bytes in 0 blocks
==60019== total heap usage: 2 allocs, 3 frees, 7 bytes allocated
==60019==
==60019== All heap blocks were freed -- no leaks are possible
==60019==
==60019== For counts of detected and suppressed errors, rerun with: -v
==60019== ERROR SUMMARY: 4 errors from 4 contexts (suppressed: 0 from 0)
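Error 1: Invalid write of size 2. The program tried to write to an invalid region, and valgrind helpfully tells you that the address lies 0 bytes after the 4-byte block allocated at mem_test.c:8, that is, immediately past its end. This usually happens when a buffer is used without checking its size first.
Error 2: Invalid read of size 4. The program tried to read from an invalid region; here the address lies beyond the 3-byte block allocated at mem_test.c:13.
Error 3: Invalid read of size 4. The region being read has already been freed, and valgrind also points out where it was freed: mem_test.c:15.
Error 4: Invalid free, meaning an attempt to free an address that is not a live allocation, or a double free (as in this example).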
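One way to avoid the four errors above, sketched with hypothetical helpers (copy_string, make_ints and free_ints are not from the original program): size the buffer from the data, multiply by the element size, and clear the pointer after freeing it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src sized to fit,
 * including the terminating '\0'. The caller frees the result. */
char *copy_string(const char *src)
{
    size_t size = strlen(src) + 1;
    char *dst = malloc(size);
    if (dst != NULL)
        memcpy(dst, src, size);
    return dst;
}

/* Hypothetical helper: allocates count zeroed ints. Note the element
 * size; malloc(3) reserves 3 bytes, not room for 3 ints. */
int *make_ints(size_t count)
{
    return calloc(count, sizeof(int));
}

/* Hypothetical helper: frees *p and clears it, so a second call is a
 * no-op instead of a double free. */
void free_ints(int **p)
{
    free(*p);      /* free(NULL) is defined to do nothing */
    *p = NULL;
}
```

Clearing the pointer does not make a later read valid, but it turns a silent use-after-free into a NULL dereference that fails loudly.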
Other notes
If you run into a "Conditional jump or move depends on uninitialised value(s)" error,
the cause may be a string that has no terminating character (that is, it is not null-terminated).
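A common way to end up with an unterminated string is strncpy(), which writes no '\0' when the source is as long as the limit or longer. The sketch below, using a hypothetical helper named copy_bounded(), always terminates the destination explicitly.

```c
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 characters of src
 * into dst and always terminates the result.
 * Returns the length of the stored string. */
size_t copy_bounded(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;

    /* strncpy() does not write a '\0' when src has dst_size - 1 or
     * more characters, so the terminator is added explicitly. */
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
    return strlen(dst);
}
```

Without the explicit terminator, a later strlen() or printf("%s") on a freshly allocated buffer may read past the copied characters into uninitialised bytes, which is the kind of access this message tends to flag.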
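One kind of bug is hard to track down: function A ends up using memory that was produced by function B. Nothing about this is illegal from the system's point of view, yet the program shows unexpected values.
Accessing a local variable out of bounds can also corrupt the stack, which then produces an error like this: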
*** stack smashing detected ***: ./mem_test terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x64f8f47]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x0)[0x64f8f10]
./mem_test(func+0x2db)[0x407f22]
======= Memory map: ========
00400000-00420000 r-xp 00000000 08:01 280804 /tmp/mem_test
0061f000-00620000 r--p 0001f000 08:01 280804 /tmp/mem_test
00620000-00622000 rw-p 00020000 08:01 280804 /tmp/mem_test
00622000-00623000 rw-p 00000000 00:00 0
04000000-04022000 r-xp 00000000 08:01 160861 /lib/x86_64-linux-gnu/ld-2.15.so
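Overruns of local buffers often come from unbounded writes such as sprintf() or strcpy(). As a sketch of the bounded alternative, the hypothetical helper format_id() below passes the buffer size to snprintf() and reports truncation instead of writing past the end.

```c
#include <stdio.h>

/* Hypothetical helper: formats "id-<n>" into a caller-supplied buffer.
 * Returns 0 if the text fit, -1 if it was truncated or an error occurred. */
int format_id(char *out, size_t out_size, int id)
{
    /* snprintf() never writes more than out_size bytes, unlike sprintf(). */
    int needed = snprintf(out, out_size, "id-%d", id);
    if (needed < 0 || (size_t)needed >= out_size)
        return -1;
    return 0;
}
```

A caller would typically pass a local array together with sizeof, for example format_id(buf, sizeof buf, 42).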
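If you do not want possibly lost blocks to be shown, add one of the following options:
Version 3.9: --show-leak-kinds=definite
Version 3.7: --show-possibly-lost=no
File descriptors that were opened but never closed can be detected as well; just add --track-fds=yes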
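The same pairing discipline used for malloc() and free() applies to file descriptors. A minimal sketch using POSIX open(), read() and close(); the helper name count_bytes() is hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: counts the bytes in a file.
 * Returns the byte count, or -1 if the file cannot be opened or read. */
long count_bytes(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;                /* nothing opened, nothing to close */

    char chunk[4096];
    long total = 0;
    ssize_t n;
    while ((n = read(fd, chunk, sizeof chunk)) > 0)
        total += n;

    close(fd);                    /* reached after both success and read errors */
    return n < 0 ? -1 : total;
}
```

If the close() call were left out, the descriptor would still be open at exit, which is the situation --track-fds=yes is meant to report.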
I once ran into valgrind printing error messages continually on Ubuntu 12.04; remember to install an extra package: sudo apt-get install libc6-dbg
12.04 - Valgrind does debug error - Ask Ubuntu