10 Figures and 22 Code Listings: A Long Read on the Virtual Memory Model and malloc Internals
Author: 程序喵大人 2020-11-11 08:25:45
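This article is reposted from the WeChat official account 「程序喵大人」, written by 程序喵大人. To repost it, please contact the 程序喵大人 official account.
Cards on the table: I am actually 程序喵's wife, the one who works hard all day and then comes home to edit this account until the middle of the night. I hope you will all share this widely and help me reach my KPI (my dream) of one thousand reads. Thank you!
Ahem, I digress. What follows is 程序喵's rambling. Please do me a favor and scroll to the end to tap "Wow" or like, to prove that I am more popular than 程序喵. Thank you!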
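Exploring virtual memory through the /proc filesystem
We will use the /proc filesystem to find the virtual memory address of a string in a running process, then change the string by overwriting the contents at that address. This should give you a more concrete understanding of virtual memory. First, a definition of virtual memory.
Virtual memory
Virtual memory is a memory management technique implemented jointly by hardware and software. It maps the memory addresses a program uses (virtual addresses) to physical addresses in the computer's memory. This frees applications from the tedious job of managing the memory space themselves and improves security through memory isolation. A process's virtual addresses usually form a contiguous address space managed by the operating system's memory management module, and physical memory is assigned to a virtual page through paging only when a page fault is triggered. On a 64-bit machine the virtual address space is also far larger than the physical memory actually installed, so a process can use more memory than physically exists.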
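As a rough illustration of the mapping and the page-fault step described above, here is a toy model in Python. It is only a sketch: the 4 KiB page size, the `ToyAddressSpace` class and its fields are assumptions made for this example, and a real kernel uses hardware page tables rather than a dictionary.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, a common size on x86-64 Linux


class ToyAddressSpace:
    """Toy model of one process: virtual pages get a physical frame on first access."""

    def __init__(self):
        self.page_table = {}   # virtual page number -> physical frame number
        self.next_frame = 0    # next unused physical frame (hypothetical allocator)
        self.page_faults = 0   # how many first-touch faults were handled

    def translate(self, vaddr):
        """Return the physical address that backs the virtual address vaddr."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.page_table:
            # "Page fault": no frame backs this page yet, so assign one now.
            self.page_faults += 1
            self.page_table[vpn] = self.next_frame
            self.next_frame += 1
        return self.page_table[vpn] * PAGE_SIZE + offset


if __name__ == "__main__":
    space = ToyAddressSpace()
    print(hex(space.translate(0x88F010)))  # first touch of this page: one fault
    print(hex(space.translate(0x88F020)))  # same page: no new fault
    print(space.page_faults)
```

Two addresses in the same page share one frame, and widely separated virtual pages can end up in neighbouring frames, which is why a large virtual address space does not require an equally large physical memory.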
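Before digging into virtual memory, a few key points:
- Each process has its own virtual memory
- The size of the virtual memory depends on the system architecture
- Different operating systems manage virtual memory differently, but most of them lay it out as in the figure below: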
virtual_memory.png
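The figure above is not a particularly detailed memory map; the high addresses also contain kernel space, for instance, but that is not the topic of this article. The figure shows that the high addresses hold the command-line arguments and environment variables, followed by the stack, the heap, and the executable. The stack grows downward and the heap grows upward. Heap space is obtained with malloc and is part of dynamically allocated memory.
Let's start exploring virtual memory with a simple C program.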
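- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - uses strdup to create a copy of a string; strdup allocates the
- * space internally with malloc and returns the address of the new space,
- * which the caller is responsible for releasing with free
- *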
- * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
- */
- int main(void)
- {
- char *s;
- s = strdup("test_memory");
- if (s == NULL)
- {
- fprintf(stderr, "Can't allocate mem with malloc\n");
- return (EXIT_FAILURE);
- }
- printf("%p\n", (void *)s);
- return (EXIT_SUCCESS);
- }
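- Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
- Output: 0x88f010
My machine is 64-bit: the highest address in a process's virtual memory is 0xffffffffffffffff and the lowest is 0x0. Since 0x88f010 is far smaller than 0xffffffffffffffff, we can reasonably infer that the address of the copied string (a heap address) is near the low end of memory. The /proc filesystem lets us verify this.
Running ls /proc shows many files; here we focus on /proc/[pid]/mem and /proc/[pid]/maps.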
mem & maps
- man proc
- /proc/[pid]/mem
- This file can be used to access the pages of a process's memory through open(2), read(2), and lseek(2).
- /proc/[pid]/maps
- A file containing the currently mapped memory regions and their access permissions.
- See mmap(2) for some further information about memory mappings.
- The format of the file is:
- address perms offset dev inode pathname
- 00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon
- 00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon
- 00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon
- 00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
- 00e24000-011f7000 rw-p 00000000 00:00 0 [heap]
- ...
- 35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a21000-35b1a22000 rw-p 00000000 00:00 0
- 35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so
- ...
- 7f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986]
- ...
- 7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack]
- 7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso]
- The address field is the address space in the process that the mapping occupies.
- The perms field is a set of permissions:
- r = read
- w = write
- x = execute
- s = shared
- p = private (copy on write)
- The offset field is the offset into the file/whatever;
- dev is the device (major:minor); inode is the inode on that device. 0 indicates
- that no inode is associated with the memory region,
- as would be the case with BSS (uninitialized data).
- The pathname field will usually be the file that is backing the mapping.
- For ELF files, you can easily coordinate with the offset field
- by looking at the Offset field in the ELF program headers (readelf -l).
- There are additional helpful pseudo-paths:
- [stack]
- The initial process's (also known as the main thread's) stack.
- [stack:<tid>] (since Linux 3.4)
- A thread's stack (where the <tid> is a thread ID).
- It corresponds to the /proc/[pid]/task/[tid]/ path.
- [vdso] The virtual dynamically linked shared object.
- [heap] The process's heap.
- If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
- There is no easy way to coordinate
- this back to a process's source, short of running it through gdb(1), strace(1), or similar.
- Under Linux 2.0 there is no field giving pathname.
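The maps format quoted above can be split into its fields with a few lines of Python. This is a minimal sketch: the `Mapping` tuple and the `parse_maps_line` function are names made up for this example and are not part of any library.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    """One line of /proc/[pid]/maps (field names follow the man page header)."""
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    pathname: str


def parse_maps_line(line):
    """Parse 'address perms offset dev inode pathname' into a Mapping."""
    # The first five fields are separated by whitespace; the pathname is
    # optional and is whatever remains on the line.
    parts = line.split(None, 5)
    addr, perms, offset, dev, inode = parts[:5]
    pathname = parts[5].strip() if len(parts) > 5 else ""
    start_text, end_text = addr.split("-")
    return Mapping(int(start_text, 16), int(end_text, 16), perms,
                   int(offset, 16), dev, int(inode), pathname)


if __name__ == "__main__":
    heap = parse_maps_line("00e03000-00e24000 rw-p 00000000 00:00 0 [heap]\n")
    print(heap.pathname, hex(heap.start), hex(heap.end), heap.perms)
```

A line with an empty pathname yields `pathname == ""`, which corresponds to the anonymous mappings mentioned in the excerpt.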
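The mem file lets us read and modify all of a process's memory pages, and maps shows the memory regions the process currently has mapped, with their addresses, access permissions, offsets and so on. maps shows that the heap sits at low addresses and the stack at high addresses. It also shows that the heap's permissions are rw, so it is writable. We can therefore use the heap address range to locate the string from the previous example and, by modifying the contents at the corresponding address in the mem file, change the string. The program: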
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- #include <unistd.h>
- /**
- * main - uses strdup to create a new string, loops forever-ever
- *
- * Return: EXIT_FAILURE if malloc failed. Otherwise never returns
- */
- int main(void)
- {
- char *s;
- unsigned long int i;
- s = strdup("test_memory");
- if (s == NULL)
- {
- fprintf(stderr, "Can't allocate mem with malloc\n");
- return (EXIT_FAILURE);
- }
- i = 0;
- while (s)
- {
- printf("[%lu] %s (%p)\n", i, s, (void *)s);
- sleep(1);
- i++;
- }
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o loop; ./loop
- Output:
- [0] test_memory (0x21dc010)
- [1] test_memory (0x21dc010)
- [2] test_memory (0x21dc010)
- [3] test_memory (0x21dc010)
- [4] test_memory (0x21dc010)
- [5] test_memory (0x21dc010)
- [6] test_memory (0x21dc010)
- ...
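We can now write a script that uses the /proc filesystem to find where the string lives and modify it; the program's output will change accordingly.
First, find the process ID: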
- ps aux | grep ./loop | grep -v grep
- zjucad 2542 0.0 0.0 4352 636 pts/3 S+ 12:28 0:00 ./loop
2542 is the process ID of the loop program. Running cat /proc/2542/maps gives:
- 00400000-00401000 r-xp 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 00600000-00601000 r--p 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 00601000-00602000 rw-p 00001000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 021dc000-021fd000 rw-p 00000000 00:00 0 [heap]
- 7f2adae2a000-7f2adafea000 r-xp 00000000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adafea000-7f2adb1ea000 ---p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1ea000-7f2adb1ee000 r--p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1ee000-7f2adb1f0000 rw-p 001c4000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0
- 7f2adb1f4000-7f2adb21a000 r-xp 00000000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb3fa000-7f2adb3fd000 rw-p 00000000 00:00 0
- 7f2adb419000-7f2adb41a000 r--p 00025000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb41a000-7f2adb41b000 rw-p 00026000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb41b000-7f2adb41c000 rw-p 00000000 00:00 0
- 7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0 [stack]
- 7ffd51bdd000-7ffd51be0000 r--p 00000000 00:00 0 [vvar]
- 7ffd51be0000-7ffd51be2000 r-xp 00000000 00:00 0 [vdso]
- ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
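The heap address range is 021dc000-021fd000, and it is readable and writable. Since 0x021dc000 < 0x21dc010 < 0x021fd000, the string's address is confirmed to be in the heap, at offset 0x10 from the start of the heap (why it is 0x10 is explained later). We can now go to address 0x21dc010 through the mem file and modify the contents there, and the string the program prints will change with it. A Python script implements this.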
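Before the full script, the range check and the offset arithmetic from the paragraph above can be tried in isolation. This is a small sketch under the assumption that the range is given in the `start-end` hexadecimal form that maps uses; `offset_in_region` is a helper name invented for this example.

```python
def offset_in_region(region, address):
    """Return address - start if address lies in 'start-end' (hex), else None."""
    start_text, end_text = region.split("-")
    start, end = int(start_text, 16), int(end_text, 16)
    # The end address in maps is exclusive: it is the first byte past the region.
    if not start <= address < end:
        return None
    return address - start


if __name__ == "__main__":
    # Values copied from the loop output and the [heap] line shown above.
    print(hex(offset_in_region("021dc000-021fd000", 0x21DC010)))  # 0x10
```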
- #!/usr/bin/env python3
- '''
- Locates and replaces the first occurrence of a string in the heap
- of a process
- Usage: ./read_write_heap.py PID search_string replace_by_string
- Where:
- - PID is the pid of the target process
- - search_string is the ASCII string you are looking to overwrite
- - replace_by_string is the ASCII string you want to replace
-   search_string with
- '''
- import sys
- def print_usage_and_exit():
-     print('Usage: {} pid search write'.format(sys.argv[0]))
-     sys.exit(1)
- # check usage
- if len(sys.argv) != 4:
-     print_usage_and_exit()
- # get the pid from args
- pid = int(sys.argv[1])
- if pid <= 0:
-     print_usage_and_exit()
- search_string = str(sys.argv[2])
- if search_string == "":
-     print_usage_and_exit()
- write_string = str(sys.argv[3])
- if write_string == "":
-     print_usage_and_exit()
- # open the maps and mem files of the process
- maps_filename = "/proc/{}/maps".format(pid)
- print("[*] maps: {}".format(maps_filename))
- mem_filename = "/proc/{}/mem".format(pid)
- print("[*] mem: {}".format(mem_filename))
- # try opening the maps file
- try:
-     maps_file = open(maps_filename, 'r')
- except IOError as e:
-     print("[ERROR] Can not open file {}:".format(maps_filename))
-     print(" I/O error({}): {}".format(e.errno, e.strerror))
-     sys.exit(1)
- for line in maps_file:
-     sline = line.split(' ')
-     # check if we found the heap
-     if sline[-1][:-1] != "[heap]":
-         continue
-     print("[*] Found [heap]:")
-     # parse line
-     addr = sline[0]
-     perm = sline[1]
-     offset = sline[2]
-     device = sline[3]
-     inode = sline[4]
-     pathname = sline[-1][:-1]
-     print("\tpathname = {}".format(pathname))
-     print("\taddresses = {}".format(addr))
-     print("\tpermisions = {}".format(perm))
-     print("\toffset = {}".format(offset))
-     print("\tinode = {}".format(inode))
-     # check if there is read and write permission
-     if perm[0] != 'r' or perm[1] != 'w':
-         print("[*] {} does not have read/write permission".format(pathname))
-         maps_file.close()
-         sys.exit(0)
-     # get start and end of the heap in the virtual memory
-     addr = addr.split("-")
-     if len(addr) != 2: # never trust anyone, not even your OS :)
-         print("[*] Wrong addr format")
-         maps_file.close()
-         sys.exit(1)
-     addr_start = int(addr[0], 16)
-     addr_end = int(addr[1], 16)
-     print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))
-     # open and read mem
-     try:
-         mem_file = open(mem_filename, 'rb+')
-     except IOError as e:
-         print("[ERROR] Can not open file {}:".format(mem_filename))
-         print(" I/O error({}): {}".format(e.errno, e.strerror))
-         maps_file.close()
-         sys.exit(1)
-     # read heap
-     mem_file.seek(addr_start)
-     heap = mem_file.read(addr_end - addr_start)
-     # find string
-     try:
-         i = heap.index(bytes(search_string, "ASCII"))
-     except Exception:
-         print("Can't find '{}'".format(search_string))
-         maps_file.close()
-         mem_file.close()
-         sys.exit(0)
-     print("[*] Found '{}' at {:x}".format(search_string, i))
-     # write the new string (no terminating null byte is written)
-     print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
-     mem_file.seek(addr_start + i)
-     mem_file.write(bytes(write_string, "ASCII"))
-     # close files
-     maps_file.close()
-     mem_file.close()
-     # there is only one heap in our example
-     break
Run the Python script:
- zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ sudo ./loop.py 2542 test_memory test_hello
- [*] maps: /proc/2542/maps
- [*] mem: /proc/2542/mem
- [*] Found [heap]:
- pathname = [heap]
- addresses = 021dc000-021fd000
- permisions = rw-p
- offset = 00000000
- inode = 0
- Addr start [21dc000] | end [21fd000]
- [*] Found 'test_memory' at 10
- [*] Writing 'test_hello' at 21dc010
At the same time, the string that loop prints has changed:
- [633] test_memory (0x21dc010)
- [634] test_memory (0x21dc010)
- [635] test_memory (0x21dc010)
- [636] test_memory (0x21dc010)
- [637] test_memory (0x21dc010)
- [638] test_memory (0x21dc010)
- [639] test_memory (0x21dc010)
- [640] test_helloy (0x21dc010)
- [641] test_helloy (0x21dc010)
- [642] test_helloy (0x21dc010)
- [643] test_helloy (0x21dc010)
- [644] test_helloy (0x21dc010)
- [645] test_helloy (0x21dc010)
The experiment succeeded.
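The output reads test_helloy rather than test_hello because the script writes only the ten bytes of the replacement and no terminating null byte, so the last byte of the old string survives. The effect can be reproduced on an ordinary byte buffer. This is a sketch that imitates the heap with a `bytearray`; `overwrite` and `c_string` are helper names invented for this example.

```python
def overwrite(buf, offset, data):
    """Copy data over buf starting at offset, leaving every other byte untouched."""
    buf[offset:offset + len(data)] = data


def c_string(buf, offset=0):
    """Read bytes from offset up to the first null byte, the way printf("%s") does."""
    end = buf.index(0, offset)
    return bytes(buf[offset:end]).decode("ascii")


if __name__ == "__main__":
    heap = bytearray(b"test_memory\x00")   # what strdup left in the heap
    overwrite(heap, 0, b"test_hello")      # what the script wrote: 10 bytes, no '\0'
    print(c_string(heap))                  # test_helloy

    overwrite(heap, 0, b"test_hello\x00")  # writing the terminator as well
    print(c_string(heap))                  # test_hello
```

Appending a null byte to the replacement is one way to get a cleanly terminated string, provided the replacement plus terminator is no longer than the original allocation.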
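Drawing the virtual memory layout through experiments
Here is the memory layout diagram again
Almost everyone knows something about how virtual memory is laid out, but how can it be verified? The sections below show how.
Stack and heap
Let's first verify where the stack is located. In C, local variables are stored on the stack and memory allocated by malloc is on the heap, so printing the address of a local variable and the address returned by malloc shows where the stack and the heap sit within the whole virtual address space.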
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(void)
- {
- int a;
- void *p;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
- Output:
- Address of a: 0x7ffedde9c7fc
- Allocated space in the heap: 0x55ca5b360670
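The result shows that the heap lies below the stack. Summarized as a figure:
The executable
The executable is also in virtual memory. Printing the address of the main function and comparing it with the stack and heap addresses shows where the executable sits relative to them.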
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(void)
- {
- int a;
- void *p;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- printf("Address of function main: %p\n", (void *)main);
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc main.c -o test; ./test
- Output:
- Address of a: 0x7ffed846de2c
- Allocated space in the heap: 0x561b9ee8c670
- Address of function main: 0x561b9deb378a
Since main(0x561b9deb378a) < heap(0x561b9ee8c670) < stack(0x7ffed846de2c), we can draw the layout as follows:
virtual_memory_stack_heap_executable.png
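The same ordering can be derived mechanically by sorting the printed addresses. This is a small sketch using the values from the run above; the `layout` function and the labels are made up for this example.

```python
def layout(addresses):
    """Return the labels ordered from the highest address to the lowest."""
    ordered = sorted(addresses.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ordered]


if __name__ == "__main__":
    # Addresses copied from the program output above.
    sample = {
        "executable (main)": 0x561B9DEB378A,
        "heap (malloc)": 0x561B9EE8C670,
        "stack (local variable a)": 0x7FFED846DE2C,
    }
    for label in layout(sample):
        print("{:#018x}  {}".format(sample[label], label))
```

Printed top-down, the list matches the figure: stack at the top, heap in the middle, executable at the bottom.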
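Command-line arguments and environment variables
The program's entry point, main, can take parameters:
- First parameter (argc): the number of command-line arguments
- Second parameter (argv): a pointer to the array of command-line arguments
- Third parameter (env): a pointer to the array of environment variables
A program lets us see where these elements are located in virtual memory: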
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(int ac, char **av, char **env)
- {
- int a;
- void *p;
- int i;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- printf("Address of function main: %p\n", (void *)main);
- printf("First bytes of the main function:\n\t");
- for (i = 0; i < 15; i++)
- {
- printf("%02x ", ((unsigned char *)main)[i]);
- }
- printf("\n");
- printf("Address of the array of arguments: %p\n", (void *)av);
- printf("Addresses of the arguments:\n\t");
- for (i = 0; i < ac; i++)
- {
- printf("[%s]:%p ", av[i], (void *)av[i]);
- }
- printf("\n");
- printf("Address of the array of environment variables: %p\n", (void *)env);
- printf("Address of the first environment variable: %p\n", (void *)(env[0]));
- return (EXIT_SUCCESS);
- }
- Compile and run: gcc main.c -o test; ./test nihao hello
- Output:
- Address of a: 0x7ffcc154a748
- Allocated space in the heap: 0x559bd1bee670
- Address of function main: 0x559bd09807ca
- First bytes of the main function:
- 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
- Address of the array of arguments: 0x7ffcc154a848
- Addresses of the arguments:
- [./test]:0x7ffcc154b94f [nihao]:0x7ffcc154b956 [hello]:0x7ffcc154b95c
- Address of the array of environment variables: 0x7ffcc154a868
- Address of the first environment variable: 0x7ffcc154b962
The results are as follows:
main(0x559bd09807ca) < heap(0x559bd1bee670) < stack(0x7ffcc154a748) < argv(0x7ffcc154a848) < env(0x7ffcc154a868) < arguments(0x7ffcc154b94f->0x7ffcc154b95c + 6) (the 6 is the length of "hello" plus 1 for '\0') < env first(0x7ffcc154b962)
All the command-line arguments are adjacent to each other, and the environment variables follow immediately after them.
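The adjacency claim can be checked with simple arithmetic: each string occupies its length plus one byte for '\0', so the next string should start exactly that many bytes later. This is a sketch using the addresses printed above; `are_packed` is a helper name invented for this example.

```python
def are_packed(strings, end_address):
    """Check that each (text, address) pair starts right after the previous
    string's '\\0' and that the last one ends exactly at end_address."""
    expected = strings[0][1]
    for text, address in strings:
        if address != expected:
            return False
        expected = address + len(text.encode("ascii")) + 1  # +1 for '\0'
    return expected == end_address


if __name__ == "__main__":
    # Values copied from the program output above.
    args = [("./test", 0x7FFCC154B94F),
            ("nihao", 0x7FFCC154B956),
            ("hello", 0x7FFCC154B95C)]
    first_env = 0x7FFCC154B962
    print(are_packed(args, first_env))  # True
```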
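Are the argv and env arrays adjacent?
In the example above argv has 4 elements: the three command-line arguments plus a NULL that marks the end of the array. Each pointer is 8 bytes and 8*4=32, so argv(0x7ffcc154a848) + 32(0x20) = env(0x7ffcc154a868). The argv and env pointer arrays are therefore adjacent.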
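The calculation above generalizes to any argument count. This is a minimal sketch assuming 8-byte pointers, as on the 64-bit machine used in this article; `env_array_address` is a name made up for this example.

```python
POINTER_SIZE = 8  # assumption: 64-bit pointers, as on the machine used here


def env_array_address(argv_address, argc, pointer_size=POINTER_SIZE):
    """Address right after argv: argc pointers plus the terminating NULL."""
    return argv_address + (argc + 1) * pointer_size


if __name__ == "__main__":
    # argv address and argc (./test nihao hello) from the output above.
    print(hex(env_array_address(0x7FFCC154A848, 3)))  # 0x7ffcc154a868
```

The result equals the printed address of the env array, which is what "adjacent" means here.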
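Do the command-line arguments come right after the environment variable array?
First we need the size of the environment variable array. The array is terminated by NULL, so we can walk the env array, checking each entry for NULL, to get its size. The code is as follows: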
- #include <stdlib.h>
- #include <stdio.h>
- #include <string.h>
- /**
- * main - print locations of various elements
- *
- * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
- */
- int main(int ac, char **av, char **env)
- {
- int a;
- void *p;
- int i;
- int size;
- printf("Address of a: %p\n", (void *)&a);
- p = malloc(98);
- if (p == NULL)
- {
- fprintf(stderr, "Can't malloc\n");
- return (EXIT_FAILURE);
- }
- printf("Allocated space in the heap: %p\n", p);
- printf("Address of function main: %p\n", (void *)main);
- printf("First bytes of the main function:\n\t");
- for (i&n