
本文转载自微信公众号「程序喵大人」,作者程序喵大人 。转载本文请联系程序喵大人公众号。
摊牌了,不装了,其实我是程序喵辛苦工作一天还要回家编辑公众号到大半夜的老婆,希望各位大哥能踊跃转发,完成我一千阅读量的KPI(梦想),谢谢!
咳咳,有点跑题,以下是程序喵的废话,麻烦给个面子划到最后点击在看或者赞,证明我比程序喵人气高,谢谢!
通过/proc文件系统探究虚拟内存
我们会通过/proc文件系统找到正在运行的进程的字符串所在的虚拟内存地址,并通过更改此内存地址的内容来更改字符串内容,使你更深入了解虚拟内存这个概念!这之前先介绍下虚拟内存的定义!
虚拟内存
虚拟内存是一种实现在计算机软硬件之间的内存管理技术,它将程序使用到的内存地址(虚拟地址)映射到计算机内存中的物理地址,虚拟内存使得应用程序从繁琐的管理内存空间任务中解放出来,提高了内存隔离带来的安全性,虚拟内存地址通常是连续的地址空间,由操作系统的内存管理模块控制,在触发缺页中断时利用分页技术将实际的物理内存分配给虚拟内存,而且64位机器虚拟内存的空间大小远超出实际物理内存的大小,使得进程可以使用比物理内存大小更多的内存空间。
在深入研究虚拟内存前,有几个关键点:
每个进程都有它自己的虚拟内存 虚拟内存的大小取决于系统的体系结构 不同操作管理有着不同的管理虚拟内存的方式,但大多数操作系统的虚拟内存结构如下图:
virtual_memory.png
上图并不是特别详细的内存管理图,高地址其实还有内核空间等等,但这不是这篇文章的主题。从图中可以看到高地址存储着命令行参数和环境变量,之后是栈空间、堆空间和可执行程序,其中栈空间向下延申,堆空间向上增长,堆空间需要使用malloc分配,是动态分配的内存的一部分。
首先通过一个简单的C程序探究虚拟内存。
- #include
#include - #include
- int main(void) {
- char *s;
- s = strdup("test_memory"); if (s == NULL)
- { fprintf(stderr, "Can't allocate mem with mallocn");
- return (EXIT_FAILURE); }
- printf("%pn", (void *)s); return (EXIT_SUCCESS);
- }
- 编译运行:gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test 输出:0x88f010
我的机器是64位机器,进程的虚拟内存高地址为0xffffffffffffffff, 低地址为0x0,而0x88f010远小于0xffffffffffffffff,因此大概可以推断出被复制的字符串的地址(堆地址)是在内存低地址附近,具体可以通过/proc文件系统验证.
ls /proc目录可以看到好多文件,这里主要关注/proc/[pid]/mem和/proc/[pid]/maps
mem & maps
- man proc
- /proc/[pid]/mem This file can be used to access the pages of a process's memory through open(2), read(2), and lseek(2).
- /proc/[pid]/maps
- A file containing the currently mapped memory regions and their access permissions. See mmap(2) for some further information about memory mappings.
- The format of the file is:
- address perms offset dev inode pathname
- 00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon 00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon
- 00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon 00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
- 00e24000-011f7000 rw-p 00000000 00:00 0 [heap] ...
- 35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so 35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so
- 35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so 35b1a21000-35b1a22000 rw-p 00000000 00:00 0
- 35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so 35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
- 35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so 35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so
- ... f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986]
- ... 7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack]
- 7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso]
- The address field is the address space in the process that the mapping occupies. The perms field is a set of permissions:
- r = read
- w = write x = execute
- s = shared p = private (copy on write)
- The offset field is the offset into the file/whatever;
- dev is the device (major:minor); inode is the inode on that device. 0 indicates that no inode is associated with the memory region,
- as would be the case with BSS (uninitialized data).
- The pathname field will usually be the file that is backing the mapping. For ELF files, you can easily coordinate with the offset field
- by looking at the Offset field in the ELF program headers (readelf -l).
- There are additional helpful pseudo-paths:
- [stack] The initial process's (also known as the main thread's) stack.
- [stack:
] (since Linux 3.4) - A thread's stack (where the
is a thread ID). It corresponds to the /proc/[pid]/task/[tid]/ path. - [vdso] The virtual dynamically linked shared object.
- [heap] The process's heap.
- If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
- There is no easy way to coordinate this back to a process's source, short of running it through gdb(1), strace(1), or similar.
- Under Linux 2.0 there is no field giving pathname.
通过mem文件可以访问和修改整个进程的内存页,通过maps可以看到进程当前已映射的内存区域,有地址和访问权限偏移量等,从maps中可以看到堆空间是在低地址而栈空间是在高地址. 从maps中可以看到heap的访问权限是rw,即可写,所以可以通过堆地址找到上个示例程序中字符串的地址,并通过修改mem文件对应地址的内容,就可以修改字符串的内容啦,程序:
- #include
#include - #include
#include - int main(void) {
- char *s; unsigned long int i;
- s = strdup("test_memory");
- if (s == NULL) {
- fprintf(stderr, "Can't allocate mem with mallocn"); return (EXIT_FAILURE);
- } i = 0;
- while (s) {
- printf("[%lu] %s (%p)n", i, s, (void *)s); sleep(1);
- i++; }
- return (EXIT_SUCCESS); }
- 编译运行:gcc -Wall -Wextra -pedantic -Werror main.c -o loop; ./loop 输出:
- [0] test_memory (0x21dc010) [1] test_memory (0x21dc010)
- [2] test_memory (0x21dc010) [3] test_memory (0x21dc010)
- [4] test_memory (0x21dc010) [5] test_memory (0x21dc010)
- [6] test_memory (0x21dc010) ...
这里可以写一个脚本通过/proc文件系统找到字符串所在位置并修改其内容,相应的输出也会更改。
首先找到进程的进程号
- ps aux | grep ./loop | grep -v grep zjucad 2542 0.0 0.0 4352 636 pts/3 S+ 12:28 0:00 ./loop
2542即为loop程序的进程号,cat /proc/2542/maps得到
- 00400000-00401000 r-xp 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop 00600000-00601000 r--p 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
- 00601000-00602000 rw-p 00001000 08:01 811716 /home/zjucad/wangzhiqiang/loop 021dc000-021fd000 rw-p 00000000 00:00 0 [heap]
- 7f2adae2a000-7f2adafea000 r-xp 00000000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so 7f2adafea000-7f2adb1ea000 ---p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1ea000-7f2adb1ee000 r--p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so 7f2adb1ee000-7f2adb1f0000 rw-p 001c4000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0 7f2adb1f4000-7f2adb21a000 r-xp 00000000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb3fa000-7f2adb3fd000 rw-p 00000000 00:00 0 7f2adb419000-7f2adb41a000 r--p 00025000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f2adb41a000-7f2adb41b000 rw-p 00026000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so 7f2adb41b000-7f2adb41c000 rw-p 00000000 00:00 0
- 7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0 [stack] 7ffd51bdd000-7ffd51be0000 r--p 00000000 00:00 0 [vvar]
- 7ffd51be0000-7ffd51be2000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
看见堆地址范围021dc000-021fd000,并且可读可写,而且021dc000<0x21dc010<021fd000,这就可以确认字符串的地址在堆中,在堆中的索引是0x10(至于为什么是0x10,后面会讲到),这时可以通过mem文件到0x21dc010地址修改内容,字符串输出的内容也会随之更改,这里通过python脚本实现此功能。
- #!/usr/bin/env python3 '''
- Locates and replaces the first occurrence of a string in the heap of a process
- Usage: ./read_write_heap.py PID search_string replace_by_string
- Where: - PID is the pid of the target process
- - search_string is the ASCII string you are looking to overwrite - replace_by_string is the ASCII string you want to replace
- search_string with '''
- import sys
- def print_usage_and_exit():
- print('Usage: {} pid search write'.format(sys.argv[0])) sys.exit(1)
- # check usage
- if len(sys.argv) != 4: print_usage_and_exit()
- # get the pid from args
- pid = int(sys.argv[1]) if pid <= 0:
- print_usage_and_exit() search_string = str(sys.argv[2])
- if search_string == "": print_usage_and_exit()
- write_string = str(sys.argv[3]) if search_string == "":
- print_usage_and_exit()
- # open the maps and mem files of the process maps_filename = "/proc/{}/maps".format(pid)
- print("[*] maps: {}".format(maps_filename)) mem_filename = "/proc/{}/mem".format(pid)
- print("[*] mem: {}".format(mem_filename))
- # try opening the maps file try:
- maps_file = open('/proc/{}/maps'.format(pid), 'r') except IOError as e:
- print("[ERROR] Can not open file {}:".format(maps_filename)) print(" I/O error({}): {}".format(e.errno, e.strerror))
- sys.exit(1)
- for line in maps_file: sline = line.split(' ')
- # check if we found the heap if sline[-1][:-1] != "[heap]":
- continue print("[*] Found [heap]:")
- # parse line
- addr = sline[0] perm = sline[1]
- offset = sline[2] device = sline[3]
- inode = sline[4] pathname = sline[-1][:-1]
- print("tpathname = {}".format(pathname)) print("taddresses = {}".format(addr))
- print("tpermisions = {}".format(perm)) print("toffset = {}".format(offset))
- print("tinode = {}".format(inode))
- # check if there is read and write permission if perm[0] != 'r' or perm[1] != 'w':
- print("[*] {} does not have read/write permission".format(pathname)) maps_file.close()
- exit(0)
- # get start and end of the heap in the virtual memory addr = addr.split("-")
- if len(addr) != 2: # never trust anyone, not even your OS :) print("[*] Wrong addr format")
- maps_file.close() exit(1)
- addr_start = int(addr[0], 16) addr_end = int(addr[1], 16)
- print("tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))
- # open and read mem try:
- mem_file = open(mem_filename, 'rb+') except IOError as e:
- print("[ERROR] Can not open file {}:".format(mem_filename)) print(" I/O error({}): {}".format(e.errno, e.strerror))
- maps_file.close() exit(1)
- # read heap
- mem_file.seek(addr_start) heap = mem_file.read(addr_end - addr_start)
- # find string
- try: i = heap.index(bytes(search_string, "ASCII"))
- except Exception: print("Can't find '{}'".format(search_string))
- maps_file.close() mem_file.close()
- exit(0) print("[*] Found '{}' at {:x}".format(search_string, i))
- # write the new string
- print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i)) mem_file.seek(addr_start + i)
- mem_file.write(bytes(write_string, "ASCII"))
- # close files maps_file.close()
- mem_file.close()
- # there is only one heap in our example break
运行这个Python脚本
- zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ sudo ./loop.py 2542 test_memory test_hello [*] maps: /proc/2542/maps
- [*] mem: /proc/2542/mem [*] Found [heap]:
- pathname = [heap] addresses = 021dc000-021fd000
- permisions = rw-p offset = 00000000
- inode = 0 Addr start [21dc000] | end [21fd000]
- [*] Found 'test_memory' at 10 [*] Writing 'test_hello' at 21dc010
同时字符串输出的内容也已更改
- [633] test_memory (0x21dc010) [634] test_memory (0x21dc010)
- [635] test_memory (0x21dc010) [636] test_memory (0x21dc010)
- [637] test_memory (0x21dc010) [638] test_memory (0x21dc010)
- [639] test_memory (0x21dc010) [640] test_helloy (0x21dc010)
- [641] test_helloy (0x21dc010) [642] test_helloy (0x21dc010)
- [643] test_helloy (0x21dc010) [644] test_helloy (0x21dc010)
- [645] test_helloy (0x21dc010)
实验成功。
通过实践画出虚拟内存空间分布图
再列出内存空间分布图
基本上每个人或多或少都了解虚拟内存的空间分布,那如何验证它呢,下面会提到。
堆栈空间
首先验证栈空间的位置,我们都知道C中局部变量是存储在栈空间的,malloc分配的内存是存储在堆空间,所以可以通过打印出局部变量地址和malloc的返回内存地址的方式来验证堆栈空间在整个虚拟空间中的位置。
- #include
#include - #include
- int main(void)
- { int a;
- void *p;
- printf("Address of a: %pn", (void *)&a); p = malloc(98);
- if (p == NULL) {
- fprintf(stderr, "Can't mallocn"); return (EXIT_FAILURE);
- } printf("Allocated space in the heap: %pn", p);
- return (EXIT_SUCCESS); }
- 编译运行:gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test 输出:
- Address of a: 0x7ffedde9c7fc Allocated space in the heap: 0x55ca5b360670
通过结果可以看出堆地址空间在栈地址空间下面,整理如图:
可执行程序
可执行程序也在虚拟内存中,可以通过打印main函数的地址,并与堆栈地址相比较,即可知道可执行程序地址相对于堆栈地址的分布。
- #include
#include - #include
- int main(void)
- { int a;
- void *p;
- printf("Address of a: %pn", (void *)&a); p = malloc(98);
- if (p == NULL) {
- fprintf(stderr, "Can't mallocn"); return (EXIT_FAILURE);
- } printf("Allocated space in the heap: %pn", p);
- printf("Address of function main: %pn", (void *)main); return (EXIT_SUCCESS);
- } 编译运行:gcc main.c -o test; ./test
- 输出: Address of a: 0x7ffed846de2c
- Allocated space in the heap: 0x561b9ee8c670 Address of function main: 0x561b9deb378a
由于main(0x561b9deb378a) < heap(0x561b9ee8c670) < (0x7ffed846de2c),可以画出分布图如下:
virtual_memory_stack_heap_executable.png
命令行参数和环境变量
程序入口main函数可以携带参数:
第一个参数(argc): 命令行参数的个数 第二个参数(argv): 指向命令行参数数组的指针 第三个参数(env): 指向环境变量数组的指针
通过程序可以看见这些元素在虚拟内存中的位置:
- #include
#include - #include
- int main(int ac, char **av, char **env)
- { int a;
- void *p; int i;
- printf("Address of a: %pn", (void *)&a);
- p = malloc(98); if (p == NULL)
- { fprintf(stderr, "Can't mallocn");
- return (EXIT_FAILURE); }
- printf("Allocated space in the heap: %pn", p); printf("Address of function main: %pn", (void *)main);
- printf("First bytes of the main function:nt"); for (i = 0; i < 15; i++)
- { printf("%02x ", ((unsigned char *)main)[i]);
- } printf("n");
- printf("Address of the array of arguments: %pn", (void *)av); printf("Addresses of the arguments:nt");
- for (i = 0; i < ac; i++) {
- printf("[%s]:%p ", av[i], av[i]); }
- printf("n"); printf("Address of the array of environment variables: %pn", (void *)env);
- printf("Address of the first environment variable: %pn", (void *)(env[0])); return (EXIT_SUCCESS);
- } 编译运行:gcc main.c -o test; ./test nihao hello
- 输出: Address of a: 0x7ffcc154a748
- Allocated space in the heap: 0x559bd1bee670 Address of function main: 0x559bd09807ca
- First bytes of the main function: 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
- Address of the array of arguments: 0x7ffcc154a848 Addresses of the arguments:
- [./test]:0x7ffcc154b94f [nihao]:0x7ffcc154b956 [hello]:0x7ffcc154b95c Address of the array of environment variables: 0x7ffcc154a868
- Address of the first environment variable: 0x7ffcc154b962
结果如下:
main(0x559bd09807ca) < heap(0x559bd1bee670) < stack(0x7ffcc154a748) < argv(0x7ffcc154a848) < env(0x7ffcc154a868) < arguments(0x7ffcc154b94f->0x7ffcc154b95c + 6)(6为hello+1(' ')) < env first(0x7ffcc154b962)
可以看出所有的命令行参数都是相邻的,并且紧接着就是环境变量。
argv和env数组地址是相邻的吗
上例中argv有4个元素,命令行中有三个参数,还有一个NULL指向标记数组的末尾,每个指针是8字节,8*4=32, argv(0x7ffcc154a848) + 32(0x20) = env(0x7ffcc154a868),所以argv和env数组指针是相邻的.
命令行参数地址紧随环境变量地址之后吗
首先需要获取环境变量数组的大小,环境变量数组是以NULL结束的,所以可以遍历env数组,检查是否为NULL,获取数组大小,代码如下:
- #include
#include - #include
- int main(int ac, char **av, char **env)
- { int a;
- void *p; int i;
- int size;
- printf("Address of a: %pn", (void *)&a); p = malloc(98);
- if (p == NULL) {
- fprintf(stderr, "Can't mallocn"); return (EXIT_FAILURE);
- } printf("Allocated space in the heap: %pn", p);
- printf("Address of function main: %pn", (void *)main); printf("First bytes of the main function:nt");
- for (i = 0; i < 15; i++) {
- printf("%02x ", ((unsigned char *)main)[i]); }
- printf("n"); printf("Address of the array of arguments: %pn", (void *)av);
- printf("Addresses of the arguments:nt"); for (i = 0; i < ac; i++)
- { printf("[%s]:%p ", av[i], av[i]);
- } printf("n");
- printf("Address of the array of environment variables: %pn", (void *)env); printf("Address of the first environment variables:n");
- for (i = 0; i < 3; i++) {
- printf("t[%p]:"%s"n", env[i], env[i]); }
- i = 0;
- while (env[i] != NULL) {
- i++; }
- i++; size = i * sizeof(char *);
- printf("Size of the array env: %d elements -> %d bytes (0x%x)n", i, size, size); return (EXIT_SUCCESS);
- }
- 编译运行:gcc main.c -o test; ./test nihao hello 输出:
- Address of a: 0x7ffd5ebadff4 Allocated space in the heap: 0x562ba4e13670
- Address of function main: 0x562ba2f1881a First bytes of the main function:
- 55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0 Address of the array of arguments: 0x7ffd5ebae0f8
- Addresses of the arguments: [./test]:0x7ffd5ebae94f [nihao]:0x7ffd5ebae956 [hello]:0x7ffd5ebae95c
- Address of the array of environment variables: 0x7ffd5ebae118 Address of the first environment variables:
- [0x7ffd5ebae962]:"LS_COLORS=rs=0:di=01;34:ln=01;36:mh=00:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:mi=00:su=37;41:sg=30;43:ca=30;41:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.arc=01;31:*.arj=01;31:*.taz=01;31:*.lha=01;31:*.lz4=01;31:*.lzh=01;31:*.lzma=01;31:*.tlz=01;31:*.txz=01;31:*.tzo=01;31:*.t7z=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.lrz=01;31:*.lz=01;31:*.lzo=01;31:*.xz=01;31:*.zst=01;31:*.tzst=01;31:*.bz2=01;31:*.bz=01;31:*.tbz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.war=01;31:*.ear=01;31:*.sar=01;31:*.rar=01;31:*.alz=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.cab=01;31:*.wim=01;31:*.swm=01;31:*.dwm=01;31:*.esd=01;31:*.jpg=01;35:*.jpeg=01;35:*.mjpg=01;35:*.mjpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.svgz=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.webm=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.flv=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.cgm=01;35:*.emf=01;35:*.ogv=01;35:*.ogx=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.m4a=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:*.oga=00;36:*.opus=00;36:*.spx=00;36:*.xspf=00;36:" [0x7ffd5ebaef4e]:"HOSTNAME=3e8650948c0c"
- [0x7ffd5ebaef64]:"OLDPWD=/" Size of the array env: 11 elements -> 88 bytes (0x58)
- 运算结果如下:
- root@3e8650948c0c:/ubuntu# bc bc 1.07.1
- Copyright 1991-1994, 1997, 1998, 2000, 2004, 2006, 2008, 2012-2017 Free Software Foundation, Inc. This is free software with ABSOLUTELY NO WARRANTY.
- For details type `warranty'. obase=16
- ibase=16 58+7ffd5ebae118
- (standard_in) 3: syntax error 58+7FFD5EBAE118
- 7FFD5EBAE170 quit
- 通过结果可知7FFD5EBAE170 != 0x7
通过结果可知7FFD5EBAE170 != 0x7ffd5ebae94f,所以命令行参数地址不是紧随环境变量地址之后。
截至目前画出图表如下:
栈内存真的向下增长吗
可以通过调用函数来确认,如果真的是向下增长,那么调用函数的地址应该高于被调用函数地址, 代码如下:
- #include
#include - #include
- void f(void) {
- int a; int b;
- int c;
- a = 98; b = 1024;
- c = a * b; printf("[f] a = %d, b = %d, c = a * b = %dn", a, b, c);
- printf("[f] Adresses of a: %p, b = %p, c = %pn", (void *)&a, (void *)&b, (void *)&c); }
- int main(int ac, char **av, char **env)
- { int a;
- void *p; int i;
- int size;
- printf("Address of a: %pn", (void *)&a); p = malloc(98);
- if (p == NULL) {
- fprintf(stderr, "Can't mallocn"); return (EXIT_FAILURE);
- } printf("Allocated space in the heap: %pn", p);
- printf("Address of function main: %pn", (void *)main); f();
- return (EXIT_SUCCESS); }
- 编译运行:gcc main.c -o test; ./test 输出:
- Address of a: 0x7ffefc75083c Allocated space in the heap: 0x564d46318670
- Address of function main: 0x564d45b9880e [f] a = 98, b = 1024, c = a * b = 100352
- [f] Adresses of a: 0x7ffefc7507ec, b = 0x7ffefc7507f0, c = 0x7ffefc7507f4
结果可知: f{a} 0x7ffefc7507ec < main{a} 0x7ffefc75083c
可画图如下:
其实也可以写一个简单的代码,通过查看/proc文件系统中map内容来查看内存分布,这里就不举例啦。
堆内存(malloc)
malloc
malloc是常用的动态分配内存的函数,malloc申请的内存分配在堆中,注意malloc是glibc函数,不是系统调用。
man malloc:
- [...] allocate dynamic memory[...] void *malloc(size_t size);
- [...] The malloc() function allocates size bytes and returns a pointer to the allocated memory.
不调用malloc,就不会有堆空间[heap]
看一段不调用malloc的代码
- #include
#include - int main(void) {
- getchar(); return (EXIT_SUCCESS);
- } 编译运行:gcc test.c -o 2; ./2
- step 1 : ps aux | grep ./2$ 输出:
- zjucad 3023 0.0 0.0 4352 788 pts/3 S+ 13:58 0:00 ./2 step 2 : /proc/3023/maps
- 输出: 00400000-00401000 r-xp 00000000 08:01 811723 /home/zjucad/wangzhiqiang/2
- 00600000-00601000 r--p 00000000 08:01 811723 /home/zjucad/wangzhiqiang/2 00601000-00602000 rw-p 00001000 08:01 811723 /home/zjucad/wangzhiqiang/2
- 007a4000-007c5000 rw-p 00000000 00:00 0 [heap] 7f954ca02000-7f954cbc2000 r-xp 00000000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f954cbc2000-7f954cdc2000 ---p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so 7f954cdc2000-7f954cdc6000 r--p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7f954cdc6000-7f954cdc8000 rw-p 001c4000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so 7f954cdc8000-7f954cdcc000 rw-p 00000000 00:00 0
- 7f954cdcc000-7f954cdf2000 r-xp 00000000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so 7f954cfd2000-7f954cfd5000 rw-p 00000000 00:00 0
- 7f954cff1000-7f954cff2000 r--p 00025000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so 7f954cff2000-7f954cff3000 rw-p 00026000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7f954cff3000-7f954cff4000 rw-p 00000000 00:00 0 7ffed68a1000-7ffed68c2000 rw-p 00000000 00:00 0 [stack]
- 7ffed690e000-7ffed6911000 r--p 00000000 00:00 0 [vvar] 7ffed6911000-7ffed6913000 r-xp 00000000 00:00 0 [vdso]
- ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
可以看到,如果不调用malloc,maps中就没有[heap]
下面运行一个带有malloc的程序
- #include
#include - int main(void) {
- void *p;
- p = malloc(1); printf("%pn", p);
- getchar(); return (EXIT_SUCCESS);
- } 编译运行:gcc test.c -o 3; ./3
- 输出:0xcc7010 验证步骤及输出:
- zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ ps aux | grep ./3$ zjucad 3113 0.0 0.0 4352 644 pts/3 S+ 14:06 0:00 ./3
- zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ cat /proc/3113/maps 00400000-00401000 r-xp 00000000 08:01 811726 /home/zjucad/wangzhiqiang/3
- 00600000-00601000 r--p 00000000 08:01 811726 /home/zjucad/wangzhiqiang/3 00601000-00602000 rw-p 00001000 08:01 811726 /home/zjucad/wangzhiqiang/3
- 00cc7000-00ce8000 rw-p 00000000 00:00 0 [heap] 7fc7e9128000-7fc7e92e8000 r-xp 00000000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7fc7e92e8000-7fc7e94e8000 ---p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so 7fc7e94e8000-7fc7e94ec000 r--p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
- 7fc7e94ec000-7fc7e94ee000 rw-p 001c4000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so 7fc7e94ee000-7fc7e94f2000 rw-p 00000000 00:00 0
- 7fc7e94f2000-7fc7e9518000 r-xp 00000000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so 7fc7e96f8000-7fc7e96fb000 rw-p 00000000 00:00 0
- 7fc7e9717000-7fc7e9718000 r--p 00025000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so 7fc7e9718000-7fc7e9719000 rw-p 00026000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
- 7fc7e9719000-7fc7e971a000 rw-p 00000000 00:00 0 7ffc91c18000-7ffc91c39000 rw-p 00000000 00:00 0 [stack]
- 7ffc91d5f000-7ffc91d62000 r--p 00000000 00:00 0 [vvar] 7ffc91d62000-7ffc91d64000 r-xp 00000000 00:00 0 [vdso]
- ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
程序中带有malloc,那maps中就有[heap]段,并且malloc返回的地址在heap的地址段中,但是返回的地址却不再heap的最开始地址上,相差了0x10字节,为什么呢?看下面:
strace, brk, sbrk
malloc不是系统调用,它是一个正常函数,它必须调用某些系统调用才可以操作堆内存,通过使用strace工具可以追踪进程的系统调用和信号,为了确认系统调用是malloc产生的,所以在malloc前后添加write系统调用方便定位问题。
- #include
#include - #include
- int main(void)
- { void *p;
- write(1, "BEFORE MALLOCn", 14);
- p = malloc(1); write(1, "AFTER MALLOCn", 13);
- printf("%pn", p); getchar();
- return (EXIT_SUCCESS); }
- 编译运行:gcc test.c -o 4 zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ strace ./4
- execve("./4", ["./4"], []) = 0 brk(NULL) = 0x781000
- access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
- open("/etc/ld.so.cache", O_RDonLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=111450, ...}) = 0
- mmap(NULL, 111450, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f37720fa000 close(3) = 0
- access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) open("/lib/x86_64-linux-gnu/libc.so.6", O_RDonLY|O_CLOEXEC) = 3
- read(3, "177ELF2113 3 > 1 Pt2 "..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1868984, ...}) = 0
- mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37720f9000 mmap(NULL, 3971488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f3771b27000
- mprotect(0x7f3771ce7000, 2097152, PROT_NONE) = 0 mmap(0x7f3771ee7000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c0000) = 0x7f3771ee7000
- mmap(0x7f3771eed000, 14752, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f3771eed000 close(3) = 0
- mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37720f8000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f37720f7000
- arch_prctl(ARCH_SET_FS, 0x7f37720f8700) = 0 mprotect(0x7f3771ee7000, 16384, PROT_READ) = 0
- mprotect(0x600000, 4096, PROT_READ) = 0 mprotect(0x7f3772116000, 4096, PROT_READ) = 0
- munmap(0x7f37720fa000, 111450) = 0 write(1, "BEFORE MALLOCn", 14BEFORE MALLOC
- ) = 14 brk(NULL) = 0x781000
- brk(0x7a2000) = 0x7a2000 write(1, "AFTER MALLOCn", 13AFTER MALLOC
- ) = 13 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
- write(1, "0x781010n", 90x781010 ) = 9
- fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
最后几行的输出可知,malloc主要调用brk系统调用来操作堆内存。
- man brk ...
- int brk(void *addr); void *sbrk(intptr_t increment);
- ... DEscriptION
- brk() and sbrk() change the location of the program break, which defines the end of the process's data segment (i.e., the program break is the first
- location after the end of the uninitialized data segment). Increasing the program break has the effect of allocating memory to the process; decreas‐
- ing the break deallocates memory.
- brk() sets the end of the data segment to the value specified by addr, when that value is reasonable, the system has enough memory, and the process
- does not exceed its maximum data size (see setrlimit(2)).
- sbrk() increments the program's data space by increment bytes. Calling sbrk() with an increment of 0 can be used to find the current location of
- the program break.
程序中断是虚拟内存中程序数据段结束后的第一个位置的地址,malloc通过调用brk或者sbrk,增加程序中断的值就可以创建新空间来动态分配内存,首次调用brk会返回当前程序中断的地址,第二次调用brk也会返回程序中断的地址,可以发现第二次brk返回地址大于第一次brk返回地址,brk就是通过增加程序中断地址的方式来分配内存,可以看出现在的堆地址范围是0x781000-0x7a2000,通过cat /proc/[pid]/maps也可以验证,此处就不贴上实际验证的结果啦。
多次malloc
如果多次malloc会出现什么现象呢,代码如下:
- #include
#include - #include
- int main(void)
- { void *p;
- write(1, "BEFORE MALLOC #0n", 17);
- p = malloc(1024); write(1, "AFTER MALLOC #0n", 16);
- printf("%pn", p);
- write(1, "BEFORE MALLOC #1n", 17); p = malloc(1024);
- write(1, "AFTER MALLOC #1n", 16); printf("%pn", p);
- write(1, "BEFORE MALLOC #2n", 17);
- p = malloc(1024); write(1, "AFTER MALLOC #2n", 16);
- printf("%pn", p);
- write(1, "BEFORE MALLOC #3n", 17); p = malloc(1024);
- write(1, "AFTER MALLOC #3n", 16); printf("%pn", p);
- getchar();
- return (EXIT_SUCCESS); }
- 编译运行:gcc test.c -o 5; strace ./5 摘要输出结果如下:
- write(1, "BEFORE MALLOC #0n", 17BEFORE MALLOC #0 ) = 17
- brk(NULL) = 0x561605c7a000 brk(0x561605c9b000) = 0x561605c9b000
- write(1, "AFTER MALLOC #0n", 16AFTER MALLOC #0 ) = 16
- fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0 write(1, "0x561605c7a260n", 150x561605c7a260
- ) = 15 write(1, "BEFORE MALLOC #1n", 17BEFORE MALLOC #1
- ) = 17 write(1, "AFTER MALLOC #1n", 16AFTER MALLOC #1
- ) = 16 write(1, "0x561605c7aa80n", 150x561605c7aa80
- ) = 15 write(1, "BEFORE MALLOC #2n", 17BEFORE MALLOC #2
- ) = 17 write(1, "AFTER MALLOC #2n", 16AFTER MALLOC #2
- ) = 16 write(1, "0x561605c7ae90n", 150x561605c7ae90
- ) = 15 write(1, "BEFORE MALLOC #3n", 17BEFORE MALLOC #3
- ) = 17 write(1, "AFTER MALLOC #3n", 16AFTER MALLOC #3
- ) = 16 write(1, "0x561605c7b2a0n", 150x561605c7b2a0
- ) = 15 fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
可以发现并不是每次调用malloc都会触发brk系统调用,首次调用malloc,内部会通过brk系统调用更改程序中断地址,分配出一大块内存空间,后续再调用malloc,malloc内部会优先使用之前分配出来的内存空间,直到内部内存空间已经不够再次分配给外部时才会再次触发brk系统调用。
0x10 那丢失的16字节是什么
上面分析可以看见程序第一次调用malloc返回的地址并不是heap段的首地址,而是相差了0x10个字节,那这16个字节究竟是什么,可以通过程序打印出这前16个字节的内容。
- 编译运行:gcc test.c -o test;./test 输出:
- 0x5589436ce260 bytes at 0x5589436ce250:
- 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00 0x5589436cea80
- bytes at 0x5589436cea70: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
- 0x5589436cf290 bytes at 0x5589436cf280:
- 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00 0x5589436cfea0
- bytes at 0x5589436cfe90: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
- 0x5589436d0eb0 bytes at 0x5589436d0ea0:
- 00 00 00 00 00 00 00 00 11 14 00 00 00 00 00 00 0x5589436d22c0
- bytes at 0x5589436d22b0: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
- 0x5589436d3ad0 bytes at 0x5589436d3ac0:
- 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00 0x5589436d56e0
- bytes at 0x5589436d56d0: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
- 0x5589436d76f0 bytes at 0x5589436d76e0:
- 00 00 00 00 00 00 00 00 11 24 00 00 00 00 00 00 0x5589436d9b00
- bytes at 0x5589436d9af0: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
可以看出规律:这16个字节相当于malloc出来的地址的头,包含一些信息,目前可以看出它包括已经分配的地址空间的大小,第一次malloc申请了0x400(1024)字节,可以发现11 04 00 00 00 00 00 00大于0x400(1024),这8个字节表示数字 0x 00 00 00 00 00 00 04 11 = 0x400(1024) + 0x10(头的大小16) + 1(后面会说明它的含义),可以发现每次调用malloc,这前8个字节代表的含义都是malloc字节数+16+1。
可以猜测,malloc内部会把这前16个字节强转成某种数据结构,数据结构包含某些信息,最主要的是已经分配的字节数,尽管我们不了解具体结构,但是也可以通过代码操作这16个字节验证我们上面总结的规律是否正确,注意代码中不调用free释放内存。
- #include
#include - #include
- void pmem(void *p, unsigned int bytes)
- { unsigned char *ptr;
- unsigned int i;
- ptr = (unsigned char *)p; for (i = 0; i < bytes; i++)
- { if (i != 0)
- { printf(" ");
- } printf("%02x", *(ptr + i));
- } printf("n");
- }
- int main(void)
- { void *p;
- int i; size_t size_of_the_chunk;
- size_t size_of_the_previous_chunk; void *chunks[10];
- for (i = 0; i < 10; i++)
- { p = malloc(1024 * (i + 1));
- chunks[i] = (void *)((char *)p - 0x10); printf("%pn", p);
- } free((char *)(chunks[3]) + 0x10);
- free((char *)(chunks[7]) + 0x10); for (i = 0; i < 10; i++)
- { p = chunks[i];
- printf("chunks[%d]: ", i); pmem(p, 0x10);
- size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1; size_of_the_previous_chunk = *((size_t *)((char *)p));
- printf("chunks[%d]: %p, size = %li, prev = %lin", i, p, size_of_the_chunk, size_of_the_previous_chunk);
- } return (EXIT_SUCCESS);
- } 编译运行输出:
- root@3e8650948c0c:/ubuntu# gcc test.c -o test root@3e8650948c0c:/ubuntu# ./test
- 0x559721de4260 0x559721de4a80
- 0x559721de5290 0x559721de5ea0
- 0x559721de6eb0 0x559721de82c0
- 0x559721de9ad0 0x559721deb6e0
- 0x559721ded6f0 0x559721defb00
- chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00 chunks[0]: 0x559721de4250, size = 1040, prev = 0
- chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00 chunks[1]: 0x559721de4a70, size = 2064, prev = 0
- chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00 chunks[2]: 0x559721de5280, size = 3088, prev = 0
- chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00 chunks[3]: 0x559721de5e90, size = 4112, prev = 0
- chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00 chunks[4]: 0x559721de6ea0, size = 5135, prev = 4112
- chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00 chunks[5]: 0x559721de82b0, size = 6160, prev = 0
- chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00 chunks[6]: 0x559721de9ac0, size = 7184, prev = 0
- chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00 chunks[7]: 0x559721deb6d0, size = 8208, prev = 0
- chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00 chunks[8]: 0x559721ded6e0, size = 9231, prev = 8208
- chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00 chunks[9]: 0x559721defaf0, size = 10256, prev = 0
结果可以看出,malloc返回的地址往前的16个字节可以表示已经分配的内存大小, 如图:
注意上述是没有调用free释放内存的结果,然而malloc只用了8个字节表示已经分配的内存大小,那么另外8个字节被用来表示什么含义呢,看下malloc函数的注释:
- 1055 void pmem(void *p, unsigned int bytes)
- { unsigned char *ptr;
- unsigned int i;
- ptr = (unsigned char *)p; for (i = 0; i < bytes; i++)
- { if (i != 0)
- { printf(" ");
- } printf("%02x", *(ptr + i));
- } printf("n");
- }
- int main(void)
- { void *p;
- int i; size_t size_of_the_chunk;
- size_t size_of_the_previous_chunk; void *chunks[10];
- for (i = 0; i < 10; i++)
- { p = malloc(1024 * (i + 1));
- chunks[i] = (void *)((char *)p - 0x10); printf("%pn", p);
- } free((char *)(chunks[3]) + 0x10);
- free((char *)(chunks[7]) + 0x10); for (i = 0; i < 10; i++)
- { p = chunks[i];
- printf("chunks[%d]: ", i); pmem(p, 0x10);
- size_of_the_chunk = *((size_t *)((char *)p + 8)) - 1; size_of_the_previous_chunk = *((size_t *)((char *)p));
- printf("chunks[%d]: %p, size = %li, prev = %lin", i, p, size_of_the_chunk, size_of_the_previous_chunk);
- } return (EXIT_SUCCESS);
- }
- 编译运行输出: root@3e8650948c0c:/ubuntu# gcc test.c -o test
- root@3e8650948c0c:/ubuntu# ./test 0x55fbebf20260
- 0x55fbebf20a80 0x55fbebf21290
- 0x55fbebf21ea0 0x55fbebf22eb0
- 0x55fbebf242c0 0x55fbebf25ad0
- 0x55fbebf276e0 0x55fbebf296f0
- 0x55fbebf2bb00 chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
- chunks[0]: 0x55fbebf20250, size = 1040, prev = 0 chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
- chunks[1]: 0x55fbebf20a70, size = 2064, prev = 0 chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
- chunks[2]: 0x55fbebf21280, size = 3088, prev = 0 chunks[3]: 00 00 00 00 00 00 00 00 11 10 00 00 00 00 00 00
- chunks[3]: 0x55fbebf21e90, size = 4112, prev = 0 chunks[4]: 10 10 00 00 00 00 00 00 10 14 00 00 00 00 00 00
- chunks[4]: 0x55fbebf22ea0, size = 5135, prev = 4112 chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
- chunks[5]: 0x55fbebf242b0, size = 6160, prev = 0 chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
- chunks[6]: 0x55fbebf25ac0, size = 7184, prev = 0 chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
- chunks[7]: 0x55fbebf276d0, size = 8208, prev = 0 chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
- chunks[8]: 0x55fbebf296e0, size = 9231, prev = 8208 chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
- chunks[9]: 0x55fbebf2baf0, size = 10256, prev = 0
程序代码通过free释放了3和7数据块的空间,所以4和8的前8个字节已经不全是0啦,和其它不同,它们表示之前数据块没有被分配的大小,也可以注意到4和8块的后8个字节不像其它块一样需要加1啦,可以得出结论,malloc通过是否加1来作为前一个数据块是否已经分配的标志,加1表示前一个数据块已经分配。所以之前的程序代码可以修改为如下形式:
- #include
#include - #include
- void pmem(void *p, unsigned int bytes)
- { unsigned char *ptr;
- unsigned int i;
- ptr = (unsigned char *)p; for (i = 0; i < bytes; i++)
- { if (i != 0)
- { printf(" ");
- } printf("%02x", *(ptr + i));
- } printf("n");
- }
- int main(void)
- { void *p;
- int i; size_t size_of_the_chunk;
- size_t size_of_the_previous_chunk; void *chunks[10];
- char prev_used;
- for (i = 0; i < 10; i++) {
- p = malloc(1024 * (i + 1)); chunks[i] = (void *)((char *)p - 0x10);
- } free((char *)(chunks[3]) + 0x10);
- free((char *)(chunks[7]) + 0x10); for (i = 0; i < 10; i++)
- { p = chunks[i];
- printf("chunks[%d]: ", i); pmem(p, 0x10);
- size_of_the_chunk = *((size_t *)((char *)p + 8)); prev_used = size_of_the_chunk & 1;
- size_of_the_chunk -= prev_used; size_of_the_previous_chunk = *((size_t *)((char *)p));
- printf("chunks[%d]: %p, size = %li, prev (%s) = %lin", i, p, size_of_the_chunk,
- (prev_used? "allocated": "unallocated"), size_of_the_previous_chunk); }
- return (EXIT_SUCCESS); }
- 编译运行输出: root@3e8650948c0c:/ubuntu# gcc test.c -o test
- root@3e8650948c0c:/ubuntu# ./test chunks[0]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
- chunks[0]: 0x56254f888250, size = 1040, prev (allocated) = 0 chunks[1]: 00 00 00 00 00 00 00 00 11 08 00 00 00 00 00 00
- chunks[1]: 0x56254f888660, size = 2064, prev (allocated) = 0 chunks[2]: 00 00 00 00 00 00 00 00 11 0c 00 00 00 00 00 00
- chunks[2]: 0x56254f888e70, size = 3088, prev (allocated) = 0 chunks[3]: 00 00 00 00 00 00 00 00 11 04 00 00 00 00 00 00
- chunks[3]: 0x56254f889a80, size = 1040, prev (allocated) = 0 chunks[4]: 00 0c 00 00 00 00 00 00 10 14 00 00 00 00 00 00
- chunks[4]: 0x56254f88aa90, size = 5136, prev (unallocated) = 3072 chunks[5]: 00 00 00 00 00 00 00 00 11 18 00 00 00 00 00 00
- chunks[5]: 0x56254f88bea0, size = 6160, prev (allocated) = 0 chunks[6]: 00 00 00 00 00 00 00 00 11 1c 00 00 00 00 00 00
- chunks[6]: 0x56254f88d6b0, size = 7184, prev (allocated) = 0 chunks[7]: 00 00 00 00 00 00 00 00 11 20 00 00 00 00 00 00
- chunks[7]: 0x56254f88f2c0, size = 8208, prev (allocated) = 0 chunks[8]: 10 20 00 00 00 00 00 00 10 24 00 00 00 00 00 00
- chunks[8]: 0x56254f8912d0, size = 9232, prev (unallocated) = 8208 chunks[9]: 00 00 00 00 00 00 00 00 11 28 00 00 00 00 00 00
- chunks[9]: 0x56254f8936e0, size = 10256, prev (allocated) = 0
堆空间是向上增长吗?
通过代码验证:
- #include
#include - #include
- int main(void)
- { int i;
- write(1, "STARTn", 6);
- malloc(1); getchar();
- write(1, "LOOPn", 5); for (i = 0; i < 0x25000 / 1024; i++)
- { malloc(1024);
- } write(1, "ENDn", 4);
- getchar(); return (EXIT_SUCCESS);
- } 编译运行部分摘要输出:
- root@3e8650948c0c:/ubuntu# gcc test.c -o test root@3e8650948c0c:/ubuntu# strace ./test
- execve("./test", ["./test"], 0x7ffe0d7cbd80 ) = 0 brk(NULL) = 0x555a2428f000
- access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
- openat(AT_FDCWD, "/etc/ld.so.cache", O_RDonLY|O_CLOEXEC) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=13722, ...}) = 0
- mmap(NULL, 13722, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f6423455000 close(3) = 0
- access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory) openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libc.so.6", O_RDonLY|O_CLOEXEC) = 3
- read(3, "177ELF2113 3 > 1 260342 "..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=2030544, ...}) = 0
- mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f6423453000 mmap(NULL, 4131552, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f6422e41000
- mprotect(0x7f6423028000, 2097152, PROT_NONE) = 0 mmap(0x7f6423228000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1e7000) = 0x7f6423228000
- mmap(0x7f642322e000, 15072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f642322e000 close(3) = 0
- arch_prctl(ARCH_SET_FS, 0x7f64234544c0) = 0 mprotect(0x7f6423228000, 16384, PROT_READ) = 0
- mprotect(0x555a22f5f000, 4096, PROT_READ) = 0 mprotect(0x7f6423459000, 4096, PROT_READ) = 0
- munmap(0x7f6423455000, 13722) = 0 write(1, "STARTn", 6START
- ) = 6 brk(NULL) = 0x555a2428f000
- brk(0x555a242b0000) = 0x555a242b0000 fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
- read(0, "n", 1024) = 1
- write(1, "LOOPn", 5LOOP ) = 5
- brk(0x555a242d1000) = 0x555a242d1000 brk(0x555a242f2000) = 0x555a242f2000
- brk(0x555a24313000) = 0x555a24313000 brk(0x555a24334000) = 0x555a24334000
- brk(0x555a24355000) = 0x555a24355000 brk(0x555a24376000) = 0x555a24376000
- brk(0x555a24397000) = 0x555a24397000 brk(0x555a243b8000) = 0x555a243b8000
- brk(0x555a243d9000) = 0x555a243d9000 brk(0x555a243fa000) = 0x555a243fa000
可以看出堆空间是向上增长的。
随机化地址空间布局
从开始到现在运行了好多个进程,通过查看对应进程的maps,发现每个进程的heap的起始地址和可执行程序的结束地址都不紧邻,而且差距还每次都不相同。
- [3718]: 01195000 – 00602000 = b93000 [3834]: 024d6000 – 00602000 = 1ed4000
- [4014]: 00e70000 – 00602000 = 86e000 [4172]: 01314000 – 00602000 = d12000
- [7972]: 00901000 – 00602000 = 2ff000
可以看出这个差值是随机的,查看fs/binfmt_elf.c源代码
- if ((current->flags & PF_RANDOMIZE) && (randomize_va_space > 1)) { current->mm->brk = current->mm->start_brk =
- arch_randomize_brk(current->mm); #ifdef compat_brk_randomized
- current->brk_randomized = 1; #endif
- } // current->mm->brk是当前进程程序中断的地址
arch_randomize_brk函数在arch/x86/kernel/process.c中
- unsigned long arch_randomize_brk(struct mm_struct *mm) {
- unsigned long range_end = mm->brk + 0x02000000; return randomize_range(mm->brk, range_end, 0) ? : mm->brk;
- }
randomize_range函数在drivers/char/random.c中
- unsigned long
- randomize_range(unsigned long start, unsigned long end, unsigned long len) {
- unsigned long range = end - len - start;
- if (end <= start + len) return 0;
- return PAGE_ALIGN(get_random_int() % range + start); }
可以看出上面所说的这个差值其实就是0-0x02000000中的一个随机数,这种技术称为ASLR(Address Space Layout Randomisation),是一种计算机安全技术,随机安排虚拟内存中堆栈空间的位置,可以有效防止黑客攻击。通过以上分析,可以画出内存分布图如下:
malloc(0)发生了什么?
当调用malloc(0)会发生什么,代码如下:
- #include
#include - #include
- void pmem(void *p, unsigned int bytes)
- { unsigned char *ptr;
- unsigned int i;
- ptr = (unsigned char *)p; for (i = 0; i < bytes; i++)
- { if (i != 0)
- { printf(" ");
- } printf("%02x", *(ptr + i));
- } printf("n");
- }
- int main(void)
- { void *p;
- size_t size_of_the_chunk; char prev_used;
- p = malloc(0);
- printf("%pn", p); pmem((char *)p - 0x10, 0x10);
- size_of_the_chunk = *((size_t *)((char *)p - 8)); prev_used = size_of_the_chunk & 1;
- size_of_the_chunk -= prev_used; printf("chunk size = %li bytesn", size_of_the_chunk);
- return (EXIT_SUCCESS); }
- 编译运行输出如下: root@3e8650948c0c:/ubuntu# gcc test.c -o test
- root@3e8650948c0c:/ubuntu# ./test 0x564ece64b260
- 00 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 chunk size = 32 bytes
可以看出malloc(0)实际使用了32个字节,其中包括我们之前说的16个字节头部,然而有时候malloc(0)可能会有不同的结果输出,也有可能会返回NULL。
- man malloc NULL may also be returned by a successful call to malloc() with a size of zero
操作环境
- 示例代码主要在两种环境下跑过: ubuntu 16.04
- gcc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0
- ubuntu 18.04 docker gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
本文是从这一系列文章翻译并结合自己理解提炼出来的,代码都自己实践过,有时间的也可以直接阅读英文原链接
Hack The Virtual Memory: C strings & /proc
Hack the Virtual Memory: drawing the VM diagram
Hack the Virtual Memory: malloc, the heap & the program break