The Linux ptrace() system call provides a means by which one process may observe and control the execution of another process. It is primarily used to implement breakpoint debugging with gdb and system call tracing with strace. In this article I will look at the security implications of ptrace and how to mitigate them using Linux PID namespaces.
Public enemy numero uno
This is an experiment every Linux enthusiast should try. Start an ssh connection in a terminal and stop when you are just about to enter the password:
Open another terminal, find the pid of your ssh session (ps ax | grep ssh) and start strace on it (strace -p 3660, where 3660 is the pid you found). Then go back to your ssh terminal, type in your password, and watch it fly across the strace terminal:
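If strace complains and refuses to attach unless you are root, run it as root (sudo strace -p 3660). Setting the setuid bit with chmod u+s /usr/bin/strace also works, but be aware that it makes strace run as root for every user, so anyone on the machine can then trace any process, not just their own. Remove the bit again (chmod u-s /usr/bin/strace) once you are done experimenting.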
In theory, somebody compromising a program such as Firefox or Pidgin could then trace your logins regardless of what programs you are using. All they need is some rudimentary software calling ptrace().
As more and more people start using Linux, this becomes a problem. Some programs already try to protect themselves against ptrace attacks, and there seems to be a general interest in disabling ptrace functionality on embedded devices and servers.
Linux PID namespaces
Since an unprivileged ptrace() only works on processes owned by the same user, one way to limit the damage after a break-in is to reduce the number of processes visible to the compromised process using Linux PID namespaces. ptrace() addresses its target by PID, and a process inside a PID namespace has no PID for anything outside it, so there is nothing for it to attach to. As the program starts, it is placed alone in a PID namespace, where it becomes PID 1 and sees only its own descendants.
This is a very simple implementation using the clone() system call. A child process is placed in new PID and mount namespaces. Similar to the program in part 1, the child marks the / filesystem as a slave of the original filesystem, so mount events from the new namespace are not mirrored back onto the original filesystem. I also remount /proc so that it reflects the new PID namespace. In the end, I replace the child program (execlp) with a bash session:
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mount.h>
    #include <sys/wait.h>

    #define errExit(msg) do { perror(msg); exit(1); } while (0)

    #define STACK_SIZE (1024 * 1024)
    static char child_stack[STACK_SIZE];    // space for child's stack

    int worker(void *arg) {
        (void) arg;    // unused

        // mount --make-rslave /
        if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
            errExit("mount slave");

        // re-mount /proc in order to have ps utility working
        umount("/proc");
        mount("proc", "/proc", "proc", MS_MGC_VAL, NULL);
        // status line: prints "mount /proc: Success" when the mount worked
        perror("mount /proc");

        chdir("/");
        execlp("/bin/bash", "bash", (char *) NULL);

        // only reached if the exec failed
        perror("execlp");
        return 1;
    }

    int main(int argc, char **argv) {
        (void) argc;
        (void) argv;

        // clone environment; add CLONE_NEWNET to the flags for a new network namespace
        const pid_t child = clone(worker, child_stack + STACK_SIZE,
                                  CLONE_NEWNS | CLONE_NEWPID | SIGCHLD, NULL);
        if (child == -1)
            errExit("clone");

        // wait for the child to finish
        waitpid(child, NULL, 0);
        return 0;
    }
I compile the program, start it as root, and check the process table using the ps command:
    # gcc -o nstest nstest.c
    # ./nstest
    mount /proc: Success
    # ps aux
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root         1  0.0  0.2  19456  1960 pts/0    S    18:34   0:00 bash
    root         2  0.0  0.1  16832  1208 pts/0    R+   18:34   0:00 ps aux
    #
I can now start my Firefox session in the new namespace:
    # su netblue
    $ cd ~
    $ firefox &
Conclusion
One weakness of the Linux kernel interface is that users are able to control and modify the memory and running state of any of their processes. If one application is compromised, the attacker can go on to compromise any other process run by the same user. For example, a compromised browser would allow the attacker to monitor any login session the user initiates, regardless of what program they are using.
The Linux version of Google Chrome/Chromium uses a PID namespace sandbox to prevent ptrace attacks. The sandbox code can be used independently in other projects. For more on the sandboxing technology used in Chrome/Chromium, check out the LinuxSandboxing page on the Chromium wiki.
Wow, I did not know this and to be honest I could not imagine this. Why ship an OS with full debugging functionality *enabled* for *all* user accounts? Beats me. Thank you for this eye opener!
Another huge security hole is X GUI isolation. It turns out that programs connected to the same X server can read each other's keystrokes. So if you start an ssh session inside an xterm, all programs connected to the same X server can read your password. Joanna Rutkowska (one of the authors of Qubes OS) wrote an interesting article about this:
http://theinvisiblethings.blogspot.nl/2011/04/linux-security-circus-on-gui-isolation.html
X server isolation can be done with namespaces too (using Xephyr). Have not done this myself yet. Will write an article about this on my blog if you don’t beat me to it 😉
Rob.
BTW: The best way to protect against the ptrace threat is to drop CAP_SYS_PTRACE from the capabilities bounding set.
Thanks for the X11 link, really interesting! There is a tool I’ve been using for isolating X11; I have a write-up here. It is based on LXC and Xephyr. It does, however, require OverlayFS support in the Linux kernel, available so far on Ubuntu and openSUSE kernels.