Debian Virtualization: Back to the Basics, part 2

The Linux ptrace() system call provides a means by which one process may observe and control the execution of another process. It is primarily used to implement breakpoint debugging in gdb and system call tracing in strace. In this article I look at the security implications of ptrace, and at how to mitigate them using Linux PID namespaces.

Public enemy numero uno

This is an experiment every Linux enthusiast should try. Start an ssh connection in a terminal and stop when you are just about to enter the password:

Starting an ssh session

Open another terminal, find the PID of your ssh session (ps ax | grep ssh) and attach strace to it (strace -p 3660, where 3660 is the PID you found). Then go back to your ssh terminal, type in your password, and watch it fly across the strace terminal:

ptrace usage example

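If strace complains and refuses to run unless you are root, one fix is chmod u+s /usr/bin/strace. This lets an ordinary user run strace directly on any process that belongs to him. Be aware that a setuid-root strace can attach to processes owned by any user, so do this only on a machine where every user is trusted.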

In theory, somebody compromising a program such as Firefox or Pidgin could then trace your logins regardless of which programs you are using; all they need is some rudimentary software calling ptrace().

As more and more people start using Linux, this becomes a problem. Some programs already try to protect themselves against ptrace attacks, and there seems to be a general interest in disabling ptrace functionality on embedded devices and servers.

Linux PID namespaces

Since ptrace() only works on processes created by the same user, one way to limit the damage after a break-in is to reduce the number of processes visible to the compromised process using Linux PID namespaces. When the program starts, it is placed alone in a new PID namespace, where it becomes PID 1 and sees only its own descendants.

This is a very simple implementation using the clone() system call. A child process is placed in new PID and mount namespaces. As in the program from part 1, the child makes the / filesystem a slave of the original filesystem, so mount events from the new namespace are not mirrored back onto the original filesystem. I also remount /proc so that it reflects the new PID namespace. Finally, I replace the child program (execlp) with a bash session:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mount.h>
#include <sys/wait.h>

#define errExit(msg)    do { perror(msg); exit(1);} while (0)
#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE]; // space for child's stack

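int worker(void* arg) {
	(void)arg; // unused

	// mount --make-rslave /
	// keeps mount events in this namespace from propagating back to the parent
	if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
		errExit("mount slave");

	// re-mount /proc in order to have ps utility working
	// a failed umount is ignored: the new proc mount is then stacked on the old one
	umount("/proc");
	if (mount("proc", "/proc", "proc", MS_MGC_VAL, NULL) < 0)
		errExit("mount /proc");
	fprintf(stderr, "mount /proc: Success\n");

	if (chdir("/") < 0)
		errExit("chdir");

	// replace the child with a bash session; execlp returns only on failure
	execlp("/bin/bash", "bash", (char *) NULL);
	errExit("execlp");
	return 1; // not reached
}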

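int main(int argc, char **argv) {
	(void)argc; // unused
	(void)argv;

	// clone environment: run worker in new mount and PID namespaces
	// (add CLONE_NEWNET to the flags for a new network namespace as well)
	const pid_t child = clone(worker,
		child_stack + STACK_SIZE, // the stack grows down, so pass its top
		CLONE_NEWNS | CLONE_NEWPID | SIGCHLD,
		NULL);
	if (child == -1)
		errExit("clone");

	// wait for the child to finish
	if (waitpid(child, NULL, 0) < 0)
		errExit("waitpid");

	return 0;
}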

I compile the program, start it as root, and check the process table using the ps command:

# gcc -o nstest nstest.c
# ./nstest
mount /proc: Success
# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.2  19456  1960 pts/0    S    18:34   0:00 bash
root         2  0.0  0.1  16832  1208 pts/0    R+   18:34   0:00 ps aux
# 

I can now start my Firefox session in the new namespace:

# su netblue
$ cd ~
$ firefox &

Conclusion

One weakness of the Linux kernel interface is that users are able to control and modify the memory and running state of any of their processes. If one application is compromised, the attacker can go on to compromise any other process run by the same user. For example, a compromised browser would allow the attacker to monitor any login session the user initiates, regardless of which program they are using.

The Linux version of Google Chrome/Chromium uses a PID namespace sandbox to prevent ptrace attacks. The sandbox code can be used independently in other projects. For more on the sandboxing technology used in Chrome/Chromium, check out the LinuxSandboxing page on the Chromium wiki.

Comments on “Debian Virtualization: Back to the Basics, part 2”

  1. Rob van der Hoeven

    Wow, I did not know this and to be honest I could not imagine this. Why ship an OS with full debugging functionality *enabled* for *all* user accounts? Beats me. Thank you for this eye opener!

    Another huge security hole is X GUI isolation. It turns out that programs connected to the same X server can read each other's keystrokes. So if you start an ssh session inside an xterm, all programs connected to the same X server can read your password. Joanna Rutkowska (one of the authors of Qubes OS) wrote an interesting article about this:

    http://theinvisiblethings.blogspot.nl/2011/04/linux-security-circus-on-gui-isolation.html

    X server isolation can be done with namespaces too (using Xephyr). Have not done this myself yet. Will write an article about this on my blog if you don’t beat me to it 😉

    Rob.

    BTW: The best way to protect against the ptrace threat is to drop CAP_SYS_PTRACE from the capabilities bounding set.

    1. netblue30 Post author

      Thanks for the X11 link, really interesting! There is a tool I’ve been using for isolating X11; I have a write-up here. It is based on LXC and Xephyr. However, it requires OverlayFS support in the Linux kernel, available so far on Ubuntu and openSUSE kernels.
