Debian Virtualization: Back to the Basics, part 3

The traditional Linux security model starts with file permissions. The model lets the kernel decide whether or not a process may access a resource based on permissions set as part of the filesystem. The coarse granularity of this model often leaves Linux processes with more rights than they need. If finer control is required, one has to resort to adding security-related code to the program source.

This series of articles is about Linux namespaces, a lightweight virtualization technology implemented in the Linux kernel. In part 1 I talked about building chroot jails using the mount namespace, and in part 2 I looked into isolating processes using the PID namespace. The next step is to isolate the TCP/IP networking stack using network namespaces.

Security at this level is always reactive. Assuming the bad guy breaks into your server, he will realize he doesn't have root privileges (classic Unix privilege separation implemented in server software), he runs on top of a fake filesystem (chroot), and he cannot get outside on the network. The latter is usually achieved by placing the computer in a Demilitarized Zone (DMZ) behind a firewall.

The same effect can be achieved on the cheap using Linux namespaces. For this, I place the server in a container (vm1) running on its own network segment (10.10.20.0/24). The container is connected to the host through a Linux bridge interface (br0). On the host I configure an iptables firewall, isolating the server and effectively limiting the potential damage that could be inflicted on the larger network. The final setup looks like this:

[Figure: Network setup]

Configuring the host

On the host, I run the following script:

#!/bin/bash

#
# Network configuration script for vm1
#

# bridge setup
brctl addbr br0
ifconfig br0 10.10.20.1/24 up

# enable ipv4 forwarding
echo "1" > /proc/sys/net/ipv4/ip_forward

# netfilter cleanup
iptables --flush
iptables -t nat -F
iptables -X
iptables -Z
iptables -P INPUT ACCEPT
iptables -P OUTPUT ACCEPT
iptables -P FORWARD ACCEPT

# external interface - accept only packets going to 10.10.20.10:80
iptables -A FORWARD -i eth0 -d 10.10.20.10 -p tcp --dport 80 -j ACCEPT
iptables -A FORWARD -i eth0 -j DROP

# internal interface - no new connections
iptables -A FORWARD -i br0 -m state --state NEW,INVALID -j DROP

# netfilter port forwarding
iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to 10.10.20.10:80

The script creates the br0 bridge interface, enables routing, and configures the firewall. The most important firewall entry is iptables -A FORWARD -i br0 -m state --state NEW,INVALID -j DROP. It drops any connection originating in the container, thus isolating the container from the network. I also forward host TCP port 80 to an HTTP server running in the container.
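
To double-check the result, the bridge and the firewall rules can be inspected on the host. A quick sketch, assuming the interface names used in the script above:

# brctl show br0
# iptables -L FORWARD -v -n --line-numbers
# iptables -t nat -L PREROUTING -v -n

The FORWARD chain should list the port 80 ACCEPT rule ahead of the two DROP rules, and PREROUTING should show the DNAT entry pointing at 10.10.20.10:80.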

Configuring the container

I create the container using the clone() system call. The container uses a new filesystem namespace (CLONE_NEWNS), a new PID namespace (CLONE_NEWPID), and a new network namespace (CLONE_NEWNET). Inside the container the filesystem is mounted read-only, with /tmp, /var/tmp, and /var/run mounted as fresh tmpfs filesystems. The host side creates a veth device pair and connects the container to the bridge. The program is as follows:

#define _GNU_SOURCE
#include <sched.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mount.h>
#include <sys/wait.h>

#define errExit(msg)    do { perror(msg); exit(1);} while (0)
#define STACK_SIZE (1024 * 1024)
static char child_stack[STACK_SIZE];		  // space for child's stack

static void mnt_rdonly(const char *dir) {
	if (mount(dir, dir, NULL, MS_BIND|MS_REC, NULL) < 0)
		errExit(dir);

	if (mount(NULL, dir, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY|MS_REC, NULL) < 0)
		errExit(dir);
}

static void mnt_tmpfs(const char *dir) {
	if (mount(NULL, dir, "tmpfs", 0, NULL) < 0)
		errExit(dir);
}

int worker(void* arg) {
	// building the chroot filesystem
	if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
		errExit("mount slave");
	mnt_rdonly("/bin");
	mnt_rdonly("/sbin");
	mnt_rdonly("/lib");
	mnt_rdonly("/lib64");
	mnt_rdonly("/usr");
	mnt_rdonly("/etc");
	mnt_tmpfs("/tmp");
	mnt_tmpfs("/var/tmp");
	mnt_tmpfs("/var/run");

	// re-mount /proc in order to have ps utility working
	umount("/proc");
	if (mount("proc", "/proc", "proc", MS_MGC_VAL, NULL))
		errExit("/proc");

	if (chdir("/") < 0)
		errExit("chdir");
	execlp("/bin/bash", "bash", (char *) NULL);
	errExit("execlp");	// reached only if exec fails
	return 0;
}

int main(int argc, char **argv) {
	const pid_t mypid = getpid();
	const char *bridgedev = "br0";

	// clone environment
	const pid_t child = clone(worker,
		child_stack + STACK_SIZE,
		CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET | SIGCHLD,
		NULL);
	if (child == -1)
		errExit("clone");

	// connect container to br0
	char cmd[200];
	snprintf(cmd, sizeof(cmd), "/bin/ip link add veth%d type veth peer name eth0 netns %d", (int) mypid, (int) child);
	system(cmd);
	snprintf(cmd, sizeof(cmd), "/sbin/ifconfig veth%d up", (int) mypid);
	system(cmd);
	snprintf(cmd, sizeof(cmd), "/sbin/brctl addif %s veth%d", bridgedev, (int) mypid);
	system(cmd);

	// wait for the child to finish
	waitpid(child, NULL, 0);

	return 0;
}

I compile the program, start it as root, and verify I have only my bash session running in the container:

# gcc -o jail main.c
# ./jail 
# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.2  19456  1968 pts/1    S    09:17   0:00 bash
root         2  0.0  0.1  16832  1212 pts/1    R+   09:17   0:00 ps aux
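
At this point the new network namespace should be almost empty: only the loopback device and the eth0 end of the veth pair created by the parent process. A quick way to confirm this, assuming iproute2 is available inside the container:

# ip link show

Both interfaces are still down; lo and eth0 have to be brought up and configured before the container can reach the bridge.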

The next step is to configure the network interfaces in the container, add a default route, and verify that networking is working:

# ifconfig lo up
# ifconfig eth0 10.10.20.10/24
# route add default gw 10.10.20.1
# ping 10.10.20.1
PING 10.10.20.1 (10.10.20.1) 56(84) bytes of data.
64 bytes from 10.10.20.1: icmp_req=1 ttl=64 time=0.180 ms
64 bytes from 10.10.20.1: icmp_req=2 ttl=64 time=0.108 ms
64 bytes from 10.10.20.1: icmp_req=3 ttl=64 time=0.118 ms
^C
--- 10.10.20.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1998ms
rtt min/avg/max/mdev = 0.108/0.135/0.180/0.033 ms

If the firewall on the host is running, any attempt to open a new connection from the container to the external network is dropped.
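
A minimal check from inside the container, using 8.8.8.8 as a stand-in for any address beyond the bridge:

# ping -c 3 8.8.8.8

The gateway (10.10.20.1) still answers, because that traffic only hits the host's INPUT chain, but anything that has to be forwarded past the bridge gets no reply.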

Starting the network service

At this point the jail is all set up, and it is time to start a network service in the container. I choose the lighttpd web server; it is easy to install (apt-get install lighttpd) and configure. As described above, the host firewall forwards TCP port 80 from the host to my server.

# /etc/init.d/lighttpd start
[ ok ] Starting web server: lighttpd.
# ps aux
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.2  19456  2140 pts/1    S    09:17   0:00 bash
www-data    26  0.0  0.1  54500  1192 ?        S    09:36   0:00 /usr/sbin/lighttpd
root        44  0.0  0.1  16832  1224 pts/1    R+   09:36   0:00 ps aux
#

The server starts as root and immediately switches to a low-privileged user, www-data. I can connect to it from anywhere on my local network by pointing a browser at the host IP address.
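
The same check can be scripted with curl from another machine on the LAN; 192.168.1.5 below is only a placeholder for the host's real address:

$ curl -I http://192.168.1.5/

The request is DNAT-ed by the host to 10.10.20.10:80 and answered by lighttpd running inside the container.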

Conclusion

chroot is a mature technology, widely deployed today all over the Internet. Linux namespaces enhance it by adding support for process and network stack isolation. What I like most about namespaces is that the programming interface is very compact: by adding 20 or 30 lines of code you can greatly enhance the security of your software. Or you can use a simple jail program such as the one described here to drive your network services.

The most advanced jail program distributed by default with all major Linux distributions today is LXC. It supports all the namespaces implemented so far in the Linux kernel. In particular, the network namespace support is very extensive: you can use it to set up simple bridged networks (veth) similar to the one described in this post, as well as vlan and macvlan networks. It also has an interface for managing multiple containers. To further enhance the security of the programs running in a container, LXC also configures Linux control groups.
