The traditional Linux security model starts with file permissions. The model lets the kernel decide whether or not a process may access a resource based on permissions set as part of the filesystem. The coarse granularity of this model often leaves Linux processes with more rights than they need. If finer granularity is needed, one has to resort to adding security-related code to the program source.
This series of articles is about Linux namespaces, a lightweight virtualization technology implemented in the Linux kernel. In part 1 I talked about building chroot jails using the mount namespace, and in part 2 I looked into isolating processes using the PID namespace. The next step is to isolate the TCP/IP networking stack using network namespaces.
Security at this level is always reactive. Assuming the bad guy breaks into your server, he will realize he doesn’t have root privileges (classic Unix privilege separation implemented in server software), he runs on top of a fake filesystem (chroot), and he cannot get outside on the network. The latter is usually done by placing the computer in a Demilitarized Zone (DMZ) behind a firewall.
The same effect can be achieved on the cheap using Linux namespaces. For this, I place the server in a container (vm1) running on its own network segment (10.10.20.0/24). The container is connected to the host through a Linux bridge interface (br0). On the host I configure an iptables firewall, isolating the server and effectively limiting the potential damage that could be inflicted on the larger network. The final setup looks like this:
Configuring the host
On the host, I run the following script:
    #!/bin/bash
    #
    # Network configuration script for vm1
    #

    # bridge setup
    brctl addbr br0
    ifconfig br0 10.10.20.1/24 up

    # enable ipv4 forwarding
    echo "1" > /proc/sys/net/ipv4/ip_forward

    # netfilter cleanup
    iptables --flush
    iptables -t nat -F
    iptables -X
    iptables -Z
    iptables -P INPUT ACCEPT
    iptables -P OUTPUT ACCEPT
    iptables -P FORWARD ACCEPT

    # external interface - accept only packets going to 10.10.20.10:80
    iptables -A FORWARD -i eth0 -d 10.10.20.10 -p tcp --dport 80 -j ACCEPT
    iptables -A FORWARD -i eth0 -j DROP

    # internal interface - no new connections
    iptables -A FORWARD -i br0 -m state --state NEW,INVALID -j DROP

    # netfilter port forwarding
    iptables -t nat -A PREROUTING -p tcp --dport 80 -j DNAT --to 10.10.20.10:80
The script creates the br0 bridge interface, enables routing, and configures the firewall. The most important firewall entry is iptables -A FORWARD -i br0 -m state --state NEW,INVALID -j DROP. It drops any connection originating in the container, thus isolating the container from the network. I also forward host TCP port 80 to an HTTP server running in the container.
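To make the rule ordering concrete, here is a toy model of the FORWARD chain written as a plain C function. It is only a sketch for reasoning about which packets pass: it is not netfilter code, and the struct packet type, the ct_state enum, and the forward_verdict name are all invented for this illustration. It assumes the DNAT rule in PREROUTING has already rewritten the destination by the time the FORWARD chain sees the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of the FORWARD chain above. This is NOT netfilter code; the
 * types and names are invented for illustration only. */
enum verdict { ACCEPT, DROP };
enum ct_state { CT_NEW, CT_ESTABLISHED, CT_RELATED, CT_INVALID };

struct packet {
    const char *in_if;     /* interface the packet arrived on */
    const char *dst_ip;    /* destination address (after DNAT) */
    bool is_tcp;
    int dport;
    enum ct_state state;   /* connection tracking state */
};

/* Walk the rules in the order the script appends them; first match wins. */
enum verdict forward_verdict(const struct packet *p)
{
    assert(p != NULL && p->in_if != NULL && p->dst_ip != NULL);

    /* -A FORWARD -i eth0 -d 10.10.20.10 -p tcp --dport 80 -j ACCEPT */
    if (strcmp(p->in_if, "eth0") == 0 &&
        strcmp(p->dst_ip, "10.10.20.10") == 0 &&
        p->is_tcp && p->dport == 80)
        return ACCEPT;

    /* -A FORWARD -i eth0 -j DROP */
    if (strcmp(p->in_if, "eth0") == 0)
        return DROP;

    /* -A FORWARD -i br0 -m state --state NEW,INVALID -j DROP */
    if (strcmp(p->in_if, "br0") == 0 &&
        (p->state == CT_NEW || p->state == CT_INVALID))
        return DROP;

    /* -P FORWARD ACCEPT: nothing matched, fall through to the policy */
    return ACCEPT;
}
```

In this model, replies from the web server pass because they belong to an established connection, while anything the container tries to initiate on its own is dropped.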
Configuring the container
I create the container using the clone() system call. The container uses a new mount namespace (CLONE_NEWNS), a new PID namespace (CLONE_NEWPID), and a new network namespace (CLONE_NEWNET). In the container the filesystem is mounted read-only, with /tmp, /var/tmp, and /var/run set up as new tmpfs filesystems. The host side creates a veth device pair and connects the container to the bridge. The program is as follows:
    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <string.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/mount.h>
    #include <sys/wait.h>

    #define errExit(msg) do { perror(msg); exit(1); } while (0)

    #define STACK_SIZE (1024 * 1024)
    static char child_stack[STACK_SIZE];    // space for child's stack

    // bind-mount dir onto itself, then remount it read-only
    static void mnt_rdonly(const char *dir) {
        if (mount(dir, dir, NULL, MS_BIND|MS_REC, NULL) < 0)
            errExit(dir);
        if (mount(NULL, dir, NULL, MS_BIND|MS_REMOUNT|MS_RDONLY|MS_REC, NULL) < 0)
            errExit(dir);
    }

    // mount a fresh, empty tmpfs on dir
    static void mnt_tmpfs(const char *dir) {
        if (mount(NULL, dir, "tmpfs", 0, NULL) < 0)
            errExit(dir);
    }

    int worker(void *arg) {
        (void) arg;

        // building the chroot filesystem
        if (mount(NULL, "/", NULL, MS_SLAVE | MS_REC, NULL) < 0)
            errExit("mount slave");
        mnt_rdonly("/bin");
        mnt_rdonly("/sbin");
        mnt_rdonly("/lib");
        mnt_rdonly("/lib64");
        mnt_rdonly("/usr");
        mnt_rdonly("/etc");
        mnt_tmpfs("/tmp");
        mnt_tmpfs("/var/tmp");
        mnt_tmpfs("/var/run");

        // re-mount /proc in order to have ps utility working
        umount("/proc");
        if (mount("proc", "/proc", "proc", MS_MGC_VAL, NULL))
            errExit("/proc");

        if (chdir("/") < 0)
            errExit("chdir");
        execlp("/bin/bash", "bash", (char *) NULL);
        errExit("execlp");    // reached only if exec failed
        return 0;
    }

    int main(int argc, char **argv) {
        (void) argc;
        (void) argv;
        const pid_t mypid = getpid();
        const char *bridgedev = "br0";

        // clone environment
        const pid_t child = clone(worker, child_stack + STACK_SIZE,
            CLONE_NEWNS | CLONE_NEWPID | CLONE_NEWNET | SIGCHLD, NULL);
        if (child == -1)
            errExit("clone");

        // connect container to br0
        char cmd[200];
        snprintf(cmd, sizeof(cmd),
            "/bin/ip link add veth%u type veth peer name eth0 netns %u",
            (unsigned) mypid, (unsigned) child);
        system(cmd);
        snprintf(cmd, sizeof(cmd), "/sbin/ifconfig veth%u up", (unsigned) mypid);
        system(cmd);
        snprintf(cmd, sizeof(cmd), "/sbin/brctl addif %s veth%u",
            bridgedev, (unsigned) mypid);
        system(cmd);

        // wait for the child to finish
        waitpid(child, NULL, 0);
        return 0;
    }
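The host-side commands are formatted into a fixed-size buffer, and the veth interface name is derived from the PID of the jail program. The following is a minimal sketch of how that formatting could be wrapped in a bounds-checked helper. The name build_veth_cmd and its return convention are hypothetical and not part of the program above. The only outside fact it relies on is that Linux interface names must fit in IF_NAMESIZE bytes, terminator included.

```c
#include <assert.h>
#include <net/if.h>      /* IF_NAMESIZE */
#include <stdio.h>
#include <sys/types.h>   /* pid_t */

/* Hypothetical helper, not part of the program above: format the
 * "ip link add" command into buf. Returns the command length on success,
 * or -1 if the interface name or the command would be truncated. */
int build_veth_cmd(char *buf, size_t size, pid_t host_pid, pid_t child_pid)
{
    assert(buf != NULL && size > 0);

    /* Interface names must fit in IF_NAMESIZE bytes, including the NUL. */
    char name[IF_NAMESIZE];
    int n = snprintf(name, sizeof name, "veth%u", (unsigned)host_pid);
    if (n < 0 || (size_t)n >= sizeof name)
        return -1;

    n = snprintf(buf, size,
                 "/bin/ip link add %s type veth peer name eth0 netns %u",
                 name, (unsigned)child_pid);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

With a 200-byte buffer neither check should ever fire in practice. The point of the sketch is that a truncated command is reported instead of being passed silently to system().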
I compile the program, start it as root, and verify I have only my bash session running in the container:
    # gcc -o jail main.c
    # ./jail
    # ps aux
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root         1  0.0  0.2  19456  1968 pts/1    S    09:17   0:00 bash
    root         2  0.0  0.1  16832  1212 pts/1    R+   09:17   0:00 ps aux
The next step is to configure the network interfaces in the container, add a default route, and verify that networking is working:
    # ifconfig lo up
    # ifconfig eth0 10.10.20.10/24
    # route add default gw 10.10.20.1
    # ping 10.10.20.1
    PING 10.10.20.1 (10.10.20.1) 56(84) bytes of data.
    64 bytes from 10.10.20.1: icmp_req=1 ttl=64 time=0.180 ms
    64 bytes from 10.10.20.1: icmp_req=2 ttl=64 time=0.108 ms
    64 bytes from 10.10.20.1: icmp_req=3 ttl=64 time=0.118 ms
    ^C
    --- 10.10.20.1 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 1998ms
    rtt min/avg/max/mdev = 0.108/0.135/0.180/0.033 ms
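If the firewall on the host is running, any attempt to open a new connection from the container to the external network should fail. Because the host drops the packets instead of rejecting them, the attempt times out rather than being refused outright.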
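One way to check this from inside the container is a small TCP probe. The sketch below is an assumption about how such a check might look, not something from the original setup: can_connect is a hypothetical name, and the address, port, and timeout are whatever you choose to pass in. It uses a non-blocking connect() with poll() so that a blocked attempt returns after the timeout instead of hanging.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical probe: returns 1 if a TCP connection to ip:port is established
 * within timeout_ms, 0 if it fails or times out, -1 on a local error. */
int can_connect(const char *ip, int port, int timeout_ms)
{
    assert(ip != NULL);
    if (port < 1 || port > 65535)
        return -1;

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return -1;                       /* not a valid IPv4 address */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Non-blocking connect, so a dropped SYN does not hang for minutes. */
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    int ok = 0;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        ok = 1;
    } else if (errno == EINPROGRESS) {
        struct pollfd pfd = { .fd = fd, .events = POLLOUT };
        if (poll(&pfd, 1, timeout_ms) == 1) {
            /* The socket became writable: read the final connect status. */
            int err = 0;
            socklen_t len = sizeof err;
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 && err == 0)
                ok = 1;
        }
    }
    close(fd);
    return ok;
}
```

Called from the container with the address of a machine on the external network, the probe should return 0 while the host firewall is in place.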
Starting the network service
At this point the jail is all set up, and it is time to start a network service in the container. I choose the Lighttpd web server because it is easy to install (apt-get install lighttpd) and configure. As described above, the host firewall was previously set to forward TCP port 80 from the host to my server.
    # /etc/init.d/lighttpd start
    [ ok ] Starting web server: lighttpd.
    # ps aux
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root         1  0.0  0.2  19456  2140 pts/1    S    09:17   0:00 bash
    www-data    26  0.0  0.1  54500  1192 ?        S    09:36   0:00 /usr/sbin/lighttpd
    root        44  0.0  0.1  16832  1224 pts/1    R+   09:36   0:00 ps aux
    #
The server starts as root, and immediately switches to a low-privileged user, www-data. I can connect to it from anywhere on my local network by pointing a browser to my host IP address.
Conclusion
chroot is a mature technology, widely deployed today all over the Internet. Linux namespaces enhance it by adding support for process and network stack isolation. What I like most about namespaces is that the programming interface is very compact: by adding 20 or 30 new lines of code you can greatly enhance the security of your software. Or you can use simple jail programs such as the one described here to run your network services.
The most advanced jail program available today in most Linux distributions is LXC. It supports all the namespaces implemented so far in the Linux kernel. In particular, the network namespace support is very extensive. You can use it to set up simple bridge networks (veth) similar to the network described in this post, as well as vlan and macvlan networks. It also has an interface for managing multiple containers. To further enhance the security of the programs running in the container, LXC also configures Linux control groups.