Firejail is a generic Linux namespaces security sandbox, capable of running graphical programs as well as server programs. The sandbox is lightweight and its overhead is low: there are no open socket connections and no daemons running in the background. All security features are implemented directly in the Linux kernel and are available on any Linux computer.
Seccomp stands for secure computing mode, and seccomp-bpf is its filter mode. It’s a simple yet effective sandboxing tool introduced in Linux kernel 3.5. It allows the user to attach a system call filter to a process and all its descendants, thus reducing the attack surface of the kernel. Seccomp filters are expressed in Berkeley Packet Filter (BPF) format.
In this article I’ll show you how to build a whitelist seccomp-bpf filter and how to attach the filter to a user program using the Firejail sandbox. Throughout the article I will use the Transmission BitTorrent client as an example.
I start by extracting a list of syscalls the program uses, build the filter and run the program in Firejail. As new syscalls are discovered during testing, the filter is updated. When everything looks fine, I integrate the filter into a security profile suitable for Firejail. These are the steps:
Syscalls
Linux has several tools for listing syscalls. I guess the easiest one to use is strace (apt-get install strace, yum install strace). I start transmission-gtk under strace with the -qcf options (quiet, count, follow child processes).
$ strace -qcf transmission-gtk
I play for about 5 minutes with the program, go through some menus, start and stop a download etc.
As I close the program, strace prints the syscall list on the terminal:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.93    3.095527         247     12512           poll
 19.64    1.416000        2975       476           select
 13.65    0.984000        3046       323           nanosleep
 12.09    0.871552         389      2239       330 futex
 11.47    0.827229          77     10680           epoll_wait
  0.08    0.005779          66        88           fadvise64
  0.06    0.004253           4      1043       193 read
  0.06    0.004000           3      1529         3 lstat
  0.00    0.000344           0      2254      1761 stat
[...]
  0.00    0.000000           0         1           fallocate
  0.00    0.000000           0        24           eventfd2
  0.00    0.000000           0         1           inotify_init1
------ ----------- ----------- --------- --------- ----------------
100.00    7.210150                 95061     23256 total
Firejail
I paste the strace output into a text editor and clean it up. I build a comma-separated list without any blanks, something like:
poll,select,nanosleep,futex,epoll_wait,fadvise64,read,lstat,stat,[...]
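The cleanup step can also be scripted. The sketch below is one possible way to do it, not the method used for this article. It assumes the strace summary has the column layout shown above, and strace_to_list is a made-up helper name. It keeps the last field of every data row, drops the header, separator and total lines, and joins the names with commas.

```shell
# strace_to_list: read an "strace -c" summary on stdin and print the
# syscall names as one comma-separated line (sorted, duplicates removed).
# Assumption: data rows start with a numeric "% time" value and end with
# the syscall name, as in the summary shown in this article.
strace_to_list() {
    awk '$1 ~ /^[0-9.]+$/ && $NF != "total" { print $NF }' |
        sort -u |
        paste -sd, -
}
```

If the summary is saved to a file, for instance with strace's -o option (strace -qcf -o trace.txt transmission-gtk, where trace.txt is an arbitrary name), then strace_to_list < trace.txt prints the list in alphabetical order rather than in strace's order.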
I use the --seccomp.keep option to start Firejail, and --shell=none to run the program directly without the extra syscalls required by a shell:
$ firejail --shell=none --seccomp.keep=poll,select,[...] transmission-gtk
It looks ugly at this point: a kilometer-long command line that doesn’t even work. For some reason strace missed some syscalls. Time to bring in the system logger.
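One way to tame the long command line while testing is to keep the syscall names in a file and build the option from it. This is only a sketch under assumptions: keep_option is a hypothetical helper, and the one-name-per-line file format with # comments is invented here for convenience, not something Firejail defines.

```shell
# keep_option FILE: print a --seccomp.keep=... option built from FILE.
# FILE holds one syscall name per line; blank lines and lines starting
# with # are ignored, and stray spaces or tabs are stripped.
keep_option() {
    list=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" |
        tr -d ' \t' |
        paste -sd, -)
    printf '%s\n' "--seccomp.keep=$list"
}
```

With the names stored in a file called, say, syscalls.txt, the command becomes firejail --shell=none "$(keep_option syscalls.txt)" transmission-gtk, and adding a newly discovered syscall is a one-line edit to the file.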
Syslog
If I get errors in the terminal, I just add the missing syscall to the list and try again. But I don't always get an error message. Most of the time the Linux kernel will just kill the process and send audit messages to syslog. For this reason, I keep another terminal open monitoring syslog:
$ sudo tail -f /var/log/syslog
The log entry tells me exactly which system call number crashed the program, syscall=201 in my case. To associate the number with a name, I use firejail as follows:
$ firejail --debug-syscalls | grep 201
201 - time
$
Looks like I need to add the time syscall to the list. I keep adding syscalls to the list as they are reported and try again. To get Transmission working I ended up adding pwrite64,time,exit,exit_group on top of what strace reported. Not too bad!
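When several syscalls are blocked during one test run, collecting the numbers by eye gets tedious. As a rough sketch, the helper below pulls every distinct number out of syscall=N fields in log lines. The only thing it assumes about the log format is that syscall=N field, and seccomp_syscall_numbers is a hypothetical name.

```shell
# seccomp_syscall_numbers: read log lines on stdin and print each distinct
# number found in a "syscall=N" field, one per line, in numeric order.
# Lines without such a field are ignored.
seccomp_syscall_numbers() {
    grep -o 'syscall=[0-9][0-9]*' | cut -d= -f2 | sort -n -u
}
```

For example, sudo cat /var/log/syslog | seccomp_syscall_numbers prints the candidate numbers, and each one can then be looked up with firejail --debug-syscalls as shown above. Other audit records may carry a syscall= field too, so the output is a list of candidates to check, not a final answer.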
Security profiles
Firejail installs security profiles for several popular programs in the /etc/firejail directory. The profiles define a manicured filesystem with most directories mounted read-only, and several files and directories blanked in $HOME, mainly files holding passwords and encryption keys.
The Transmission BitTorrent client is supported, and its profile also defines a default seccomp blacklist filter. I want to upgrade this filter to the whitelist filter I’ve just built. For this, I go into the ~/.config/firejail directory and copy the default Transmission profile there:
$ cd ~/.config/firejail
$ cp /etc/firejail/transmission-gtk.profile .
$ vim transmission-gtk.profile
I add a “shell none” line, and I replace “seccomp” with “seccomp.keep poll,select,nanosleep,futex,epoll_wait,fadvise64,[…]”. The result looks like this:
$ cd ~/.config/firejail
$ cat transmission-gtk.profile
# transmission-gtk profile
include /etc/firejail/disable-mgmt.inc
include /etc/firejail/disable-secret.inc
blacklist ${HOME}/.adobe
blacklist ${HOME}/.macromedia
blacklist ${HOME}/.mozilla
blacklist ${HOME}/.icedove
blacklist ${HOME}/.thunderbird
caps.drop all
shell none
seccomp.keep poll,select,nanosleep,futex,epoll_wait,fadvise64,read,lstat,stat,epoll_ctl,sendto,readv,recvfrom,ioctl,write,inotify_add_watch,writev,socket,getdents,mprotect,mmap,open,close,fstat,lseek,munmap,brk,rt_sigaction,rt_sigprocmask,access,pipe,madvise,connect,sendmsg,recvmsg,bind,listen,getsockname,getpeername,socketpair,setsockopt,getsockopt,clone,execve,uname,fcntl,ftruncate,rename,mkdir,rmdir,unlink,readlink,umask,getrlimit,getrusage,times,getuid,getgid,geteuid,getegid,getresuid,getresgid,statfs,fstatfs,prctl,arch_prctl,epoll_create,set_tid_address,clock_getres,inotify_rm_watch,set_robust_list,fallocate,eventfd2,inotify_init1,pwrite64,time,exit,exit_group
netfilter
$
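The profile edit can be done with a small filter instead of by hand. The following is a minimal sketch, assuming the default profile contains a line that reads exactly seccomp, as described above. whitelist_profile is a hypothetical helper name, and the list passed in the test is shortened for illustration.

```shell
# whitelist_profile LIST: copy a Firejail profile from stdin to stdout,
# replacing a line that is exactly "seccomp" with "shell none" followed
# by "seccomp.keep LIST". All other lines pass through unchanged.
# Assumption: the "seccomp" line has no trailing whitespace or arguments.
whitelist_profile() {
    awk -v list="$1" '
        $0 == "seccomp" { print "shell none"; print "seccomp.keep " list; next }
        { print }
    '
}
```

It could be used as whitelist_profile "poll,select,[...]" < /etc/firejail/transmission-gtk.profile > ~/.config/firejail/transmission-gtk.profile, with the full syscall list in place of the elided one. If the profile has no plain seccomp line the output is identical to the input, so it is worth checking the result before relying on it.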
The command “caps.drop all” in the security profile above drops all capabilities. Linux capabilities are a kernel feature similar in purpose to seccomp, but the checks are enforced deep inside the kernel and not at the system call entry point. Between seccomp and capabilities, more than half the kernel code is disabled.
Firejail chooses the profile automatically, based on the name of the executable. To run Transmission with all security features enabled, the command is:
$ firejail transmission-gtk
Conclusion
Whitelist seccomp filters are easy to build, yet they need lots of testing. The filters are not portable. For example, this filter built on Debian Wheezy will not work on Ubuntu 14.04. The exact list of syscalls depends on the kernel running the system, the version of the program, and all the libraries the program links in.
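Because the list drifts with kernel, program and library versions, it can help to re-trace the program after an update and compare the result with the existing whitelist. The sketch below shows one way to do that comparison. missing_syscalls is a made-up helper, and both arguments are assumed to be comma-separated lists without blanks, like the ones used throughout this article.

```shell
# missing_syscalls HAVE SEEN: print the syscalls that appear in SEEN
# (a freshly traced list) but not in HAVE (the current whitelist).
# Both arguments and the output are comma-separated; the order of SEEN
# is preserved. Prints an empty line when nothing is missing.
missing_syscalls() {
    awk -v have="$1" -v seen="$2" 'BEGIN {
        n = split(have, h, ",")
        for (i = 1; i <= n; i++) allowed[h[i]] = 1
        m = split(seen, s, ",")
        out = ""
        for (i = 1; i <= m; i++)
            if (!(s[i] in allowed))
                out = out (out == "" ? "" : ",") s[i]
        print out
    }'
}
```

An empty result only means the new trace found nothing outside the whitelist. As the Syslog section shows, strace can miss syscalls, so the program still needs to be tested inside the sandbox.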
For more information about Firejail, visit the project page.
great article
Do you recommend building a whitelist for your favorite browser?
keep up the good work
Thanks
Thanks! The filter could break next time your distro installs an update for firefox, or an update for flash player or any other plugin/extension you have installed. I use firefox with the default seccomp blacklist filter, never tried it with a whitelist filter.
Could you consider having an option (maybe --seccomp-explicit) which *only* blacklists the syscalls specified? That would be very useful in the case of firefox, where the Hangouts plugin requires ptrace. seccomp.keep does not accomplish this, and nothing else allows me to whitelist a previously blacklisted syscall.
Use “--seccomp.drop=syscall,syscall,syscall…” instead of --seccomp.keep or --seccomp. It blacklists only the calls you specify.
You will also need to allow capabilities, so remove “caps.drop all” line from /etc/firejail/firefox.profile.
Thanks! Although I’m curious…why would I need to remove caps.drop all? I tried just adding seccomp.drop and it seems to work fine, blacklisting all the syscalls I specified.
There seems to be a sys_ptrace capability.
I’m not sure I follow. Right now, I have the following (relevant) config options:
caps.drop all
seccomp.drop
Both seem to work just fine – that is, all capabilities are dropped and the relevant syscalls are blacklisted.
You’re right, forget about caps.drop all
I meant
seccomp.drop mount,umount2,…
Um, you should know that ptrace completely defeats the purpose of seccomp. It’s even displayed in the manpage. Basically, the SETREGS ptrace flag can be used to break out of a seccomp sandbox by messing with registers holding the syscall number and arguments right before it’s called (seccomp filters are checked for permission before ptrace is allowed to mess with things… or in other words, ptrace is allowed to mess with registers right before entering kernelmode, even after a process has been given “permission” by seccomp).
Seccomp is not typically meant for the end-user. It requires you actually know what you’re doing and what the effects on the program will be. It’s not just a “whitelist all syscalls the program needs and you are automatically secure”.
tl;dr if you whitelist ptrace, you are not using seccomp