This story continues the previous one, in which we discussed process capability sets in detail. Some of you may be wondering how these capability sets are determined for, or applied to, unprivileged and privileged program binaries. This article is for you.
Before I detail process creation mechanics and Linux capabilities, I'd like to go over two key concepts.
Capability-aware applications can manipulate their capability sets with system calls (capset, capget, prctl) after they are loaded. When an application no longer needs certain capabilities at some point during execution, it can drop them from its effective set to limit its exposure to privileged operations. As long as a capability remains in the permitted set, the application can always raise it back into its effective set.
e.g., runc, ping, etc.
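The drop-and-raise behaviour described above can be sketched as plain bit-mask arithmetic. This is only a model under stated assumptions: the function names drop_effective and raise_effective are hypothetical, and no real capability changes hands. A real capability-aware program would typically do this through libcap (cap_get_proc, cap_set_flag, cap_set_proc) or the capset system call.

```shell
#!/bin/sh
# Model only: a capability set is an integer bit mask (bit N = capability number N).
CAP_NET_RAW=$((1 << 13))   # cap_net_raw is capability number 13

# drop_effective EFFECTIVE CAP -> prints EFFECTIVE with CAP cleared
drop_effective() {
    echo $(( $1 & ~$2 ))
}

# raise_effective EFFECTIVE PERMITTED CAP
# Prints the new effective mask. Fails if CAP is not in the permitted set,
# mirroring the rule that the effective set must stay within the permitted set.
raise_effective() {
    if [ $(( $2 & $3 )) -eq 0 ]; then
        echo "not permitted" >&2
        return 1
    fi
    echo $(( $1 | $3 ))
}

permitted=$CAP_NET_RAW
effective=$CAP_NET_RAW

effective=$(drop_effective "$effective" "$CAP_NET_RAW")
printf 'after drop:  0x%x\n' "$effective"    # prints: after drop:  0x0

effective=$(raise_effective "$effective" "$permitted" "$CAP_NET_RAW")
printf 'after raise: 0x%x\n' "$effective"    # prints: after raise: 0x2000
```

The raise succeeds only because cap_net_raw is still in the permitted mask; with a permitted mask of 0, raise_effective returns a failure status.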
Capability-unaware applications make no system calls (such as capset) to modify their capabilities; they depend on the capability sets that are inherited from the parent and constructed when the application is loaded. In other words, they rely on the effective capability set they start with to do their job.
e.g., cat, ls, etc.
An unprivileged program binary is an executable that has no file capabilities set. When we load an unprivileged program binary (e.g., ls, cat), the capability sets of the parent thread, in conjunction with the file's setuid bit, determine the capabilities of that thread after execve(2).
For an unprivileged program binary, the ambient capabilities are critical in determining the thread's capabilities.
Let's have a look at how capability sets are determined for an Unprivileged Program Binary after execve(2) under certain conditions.
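Note that a capability can be raised in the ambient set only if it is already in the thread's permitted and inheritable sets; in the examples below it is also present in the bounding set.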
An unprivileged user (bash process) uses the ping executable to ping a local server.
Criteria:
# File Ownership: setuid bit != set && owner == root
$ ls -la ping_clone
-rwxr-xr-x ... root root ... ping_clone
# Parent Process: Unprivileged bash process which runs with no
# or limited capabilities
$ capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override, .....
.....
uid=1000(ubuntu)
gid=1000(ubuntu)
# Executable Binary: Unprivileged ping binary
$ getcap ping_clone
Use the capsh utility to bootstrap an unprivileged bash process and then ping a local server.
$ sudo capsh --caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" \
--keep=1 --user=ubuntu --addamb="cap_net_admin,cap_net_raw" --print -- -c "./ping_clone -c 1 localhost"
Current: = cap_setgid,cap_setuid,cap_setpcap,cap_net_admin,cap_net_raw+p
Bounding set = cap_chown,cap_dac_override,cap_dac_read_search,
cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,
cap_setpcap,cap_linux_immutable,cap_net_bind_service,
cap_net_broadcast,cap_net_admin,cap_net_raw,
cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,
cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,
cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,
cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,
cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,
cap_syslog,35,36,37
Securebits: 020/0x10/5'b10000
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: yes (unlocked)
uid=1000(ubuntu)
gid=1000(ubuntu)
groups=4(adm),10(wheel),190(systemd-journal),991(docker),1000(ubuntu)
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=255 time=0.033 ms
--- localhost ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.033/0.033/0.033/0.000 ms
So what's going on here? Let's have a look:
Current & Bounding set: capsh creates a favorable environment for ping_clone, which runs at the end of this process chain:
sudo(root)───>su───>bash───>ping_clone(ubuntu)
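--user=ubuntu: Switch from root to the unprivileged user. The UID change clears the effective set; because --keep=1 is given, the permitted set survives, which is why Current lists the capabilities with only the +p flag.
--addamb="cap_net_admin,cap_net_raw": Raise these capabilities in the ambient set. When an unprivileged binary is executed, the ambient set is added to its new permitted and effective sets.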
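The Securebits line in the capsh output (0x10) encodes the effect of --keep=1. A minimal sketch for decoding that value follows, assuming the bit layout from the kernel's securebits definitions (bit 0 noroot, bit 2 no-suid-fixup, bit 4 keep-caps, bit 6 no-cap-ambient-raise, each followed by its lock bit). The function name decode_securebits and the "-locked" labels are my own, chosen to resemble the names capsh prints.

```shell
#!/bin/sh
# decode_securebits VALUE -> prints the names of the securebits flags set in VALUE.
# VALUE may be decimal or 0x-prefixed hex.
decode_securebits() {
    value=$(( $1 ))
    names=""
    bit=0
    for name in \
        secure-noroot secure-noroot-locked \
        secure-no-suid-fixup secure-no-suid-fixup-locked \
        secure-keep-caps secure-keep-caps-locked \
        secure-no-cap-ambient-raise secure-no-cap-ambient-raise-locked
    do
        # Test one bit at a time, lowest bit first.
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            names="$names $name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "${names# }"
}

decode_securebits 0x10    # prints: secure-keep-caps
```

This agrees with the "secure-keep-caps: yes (unlocked)" line in the output: only the keep-caps bit is set, and its lock bit is not.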
You may need to install the setpriv utility.
$ sudo apt install setpriv
We'll use the setpriv utility to run the ping_clone binary as an unprivileged user.
$ sudo setpriv --inh-caps '-all,+net_raw' \
--bounding-set '-all,+net_raw' \
--reuid=ubuntu \
--ambient-caps='+net_raw' \
./ping_clone -c1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.019 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms
When the --ambient-caps argument isn't supplied, the ping_clone utility complains with 'socket: Operation not permitted'.
So, what exactly is going on here? Let me clarify.
--reuid=ubuntu: All capabilities in the effective and permitted sets are dropped for the ping_clone process as it switches from root to the ubuntu user.
--ambient-caps=+net_raw: On execve(2), the effective and permitted sets are recalculated from the given ambient set, so ping_clone regains cap_net_raw.
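To see why the missing --ambient-caps argument leads to 'Operation not permitted', here is a small model of the rule. It assumes the transformation documented in capabilities(7): for a non-root user executing a binary without file capabilities, the new permitted and effective sets both equal the ambient set. The function name can_open_raw_socket is hypothetical, and the sketch only does mask arithmetic; it does not open a socket.

```shell
#!/bin/sh
# Model only: capability sets are integer bit masks (bit N = capability number N).
CAP_NET_RAW=$((1 << 13))   # cap_net_raw is capability number 13

# can_open_raw_socket AMBIENT_MASK
# Succeeds if cap_net_raw would be in the effective set after execve(2)
# of a binary that has no file capabilities.
can_open_raw_socket() {
    new_effective=$(( $1 ))    # effective after exec = ambient for such a binary
    [ $(( new_effective & CAP_NET_RAW )) -ne 0 ]
}

if can_open_raw_socket "$CAP_NET_RAW"; then
    echo "ambient has cap_net_raw: raw socket allowed"
fi
if ! can_open_raw_socket 0; then
    echo "ambient empty: socket: Operation not permitted"
fi
```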
A privileged user (bash process) pings a local server using an unprivileged ping binary.
Criteria:
# File Ownership: setuid bit != set && owner == root.
$ ls -la ping_clone
-rwxr-xr-x ... root root ... ping_clone
# Parent Process: Privileged bash process runs with full capabilities.
$ capsh --print
Current: = cap_net_admin,cap_net_raw,cap_chown,cap_dac_override, .....
Bounding set = cap_net_admin,cap_net_raw,cap_chown,cap_dac_override, .....
.....
uid=0(root)
gid=0(root)
...
# Executable Binary: Unprivileged ping binary (file capabilities aren't set).
$ getcap ping_clone
When you log in as root, your effective user ID is set to 0 and you have unrestricted access to the system to do (nearly) whatever you want.
Logging in as root is all the explanation this scenario needs.
With an effective user ID of 0, the bash process is a privileged process. All Linux capabilities are enabled for it, and the kernel traditionally bypasses permission checks when the effective user ID is 0.
Set User ID (setuid) and Set Group ID (setgid) are special permissions for executable files.
When these permissions are assigned to a file, the executed program assumes the privileges of the file's owner or group.
The setuid bit changes a program's effective UID (euid) upon execution.
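Besides reading the "s" in the ls -la output, the setuid bit can be checked from a script with the POSIX test operator -u. The sketch below uses a throwaway temporary file rather than ping_clone, so it needs no root access; the helper name has_setuid is hypothetical.

```shell
#!/bin/sh
# has_setuid FILE -> succeeds if FILE exists and has its set-user-ID bit set
has_setuid() {
    [ -u "$1" ]
}

# Demonstration on a temporary file created only for this sketch.
demo=$(mktemp)
chmod 0755 "$demo"                 # -rwxr-xr-x
has_setuid "$demo" || echo "mode 0755: setuid bit not set"

chmod u+s "$demo"                  # -rwsr-xr-x, same as mode 4755
has_setuid "$demo" && echo "mode 4755: setuid bit set"

rm -f "$demo"
```

Note that the bit only changes the effective UID to that of the file's owner, so it grants root privileges only when the file is owned by root, as in the criteria used in this article.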
Criteria:
# File Ownership: setuid bit == set && owner == root.
$ ls -la ping_clone
-rwsr-xr-x ... root root ... ping_clone
# Parent Process: Unprivileged bash process(no or limited capabilities).
$ capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override, .....
.....
uid=1000(ubuntu)
gid=1000(ubuntu)
...
# Executable Binary: Unprivileged ping binary. (file capabilities aren't set).
$ getcap ping_clone
# setuid bit set
$ ls -la
...
-rwsr-xr-x ... root root ... ping_clone
When a non-root user executes the ping_clone utility, which is owned by root and has the setuid bit set, the program runs in the root user's context (EUID = 0) until it changes its effective UID (euid) during execution.
~$ ping_clone localhost &
[1] 31994
~# PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.027 ms
~$ cat /proc/31994/status
Name: ping_clone
...
...
Uid: 1000 1000 0 1000
Gid: 1000 1000 1000 1000
...
CapInh: 0000000000000000
CapPrm: 0000000000003000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
...
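The Cap* fields in /proc/<pid>/status are hexadecimal bit masks in which bit N stands for capability number N. A minimal decoder sketch follows; the function name decode_caps is hypothetical, and the name table assumes the capability numbering 0 to 37 in the order shown in the Bounding set listings. Where libcap is installed, capsh --decode=<mask> does the same job.

```shell
#!/bin/sh
# Capability names indexed by capability number (0..37).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm
block_suspend audit_read"

# decode_caps HEXMASK -> prints the names of the capabilities set in HEXMASK
# (HEXMASK is given without a 0x prefix, exactly as it appears in /proc).
decode_caps() {
    mask=$(( 0x$1 ))
    bit=0
    names=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            names="$names cap_$name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "${names# }"
}

decode_caps 0000000000003000    # prints: cap_net_admin cap_net_raw
```

Applied to the status output above, CapPrm 0000000000003000 means the process still holds cap_net_admin and cap_net_raw in its permitted set, while CapEff 0000000000000000 means none of them is currently effective.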
So, what's going on here?
Let's take a look at the ping_clone utility from the perspective of system calls. Remember that it is a capability-aware application that may change its capabilities programmatically.
Take a look at the output of the strace tracing tool.
A privileged program binary is an executable file that has certain capabilities assigned to it. When we load a privileged program binary (e.g., ping_clone), the executable file's capability sets play a significant role in determining the thread's capabilities after execve(2).
Use the getcap utility to determine whether a program binary is privileged.
File permitted set and old bounding set (before execve()) are logically ANDed.
P1 = Bounding Old & File Permitted Set
File inheritable set and old inheritable set (before execve()) are logically ANDed.
P2 = Inheritable Old & File Inheritable Set
The final state of the permitted set is calculated as the logical OR of P1 and P2.
P = P1 | P2
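The three steps above can be sketched as shell arithmetic on hexadecimal masks. The function name new_permitted and its argument order are hypothetical; the example values assume cap_net_raw is capability number 13 (mask 2000) and a full bounding set of 3fffffffff.

```shell
#!/bin/sh
# new_permitted BOUNDING FILE_PERMITTED INHERITABLE FILE_INHERITABLE
# All arguments are hex masks without a 0x prefix. Prints the thread's
# permitted set after execve(2) of a binary that has file capabilities.
new_permitted() {
    p1=$(( 0x$1 & 0x$2 ))    # P1 = old bounding & file permitted
    p2=$(( 0x$3 & 0x$4 ))    # P2 = old inheritable & file inheritable
    printf '%016x\n' $(( p1 | p2 ))
}

# File has cap_net_raw+p, thread inherits nothing:
new_permitted 3fffffffff 2000 0 0       # prints: 0000000000002000

# File has cap_net_raw+i and the thread's inheritable set contains it:
new_permitted 3fffffffff 0 2000 2000    # prints: 0000000000002000

# File has cap_net_raw+i but the thread's inheritable set is empty:
new_permitted 3fffffffff 0 0 2000       # prints: 0000000000000000
```

The last call shows why a file inheritable capability alone is not enough: the parent must also carry the capability in its own inheritable set.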
An unprivileged user (bash process) pings a local server using a privileged ping binary.
Criteria:
# setuid bit != set && owner != root
$ ls -la ping_clone
-rwxr-xr-x ... ubuntu ubuntu ... ping_clone
# Privileged ping binary
$ getcap ping_clone
ping_clone = cap_net_raw+i
# Unprivileged User
$ capsh --print
Current: =
Bounding set =cap_chown,cap_dac_override, .....
.....
uid=1000(ubuntu)
gid=1000(ubuntu)
...
Condition: Make sure that the ping_clone utility has cap_net_raw set as its inheritable capability.
Terminal 1
# Privileged ping binary
$ getcap ping_clone
ping_clone = cap_net_raw+i
$ sudo capsh \
--caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" \
--keep=1 --user=ubuntu --inh="cap_net_raw" \
--print -- -c "./ping_clone localhost"
Current: = cap_net_raw+ip cap_setgid,cap_setuid,cap_setpcap,cap_net_admin+p
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 020/0x10/5'b10000
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: yes (unlocked)
uid=1000(ubuntu)
gid=1000(ubuntu)
groups=4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),999(docker),1000(ubuntu)
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023
Terminal 2
$ cat /proc/4696/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000002000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
So what's going on here? Let's have a look
Now consider the case where the file permitted set is limited to cap_net_raw.
Terminal 1
# Privileged ping binary
$ getcap ping_clone
ping_clone = cap_net_raw+p
$ sudo capsh \
--caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" \
--user=ubuntu \
--print -- -c "./ping_clone localhost"
Current: =
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 020/0x10/5'b10000
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: yes (unlocked)
uid=1000(ubuntu)
gid=1000(ubuntu)
groups=4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),999(docker),1000(ubuntu)
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023 ms
Terminal 2
$ cat /proc/4696/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000002000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
So what's going on here? Let's explain:
The file effective bit is most useful when application binaries such as cat and nice are unaware of the capget() and capset() syscalls and cannot change their thread's effective set. In this case, they rely on external conditions, such as the file effective bit, to copy all the capabilities of the permitted set into the effective set.
Instead of the ping_clone utility, we will use the top_clone utility for this demonstration.
Terminal 1
# Privileged top_clone binary
$ getcap top_clone
top_clone = cap_chown+ep
$ ./top_clone
....
uid=1000(ubuntu)
top - 09:44:35 up 13:25, 0 users, load average: 0.15, 0.05, 0.01
Tasks: 120 total, 2 running, 79 sleeping, 0 stopped, 0 zombie
.....
Terminal 2
CapInh: 0000000000000000
CapPrm: 0000000000000001
CapEff: 0000000000000001
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
So what's going on with the thread capabilities?
Terminal 2, line 2 (CapPrm): The capability transition rules above determine the final state of the thread permitted set (0x0000000000000001 = cap_chown), which matches the file permitted set.
$ getcap top_clone
top_clone = cap_chown+ep
Terminal 2, line 3 (CapEff): Since the file effective bit is set for top_clone, the kernel automatically copies the permitted set into the effective set.
CapPrm: 0000000000000001
CapEff: 0000000000000001
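The role of the file effective bit can be modelled in a few lines. This sketch assumes the rule from capabilities(7): if the file effective bit is set, the new effective set is the new permitted set; otherwise it is the new ambient set. The function name new_effective and its argument order are hypothetical.

```shell
#!/bin/sh
# new_effective PERMITTED AMBIENT FILE_EFFECTIVE_BIT
# PERMITTED and AMBIENT are the thread's sets after execve(2), as hex masks
# without a 0x prefix. FILE_EFFECTIVE_BIT is 1 if the file has the "e" flag.
new_effective() {
    if [ "$3" -eq 1 ]; then
        printf '%016x\n' $(( 0x$1 ))    # "e" set: effective = permitted
    else
        printf '%016x\n' $(( 0x$2 ))    # "e" not set: effective = ambient
    fi
}

new_effective 1 0 1       # top_clone = cap_chown+ep    prints: 0000000000000001
new_effective 2000 0 0    # ping_clone = cap_net_raw+p  prints: 0000000000000000
```

Both results agree with the CapEff values shown in the Terminal 2 outputs: top_clone starts with cap_chown effective, while ping_clone with only +p starts with an empty effective set and, being capability-aware, has to raise cap_net_raw itself.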