This story is a continuation of the last one, in which we discussed Process Capabilities Sets in detail. Some of you may be wondering how these Capabilities Sets are determined or applied to Unprivileged and Privileged Program Binary. This article is aimed at them.

Before I begin detailing process creation mechanics and Linux capabilities, I'd like to go over two key concepts.

Capability Aware Applications

Capability-aware applications can manipulate their capability sets with system calls (capset, capget, prctl) after load. At some point during execution, when an application no longer needs certain capabilities, it can drop them from its effective set to limit exposure to privileged tasks. As long as it has a capability in the permitted set, it can always bring that capability back into its effective set. e.g. runc, ping, etc.

Capability Dumb Applications

These applications don't make any system calls (capset) to modify their capabilities; they depend on the capability sets that are inherited from the parent and constructed during application load. In other words, they rely on the effective capability set to do their job. e.g. cat, ls, etc.

Unprivileged Program Binary

An Unprivileged Program Binary is an executable on which no File Capabilities are enabled. When we load an unprivileged program binary (e.g., ls, cat), the capability sets of the thread (parent), in conjunction with the file SETUID bit, are used to determine the capabilities of that thread after execve(2).

In the case of an Unprivileged Program Binary, the ambient capabilities are critical in determining the thread's capabilities. Let's have a look at how capability sets are determined for an Unprivileged Program Binary after execve(2) under certain conditions.

Capabilities Transition

Explanation
- inheritable & bounding: There will be no change in the inheritable & bounding sets.
- effective & permitted: These capabilities are lost during execve() and are recalculated based on the ambient capabilities.
- ambient: The ambient capabilities are introduced to restore the lost capabilities in the effective & permitted sets. Ambient capabilities must exist in the bounding set; the kernel also requires every ambient capability to be present in both the permitted and inheritable sets.

Use Case #1: Unprivileged Bash Process

An unprivileged user (bash process) uses the ping executable to ping a local server.

Criteria:
- File Ownership: setuid bit != set && owner == root
- Parent Process: Unprivileged bash process runs with no or limited capabilities
- Executable Binary: Unprivileged ping binary.

[Schematic Diagram]

Prepare The Environment

    # File Ownership: setuid bit != set && owner == root
    $ ls -la ping_clone
    -rwxr-xr-x ... root root ... ping_clone

    # Parent Process: Unprivileged bash process which runs with no
    # or limited capabilities
    $ capsh --print
    Current: =
    Bounding set =cap_chown,cap_dac_override, .....
    .....
    uid=1000(ubuntu)
    gid=1000(ubuntu)

    # Executable Binary: Unprivileged ping binary
    $ getcap ping_clone

Demo #1: Using capsh Utility

Use the capsh utility to bootstrap an unprivileged bash process and then ping a local server.

    $ sudo capsh --caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" --keep=1 --user=ubuntu --addamb="cap_net_admin,cap_net_raw" --print -- -c "./ping_clone -c 1 localhost"
    Current: = cap_setgid,cap_setuid,cap_setpcap,cap_net_admin,cap_net_raw+p
    Bounding set = cap_chown,cap_dac_override,cap_dac_read_search,
    cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,
    cap_setpcap,cap_linux_immutable,cap_net_bind_service,
    cap_net_broadcast,cap_net_admin,cap_net_raw,
    cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,
    cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,
    cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,
    cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,
    cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,
    cap_syslog,35,36,37
    Securebits: 020/0x10/5'b10000
     secure-noroot: no (unlocked)
     secure-no-suid-fixup: no (unlocked)
     secure-keep-caps: yes (unlocked)
    uid=1000(ubuntu)
    gid=1000(ubuntu)
    groups=4(adm),10(wheel),190(systemd-journal),991(docker),1000(ubuntu)
    PING localhost (127.0.0.1) 56(84) bytes of data.
    64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=255 time=0.033 ms

    --- localhost ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.033/0.033/0.033/0.000 ms

So what's going on here? Let's have a look:
- Current & Bounding set: Create a favorable environment for ping_clone. The process chain is sudo(root)───>su───>bash───>ping_clone(ubuntu).
- --user=$USER: Change the UID from root to $USER. Because --keep=1 is given, the permitted set survives the UID change, but the effective set is cleared, which is why the Current line only shows +p.
- --addamb=cap_net_admin,cap_net_raw: Raise these capabilities in the ambient set, so that they are added to the effective and permitted sets when executing unprivileged binaries.

Demo #2: Using setpriv Utility

You may need to install the setpriv utility.

    $ sudo apt install setpriv

We'll use the setpriv utility to run the ping_clone binary as an unprivileged user.

    $ sudo setpriv --inh-caps '-all,+net_raw' \
        --bounding-set '-all,+net_raw' \
        --reuid=ubuntu \
        --ambient-caps='+net_raw' \
        ./ping_clone -c1 127.0.0.1
    PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
    64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.019 ms

    --- 127.0.0.1 ping statistics ---
    1 packets transmitted, 1 received, 0% packet loss, time 0ms
    rtt min/avg/max/mdev = 0.019/0.019/0.019/0.000 ms

When the --ambient-caps argument isn't supplied, the ping_clone utility complains with 'socket: Operation not permitted'.

So, what exactly is going on here? Let me clarify.
- --reuid=ubuntu: Running as the unprivileged ubuntu user, ping_clone would start with empty effective and permitted capability sets.
- --ambient-caps=+net_raw: The effective and permitted capability sets are recalculated from the given ambient set, so ping_clone starts with cap_net_raw in both.

Use Case #2: Privileged Bash Process

A privileged user (bash process) pings a local server using an unprivileged ping binary.

Criteria:
- File Ownership: setuid bit != set && owner == root.
- Parent Process: Privileged bash process runs with all capabilities enabled.
- Executable Binary: Unprivileged ping binary (file capabilities aren't set).

[Schematic Diagram]

Prepare The Environment

    # File Ownership: setuid bit != set && owner == root.
    $ ls -la ping_clone
    -rwxr-xr-x ... root root ... ping_clone

    # Parent Process: Privileged bash process runs with full capabilities.
    $ capsh --print
    Current: = cap_net_admin,cap_net_raw,cap_chown,cap_dac_override, .....
    Bounding set = cap_net_admin,cap_net_raw,cap_chown,cap_dac_override, .....
    .....
    uid=0(root)
    gid=0(root)
    ...

    # Executable Binary: Unprivileged ping binary (file capabilities aren't set).
    $ getcap ping_clone

Capabilities Transition

Logging in as the root user explains everything. When you log in as root, your Effective User ID is set to 0 and you have unrestricted access to the system to do (nearly) whatever you want. With Effective User ID == 0, the bash process becomes a privileged process.
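The transitions in Use Case #1 and Use Case #2 can be modelled with a few lines of bash bit arithmetic. This is a sketch, not kernel code: it assumes the execve(2) rules documented in capabilities(7) for a file with no file capabilities and no setuid/setgid bit, with SECBIT_NOROOT unset, and `exec_unprivileged` is a hypothetical helper name. Masks are passed with a 0x prefix, like the values printed in /proc/<pid>/status.

```bash
#!/usr/bin/env bash
# Model of the execve(2) capability transition for a binary WITHOUT file
# capabilities, following capabilities(7). Assumptions: no setuid/setgid bit
# on the file, SECBIT_NOROOT unset. Masks must carry a 0x prefix, because bash
# arithmetic reads numbers with leading zeros as octal.

CAP_NET_ADMIN=12   # bit numbers from <linux/capability.h>
CAP_NET_RAW=13

# exec_unprivileged INH BND AMB RUID EUID
# Prints the new "permitted effective ambient" masks as 16 hex digits each.
exec_unprivileged() {
    local inh=$1 bnd=$2 amb=$3 ruid=$4 euid=$5
    local perm eff
    if (( ruid == 0 || euid == 0 )); then
        # Root: file inheritable/permitted sets are treated as all ones,
        # so P' = (bounding & ~0) | (inheritable & ~0) | ambient.
        perm=$(( bnd | inh | amb ))
    else
        # No file capabilities: only the ambient set carries over.
        perm=$(( amb ))
    fi
    if (( euid == 0 )); then
        eff=$perm        # EUID 0 acts as if the file effective bit were set
    else
        eff=$(( amb ))   # otherwise E' is the ambient set
    fi
    printf '%016x %016x %016x\n' "$perm" "$eff" "$(( amb ))"
}

# Demo #1: non-root bash with cap_net_admin,cap_net_raw in inheritable+ambient
net=$(( (1 << CAP_NET_ADMIN) | (1 << CAP_NET_RAW) ))
exec_unprivileged "$net" 0x3fffffffff "$net" 1000 1000
# prints: 0000000000003000 0000000000003000 0000000000003000

# Use Case #2: root bash with an empty inheritable set
exec_unprivileged 0x0 0x3fffffffff 0x0 0 0
# prints: 0000003fffffffff 0000003fffffffff 0000000000000000
```

In the non-root case the new permitted and effective sets are exactly the ambient set, which is why Demo #2 fails without --ambient-caps; in the root case the whole bounding set becomes permitted and effective.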
Note that the kernel does not simply skip its checks when the Effective User ID is 0. Instead, when a process whose real or effective UID is 0 calls execve(2), the kernel treats the file's inheritable and permitted sets as fully populated (and, if the EUID is 0, the file effective bit as set), so ping_clone starts with every capability in the bounding set in both its permitted and effective sets.

Use Case #3: Special Permissions (SUID, SGID)

Set User ID (setuid) and Set Group ID (setgid) are special permissions for executable files. When these permissions are assigned to a file, the executed file assumes the privileges of the file's owner or group. The setuid bit changes a program's effective uid (euid) upon execution.

Criteria:
- File Ownership: setuid bit == set && owner == root.
- Parent Process: Unprivileged bash process (no or limited capabilities).
- Executable Binary: Unprivileged ping binary (file capabilities aren't set).

[Schematic Diagram]

Prepare The Environment

    # File Ownership: setuid bit == set && owner == root.
    $ ls -la ping_clone
    -rwsr-xr-x ... root root ... ping_clone

    # Parent Process: Unprivileged bash process (no or limited capabilities).
    $ capsh --print
    Current: =
    Bounding set =cap_chown,cap_dac_override, .....
    .....
    uid=1000(ubuntu)
    gid=1000(ubuntu)
    ...

    # Executable Binary: Unprivileged ping binary (file capabilities aren't set).
    $ getcap ping_clone

    # setuid bit set
    $ ls -la
    ...
    -rwsr-xr-x ... root root ... ping_clone

Capabilities Transition

When a non-root user executes the ping_clone utility, which is owned by the root user and has the setuid bit set, the program runs in the root user context (EUID = 0) until it changes its effective uid (euid) during execution.

    ~$ ping_clone localhost &
    [1] 31994
    ~# PING localhost (127.0.0.1) 56(84) bytes of data.
    64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.027 ms

    ~$ cat /proc/31994/status
    Name:   ping_clone
    ...
    Uid:    1000    1000    0       1000
    Gid:    1000    1000    1000    1000
    ...
    CapInh: 0000000000000000
    CapPrm: 0000000000003000
    CapEff: 0000000000000000
    CapBnd: 0000003fffffffff
    CapAmb: 0000000000000000
    ...

So, what's going on here?
- Uid: 1000 1000 0 1000: Isn't it supposed to be Uid: 1000 0 0 0, as stated in the claim? (The four fields are the real, effective, saved set, and filesystem UIDs; the saved set-user-ID of 0 is what remains of the setuid execution.)
- CapPrm: 0000000000003000: The permitted set is reduced to cap_net_admin, cap_net_raw.
- CapEff: 0000000000000000: How does the process conduct privileged network actions if the effective set is empty?

Let's take a look at the ping_clone utility from the perspective of system calls. Remember that it is a capability-aware application that may change its capabilities programmatically.

Take a look at the output of the strace tracing tool.
- At line 4: Gets all capabilities in the {effective, permitted} sets.
- At line 6: Drops all effective capabilities and removes all unwanted capabilities from the permitted set, leaving just CAP_NET_ADMIN and CAP_NET_RAW.
- At line 7: prctl(PR_SET_KEEPCAPS, 1) is used to keep the capability sets throughout a future EUID transition.
- At line 9: Changes the effective user id to a less privileged user.
- At line 21: The CAP_NET_RAW capability has been re-established in the effective set for sensitive network operations.

Privileged Program Binary

A Privileged Program Binary is an executable file to which certain capabilities have been assigned. When we load a Privileged Program Binary (e.g., ping_clone), the executable file's capability sets play a significant role in determining the thread's capabilities after execve(2).

Use the getcap utility to determine the privileged status of a Program Binary.

Capabilities Transition

Explanation
- ambient: The ambient capabilities have no role in the capabilities transition and are set to zero.
- inheritable & bounding: There will be no change in the inheritable & bounding sets.
- permitted: The logic to determine the final state of the permitted set is more involved. It depends on the old inheritable capabilities and the file capabilities, and follows this transition logic:
  - The file permitted set and the old bounding set (before execve()) are logically ANDed:
        P1 = Bounding Old & File Permitted Set
  - The file inheritable set and the old inheritable set (before execve()) are logically ANDed:
        P2 = Inheritable Old & File Inheritable Set
  - The final state of the permitted set is calculated by logically ORing P1 and P2:
        P = P1 | P2
- effective: A Capabilities Aware Application has the luxury to activate/deactivate a capability in the permitted set as an effective capability whenever required. The file effective flag/bit is introduced for Capabilities Unaware (Dumb) Applications, to control the automatic enforcement of the permitted set as the effective set after execve(): if the bit is set, the new effective set equals P; otherwise it is empty.

Use Case #1: Unprivileged Bash Process

An unprivileged user (bash process) pings a local server using a privileged ping binary.

Criteria:
- File Ownership: setuid bit != set && owner != root.
- Parent Process: Unprivileged bash process (no or limited capabilities)
- Executable Binary: Privileged ping binary (file capabilities are set using setcap)

[Schematic Diagram]

Prepare The Environment

    # setuid bit != set && owner != root
    $ ls -la ping_clone
    -rwxr-xr-x ... ubuntu ubuntu ... ping_clone

    # Privileged ping binary
    $ getcap ping_clone
    ping_clone = cap_net_raw+i

    # Unprivileged User
    $ capsh --print
    Current: =
    Bounding set =cap_chown,cap_dac_override, .....
    .....
    uid=1000(ubuntu)
    gid=1000(ubuntu)
    ...

Example #1: When File Inheritable Set is set

Condition: Make sure that the ping_clone utility has cap_net_raw as its inheritable capability.

Terminal 1

    # Privileged ping binary
    $ getcap ping_clone
    ping_clone = cap_net_raw+i
    $ sudo capsh --caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" --keep=1 --user=ubuntu --inh="cap_net_raw" --print -- -c "./ping_clone localhost"
    Current: = cap_net_raw+ip cap_setgid,cap_setuid,cap_setpcap,cap_net_admin+p
    Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
    Securebits: 020/0x10/5'b10000
     secure-noroot: no (unlocked)
     secure-no-suid-fixup: no (unlocked)
     secure-keep-caps: yes (unlocked)
    uid=1000(ubuntu)
    gid=1000(ubuntu)
    groups=4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),999(docker),1000(ubuntu)
    PING localhost (127.0.0.1) 56(84) bytes of data.
    64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023

Terminal 2

    $ cat /proc/4696/status | grep Cap
    CapInh: 0000000000000000
    CapPrm: 0000000000002000
    CapEff: 0000000000000000
    CapBnd: 0000003fffffffff
    CapAmb: 0000000000000000

So what's going on here? Let's have a look:
- --user=$USER: We need a less privileged bash session with the desired capabilities before executing ping_clone.
- --inh=cap_net_raw: The bash session must enable cap_net_raw in its inheritable set, as per the capabilities transition logic for a Privileged Program Binary.
- Terminal 1#7 Current: We want to make sure cap_net_raw is present in the bash session's inheritable set.

With the transition logic, P2 = cap_net_raw (bash inheritable) & cap_net_raw (file inheritable) = cap_net_raw, which is the CapPrm: 0000000000002000 shown in Terminal 2. CapEff is empty because the file effective bit is not set; as a capability-aware application, ping_clone can raise cap_net_raw into its effective set when it needs it.

Example #2: When File Permitted Set is set

Condition: The file permitted set is limited to cap_net_raw.

Terminal 1

    # Privileged ping binary
    $ getcap ping_clone
    ping_clone = cap_net_raw+p
    $ sudo capsh --caps="cap_net_admin,cap_net_raw,cap_setpcap,cap_setuid,cap_setgid+ep" --user=ubuntu --print -- -c "./ping_clone localhost"
    Current: =
    Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
    Securebits: 020/0x10/5'b10000
     secure-noroot: no (unlocked)
     secure-no-suid-fixup: no (unlocked)
     secure-keep-caps: yes (unlocked)
    uid=1000(ubuntu)
    gid=1000(ubuntu)
    groups=4(adm),20(dialout),24(cdrom),25(floppy),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(lxd),114(netdev),999(docker),1000(ubuntu)
    PING localhost (127.0.0.1) 56(84) bytes of data.
    64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.023 ms

Terminal 2

    $ cat /proc/4696/status | grep Cap
    CapInh: 0000000000000000
    CapPrm: 0000000000002000
    CapEff: 0000000000000000
    CapBnd: 0000003fffffffff
    CapAmb: 0000000000000000

So what's going on here? Let's explain:
- --user=$USER: Again, we want a less privileged bash session before executing ping_clone.
- --inh=cap_net_raw: We have intentionally removed this argument to prove that ping_clone still works without inheritable capabilities.
- Terminal 1#9 Current: An empty set is expected here. Since we didn't specify the --keep argument, the bash session's permitted set is dropped when capsh switches to the ubuntu user.

This time P1 = Bounding Old (all capabilities) & File Permitted Set (cap_net_raw) = cap_net_raw, which again gives CapPrm: 0000000000002000.

Example #3: When File Effective Bit is set

The file effective bit makes more sense when application binaries like cat, nice, etc. are unaware of the capget() and capset() syscalls and can't change their thread effective set. In this case, they rely on external conditions, such as the file effective bit, to copy all the capabilities of the permitted set into the effective set.

Instead of the ping_clone utility, we will use the top_clone utility for demonstration.

Terminal 1

    # Privileged top binary
    $ getcap top_clone
    top_clone = cap_chown+ep
    $ ./top_clone
    ....
    uid=1000(ubuntu)
    top - 09:44:35 up 13:25,  0 users,  load average: 0.15, 0.05, 0.01
    Tasks: 120 total,   2 running,  79 sleeping,   0 stopped,   0 zombie
    .....

Terminal 2

    CapInh: 0000000000000000
    CapPrm: 0000000000000001
    CapEff: 0000000000000001
    CapBnd: 0000003fffffffff
    CapAmb: 0000000000000000

So what's going on with the thread capabilities?
- Terminal 2#2 CapPrm: The Capabilities Transition above determines the final state of the thread permitted set (0x0000000000000001 = cap_chown), which matches the file permitted set:

      $ getcap top_clone
      top_clone = cap_chown+ep

- Terminal 2#3 CapEff: Since the file effective flag/bit is set for top_clone, the permitted set is automatically copied into the effective set:

      CapPrm: 0000000000000001
      CapEff: 0000000000000001
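To check the three examples against the Privileged Program Binary transition logic, the sketch below encodes P1, P2, P and the file effective bit rule as bash bit arithmetic, together with a small decoder for the hex masks printed in /proc/<pid>/status. It assumes a non-root caller and a file without the setuid bit, which is the setting of all three examples. `exec_privileged` and `decode_caps` are hypothetical helpers, not part of libcap, and the decoder only knows capability bits 0 to 13.

```bash
#!/usr/bin/env bash
# Model of the execve(2) transition for a binary WITH file capabilities:
#   P1 = bounding_old & file_permitted
#   P2 = inheritable_old & file_inheritable
#   P' = P1 | P2   (ambient is cleared for such files)
#   E' = file effective bit ? P' : 0
# Assumptions: non-root caller, no setuid bit. Masks need a 0x prefix, since
# bash arithmetic reads leading zeros as octal.

# Names for capability bits 0-13 only (enough for this article's examples).
CAP_NAMES=(chown dac_override dac_read_search fowner fsetid kill setgid setuid
           setpcap linux_immutable net_bind_service net_broadcast net_admin
           net_raw)

# decode_caps MASK -> comma-separated names of the known bits that are set
decode_caps() {
    local mask=$(( $1 )) out="" i
    for i in "${!CAP_NAMES[@]}"; do
        if (( mask & (1 << i) )); then
            out+="${out:+,}cap_${CAP_NAMES[i]}"
        fi
    done
    printf '%s\n' "$out"
}

# exec_privileged OLD_INH OLD_BND FILE_PERM FILE_INH FILE_EFF(0|1)
# Prints the new "permitted effective" masks as 16 hex digits each.
exec_privileged() {
    local p1=$(( $2 & $3 ))     # bounding_old & file_permitted
    local p2=$(( $1 & $4 ))     # inheritable_old & file_inheritable
    local perm=$(( p1 | p2 )) eff=0
    if (( $5 )); then
        eff=$perm               # file effective bit copies P' into E'
    fi
    printf '%016x %016x\n' "$perm" "$eff"
}

full=0x3fffffffff   # CapBnd shown in the Terminal 2 outputs
exec_privileged 0x2000 "$full" 0x0 0x2000 0   # Example #1 (cap_net_raw+i)
exec_privileged 0x0 "$full" 0x2000 0x0 0      # Example #2 (cap_net_raw+p)
exec_privileged 0x0 "$full" 0x1 0x0 1         # Example #3 (cap_chown+ep)
decode_caps 0x2000                            # prints cap_net_raw
```

Examples #1 and #2 both yield `0000000000002000 0000000000000000`, matching CapPrm and CapEff in Terminal 2, while Example #3 yields `0000000000000001 0000000000000001`. For real masks with the full capability list, `capsh --decode=<mask>` performs the same decoding.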