A common perception is that Linux users hold and exercise privileges, but this is not the case. It is the process (the executable running in a particular user context) that exercises rights, that is, permission to perform privileged operations guarded by the Linux kernel.
Processes have capabilities, not users.
In Unix-like systems, the traditional approach to process privileges is binary: a process is either privileged or unprivileged. That is, a process either runs as root with full access to the system, or runs as a non-root user and cannot perform privileged operations.
Privileged processes: Privileged processes bypass all kernel permission checks. For example, perf_events performance monitoring is fully available to privileged processes, with no access, scope, or resource restrictions.
Privileged Processes (where effective user ID is 0)
Note: A process whose effective user ID is 0 is referred to as superuser or root.
Unprivileged processes are subject to full permission checking based on the credentials of the process (usually effective UID, effective GID, and supplementary group list).
Unprivileged Processes (where effective user ID is nonzero)
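The binary model above can be illustrated with a small shell sketch. The helper name classify_euid is hypothetical and only mirrors the "effective user ID is 0" rule from the text; it does not inspect capabilities.

```shell
# classify_euid EUID: map an effective UID onto the traditional binary model.
# (Hypothetical helper name, used only for illustration.)
classify_euid() {
    if [ "$1" -eq 0 ]; then
        echo "privileged"
    else
        echo "unprivileged"
    fi
}

# id -u prints the effective user ID of the calling process.
classify_euid "$(id -u)"
```

Run as a regular user this should print "unprivileged"; run under sudo it should print "privileged".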
This simple design works well for system administrators, who need full access to the system to perform critical operations (installing updates, adding users, performing backups, mounting filesystems, rebooting the system, etc.). It makes life difficult, however, for system operators (HR, Finance, etc.) whose day-to-day tasks occasionally require a restricted operation or access to files owned by other users.
DAC (Discretionary Access Control) is the default access control model for Linux filesystem objects (files, directories, devices). It is discretionary because the owner of a file or directory decides who may access it and which operations they may perform.
When an unprivileged process (effective user ID != 0) requests access to such an object, the Linux kernel performs access control checks based on the credentials of the process and the permission bits of the object.
Drawback of DAC
Linux capabilities provide fine-grained access to kernel resources that were previously available only to privileged processes. Instead of granting a process full root access at once, the Linux kernel splits root's privileges into smaller units that can be granted independently, on a per-thread basis.
The capabilities manual page [1] has a comprehensive list of all available capabilities.
# A complete list of all available capabilities is present
# in the capability manual page [1].
$ man capabilities
# -------------------- #
# Alternative would be
# -------------------- #
# Highest capability number supported by your kernel (numbering starts at 0)
$ cat /proc/sys/kernel/cap_last_cap
37
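The Linux privilege model divides root privilege into distinct capabilities (38 on the kernel above, numbered 0 to 37; newer kernels define more). These can be granted to processes running as non-root users so that they can perform specific privileged actions, such as particular system calls or data manipulation.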
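As a rough sketch of how the number in cap_last_cap relates to the capability count and to the "all capabilities" bit mask, the arithmetic can be done in the shell. The helper names cap_count and cap_full_mask are made up for this example, and 37 is simply the value printed above.

```shell
# cap_count LAST_CAP: capabilities are numbered from 0, so the count is one more.
cap_count() { echo $(( $1 + 1 )); }

# cap_full_mask LAST_CAP: 64-bit hex mask with bits 0..LAST_CAP set.
cap_full_mask() { printf '%016x\n' $(( (1 << ($1 + 1)) - 1 )); }

# 37 is the sample value from above; on a live system it could be read
# from /proc/sys/kernel/cap_last_cap instead.
cap_count 37       # prints 38
cap_full_mask 37   # prints 0000003fffffffff
```

A mask of this shape is what a full bounding set looks like in /proc/<PID>/status.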
Each process (strictly, each thread) has five capability sets. Each set is represented as a 64-bit mask and can contain zero or more capabilities.
The effective set is what the kernel uses to perform permission checks for a process.
When a process attempts a privileged operation, the kernel verifies that the relevant bit in the effective set is set. When a process requests to set the system clock, for example, the kernel first verifies that the CAP_SYS_TIME bit is set in the effective set of the process.
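The bit test described above can be mimicked in user space. This is only a sketch of the idea, not the kernel's code: has_cap is a hypothetical helper, the mask is a made-up sample value, and 25 is the number assigned to CAP_SYS_TIME in <linux/capability.h>.

```shell
# has_cap MASK_HEX BIT: succeed when bit BIT is set in the hex mask.
# (Hypothetical helper; the kernel performs the real check internally.)
has_cap() { [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; }

CAP_SYS_TIME=25   # capability number from <linux/capability.h>

# Sample effective set with only bit 25 set (illustrative value).
if has_cap 0000000002000000 "$CAP_SYS_TIME"; then
    echo "CAP_SYS_TIME is effective: setting the clock would be allowed"
else
    echo "CAP_SYS_TIME is missing: the call would fail with EPERM"
fi
```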
The permitted set indicates which capabilities a process may use and limits what can be in the effective set.
A process can have capabilities that are in the permitted set but not in the effective set. This indicates that the process has temporarily disabled those capabilities. A process can raise a capability in its effective set only if that capability is in its permitted set.
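To make the relationship concrete, the temporarily disabled capabilities are the bits that are permitted but not effective. A minimal sketch, assuming two hex masks as input (dormant_caps is a hypothetical name and the values are illustrative):

```shell
# dormant_caps PERMITTED_HEX EFFECTIVE_HEX
# Prints the mask of capabilities that are permitted but not currently effective,
# i.e. the ones the process could still raise.
dormant_caps() { printf '%016x\n' $(( 0x$1 & ~0x$2 )); }

# Permitted holds bits 12 and 13, effective is empty: both can be raised later.
dormant_caps 0000000000003000 0000000000000000   # prints 0000000000003000

# Bit 12 already effective: only bit 13 is still dormant.
dormant_caps 0000000000003000 0000000000001000   # prints 0000000000002000
```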
The inheritable set holds the capabilities of the current process that can be inherited by a program it executes.
During execve(), the inheritable set of the thread is ANDed with the inheritable set of the file, and the result is added to the new permitted set. A child created with fork(), or a new thread, by contrast receives an exact copy of the capability sets of its parent. Also note that inheriting a capability does not automatically make it effective: inherited capabilities only directly influence the new permitted set.
The bounding set limits the capabilities that a process can ever gain.
During execve(), only those file permitted capabilities that are also in the bounding set reach the new permitted set, and a thread can add a capability to its inheritable set only if it is in the bounding set. It therefore acts as an upper limit on what a program can acquire. One caveat from capabilities(7): removing a capability from the bounding set does not remove it from an inheritable set that already contains it.
The ambient set applies when executing programs that are neither set-user-ID or set-group-ID nor carry file capabilities.
Ambient capabilities are preserved across execve() of such a program. A capability can be ambient only if it is both permitted and inheritable; if it is dropped from either of those sets, it is automatically removed from the ambient set as well.
To see the capabilities of a particular process, use the status file in the /proc/<PID>/ directory.
Process capabilities are expressed in hexadecimal format.
CapInh = Inheritable capabilities
CapPrm = Permitted capabilities
CapEff = Effective capabilities
CapBnd = Bounding set
CapAmb = Ambient capabilities set
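A small helper can pull just these five fields out of a status file. This is a sketch assuming the usual "Name:<TAB>value" layout; show_caps is a hypothetical function name, and it defaults to the status file of the current shell.

```shell
# show_caps [STATUS_FILE]: print "field mask" pairs for the five Cap* lines.
# Defaults to the status file of the current shell ($$ is the shell's PID).
show_caps() {
    awk '/^Cap(Inh|Prm|Eff|Bnd|Amb):/ { sub(":", "", $1); print $1, $2 }' \
        "${1:-/proc/$$/status}"
}

# Only meaningful on Linux, where procfs is mounted.
if [ -r "/proc/$$/status" ]; then
    show_caps
fi
```

Passing another path, for example show_caps /proc/1/status, inspects a different process if you have permission to read it.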
Let's have a look at the process capabilities of the ping utility. You may be wondering why the effective capabilities are all zeroes. The simplest answer is that ping is a capability-aware application, which means it may drop some or all effective capabilities once they are no longer required, to reduce exposure. It can still raise a capability in its effective set again as long as that capability remains in its permitted set.
# Mute the output and get process id
~$ ping 127.0.0.1 > /dev/null &
[1] 21002
~$ cat /proc/21002/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000003000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
An alternative is the getpcaps utility, which displays the capabilities of a particular process.
getpcaps prints the capabilities by name rather than as a hexadecimal mask.
# suppress the output and get process id
~$ ping 127.0.0.1 > /dev/null &
[1] 21002
~$ getpcaps 21002
Capabilities for `21002': = cap_net_admin,cap_net_raw+p
Similarly, the pscap utility reports the capabilities of all running processes.
$ pscap -a
ppid  pid  name              command          capabilities
0     1    root              systemd          full
1     419  root              systemd-journal  chown, dac_override, dac_read_search, fowner, setgid, setuid, sys_ptrace, sys_admin, audit_control, mac_override, syslog, audit_read
1     447  root              lvmetad          full
1     457  root              systemd-udevd    full
1     589  systemd-timesync  systemd-timesyn  sys_time
The capsh utility decodes a capability value given in hexadecimal into capability names.
The proc filesystem (procfs) lists process capabilities in hexadecimal format.
~$ cat /proc/21002/status | grep Cap
CapInh: 0000000000000000
CapPrm: 0000000000003000
CapEff: 0000000000000000
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
# Decode raw capabilities
~$ capsh --decode=0000000000003000
0x0000000000003000=cap_net_admin,cap_net_raw
# Decode a full bounding set (this sample is from a kernel exposing capabilities 0 to 36)
~$ capsh --decode=0000001fffffffff
0x0000001fffffffff=cap_chown,cap_dac_override,
cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,
cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,
cap_net_bind_service,cap_net_broadcast,cap_net_admin,
cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,
cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,
cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,
cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,
cap_audit_write,cap_audit_control,cap_setfcap,
cap_mac_override,cap_mac_admin,cap_syslog,35,36
# -------------------- #
# Alternative would be
# -------------------- #
~$ for line in $(grep Cap /proc/21002/status | awk '{print $2}'); do capsh --decode=$line; done;
0x0000000000000000=
0x0000000000003000=cap_net_admin,cap_net_raw
0x0000000000000000=
0x0000001fffffffff=cap_chown,cap_dac_override,
cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,
cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,
cap_net_bind_service,cap_net_broadcast,cap_net_admin,
cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,
cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,
cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,
cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,
cap_audit_write,cap_audit_control,cap_setfcap,
cap_mac_override,cap_mac_admin,cap_syslog,35,36
0x0000000000000000=
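When capsh is not installed, the same decoding can be approximated in plain shell. The sketch below assumes the capability numbering from <linux/capability.h> up to number 40; decode_caps and CAP_NAMES are names invented for this example. Unlike the older capsh shown above, it prints names for numbers 35 and higher.

```shell
# Capability names in bit order (bit 0 first), as numbered in <linux/capability.h>.
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid \
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw \
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct \
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease \
audit_write audit_control setfcap mac_override mac_admin syslog wake_alarm \
block_suspend audit_read perfmon bpf checkpoint_restore"

# decode_caps MASK_HEX: print the comma-separated names of all set bits.
# The body runs in a subshell so its variables do not leak into the caller.
decode_caps() (
    mask=$(( 0x$1 ))
    bit=0
    out=""
    for name in $CAP_NAMES; do
        if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
            out="${out:+$out,}cap_$name"
        fi
        bit=$(( bit + 1 ))
    done
    echo "$out"
)

decode_caps 0000000000003000   # prints cap_net_admin,cap_net_raw
```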
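The capsh utility can drop capabilities either explicitly with --drop, which removes them from the bounding set, or as a side effect of --uid.
Switching from UID 0 to a nonzero UID with --uid clears the effective set of the thread, and the permitted set as well unless --keep=1 is given (as in the first example below, where only +p remains).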
~$ sudo capsh --caps="cap_setpcap,cap_setuid,cap_setgid+ep" \
--drop="cap_net_admin,cap_net_raw" --keep=1 --uid=1001 \
--print -- -c "ping localhost"
Current: = cap_setgid,cap_setuid,cap_setpcap+p
Bounding set =
Securebits: 020/0x10/5'b10000
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: yes (unlocked)
uid=1001(test1)
gid=0(root)
groups=0(root)
ping: socket: Operation not permitted
Super-powers are granted randomly so please submit an issue if you're not happy with yours.
# -------------------- #
# Alternative would be
# -------------------- #
$ sudo capsh --drop=cap_net_raw --print -- -c "/bin/ping -c 1 localhost"
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,35,36,37
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=0(root)
ping: socket: Operation not permitted
Three capability sets can be associated with an executable file. On execve(), the kernel computes the capabilities of the new process from the current process capabilities and the file capabilities of the binary.
File permitted set: these capabilities are added to the process permitted set on execution, limited by the bounding set.
File inheritable set: after the execve(), the intersection (logical AND) of the thread inheritable and file inheritable sets is added to the thread permitted set.
File effective flag: in contrast to the other file capability sets, it is only a flag. When the flag is set, the process effective set following execve() is set to the new process permitted set; otherwise, it is empty.
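The rules above can be put together in a short sketch that follows the transformation formulas in capabilities(7). It is deliberately simplified: it ignores set-user-ID binaries and the special handling of root, and exec_caps is a hypothetical helper, not an existing tool.

```shell
# exec_caps P_INH P_BND P_AMB F_INH F_PRM F_EFF
# Process inheritable/bounding/ambient and file inheritable/permitted are hex
# masks; F_EFF is the file effective flag (0 or 1).
# Simplification: set-user-ID/set-group-ID files and root handling are ignored.
exec_caps() (
    p_inh=$(( 0x$1 )); p_bnd=$(( 0x$2 )); p_amb=$(( 0x$3 ))
    f_inh=$(( 0x$4 )); f_prm=$(( 0x$5 )); f_eff=$6

    # Ambient capabilities are cleared when the file carries capabilities.
    if [ $(( f_inh | f_prm )) -ne 0 ] || [ "$f_eff" -eq 1 ]; then
        new_amb=0
    else
        new_amb=$p_amb
    fi

    # P'(permitted) = (P(inh) & F(inh)) | (F(prm) & P(bnd)) | P'(ambient)
    new_prm=$(( (p_inh & f_inh) | (f_prm & p_bnd) | new_amb ))

    # P'(effective) = F(effective) ? P'(permitted) : P'(ambient)
    if [ "$f_eff" -eq 1 ]; then new_eff=$new_prm; else new_eff=$new_amb; fi

    printf 'CapPrm: %016x\nCapEff: %016x\nCapAmb: %016x\n' \
        "$new_prm" "$new_eff" "$new_amb"
)

# A binary with cap_net_admin,cap_net_raw+p (mask 3000, flag off), full bounding set:
exec_caps 0 0000003fffffffff 0 0 0000000000003000 0
```

With these inputs the sketch reports CapPrm 0000000000003000 and CapEff 0000000000000000, the same shape as the ping status shown earlier; passing 1 as the last argument models +ep and makes CapEff equal to CapPrm.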
Depending on the use case, we may need to look for files with capabilities enabled.
To find all files with file capabilities set, use getcap -r.
A malicious user can also use getcap -r to find an exploitable executable on the system.
$ getcap -r / 2>/dev/null
/home/ubuntu/environment/cat_clone = cap_setuid+ep
/home/ubuntu/environment/top_clone = cap_chown+ep
/home/ubuntu/environment/ping_clone = cap_net_raw+p
/usr/bin/mtr-packet = cap_net_raw+ep
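For an audit, the getcap listing can be filtered down to binaries that hold capabilities often abused for privilege escalation. The shortlist in RISKY below is an assumption made for this sketch, not an authoritative list, and flag_risky is a hypothetical helper that reads getcap-style lines on standard input.

```shell
# Illustrative shortlist of capabilities that commonly enable escalation.
RISKY='cap_(chown|setuid|setgid|sys_admin|sys_module|sys_ptrace|dac_override|dac_read_search)'

# flag_risky: read getcap output on stdin, print only lines naming a risky capability.
# "|| true" keeps the exit status at zero when nothing matches.
flag_risky() { grep -E "$RISKY" || true; }

# Sample input copied from the listing above.
flag_risky <<'EOF'
/home/ubuntu/environment/cat_clone = cap_setuid+ep
/home/ubuntu/environment/top_clone = cap_chown+ep
/home/ubuntu/environment/ping_clone = cap_net_raw+p
/usr/bin/mtr-packet = cap_net_raw+ep
EOF
```

On a live system the input would come from the scan itself: getcap -r / 2>/dev/null | flag_risky.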
The filecap utility does a similar job, listing the capabilities of files.
~$ filecap /usr
file capabilities
/usr/bin/mtr-packet net_
Similarly, we can find the capabilities set of all running processes using the pscap utility.
# pscap -a
ppid pid name command capabilities
6148 6152 root bash full
The setcap utility sets the file capabilities of an executable.
Only a process with the CAP_SETFCAP capability (typically one running as root) can perform this operation.
$ setcap cap_net_raw,cap_net_admin+ep ping_clone
unable to set CAP_SETFCAP effective capability: Operation not permitted
Add cap_net_raw to the file inheritable set.
# Privileged ping binary
~# setcap cap_net_raw+i ping_clone
~$ getcap ping_clone
ping_clone = cap_net_raw+i
Add cap_net_raw, cap_net_admin to the file permitted set.
# Privileged ping binary
~# setcap cap_net_raw,cap_net_admin+p ping_clone
~$ getcap ping_clone
ping_clone = cap_net_raw,cap_net_admin+p
Enabling the file effective flag causes the thread permitted set to be copied into the thread effective set automatically on execve().
# Privileged ping binary
~# setcap cap_net_raw,cap_net_admin+ep ping_clone
~$ getcap ping_clone
ping_clone = cap_net_raw,cap_net_admin+ep
To inspect an executable file's file capabilities, use the getcap utility.
~$ getcap ping_clone
ping_clone = cap_net_raw+i
An alternative technique is to compare the file capability set against an expected value and see whether it matches.
Use setcap -v to verify file capabilities.
# When it confirms file capabilities
$ setcap -v cap_net_admin,cap_net_raw+ep ping_clone
ping_clone: OK
# When file capabilities differ
$ setcap -v cap_net_raw+ep ping_clone
ping_clone differs in [pe]
In the next chapter, we will see how capability sets are determined for unprivileged and privileged program binaries after execve(2).
Stay tuned …