Seccomp, eBPF, and the Importance of Kernel System Call Filtering
Learn about attacks that affect user space security products and how popular technologies such as Seccomp and eBPF can be used in such a way that avoids these issues.
Join the DZone community and get the full member experience.
Join For FreeFiltering system calls is an essential component of many host-based runtime security products on Linux systems. There are many different techniques that can be used to monitor system calls, all of which have certain tradeoffs. Recently, kernel modules have become less popular in favor of user space runtime security agents due to portability and stability benefits. Unfortunately, it is possible to architect user space agents in such a way that they are susceptible to several attacks such as time of check time of use (TOCTOU), agent tampering, and resource exhaustion. This article explains attacks that often affect user space security products and how popular technologies such as Seccomp and eBPF can be used in such a way that avoids these issues.
Attacks Against User Space Agents
User space agents are often susceptible to several attacks such as TOCTOU, tampering, and resource exhaustion. These attacks all take advantage of the fact that the user space agent must communicate with the kernel before it makes a decision about system call or other action that occurs on the system. Generally, these attacks attempt to modify data passed in system calls in such a way that prevents a user space agent from detecting an attack or taking advantage of the fact that the agent does not protect itself from tampering.
TOCTOU vulnerabilities present a substantial risk to user space security agents running on the Linux kernel. These vulnerabilities arise when security decisions are based on data that can be altered by an attacker between the check and the subsequent use. For instance, a user space security agent might check the arguments of a system call before allowing a certain operation, but during the time gap before the operation is executed, an adversary could change the system call’s arguments. This manipulation could lead to a divergence between the state perceived by the security agent and the actual state, potentially resulting in security breaches. Addressing TOCTOU challenges in user space security agents requires careful consideration of synchronization mechanisms, ensuring that checks and corresponding actions are executed atomically to prevent exploitation.
Resource exhaustion poses a notable threat to user space security agents operating on the Linux kernel, often through the execution of an excessive number of system calls. In this scenario, attackers exploit the agent's requirement to check system calls in a manner that is non blocking. By initiating a barrage of system calls, such as file operations, network connections, or process creation, adversaries aim to overload the agent with benign events and exhaust the agent’s resources such as CPU, memory, or network bandwidth. user space security agents need to implement effective blocking mechanisms that enable them to perform a check on a system call before allowing the call to complete its execution.
Tampering attacks are another common issue user space security agents must address. In these attacks, adversaries aim to manipulate the behavior or compromise the integrity of the user space security agent itself, rendering it ineffective or allowing it to be bypassed. Typically, tampering with the agent requires root level access to the system as most security agents run a root. Tampering can take various forms, including altering the configuration of the security agent, deleting or modifying the agent’s executable files on disk, injecting malicious code into its processes, and temporarily pausing or killing its processes with signals. By subverting the user space security agent, attackers can disable critical security features and evade detection. user space security agents must be aware of these attacks and have the appropriate detection mechanisms built in.
Seccomp for Kernel Filtering
Seccomp, short for “Secure Computing”, is a Linux kernel feature designed to filter system calls made by a process thread. It allows user space security agents to define a restricted set of allowed system calls, reducing the attack surface of an application. Options for system calls that violate the filter include killing the application and notifying another user space process such as a user space security agent. Traditional Seccomp operates by preventing all system calls except for read, write, and exit which significantly restricts the system calls a thread may execute.
Seccomp BPF (Berkeley Packet Filter) is an evolution that provides a more flexible filtering mechanism compared to traditional seccomp. Unlike the original version, Seccomp-BPF allows for the dynamic loading of custom Berkeley Packet Filter programs, enabling more fine-grained control over filtering criteria. Seccomp BPF enables the restriction of specific system calls and enables inspection of system call parameters to inform filtering decisions. Seccomp-BPF cannot dereference pointers, so its system call argument analysis is focused on the value of the arguments themselves. By enforcing policies that exclude potentially risky system calls and interactions, Seccomp-BPF contributes significantly to enhancing application security, with the latter offering a more versatile approach to system call filtering.
Seccomp avoids the TOCTOU problem by evaluating system call arguments directly. Because seccomp inspects arguments by value, it is not possible for an attacker to alter them after an initial system call. Thus, the attacker does not have an opportunity to modify the data inspected by seccomp after the security check is performed. It is important to note that user space applications that need to dereference pointers to inspect data such as file paths must do so carefully, as this approach can potentially be manipulated by TOCTOU attacks if appropriate precautions are not taken. For example, a security agent could change the value of a pointer argument to a system call to a non-deterministic location and explicitly set the memory it points to. This approach makes TOCTOU attacks more challenging because it prevents another malicious thread in the monitored process from modifying memory pointed to by the original system call arguments.
Seccomp is designed with tampering in mind. Both seccomp and seccomp BPF are immutable. Once a thread has seccomp enabled, it cannot be disabled. Similarly, seccomp BPF filters are inherited by all child processes. If additional seccomp programs are added, they are executed in LIFO order. All seccomp BPF filters that are loaded are executed, and the most restrictive result returned by the filters is enacted on the thread. Because seccomp settings and filters are immutable and inherited by child processes, it is not possible for an attacker to bypass their defenses without a kernel exploit. It is important that seccomp BPF filters consider both 64-bit and 32-bit system calls as one technique sometimes used to evade filtering is to change the ABI to 32-bit on a 64-bit operating system.
Seccomp avoids resource exhaustion because all system call checks occur inline and before the system call is executed. Thus, the thread executing the system call is blocked while the filter is inspecting the system call arguments. This approach prevents the calling thread from executing additional system calls while the seccomp filter is operating. Because seccomp BPF filters are pure functions, they cannot save data across executions. So, it is not possible to cause them to run out of working memory by storing data about previously executed system calls. This approach ensures seccomp will not cause a system to have reduced memory consumption.
By avoiding TOCTOU, tampering, and resource consumption issues, seccomp provides a powerful mechanism for security teams and application developers to enhance their security posture. Seccomp provides a flexible approach to runtime detection and protection against various threats, from malware to exploitable vulnerabilities that works across Linux distributions. Thus, teams can use seccomp to enhance the security posture of their entire Linux workloads in the cloud, in the data center, and at the edge.
eBPF for Kernel Filtering
eBPF can mitigate TOCTOU vulnerabilities by executing filtering logic directly within the kernel, eliminating the need for transitions between user space and kernel space. This inline execution ensures that security decisions are made atomically, leaving no opportunity for attackers to manipulate the system state between security checks and system call execution. However, it is also dependent on where exactly the program hooks into the kernel.
When hooking into system calls, the memory location with the pathname to be accessed belongs to user space, and user space can change it after the hook runs, but before the pathname is used to perform the actual open in-kernel operation. This is depicted in the image below, where the bpf hook checks the “innocent” path, but the kernel operation actually happens with the “suspicious” path.
Hooking into a kernel function that happens after the path is copied from user space to kernel space avoids this problem because the hook operates on memory that the user space application cannot modify. For example in file integrity monitoring, instead of a system call, we could hook into the security_file_permission function, which is called on every file access or security_file_open and is executed whenever a file is opened.
By accessing system call arguments within the kernel context, eBPF programs can ensure that security decisions are based on consistent and verifiable information, effectively neutralizing TOCTOU attack vectors. It is impossible to do proper enforcement without in-kernel filtering because by the time the event has reached user-space, it is already too late if the operation has already been executed.
eBPF also provides robust mechanisms for preventing tampering attacks by executing filtering logic within the kernel. Unlike user space agents, which may be susceptible to tampering attempts targeting their executable files, memory contents, or configuration settings, eBPF programs operate within the highly privileged kernel context, where access controls and integrity protections are strictly enforced.
For instance, an eBPF program enforcing integrity checks on critical system files can maintain cryptographic hashes of file contents within kernel memory, ensuring that any unauthorized modifications are detected and prevented in real time. With eBPF, the state of what is watched can be updated in the kernel inline with the operations, while doing this in user-space introduces race conditions.
Finally, eBPF addresses resource exhaustion attacks by implementing efficient event filtering and resource management strategies within the kernel. Unlike user space agents, which may be overwhelmed by excessive system call traffic, eBPF programs can leverage kernel-level optimizations to efficiently process and prioritize incoming events, ensuring optimal utilization of system resources.
Deciding at the eBPF hook whether the event is of interest to the user or not, means that no extraneous events will be generated and processed by the agent. The alternative, to do the filtering in user-space, tends to induce significant overhead for events that happen very frequently in a system (such as file access or networking) that can lead to resource exhaustion.
Low overhead in-kernel filtering means security teams no longer have a resource concern driving decisions on how many files to monitor or whether to enable FIM on systems with extensive I/O operations such as on database servers. eBPF can filter out non-relevant events that are uninteresting to the policy, repetitive, or part of the normal expected behavior to minimize overhead. Thus, eBPF-based security agents can optimize resource utilization and ensure uninterrupted protection against resource exhaustion attacks.
By leveraging eBPF's capabilities to mitigate TOCTOU vulnerabilities, prevent tampering attacks, and mitigate resource exhaustion risks, security teams can develop runtime security solutions that effectively protect Linux systems against a wide range of threats.
Opinions expressed by DZone contributors are their own.
Comments