Day 5: AppArmor + layered sandbox design

How AppArmor actually attaches to a process (via its exec-time LSM hook, bprm_creds_for_exec on current kernels, keyed on binary path), why bash script.sh silently skips the script’s profile while ./script.sh does not, the exec modifiers (ix, px/cx/ux, and their environment-scrubbing uppercase forms Px/Cx/Ux), the hardlink and bind-mount tricks that bypass path-based MAC, and why a production sandbox layers namespace + capability + seccomp + AppArmor + cgroup, with the argument that, if you can only afford two, seccomp + AppArmor is the highest-ROI pair.

1. Where AppArmor sits

The core: AppArmor is an LSM (Linux Security Module). The kernel calls its hooks from inside syscall handling, after arguments have been resolved to kernel objects, and AppArmor uses them to enforce path-based MAC (Mandatory Access Control).

Split with seccomp:

seccomp: at the syscall entry, granularity = nr + integer args
AppArmor: at LSM hooks, granularity = the resolved result of the syscall — paths, network protocol/family
seccomp can’t dereference pointers (Day 4); AppArmor fills that blind spot.

Split with SELinux:

AppArmor: path-based — easy to write, easy to bypass (hardlink / bind mount)
SELinux: label-based — hard to write, hard to bypass (label = inode xattr)
Engineering trade-off: Ubuntu/Stripe lean AppArmor because the config is simpler and K8s integration is cleaner.

2. How profiles attach

Trigger point

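At execve time, AppArmor’s exec hook fires (bprm_creds_for_exec since Linux 5.8, bprm_set_creds on older kernels; AppArmor does not register bprm_check_security):

Kernel opens the target file named in execve, before the binary-format handlers (ELF / shebang) run
The LSM chain calls AppArmor’s hook
AppArmor looks up a profile by the target binary’s path
Match → install the profile’s label into the new credentials (the security blob of bprm->cred); the per-task aa_task_ctx only tracks change_profile / change_hat state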
No match → unconfined (this is not default-deny — a common misconception)
execve then continues; by the time the new image’s first instruction runs, the profile is in place

The key is a path string, not an inode

AppArmor uses the kernel’s d_path() to get a resolved absolute path
Multiple hardlinks to the same binary each look up independently; this is the root of the path-based weakness (§6)
No exact match → fall through to wildcards (/foo/**); ultimately fall back to unconfined
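The lookup order above can be sketched as a toy model. This is an illustration only, assuming a reduced glob dialect (`*` stays inside one path component, `**` crosses `/`) and using "longest pattern wins" as a stand-in for the kernel's real specificity ranking; `glob_to_regex` and `attach` are hypothetical names, not AppArmor APIs.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a small subset of AppArmor globbing into a regex."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")          # ** may cross directory separators
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")       # * stays within one path component
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))

def attach(exec_path: str, profiles: list) -> str:
    """Return the attachment spec that wins for exec_path, or 'unconfined'."""
    if exec_path in profiles:         # an exact path match wins outright
        return exec_path
    matches = [p for p in profiles if glob_to_regex(p).fullmatch(exec_path)]
    # Simplification: treat the longest pattern as the most specific one.
    return max(matches, key=len) if matches else "unconfined"

loaded = ["/usr/bin/myapp", "/opt/tools/**"]
print(attach("/usr/bin/myapp", loaded))         # /usr/bin/myapp
print(attach("/opt/tools/bin/helper", loaded))  # /opt/tools/**
print(attach("/tmp/myapp-hardlink", loaded))    # unconfined
```

The last call is the interesting one: the same binary reached under a different name simply has no profile.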

Fork without exec

copy_process uses LSM hooks task_alloc / cred_prepare to copy the parent’s security context to the child
The whole fork chain inherits the same profile until some descendant execves and triggers a new lookup
So shell built-ins and worker forks without exec all inherit.

The classic trap: `bash myscript.sh` vs `./myscript.sh`

Invocation	What execve sees	Path AppArmor looks up	Result
`./myscript.sh` (script +x)	The script path; the kernel then loads the shebang interpreter	`/path/to/myscript.sh`	Script’s profile attaches
`bash myscript.sh`	`/bin/bash`	`/bin/bash`	Script’s profile does not attach; bash’s profile attaches (if any)
`source myscript.sh`	No execve at all	Current shell’s profile	Profile unchanged

Production lesson: give your script an executable bit and call it as ./script.sh. This is exactly why an earlier experiment of mine — bash myapp.sh — had no profile in effect; switching to /root/.../myapp.sh fixed it.
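The three invocations in the table can be replayed in a small model. It is a sketch under stated assumptions: `Task`, `run`, and the `PROFILES` set are hypothetical, only the script has a profile, and the launching shell is unconfined. Transitions out of an already confined task are governed by exec rules and are deliberately not modeled.

```python
from dataclasses import dataclass

# Hypothetical set of loaded profiles: only the script has one.
PROFILES = {"/path/to/myscript.sh"}

@dataclass
class Task:
    label: str = "unconfined"

    def fork(self) -> "Task":
        # fork copies the parent's label; no profile lookup happens here
        return Task(self.label)

    def execve(self, path: str) -> None:
        # Only the unconfined case is modeled: attach by the exec'd path.
        if self.label == "unconfined":
            self.label = path if path in PROFILES else "unconfined"

def run(shell: Task, exec_path: str) -> str:
    """Model the shell doing fork + execve(exec_path); return the child's label."""
    child = shell.fork()
    child.execve(exec_path)
    return child.label

shell = Task()
print(run(shell, "/path/to/myscript.sh"))  # ./myscript.sh    -> /path/to/myscript.sh
print(run(shell, "/bin/bash"))             # bash myscript.sh -> unconfined
print(shell.label)                         # source: no execve -> unconfined
```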

`change_profile` (voluntary switch)

From the outside, aa-exec starts a command under a named profile:

aa-exec -p strict_profile -- /usr/bin/cmd

A confined process can also switch itself into a stricter profile, but its current profile must allow the transition:

change_profile -> strict_profile,

Use cases: a supervisor that forks and then pins each child to a profile, or a process that self-tightens after some initialization stage.
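For the self-tightening case, a minimal sketch of what the switch looks like from inside the process. It assumes the procfs interface that libapparmor's aa_change_profile wraps (writing `changeprofile <name>` to the task's attr/current file); the exact attr path can differ across kernels, and `change_profile` here is a hypothetical helper, not a library call. Nothing below performs the switch unless you call it.

```python
import os

def change_profile_command(profile: str) -> bytes:
    """Build the string written to the attr file (assumed libapparmor format)."""
    return f"changeprofile {profile}".encode()

def change_profile(profile: str, attr: str = "/proc/self/attr/current") -> bool:
    """Try to switch the calling process into `profile`; return True on success.

    Fails (returns False) if AppArmor is absent, the profile is not loaded,
    or the current profile has no matching change_profile rule.
    """
    try:
        fd = os.open(attr, os.O_WRONLY)
    except OSError:
        return False
    try:
        os.write(fd, change_profile_command(profile))
        return True
    except OSError:
        return False
    finally:
        os.close(fd)

print(change_profile_command("strict_profile"))  # b'changeprofile strict_profile'
```

A supervisor would call change_profile("strict_profile") in the child after fork and before doing any untrusted work.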

3. Profile syntax

Basic shape

#include <tunables/global>        # globals (@{HOME} etc.)

/path/to/program {                # profile head = attach path
    #include <abstractions/base>  # preset abstractions (dynamic linker / libc)

    capability net_bind_service,  # re-check capability (stricter than just dropping caps)
    network inet stream,          # network rule (AF + type)

    /etc/myapp/config.conf r,     # file rule: path + mode
    /var/log/myapp/*.log rw,      # wildcard
    @{HOME}/.myapp/** rwk,        # var + recursive glob + lock

    /usr/bin/helper Cx,           # exec a child program: exec modifier
}

File access modes

Modifier	Meaning
`r`	read
`w`	write (includes truncate / append)
`a`	append-only (write but cannot truncate / seek)
`l`	link (may create a hardlink to this file)
`k`	lock (flock / fcntl)
`m`	mmap with PROT_EXEC (plain PROT_READ mmap doesn’t need `m`)

Production gotchas:

Binaries need mr, not just r, or PROT_EXEC mmap of their segments fails
w implies a; a doesn’t imply w (append-only protects log integrity)
Forget m on a shared library and you get the bizarre “can cat it but can’t load it” failure: the read succeeds, but the PROT_EXEC mmap is denied.
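The mode rules above reduce to a set check. A minimal sketch, where `expand` and `permits` are hypothetical helpers and the only implication modeled is the one stated above (w implies a):

```python
def expand(granted: str) -> set:
    """Expand a rule's mode string into the set of operations it allows."""
    modes = set(granted)
    if "w" in modes:
        modes.add("a")   # w implies a; the reverse does not hold
    return modes

def permits(granted: str, requested: str) -> bool:
    """True if every requested operation is covered by the granted modes."""
    return set(requested) <= expand(granted)

print(permits("r", "r"))    # True:  plain read of a binary works
print(permits("r", "rm"))   # False: PROT_EXEC mmap needs m
print(permits("mr", "rm"))  # True
print(permits("w", "a"))    # True:  write covers append
print(permits("a", "w"))    # False: append-only does not allow truncate
```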

Exec modifiers (the heart of it)

Modifier	Which profile does the child get	Use case	Risk
`ix` (inherit)	Inherit current profile	Child is a helper, same constraints	safe
`Px` (profile)	Switch to a standalone profile	Child is another program with its own profile	safe
`Cx` (child profile)	Switch to a child profile nested inside the current one	Sub-profile defined inside parent; child enters it	safe
`Ux` (unconfined)	Drop AppArmor entirely	Only for high-trust helpers	dangerous

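Case matters: environment scrubbing

Lowercase (px / cx / ux): the environment passes through to the child unchanged
Uppercase (Px / Cx / Ux): the exec is marked as a secure exec (AT_SECURE), so the dynamic linker drops dangerous variables such as LD_PRELOAD and LD_LIBRARY_PATH
ix has no uppercase form

Case has nothing to do with setuid. Production almost always uses uppercase; lowercase is rare.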

Fallback modifiers

/bin/helper Px -> helper_profile,    # switch to helper_profile
/bin/helper Pix,                     # if no target profile, fall back to inherit
/bin/helper Cix,                     # same idea, Cx + ix fallback

A bare Px returns EACCES if the target profile isn’t loaded, so defensive production rules write Pix / Cix as a safety net.
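How the modifier picks the child's profile, including the fallback, can be sketched as one function. This is a simplified model: `resolve_exec` is a hypothetical helper, the `loaded` set stands in for the kernel's profile table, and the letter case of the modifier is ignored because it does not affect which profile is chosen.

```python
def resolve_exec(current: str, mode: str, target: str, loaded: set) -> str:
    """Return the profile the child runs under, or raise PermissionError (EACCES)."""
    base = mode.lower()
    if base == "ix":
        return current                     # inherit the caller's profile
    if base == "ux":
        return "unconfined"                # drop AppArmor entirely
    if base in ("px", "pix"):
        name = target                      # standalone profile
    elif base in ("cx", "cix"):
        name = f"{current}//{target}"      # child profile, named parent//child
    else:
        raise ValueError(f"unsupported exec mode: {mode}")
    if name in loaded:
        return name
    if base in ("pix", "cix"):
        return current                     # fallback: inherit instead of failing
    raise PermissionError(f"exec denied: profile {name!r} is not loaded")

loaded = {"/usr/bin/myapp", "/usr/bin/myapp//helper_hat"}
print(resolve_exec("/usr/bin/myapp", "Cx", "helper_hat", loaded))       # /usr/bin/myapp//helper_hat
print(resolve_exec("/usr/bin/myapp", "Pix", "helper_profile", loaded))  # /usr/bin/myapp
try:
    resolve_exec("/usr/bin/myapp", "Px", "helper_profile", loaded)
except PermissionError as err:
    print(err)                             # bare Px with no target loaded fails
```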

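Child profile nesting

/usr/bin/myapp {
    #include <abstractions/base>   # libc / ld.so basics for the parent

    /usr/bin/myapp mr,             # mr, not r: the binary's own segments are mapped PROT_EXEC
    /bin/helper Cx -> helper_hat,

    profile helper_hat {           # ← child profile lives inside the parent profile
        #include <abstractions/base>
        /bin/helper mr,            # the helper must be able to map itself
        /tmp/helper.input r,
        /tmp/helper.output w,
    }
}

After exec, the helper runs under /usr/bin/myapp//helper_hat (the double slash separates parent and child in the full profile name). Strictly speaking this is a child profile; AppArmor’s “hats” are a related construct that a process enters with change_hat rather than exec.

Narrower scope than Px: the child profile is reachable only through its parent
Typically tighter than ix: the helper gets only the rules written in the child profile, not the parent’s full rule set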

Abstractions

Preset rule bundles; production profiles almost always pull these in:

abstraction	What it covers
`abstractions/base`	The minimum any Linux program needs (mmap libc / read ld.so.cache / vDSO / etc.)
`abstractions/nameservice`	resolv.conf / hosts / nsswitch
`abstractions/python`	Paths a Python interpreter touches
`abstractions/openssl`	SSL libs + CA certs
`abstractions/X`	X11

The way a production profile opens: #include <abstractions/base>. Near-boilerplate.

A seed for LLM eval: LLMs tend to enumerate every path explicitly rather than reach for an abstraction → the profile grows long and brittle (one libc patch and it breaks). “Profile-level semantic abstraction” is an independent eval metric.

4. Enforce vs Complain

Dimension	Enforce	Complain
On violation	Deny, return -EACCES / -EPERM	Allow, keep running
Audit log	`apparmor="DENIED"`	`apparmor="ALLOWED"`
Process behavior	Constrained, may fail	Unaware, runs normally
Use case	Production protection	Profile development / workload study
Risk	A buggy profile can break the service	No protection, observation only

Mirror image of seccomp

aa-complain ≈ SECCOMP_RET_LOG (allow + log)
aa-enforce ≈ SECCOMP_RET_ERRNO / SECCOMP_RET_KILL_PROCESS (block)

The same “observe first, then constrain” workflow shows up across LSM and seccomp.
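The two modes differ only in what happens when no rule allows the access. A minimal sketch of that decision, with `mediate` as a hypothetical name and the audit tags taken from the table above:

```python
from typing import Optional, Tuple

def mediate(mode: str, rule_allows: bool) -> Tuple[bool, Optional[str]]:
    """Return (access_granted, audit_tag) for one access under a profile mode."""
    if rule_allows:
        return True, None                      # permitted by a rule: nothing logged
    if mode == "complain":
        return True, 'apparmor="ALLOWED"'      # let it through, but record it
    if mode == "enforce":
        return False, 'apparmor="DENIED"'      # block and record
    raise ValueError(f"unknown mode: {mode}")

print(mediate("complain", False))  # (True, 'apparmor="ALLOWED"')
print(mediate("enforce", False))   # (False, 'apparmor="DENIED"')
print(mediate("enforce", True))    # (True, None)
```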

Switching

sudo aa-complain /path/to/program     # to complain
sudo aa-enforce /path/to/program      # to enforce
sudo aa-disable /path/to/program      # unload
sudo aa-status                        # all profile states

5. Production workflow: complain → logprof → enforce

1. aa-genprof /path/to/program
   ├─ generate an empty skeleton profile, default complain
   └─ tail audit in the background

2. Run real workload (production-like traffic / test suite)
   └─ every unauthorized access → audit ALLOWED

3. aa-logprof
   ├─ reads audit log
   ├─ for each "unauthorized but allowed" event, asks you
   └─ you pick: allow / deny / glob / abstraction / inherit; it writes back into the profile

4. Loop 2-3 until the profile converges

5. aa-enforce
   └─ ship to production

6. Keep watching audit DENIED
   ├─ true attack → alert
   └─ false positive → patch profile
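Step 3 is essentially "fold ALLOWED events into candidate rules". A rough sketch of that folding, not a replacement for aa-logprof: the sample lines are made up for illustration, `suggest_rules` is a hypothetical helper, and mapping the audit masks c (create) and d (delete) onto w is an assumption about how such events are usually written back.

```python
import re
from collections import defaultdict

FIELD = re.compile(r'(\w+)="([^"]*)"')
TO_PROFILE_MODE = {"c": "w", "d": "w"}   # assumption: create/delete fold into write
MODE_ORDER = "rwaklm"                     # exec events need a human choice, so no x

def suggest_rules(audit_lines):
    """Group ALLOWED events by (profile, path) and emit one candidate rule each."""
    seen = defaultdict(set)
    for line in audit_lines:
        f = dict(FIELD.findall(line))
        if f.get("apparmor") != "ALLOWED" or "name" not in f or "profile" not in f:
            continue
        for ch in f.get("requested_mask", ""):
            seen[(f["profile"], f["name"])].add(TO_PROFILE_MODE.get(ch, ch))
    rules = []
    for (profile, name), modes in sorted(seen.items()):
        ordered = "".join(m for m in MODE_ORDER if m in modes)
        if ordered:
            rules.append((profile, f"{name} {ordered},"))
    return rules

sample = [
    'apparmor="ALLOWED" operation="open" profile="/usr/bin/myapp" name="/etc/myapp/config.conf" requested_mask="r" denied_mask="r"',
    'apparmor="ALLOWED" operation="open" profile="/usr/bin/myapp" name="/var/log/myapp/app.log" requested_mask="wc" denied_mask="wc"',
    'apparmor="ALLOWED" operation="file_lock" profile="/usr/bin/myapp" name="/var/log/myapp/app.log" requested_mask="k" denied_mask="k"',
    'apparmor="DENIED" operation="open" profile="/usr/bin/myapp" name="/etc/shadow" requested_mask="r" denied_mask="r"',
]
for profile, rule in suggest_rules(sample):
    print(profile, "->", rule)
```

The output is per-path rules only; deciding when a cluster of them should become a glob or an abstraction is the part that still needs judgment.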

Two traps in the workflow

1. Complain doesn’t just log, it induces violations. Under complain the program “thinks it can do anything” and exercises code paths it wouldn’t otherwise touch (fallback branches). You think complain covered everything, then enforce surfaces new DENIEDs. Counter: do a short enforce run too and collect another round.

2. logprof should be suggesting abstractions, not single rules. It will ask “use abstractions/base?” instead of “allow /etc/ld.so.cache r?”. A high-quality production profile is recognizable by its mix of abstractions + targeted refinements.

6. Path-based weakness

Why it exists

A path is not a property of the file — it’s a property of how you name the file in some namespace. The same inode can have many paths; the same path can point to different inodes over time. Rules are bound to paths → change the path↔inode mapping and you bypass.

Hardlink bypass

Threat model A: profile allows reading /tmp/safe.log; attacker does:

ln /etc/passwd /tmp/safe.log      # link passwd to a path the profile allows
cat /tmp/safe.log                 # program reads the "legal path", actual content is passwd

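Preconditions:

The source and the allowed path are on the same filesystem (hardlinks cannot cross mounts)
Write permission on the target directory
With fs.protected_hardlinks=1 (the default on most modern distros), the attacker must own the source file or have both read and write access to it

On a hardened default system an unprivileged user therefore cannot link /etc/passwd (0644, root-owned): link() fails with EPERM. The attack works for root, for files the attacker owns, or where protected_hardlinks=0.

Note: a hardlink only defeats the path-based MAC check, not DAC. Even if ln /etc/shadow /tmp/x succeeds, a user who cannot read shadow (0640) still cannot read it through the new name.

Higher-value targets such as /proc/$$/maps or a device node live on other filesystems (procfs, devtmpfs), so they cannot be hardlinked into a profile-allowed path; re-exposing them takes a bind mount.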
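The path-versus-inode gap is easy to see without touching AppArmor at all. The sketch below works in a throwaway temp directory on a file the script owns (so protected_hardlinks does not interfere); `path_rule_allows` is a hypothetical stand-in for a path-prefix rule, not how the kernel matches.

```python
import os
import tempfile

def same_inode(a: str, b: str) -> bool:
    """True if both names refer to the same file on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)

def path_rule_allows(path: str, allowed_dir: str) -> bool:
    """Toy path-based rule: allow anything under allowed_dir."""
    return path.startswith(allowed_dir.rstrip("/") + "/")

with tempfile.TemporaryDirectory() as root:
    secret = os.path.join(root, "secret.txt")
    allowed_dir = os.path.join(root, "allowed")
    os.mkdir(allowed_dir)
    with open(secret, "w") as f:
        f.write("sensitive\n")

    alias = os.path.join(allowed_dir, "safe.log")
    os.link(secret, alias)                      # new name, same inode

    linked = same_inode(secret, alias)
    secret_ok = path_rule_allows(secret, allowed_dir)
    alias_ok = path_rule_allows(alias, allowed_dir)
    with open(alias) as f:
        leaked = f.read()

print(linked, secret_ok, alias_ok, leaked.strip())  # True False True sensitive
```

The rule rejects the original name and accepts the alias, even though both open the same bytes.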

Bind mount bypass (nastier than hardlink)

mount --bind /etc /tmp/safe_dir
cat /tmp/safe_dir/passwd          # actually /etc/passwd

Doesn’t need per-file permission, just mount capability (CAP_SYS_ADMIN, or inside a user namespace)
Rebrands an entire directory tree in one shot
Works across filesystems (hardlink doesn’t)

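Direct K8s relevance: CAP_SYS_ADMIN inside a container = able to bind-mount = able to rename paths out from under a path-based profile, unless the profile itself denies mount (Docker’s default profile does). Production K8s pods must drop SYS_ADMIN.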
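One way to audit for this after the fact is to scan /proc/self/mountinfo, where the fourth field is the directory inside the source filesystem that forms the root of each mount. A heuristic sketch: a root other than `/` means a subtree was mounted, which is what `mount --bind /etc /tmp/safe_dir` produces. It will also flag legitimate cases (btrfs subvolumes, container rootfs) and miss bind mounts of a whole filesystem root. `subtree_mounts` is a hypothetical helper and the sample lines are made up.

```python
def subtree_mounts(mountinfo_text: str):
    """Return (root_in_source_fs, mount_point) for mounts whose root is not '/'."""
    found = []
    for line in mountinfo_text.splitlines():
        left, sep, _right = line.partition(" - ")   # ' - ' ends the optional fields
        fields = left.split()
        if not sep or len(fields) < 5:
            continue
        root, mount_point = fields[3], fields[4]
        if root != "/":
            found.append((root, mount_point))
    return found

sample = "\n".join([
    "29 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw",
    "101 29 8:1 /etc /tmp/safe_dir rw,relatime shared:1 - ext4 /dev/sda1 rw",
])
print(subtree_mounts(sample))  # [('/etc', '/tmp/safe_dir')]
```

On a live system, pass in the contents of /proc/self/mountinfo instead of the sample.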

Symlinks

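AppArmor matches on the path of the object the kernel actually resolved, so a symlink from an allowed name to a forbidden target is checked against the target; that kills most symlink attacks. But: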

TOCTOU races are still possible (create/delete a symlink between check and use)
Explicit l (link) control can forbid creating hardlinks at all

vs SELinux (label-based)

	AppArmor (path)	SELinux (label)
Identity	path string	inode xattr `security.selinux`
Hardlink	rule rides on path → bypass via re-link	label stays with inode → no bypass
Bind mount	rule rides on path → bypass	label unchanged → no bypass
Config complexity	simple (path globs)	very complex (type/role/user)
K8s integration	one line in pod spec	painful
Learning curve	gentle	steep

Why Stripe picks AppArmor: engineering trade-off — config is easy and K8s integration is good, and the bypass paths are mostly catchable by the seccomp + caps + namespace layers underneath.

7. Five-layer sandbox design (the core)

Each layer’s job

Layer	What it governs	What it stops
namespace	View isolation (mnt/net/pid/user/uts/ipc/cgroup)	Lateral movement / info leak
capability	Slicing root into about 40 capabilities (41 as of Linux 5.9)	Privilege escalation
seccomp	syscall nr + integer args	Kernel attack surface (syscall 0day)
AppArmor	Resolved path / network protocol	Info leak / persistence
cgroup	Resource quotas (CPU/mem/PID/IO)	DoS

Why they must be layered

Each layer covers a different semantic dimension; any single layer has structural blind spots:

seccomp sees syscall nr → can’t see path content → needs AppArmor
AppArmor sees paths → can’t see namespace transitions → needs caps / ns
caps see permission families → can’t see specific actions → needs seccomp for fine-grained syscall denial
namespaces isolate the view → don’t restrict what you do inside the view → needs AppArmor / seccomp
cgroups cap resources → don’t see access semantics → needs AppArmor

Defense in depth: one CVE breaks one layer, the next layer catches it.
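The blind-spot argument can be restated as set arithmetic over the table above. This is only an illustration: the dimension labels in `LAYERS` paraphrase the "What it governs" column and are not a formal taxonomy, and `uncovered` is a hypothetical helper.

```python
# What each layer can see, paraphrased from the layer table.
LAYERS = {
    "namespace": {"view isolation"},
    "capability": {"privilege families"},
    "seccomp": {"syscall nr + integer args"},
    "AppArmor": {"resolved paths", "network family/type"},
    "cgroup": {"resource quotas"},
}

def uncovered(enabled):
    """Return the dimensions no enabled layer can see, sorted for stable output."""
    everything = set().union(*LAYERS.values())
    covered = set().union(*(LAYERS[name] for name in enabled))
    return sorted(everything - covered)

print(uncovered(["seccomp"]))
print(uncovered(["seccomp", "AppArmor"]))
print(uncovered(list(LAYERS)))   # []
```

Any single layer leaves most dimensions unseen; only the full stack returns an empty list.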

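Canonical case: runc CVE-2019-5736

Attack: a malicious container reaches the host’s runc binary through /proc/[pid]/exe while runc is running inside the container, then overwrites it → container escape
Bypasses namespace (runc itself enters the container’s namespaces, and its /proc exe link comes with it)
Bypasses caps (default caps were enough, because container root was host root)
Docker’s default AppArmor profile did not stop it either; according to the runc advisory, what did block it was a user namespace in which host root is not mapped into the container
The upstream fix had runc re-execute itself from a sealed in-memory copy, so the host binary is never exposed to the container

No single default layer caught it. Multi-layer defense is mandatory.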

Pick just two — seccomp + AppArmor

Why:

Widest combined coverage — seccomp owns syscall boundary (kernel attack surface), AppArmor owns syscall result (file / network)
They cover each other’s blind spots (pointer args / kernel attacks)
Don’t depend on namespaces — bare-metal processes can use them, which is critical for Stripe’s fleet-level mitigation story

Why not just namespace + cap: those do isolation and coarse permissions; they don’t stop application-logic attacks. Modern attacks are mostly logic flaws, not missing isolation.

Why not just AppArmor:

LSM has path-bypass blind spots
Kernel-level attacks bypass LSM entirely
seccomp is the last line for kernel attack surface

Production containers turn on all five, but for simpler threat models (trusted code + supply-chain defense), seccomp + AppArmor is the highest ROI pair.

8. Seeds for the Stripe project

Mitigation is not a single artifact

The best mitigation for a CVE might be “1 seccomp rule + 2 AppArmor rules + drop 1 cap.” LLMs that emit a single artifact will usually under-cover.

Which layer to mitigate at is a design decision

The same attack can be stopped at several layers, but trade-offs differ:

seccomp: strict but coarse (whole syscall blocked)
AppArmor: fine but bypassable (path-based)
cap: wide blast radius (dropping SYS_ADMIN may break unrelated things)

Making the LLM explain why this layer is a key eval signal.

Adversarial robustness is the core metric

Not just “does the mitigation stop the original PoC?”, but:

Can the attacker bypass with a small tweak?
Variant syscall (execve → execveat)?
Path tricks (hardlink / bind mount)?
Different ABI (i386 / x32)?

Each variant is its own test case.
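Enumerating the variants mechanically keeps the test matrix honest. A sketch, assuming a hand-written family table (`SYSCALL_FAMILY`) and treating each combination of syscall, path trick, and ABI as one case; all names here are hypothetical, and the tables would need extending for real use.

```python
from itertools import product

# Illustrative tables; extend per PoC.
SYSCALL_FAMILY = {
    "execve": ["execve", "execveat"],
    "open": ["open", "openat", "openat2"],
}
PATH_TRICKS = ["direct", "hardlink", "bind mount"]
ABIS = ["x86_64", "i386", "x32"]

def variants(syscall: str, path: str):
    """Yield one test case per (syscall variant, path trick, ABI) combination."""
    family = SYSCALL_FAMILY.get(syscall, [syscall])
    for nr, trick, abi in product(family, PATH_TRICKS, ABIS):
        yield {"syscall": nr, "path": path, "path_trick": trick, "abi": abi}

cases = list(variants("execve", "/bin/sh"))
print(len(cases))   # 18
print(cases[0])     # the original PoC: execve / direct / x86_64
```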

Typical LLM failures when writing AppArmor profiles

Doesn’t use abstractions → long and brittle
Misses the m modifier → PROT_EXEC mmap fails
Uses Px without a fallback (Pix)
Doesn’t account for hardlink / bind mount bypass
Enumerates every path explicitly — you can’t read off “what core asset is being protected”

Each of these is an independent eval metric.
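Three of these failures are mechanical enough to check with a few regexes. The linter below is a rough sketch, not a parser for the full profile grammar: `lint_profile`, `RULE`, and `CODE_DIRS` are hypothetical, it only understands simple `path mode,` rules, and "looks like code" is guessed from the directory.

```python
import re

RULE = re.compile(r"^(/\S+)\s+([A-Za-z]+)\s*(?:->\s*\S+?)?\s*,$")
CODE_DIRS = ("/bin/", "/sbin/", "/usr/bin/", "/usr/sbin/", "/lib", "/usr/lib")

def lint_profile(text: str):
    """Return a list of findings for three common profile-writing mistakes."""
    findings = []
    if not re.search(r"^\s*#?include\s+<abstractions/", text, re.MULTILINE):
        findings.append("no abstractions included")
    for raw in text.splitlines():
        m = RULE.match(raw.split("#", 1)[0].strip())   # drop trailing comments
        if not m:
            continue
        path, mode = m.groups()
        if mode in ("Px", "px", "Cx", "cx"):
            findings.append(f"{path}: {mode} has no ix fallback")
        elif path.startswith(CODE_DIRS) and "r" in mode and "m" not in mode and "x" not in mode:
            findings.append(f"{path}: readable code without m")
    return findings

bad = """
/usr/bin/myapp {
    /usr/bin/myapp r,
    /lib/x86_64-linux-gnu/libc.so.6 r,
    /bin/helper Px,
}
"""
for finding in lint_profile(bad):
    print(finding)
```

The remaining failures on the list (bypass awareness, legibility of the protected asset) need review by a person or a behavioral test, not a pattern match.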

9. Debugging AppArmor

Profile state

sudo aa-status                       # all profiles, sorted by enforce/complain
sudo aa-status | grep myapp          # a specific profile

Is the profile actually attached?

# While the program is running:
cat /proc/<pid>/attr/current         # prints "/path/to/program (enforce)"
ps -eo pid,comm,label                # system-wide

Note: if the profile doesn’t allow reading /proc/*/attr/current, a program reading its own label gets EACCES; that EACCES is itself indirect evidence the profile is in effect.
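The same check can be scripted. A small sketch that parses the label format shown above; `parse_label` and `current_label` are hypothetical helpers, and returning None on any OSError is a simplification that covers both "AppArmor absent" and the EACCES case from the note.

```python
import re

LABEL = re.compile(r"^(?P<profile>.+?)(?: \((?P<mode>[a-z]+)\))?$")

def parse_label(raw: str):
    """Split '/path/to/program (enforce)' into (profile, mode); mode may be None."""
    text = raw.strip("\n\x00 ")
    m = LABEL.match(text)
    if not m:
        return None
    return m.group("profile"), m.group("mode")

def current_label(pid="self"):
    """Read and parse /proc/<pid>/attr/current; None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/attr/current") as f:
            return parse_label(f.read())
    except OSError:
        return None

print(parse_label("/path/to/program (enforce)\n"))  # ('/path/to/program', 'enforce')
print(parse_label("unconfined\n"))                  # ('unconfined', None)
print(current_label())                              # depends on the host
```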

Violation records

sudo dmesg -T | grep apparmor | tail
sudo journalctl -k | grep apparmor | tail
sudo ausearch -m APPARMOR -ts recent          # if auditd is installed
sudo grep apparmor /var/log/syslog | tail

A full DENIED record looks like:

audit: type=1400 audit(...): apparmor="DENIED"
  operation="open"
  profile="/root/apparmor_test/myapp.sh"
  name="/etc/shadow"
  pid=... comm="cat"
  requested_mask="r" denied_mask="r"
  fsuid=0 ouid=0

Every field matters:

operation: open / exec / mount / capable / …
profile: which profile blocked it
name: the path that was blocked
requested_mask vs denied_mask: what was asked for, what was denied
comm: the program that actually ran
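Pulling those fields out of a record is a one-regex job. A minimal sketch: `parse_audit` and `summarize` are hypothetical helpers, and the timestamp and pid in the sample are placeholders filled in for illustration, not values from a real log.

```python
import re

# key="quoted value" or key=bare_value
FIELD = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_audit(record: str) -> dict:
    """Return the key/value fields of one AppArmor audit record."""
    return {key: quoted or bare for key, quoted, bare in FIELD.findall(record)}

def summarize(fields: dict) -> str:
    """One-line summary built from the fields discussed above."""
    return (f'{fields["comm"]} under {fields["profile"]}: '
            f'{fields["operation"]} {fields["name"]} '
            f'requested={fields["requested_mask"]} denied={fields["denied_mask"]}')

record = '''audit: type=1400 audit(1700000000.000:42): apparmor="DENIED"
  operation="open"
  profile="/root/apparmor_test/myapp.sh"
  name="/etc/shadow"
  pid=1234 comm="cat"
  requested_mask="r" denied_mask="r"
  fsuid=0 ouid=0'''

fields = parse_audit(record)
print(summarize(fields))
# cat under /root/apparmor_test/myapp.sh: open /etc/shadow requested=r denied=r
```

Note that comm is cat while profile is the script: the child inherited the script's profile, which is exactly the ix-style inheritance described earlier.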

Lab pitfalls (ones I hit)

aa-status shows 0 confined processes: profile is loaded but no live process is currently using it — attach is per-process, when the program exits the count goes to zero
dmesg shows no DENIED: kernel ring buffer got flushed by other logs (UFW etc.), or audit rate-limited it
Script invoked as bash script.sh: profile doesn’t attach (the profile head is the script path, not bash)
/proc/*/attr/current not readable: the profile didn’t allow it — not a kernel bug

10. Takeaways

AppArmor = LSM hook + path-based MAC, enforced at the syscall result layer
Profile attach keys on binary path; inherited across fork, re-looked-up on execve
bash script.sh does not attach the script’s profile — you must ./script.sh
Exec modifiers: ix (inherit) / Px (standalone) / Cx (child profile) / Ux (drop); uppercase scrubs the environment, lowercase (px / cx / ux) does not
mr ≠ r — binaries must allow PROT_EXEC mmap
Complain ≈ seccomp LOG, workflow: complain → logprof → enforce
Path-based weakness: hardlink / bind mount changes the path↔inode map, so the rules bypass
Five-layer sandbox split: ns / cap / seccomp / AppArmor / cgroup — each covers a different blind spot
If you pick two, pick seccomp + AppArmor — complementary and namespace-independent
Three dimensions for LLM-mitigation eval: layer correctness / abstraction quality / adversarial robustness

Week 1 wrap-up

Day	Topic	Core takeaway
1	syscall ABI + the user→kernel path	x86_64 registers / `syscall` instruction’s HW side effects / kernel dispatch
2	strace + common syscalls	The three startup layers (ld.so / libc init / main) / isatty pattern / fd reuse race
3	seccomp strict mode	The 4 allowed syscalls + design philosophy / `_exit` vs `SYS_exit` trap
4	seccomp BPF filter	arch check / no pointer deref (two reasons) / syscall family / 8 RET actions / `USER_NOTIF`
5	AppArmor + layered sandbox	profile attach / exec modifiers / path-based weakness / 5 layers complementary

Week 2 preview: K8s security is essentially packaging this week’s contents into a pod spec:

securityContext.capabilities → cap drop
securityContext.seccompProfile → seccomp filter (RuntimeDefault / Localhost)
securityContext.appArmorProfile → AppArmor profile (per-container)
pod.spec.hostNetwork / hostPID → namespace isolation
resources.limits → cgroup
Pod Security Standards (privileged / baseline / restricted) → preset combinations of the five layers

The mental model is already in place; K8s is just a declarative wrapper. Week 2 should go quickly.