Performance & why it's slow

On the default QEMU path, every guest CPU instruction is emulated in software. There is no shortcut on this path: QEMU cannot use KVM hardware virtualization on Android without root. This page explains exactly what that means, where it hurts most, and every tuning knob Podroid applies.

The core bottleneck: TCG software emulation

QEMU's CPU backend on Podroid is TCG (Tiny Code Generator), a software JIT that translates guest ARM64 instructions into host ARM64 instructions at runtime. The pipeline for each guest basic block is: fetch guest instructions, decode, translate to host IR, optimize, emit host code, cache the result as a "translation block" (TB), execute. Simple scalar code runs at roughly 2-5x the cost of native; anything that creates many short-lived blocks (branch-heavy code, dynamic dispatch, interpreters) compounds that badly.
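The translate-once, execute-many idea behind translation blocks can be sketched with a toy model. This is not QEMU code: the block addresses, instruction names, and the `ToyTranslator` class are all made up for illustration, and the "host code" is just a Python closure.

```python
from typing import Callable, Dict, List, Tuple

# A toy "guest program": block start address -> guest instructions.
# Addresses and instruction names are hypothetical.
GUEST_BLOCKS: Dict[int, List[str]] = {
    0x1000: ["add", "cmp", "branch"],
    0x2000: ["load", "store", "branch"],
}


class ToyTranslator:
    def __init__(self) -> None:
        self.tb_cache: Dict[int, Callable[[], int]] = {}
        self.translations = 0

    def translate(self, pc: int) -> Callable[[], int]:
        """Stand-in for the fetch/decode/translate/optimize/emit steps."""
        self.translations += 1
        instructions = GUEST_BLOCKS[pc]
        # The "host code" is a closure reporting how many guest
        # instructions the block covers.
        return lambda: len(instructions)

    def execute(self, pc: int) -> int:
        tb = self.tb_cache.get(pc)
        if tb is None:  # cache miss: pay the translation cost once
            tb = self.translate(pc)
            self.tb_cache[pc] = tb
        return tb()  # cache hit: run the already-emitted code


def run(trace: List[int]) -> Tuple[int, int]:
    """Run a trace of block addresses; return (instructions executed, translations)."""
    translator = ToyTranslator()
    executed = sum(translator.execute(pc) for pc in trace)
    return executed, translator.translations
```

Running `run([0x1000] * 1000 + [0x2000])` executes 3003 toy instructions with only 2 translations. A hot loop amortizes its translation cost; code that keeps producing new blocks does not.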

Hardware virtualization via KVM would eliminate the translation entirely, letting the guest run directly on the physical CPU cores. KVM requires access to the hypervisor device node, which on Android means either root or the Android Virtualization Framework's special permissions. The QEMU path has neither. AVF (pKVM) does, which is why it is dramatically faster on devices that support it - see Backends: QEMU & AVF.

Why npm, Node, JVM, and compilers crawl

The double-JIT problem

Node.js, the JVM, and similar runtimes are themselves JIT compilers: they watch hot code paths in the guest and emit new machine code at runtime to speed them up. Under TCG, that freshly generated guest machine code has never been seen before. QEMU must translate it and cache it as TBs. When the JIT runtime later deoptimizes or garbage-collects, it modifies or discards those code pages, which invalidates the TBs. The cycle repeats: generate guest code, translate, invalidate, translate again. The translation-block cache (tb-size) is also finite; when JIT output fills it, QEMU has to discard translations it already made and retranslate that code the next time it runs. You are paying TCG overhead on top of what the JIT was already doing.
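The generate, translate, invalidate cycle can be made concrete with a small simulation. It is a sketch under stated assumptions, not QEMU's implementation: the 4 KiB page size, the `ToyTbCache` class, and the address of the JIT-emitted function are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

PAGE_SIZE = 4096  # assumed guest page size


class ToyTbCache:
    def __init__(self) -> None:
        self.cached: Set[int] = set()
        self.by_page: Dict[int, Set[int]] = defaultdict(set)
        self.translations = 0

    def execute(self, pc: int) -> None:
        """Translate the block at pc unless a translation is already cached."""
        if pc not in self.cached:
            self.translations += 1
            self.cached.add(pc)
            self.by_page[pc // PAGE_SIZE].add(pc)

    def guest_write(self, addr: int) -> None:
        """The guest wrote to a code page: drop every block translated from it."""
        for pc in self.by_page.pop(addr // PAGE_SIZE, set()):
            self.cached.discard(pc)


def simulate(rewrites: int, runs_per_version: int = 100) -> int:
    """Count translations for one hot function that the guest JIT re-emits."""
    cache = ToyTbCache()
    hot_function = 0x40000  # hypothetical address of JIT-emitted code
    for _ in range(rewrites + 1):
        cache.guest_write(hot_function)  # JIT emits (or re-emits) the code
        for _ in range(runs_per_version):
            cache.execute(hot_function)
    return cache.translations
```

With `simulate(0)` the function is translated once and reused for every run. With `simulate(9)` it is translated 10 times, once per version the guest JIT produces, which is the extra work described above.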

Process-spawn cost

Commands like npm install, ./configure, and make spawn hundreds or thousands of short-lived child processes. Each new QEMU process starts with a cold translation cache. There is no shared TB cache between QEMU process invocations, and no persistent cache on disk. That means the same tiny utilities (sh, awk, grep, cp) are fully re-translated from scratch on every invocation. Even a build step that does trivially little actual computation feels slow because the TCG startup cost dominates. This is the mechanism behind "even small builds feel slow."

I/O

Disk I/O travels through three layers: guest virtio-blk driver to QEMU device emulation, then to the host Android filesystem. Podroid assigns a dedicated I/O thread to each virtio-blk device so disk requests are handled on a separate thread and do not stall vCPU execution. The guest uses the mq-deadline I/O scheduler for request merging, which matters most for the random small writes Podman's overlay graph driver generates during container image extraction. The ext4 overlay volume is mounted with noatime,commit=60,barrier=0 to reduce metadata write amplification.

Memory

The default allocation is 512 MB of guest RAM - enough for a shell and basic containers, but tight for npm or a JVM process. Podroid configures ZRAM swap inside the guest at 1.5x the VM's RAM, using lz4 compression. Because lz4 averages around 3x on heap and text pages, that compressed swap costs only a fraction of its size in real RAM - so a 512 MB VM gets roughly 768 MB of extra swap on top of its RAM at near-zero I/O cost (ZRAM lives in RAM; there is no disk read or write). That headroom is what keeps an npm install or a browser tab from immediately hitting the out-of-memory killer. For memory-heavy work, raise RAM in Settings → RAM.
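The sizing arithmetic is easy to check by hand. The sketch below uses the 1.5x factor and the roughly 3x lz4 ratio quoted above; the ratio is an average, so real results depend on how compressible the workload's pages are. The function name and return layout are illustrative only.

```python
from typing import Tuple


def zram_budget(
    ram_mb: float,
    swap_factor: float = 1.5,        # ZRAM size relative to guest RAM (from the text)
    compression_ratio: float = 3.0,  # assumed average lz4 ratio (from the text)
) -> Tuple[float, float, float]:
    """Return (swap size, RAM used when swap is full, total data that fits), in MB."""
    swap_mb = ram_mb * swap_factor
    # A full swap device holds swap_mb of page data, stored compressed in RAM.
    ram_cost_mb = swap_mb / compression_ratio
    # RAM left for uncompressed pages, plus everything held in compressed swap.
    total_mb = (ram_mb - ram_cost_mb) + swap_mb
    return swap_mb, ram_cost_mb, total_mb
```

For the default VM, `zram_budget(512)` gives 768 MB of swap that occupies about 256 MB of real RAM when completely full, so roughly 1024 MB of page data fits in a 512 MB guest under these assumptions.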

Network

Podroid uses SLIRP, a userspace TCP/IP stack embedded inside QEMU. It is adequate for bulk transfers (container image pulls, SSH, HTTP) but adds per-packet CPU overhead compared to a kernel network path. High-connection-count workloads - tools that open hundreds of parallel connections like npm install with many dependencies - will feel this overhead in addition to the CPU emulation cost. There is no alternative without root: TAP networking requires a TUN/TAP device or elevated permissions.

Tuning summary

The table below lists every tuning measure Podroid applies and what it buys. All of these are already active by default; they are not optional settings you need to find.

Measure	Where it lives	What it buys
`-cpu max,pauth-impdef=on`	QEMU command line	Exposes every CPU feature TCG can emulate to the guest, and switches pointer authentication from the architected cryptographic algorithm to a fast implementation-defined one that is cheap to compute in software. Up to ~50% faster on PAuth-heavy code (Rust, recent Clang output).
`-accel tcg,thread=multi`	QEMU command line	One host thread per vCPU. Without this all vCPUs share one host thread and can't use more than one core. Enables real SMP scaling.
`tb-size=256` (<2 GB RAM) / `tb-size=512` (≥2 GB RAM)	QEMU command line	Larger translation-block cache reduces re-translation for JIT-heavy workloads (Node, JVM in containers). Default QEMU value is 32 MB.
Per-disk `iothread`	QEMU command line	Dedicated I/O thread per virtio-blk device, decoupled from vCPU threads. Largest single win for container image pull and extraction.
`mitigations=off` in kernel cmdline	Guest kernel boot args	Disables Spectre/Meltdown mitigations inside the guest. Safe inside a TCG VM: speculative execution attacks cannot cross the emulated ISA boundary. Saves 5-15% CPU on syscall-heavy workloads.
`mq-deadline` I/O scheduler	Guest kernel cmdline (`elevator=mq-deadline`)	Request merging for virtio-blk random writes. Reduces write amplification on Podman overlay graph driver operations.
ZRAM swap at 1.5x RAM (lz4)	Guest init (`podroid-bootstrap` OpenRC service)	lz4 (~3x on heap/text) makes the compressed swap cost a fraction of its size, giving large headroom against the out-of-memory killer at near-zero I/O cost. Priority 100 so it is preferred over any file swap.
ext4 `noatime,commit=60,barrier=0`	Overlay volume mount options	Skips access-time updates, batches journal commits to 60-second windows, and omits write barriers (safe: data loss on crash only risks the 60-second window, not the filesystem). Reduces metadata I/O on package-install workloads.
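`metacopy=on` on overlayfs	Overlay mount options	Copies up only inode metadata (not file data) when a metadata-only operation such as chown or chmod touches a file on the overlay; the data is copied later, only if the file is opened for writing. Speeds up container image layer application.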
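As a rough illustration of how the QEMU-side rows of the table could fit together, the sketch below assembles an argument list. It is not Podroid's actual launcher code. The function names, the drive ids, the `io0`-style iothread ids, the `virtio-blk-pci` device model, and the `-m` argument are assumptions added for the example; passing `tb-size` as a suboption of `-accel tcg` is also an assumption about how the flags are combined.

```python
from typing import List


def tb_size_mb(ram_mb: int) -> int:
    """tb-size rule from the table: 256 below 2 GB of guest RAM, 512 at or above."""
    return 512 if ram_mb >= 2048 else 256


def qemu_tuning_args(ram_mb: int, disk_ids: List[str]) -> List[str]:
    """Build the tuning-related part of a QEMU command line (illustrative only)."""
    args = [
        "-cpu", "max,pauth-impdef=on",
        "-accel", f"tcg,thread=multi,tb-size={tb_size_mb(ram_mb)}",
        "-m", str(ram_mb),  # guest RAM in MB; not in the table, added for context
    ]
    for i, disk in enumerate(disk_ids):
        # One iothread object per disk, so each virtio-blk device gets its
        # own I/O thread. Ids and device model are hypothetical.
        args += [
            "-object", f"iothread,id=io{i}",
            "-device", f"virtio-blk-pci,drive={disk},iothread=io{i}",
        ]
    return args
```

For example, `qemu_tuning_args(512, ["rootfs", "overlay"])` selects `tb-size=256` and creates two iothreads, while a 4096 MB guest would get `tb-size=512`.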

What won't help without root

Some common Linux performance tricks are simply unavailable on a stock Android device: io_uring is blocked by Android 12+ seccomp policy; CPU affinity (taskset) requires elevated scheduling permissions; KSM memory deduplication is controlled through kernel settings that only root can change; huge pages on the host are not user-configurable; TAP networking requires a TUN/TAP device. These are hard limits, not oversights.

Full hardware virtualization (KVM, or Qualcomm's Gunyah) is in the same category for the QEMU path: opening the hypervisor device node directly requires root, and on AVF it is gated to devices that allow non-protected pKVM (see Backends). Podroid stays no-root by design and uses QEMU or AVF instead. If you are rooted on a recent Qualcomm Snapdragon device, a separate project such as DroidVM uses Gunyah or KVM for near-native VMs.

Biggest wins

If you have a Pixel 8, 9, or 10 (or another device shipping pKVM), switch the backend to AVF in Settings → Advanced. AVF runs the guest on real hardware CPU cores via the kernel hypervisor - no TCG, no double-JIT overhead, no cold-cache process-spawn cost. See Backends: QEMU & AVF for setup. For heavy workloads on any backend, also raise RAM in Settings and lower the X11 viewer resolution if you are running a desktop.


Podroid is free software (GPL). Docs for v1.2.5. Found something inaccurate? Open an issue.