Confidential Computing on GPUs

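Confidential Computing Mode

While reading the NVML documentation, I came across a "confidential computing mode". After looking it up, it turns out to refer to confidential computing (sometimes loosely called trusted computing): the GPU protects your data while it is being processed, so that the cloud provider cannot peek at your model weights and data.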
🔹 What is Confidential Computing?

In general, Confidential Computing = protecting data while it is being processed, not just when stored (at rest) or transmitted (in transit).

It relies on Trusted Execution Environments (TEEs) — isolated, hardware-enforced secure zones inside CPUs or GPUs.

Goal: prevent a malicious hypervisor, OS, or even cloud provider from snooping on or tampering with sensitive workloads.

🔹 NVIDIA Confidential Computing Mode (GPU-side)

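NVML (NVIDIA Management Library) reports whether the system and its GPUs are operating in this secure execution environment. As far as I can tell, the mode itself is switched outside NVML (for example with NVIDIA's gpu-admin-tools) and takes effect after a GPU reset.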

On GPUs like the H100, this involves:

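Protected GPU memory: most of the GPU's memory becomes a protected region that the host and other devices cannot read. NVIDIA's H100 material describes the on-package HBM as guarded by access control rather than encrypted.

Encrypted links: data crossing PCIe between the CPU-side TEE and the GPU is encrypted (AES-GCM, via staging buffers), and NVLink traffic between GPUs can be encrypted as well.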

Attestation: The GPU can provide a cryptographic proof (quote) that it is running in a trusted, secure mode.

Isolation: Prevents other VMs or processes from accessing sensitive GPU memory/state.
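To make the attestation idea more concrete, here is a minimal sketch of a challenge-response check: the verifier sends a fresh nonce, the device returns a report binding that nonce to a firmware measurement, and the verifier checks the signature, the freshness, and the measurement. Everything here is hypothetical and simplified: `DEVICE_KEY`, `make_report`, and `verify_report` are illustrative names, and an HMAC with a shared key stands in for the asymmetric signature and certificate chain that a real GPU would use.

```python
import hashlib
import hmac
import secrets

# Hypothetical shared key; a real GPU signs with a private device key
# and the verifier checks the signature against a certificate chain.
DEVICE_KEY = b"demo-device-key"


def make_report(nonce: bytes, firmware: bytes, key: bytes = DEVICE_KEY) -> dict:
    """Device side: measure the firmware and bind the result to the nonce."""
    measurement = hashlib.sha384(firmware).digest()
    tag = hmac.new(key, nonce + measurement, hashlib.sha384).digest()
    return {"nonce": nonce, "measurement": measurement, "tag": tag}


def verify_report(report: dict, expected_nonce: bytes,
                  trusted_measurements: set, key: bytes = DEVICE_KEY) -> bool:
    """Verifier side: check authenticity, freshness, and the measurement."""
    expected_tag = hmac.new(
        key, report["nonce"] + report["measurement"], hashlib.sha384
    ).digest()
    authentic = hmac.compare_digest(report["tag"], expected_tag)
    fresh = hmac.compare_digest(report["nonce"], expected_nonce)
    trusted = report["measurement"] in trusted_measurements
    return authentic and fresh and trusted


if __name__ == "__main__":
    nonce = secrets.token_bytes(32)
    trusted = {hashlib.sha384(b"firmware-v1").digest()}
    print(verify_report(make_report(nonce, b"firmware-v1"), nonce, trusted))
```

The nonce is what prevents an attacker from replaying an old, valid report; the measurement comparison is what ties the report to a known-good firmware image.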

🔹 NVML and Confidential Computing

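In NVML, you'll see APIs like:

nvmlSystemGetConfComputeCapabilities()

nvmlSystemGetConfComputeState()

nvmlDeviceGetConfComputeGpuAttestationReport()

The state query tells you whether confidential computing is:

Disabled (normal GPU behavior, no confidential computing protections).

Enabled (confidential computing protections active).

In DevTools mode, a variant that follows the confidential computing workflow but relaxes the protections so that debuggers and profilers work; it is not meant for production.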
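As a rough sketch of how the mode could be read from Python: this assumes the `pynvml` module from a recent `nvidia-ml-py` release, that it exposes `nvmlSystemGetConfComputeState()`, and that the returned struct has `ccFeature` and `devToolsMode` fields (0 = off, 1 = on) as in `nvml.h`. The helper names `describe_cc_state` and `query_cc_mode` are mine, and the mapping of the two flags to a mode name is an assumption, not documented behavior.

```python
from typing import Optional


def describe_cc_state(cc_feature: int, devtools_mode: int) -> str:
    """Map the two NVML state flags to a human-readable mode name.

    Assumption: DevTools mode is reported as the CC feature being enabled
    together with the devtools flag being set.
    """
    if cc_feature == 0:
        return "off"
    return "devtools" if devtools_mode else "on"


def query_cc_mode() -> Optional[str]:
    """Return "off", "on", or "devtools", or None if NVML is unavailable."""
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except Exception:  # no driver, no GPU, or library not found
        return None
    try:
        state = pynvml.nvmlSystemGetConfComputeState()
        return describe_cc_state(state.ccFeature, state.devToolsMode)
    except (pynvml.NVMLError, AttributeError):
        # Older drivers or bindings may not support the query at all.
        return None
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    mode = query_cc_mode()
    print("CC mode:", mode if mode is not None else "unknown (NVML not available)")
```

On a machine without an NVIDIA driver, or with bindings that predate the confidential computing queries, the function returns `None` instead of raising.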

🔹 Why it Matters

In cloud AI training/inference, your model weights and data are valuable IP. Confidential mode is designed to keep the cloud provider (or other tenants) from peeking into GPU memory.

In multi-tenant HPC clusters, it prevents one user’s workload from leaking into another’s.

In regulated industries (finance, healthcare), it helps meet compliance by securing in-use data.

🔹 Example (H100 GPU)

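Without CC: GPU memory is readable by privileged host software, and data crossing PCIe is visible to the host in plaintext.

With CC:

CPU-GPU transfers → encrypted (AES-GCM) through staging buffers, so the host only sees ciphertext.

GPU memory → a protected region the host cannot access; per NVIDIA's H100 material, the HBM itself is not encrypted.

NVLink between GPUs → optionally encrypted.

Attestation → remote user can verify the GPU is running trusted firmware.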
