nvmlGpuTopologyLevel_t
typedef enum nvmlGpuLevel_enum
{
NVML_TOPOLOGY_INTERNAL = 0, // e.g. Tesla K80
NVML_TOPOLOGY_SINGLE = 10, // all devices that only need traverse a single PCIe switch
NVML_TOPOLOGY_MULTIPLE = 20, // all devices that need not traverse a host bridge
NVML_TOPOLOGY_HOSTBRIDGE = 30, // all devices that are connected to the same host bridge
NVML_TOPOLOGY_NODE = 40, // all devices that are connected to the same NUMA node but possibly multiple host bridges
NVML_TOPOLOGY_SYSTEM = 50 // all devices in the system
// there is purposefully no COUNT here because of the need for spacing above
} nvmlGpuTopologyLevel_t;
NVML_TOPOLOGY_INTERNAL
GPUs are on the same board (e.g., dual-GPU card). Fastest connection.
NVML_TOPOLOGY_SINGLE
GPUs sit under the same PCIe switch, so traffic between them crosses at most one switch. Lower latency and higher bandwidth than the farther levels.
NVML_TOPOLOGY_MULTIPLE
GPUs are under different PCIe switches; the path between them crosses multiple PCIe bridges but does not go through the host bridge. This adds more PCIe hops.
NVML_TOPOLOGY_HOSTBRIDGE
GPUs are connected to the same host bridge. A “host bridge” connects the CPU/system root complex to one or more PCIe hierarchies.
→ So two GPUs at the host-bridge level are still on the same host, but their PCIe paths only meet at the host bridge (root complex) rather than at a PCIe switch below it.
NVML_TOPOLOGY_NODE
GPUs hang off different host bridges (root complexes), but those host bridges belong to the same NUMA node. Traffic crosses the interconnect between host bridges within the node, so latency is higher than at the HOSTBRIDGE level.
NVML_TOPOLOGY_SYSTEM
GPUs have no common ancestor closer than the whole system: they sit in different NUMA nodes, so traffic crosses the SMP interconnect between sockets (e.g., QPI/UPI) in addition to PCIe. This is the farthest relationship.
→ Still within the same physical host (unless you’re in a virtualized/multi-host environment with NVSwitch across nodes, which is rare outside DGX SuperPOD).
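A minimal sketch of how these levels surface through the NVML API, using nvmlDeviceGetTopologyCommonAncestor (assumptions: at least two GPUs are visible and error handling is kept minimal; link with -lnvidia-ml, which is the usual library name):
// Sketch: print the nvmlGpuTopologyLevel_t between GPU 0 and GPU 1.
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t gpu0, gpu1;
    nvmlDeviceGetHandleByIndex_v2(0, &gpu0);
    nvmlDeviceGetHandleByIndex_v2(1, &gpu1);

    nvmlGpuTopologyLevel_t level;
    if (nvmlDeviceGetTopologyCommonAncestor(gpu0, gpu1, &level) == NVML_SUCCESS) {
        // level is one of the enum values above: 0, 10, 20, 30, 40, 50
        printf("GPU0 <-> GPU1 topology level: %d\n", (int)level);
    }

    nvmlShutdown();
    return 0;
}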
What is a dual-GPU card?
A dual-GPU card means a single expansion card (a single PCB that you plug into a PCIe slot) that carries two separate GPU dies (packages), often with their own memory, power regulators, and cooling, but sharing the same board and PCIe interface.
Examples:
NVIDIA GeForce GTX 690 (Kepler, 2012) → 2 × GK104 GPUs on one board
NVIDIA Tesla K80 (Kepler, 2014) → 2 × GK210 GPUs on one board, often used in datacenters
AMD Radeon HD 7990 (Tahiti, 2013) → 2 × Tahiti GPUs
Why it matters for NVML:
If you query topology with NVML, two GPUs on the same PCB (dual-GPU card) will usually return
NVML_TOPOLOGY_INTERNAL → meaning closest possible connection, since they may share a PCIe bridge chip directly on the card.
In short:
👉 Dual-GPU card = two GPU packages on the same PCB, plugged into a single PCIe slot.
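As a quick programmatic check, NVML also exposes nvmlDeviceOnSameBoard, which reports whether two device handles sit on the same physical board. A sketch, assuming GPUs 0 and 1 exist:
// Sketch: ask NVML whether GPU 0 and GPU 1 share one board (e.g., a dual-GPU card like the K80).
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t a, b;
    nvmlDeviceGetHandleByIndex_v2(0, &a);
    nvmlDeviceGetHandleByIndex_v2(1, &b);

    int onSameBoard = 0;
    if (nvmlDeviceOnSameBoard(a, b, &onSameBoard) == NVML_SUCCESS)
        printf("GPU0 and GPU1 on same board: %s\n", onSameBoard ? "yes" : "no");

    nvmlShutdown();
    return 0;
}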
What is a PCIe switch?
A hardware component that fans out PCIe lanes. Think of it like an Ethernet switch but for PCIe. Multiple devices (GPUs, NICs, NVMe) can attach under a PCIe switch.
What is a host bridge?
The component that connects CPU/system memory (root complex) to one or more PCIe hierarchies. In multi-socket servers, each CPU typically has its own host bridge.
GPUs connected at the host-bridge (HOSTBRIDGE) level: same host or different host?
Same host. They are just under different PCIe hierarchies attached to the same CPU root complex.
GPUs connected at the “all devices in the system” (SYSTEM) level: same host or different host?
Same host. NVML_TOPOLOGY_SYSTEM means "the farthest possible connection within this system."
It doesn’t mean cross-host. NVML itself is per-host and does not describe connections between different physical servers.
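One way to see the per-host scope in code is nvmlDeviceGetTopologyNearestGpus, which lists, for one GPU, the other GPUs at or closer than a given topology level; at NVML_TOPOLOGY_SYSTEM that is simply every GPU NVML can see on this host. A sketch, assuming at most 64 GPUs and that the count parameter is capacity-in / matches-out:
// Sketch: list GPUs reachable from GPU 0 at NVML_TOPOLOGY_SYSTEM or closer.
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t gpu0;
    nvmlDeviceGetHandleByIndex_v2(0, &gpu0);

    nvmlDevice_t peers[64];      // assumption: at most 64 GPUs on this host
    unsigned int count = 64;     // in: array capacity, out: number of matching GPUs
    if (nvmlDeviceGetTopologyNearestGpus(gpu0, NVML_TOPOLOGY_SYSTEM, &count, peers) == NVML_SUCCESS) {
        printf("GPUs reachable from GPU0 at SYSTEM level or closer: %u\n", count);
    }

    nvmlShutdown();
    return 0;
}
The Mermaid diagram below sketches this single-host picture: one root complex fanning out through a PCIe switch to slots of different widths.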
graph TD
CPU["CPU / Root Complex<br/>(PCIe Controller)"]
Switch["PCIe Switch<br/>(fan-out)"]
CPU -->|x16 lanes| Switch
subgraph "PCIe Slots"
Slot1["PCIe Slot x16<br/>(16 lanes)"]
Slot2["PCIe Slot x8<br/>(8 lanes)"]
Slot3["PCIe Slot x4<br/>(4 lanes)"]
Slot4["PCIe Slot x1<br/>(1 lane)"]
end
Switch -->|x16 lanes| Slot1
Switch -->|x8 lanes| Slot2
Switch -->|x4 lanes| Slot3
Switch -->|x1 lane| Slot4
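To relate the lane counts in the diagram to real hardware, NVML can report the current PCIe link generation and width per GPU. A sketch using nvmlDeviceGetCurrPcieLinkGeneration and nvmlDeviceGetCurrPcieLinkWidth (error handling omitted for brevity):
// Sketch: print the current PCIe generation and link width (lane count) for each GPU.
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    unsigned int n = 0;
    nvmlDeviceGetCount_v2(&n);
    for (unsigned int i = 0; i < n; i++) {
        nvmlDevice_t dev;
        unsigned int gen = 0, width = 0;
        nvmlDeviceGetHandleByIndex_v2(i, &dev);
        nvmlDeviceGetCurrPcieLinkGeneration(dev, &gen);
        nvmlDeviceGetCurrPcieLinkWidth(dev, &width);
        printf("GPU%u: PCIe Gen%u x%u\n", i, gen, width);
    }

    nvmlShutdown();
    return 0;
}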
Can a NUMA node have multiple CPU sockets?
Yes — but it depends on the system architecture.
Most common today:
A NUMA node = 1 CPU socket + its directly attached memory.
This is the standard mapping in modern x86 servers (Intel Xeon, AMD EPYC). Each socket is its own NUMA node.
Possible but less common:
A NUMA node can include multiple sockets if the firmware/OS groups them that way. This was more typical in older systems (e.g., some SGI or IBM big-iron machines) or if the BIOS is set to “NUMA = off” / “Node Interleaving.” In that case, the OS may see 1 NUMA node spanning multiple sockets.
Also possible:
A single socket can expose multiple NUMA nodes. Example:
AMD EPYC is built from chiplets: on Naples each die is its own NUMA node, and on Rome/Milan BIOS options such as NPS2/NPS4 (or “L3 as NUMA”) split one physical socket into several NUMA nodes.
👉 So:
By default: 1 NUMA node ≈ 1 CPU socket.
But depending on system design or BIOS/firmware config:
One node can span multiple sockets, or
One socket can be split into multiple NUMA nodes.
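A small way to see the actual mapping on a given box is to ask libnuma which node each CPU belongs to. A sketch, assuming libnuma is installed (link with -lnuma):
// Sketch: print the NUMA node count and the node of each configured CPU using libnuma.
// On a typical 2-socket x86 server this shows two nodes, each covering one socket's CPUs.
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {          // kernel or hardware without NUMA support
        printf("NUMA not available\n");
        return 1;
    }
    int nodes = numa_num_configured_nodes();
    int cpus  = numa_num_configured_cpus();
    printf("NUMA nodes: %d, CPUs: %d\n", nodes, cpus);
    for (int cpu = 0; cpu < cpus; cpu++)
        printf("cpu %d -> node %d\n", cpu, numa_node_of_cpu(cpu));
    return 0;
}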
nvidia-smi topo output
55006|JYTFY-D1-308-H100-D01-4|2025-09-22 10:36:21[like@ ~]nvidia-smi topo -m
GPU0 GPU1 GPU2 GPU3 GPU4 GPU5 GPU6 GPU7 NIC0 NIC1 NIC2 NIC3 CPU Affinity NUMA Affinity GPU NUMA ID
GPU0 X NV18 NV18 NV18 NV18 NV18 NV18 NV18 NODE NODE SYS SYS 0,2,4,6,8,10 0 N/A
GPU1 NV18 X NV18 NV18 NV18 NV18 NV18 NV18 PIX NODE SYS SYS 0,2,4,6,8,10 0 N/A
GPU2 NV18 NV18 X NV18 NV18 NV18 NV18 NV18 NODE PIX SYS SYS 0,2,4,6,8,10 0 N/A
GPU3 NV18 NV18 NV18 X NV18 NV18 NV18 NV18 NODE NODE SYS SYS 0,2,4,6,8,10 0 N/A
GPU4 NV18 NV18 NV18 NV18 X NV18 NV18 NV18 SYS SYS NODE NODE 1,3,5,7,9,11 1 N/A
GPU5 NV18 NV18 NV18 NV18 NV18 X NV18 NV18 SYS SYS PIX NODE 1,3,5,7,9,11 1 N/A
GPU6 NV18 NV18 NV18 NV18 NV18 NV18 X NV18 SYS SYS NODE NODE 1,3,5,7,9,11 1 N/A
GPU7 NV18 NV18 NV18 NV18 NV18 NV18 NV18 X SYS SYS NODE PIX 1,3,5,7,9,11 1 N/A
NIC0 NODE PIX NODE NODE SYS SYS SYS SYS X NODE SYS SYS
NIC1 NODE NODE PIX NODE SYS SYS SYS SYS NODE X SYS SYS
NIC2 SYS SYS SYS SYS NODE PIX NODE NODE SYS SYS X NODE
NIC3 SYS SYS SYS SYS NODE NODE NODE PIX SYS SYS NODE X
Legend:
X = Self
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX = Connection traversing at most a single PCIe bridge
NV# = Connection traversing a bonded set of # NVLinks
NIC Legend:
NIC0: mlx5_0
NIC1: mlx5_1
NIC2: mlx5_2
NIC3: mlx5_3
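The NV18 entries say each GPU pair is connected by a bonded set of 18 NVLinks. NVML can confirm this per GPU by counting active links with nvmlDeviceGetNvLinkState (NVML_NVLINK_MAX_LINKS is defined in nvml.h); a sketch for GPU 0:
// Sketch: count active NVLink links on GPU 0 (reported as NV18 in the matrix above).
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit_v2() != NVML_SUCCESS) return 1;

    nvmlDevice_t gpu0;
    nvmlDeviceGetHandleByIndex_v2(0, &gpu0);

    unsigned int active = 0;
    for (unsigned int link = 0; link < NVML_NVLINK_MAX_LINKS; link++) {
        nvmlEnableState_t state;
        if (nvmlDeviceGetNvLinkState(gpu0, link, &state) == NVML_SUCCESS &&
            state == NVML_FEATURE_ENABLED)
            active++;
    }
    printf("GPU0 active NVLink links: %u\n", active);

    nvmlShutdown();
    return 0;
}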
Aligning the topology levels across siml, NVML, and nvidia-smi topo
# same host, cross numa
SMI_PATH_SYS = 6, ///< Cross-NUMA connection
NVML_TOPOLOGY_SYSTEM = 50 // all devices in the system
SYS = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
# same NUMA node, crossing host bridges (per Wikipedia, "host bridge" and "root complex" refer to the same component)
SMI_PATH_NODE = 5, ///< NUMA node internal
NVML_TOPOLOGY_NODE = 40, // all devices that are connected to the same NUMA node but possibly multiple host bridges
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
# same host bridge
SMI_PATH_PHB = 4, ///< PCIe Host Bridge
NVML_TOPOLOGY_HOSTBRIDGE = 30, // all devices that are connected to the same host bridge
PHB = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
The label PHB indicates that data must traverse the PCIe Host Bridge, typically meaning the CPU. This path incurs some latency because data must pass through the CPU before reaching its destination.
# multiple PCIe bridges on the path, without traversing the host bridge
SMI_PATH_PXB = 3, ///< Multiple PCIe bridges
NVML_TOPOLOGY_MULTIPLE = 20, // all devices that need not traverse a host bridge
PXB = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
# at most a single PCIe switch on the path
SMI_PATH_PIX = 2, ///< Single PCIe bridge
NVML_TOPOLOGY_SINGLE = 10, // all devices that only need traverse a single PCIe switch
PIX = Connection traversing at most a single PCIe bridge
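Putting the three naming schemes together, a conversion helper might look like the sketch below. The SMI_PATH_* values are the ones quoted from the siml code above; the mapping itself is my alignment of the three legends, not an official table, and it would typically be fed by nvmlDeviceGetTopologyCommonAncestor as in the earlier sketch.
// Sketch: map an NVML topology level onto the siml SMI_PATH_* codes.
// NVLink (NV# in nvidia-smi) is not covered here; it is queried separately via nvmlDeviceGetNvLinkState.
#include <nvml.h>

typedef enum {
    SMI_PATH_PIX  = 2,  // single PCIe bridge
    SMI_PATH_PXB  = 3,  // multiple PCIe bridges, no host bridge
    SMI_PATH_PHB  = 4,  // PCIe host bridge
    SMI_PATH_NODE = 5,  // same NUMA node, across host bridges
    SMI_PATH_SYS  = 6   // cross-NUMA (SMP interconnect)
} smi_path_t;

static smi_path_t smi_path_from_nvml(nvmlGpuTopologyLevel_t level) {
    switch (level) {
        case NVML_TOPOLOGY_INTERNAL:   // same board: the notes above quote no dedicated code, fold into PIX
        case NVML_TOPOLOGY_SINGLE:     return SMI_PATH_PIX;
        case NVML_TOPOLOGY_MULTIPLE:   return SMI_PATH_PXB;
        case NVML_TOPOLOGY_HOSTBRIDGE: return SMI_PATH_PHB;
        case NVML_TOPOLOGY_NODE:       return SMI_PATH_NODE;
        case NVML_TOPOLOGY_SYSTEM:
        default:                       return SMI_PATH_SYS;
    }
}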
How to check whether a server has InfiniBand
ibstat
https://static.189505.xyz/blogTexts/ibstat.verbose.siorigin.h100.txt
ibv_devinfo
https://static.189505.xyz/blogTexts/ibv_devinfo.verbose.siorigin.h100.txt
lspci | grep -i mell
https://static.189505.xyz/blogTexts/lspcivv.verbose.siorigin.h100.txt
ip link show
https://static.189505.xyz/blogTexts/ip.link.show.siorigin.h100.log
ethtool -i
for I in $(ip link show | grep ibp | awk -F: '{print $2}'); do echo "=========== eth:$I"; ethtool -i "$I"; done
https://static.189505.xyz/blogTexts/ethtool.siorigin.h100.txt
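The same check can be done programmatically with libibverbs, which is roughly what ibv_devinfo does under the hood. A sketch, assuming the libibverbs headers are installed (link with -libverbs):
// Sketch: enumerate InfiniBand/RDMA devices via libibverbs.
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void) {
    int num = 0;
    struct ibv_device **list = ibv_get_device_list(&num);
    if (!list || num == 0) {
        printf("no RDMA devices found\n");
        return 1;
    }
    for (int i = 0; i < num; i++)
        printf("%s\n", ibv_get_device_name(list[i]));   // e.g. mlx5_0 ... mlx5_3 as in the NIC legend above
    ibv_free_device_list(list);
    return 0;
}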