Hello,
We found that some standard NVIDIA tests fail on a dual NVIDIA RTX 4090 system.
The system runs CUDA 11.8 with driver 520.61.05 on:
Linux dl 5.15.0-52-generic #58-Ubuntu SMP Thu Oct 13 08:03:55 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
TEST-1:
executables_v2/bin/x86_64/linux/release/simpleP2P
[executables_v2/bin/x86_64/linux/release/simpleP2P] - Starting…
Checking for multiple GPUs…
CUDA-capable device count: 2
Checking GPU(s) for support of peer to peer memory access…
Peer access from NVIDIA GeForce RTX 4090 (GPU0) → NVIDIA GeForce RTX 4090 (GPU1) : Yes
Peer access from NVIDIA GeForce RTX 4090 (GPU1) → NVIDIA GeForce RTX 4090 (GPU0) : Yes
Enabling peer access between GPU0 and GPU1…
Allocating buffers (64MB on GPU0, GPU1 and CPU Host)…
Creating event handles…
cudaMemcpyPeer / cudaMemcpy between GPU0 and GPU1: 25.19GB/s
Preparing host buffer and memcpy to GPU0…
Run kernel on GPU1, taking source data from GPU0 and writing to GPU1…
Run kernel on GPU0, taking source data from GPU1 and writing to GPU0…
Copy data back to host from GPU0 and verify results…
Verification error @ element 0: val = nan, ref = 0.000000
Verification error @ element 1: val = nan, ref = 4.000000
Verification error @ element 2: val = nan, ref = 8.000000
Verification error @ element 3: val = nan, ref = 12.000000
Verification error @ element 4: val = nan, ref = 16.000000
Verification error @ element 5: val = nan, ref = 20.000000
Verification error @ element 6: val = nan, ref = 24.000000
Verification error @ element 7: val = nan, ref = 28.000000
Verification error @ element 8: val = nan, ref = 32.000000
Verification error @ element 9: val = nan, ref = 36.000000
Verification error @ element 10: val = nan, ref = 40.000000
Verification error @ element 11: val = nan, ref = 44.000000
Disabling peer access…
Shutting down…
Test failed!
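For anyone comparing their own runs against this one, the failure pattern above can be checked mechanically. The following is a minimal Python sketch, not part of the CUDA samples: `parse_errors` and `ERROR_RE` are hypothetical names, and the regular expression assumes the exact line format shown in the log above. It checks two things that are visible in the output: every value read back is NaN, and each reference value equals 4 times the element index.

```python
import math
import re

# Matches lines such as:
#   Verification error @ element 3: val = nan, ref = 12.000000
# The format is taken from the log above; other sample versions may differ.
ERROR_RE = re.compile(
    r"Verification error @ element (\d+): val = ([^,\s]+), ref = (\S+)"
)


def parse_errors(log_text):
    """Return a list of (index, val, ref) tuples found in simpleP2P output."""
    errors = []
    for match in ERROR_RE.finditer(log_text):
        index = int(match.group(1))
        val = float(match.group(2))  # float("nan") parses to NaN
        ref = float(match.group(3))
        errors.append((index, val, ref))
    return errors


# A few lines copied from the log above.
sample = """\
Verification error @ element 0: val = nan, ref = 0.000000
Verification error @ element 1: val = nan, ref = 4.000000
Verification error @ element 2: val = nan, ref = 8.000000
"""

errors = parse_errors(sample)
all_nan = all(math.isnan(val) for _, val, _ in errors)
refs_match = all(ref == 4.0 * index for index, _, ref in errors)
print(len(errors), all_nan, refs_match)
```

If `all_nan` is true while `refs_match` also holds, the expected values are being computed normally on the host and only the data returned from the GPUs is wrong, which is what the log above shows.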
TEST-2:
executables_v2/bin/x86_64/linux/release/OrderedAllocationIPC
Step 0 done
Step 1 done
Process 0: verifying…
Process 0: Verification mismatch at 0: 0 != 1
Process 0: Verification mismatch at 1: 0 != 1
Process 0: Verification mismatch at 2: 0 != 1
Process 0: Verification mismatch at 3: 0 != 1
Process 0: Verification mismatch at 4: 0 != 1
Process 0: Verification mismatch at 5: 0 != 1
Process 0: Verification mismatch at 6: 0 != 1
.
.
.
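Since the mismatch list is truncated above, a per-process summary can be easier to share than the raw output. This is a rough Python sketch under the assumption that every mismatch line follows the format shown; `summarize_mismatches` and `MISMATCH_RE` are hypothetical names, and the script does not interpret which side of `!=` is the observed value.

```python
import re
from collections import Counter

# Matches lines such as:
#   Process 0: Verification mismatch at 5: 0 != 1
# The format is taken from the log above.
MISMATCH_RE = re.compile(
    r"Process (\d+): Verification mismatch at (\d+): (\d+) != (\d+)"
)


def summarize_mismatches(log_text):
    """Count mismatch lines per process and per (left, right) value pair."""
    per_process = Counter()
    per_values = Counter()
    for proc, _offset, left, right in MISMATCH_RE.findall(log_text):
        per_process[int(proc)] += 1
        per_values[(int(left), int(right))] += 1
    return per_process, per_values


# A few lines copied from the log above.
sample = """\
Process 0: Verification mismatch at 0: 0 != 1
Process 0: Verification mismatch at 1: 0 != 1
Process 0: Verification mismatch at 2: 0 != 1
"""

per_process, per_values = summarize_mismatches(sample)
print(dict(per_process), dict(per_values))
```

A single `(0, 1)` entry in `per_values` across the whole log would indicate that every mismatch has the same shape, as in the lines shown here.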
Here is the system info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper PRO 5975WX 32-Cores
CPU family: 25
Model: 8
Thread(s) per core: 1
Core(s) per socket: 32
Created: November 2022