yisen2994 wrote:
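The 3090 Ti is the real one.
Anyone who needs one should wait a bit.
Pretty wild. Dual-GPU cards are making a comeback, and a single card priced above 100,000 is no longer just a dream.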

ya19881217 wrote:
You can't really call it padding the numbers. Floating-point throughput genuinely did go up.
SKAP wrote:
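Either list the physical unit count and add a note that two shading operations can be executed per clock cycle, which is why performance improves so much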
kkk123kkk123kkk wrote:
The physical count comes straight from the original diagram... (snipped)
SKAP wrote:
If you want to argue the details, OK. Claim 1... (snipped)
One of the key design goals for the Ampere 30-series SM was to achieve twice the throughput for FP32 operations compared to the Turing SM. To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.
and you can note the dual FP32 units in two data paths. Each SM consists of 128 CUDA cores which is why we have seen a doubling of the core count on the Ampere GPU.
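To make the arithmetic in the quoted Ampere description concrete, here is a minimal sketch that reproduces the per-SM numbers. The lane counts come from the quote; the function name, the constants' names, and the simplification that all four partitions run their shared datapath in the same mode on a given clock are assumptions made for illustration only.

```python
# Figures from the quoted Ampere SM description; all names are illustrative.
PARTITIONS_PER_SM = 4
FP32_ONLY_LANES = 16   # datapath 1: 16 FP32 CUDA Cores
SHARED_LANES = 16      # datapath 2: 16 lanes that run FP32 or INT32

def sm_ops_per_clock(shared_path_runs_int32: bool) -> dict:
    """Peak ops per clock for one SM, assuming every partition
    uses its shared datapath the same way (a simplification)."""
    fp32 = FP32_ONLY_LANES + (0 if shared_path_runs_int32 else SHARED_LANES)
    int32 = SHARED_LANES if shared_path_runs_int32 else 0
    return {"fp32": fp32 * PARTITIONS_PER_SM,
            "int32": int32 * PARTITIONS_PER_SM}

print(sm_ops_per_clock(False))  # {'fp32': 128, 'int32': 0}
print(sm_ops_per_clock(True))   # {'fp32': 64, 'int32': 64}
```

The "128 CUDA cores per SM" figure corresponds to the first case only. As soon as the shared datapath is doing INT32 work, the FP32 rate drops back to 64 per clock.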
The Turing architecture features a new SM design that incorporates many of the features introduced in our Volta GV100 SM architecture. Two SMs are included per TPC, and each SM has a total of 64 FP32 Cores and 64 INT32 Cores. In comparison, the Pascal GP10x GPUs have one SM per TPC and 128 FP32 Cores per SM. The Turing SM supports concurrent execution of FP32 and INT32 operations (more details below), independent thread scheduling similar to the Volta GV100 GPU.
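For anyone following the "is the core count padded or not" argument, the two quotes can be compared with a rough model. This is only a sketch under idealized assumptions: perfect scheduling, no memory stalls, and a workload described purely by its FP32 and INT32 operation counts. The function names and the model itself are not from NVIDIA; only the per-SM lane counts (Turing: 64 FP32 plus 64 INT32 concurrently; Ampere: 64 FP32 plus 64 lanes that do either) are taken from the quoted text.

```python
def turing_cycles(fp_ops: int, int_ops: int) -> float:
    # Turing SM per the quote: 64 FP32 and 64 INT32 cores running concurrently,
    # so the slower of the two streams sets the cycle count.
    return max(fp_ops / 64, int_ops / 64)

def ampere_cycles(fp_ops: int, int_ops: int) -> float:
    # Ampere SM per the quote: 64 FP32-only lanes plus 64 lanes that run
    # FP32 or INT32. INT32 work can only use the shared 64 lanes, and total
    # work is bounded by 128 lanes per clock (idealized model, an assumption).
    return max(int_ops / 64, (fp_ops + int_ops) / 128)

def ideal_speedup(fp_ops: int, int_ops: int) -> float:
    """Per-SM, per-clock speedup of Ampere over Turing in this toy model."""
    return turing_cycles(fp_ops, int_ops) / ampere_cycles(fp_ops, int_ops)

print(ideal_speedup(1280, 0))    # pure FP32 workload -> 2.0
print(ideal_speedup(640, 640))   # even FP32/INT32 mix -> 1.0
```

In this model the doubling only shows up fully for pure FP32 work. Mixed workloads land somewhere between 1x and 2x, which fits both sides of the thread: the extra FP32 throughput is real, but the headline core count describes the best case.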