This article presents a novel static random access memory computing-in-memory (SRAM-CIM) structure designed for high-precision multiply-and-accumulate (MAC) operations with high energy efficiency (EF), high readout accuracy, and short compute latency. The proposed device employs 1) a time-domain incremental-accumulation (TDIA) scheme to enable high-accumulation MAC operations while maintaining a large signal margin across MAC values (MACVs), 2) a dynamic differential-reference (D2REF) scheme based on software-hardware co-design to reduce read energy consumption, and 3) a low-dMACV-aware recursive time-to-digital converter (LMAR-TDC) for implementation with the D2REF scheme to further suppress readout energy consumption. A 28 nm 1 Mb SRAM-CIM macro fabricated using foundry-provided compact 6T-SRAM cells achieved EF of 39.31 TOPS/W and compute latency of 6.6 ns for 8b-MAC operations with 64 accumulations per cycle and near-full output precision (22b). © 1966-2012 IEEE.
Results/Contributions: This article presents an SRAM-CIM macro that uses a time-domain accumulation scheme for CNN operations. The proposed DCU enables the use of foundry-provided compact 6T SRAM cells while achieving robust MAC operations involving a large number of accumulations and a consistently large signal margin across MACVs. The proposed D2REF scheme was shown to decrease readout power consumption while preserving inference accuracy. The efficacy of the proposed schemes was verified in experiments using a fabricated 28 nm 1 Mb time-domain SRAM-CIM macro, which achieved a compute latency of 6.6 ns, area efficiency of 0.86–4.08 TOPS/mm², and EF of 22.02–115.6 TOPS/W for MAC operations involving 4–8b input, 4–8b weight, and 14–22b output.
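The 14–22b output range quoted above can be related to the operand widths with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes unsigned inputs and weights, and it assumes 64 accumulations for both the 4b and 8b configurations (the abstract states 64 accumulations only for the 8b case). The function name `full_precision_bits` is hypothetical.

```python
def full_precision_bits(input_bits: int, weight_bits: int, accumulations: int) -> int:
    """Bits needed to hold the largest possible MAC value without truncation.

    Assumption (not from the article): inputs and weights are unsigned, so the
    largest MAC value is (2^input_bits - 1) * (2^weight_bits - 1) * accumulations.
    """
    max_macv = (2**input_bits - 1) * (2**weight_bits - 1) * accumulations
    return max_macv.bit_length()


# Assumed configurations: equal input/weight widths, 64 accumulations each.
for bits in (4, 8):
    width = full_precision_bits(bits, bits, 64)
    print(f"{bits}b input x {bits}b weight x 64 accumulations -> {width}b output")
```

Under these assumptions the 4b and 8b cases need 14 and 22 output bits respectively, which is consistent with the reported 14–22b output range and with the abstract's description of 22b as near-full output precision for 8b-MAC operations.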