An 8b-Precision 6T SRAM Computing-in-Memory Macro Using Time-Domain Incremental Accumulation for AI Edge Chips

Abstract/Objectives

This article introduces a static random-access memory computing-in-memory (SRAM-CIM) architecture aimed at improving the precision, energy efficiency, and speed of multiply-and-accumulate (MAC) operations. Key features of the design include: 1) a time-domain incremental-accumulation (TDIA) scheme that supports MAC operations with many accumulations while preserving a large signal margin, 2) a dynamic differential-reference (D2REF) scheme, developed through software-hardware co-design, to reduce read energy, and 3) a low-dMACV-aware recursive time-to-digital converter (LMAR-TDC) that works with the D2REF scheme to further reduce readout energy. A 28 nm 1 Mb SRAM-CIM prototype built with 6T-SRAM cells achieved an energy efficiency of 39.31 TOPS/W and a compute latency of 6.6 ns for 8-bit MAC operations, with 64 accumulations per cycle and near-full output precision (22-bit).

Results/Contributions

This article presents a novel static random access memory computing-in-memory (SRAM-CIM) structure designed for high-precision multiply-and-accumulate (MAC) operations with high energy efficiency (EF), high readout accuracy, and short compute latency. The proposed device employs 1) a time-domain incremental-accumulation (TDIA) scheme to enable high-accumulation MAC operations while maintaining a large signal margin across MAC values (MACVs), 2) a dynamic differential-reference (D2REF) scheme based on software-hardware co-design to reduce read energy consumption, and 3) a low-dMACV-aware recursive time-to-digital converter (LMAR-TDC) for implementation with the D2REF scheme to further suppress readout energy consumption. A 28 nm 1 Mb SRAM-CIM macro fabricated using foundry-provided compact 6T-SRAM cells achieved EF of 39.31 TOPS/W and compute latency of 6.6 ns for 8b-MAC operations with 64 accumulations per cycle and near-full output precision (22b). © 1966-2012 IEEE.

This article presents an SRAM-CIM macro that uses a time-domain accumulation scheme for CNN operations. The proposed DCU enables the use of foundry-provided compact 6T SRAM cells while achieving robust MAC operations involving a large number of accumulations and a consistently large signal margin across MACVs. The proposed D2REF scheme was shown to decrease readout power consumption while preserving inference accuracy. The efficacy of the proposed schemes was verified in experiments using a fabricated 28 nm 1 Mb time-domain SRAM-CIM macro, which achieved a compute latency of 6.6 ns, an area efficiency of 0.86–4.08 TOPS/mm², and an EF of 22.02–115.6 TOPS/W for MAC operations involving 4–8b inputs, 4–8b weights, and 14–22b outputs.
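The output widths quoted above (14b for 4b operands, 22b for 8b operands) match the number of bits needed to hold a 64-term MAC result without truncation. The following is a minimal sketch of that arithmetic, assuming unsigned operands, which the text does not state. The function names `mac` and `mac_output_bits` are hypothetical and introduced only for illustration; the code models the arithmetic alone, not the time-domain circuit.

```python
def mac(inputs, weights):
    """Reference multiply-and-accumulate: sum of element-wise products."""
    return sum(i * w for i, w in zip(inputs, weights))


def mac_output_bits(input_bits, weight_bits, accumulations):
    """Bits needed to represent the largest possible MAC value.

    Assumption (not from the article): inputs and weights are unsigned,
    so each operand ranges over 0 .. 2**bits - 1.
    """
    max_input = 2**input_bits - 1
    max_weight = 2**weight_bits - 1
    max_macv = accumulations * max_input * max_weight
    return max_macv.bit_length()


if __name__ == "__main__":
    ACCUMULATIONS = 64  # accumulations per cycle, as stated in the text

    for bits in (4, 8):
        width = mac_output_bits(bits, bits, ACCUMULATIONS)
        print(f"{bits}b input x {bits}b weight, {ACCUMULATIONS} accumulations: {width}b output")

    # Worst-case 8b MAC value, computed directly with the reference MAC.
    worst_case = mac([255] * ACCUMULATIONS, [255] * ACCUMULATIONS)
    print("largest 8b MAC value:", worst_case)
```

Under this assumption, each 8b × 8b product needs 16 bits and summing 64 of them adds log2(64) = 6 bits, which gives the 22b figure. The same reasoning gives 8 + 6 = 14 bits for 4b operands.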

Keywords

static random-access memory; in-memory computing; multiply-accumulate operation; energy efficiency; read precision; computation delay; time-domain incremental accumulation; dynamic differential reference; sensing; digital converter; macro; accumulation; full-output precision

Contact Information

謝志成

cchsieh@ee.nthu.edu.tw