Processing-in-Memory DRAM Architectures for Neural Network Applications

Sudarshan, Chirag

doi:10.26204/KLUEDO/8310

Emerging applications based on machine learning and Deep Neural Networks (DNNs) are data-driven and memory-intensive. Hence, there is a recent shift from compute-centric to memory-centric architectures. In this context, Processing-in-Memory (PIM) is an emerging compute paradigm that addresses the memory issues encountered by conventional compute platforms running memory-intensive applications: limited bandwidth, long data access latency, and very high data access energy. The key idea of PIM is to integrate computation units inside the memory in order to exploit its massive internal data parallelism and to minimize the energy spent on data movement, thereby achieving the highest energy efficiency and/or increased throughput. Researchers have investigated various memories as PIM candidates, such as SRAM, DRAM, and emerging memories (e.g., RRAM, FeFET, and MRAM). SRAMs and emerging memories each have a drawback for PIM: 1) SRAM-based PIM accelerators have a low memory capacity, and 2) emerging memories are not as technologically mature as DRAMs or SRAMs. DRAMs, in contrast, satisfy both the technological maturity and the high memory capacity requirements, which makes them a prominent candidate for PIM. Research on DRAM-based PIM has recently surged in both academia and industry, but there is still less of it than on SRAM-based or emerging-memory-based PIM. Very recently, in 2021-2022, major memory vendors such as Samsung and SK Hynix presented their initial engineering samples of DRAM-PIM. Commodity DRAM technology differs from the standard CMOS logic technology with which SRAMs and some emerging memories are compatible: it is highly optimized for density and yield, and thus for cost per bit. Commodity DRAM architectures also contain no explicit computation units apart from the peripheral logic for data access.
Hence, most state-of-the-art DRAM-PIM architectures integrate the computation units at the output of the DRAM bank (i.e., in the bank peripheral region) without modifying the core structure of the existing commodity DRAM bank architecture. This preserves a very appealing feature known as Dual-Mode, which allows these architectures to be employed both as a high-density main memory and as a highly energy-efficient accelerator. In a commodity DRAM, the highest data parallelism and the lowest data movement energy are found in the Sub-Array region, i.e., inside the DRAM bank. For example, 128 bits of data are available in parallel near the DDR4 bank output, whereas 2 KB are available in the Sub-Array region. As a result, a few state-of-the-art architectures have explored integrating computation units inside the DRAM banks to further increase throughput and energy efficiency compared to the bank peripheral approach. However, these architectures require modifying the highly optimized DRAM Sub-Array design. Furthermore, the interior of the bank, such as the area near the Sub-Array, imposes tight area constraints on any new computation units. In this thesis, I investigate a series of novel DRAM-PIM architectures for DNN inference and training that integrate computation units inside the commodity DRAM bank without major modifications to the existing commodity bank design and with no modifications to the Sub-Array design, so that they continue to profit from the DRAM roadmap. Additionally, I ensure seamless integration of the DNN computation units by meeting the area constraints of the near-Sub-Array region. My architectures fulfill the Dual-Mode feature while achieving the highest throughput and energy efficiency in comparison to state-of-the-art DRAM-PIM architectures. Furthermore, I extend my research to address the memory issues in conventional non-PIM platforms.
I propose a novel DRAM architecture, referred to as Dual-Sense-Amplifier (DSA), that reduces DRAM data access latency and thereby increases bandwidth utilization without increasing energy consumption. All my novel DRAM architectures are fully compatible with any type of commodity DRAM architecture (e.g., HBM or GDDR) and retain most of its circuit designs. Finally, I investigate reducing DRAM energy without modifying the commodity DRAM architecture. For this, I propose a lean, low-power, low-area DRAM memory controller architecture that is tailored for IoT/embedded applications and leverages the transprecision computing methodology.

Author:	Chirag Sudarshan
URN:	urn:nbn:de:hbz:386-kluedo-83104
DOI:	https://doi.org/10.26204/KLUEDO/8310
ISBN:	978-3-95974-219-1
Series (Serial Number):	Forschungsberichte Mikroelektronik (34)
Advisor:	Norbert Wehn
Document Type:	Doctoral Thesis
Cumulative document:	No
Language of publication:	English
Date of Publication (online):	2024/06/28
Year of first Publication:	2024
Publishing Institution:	Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Granting Institution:	Rheinland-Pfälzische Technische Universität Kaiserslautern-Landau
Acceptance Date of the Thesis:	2023/10/24
Date of the Publication (Server):	2024/07/02
Page Number:	VII, 154
Faculties / Organisational entities:	Kaiserslautern - Fachbereich Elektrotechnik und Informationstechnik
CCS-Classification (computer science):	B. Hardware / B.3 MEMORY STRUCTURES / B.3.1 Semiconductor Memories (NEW) (B.7.1) / Dynamic memory (DRAM) (NEW)
DDC-Classification:	6 Technology, medicine, applied sciences / 621.3 Electrical engineering, electronics
Licence (German):	Creative Commons 4.0 - Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND 4.0)
