Application Scenario Marker

  • #f03c15 General Purpose
  • #c5f015 Neural Network
  • #1589F0 Graph Processing
  • #af62ff Bioinformatics
  • #0abab5 Data Analytics
  • #ff66cc Associative Computing
  • #f4f442 Automata Computing
  • #ece5b8 Data Manipulation
  • #161616 Security
  • #003366 Others

PIM

Circuit-level research

DRAM based

#f03c15[ISSCC 1997][Intelligent RAM (IRAM): chips that remember and compute]
#f03c15[GLSVLSI 2005][PIM lite: A multithreaded processor-in-memory prototype]

SRAM based

#c5f015[ICASSP 2014][An Energy-Efficient VLSI Architecture for Pattern Recognition via Deep Embedding of Computation in SRAM]
#c5f015[VLSI 2016][A machine-learning classifier implemented in a standard 6T SRAM array]
#c5f015[ISSCC 2018][A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors]
#c5f015[ISSCC 2018][Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-Based Machine Learning Applications]
#ff66cc[JSSC 2018][A 4 + 2T SRAM for Searching and In-Memory Computing With 0.3-V VDDmin]
#c5f015[DAC 2018][Parallelizing SRAM arrays with customized bit-cell for binary neural networks]

RRAM based

#c5f015[Nature 2015][Training and operation of an integrated neuromorphic network based on metal-oxide memristors]
#c5f015[ASP-DAC 2017][MPIM: Multi-purpose in-memory processing using configurable resistive memory]
#f03c15[IEEE Electron Device Letters 2018][Reconfigurable Boolean Logic in Memristive Crossbar: the Principle and Implementation]

PCRAM based

#c5f015[TED 2015][Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element]

STT-RAM based

#c5f015[ISCAS 2014][Spin-Transfer Torque Magnetic Memory as a Stochastic Memristive Synapse]
#c5f015[ISVLSI 2017][Hybrid Polymorphic Logic Gate with 5-Terminal Magnetic Domain Wall Motion Device]

#c5f015[ASP-DAC 2018][HieIM: Highly Flexible In-Memory Computing using STT MRAM]
#c5f015[IEEE Transactions on Magnetics 2017][In-Memory Processing Paradigm for Bitwise Logic Operations in STT–MRAM]
#f03c15[DATE 2018][Computing-in-memory with spintronics]

Architecture-level research

#f03c15[IEEE Micro 1997][A case for intelligent RAM]
#f03c15[Computer 1995][Processing in Memory: The Terasys Massively Parallel PIM Array]
#f03c15[Frontiers 1996][Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies]
#f03c15[Computer 1997][Scalable Processors in the Billion-Transistor Era: IRAM]
#f03c15[ISCA 1997][Processing in memory: Chips to petaflops]
#f03c15[ASPDAC 2018][PIMCH: cooperative memory prefetching in processing-in-memory architecture]

DRAM based

#f03c15[CICC 1992][Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP]
Arch: add logic within DRAM to perform vector operations
#f03c15[ICPP 1994][EXECUBE-A New Architecture for Scaleable MPPs]
Arch: add logic within DRAM to perform vector operations
#f03c15[IEEE Computer 1995][Processing in memory: The Terasys massively parallel PIM array]
Arch: add logic within DRAM to perform vector operations
#f03c15[IEEE Micro 1997][A Case for Intelligent RAM]
Arch: add logic within DRAM to perform vector operations
#f03c15[ICCD 1997][Intelligent RAM (IRAM): the industrial setting, applications, and architectures]
Arch: add logic within DRAM to perform vector operations
#f03c15[ISCA 1998][Active Pages: A Computation Model for Intelligent Memory]
#f03c15[SC 1999][Mapping Irregular Applications to DIVA, a PIM based Data-Intensive Architecture]
#ff66cc[MTDT 1999][The Dynamic Associative Access Memory Chip and its Application to SIMD Processing and Full-text Database Retrieval]
#f03c15[IEEE DT 1999][Computational RAM: Implementing processors in memory]
#f03c15[ISCA 2000][Smart Memories: A Modular Reconfigurable Architecture]
#f03c15[IPDPS 2002][Memory-intensive benchmarks: IRAM vs. cache-based machines]
#f03c15[ICS 2002][The Architecture of the DIVA Processing-In-Memory Chip]
#f03c15[ICCD 2012][FlexRAM: Toward an Advanced Intelligent Memory System]
#ece5b8[Micro 2013][RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization]
#1589F0[HPEC 2013][Accelerating Sparse Matrix-Matrix Multiplication with 3D-Stacked Logic-in-Memory Hardware]
#0abab5[SIGMOD 2015][JAFAR: Near-Data Processing for Databases]
#f03c15[CAL 2015][Fast Bulk Bitwise AND and OR in DRAM]
#ff66cc[MemSys 2015][NCAM: Near-Data Processing for Nearest Neighbor Search]
#f03c15[MemSys 2015][Opportunities and Challenges of Performing Vector Operations inside the DRAM]
#f03c15[MemSys 2015][SIMT-based Logic Layers for Stacked DRAM Architectures: A Prototype]
#0abab5[DaMoN 2015][Beyond the Wall: Near-Data Processing for Databases]
#f03c15[ISCA 2016][DRAF: a low-power DRAM-based reconfigurable acceleration fabric]
FPGA style
#f03c15[HPCA 2016][Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM]
#f03c15[arXiv 2016][Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM]
#f03c15[TACO 2016][AIM: Energy-Efficient Aggregation Inside the Memory Hierarchy]
#f03c15[Micro 2017][Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology]
#f03c15[Micro 2017][DRISA: a DRAM-based Reconfigurable In-Situ Accelerator]
#f03c15[MemSys 2017][PHOENIX: Efficient Computation in Memory]
#f03c15[TVLSI 2017][Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares]
#c5f015[Micro 2018][SCOPE: A Stochastic Computing Engine for DRAM-based In-situ Accelerator]
#f03c15[PACT 2018][In-DRAM Near-Data Approximate Acceleration for GPUs]
#c5f015[DAC 2018][DrAcc: a DRAM based accelerator for accurate CNN inference]
#c5f015[TCAD 2018][McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM]
#c5f015[TransPDS 2018][Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture]
#f03c15[arXiv 2018][The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR]

SRAM based

#f03c15[PACT 2014][SQRL: hardware accelerator for collecting software data structures]
#f4f442[Micro 2017][Cache automaton]
#f03c15[HPCA 2017][Compute Caches]
#c5f015[ISCA 2018][Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks]
#c5f015[arXiv 2018][Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays]

RRAM based

#ff66cc[MemSys 2016][Processing Acceleration with Resistive Memory-based Computation]
#c5f015[HPCA 2016][Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning]
#c5f015[ISCA 2016][PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory]
#c5f015[ISCA 2016][ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars]
#f03c15[ICCAD 2016][Reconfigurable In-Memory Computing with Resistive Memory Crossbar]
#f03c15[DAC 2016][Pinatubo: A Processing-in-Memory Architecture for Bulk Bitwise Operations in Emerging Non-Volatile Memories]
#c5f015[HPCA 2017][PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning]
#1589F0[HPCA 2017][GraphR: Accelerating Graph Processing Using ReRAM]
#c5f015[TCAD 2017][MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System]
#f03c15[CAL 2017][IMEC: A Fully Morphable In-Memory Computing Fabric Enabled by Resistive Crossbar]
RRAM FPGA
#f03c15[ICCAD 2017][RRAM-based Reconfigurable In-Memory Computing Architecture with Hybrid Routing]
RRAM FPGA
#c5f015[HPCA 2018][Making Memristive Neural Network Accelerators Reliable]
#f03c15[ISCA 2018][Enabling Scientific Computing on Memristive Accelerators]
#c5f015[ASPDAC 2018][ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks]
#c5f015[ASPDAC 2018][Training Low Bitwidth Convolutional Neural Network on RRAM]
RRAM training
#c5f015[JETCAS 2018][Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator]
RRAM training
#c5f015[MICRO 2018][LerGAN: A Zero-free, Low Data Movement and PIM-based GAN Architecture]
#c5f015[IEEE Micro 2018][Newton: Gravitating Towards the Physical Limits of Crossbar Acceleration]
#003366[TCSI 2018][IMAGING: In-Memory AlGorithms for Image processiNG]
#003366[ISCAS 2018][Efficient Algorithms for In-Memory Fixed Point Multiplication Using MAGIC]

PCRAM based

#f03c15[DAC 2015][ProPRAM: exploiting the transparent logic resources in non-volatile memory for near data computing]

STT-RAM based

#ff66cc[ISCA 2013][AC-DIMM: Associative computing with STT-MRAM]
#c5f015[DAC 2018][CMP-PIM: An Energy-Efficient Comparator-based Processing-in-Memory Neural Network Accelerator]

System-level research

ISA / Compiler

#f03c15[HPCA 2001][Automatically mapping code on an intelligent memory architecture]
#f03c15[ICRC 2017][Generalize or Die: Operating Systems Support for Memristor-based Accelerators]
#f4f442[IPDPS 2017][Similarity Search on Automata Processors]
#f03c15[ASPLOS 2018][Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler]
#f03c15[ASPLOS 2018][Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support]
RRAM FPGA
#f03c15[DATE 2018][Prometheus: Processing-in-memory Heterogeneous Architecture Design From a Multi-layer Network Theoretic Strategy]
#f03c15[ISCA 2018][PROMISE: an end-to-end design of a programmable mixed-signal accelerator for machine-learning algorithms]

Firmware / Runtime / Middleware

#f03c15[SC 2002][Gilgamesh: A multithreaded processor-in-memory architecture for petaflops computing]
#f03c15[arXiv 2017][CODA: Enabling Co-location of Computation and Data for Near-Data Processing]
Platform: GPU + HBM (SM style)
Programming model: GPU programming model
Two ideas: (1) selectively localize data / scatter data; (2) thread-block and data co-location
Virtual memory assumption: the paper assumes that SMs in the memory stack are equipped with a hardware TLB and memory management units (MMUs) that access page tables and can perform virtual address translation.
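
A toy illustration of idea (2), thread-block and data co-location. This is only a sketch and not CODA's actual mechanism: the `page_to_stack` map, the `place_blocks` helper, and the majority-vote rule are all hypothetical, chosen to show what "run each thread block on the stack that holds most of its data" could look like once data has been localized.

```python
from collections import Counter


def place_blocks(block_pages, page_to_stack):
    """Assign each thread block to the memory stack holding most of its pages.

    block_pages:   dict, thread-block id -> list of page numbers it touches
    page_to_stack: dict, page number -> id of the stack storing that page
    Both structures are hypothetical; they are not taken from the CODA paper.
    """
    placement = {}
    for block, pages in block_pages.items():
        votes = Counter(page_to_stack[p] for p in pages)
        # The stack with the most pages wins; ties go to the lowest stack id.
        placement[block] = min(votes, key=lambda s: (-votes[s], s))
    return placement


def local_fraction(block_pages, page_to_stack, placement):
    """Fraction of page accesses served by the stack the block runs on."""
    total = sum(len(pages) for pages in block_pages.values())
    local = sum(page_to_stack[p] == placement[b]
                for b, pages in block_pages.items() for p in pages)
    return local / total


# Assumed layout: pages 0-3 are localized on stack 0, pages 4-7 on stack 1.
page_to_stack = {p: p // 4 for p in range(8)}
block_pages = {"tb0": [0, 1, 2], "tb1": [4, 5, 6, 7], "tb2": [3, 4, 5]}

placement = place_blocks(block_pages, page_to_stack)
fraction = local_fraction(block_pages, page_to_stack, placement)
```

In this example `tb2` straddles both stacks, so one of its three pages remains a remote access wherever it runs; the sketch simply picks the stack that minimizes such accesses.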

Coherence / Consistency / Concurrency (atomicity) issues

#f03c15[CAL 2017][LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory]
Also works for NDP architecture
PIM kernel identification: programmer / compiler
#f03c15[TACO 2015][GP-SIMD Processing-in-Memory]
Coherence: restrict PIM processing logic to execute on only non-cacheable data, which forces cores within the CPU to read PIM data directly from DRAM.
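
The non-cacheable restriction described in the note above can be mimicked with a toy model. This is a minimal sketch under stated assumptions, not the GP-SIMD implementation: `ToySystem`, its write-through cache, and the address ranges are hypothetical. It only shows why the CPU cannot observe stale PIM data when the PIM region is never cached.

```python
class ToySystem:
    """Toy model of DRAM, one CPU cache, and a PIM unit (all names hypothetical)."""

    def __init__(self, size, pim_region):
        self.dram = [0] * size
        self.cache = {}               # address -> value cached on the CPU side
        self.pim_region = pim_region  # range of non-cacheable addresses

    def cpu_read(self, addr):
        if addr in self.pim_region:
            # Non-cacheable: always read DRAM and never fill the cache.
            return self.dram[addr]
        if addr not in self.cache:
            self.cache[addr] = self.dram[addr]
        return self.cache[addr]

    def cpu_write(self, addr, value):
        # Write-through keeps the toy simple; non-cacheable data skips the cache.
        self.dram[addr] = value
        if addr not in self.pim_region:
            self.cache[addr] = value

    def pim_add(self, addr, delta):
        # The restriction: PIM logic may only touch non-cacheable data.
        if addr not in self.pim_region:
            raise PermissionError("PIM logic may only operate on non-cacheable data")
        self.dram[addr] += delta


system = ToySystem(size=16, pim_region=range(8, 16))
system.cpu_write(8, 5)
system.cpu_read(8)           # served directly from DRAM
system.pim_add(8, 10)        # PIM updates DRAM in place
latest = system.cpu_read(8)  # the CPU observes the PIM update
```

The trade-off visible even in this toy is that every CPU access to the PIM region pays the DRAM latency, since the cache is bypassed entirely for those addresses.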