Application Scenario Marker
- General Purpose
- Neural Network
- Graph Processing
- Bioinformatics
- Data Analytics
- Associative Computing
- Automata Computing
- Data Manipulation
- Security
- Others
[ISSCC 1997][Intelligent RAM (IRAM): chips that remember and compute]
[GLSVLSI 2005][PIM lite: A multithreaded processor-in-memory prototype]
[ICASSP 2014][An Energy-Efficient VLSI Architecture for Pattern Recognition via Deep Embedding of Computation in SRAM]
[VLSI 2016][A machine-learning classifier implemented in a standard 6T SRAM array]
[ISSCC 2018][A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors]
[ISSCC 2018][Conv-RAM: An Energy-Efficient SRAM with Embedded Convolution Computation for Low-Power CNN-Based Machine Learning Applications]
[JSSC 2018][A 4 + 2T SRAM for Searching and In-Memory Computing With 0.3-V VDDmin]
[DAC 2018][Parallelizing SRAM arrays with customized bit-cell for binary neural networks]
[nature 2015][Training and operation of an integrated neuromorphic network based on metal-oxide memristors]
[ASP-DAC 2017][MPIM: Multi-purpose in-memory processing using configurable resistive memory]
[IEEE Electron Device Letters 2018][Reconfigurable Boolean Logic in Memristive Crossbar: the Principle and Implementation]
[TED 2015][Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element]
[ISCAS 2014][Spin-Transfer Torque Magnetic Memory as a Stochastic Memristive Synapse]
[ISVLSI 2017][Hybrid Polymorphic Logic Gate with 5-Terminal Magnetic Domain Wall Motion Device]
[ASP-DAC 2018][HieIM: Highly Flexible In-Memory Computing using STT MRAM]
[IEEE Transactions on Magnetics 2017][In-Memory Processing Paradigm for Bitwise Logic Operations in STT-MRAM]
[DATE 2018][Computing-in-memory with spintronics]
[IEEE Micro 1997][A Case for Intelligent RAM]
[Computer 1995][Processing in Memory: The Terasys Massively Parallel PIM Array]
[Frontiers 1996][Pursuing a Petaflop: Point Designs for 100 TF Computers Using PIM Technologies]
[Computer 1997][Scalable Processors in the Billion-Transistor Era: IRAM]
[ISCA 1997][Processing in memory: Chips to petaflops]
[ASPDAC 2018][PIMCH: cooperative memory prefetching in processing-in-memory architecture]
[CICC 1992][Computational RAM: A Memory-SIMD Hybrid and Its Application to DSP]
Arch: add logic within DRAM to perform vector operations
[ICPP 1994][EXECUBE-A New Architecture for Scaleable MPPs]
Arch: add logic within DRAM to perform vector operations
[IEEE Computer 1995][Processing in memory: The Terasys massively parallel PIM array]
Arch: add logic within DRAM to perform vector operations
[IEEE Micro 1997][A Case for Intelligent RAM]
Arch: add logic within DRAM to perform vector operations
[ICCD 1997][Intelligent RAM (IRAM): the industrial setting, applications, and architectures]
Arch: add logic within DRAM to perform vector operations
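The recurring "Arch" note above (logic added inside DRAM to perform vector operations) can be illustrated with a toy bus-traffic model. This is a sketch for intuition only; the descriptor size is an invented parameter, not taken from any of the papers listed:

```python
# Toy model of memory-bus traffic for a vector add C[i] = A[i] + B[i]
# over n_words elements, comparing host-side vs. in-memory execution.

def host_side_traffic(n_words):
    # CPU reads A and B across the bus, then writes C back:
    # 3 * n_words words cross the memory bus.
    return 3 * n_words

def in_memory_traffic(n_words, descriptor_words=4):
    # With logic next to the DRAM arrays, only a small command
    # descriptor (opcode + three base addresses, hypothetically
    # 4 words) crosses the bus; the data never leaves the chip.
    return descriptor_words

# For a 1M-element add, the host-side version moves 3M words while
# the in-memory version moves a constant-size descriptor.
```

The point of the model is that host-side traffic grows linearly with the vector length while in-memory traffic is constant, which is the core bandwidth argument behind these early PIM designs.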
[ISCA 1998][Active Pages: A Computation Model for Intelligent Memory]
[SC 1999][Mapping Irregular Applications to DIVA, a PIM based Data-Intensive Architecture]
[MTDT 1999][The Dynamic Associative Access Memory Chip and its Application to SIMD Processing and Full-text Database Retrieval]
[IEEE DT 1999][Computational RAM: Implementing processors in memory]
[ISCA 2000][Smart Memories: A Modular Reconfigurable Architecture]
[IPDPS 2002][Memory-intensive benchmarks: IRAM vs. cache-based machines]
[ICS 2002][The Architecture of the DIVA Processing-In-Memory Chip]
[ICCD 2012][FlexRAM: Toward an Advanced Intelligent Memory System]
[Micro 2013][RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization]
[HPEC 2013][Accelerating Sparse Matrix-Matrix Multiplication with 3D-Stacked Logic-in-Memory Hardware]
[SIGMOD 2015][JAFAR: Near-Data Processing for Databases]
[CAL 2015][Fast Bulk Bitwise AND and OR in DRAM]
[MemSys 2015][NCAM: Near-Data Processing for Nearest Neighbor Search]
[MemSys 2015][Opportunities and Challenges of Performing Vector Operations inside the DRAM]
[MemSys 2015][SIMT-based Logic Layers for Stacked DRAM Architectures: A Prototype]
[DaMoN 2015][Beyond the Wall: Near-Data Processing for Databases]
[ISCA 2016][DRAF: a low-power DRAM-based reconfigurable acceleration fabric]
FPGA style
[HPCA 2016][Low-Cost Inter-Linked Subarrays (LISA): Enabling Fast Inter-Subarray Data Movement in DRAM]
[arXiv 2016][Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM]
[TACO 2016][AIM: Energy-Efficient Aggregation Inside the Memory Hierarchy]
[Micro 2017][Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology]
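Ambit (above) computes bulk AND/OR by activating three DRAM rows simultaneously, which charge-sharing turns into a bitwise majority of the three rows; fixing one row to all-zeros or all-ones selects AND or OR. A minimal bit-level sketch of that majority primitive, modeling rows as Python integers:

```python
def maj3(a, b, c):
    """Bitwise majority of three rows (each row modeled as an int)."""
    return (a & b) | (b & c) | (a & c)

def bulk_and(a, b):
    # AND(A, B) = MAJ(A, B, all-zeros control row)
    return maj3(a, b, 0)

def bulk_or(a, b, width=8):
    # OR(A, B) = MAJ(A, B, all-ones control row)
    return maj3(a, b, (1 << width) - 1)
```

In the real device the majority happens in analog across an entire 8KB row at once, which is where the bulk throughput comes from; the sketch only captures the logic identity.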
[Micro 2017][DRISA: a DRAM-based Reconfigurable In-Situ Accelerator]
[MemSys 2017][PHOENIX: Efficient Computation in Memory]
[TVLSI 2017][Excavating the Hidden Parallelism Inside DRAM Architectures With Buffered Compares]
[Micro 2018][SCOPE: A Stochastic Computing Engine for DRAM-based In-situ Accelerator]
[PACT 2018][In-DRAM Near-Data Approximate Acceleration for GPUs]
[DAC 2018][DrAcc: a DRAM based accelerator for accurate CNN inference]
[TCAD 2018][McDRAM: Low Latency and Energy-Efficient Matrix Computations in DRAM]
[TPDS 2018][Exploiting Parallelism for CNN Applications on 3D Stacked Processing-In-Memory Architecture]
[arXiv 2018][The processing using memory paradigm: In-DRAM bulk copy, initialization, bitwise AND and OR]
[PACT 2014][SQRL: hardware accelerator for collecting software data structures]
[Micro 2017][Cache automaton]
[HPCA 2017][Compute Caches]
[ISCA 2018][Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks]
[arXiv 2018][Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays]
[MemSys 2016][Processing Acceleration with Resistive Memory-based Computation]
[HPCA 2016][Memristive Boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning]
[ISCA 2016][PRIME: a novel processing-in-memory architecture for neural network computation in ReRAM-based main memory]
[ISCA 2016][ISAAC: a convolutional neural network accelerator with in-situ analog arithmetic in crossbars]
[ICCAD 2016][Reconfigurable In-Memory Computing with Resistive Memory Crossbar]
[DAC 2016][Pinatubo: A Processing-in-Memory Architecture for Bulk Bitwise Operations in Emerging Non-Volatile Memories]
[HPCA 2017][PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning]
[HPCA 2017][GraphR: Accelerating Graph Processing Using ReRAM]
[TCAD 2017][MNSIM: Simulation Platform for Memristor-Based Neuromorphic Computing System]
[CAL 2017][IMEC: A Fully Morphable In-Memory Computing Fabric Enabled by Resistive Crossbar]
RRAM FPGA
[ICCAD 2017][RRAM-based Reconfigurable In-Memory Computing Architecture with Hybrid Routing]
RRAM FPGA
[HPCA 2018][Making Memristive Neural Network Accelerators Reliable]
[ISCA 2018][Enabling Scientific Computing on Memristive Accelerators]
[ASPDAC 2018][ReGAN: A pipelined ReRAM-based accelerator for generative adversarial networks]
[ASPDAC 2018][Training Low Bitwidth Convolutional Neural Network on RRAM]
RRAM training
[JESTCS 2018][Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator]
RRAM training
[MICRO 2018][LerGAN: A Zero-free, Low Data Movement and PIM-based GAN Architecture]
[IEEE Micro 2018][Newton: Gravitating Towards the Physical Limits of Crossbar Acceleration]
[TCSI 2018][IMAGING: In-Memory AlGorithms for Image processiNG]
[ISCAS 2018][Efficient Algorithms for In-Memory Fixed Point Multiplication Using MAGIC]
[DAC 2015][ProPRAM: exploiting the transparent logic resources in non-volatile memory for near data computing]
[ISCA 2013][AC-DIMM: Associative computing with STT-MRAM]
[DAC 2018][CMP-PIM: An Energy-Efficient Comparator-based Processing-in-Memory Neural Network Accelerator]
[HPCA 2001][Automatically mapping code on an intelligent memory architecture]
[ICRC 2017][Generalize or Die: Operating Systems Support for Memristor-based Accelerators]
[IPDPS 2017][Similarity Search on Automata Processors]
[ASPLOS 2018][Bridge the Gap between Neural Networks and Neuromorphic Hardware with a Neural Network Compiler]
[ASPLOS 2018][Liquid Silicon-Monona: A Reconfigurable Memory-Oriented Computing Fabric with Scalable Multi-Context Support]
RRAM FPGA
[DATE 2018][Prometheus: Processing-in-memory Heterogeneous Architecture Design From a Multi-layer Network Theoretic Strategy]
[ISCA 2018][PROMISE: an end-to-end design of a programmable mixed-signal accelerator for machine-learning algorithms]
[SC 2002][Gilgamesh: A multithreaded processor-in-memory architecture for petaflops computing]
[arXiv 2017][CODA: Enabling Co-location of Computation and Data for Near-Data Processing]
Platform: GPU + HBM (SM-style compute units in the memory stack)
Programming model: standard GPU programming model
Two ideas: (1) selectively localize or scatter data across stacks; (2) co-locate thread blocks with the data they access
Virtual memory assumption: the paper assumes SMs in the memory stack have hardware TLBs and memory management units (MMUs) that walk page tables and can perform virtual address translation.
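CODA's co-location idea amounts to scheduling each thread block on the memory stack that owns most of the pages it touches. A toy placement sketch; the page-to-stack map and access lists are invented for illustration, not from the paper:

```python
from collections import Counter

def place_block(accessed_pages, page_to_stack):
    """Schedule a thread block on the stack owning most of its pages.

    accessed_pages: pages the block will touch.
    page_to_stack:  which memory stack holds each page (the paper's
                    data-localization step is what makes this map skewed
                    toward a single stack per block).
    """
    owners = Counter(page_to_stack[p] for p in accessed_pages)
    return owners.most_common(1)[0][0]

# Pages 0-1 live in stack 0, pages 2-3 in stack 1:
page_to_stack = {0: 0, 1: 0, 2: 1, 3: 1}
```

A block touching pages {0, 1, 2} lands on stack 0, so two of its three accesses stay stack-local instead of crossing inter-stack links.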
[CAL 2017][LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory]
Also applicable to NDP architectures
PIM kernel identification: programmer / compiler
[TACO 2015][GP-SIMD Processing-in-Memory]
Coherence: PIM processing logic is restricted to operate only on non-cacheable data, which forces CPU cores to read PIM data directly from DRAM.
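The GP-SIMD coherence rule (PIM data is non-cacheable, so CPU reads always go to DRAM) can be mimicked with a toy memory model. The class and method names here are invented for illustration:

```python
class ToyMemory:
    """Toy CPU cache + DRAM model with non-cacheable PIM regions."""

    def __init__(self):
        self.dram = {}          # backing store, shared with PIM logic
        self.cache = {}         # CPU-side cache
        self.uncacheable = set()  # addresses handed to PIM logic

    def mark_pim(self, addr):
        # Declaring an address PIM-owned makes it non-cacheable
        # and evicts any stale cached copy.
        self.uncacheable.add(addr)
        self.cache.pop(addr, None)

    def cpu_read(self, addr):
        if addr in self.uncacheable:
            return self.dram.get(addr)       # bypass the cache
        if addr not in self.cache:
            self.cache[addr] = self.dram.get(addr)
        return self.cache[addr]

    def pim_write(self, addr, value):
        self.dram[addr] = value              # PIM updates DRAM only
```

Because PIM writes never touch the CPU cache, caching PIM data would return stale values; forcing those addresses to be non-cacheable sidesteps the coherence problem without a hardware coherence protocol, at the cost of slower CPU access to that data.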