Fourth International Workshop on
Domain Specific System Architecture (DOSSA-4)

The main theme of this year:
HW/SW Components for Domain Specific Systems

Seoul, Korea, April 2 (Sat 17:00 - 21:00 EDT, Sun 06:00 - 10:00 KST), Virtual


In conjunction with the 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA-28)

Workshop Program

5:00 pm - 5:05 pm (EDT, 6:00 am - 6:05 am in Seoul)
Workshop Introduction

5:05 pm - 5:40 pm (EDT, 6:05 am - 6:40 am in Seoul) Invited Talk I
Rob Schreiber, Technical Fellow, Cerebras
"Wafer-Scale Processors for AI and HPC"

5:40 pm - 6:15 pm (EDT, 6:40 am - 7:15 am in Seoul) Invited Talk II
Vijay Janapa Reddi, Harvard University
"Tiny Machine Learning (TinyML) for Domain-Specific Systems"

6:15 pm - 6:30 pm (EDT, 7:15 am - 7:30 am in Seoul) Paper I
Hengyu Zhao (University of California, San Diego), Haolan Liu (UC San Diego), Pingfan Meng, Yubo Zhang, Michael Wu, Tiancheng Lou, Jishen Zhao (UCSD)
"Leveraging Routing Information to Enable Efficient Memory System Management of ScanMatch in Autonomous Vehicles"
(slide) (paper)

6:30 pm - 6:45 pm (EDT, 7:30 am - 7:45 am in Seoul) Paper II
Minho Ha (SK hynix)*, Jungmin Choi (SK hynix), Donguk Moon (SK hynix), Taeyoung Ahn (SK hynix), Joonseop Sim (SK hynix), Byungil Koh (SK hynix), Euicheol Lim (SK hynix), Kyoung Park (SK hynix)
"Accelerating Data Analytics near Memory: A k-NN Search Case Study"
(slide) (paper)

6:45 pm - 6:55 pm (EDT, 7:45 am - 7:55 am in Seoul) Break time

6:55 pm - 7:30 pm (EDT, 7:55 am - 8:30 am in Seoul) Invited Talk III
Moo-Kyoung Chung, Chief Technology Officer, SAPEON Korea
"Introduction to SAPEON, an Inference Specific AI Accelerator and its Use Cases from Cloud to Automotive"

7:30 pm - 7:45 pm (EDT, 8:30 am - 8:45 am in Seoul) Paper III
Hanqiu Chen (Nanjing University)*, Cong Hao (Georgia Institute of Technology)
"Mask-Net: A Hardware-efficient Object Detection Network with Masked Region Proposals"
(slide) (paper)

7:45 pm - 8:00 pm (EDT, 8:45 am - 9:00 am in Seoul) Paper IV
Rishov R. Sarkar (Georgia Institute of Technology)*; Cong Hao (Georgia Institute of Technology)
"A Generic FPGA Accelerator Framework for Ultra-Fast GNN Inference"
(slide) (paper)

8:00 pm - 8:15 pm (EDT, 9:00 am - 9:15 am in Seoul) Paper V
Sam Jijina (Georgia Institute of Technology)*; Ramyad Hadidi (Georgia Tech); Jun Chen (Georgia Tech); Zhen Jiang (Georgia Institute of Technology); Ashutosh Dhekne (Georgia Institute of Technology); Hyesoon Kim (Georgia Tech)
"DynaaDCP: Dynamic Navigation of Autonomous Agents for Distributed Capture Processing"
(slide) (paper)

8:15 pm - 8:50 pm (EDT, 9:15 am - 9:50 am in Seoul) Invited Talk IV
Euicheol Lim, Research Fellow, SK Hynix
"PIM and various Computational Memory Solutions"

8:50 pm - 8:55 pm (EDT, 9:50 am - 9:55 am in Seoul) Closing


Domain specific systems are an increasingly important computing environment for many people and businesses. As information technologies spread into real world applications such as autonomous driving, IoT (Internet of Things), CPS (cyber-physical systems) and health care in the 4th industrial revolution era, interest in specialized domain specific computing systems is increasing significantly. Beyond the challenges of conventional computing platforms, domain specific computing systems pose many design challenges of their own, including specialized hardware components such as hardware accelerators, optimized libraries and domain specific languages. This workshop focuses on domain specific system design in both hardware and software, and on their interaction, in order to improve availability and efficiency in emerging real world applications. The main theme of this year's workshop is HW/SW components for domain specific systems. Topics of particular interest include, but are not limited to:

Application analysis and workload characterization to design domain specific systems for emerging applications, such as autonomous driving, IoT and health care applications;
Domain specific processor/system architectures and hardware features for domain specific systems;
Hardware accelerators for domain specific systems;
Storage architectures for domain specific systems;
Experiences in domain specific system development;
Novel techniques to improve responsiveness by exploiting domain specific systems;
Novel techniques to improve performance/energy for domain specific systems;
Domain specific systems performance evaluation methodologies;
Application benchmarks for domain specific systems;
Enabling technologies for domain specific systems (smart edge devices, smart sensors, energy harvesting, sensor networks, sensor fusion, etc.).

The workshop aims to provide a forum for researchers, engineers and students from academia and industry to discuss their latest research on designing domain specific systems for emerging application areas in the 4th industrial revolution era, to bring their ideas and research problems to the attention of others, and to obtain valuable and immediate feedback from fellow researchers. One of the goals of the workshop is to facilitate lively and rigorous, yet friendly, discussion about research problems in architecture, implementation, networking, and programming, and thus pave the way to novel solutions that improve both the hardware and the software of future domain specific systems.

Invited Talk I

- Speaker : Rob Schreiber, Technical Fellow, Cerebras

- Talk Title : Wafer-Scale Processors for AI and HPC

- Abstract :
   Dennard scaling is over, and Moore’s Law is coming to an end. Now, with the rise of deep learning and a host of beyond-exascale challenges in HPC, we need more performance. To extend the growth of performance in HPC, architectural specialization has returned, providing a significant boost for computational science. At the hardware level, the development by Cerebras Systems of a viable wafer-scale compute platform opens new opportunities.
   Cerebras has created the first commercial wafer-scale computer. The CS-2, with a 7nm wafer having over 800,000 powerful processors and an equally powerful memory and interconnect, is the second generation system. Because single-wafer integration provides remarkable memory and interconnect bandwidth and latency, wafer-scale processing has breakthrough implications across high-performance use cases, from inference and training in deep neural networks, to computational chemistry, to computational physics. The memory and communication walls in single-chip processors impose delay when off-processor-chip data is needed. By changing the scale of the chip by two orders of magnitude, we pack a small, powerful, mini-supercomputer on one piece of silicon, and eliminate much of the off-chip traffic for applications that can fit in the available memory. The elimination of most off-chip communication also cuts the power per unit performance, a key performance determining parameter.
   For training the emerging large-scale neural networks, the use of the wafer-scale processor means that less scale out and smaller batches are needed, improving usability and efficiency and reducing the pressure on the data storage service. For HPC use cases like finite difference methods, Fourier methods, dense matrices, and examples involving irregular and dynamic interprocessor communication, we are demonstrating supercomputer performance levels or better at kilowatt rather than megawatt power and single rack rather than datacenter cost. The fine-grained nature of the architecture allows strong scaling and reduces time to solution rather than boosting flops for enormous problems.
   The future holds great promise for the wafer-scale approach, and I will sketch some areas for extending our basic technology with innovations at the hardware and the architecture level.

- Bio :
    Rob Schreiber is a technical fellow at Cerebras Systems, Inc., where he works on architecture and programming of systems for AI and for computational science. Before Cerebras he taught at Stanford and RPI and worked at NASA, at startups, and at HP. Schreiber’s research spans sequential and parallel algorithms for matrix computation, compiler optimization for parallel languages, and high performance computer design. With Moler and Gilbert, he developed the sparse matrix extension of Matlab. He created the NAS CG parallel benchmark. He was a designer of the High Performance Fortran language. Rob led the development at HP of a system for synthesis of custom hardware accelerators. He has helped pioneer the exploitation of photonic signaling in processors and networks. He is an ACM Fellow, a SIAM Fellow, and was awarded, in 2012, the Career Prize from the SIAM Activity Group in Supercomputing.

Invited Talk II

- Speaker : Vijay Janapa Reddi, Harvard University

- Talk Title : Tiny Machine Learning (TinyML) for Domain-Specific Systems

- Abstract :
   Tiny machine learning (TinyML) is a fast-growing and emerging field at the intersection of machine learning (ML) algorithms and low-cost embedded systems. It enables on-device sensor data analysis for vision, audio, IMU, etc., at ultra-low-power consumption. Moving machine learning computing close to the sensor(s) allows for an expansive new variety of always-on ML use-cases aptly suited for domain-specific computing. This talk introduces the vision behind TinyML and focuses on some exciting new domain-specific applications that TinyML is enabling for low-cost IoT solutions. Although TinyML has rich possibilities, there are still numerous technical challenges. Tight onboard processor, memory and storage constraints, coupled with embedded software fragmentation, and a lack of relevant large-scale sensor datasets and benchmarks pose a substantial barrier to developing novel applications. The talk touches upon the myriad research opportunities for unlocking the full potential of this emerging field, spanning from algorithm design to automatic hardware synthesis for TinyML.

- Bio :
    Vijay Janapa Reddi is an Associate Professor at Harvard University, VP and a founding member of MLCommons, a nonprofit organization aiming to accelerate machine learning (ML) innovation for everyone. He also serves on the MLCommons board of directors and is a Co-Chair of the MLCommons Research organization. He co-led the MLPerf Inference ML benchmark for datacenter, edge, mobile and IoT systems. Before joining Harvard, he was an Associate Professor at The University of Texas at Austin in the Electrical and Computer Engineering department. His research sits at the intersection of machine learning, computer architecture and system software. He specializes in building computing systems for tiny IoT devices, as well as mobile and edge computing. Dr. Janapa-Reddi is a recipient of multiple honors and awards, including the National Academy of Engineering (NAE) Gilbreth Lecturer Honor (2016), IEEE TCCA Young Computer Architect Award (2016), Intel Early Career Award (2013), Google Faculty Research Awards (2012, 2013, 2015, 2017, 2020), Best Papers at the 2020 Design Automation Conference (DAC), 2005 International Symposium on Microarchitecture (MICRO), 2009 International Symposium on High-Performance Computer Architecture (HPCA), IEEE’s Top Picks in Computer Architecture honorable mentions & awards (2006, 2010, 2011, 2016, 2017, 2021). He has been inducted into the MICRO and HPCA Hall of Fame (in 2018 and 2019, respectively). He is passionate about widening access to applied machine learning for STEM, Diversity, and using AI for social good. He designed the Tiny Machine Learning (TinyML) series on edX, a massive open online course (MOOC) that sits at the intersection of embedded systems and ML that thousands of global learners can access and audit free of cost. He was also responsible for the Austin Hands-on Computer Science (HaCS) deployed in the Austin Independent School District for K-12 CS education. Dr. Janapa-Reddi received a Ph.D. in computer science from Harvard University, an M.S. from the University of Colorado at Boulder and a B.S. from Santa Clara University.

Invited Talk III

- Speaker : Moo-Kyoung Chung, Chief Technology Officer, SAPEON Korea

- Talk Title : Introduction to SAPEON, an Inference Specific AI Accelerator and its Use Cases from Cloud to Automotive

- Abstract :
    As artificial intelligence is used across industries and in our daily lives, many domain-specific AI accelerators are being launched on the market, targeting everything from large-scale datacenters to small IoT devices. This presentation introduces SAPEON, a low-latency and high-throughput AI inference accelerator, and its use cases for real-time AI services, from edge datacenters to autonomous vehicles.

- Bio :
    MK Chung is the Chief Technology Officer at SAPEON Korea, a semiconductor company with high-performance AI chip products. As CTO, he leads the R&D center and is responsible for product development and research on SAPEON’s technical innovation. Previously, MK worked as an architect, hardware engineer, and system software engineer for GPU and DSP at SAMSUNG and ETRI. He received his BS from Korea University and his MS and Ph.D. from KAIST.

Invited Talk IV

- Speaker : Euicheol Lim, Research Fellow, SK Hynix

- Talk Title : PIM and various Computational Memory Solutions

- Abstract :
    Currently, IT services are mainly based on big data and AI, and cloud services such as PaaS are being developed to support them efficiently. Big data and AI, the main applications, both require more memory capacity and higher memory performance than existing applications, so there are opportunities for new memory solutions in cloud systems. Computational memory is a good example: it can reduce data movement between the computing unit and the memory, which consumes most of the energy in a modern computing system, and it can exploit the higher bandwidth available inside the memory. In this talk, we will briefly introduce the PIM recently announced by SK hynix and discuss various other computational memory approaches. In particular, we will focus on which workloads and systems these memory solutions can be applied to, and will also try to explain which problems must be solved to bring PIM solutions into actual systems.

- Bio :
    Eui-cheol Lim is a Research Fellow and leader of the Memory Solution Product Development team at SK Hynix. He received the B.S. and M.S. degrees from Yonsei University, Seoul, Korea, in 1993 and 1995, and the Ph.D. degree from Sungkyunkwan University, Suwon, Korea, in 2006. Dr. Lim joined SK Hynix in 2016 as a system architect in memory system R&D. Before joining SK Hynix, he worked as an SoC architect at Samsung Electronics, leading the architecture of most Exynos mobile SoCs. His recent interests are memory and storage system architecture with new media memory and new memory solutions such as CXL memory and Processing in Memory.


Submit a 2-page presentation abstract to the web-based submission system by Mar. 7, 2022. Notification of acceptance will be sent out by Mar. 17, 2022. The final paper and presentation material (to be posted on the workshop web site) are due Mar. 27, 2022. For additional information regarding paper submissions, please contact the organizers.


Abstract submission : Mar. 7, 2022
Author notification : Mar. 17, 2022
Final camera-ready paper : Mar. 27, 2022
Workshop : April 2, 2022

Workshop Organizers

Hyesoon Kim, Georgia Tech
Giho Park, Sejong Univ.

Web Chairs

Chiwon Han, Sejong Univ.
Sungwun Bae, Sejong Univ.