The Third International Workshop on
Domain Specific System Architecture (DOSSA-3)

This year's main theme:
HW/SW Components for Domain Specific Systems

Seoul, Korea (remote), Feb 28, 2021


In conjunction with the 27th IEEE International Symposium on High-Performance Computer Architecture (HPCA-27)

Workshop Program

5:00 pm - 5:10 pm Feb 28 EST (7:00 am - 7:10 am March 1st in Seoul)
Workshop Introduction

5:10 pm - 5:50 pm EST (7:10 am - 7:50 am in Seoul) Invited Talk I
Dong Hyuk Woo, Software engineer at Google
"Challenges with On-Device ML at Scale"

5:50 pm - 6:35 pm EST (7:50 am - 8:35 am in Seoul) Invited Talk II
He Xiao, Cadence Design Systems, Inc.
"The eTIE Language Extension - Agile for Domain-Specific Accelerator Design"

6:35 pm - 7:00 pm EST (8:35 am - 9:00 am in Seoul) Break time

7:00 pm - 7:45 pm EST (9:00 am - 9:45 am in Seoul) Invited Talk III
Daehyun Kim, Vice President at Samsung Electronics
"On-Device AI for Mobile and Consumer Devices: From DNN Model Compressions to Domain-Specific Accelerators"

7:45 pm - 8:15 pm EST (9:45 am - 10:15 am in Seoul) Paper I
Augusto Vega (IBM T. J. Watson Research Center), John-David Wellman (IBM T. J. Watson Research Center), Hubertus Franke (IBM T. J. Watson Research Center), Alper Buyuktosunoglu (IBM T. J. Watson Research Center), Pradip Bose (IBM T. J. Watson Research Center), Aporva Amarnath (University of Michigan), Hiwot Kassa (University of Michigan), Subhankar Pal (University of Michigan), Ronald Dreslinski (University of Michigan)
"STOMP: Agile Evaluation of Scheduling Policies in Heterogeneous Multi-Processors"
(paper) (slide)

8:15 pm - 8:45 pm EST (10:15 am - 10:45 am in Seoul) Paper II
Pradip Bose (IBM T. J. Watson Research Center), Augusto Vega (IBM T. J. Watson Research Center), Sarita Adve (University of Illinois at Urbana-Champaign (UIUC)), Vikram Adve (University of Illinois at Urbana-Champaign (UIUC)), Sasa Misailovic (University of Illinois at Urbana-Champaign (UIUC)), Luca Carloni (Columbia University), Ken Shepard (Columbia University), David Brooks (Harvard University), Vijay Janapa Reddi (Harvard University), Gu-Yeon Wei (Harvard University)
"Secure and Resilient SoCs for Autonomous Vehicles"

8:45 pm - 8:50 pm EST (10:45 am - 10:50 am in Seoul) Closing


Domain-specific systems are an increasingly important computing environment for many people and businesses. As information technologies spread into real-world applications such as autonomous driving, IoT (Internet of Things), CPS (cyber-physical systems), and health care in the era of the 4th industrial revolution, interest in specialized domain-specific computing systems is increasing significantly. Beyond the challenges of conventional computing platforms, domain-specific computing systems raise many design challenges of their own, involving specialized components such as hardware accelerators, optimized libraries, and domain-specific languages. This workshop focuses on domain-specific system design from both the hardware and software perspectives, and on the interaction between the two, in order to improve availability and efficiency in emerging real-world applications. This year's main theme is HW/SW components for domain-specific systems. Topics of particular interest include, but are not limited to:

Application analysis and workload characterization for designing domain-specific systems for emerging applications such as autonomous driving, IoT, and health care;
Domain-specific processor/system architectures and hardware features for domain-specific systems;
Hardware accelerators for domain-specific systems;
Storage architectures for domain-specific systems;
Experiences in domain-specific system development;
Novel techniques to improve responsiveness by exploiting domain-specific systems;
Novel techniques to improve performance/energy for domain-specific systems;
Performance evaluation methodologies for domain-specific systems;
Application benchmarks for domain-specific systems;
Enabling technologies for domain-specific systems (smart edge devices, smart sensors, energy harvesting, sensor networks, sensor fusion, etc.).

The workshop aims to provide a forum for researchers, engineers, and students from academia and industry to discuss their latest research on designing domain-specific systems for emerging application areas of the 4th industrial revolution era, to bring their ideas and research problems to the attention of others, and to obtain valuable, immediate feedback from fellow researchers. One of the goals of the workshop is to facilitate lively and rigorous, yet friendly, discussion about research problems in the architecture, implementation, networking, and programming of domain-specific systems, and thus pave the way to novel solutions that improve both the hardware and software of future domain-specific systems.

Invited Talk I

- Speaker : Dong Hyuk Woo, Software engineer at Google

- Talk Title : Challenges with On-Device ML at Scale

- Abstract :
   For many good reasons, more and more machine learning workloads are moving to mobile and embedded devices. In this presentation, we will share the challenges of deploying state-of-the-art machine learning research into mobile and embedded products at scale across our computing ecosystem, and discuss what we need to do as a community to address these challenges.

- Bio :
   Dong Hyuk Woo is a software engineer at Google. He is one of the co-founders of the Edge TPU program at Google and currently leads Edge TPU compilation efforts. Previously, he worked as an architect for Google's Cloud TPU (the first training-capable TPU), as an architect on Intel's exascale computing team, and as a research scientist at Intel Labs. He received his BS from Seoul National University, and his MS and Ph.D. from the Georgia Institute of Technology.

Invited Talk II

- Speaker : He Xiao, Cadence Design Systems, Inc.

- Talk Title : The eTIE Language Extension - Agile for Domain-Specific Accelerator Design

- Abstract :
   With the emergence of application areas such as graphics, bioinformatics, deep learning, and many others, domain-specific accelerators are becoming increasingly important for achieving order-of-magnitude improvements in performance, cost, and power efficiency over general-purpose processors. The traditional design process for such high-performance hardware often requires long design and verification cycles and cannot keep pace with the growing requirements and rapid design iterations of today's hardware accelerators. We therefore present a novel external Tensilica Instruction Extension (eTIE) language for domain-specific accelerators that improves design flexibility and reduces time-to-market. The easy-to-use eTIE language is similar to Verilog and generates both the RTL hardware and the supporting software libraries used by the compiler and simulator. With eTIE, users can enhance their designs by extending Cadence Xtensa processors with new instructions and additional hardware accelerators for a particular application domain. In this workshop, we present the use of the eTIE language to design a rotation-based coordinate rotation digital computer (CORDIC) that operates in parallel with the Cadence Xtensa DSP core, and evaluate the benefits of this novel design methodology.

- Bio :
   Dr. He Xiao received his Ph.D. from the Georgia Institute of Technology, where his research focused on characterizing and simulating the physical effects on multi-core microarchitectures using 3DIC technology, as well as exploring adaptive architectures based on multiphysics analysis. His research interests include computer architecture, low-power design, programming models, and compiler optimization.
   Dr. Xiao currently works in the IPG group at Cadence Design Systems, Inc., where he is responsible for designing and enhancing the compiler toolset for the Tensilica Instruction Extension (TIE), a proprietary language that allows users to extend the Tensilica DSP processor with custom instructions and coprocessors. His work supports high-performance DSP design and the exploration of novel architectures for application-specific acceleration.

Invited Talk III

- Speaker : Daehyun Kim, Vice President at Samsung Electronics

- Talk Title : On-Device AI for Mobile and Consumer Devices: From DNN Model Compressions to Domain-Specific Accelerators

- Abstract :
  Although key AI algorithms were developed decades ago, it was the availability of big data and hardware acceleration that really made AI so successful today. Its great successes in computer vision and speech recognition encouraged many researchers to apply AI to almost every area of science and engineering. Further, thanks to advances in model compression technologies and hardware accelerators, we are even enabling AI on low-end electronics such as washing machines.
  With on-device AI, we aim to deliver a transparent AI experience on user devices without connecting to cloud servers. To enable AI functionality on mobile and consumer devices, holistic system-level optimization from algorithm to chip is necessary to meet the compute, memory, and power demands of AI applications. We have to develop small but accurate DNN models, hardware accelerators to run those models, and runtimes and compilers to manage the accelerators efficiently.
  I would like to introduce our efforts at Samsung on on-device AI systems. First, I will give an overview of the on-device AI platform we are developing for our devices. We address three key technology components: model compression, AI system SW, and AI HW accelerators. Then, I will discuss the details of our research on each component. For model compression, we developed a fractional quantization method called FleXOR that achieves sub-1-bit compression ratios. We also developed a binary-coding-based quantization method and a corresponding matrix multiplication library called BiQGEMM. For AI system SW, we are building nnStreamer, which applies the stream processing paradigm to neural network processing. We have also started a lightweight training framework called nnTrainer to make on-device learning feasible on embedded devices. For AI HW accelerators, I would like to discuss one of our domain-specific accelerator designs: a streaming line-processing architecture that enables 4K 60 fps super-resolution video processing (11.5 TOPS at 1 GHz). Finally, I will conclude the talk by discussing future work.

- Bio :
   Daehyun Kim is a vice president at Samsung Electronics, leading the On-Device Lab of Samsung Research. Before joining Samsung, he spent 15 years at Google and Intel working on microprocessor architecture and workload optimization after receiving his Ph.D. from Cornell University.
   His research interests include on-device AI, machine learning accelerators, mobile SoC architecture, high-performance computing, intelligent memory systems, and workload analysis. Applying workload-driven architecture design, he has conducted path-finding research for Samsung's on-device AI platform, Google's Android and ChromeOS products, and Intel's Xeon Phi coprocessors.


Submit a 2-page presentation abstract to the web-based submission system ( by Feb. 1, 2021. Notification of acceptance will be sent out by Feb. 8, 2021. The final paper and presentation material (to be posted on the workshop web site) are due Feb. 22, 2021. For additional information regarding paper submissions, please contact the organizers.


Abstract submission : Feb. 1, 2021
Author notification : Feb. 8, 2021
Final camera-ready paper : Feb. 22, 2021
Workshop : Feb. 28, 2021

Workshop Organizers

Hyesoon Kim, Georgia Tech (
Giho Park, Sejong Univ. (

Web Chair

Minkwan Kee, Sejong Univ. (
Chiwon Han, Sejong Univ. (