Intelligent Storage and I/O Devices
As parallel computer architectures significantly shrink the execution
time of compute kernels, the performance bottlenecks of applications shift
to the rest of the execution, including data movement, object
deserialization/serialization, and other software overheads in
managing data storage. To address this new bottleneck, the best approach is
to avoid moving data and to endow storage devices with new roles.
Morpheus is one of the very first research projects to implement this
concept in real systems. We utilize existing, commercially available
hardware components to build the Morpheus-SSD. The Morpheus model not only
speeds up a set of heterogeneous computing applications by 1.32x, but also
allows these applications to better utilize emerging data transfer methods
that can send data directly to the GPU via peer-to-peer to further achieve
- Gunjae Koo, Kiran Kumar Matam, Te I, Hema Venkata Krishna Giri Narra,
Jing Li, Steven Swanson, Murali Annavaram, and Hung-Wei Tseng.
Summarizer: Trading Bandwidth with Computing Near
In 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO
- Yanqin Jin, Hung-Wei Tseng, Steven Swanson and Yannis Papakonstantinou.
KAML: A Flexible, High-Performance Key-Value
In 23rd International Symposium on High Performance Computer Architecture
(HPCA 2017). February 2017.
- Hung-Wei Tseng, Qianchen Zhao, Yuxiao Zhao, Mark Gahagan and Steven
Swanson. Morpheus: Creating Application Objects Efficiently for
In 43rd International Symposium on Computer Architecture (ISCA 2016). June 2016.
Building Efficient Heterogeneous Computers
With the discontinuation of Dennard scaling and Moore's Law, computers have become heterogeneous. However, moving data among heterogeneous computing units and storage devices is becoming a bottleneck in these systems.
My research proposes the "Hippogriff" system, which revisits the programming model for moving data in heterogeneous computer systems. Instead of using the conventional CPU-centric, programmer-specified methods, the
Hippogriff system simplifies the application interface and
provides a middle layer to efficiently handle data movement. We also implemented peer-to-peer data transfer between the GPU and the SSD in the Hippogriff system.
Preliminary results demonstrate a 46% performance gain when applying
Hippogriff to a set of Rodinia GPU applications. For a highly optimized GPU MapReduce framework,
Hippogriff still demonstrates a performance gain of up to 27%.
- Jing Li, Hung-Wei Tseng, Chunbin Lin, Steven Swanson, and Yannis
Papakonstantinou. HippogriffDB: Balancing I/O and GPU Bandwidth in Big Data
Analytics. Proceedings of the VLDB Endowment, Volume 9(14), 2016.
- Yang Liu, Hung-Wei Tseng, Mark Gahagan, Jing Li, Yanqin Jin, and Steven
Swanson. Hippogriff: Efficiently Moving Data in Heterogeneous Computing
Systems. In 34th IEEE International Conference on Computer Design (ICCD
2016). October 2016.
- Yang Liu, Hung-Wei Tseng, and Steven Swanson.
SPMario: Scale Up MapReduce with I/O-Oriented Scheduling for the GPU. In 34th IEEE International Conference on Computer Design (ICCD
2016). October 2016.
- Hung-Wei Tseng, Yang Liu, Mark Gahagan, Jing Li, Yanqin Jin, and Steven Swanson. Gullfoss: Accelerating and Simplifying Data Movement among Heterogeneous Computing and Storage Resources.
Department of Computer Science and Engineering, University of California, San Diego, technical report CS2015-1015, 2015.
Data-triggered threads (DTT) is a programming and execution model that
initiates computation only when the application changes memory content. This
model exposes new opportunities for parallelism and eliminates redundant
computation. In conventional architectures, 78% of all loads fetch redundant
data, leading to a high incidence of redundant computation. By expressing
computation through data-triggered threads, that computation is executed
once when the data changes, and is skipped whenever the data does not
change. The C SPEC benchmarks show speedups of up to 5.9X, averaging 46%;
other benchmarks show even higher speedups.
This project examines
hardware-supported DTT, a software-only implementation, and
compiler-generated DTTs with no input from the programmer.
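The core idea can be illustrated with a minimal Python sketch. It is not the project's actual API or implementation: the `TriggeredField` class, its counters, and the `demo` function are hypothetical names introduced only for illustration, and the dependent computation runs synchronously here rather than in a separate thread. The sketch shows only the triggering rule described above: a store that changes the data runs the dependent computation, while a store that writes the same value is skipped.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

T = TypeVar("T")


class TriggeredField(Generic[T]):
    """Hypothetical data-triggered field (illustration only).

    The attached computation runs only when a store actually changes
    the stored value; redundant stores are counted and skipped.
    """

    def __init__(self, initial: T, on_change: Callable[[T], None]) -> None:
        self._value = initial
        self._on_change = on_change
        self.triggers = 0  # times the dependent computation ran
        self.skipped = 0   # redundant stores that triggered nothing

    def get(self) -> T:
        return self._value

    def set(self, new_value: T) -> None:
        if new_value == self._value:
            # Same value written again: the dependent result is still valid.
            self.skipped += 1
            return
        self._value = new_value
        self.triggers += 1
        # A real DTT system could run this in another thread; here it is
        # called synchronously to keep the sketch simple.
        self._on_change(new_value)


def demo() -> Tuple[int, int, int]:
    cache: Dict[str, int] = {"square": 0}

    def recompute(v: int) -> None:
        # Stand-in for an expensive computation that depends on the field.
        cache["square"] = v * v

    field = TriggeredField(0, recompute)
    for v in [3, 3, 3, 4, 4, 3]:
        field.set(v)
    return field.triggers, field.skipped, cache["square"]


if __name__ == "__main__":
    print(demo())
```

In this example, six stores lead to only three recomputations, because three of the stores write a value that is already present.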
- Code release:
- Selected publications:
- Hung-Wei Tseng and Dean M. Tullsen. CDTT: Compiler-generated
data-triggered threads. In 20th International Symposium on High Performance
Computer Architecture (HPCA 2014). February 2014.
- Hung-Wei Tseng and Dean M. Tullsen. Data-Triggered Multithreading for Near Data
Processing. In 1st Workshop on Near-Data Processing (WoNDP), Dec 2013.
- Hung-Wei Tseng and Dean M. Tullsen, Software data-triggered
threads. In Proceedings of ACM SIGPLAN Conference on
Object-Oriented Programming, Systems, Languages and
Applications, Tucson, Arizona, United States, Page(s): 703 - 716, October 2012.
- Hung-Wei Tseng and Dean M. Tullsen, Eliminating Redundant Computation and Exposing Parallelism through Data-Triggered Threads. IEEE Micro, Volume: 32 , Issue: 3, Page(s): 38 – 47, June 2012.
- Hung-Wei Tseng and Dean M. Tullsen.
Threads: Eliminating Redundant Computation. In Proceedings of 17th International Symposium on High Performance Computer
Architecture (HPCA-17), page 181-192, February, 2011. Nominated for Best Student Paper!
Characterizing flash memory for unstable power supply
The low power and high speed of flash memory make it popular in a wide range of
applications, from handheld devices to the data center. In all these applications,
system power loss or fade poses a serious danger to data in flash devices. If the flash
memory device loses power during a program or erase operation, the corruption
of metadata may cause the whole device to become inoperable. To better
understand the behavior of flash memory when power fails, in this project we use a
custom-built platform to directly measure the
errors caused by cutting power or adjusting supply voltages to flash chips during
operations.
- Selected publications:
- Other resources:
I am currently advising the following top-notch graduate students:
- Te I
- Stefan O'Neil
- Vaibhava Lakshmi
- Murtuza Taher Lokhandwala
- Prathamesh Pramod Bhatkar
- Yu-Ching Hu
- Hao Zhang
- Xindi Li
- Yu-Chia Liu
I also work with the following talented undergraduate students:
- Zackary Allen
- Alec Rohloff
I have also advised these students, who have each graduated:
Developing awesome ideas and training researchers are my duties as a
professor. I am always looking for new graduate students. If you are
interested in working with me, please apply to
the Department of Computer Science
or the Department of Electrical and Computer Engineering
at NC State University and mention me as a
potential advisor in the application system.