Hung-Wei Tseng (曾宏偉)

Assistant Professor
  • Computer Science
  • Room 3254, Engineering Building II (890 Oval Drive), Campus Box 8206
  • Raleigh, NC 27695-8206
I am currently an assistant professor in the Department of Computer Science and the Department of Electrical and Computer Engineering at NC State University, where I lead the Extreme Storage & Computer Architecture Laboratory. Prior to joining NCSU, I was a postdoctoral researcher in the Non-volatile Systems Laboratory, working with Professor Steven Swanson, and a lecturer in the Department of Computer Science and Engineering at the University of California, San Diego. My thesis work, advised by Professor Dean Tullsen, was on data-triggered threads. This work was selected for IEEE Micro's "Top Picks from Computer Architecture" in 2012.
 

Research Projects

Intelligent Storage and I/O Devices

As parallel computer architectures significantly shrink the execution time of compute kernels, the performance bottlenecks of applications shift to the rest of the execution, including data movement, object serialization/deserialization, and other software overheads in managing data storage. To address this new bottleneck, the best approach is to avoid moving data and to endow storage devices with new roles.

Morpheus is one of the very first research projects to implement this concept in real systems. We use existing, commercially available hardware components to build the Morpheus-SSD. The Morpheus model not only speeds up a set of heterogeneous computing applications by 1.32x, but also allows these applications to better utilize emerging data transfer methods that send data directly to the GPU via peer-to-peer transfers, further achieving a 1.39x speedup.

Building Efficient Heterogeneous Computers

With the discontinuation of Dennard scaling and Moore's Law, computers are becoming heterogeneous. However, moving data among heterogeneous computing units and storage devices is an emerging bottleneck in these systems.

My research proposes the "Hippogriff" system, which revisits the programming model for moving data in heterogeneous computer systems. Instead of using the conventional CPU-centric, programmer-specified methods, the Hippogriff system simplifies the application interface and provides a middle layer that handles data movement efficiently. We also implemented peer-to-peer data transfer between the GPU and the SSD in the Hippogriff system.

Preliminary results demonstrate a 46% performance gain when applying Hippogriff to a set of Rodinia GPU applications. For a highly optimized GPU MapReduce framework, Hippogriff still demonstrates up to a 27% performance gain.

  • Jing Li, Hung-Wei Tseng, Chunbin Lin, Steven Swanson, and Yannis Papakonstantinou. HippogriffDB: Balancing I/O and GPU Bandwidth in Big Data Analytics. Proceedings of the VLDB Endowment, Volume 9(14), 2016.
  • Yang Liu, Hung-Wei Tseng, Mark Gahagan, Jing Li, Yanqin Jin and Steven Swanson. Hippogriff: Efficiently Moving Data in Heterogeneous Computing Systems. In 34th IEEE International Conference on Computer Design (ICCD 2016). Oct. 2016.
  • Yang Liu, Hung-Wei Tseng and Steven Swanson. SPMario: Scale Up MapReduce with I/O-Oriented Scheduling for the GPU. In 34th IEEE International Conference on Computer Design (ICCD 2016). Oct. 2016.
  • Hung-Wei Tseng, Yang Liu, Mark Gahagan, Jing Li, Yanqin Jin, and Steven Swanson. Gullfoss: Accelerating and Simplifying Data Movement among Heterogeneous Computing and Storage Resources. Department of Computer Science and Engineering, University of California, San Diego, technical report CS2015-1015, 2015.

Data-Triggered Threads

Data-triggered threads (DTT) is a programming and execution model that initiates computation only when the application changes memory content. This model exposes new opportunities for parallelism and eliminates redundant, unnecessary computation.

In conventional architectures, 78% of all loads fetch redundant data, leading to a high incidence of redundant computation. When computation is expressed through data-triggered threads, it is executed once when the data changes and is skipped whenever the data does not change. The C SPEC benchmarks show speedups of up to 5.9x, averaging 46%; other benchmarks show even higher gains.

This project examines hardware-supported DTT, a software-only implementation, and compiler-generated DTTs with no input from the programmer.
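
The core idea of skipping computation on redundant writes can be illustrated with a small software-only sketch. This is not the DTT implementation from this project: the class `TriggeredField` and the function `demo` are hypothetical names chosen for illustration, and the attached function runs synchronously here instead of in a separate thread.

```python
class TriggeredField:
    """Holds a value and runs an attached function only when a write changes it.

    Hypothetical illustration of the data-triggered idea; not the project's API.
    """

    def __init__(self, value, on_change):
        self._value = value
        self._on_change = on_change
        self.trigger_count = 0  # how many writes actually triggered computation

    def write(self, new_value):
        # A store of an unchanged value is redundant: skip the computation.
        if new_value == self._value:
            return False
        self._value = new_value
        self.trigger_count += 1
        self._on_change(new_value)  # a real DTT system would run this as a thread
        return True

    def read(self):
        return self._value


def demo():
    totals = {}

    def recompute(prices):
        # The computation that depends on the field's contents.
        totals["sum"] = sum(prices)

    field = TriggeredField((1, 2, 3), recompute)
    recompute(field.read())  # compute once for the initial contents

    fired = [
        field.write((1, 2, 3)),  # same data: skipped
        field.write((1, 2, 4)),  # data changed: recomputed
        field.write((1, 2, 4)),  # same data again: skipped
    ]
    return fired, field.trigger_count, totals["sum"]
```

In this sketch, three writes result in only one recomputation, because two of them store data identical to what is already in the field.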

Characterizing Flash Memory for Unstable Power Supply

The low power and high speed of flash memory make it popular in a wide range of applications, from handheld devices to the data center. In all these applications, system power loss or fade poses a serious danger to data in flash devices. If a flash memory device loses power during a program or erase operation, the corruption of metadata may cause the whole device to become inoperable. To better understand the behavior of flash memory when power fails, this project uses a custom-built platform to directly measure the errors caused by cutting power or adjusting supply voltages to flash chips during operations.

 

Selected Publications

(Full listing)

Software

 

My Group

I am currently advising the following top-notch graduate students:
  • Te I
  • Stefan O'Neil
  • Vaibhava Lakshmi
  • Murtuza Taher Lokhandwala
  • Prathamesh Pramod Bhatkar
  • Yu-Ching Hu
  • Hao Zhang
  • Xindi Li
  • Yu-Chia Liu
I also work with the following talented undergraduate students:
  • Zackary Allen
  • Alec Rohloff
I have also advised these students, who have since graduated:
  • Joshua Okrend
Developing awesome ideas and training researchers are my duties as a professor. I am always looking for new graduate students. If you are interested in working with me, please apply to either the Department of Computer Science or the Department of Electrical and Computer Engineering at NC State University and mention me as a potential advisor in the application system.