Research

My research centers on building efficient abstractions for computer software by optimizing across the stack.

Programming Language Implementation and Its Architectural Support

My current research focuses on architectural support for automatic memory management (garbage collection) in managed languages. In my PhD research, I am designing a novel hardware architecture for GC acceleration inside a RISC-V SoC. My honours project investigated using hardware transactional memory for concurrent copying GC (ISMM’21).

Programming language implementation is the main theme of my research. Since 2017, I have been deeply involved in developing the next generation of the MMTk memory management framework. Previously, I worked on the Mu micro virtual machine, developing an RPython JIT compiler on top of Mu. I also contribute to open-source language implementations, including Chapel, JikesRVM, and OpenJDK.

Performance Analysis and Optimization

My research on building high-performance, low-level computer systems is underpinned by flexible performance analysis tools and sound evaluation methodology: “if you can’t measure it, you can’t improve it.” I currently work on incorporating tracing technologies (such as eBPF) into managed language runtimes (MPLR’23), revealing optimization opportunities missed by sampling and logging. Previously, my distillation methodology (ISPASS’22) exposed the substantial overheads incurred by production garbage collectors. I applied this expertise to industrial-strength systems during my internships at Microsoft Research, Twitter, and Google.

I am a strong believer in reproducible science and serve on artifact evaluation committees for top-tier conferences. I also help maintain the DaCapo benchmark suite (ASPLOS’25).

Computer Systems

I have a broad interest in computer systems, including operating systems, cybersecurity, and high-performance computing. During my internship at Microsoft Research, I used program synthesis to generate efficient implementations of parallel programming primitives by exploiting the accelerator topologies in the data center. Our implementations outperform hand-optimized vendor libraries and power distributed machine learning on Azure. Our paper received the Best Paper Award at PPoPP’21.