Experience

Building high-performance, high-reliability network infrastructure for AI at cloud scale.

I work on networking benchmarks, infrastructure observability, resilient transport, and agentic infrastructure systems for large-scale AI and HPC clusters. My work spans production engineering, research collaboration, and practical systems design for cloud-scale infrastructure.

Software Engineer II · Azure HPC Team

Microsoft · Vancouver, Canada / Beijing, China

Sept. 2024 – Present

MRC and Resilient AI Supercomputer Networking

Network Benchmarking for AI/HPC Infrastructure

  • Work on networking benchmarks and deployment readiness for high-performance AI clusters.
  • Focus on topology-aware benchmarking across NVLink, multi-node NVLink, InfiniBand, and Ethernet fabrics.
  • Develop practical reliability and performance signals for large-scale cluster buildout and production readiness.

Agentic Platform for Infrastructure Workflows

  • Explore an agentic platform for AI infrastructure workflows that connects LLM-driven agents with benchmark selection, infrastructure evaluation, and operational automation.
  • Build reliable interfaces between model capabilities and infrastructure engineering tasks.

Software Engineer · Azure HPC Team

Microsoft China · Beijing, China

July 2022 – Oct. 2024

SuperBench: Benchmarking and Topology-Aware Evaluation

  • Worked on the open-source SuperBench benchmarking framework for cloud AI infrastructure.
  • Focused on making benchmarking scalable, topology-aware, and useful for production readiness. Related paper: USENIX ATC 2024 Best Paper.

Moneo: Observability and Performance Signals

  • Worked on the open-source Moneo observability stack for GPU, InfiniBand, and custom performance signals.
  • Helped turn low-level system metrics into actionable signals for anomaly detection and infrastructure optimization.

Software Engineer · Azure Networking Team

Microsoft China · Beijing, China

Oct. 2021 – June 2022

SAI Qualification for SONiC

  • Built validation infrastructure for multi-vendor SONiC/SAI qualification, improving confidence in switch behavior and cloud-scale networking interoperability.
  • Streamlined automated switch interface testing and release qualification workflows across vendor platforms for Azure networking.

Education

Nagoya University

Nagoya, Japan

M.Eng. in Electrical Engineering · GPA 3.91/4.0

Oct. 2019 – Sept. 2021

Supervisor: Prof. Hiroshi Hasegawa

Thesis: Resource Allocation in Elastic Optical Networks via Reinforcement Learning

B.Sc. in Electrical Information and Engineering · Major GPA 3.43/4.0

Sept. 2015 – July 2019

Supervisor: Dr. Qing Liu

Thesis: An Encoding-Free Genetic Algorithm for Topology Optimization

Skills

Python · C/C++ · Rust · Golang · Shell · Torch · Slurm · InfiniBand · MPI · NCCL · Megatron-LM · LLM Agents