Experience
Building high-performance, high-reliability network infrastructure for AI at cloud scale.
I work on networking benchmarks, infrastructure observability, resilient transport, and agentic infrastructure systems for large-scale AI and HPC clusters. My work spans production engineering, research collaboration, and practical systems design for cloud-scale infrastructure.

Software Engineer II · Azure HPC Team
Microsoft · Vancouver, Canada / Beijing, China
Sept. 2024 – Present
MRC and Resilient AI Supercomputer Networking
- Work on high-performance, high-reliability network infrastructure for large AI clusters, including MRC and SRv6-based resilient networking.
- Papers:
- Related writeups: Microsoft, OpenAI, AMD, Broadcom, Intel, and NVIDIA.
Network Benchmarking for AI/HPC Infrastructure
- Work on networking benchmarks and deployment readiness for high-performance AI clusters.
- Focus on topology-aware benchmarking across NVLink, multi-node NVLink, InfiniBand, and Ethernet fabrics.
- Develop practical reliability and performance signals for large-scale cluster buildout and production readiness.
Agentic Platform for Infrastructure Workflows
- Explore an agentic platform for AI infrastructure workflows that connects LLM-driven agents with benchmark selection, infrastructure evaluation, and operational automation.
- Build reliable interfaces between model capabilities and infrastructure engineering tasks.

Software Engineer · Azure HPC Team
Microsoft China · Beijing, China
July 2022 – Oct. 2024
SuperBench: Benchmarking and Topology-Aware Evaluation
- Worked on the open-source SuperBench benchmarking framework for cloud AI infrastructure.
- Focused on making benchmarking scalable, topology-aware, and useful for production readiness; related paper: USENIX ATC 2024 Best Paper.
Moneo: Observability and Performance Signals
- Worked on the open-source Moneo observability stack for GPU, InfiniBand, and custom performance signals.
- Helped turn low-level system metrics into actionable signals for anomaly detection and infrastructure optimization.

Software Engineer · Azure Networking Team
Microsoft China · Beijing, China
Oct. 2021 – June 2022
SAI Qualification for SONiC
- Built validation infrastructure for multi-vendor SONiC/SAI qualification, improving confidence in switch behavior and cloud-scale networking interoperability.
- Streamlined automated switch interface testing and release qualification workflows across vendor platforms for Azure networking.
Education
M.Eng. in Electrical Engineering · GPA 3.91/4.0
Oct. 2019 – Sept. 2021
Supervisor: Prof. Hiroshi Hasegawa
Thesis: Resource Allocation in Elastic Optical Networks via Reinforcement Learning
B.Sc. in Electrical Information and Engineering · Major GPA 3.43/4.0
Sept. 2015 – July 2019
Supervisor: Dr. Qing Liu
Thesis: An Encoding-Free Genetic Algorithm for Topology Optimization
Skills
Python, C/C++, Rust, Golang, Shell, Torch, Slurm, InfiniBand, MPI, NCCL, Megatron-LM, LLM Agents