My name is Amir Yazdanbakhsh. I joined Google Research as a Research Scientist in 2019, following a one-year AI residency. I am the co-founder and co-lead of the Machine Learning for Computer Architecture team. We leverage recent advances in machine learning to design better hardware accelerators. Our team's work has been covered by media outlets including WIRED, ZDNet, Analytics Insight, and InfoQ.
I am also interested in designing large-scale distributed systems for training machine learning applications. To that end, I led the development of a large-scale distributed reinforcement learning system that scales to TPU Pods and efficiently manages thousands of actors to solve complex, real-world tasks. As a case study, our team demonstrated how this highly scalable system enables reinforcement learning to complete chip placement in about an hour, rather than the days or weeks required by human effort.
I received my Ph.D. in computer science from the Georgia Institute of Technology. My Ph.D. work has been recognized with several awards, including the Microsoft Research PhD Fellowship and the Qualcomm Innovation Fellowship.
Awards and Honors
Microsoft Research PhD Fellowship, 2016-2018.
Qualcomm Innovation Fellowship, 2014-2015.
Gold Medal in ACM Student Research Competition (SRC), 2018.
Honorable Mention in IEEE Micro Top Picks, 2016.
Media Coverage
Need to Fit Billions of Transistors on a Chip? Let AI Do It, WIRED, 2021.
Google’s Deep Learning Finds a Critical Path in AI Chips, ZDNet, 2021.
Google's Apollo AI for Chip Design Improves Deep Learning Performance by 25%, InfoQ, 2021.
Google's AI Lab Unveiled A New Framework for Efficient Chip Design, Analytics Insight, 2021.
Today we introduce Menger, a large-scale distributed RL infrastructure with localized inference that scales up to several thousand actors across multiple processing clusters (e.g., Borg cells), reducing the overall training time for the task of chip placement. In this post we describe how we implement Menger using Google TPU accelerators for fast training iterations, and present its performance and scalability on the challenging task of chip placement. Menger reduces the training time by up to 8.6x (down to about one hour) compared to a baseline implementation.