Principal Engineer, Infrastructure Software for AI
SB Telecom America Corp. - Palo Alto, CA
Job Description
Company Description:
SB Telecom America Corp. offers innovative technology solutions to drive business growth and success. As part of the SoftBank Group, we focus on AI, IoT, Security, and Digital Marketing to create new business value for our clients. Our digital marketing services cater to the Japanese market with bilingual experts in the U.S. and U.K.

About SoftBank:
SoftBank is making significant investments in infrastructure for AI. SoftBank Corp. has recently established a new US center in Silicon Valley, focused on infrastructure software for AI and AI foundations for mobile networks. Our goals are to challenge the norms and create products that make use of our state-of-the-art infrastructure (such as Nvidia MGX and DGX Grace Hopper platforms, and beyond) and cloud-native software. These products are geared towards centralized AI data centers as well as distributed AI Radio Access Network (RAN) data centers. We are looking for expert practitioners who are inspired to bring innovation and build transformative products.

Minimum Qualifications:
Bachelor's degree in Computer Science, Electrical Engineering, or a related field.
15+ years in software and hardware engineering, including platforms and distributed systems.
7+ years in Technical Lead roles, leading high-impact projects and teams.
Experience in building systems and systems software, AI frameworks, and applied AI.

Preferred Qualifications:
Master's or PhD in a relevant field.
Deep product experience with Kubernetes and container orchestration.
Experience with GPU systems and high-performance computing environments.
Expertise in building scalable infrastructure to support AI workloads.
Experience with AI developer frameworks, tools, and automation systems.

Role:
Lead the infrastructure team of Staff and Senior Engineers responsible for building foundational software on top of GPU systems supporting AI workloads (training, fine-tuning, and serving). Guide the development of new AI infrastructure with a focus on Kubernetes and GPU systems. Drive innovation in systems software architecture and automation to maximize resource utilization. As the Directly Responsible Individual (DRI) for engineering, work with product management and program management to lead execution towards commercialization.

Responsibilities:
Develop and lead an engineering team to build systems software for supporting AI workloads on large-scale GPU systems.
Deliver the control plane for workloads, including scheduling and orchestration. Deliver the management plane for underlying platforms.
Provide northbound APIs for the customer portal to interact with the infrastructure.
Contribute to Product Definition (PRD) and own the resulting product execution schedules.
Attract and build engineering talent.
Role model and foster a culture of humility and innovation for product delivery.
Created: 2024-11-09