Software Engineering Manager (Machine Learning ...
Greylock - New York City, NY
Job Description
A fast-growing Series B company in our portfolio is looking for a hands-on leader to head their infrastructure team. This team plays a critical role in their machine learning platform, designing, developing, and optimizing core systems. This position is perfect for an infrastructure engineering expert with a solid technical background who thrives in mentoring and leading teams.

About the Company:
They're developing the most efficient, scalable, and reliable solution for running machine learning workloads, whether in their cloud or the ...

In this role, you will have the opportunity to:
- Lead, manage, and mentor the infrastructure engineering team responsible for building the backbone of the ML platform.
- Define and execute the technical strategy for infrastructure, ensuring performance, security, and scalability of key systems.
- Collaborate with ML teams and cross-functional stakeholders to ensure seamless integration of models into production environments.
- Design and implement scalable solutions, including CI/CD pipelines, container orchestration, and cloud infrastructure (AWS, GCP, etc.).
- Optimize system performance by identifying and addressing infrastructure bottlenecks.
- Own end-to-end project management for infrastructure initiatives, from planning to execution and ongoing maintenance.
- Foster engineering best practices and a culture of continuous improvement within the team.

Qualifications:
- Bachelor's, Master's, or Ph.D. in Computer Science, Engineering, or related field.
- 5+ years of professional experience in infrastructure or software engineering, with at least 2 years in a technical leadership role.
- Expertise in infrastructure design, including containerization (Docker), orchestration (Kubernetes), and cloud platforms (AWS, GCP).
- Strong experience with CI/CD pipelines, infrastructure as code (Terraform, Ansible), and monitoring systems.
- Solid understanding of networking, security, and high-availability infrastructure design.
- Experience managing and scaling infrastructure for machine learning or similar high-performance workloads.
- Proven track record of leading teams and delivering large-scale, production-level infrastructure solutions.
- Excellent problem-solving skills and the ability to drive technical projects from idea to completion.

BONUS POINTS:
- Experience optimizing infrastructure for machine learning workloads, including GPU utilization and distributed computing.
- Familiarity with multi-cloud strategies and hybrid cloud deployments.
- Deep understanding of security best practices in cloud-native environments.
- Previous experience in a fast-paced startup environment, particularly in ML or AI.

Logistical Questions:
- Stage: Series B
- Location: New York (Hybrid)
- Reports to: CTO/Founder
- Team Size: 8 People
Created: 2024-10-13