Member of Technical Staff - ML Performance
Modal, Inc - New York City, NY
Job Description
About Us:
Modal is building the serverless compute platform to support the next generation of AI companies. To deliver the developer experience we wanted, we went deep and built our own infrastructure, including a custom file system, container runtime, scheduler, container image builder, and much more.

We're a small team based out of New York, Stockholm, and San Francisco. In just one year, we've reached 8-figure revenue, tripled our headcount, scaled to support thousands of GPUs, and raised over $32M in funding.

Working at Modal means joining one of the fastest-growing AI infrastructure organizations at an early stage, with many opportunities to grow within the company. Our team includes creators of popular open-source projects (e.g. Seaborn, Luigi), academic researchers, international olympiad medalists, and engineering and product leaders with decades of experience.

The Role:
We are looking for strong engineers with experience making ML systems performant at scale. If you are interested in contributing to open-source projects and to Modal's container runtime to push language and diffusion models toward higher throughput and lower latency, we'd love to hear from you!

Requirements:
- 5+ years of experience writing high-quality, high-performance code.
- Experience working with torch, high-level ML frameworks, and inference engines (e.g. vLLM or TensorRT).
- Familiarity with NVIDIA GPU architecture and CUDA.
- Experience with ML performance engineering (tell us a story about boosting GPU performance: debugging SM occupancy issues, rewriting an algorithm to be compute-bound, eliminating host overhead, etc.).
- Nice-to-have: familiarity with low-level operating system foundations (Linux kernel, file systems, containers, etc.).
- Ability to work in person in our NYC, San Francisco, or Stockholm office.
Created: 2025-02-28