Staff Software Engineer, Reliability
Robinhood - Menlo Park, CA
Job Description
About the team + role

The Reliability Engineering team at Robinhood ensures the reliability, scalability, performance, and security of the systems powering millions of users. As a hybrid role combining software engineering and systems operations, Software Engineers focused on Reliability partner closely with development teams and work on a variety of projects including:

- Applications including brokerage, crypto, and money
- Service Level Agreements (SLAs) and Service Level Objectives (SLOs)
- Incident metrics (Mean Time To Detect and Mean Time To Resolve)
- Production Readiness Review (PRR)
- Black box monitoring
- Canary Testing infrastructure
- Staging environment
- Costs and efficiency

As a Staff Software Engineer on the Reliability team, you will help build the roadmap and collaborate heavily with cross-functional partners. You will build systems of reliability and centralized tooling, and ensure proper focus for the team. This is a newly formed team, and if you are interested in being part of the founding team, we would love to chat with you!

What you'll do

- Design, build, and maintain large-scale systems that power Robinhood's platform, infrastructure, and core services.
- Write and review high-quality code, create capacity and scaling plans, and debug complex, real-time issues in mission-critical systems used by millions of customers.
- Lead by example, mentoring teammates, promoting best practices, and fostering a culture focused on operational excellence and system resilience.
- Take ownership of system reliability by participating in on-call rotations, proactively addressing potential issues, and driving long-term improvements to reduce downtime.
- Collaborate with industry-leading engineers to develop scalable tools and infrastructure that meet Robinhood's growing demands.
- Drive innovation by optimizing infrastructure for reliability and cost-efficiency, supporting Robinhood's mission to democratize finance for all at a global scale.

What you bring

- 8+ years of experience in designing, building, and maintaining large-scale, distributed systems.
- Proficiency in programming languages such as Python, Go, or C++.
- Expertise in operating systems (Linux/Unix), networking, and troubleshooting sophisticated production issues in high-availability environments.
- A track record of mentoring team members, fostering collaboration, and contributing to a culture of continuous improvement.

What we offer

- Market competitive and pay equity-focused compensation structure
- 100% paid health insurance for employees with 90% coverage for dependents
- Annual lifestyle wallet for personal wellness, learning and development, and more!
- Lifetime maximum benefit for family forming and fertility benefits
- Dedicated mental health support for employees and eligible dependents
- Generous time away including company holidays, paid time off, sick time, parental leave, and more!
- Lively office environment with catered meals, fully stocked kitchens, and geo-specific commuter benefits

Created: 2025-02-01