Site Reliability Engineer
Grafbase, Inc. - New York City, NY
Job Description
We are looking for a Site Reliability Engineer to join our Engineering team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and services. You will collaborate on designing, implementing, and maintaining infrastructure and automation solutions, supporting the continuous improvement of our platform's reliability and scalability.

What you will do:
- Work across teams to ensure software is developed and deployed for maximum reliability
- Develop, run, and improve processes and tools
- Build automation to support reliability efforts for all of our production services
- Join incidents, help resolve them, and assist in drafting root cause analyses (RCAs) and other documentation provided directly to customers

About You:
- You have at least 8 years of experience working with production systems
- Experienced in managing large-scale production systems
- Strong proficiency in the Rust programming language
- Hands-on experience with containerization and orchestration technologies such as Docker, Kubernetes, or Helm
- Solid experience with cloud platforms such as AWS, Azure, or Google Cloud
- Knowledgeable about network protocols, load balancing, and DNS management
- Familiar with monitoring and logging tools and best practices
- Experience deploying and monitoring systems using infrastructure as code
- Excellent problem-solving and troubleshooting skills
Created: 2025-02-15