Site Reliability Engineer
Grafbase, Inc. - New York City, NY
Job Description
We are looking for a Site Reliability Engineer to join our Engineering team. As an SRE, you will play a crucial role in ensuring the reliability, availability, and performance of our systems and services. You will collaborate across teams to design, implement, and maintain infrastructure and automation solutions, supporting the continuous improvement of our platform's reliability and scalability.

What you will do:
- Work across teams to ensure software is developed and deployed for maximum reliability
- Develop, run, and improve processes and tools
- Build automation to support reliability efforts for all of our production services
- Join incidents, help resolve them, and assist in drafting RCAs and other documentation that is provided directly to customers

About You:
- You have at least 8 years of experience working with production systems
- Experienced in managing large-scale production systems
- Strong proficiency in the Rust programming language
- Hands-on experience with container technologies such as Docker, Kubernetes, or Helm
- Solid experience with cloud platforms such as AWS, Azure, or Google Cloud
- Knowledgeable about network protocols, load balancing, and DNS management
- Familiar with monitoring and logging tools and best practices
- Have deployed and monitored systems using infrastructure as code
- Excellent problem-solving and troubleshooting skills
Created: 2024-09-30