Site Reliability Engineer
Altimetrik - Mountain View, CA
Job Description
Design, implement, and maintain complex data systems supporting millions of customers, applying cloud-native principles and best practices to ensure highly available, secure, performant, and scalable database systems.

Responsibilities:
- Create and maintain Jenkins pipelines using Groovy.
- Build and deploy services in Kubernetes clusters using Helm or Kustomize, including writing Dockerfiles.
- Manage and configure infrastructure using Ansible and Terraform.
- Engage in on-call rotations to support pre-production and production systems for customer-facing products, ensuring reliability and uptime.
- Use Splunk for performance monitoring, observability, and troubleshooting.
- Write and review RCA documentation to prevent recurrence of incidents and share learnings.
- Contribute to major system upgrades, deployment automation, monitoring enhancements, and production changes.
- Create operational playbooks, contribute to how-to articles, and develop domain knowledge to drive team improvements.
- Participate in and contribute to FMEA/chaos testing, security remediations, and incident response.
- Share best practices and patterns for operational excellence and cost optimization.
- Continuously reduce or eliminate manual steps by automating processes wherever possible.
- Proactively seek opportunities to increase developer velocity and productivity.

Qualifications:
- Bachelor's or Master's degree in Computer Science or a related technical field, or equivalent experience.
- 4+ years of hands-on development and operational experience building and maintaining infrastructure in AWS.
- Strong experience in performance monitoring, troubleshooting, and tuning.
- Hands-on experience with AWS services and cloud hosting.
- Experience with scripting languages for DevOps automation.
- Proficiency in one or more programming languages: Java, Python, or Ruby (including debugging skills).
- In-depth knowledge of Docker and Kubernetes, including deployment with Helm or Kustomize.
- Expertise in monitoring and observability using Splunk (required).
- Proven ability to create Jenkins pipelines using Groovy.
- Experience managing and configuring infrastructure using Ansible and Terraform.
Created: 2025-01-24