Senior Site Reliability Engineer (SRE)
Cox Automotive - Tucker, GA
Job Description
This role is for a Senior Site Reliability Engineer (SRE) on the Manheim Logistics SRE team. The SRE team is responsible for designing and maintaining AWS infrastructure and deployment pipelines for 15+ development teams. The team focuses on a Docker-based infrastructure solution while incorporating new architectural patterns such as Lambda, Step Functions, and Fargate. Strong emphasis is placed on infrastructure as code (IaC) with Terraform and on best practices, including proactive monitoring and alerting. This role involves working directly with a release train to improve monitoring and alerting, define error budgets, and assist with DevSecOps.

Responsibilities:
- Design and implement software tools for reliable application delivery and performance management
- Set up and maintain application monitoring and alerting
- Collaborate with engineering teams to ensure best practices are implemented
- Improve predictability and reliability of software releases and workflows
- Reduce mean time to recovery (MTTR) by troubleshooting, monitoring, alerting, and automating recovery
- Communicate effectively and manage processes

Qualifications:
- Bachelor's degree in Computer Science or related discipline
- Minimum 4 years' experience in software development and architecture
- Minimum 2 years' experience with Terraform
- Minimum 1 year of experience with AWS technologies, especially ECS and Lambda
- Experience with agile development, continuous integration, and automated testing

Preferred Skills:
- Extensive AWS platform skills including Cognito, WAF, ElastiCache, S3, and more
- Experience automating Terraform at scale
- Experience with database server infrastructure (RDS, MySQL, Postgres, etc.)
- .NET Core development experience
- Experience with GitHub, Docker, and Linux administration
Created: 2024-10-15