Site Reliability Engineer
Tandym Group - Charlotte, NC
Job Description
Tandym Group is seeking a Site Reliability Engineer to support a financial client based in Charlotte.

Responsibilities:
- Run the production environment by monitoring availability and taking a holistic view of system health
- Support the applications through an on-call rotation
- Provide stability to our applications and facilitate rapid feature development by proactively taking an active role in the direction of the service
- Automate or eliminate manual work and look for further opportunities for automation
- Implement and maintain SLO adoption and automation
- Production readiness
- Health scoring and error budget tracking
- Runbook standards, maintenance, and updates

Qualifications:
- Experience using DevOps tools and technologies such as GitLab, and Infrastructure as Code tools such as Terraform
- Strong troubleshooting skills, and experience building and enhancing observability using monitoring tools
- Proactive approach to observability maturity: identifying problems, performance bottlenecks, and areas for improvement in observability
- Leading incident response and supporting application teams
- Blameless postmortems
- Developer feedback for enhanced logging, runbooks, and addressing technical debt
- Promoting observability best practices
- Experience with the monitoring tools Dynatrace and Splunk
- Experience with public cloud platforms, preferably AWS, and API gateways
- Experience developing APIs, microservices, or frontends is a plus
- Experience using source version control (SVC) such as Git

Desired Skills:
- AWS certification
Created: 2024-10-19