Site Reliability Engineer

Radar - San Francisco, CA

Job Description

[Full Time] Site Reliability Engineer at RADAR (United States)

Learn more about the general tasks related to this opportunity below, as well as required skills.

Date Posted: 14 Mar, 2023
Work Location: San Francisco, United States
Salary Offered: $100,000 to $230,000 yearly
Job Type: Full Time
Experience Required: 6+ years
Remote Work: Yes
Stock Options: No
Vacancies: 1 available

About Us

Be part of an exciting, well-funded startup changing the world of retail and beyond. RADAR's mission is to revolutionize customer experience in retail through precise identification of inventory in stores and distribution centers, completely transforming the in-store experience for employees and customers alike.

About the Role

As a cloud Site Reliability Engineer, you will be involved with our fast-paced releases and collaborate closely with the application development team. The role requires hands-on participation and a deep understanding of cloud-related technologies, management platforms, and networking.

Responsibilities

- Run the production environment by monitoring availability and taking a holistic view of system health, and correct any issues in low-latency, high-performance, scalable systems in a polyglot architecture.
- Lead capacity planning, automate server capacity monitoring and scaling, and establish best practices for metrics gathering, monitoring, and alarming.
- Provide tooling to monitor and resolve any issues with persistent data stores in the system, basic data administration, and optimization for the data pipeline.
- Evangelize high engineering standards and best practices across multiple areas.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating for continual improvement.
- Provide primary operational support and engineering for multiple large-scale distributed software applications.
- Follow key SRE practices: preventive measures for all failures, availability, performance, monitoring, alerting, and incident response.
- Document "tribal knowledge" and conduct post-incident reviews and corrections.
- Improve operational processes (such as deployments and upgrades) to make them as boring as possible.
- Debug production issues across services and levels of the stack.

About You

Requirements:

- Bachelor's degree or equivalent experience in Engineering, Computer Science, or a related field.
- 7+ years of professional experience in DevOps / SRE handling production procedures, and a certification with a major cloud provider such as GCP, AWS, or Azure.
- In-depth experience with Docker Compose / Docker Swarm, Kubernetes cluster deployment, cluster design, sizing, and containerization.
- In-depth experience deploying microservice architectures and applications, and supporting serverless architectures.
- In-depth experience with infrastructure-as-code and config management for VMs and containers: Terraform, Ansible, or comparable tooling.
- In-depth experience with Prometheus, TICK stack, Elastic, Logstash/Filebeat, and Telegraf, amongst others.
- Prior experience building out solutions with Vault and Consul for secrets and configuration.
- Prior in-depth experience with open-source databases, cloud-native databases, and cloud-native messaging frameworks.
- Rock solid with scripting languages such as Python, Ruby, Go, and shell, and with YAML constructs.
- Working knowledge of networking concepts, VPN, and VPC constructs in the cloud.
- Understanding of operations tools (PagerDuty, CloudWatch, Datadog, Sentry, etc.).
- Deeply conversant with cloud infrastructure security best practices.
- Good understanding of at least one of the following CI / CD tools: Atlassian tooling, Jenkins, CircleCI, or cloud-native deployment tools, and a deep understanding of GitOps.

What We're Looking For In Teammates

We are looking for exceptional people to join our growing team and have a positive impact on our culture, technology, and product from day one. We deeply value humility, curiosity, and a positive attitude.

What It's Like To Work With Us

We respect each other and each of our contributions, and we believe that the best solutions will come from a diversity of ideas and perspectives.

Created: 2024-11-12
