Site Reliability Engineer

TekWissen - Tampa, FL

Job Description

Overview: TekWissen Group is a workforce management provider operating throughout the USA and many other countries. Our client is an American multinational information technology services and consulting company and a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world's leading companies build stronger businesses.

Title: Site Reliability Engineer

Work Location: Tampa, FL 33607

Job Type: Contract

Work Type: Onsite

Duration: 3 Months

Job Description:

Top Qualifications:

Advanced Kubernetes - Must have strong skills in Kubernetes at scale using one of GKE, AKS, EKS, or RKE. Experience with kubectl and Helm.

Containers - Experience deploying Java (Spring Boot) microservices in Dockerized environments.

Observability - Experience setting up tools like Prometheus/Grafana, Datadog, AppDynamics, and Splunk to give actionable insight into a microservice environment, including but not limited to synthetics, application performance monitoring, logging, and alerting (PagerDuty/OpsGenie integrations).

CI/CD - Good expertise with Jenkins, Azure DevOps, GitHub Actions, ArgoCD, Artifactory, Azure Container Registry, Google Container Registry, and other similar tooling.

SCM - Working with tools like GitHub/GitLab for source code management, as well as experience with branching strategies like GitFlow and trunk-based development.

Job Summary: We are looking for a seasoned Site Reliability Engineer to augment our team and support its strategy of driving products and technology into everything it delivers to accelerate business growth. As an SRE, you'll work as part of a team of problem solvers, helping to solve complex business issues from strategy to execution. The team covers a variety of responsibilities that are executed by DevSecOps, Site Reliability, and ML Ops Engineers, including:

Defining reliability and resilience standards for infrastructure and application components.

Proactively optimizing redundancies and monitoring and alerting practices and patterns.

Developing resilient and highly available distributed systems.

Developing Infrastructure as Code for building cloud tools.

Managing secrets and configuration.

Monitoring systems and services, and providing incident and emergency response to triage and resolve system or client issues.

Managing the application ecosystem, improving platform infrastructure and applications with high reliability, resiliency, performance, and quality.

Supporting documentation, knowledge articles, and runbooks.

Designing, building, and implementing SRE patterns that adhere to our client's security

TekWissen® Group is an equal opportunity employer supporting workforce diversity.

Created: 2024-10-21
