Lead Site Reliability Engineer
Blueworks Technologies - McLean, VA
Job Description
This is a contract position with our company, paid at an hourly rate.

We are seeking a Lead Site Reliability Engineer to lead the transformation of customer environments to a serverless and/or containerized, Kubernetes-based infrastructure. This role requires deep expertise in cloud solutions, Kubernetes orchestration, networking, security, monitoring, and site reliability engineering, with a focus on deploying and managing robust, scalable architectures. The ideal candidate brings extensive experience with scripting, automation, and infrastructure as code.

Responsibilities:
- Lead the migration of customer landscapes from traditional or VM-based setups to serverless or containerized, Kubernetes-based architectures.
- Architect and deploy Kubernetes clusters on major cloud platforms (AWS, Azure, GCP) with a focus on reliability, scalability, and security.
- Design secure, resilient network architectures, implement network policies, and ensure optimized traffic routing in multi-cloud and hybrid cloud environments.
- Design and implement secure architectures and practices, including role-based access control (RBAC), network segmentation, and container security within Kubernetes and cloud environments.
- Set up end-to-end monitoring (Prometheus, Grafana, etc.), logging, and alerting to enable proactive management and visibility into system performance.
- Develop strategies for automated failover, self-healing systems, and high availability across complex, distributed systems.
- Use Terraform, Ansible, AWS CDK, or similar tools to define and manage infrastructure as code, creating reusable modules and automated deployment pipelines.
- Develop automation scripts using Python, Shell, or other tools to optimize workflows, ensure consistency, and eliminate manual processes.
- Implement SRE principles, foster an engineering culture that prioritizes reliability, and advocate for operational best practices across teams.

Required Skills:
- 10+ years of experience in cloud environments, with deep expertise in serverless architectures, containerization (Docker, Kubernetes), and orchestration.
- Proven experience building CI/CD pipelines and automated deployment workflows with tools like Jenkins, GitLab CI, GitHub Actions, or similar.
- Hands-on experience in designing for fault tolerance, auto-scaling, and disaster recovery in distributed systems.
- Advanced experience with IaC tools like Terraform, AWS CDK, or similar, and configuration management with Ansible.
- Proficiency in scripting languages such as Python and Shell, with experience creating automation scripts, tooling, and infrastructure management scripts.
- Strong experience with testing methodologies, chaos engineering principles, and recovery strategies to ensure system robustness and rapid recovery.
- Strong understanding of cloud networking, routing, VPNs, load balancing, DNS, and firewall management.
- Experience implementing monitoring, logging, and alerting frameworks using Prometheus, Grafana, Fluentd, ELK, or similar tools.
- Ability to effectively communicate technical concepts to diverse audiences, including customers and cross-functional teams.

Preferred Qualifications:
- Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD), AWS Certified Solutions Architect, or similar certifications.
- Familiarity with Kubernetes deployments across AWS, Azure, and GCP.
- Demonstrated commitment to staying current with the latest in SRE practices, containerization, and cloud technologies.
Created: 2024-11-07