Site Reliability Engineer
Aegistech - New York City, NY
Job Description
We are looking to hire an employee for a Cloud SRE - Windows Hybrid role located in NYC. The role sits within the client's Cloud and Platform group, a global team responsible for the maintenance and support of infrastructure systems used within the client, a global investment bank. The team plays a critical role and works closely with global counterparts in maintaining the production infrastructure.

The candidate should have strong technical, functional, and analytical skills, with solid experience in automation, supporting critical infrastructure, and troubleshooting on Windows systems. This position will contribute towards supporting and driving infrastructure implementations to completion and serving as subject matter expert to the user community across Infrastructure, Platform, and Software as a Service (IaaS/PaaS/SaaS).

The team operates in a follow-the-sun support model. The function provides a variety of services to our stakeholders, including hardware specification advice, Operational Readiness of new solutions, and implementation of new Windows servers.

THE DAY-TO-DAY RESPONSIBILITIES:
- Support and manage MS Windows systems and implement change requests.
- Create scripts to increase the efficiency of daily support. This includes updating runbooks and support procedures.
- Actively collaborate with the Global Operations and Engineering teams to implement key projects within the Cloud environments.
- Provide assistance and support to transformation programs for applications and services looking to move to the cloud environment.
- Look for ways to improve and automate SRE items: availability, latency, performance, efficiency, and capacity planning.
- Troubleshoot system performance issues.
- Handle trouble tickets, user requests, and proactive maintenance.
- Support weekend BCP/DR tests and weekend on-call production support on a rotation basis.
- Assist application teams in post-configuration of newly deployed servers.
- Apply ITIL and Change Management policies; a good understanding of both is desired.
- Coordinate with Infrastructure teams and Business IT managers to deliver projects on schedule.
- Work with the Engineering team on Operational Readiness and implement engineered solutions to improve the efficiency and stability of the infrastructure.
- Provide documentation for the 1st line Operations team and maintain runbooks.
- Investigate and determine root causes for major incidents with the help of vendors and internal infrastructure teams, providing a detailed RCA and plan for remediation.
- Attend to escalations during follow-the-sun support hours.
- Contribute towards BAU projects.
- Work with the Incident Management team to provide RCAs for incidents.

THE SKILLS YOU NEED TO GET THE ROLE:
- Solid experience as a Windows Systems Administrator in a large-scale, global, and distributed environment
- Cloud tools such as Ansible, Git, Kubernetes, Terraform
- Virtual (VMware) and physical networking configuration is a plus
- Ability to deploy and support MS Windows Clusters
- Ability to create scripts using PowerShell, Python, VBScript, or JScript/JavaScript is a plus
- Knowledge of Site Reliability Engineering components is a must
- Experience working with VMware virtualization
- Understanding of Active Directory and how enterprise-class identity and access management (IAM) is extended from an on-premises environment to public cloud is a plus
- Ability to troubleshoot issues and provide resolution
- Written and verbal communication skills are a must
- Ability to work independently as well as in a team
- Previous experience supporting a banking infrastructure is preferred
- Prior experience in a global enterprise
- Experience working with offshore IT teams
Created: 2024-09-30