Senior / Lead Hadoop Engineer
NTT DATA, Inc. - Charlotte, NC
Apply NowJob Description
NTT DATA strives to hire exceptional, innovative and passionate individuals who want to grow with us. If you want to be part of an inclusive, adaptable, and forward-thinking organization, apply now.Check all associated application documentation thoroughly before clicking on the apply button at the bottom of this description.We are currently seeking a Senior / Lead Hadoop Engineer to join our team in Charlotte, North Carolina (US-NC), United States (US).Job Duties and Responsibilities:In this role, you will:Lead complex technology initiatives towards Hadoop platform stability, automation initiatives.Platform Incident and Problem managementBe part of 24/7 platform support team to perform platform incident management and triaging team.Assess the impact of the platform issue and prioritize the triage by partnering with platform tenants, platform development and Hadoop vendor teams.Facilitate and perform root cause analysis of platform incidents towards determining permanent resolution involving Tenants and Hadoop vendor.Communicate with Stakeholders, leadership team on the triaging progress status.To troubleshoot tenant reported failures by deep diving onto client logs to identify the root cause.Use Autosys as scheduler to schedule, execute platform routine activities.Hadoop Cluster MonitoringMonitor Hadoop cluster health and performance.Automate new monitoring opportunities leveraging available monitoring and alerting tools such as Prometheus, Thousand Eyes, Splunk etc.Perform troubleshooting on the actionable alerts and tuning to ensure optimal cluster performance.Hadoop Cluster Configuration and tuningPeriodically tune and configure Hadoop ecosystem components such as HDFS, YARN, Hive, HBase, Spark, HPOS etc.Schedule the Change request and track the approval process for implementation.Platform Backup and RecoveryEnsure Hadoop platform is resilient with data backup and recovery strategies in place.Perform emergency or scheduled failover of platform to Disaster recovery site and fall back.Upgrades and Patch ManagementPlan and execute upgrades of Hadoop ecosystem components and patches.Ensure compatibility and smooth integration of new features and enhancements.Work with Operating System administrators and patching automation build team to schedule and execute patching.Perform Cluster health validation post patching and communicate with stakeholders.Identify automation and process improvement opportunities related to patching to implement it.Documentation and ReportingMaintain comprehensive documentation of Hadoop cluster configurations & troubleshooting processes, and procedures.Develop and Schedule to Generate reports on cluster usage, performance metrics, and capacity utilization.Collaboration and SupportAbility to work both independently and in collaboration with Platform development and Platform tenant teams to troubleshoot and fix Hadoop platform issues, maintain production stability to enterprise-wide applications.Collaborate and consult with key technical experts, senior technology team, and Hadoop vendor to resolve complex technical issues and achieve goals.Basic Qualifications:8+ Years Proven Experience in technology incident and problem management role.4+ Years Proven Experience in using monitoring tools like Prometheus, Splunk, SiteScope & Thousand Eyes etc.6+ Years Proven experience as a Unix Scripting.2+ Years Proven experience in Hadoop administration or in a similar role.Preferred Qualifications:Strong knowledge of Hadoop ecosystem and its components (Hive, HBase, Spark, Hue).Proven Experience is using Autosys or similar job scheduler.Excellent problem-solving and troubleshooting skills.Excellent communication and people skills for collaborating with cross-functional teams.#J-18808-Ljbffr
Created: 2024-11-05