Principal Board/System Reliability Engineer
AMD - Austin, TX
Job Description
This range is provided by AMD. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Base pay range: $156,880.00/yr - $235,320.00/yr

WHAT YOU DO AT AMD CHANGES EVERYTHING
We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded.

The Role
Join a dynamic global team dedicated to integrating reliability early into the design of AMD's cutting-edge datacenter/MI (Machine Instinct) accelerator portfolio, leveraging knowledge of the product lifecycle to develop advanced tests for component, board, and system level interactions. Collaborate with cross-functional teams across AMD Global Operations & Quality, business units, partners, and the worldwide supplier network. This lead role focuses on driving board and system-level reliability qualification that meets world-class product quality and reliability for AMD.

Key Responsibilities
- Working closely with global reliability peers, lead reliability efforts for next generation AMD MI accelerators and air/liquid cooled systems, including product qualifications of accelerator modules, kits and systems.
- Work with internal teams and customers to identify targeted power/thermal/mechanical/workload application environments and define reliability specifications to ensure board/system robustness.
- Conduct NUDD (new, unique, different, difficult) analysis of new hardware designs, and utilize FMEA and FEA (finite element simulations) to define early reliability assessment plans and DOEs.
- Develop novel risk assessment methods using validated models for reliability failure mechanisms and end-use conditions, to enable reliability/cost tradeoffs and product's performance envelope.
- Provide inputs to platform teams leveraging tools like Ansys Sherlock, PTC Windchill, and MTBF.
- Define reliability risk assessment plans and quantify risks using applied statistical methods.
- Leverage field telemetry and build relationships with CM partners and end-customers to better assess manufacturing and end-use reliability stress conditions.
- Influence reliability team goals, infrastructure for next generation systems/technologies, and the capability/competence roadmap.

The Person
- Technical expertise in the field of electronics/hardware reliability, with broad knowledge across materials, mechanical, and electrical domains and the ability to deep-dive into each.
- Experienced in conducting reliability tests such as temperature cycle, power cycle, reliability demonstration test (RDT), temp/humidity, shock & vibration, bend, strain gage, daisy chain board reliability, and system integration/interaction testing.
- Sound knowledge of material behavior, physics of failure, electronic failure mechanisms, solder joint reliability, electrochemical migration, burn-in screening, and CTE driven effects.
- Proven competence in statistical analysis, DOE formulations, life distributions, acceleration factor development, life data modeling and DPPM risk quantification.
- Strong analytical/problem-solving skills, highly organized, with attention to detail.
- A self-starter and leader who handles ambiguity and demonstrates strong accountability.
- Adept at delivering results under time constraints, balancing analytical rigor with practical decision-making.
- A team player with strong interpersonal skills who mentors engineers, providing guidance in both technical and professional skill development to foster growth and team success.
- Able to work in a technical, fast-paced, solution-driven work environment, and collaborate across international time zones to drive results.

Preferred Experience
- Extensive experience in a quality/reliability engineering role, and in leading product validation testing and analysis.
- Electronics packaging, new product introduction, high volume manufacturing and test experience for IT equipment or similar products.
- Experience with datacenter infrastructure and thermal solutions.
- Knowledge of PCB design and layout tools such as Allegro, Valor, and Gerber is a plus.
- Experience with lifecycle warranty analysis, MTBF and reliability block diagrams is a plus.
- Familiarity with or participation in JEDEC, IPC, AEC, IEC, and ASHRAE industry standards.
- Knowledge of JMP scripting and Python, and familiarity with ROCm utilities, is beneficial.

Academic Credentials
- MS/PhD in Mechanical, Materials, or Mechatronics Engineering, or equivalent.
- Certified Reliability Engineer (CRE) certification is a plus.

LOCATION: Austin, TX
Created: 2025-03-07