Research Engineer
Delphi - San Francisco, CA
Job Description
Our "Clone Brain" architecture allows you to create a digital representation of your mind, reflecting your knowledge, tone, ways of thinking, and even the purpose that drives your conversations. (For example, a leadership coach might direct their clone to mentor emerging managers, while a consultant might want their clone to focus on sales strategy and client onboarding.)

Up until now, many of our improvements have come from intuition, first principles, and a very basic testing suite. We want to increase the fidelity of each Clone Brain, ensuring it captures its owner's unique style, knowledge, and conversational aims, while also being able to reason in new situations. But to do that, we need rigorous measurements and interpretability tools that transform "it feels right" into "we have metrics and benchmarks that prove it."

Enter the Research Engineer - Evals & Interpretability. You'll develop frameworks that quantify how well each digital clone mirrors the authenticity and expertise of its human counterpart, while also building the tooling to open the black box and figure out why the clone behaves the way it does. If you're curious about cognitive science, neural network interpretability, and the essence of what makes a human mind unique, this role has your name on it.

What You Will Work On

1. Frontier Eval Systems & Metrics
- Design, implement, and manage robust evaluation frameworks that measure how faithfully a clone reflects its owner's tone, style, purpose, and reasoning.
- Develop automated tests and analysis pipelines to compare new models and architectures, ensuring we're always improving the fidelity of our Clone Brain.

2. Interpretability & Debugging
- Build interpretability tools that shine a light on the internal workings of our clone models, from attention heads to knowledge graph structures.
- Investigate model behaviors and anomalies, surfacing insights that guide algorithmic improvements and mitigate unexpected outcomes.

3. Collaboration & Deployment
- Work closely with our AI, product, and engineering teams to integrate your evaluation suites into production workflows.
- Contribute to real-time feedback loops that help experts refine their clone's knowledge and style with confidence.

4. Infrastructure & Tooling
- Develop the technical infrastructure for large-scale experimentation and analysis, ensuring that interpretability and eval frameworks can scale across thousands of clones.
- Help define our data schemas, retrieval strategies, and model instrumentation in collaboration with data and infra engineers.

Preferred Abilities
- Hands-On Research Experience: A track record of designing experiments and running them end-to-end, whether in AI, ML, or another scientific domain.
- LLM Familiarity: Experience evaluating or fine-tuning large language models, with an emphasis on measuring alignment, style transfer, or interpretability.
- Python Proficiency: Strong coding skills to build robust pipelines and experiment frameworks.
- Evals & Benchmarking: Familiarity with common language model benchmarks and an eagerness to develop new ones.
- Interpretability Fundamentals: Knowledge of mechanistic interpretability, feature attribution, or circuit-level analysis is a huge plus.
- Infrastructure & Tools: Comfort with containers, scaling experiments on clusters, and building internal tools.
- Experimental Mindset: Ability to pivot quickly when an approach doesn't pan out, and a relentless drive to find creative solutions to open-ended questions.

Why You Might Like This Role
- Evals for AI are pushing the frontier of research. How to do evals correctly is still an open question. People who will thrive in this role are excited by this challenge and by the opportunity to be at the forefront of research.
- High level of ownership and impact on product, technical architecture, and company culture.
- Opportunity to define the future of digital cloning, ultimately enabling digital immortality and one-on-one mentorship for the masses.
- Challenging work that pushes you to your limits.
- Collaboration with a team passionate about scaling human potential and personalized learning.
- Chance to join a fast-growing startup creating a new market, approaching problems from first principles while valuing design and brand.

Why You Might Not Like This Role
- Not a 9-to-5: We move fast, iterate often, and tackle ambitious challenges; this isn't a clock-in/clock-out environment.
- No Existing Blueprint: If you prefer well-trodden paths and established frameworks, be warned: we're creating something that's never existed before.
- Applied AI Over Foundation Research: Our focus is on building and optimizing real products for end users, not on training new LLMs from scratch.
- Fully On-Site: We believe in-person collaboration drives better ideas. If you're looking for remote work, this might not be for you.
Created: 2025-03-01