Senior Machine Learning Engineer I
Broad Institute - Cambridge, MA
Job Description
Description & Requirements

The Senior Machine Learning Engineer will participate in research and development efforts aimed at solving problems in analyzing large-scale clinical data, with a mission of improving human health. The candidate will work with multiple modalities, including imaging data, time series, and clinical notes. The candidate will also develop methods to ascertain disease outcomes and to characterize risk factors from large electronic health record data sets. Rich representations of clinical data derived from deep learning models will be used in conjunction with genetic data to investigate the genetic basis for disease.

The ideal candidate has both a theoretical and a practical understanding of deep learning techniques and a proven track record in areas such as clinical research, computational biology, probability, statistics, or data science. The candidate will join a strong team of machine learning practitioners, will have access to vast amounts of clinical data, and is encouraged to publish new methods and results in academic journals and conferences. The candidate will conduct research in clinical ML and disease biology, and must collaborate effectively with researchers at the Broad Institute and beyond.

This position is suited to a person who is excited by the prospect of learning, adapting, and applying modern machine learning techniques to solve the key challenges for emerging clinical data modalities, with revolutionary implications for advancing the state of the art in clinical practice.
Responsibilities
- Adapting and applying existing machine learning techniques to clinical datasets
- Developing novel machine learning methods for understanding and organizing unstructured datasets
- Developing robust and generalizable inference algorithms that advance the state of the art
- Writing well-crafted, maintainable, scalable, and performant machine learning code
- Designing, developing, and maintaining testing frameworks for machine learning code
- Developing techniques for characterizing, processing, and storing large real-world clinical datasets

Requirements
- Master's degree in Computational Biology, Computer Science, Physics, Math, Statistics, or a related quantitative field, or relevant experience
- 5-7 years designing and training models on large, complex, and/or biased datasets
- 5-7 years of experience across deep learning frameworks like Keras, TensorFlow, or PyTorch and machine learning packages (sklearn, etc.)
- Fluency with data modeling, indexing, ETL, and cloud-based pipelines (expertise with MLflow or other MLOps packages, BigQuery, Spark, or equivalent)
- Strong bash/shell scripting and proficiency with UNIX operating systems
- Familiarity with NumPy and Pandas
- Strong communication skills and ability to collaborate with clinicians, data scientists, and software engineers on model requirements and design

Preferred Skills
- Knowledge of software engineering best practices, including version control and writing tests
- Knowledge of MLOps best practices
- Experience developing data pipelines to prepare data for modeling from large, messy data sets
- Experience working with clinical data or omics data
Created: 2024-11-05