I'm a graduate student at Stanford with a concentration in machine learning and a minor in creative writing. I'm interested in shaping datasets to make machine learning systems more accessible to domain experts in fields like medical imaging.
ML models can achieve high-quality performance on coarse-grained metrics (e.g., F1-score, overall accuracy), but they may underperform on critical data subsets, or slices. We introduce Slice-based Learning, a new programming model in which practitioners use slicing functions to specify the critical data subsets to which the model should pay attention.
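To make the idea of a slicing function concrete, here is a minimal sketch in plain Python. It is not the Slice-based Learning implementation: the `short_text` slice, the `slice_accuracy` helper, and the toy data are all hypothetical, chosen only to show how an aggregate metric can hide a weak slice.

```python
from typing import Callable, Dict, List, Optional

Example = Dict[str, object]
SlicingFunction = Callable[[Example], bool]


def short_text(example: Example) -> bool:
    """Hypothetical slicing function: selects examples with fewer than 5 tokens."""
    return len(str(example["text"]).split()) < 5


def slice_accuracy(
    examples: List[Example],
    labels: List[int],
    preds: List[int],
    sf: SlicingFunction,
) -> Optional[float]:
    """Accuracy restricted to the examples the slicing function selects."""
    idx = [i for i, ex in enumerate(examples) if sf(ex)]
    if not idx:
        return None  # the slice is empty, so accuracy is undefined
    return sum(labels[i] == preds[i] for i in idx) / len(idx)


# Toy data (made up for illustration).
examples = [
    {"text": "a b"},
    {"text": "a b c d e f"},
    {"text": "x"},
    {"text": "one two three four five six"},
]
labels = [1, 0, 1, 0]
preds = [0, 0, 0, 0]

overall = slice_accuracy(examples, labels, preds, lambda ex: True)
on_short = slice_accuracy(examples, labels, preds, short_text)
```

In this toy data the model is right on half of all examples but wrong on every short one, which is the kind of gap a coarse-grained metric does not surface.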
Scene graphs have proven useful in a number of computer vision tasks, including visual question answering. However, most scene graph datasets are sparse due to annotator error. This work attempts to overcome the limitations of human annotators with a semi-supervised method that uses both limited labels and unlabeled data to generate training datasets for scene graphs.
We leverage three key abstractions in Snorkel to achieve state-of-the-art scores on the SuperGLUE benchmark: (1) weak supervision, (2) data augmentation, and (3) data slicing.
We incorporate a number of supervision sources, including traditional supervision, transfer learning, multi-task learning, weak supervision, and ensembling, to achieve state-of-the-art scores on the GLUE benchmark.
Bicuspid aortic valve (BAV) is the most common congenital malformation of the heart, but obtaining training data is a tremendous practical roadblock to building ML models that detect it. We collaborate with cardiologists from Stanford Medicine to write labeling functions over geometric features of heart MRIs to produce probabilistic training labels.
Using weak supervision, we learn probabilistic training labels for aortic valve MRIs.
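As a rough illustration of how labeling functions over geometric features can yield probabilistic labels, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the feature names (`valve_area`, `eccentricity`), the thresholds, and the labeling functions are hypothetical and carry no clinical meaning, and the unweighted vote average is a simple stand-in for a learned label model, not the method used in this work.

```python
from typing import Callable, Dict, List

ABSTAIN, NORMAL, BAV = -1, 0, 1

Features = Dict[str, float]
LabelingFunction = Callable[[Features], int]


# Hypothetical labeling functions; thresholds are placeholders, not clinical values.
def lf_small_area(f: Features) -> int:
    return BAV if f["valve_area"] < 2.0 else ABSTAIN


def lf_high_eccentricity(f: Features) -> int:
    return BAV if f["eccentricity"] > 0.8 else ABSTAIN


def lf_round_opening(f: Features) -> int:
    return NORMAL if f["eccentricity"] < 0.4 else ABSTAIN


def probabilistic_label(f: Features, lfs: List[LabelingFunction]) -> float:
    """Estimate P(BAV) as the fraction of non-abstaining votes that say BAV.

    This is an unweighted average; a learned label model would instead
    weight each labeling function by its estimated accuracy.
    """
    votes = [v for v in (lf(f) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return 0.5  # no labeling function fired, so stay uninformative
    return sum(votes) / len(votes)


lfs = [lf_small_area, lf_high_eccentricity, lf_round_opening]

p_agree = probabilistic_label({"valve_area": 1.5, "eccentricity": 0.9}, lfs)
p_conflict = probabilistic_label({"valve_area": 1.5, "eccentricity": 0.3}, lfs)
p_silent = probabilistic_label({"valve_area": 3.0, "eccentricity": 0.6}, lfs)
```

When the labeling functions agree the label is confident, and when they conflict or all abstain it falls back to 0.5, so a downstream model can be trained on soft targets instead of hand-assigned hard labels.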