Building the data-first platform for AI application development at Snorkel AI — join us!
Here's an incomplete list of my reading, writing, and photography.
ML models can achieve strong performance on coarse-grained metrics (e.g., F1-score, overall accuracy), yet still underperform on critical data subsets, or slices. We introduce Slice-based Learning, a new programming model in which practitioners write slicing functions to specify critical data subsets on which the model should focus.
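As a rough illustration, a slicing function maps each example to a boolean indicating membership in a critical subset; per-slice metrics can then be computed over that mask. This is a simplified, self-contained sketch, not the Snorkel API, and the function and field names are hypothetical:

```python
# Hypothetical slicing function: marks short questions, which are often
# ambiguous and may need extra model attention (illustrative only).
def sf_short_question(example: dict) -> bool:
    return len(example["text"].split()) < 5

dataset = [
    {"text": "Why?", "label": 1},
    {"text": "What is the capital of France?", "label": 0},
]

# Boolean membership mask over the dataset; per-slice accuracy/F1 would
# be computed only over examples where the mask is True.
slice_mask = [sf_short_question(x) for x in dataset]
# slice_mask == [True, False]
```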
Scene graphs have emerged as useful in a number of computer vision tasks, including visual question answering; however, most scene graph datasets are sparsely labeled due to annotator error. This work overcomes the limitations of human annotation with a semi-supervised method that leverages both limited labels and unlabeled data to generate training datasets for scene graphs.
We leverage key abstractions in Snorkel to achieve state-of-the-art scores on the SuperGLUE benchmark: (1) weak supervision, (2) data augmentation, and (3) data slicing.
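Of these abstractions, data augmentation can be sketched as a transformation function: a perturbation that maps an example to a label-preserving variant, expanding the training set. A minimal, self-contained sketch (illustrative only, not the Snorkel API):

```python
import random

# Hypothetical transformation function: swaps one random adjacent word
# pair, producing a perturbed copy intended to keep the same label.
def tf_swap_adjacent_words(text: str, seed: int = 0) -> str:
    rng = random.Random(seed)
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)

original = "the cat sat on the mat"
augmented = tf_swap_adjacent_words(original, seed=0)
# Same words, different order: a cheap label-preserving variant.
```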
We incorporate a number of supervision sources, including traditional supervision, transfer learning, multi-task learning, weak supervision, and ensembling, to achieve state-of-the-art scores on the GLUE benchmark.
GLUE Benchmark. [blog]
Bicuspid aortic valve (BAV) is the most common congenital malformation of the heart — obtaining training data is a tremendous practical roadblock to building ML models for detecting this malformation. We collaborate with cardiologists from Stanford Medicine to write labeling functions over geometric features of heart MRIs to produce probabilistic training labels.
Using weak supervision, we learn probabilistic training labels for aortic valve MRIs.
In NeurIPS 2017, ML4H Workshop. [poster]
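The labeling functions described above can be pictured as noisy voters over derived geometric features: each votes for a class or abstains, and a label model would then combine the votes into probabilistic labels. A simplified sketch with hypothetical feature names and thresholds (not from the paper, and not the Snorkel API):

```python
# Hypothetical labeling functions over geometric features of a valve
# segmentation. Each returns BAV, NORMAL, or ABSTAIN; a downstream label
# model aggregates these noisy votes into probabilistic training labels.
ABSTAIN, NORMAL, BAV = -1, 0, 1

def lf_high_eccentricity(features: dict) -> int:
    # Bicuspid valves tend to produce a more elliptical opening.
    return BAV if features["eccentricity"] > 0.8 else ABSTAIN

def lf_round_opening(features: dict) -> int:
    # A near-circular opening suggests a normal tricuspid valve.
    return NORMAL if features["area_ratio"] > 0.9 else ABSTAIN

mri_features = {"eccentricity": 0.85, "area_ratio": 0.6}
votes = [lf(mri_features) for lf in (lf_high_eccentricity, lf_round_opening)]
# votes == [1, -1]: one function votes BAV, the other abstains
```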
Teaching Assistant, Spring 2018
Hosted office hours, advised student projects, and led discussion sections on backpropagation and weak supervision.