Data By the Bay is the first Data Grid conference: a matrix of 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.
Tuesday, May 17 • 10:40am - 11:20am
lda2vec


Standard natural language processing (NLP) is a messy and difficult affair. It requires teaching a computer about English-specific word ambiguities as well as the hierarchical, sparse nature of words in sentences. At Stitch Fix, word vectors help computers learn from the raw text in customer notes. Our systems need to identify a medical professional when she writes that she 'used to wear scrubs to work', and to distill 'taking a trip' into a Fix for vacation clothing. Applied appropriately, word vectors are dramatically more meaningful and more flexible than current techniques, and they let computers peer into text in a fundamentally new way. I'll try to convince you that word vectors give us a simple and flexible platform for understanding text, speaking about word2vec and LDA and introducing our hybrid algorithm, lda2vec.
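To make the "hybrid" idea concrete, here is a minimal sketch, assuming the published lda2vec formulation in which a pivot word vector is added to a document vector, and the document vector is a mixture of topic vectors weighted by softmax-normalized document weights (LDA-style topic proportions). The combined context vector is then scored with a word2vec-style skip-gram negative-sampling loss. All function names and the toy vectors are hypothetical illustrations, not the speaker's code; the Dirichlet sparsity penalty on the proportions and the training loop are omitted.

```python
import math


def softmax(xs):
    """Numerically stable softmax: turns unconstrained weights into proportions summing to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def document_vector(doc_weights, topic_vectors):
    """Hypothetical helper: mix topic vectors by the document's topic proportions."""
    proportions = softmax(doc_weights)
    dim = len(topic_vectors[0])
    return [sum(p * t[i] for p, t in zip(proportions, topic_vectors)) for i in range(dim)]


def context_vector(word_vec, doc_weights, topic_vectors):
    """lda2vec-style context: pivot word vector plus the document's topic-mixture vector."""
    doc_vec = document_vector(doc_weights, topic_vectors)
    return [w + d for w, d in zip(word_vec, doc_vec)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def log_sigmoid(x):
    """log(1 / (1 + exp(-x))), computed without overflow for large |x|."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sgns_loss(context, target_vec, negative_vecs):
    """Skip-gram negative-sampling loss for one (context, target) pair plus sampled negatives."""
    loss = -log_sigmoid(dot(context, target_vec))
    for neg in negative_vecs:
        loss -= log_sigmoid(-dot(context, neg))
    return loss


# Toy example: two hypothetical topics in a 3-dimensional embedding space.
topics = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
ctx = context_vector([0.0, 0.0, 1.0], [0.0, 0.0], topics)  # equal topic weights
print(ctx)
print(sgns_loss(ctx, [1.0, 1.0, 1.0], [[-1.0, 0.0, 0.0]]))
```

With equal document weights the proportions are 0.5 and 0.5, so the context vector is the word vector shifted halfway toward each topic. The design point the sketch tries to show is that word-level semantics (word2vec) and document-level topics (LDA) share one embedding space, so a single loss can train both.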

Speakers

Christopher Erick Moody

Data Scientist, Stitch Fix
Caltech - Astrostats - PhD supercomputing. Now at data labs @stitchfix, coding up word2vec, Gaussian processes, t-SNE, tensors, factorization machines, RNNs, and VI (variational inference).



Tuesday May 17, 2016 10:40am - 11:20am PDT
Ada