Name: Using Spark MLlib for NLP
Start: 2016-05-17T11:40:00-0700
End: 2016-05-17T12:20:00-0700

Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.


Using Spark MLlib for NLP

Apache Spark is most often used to process large amounts of data efficiently, but it is also useful for making the individual predictions common to many NLP applications. The algorithms inside MLlib are useful in their own right, independent of the core Spark framework. IdiML is an open-source tool that enables very fast predictions on textual data by using various components within MLlib. It acts as a standalone tool for core machine learning functionality and can easily be integrated into production systems to provide low-latency, continuous streaming predictions. This talk explores the functionality inside IdiML, how it uses MLlib, and why that makes such a big difference.

Speakers

Michelle Casbon

Senior Engineer, Google

Michelle Casbon is a Senior Engineer at Google, where she focuses on open source for machine learning and big data tools. Prior to joining Google, she was Director of Data Science at Qordoba and a Senior Data Science Engineer at Idibon. In these roles, she built and shipped...

Tuesday May 17, 2016 11:40am - 12:20pm PDT
Markov

Text

Data By the Bay


