Name: A Scalable GA4GH Server Implementation
Start: 2016-05-20T11:10:00-0700
End: 2016-05-20T11:30:00-0700

Data By the Bay is the first Data Grid conference matrix with 6 vertical application areas spanned by multiple horizontal data pipelines, platforms, and algorithms. We are unifying data science and data engineering, showing what really works to run businesses at scale.

Genomics and health-related data come in large volumes, are usually distributed across remote data centers, and carry many constraints related to privacy and confidentiality. Scalability is therefore required at two levels. Within a single data center, distributed computing technologies such as Apache Spark, scalable machine learning libraries, and distributed databases are a good match. At the inter-data-center level, the scheme for sharing data and data-processing methods must be guided by interoperability standards, and the Global Alliance for Genomics and Health (GA4GH) is defining such a standard. We present an implementation of a GA4GH server that uses distributed computing and databases as its back-end engine, thus providing a scalable reference implementation. We also show how to extend the GA4GH server with new functionality, such as requesting model estimation (machine learning) and predictions from those models. Finally, using the Spark Notebook as an interactive tool, we show how to generate a client for the GA4GH server and how to execute methods on the server.
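The abstract mentions generating a client for a GA4GH server and executing methods on it. As a rough illustration only (not the speaker's code, which was built in the Spark Notebook), here is a minimal Python sketch of the paging loop a GA4GH search client typically runs. The `/variants/search` path and the request/response field names are assumptions based on the GA4GH v0.6-era schemas and should be checked against the schema version a given server implements. `post` is a hypothetical transport hook, so the sketch can be exercised without a live server.

```python
import json
from typing import Callable, Dict, Iterator, Optional

# Assumption: path and field names follow the GA4GH v0.6-era
# SearchVariantsRequest/SearchVariantsResponse schemas; verify them
# against the schema version your server actually implements.
SEARCH_PATH = "/variants/search"


def build_search_body(variant_set_id: str, reference_name: str,
                      start: int, end: int, page_size: int = 100,
                      page_token: Optional[str] = None) -> str:
    """Serialize a variants search request as a JSON string."""
    body = {
        "variantSetId": variant_set_id,
        "referenceName": reference_name,
        "start": start,
        "end": end,
        "pageSize": page_size,
        "pageToken": page_token,  # None (JSON null) requests the first page
    }
    return json.dumps(body)


def search_variants(post: Callable[[str, str], Dict], variant_set_id: str,
                    reference_name: str, start: int, end: int,
                    page_size: int = 100) -> Iterator[Dict]:
    """Yield every variant in [start, end), following nextPageToken.

    `post(path, json_body)` is a hypothetical transport hook that sends
    the request to the server and returns the decoded JSON response.
    """
    token = None
    while True:
        body = build_search_body(variant_set_id, reference_name,
                                 start, end, page_size, token)
        response = post(SEARCH_PATH, body)
        yield from response.get("variants", [])
        token = response.get("nextPageToken")
        if not token:  # an empty or missing token marks the last page
            break
```

Injecting the transport keeps the paging logic independent of any particular HTTP library, so the same loop could sit behind a real client or a test double returning canned pages.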

Speakers

Andy Petrella

Cofounder, Data Fellas

Creator of Spark Notebook

Friday May 20, 2016 11:10am - 11:30am PDT
Ada

Life

Data By the Bay
