Multivac Data Science Lab now available to all our partners!

mpanahi · 2018-11-08 14:31:11 UTC

In 2017, we introduced the Multivac Data Science Lab (Multivac DSL): a set of tools, such as interactive notebooks, built on top of a dedicated cloud-based Hadoop cluster for running Apache Spark jobs.

After a successful test phase, we are now thrilled to announce the release of Multivac Data Science Lab to all our partners!

Multivac DSL Success Stories

Over the past 14 months, we have used Multivac DSL at ISC-PIF for the following use cases:

Machine Learning

Topic modeling of Wikipedia and Web of Science with LDA
Building a recommendation model from 100 million Netflix ratings with ALS
Outcome prediction with classification and regression (decision trees, random forests, logistic regression, and naive Bayes)
Clustering keywords and phrases with K-means and Gaussian mixture models (GMMs)

NLP

Implementing Stanford CoreNLP in Apache Spark for distributed NLP
Training Universal Dependencies ML models for multilingual part-of-speech tagging across millions of documents
Implementing distributed NLP pipelines for extracting keywords and phrases from large-scale English and French documents

Graph

Politoscope community detection (100 million tweets)
Distributed Louvain algorithm
Community detection for keywords and topics using label propagation (LPA) and strongly connected components
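
To give a feel for what label propagation does, here is a minimal single-machine sketch in plain Python. It is not the code we ran on Multivac DSL, where the work was distributed with Apache Spark; the function name `label_propagation`, the toy edge list, and the deterministic tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def label_propagation(edges, max_iter=20):
    """Assign a community label to every node of an undirected graph."""
    # Build an adjacency list from the undirected edge list.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Every node starts in its own community.
    labels = {node: node for node in neighbors}

    for _ in range(max_iter):
        changed = False
        # A fixed visiting order keeps this sketch deterministic.
        for node in sorted(neighbors):
            counts = Counter(labels[n] for n in neighbors[node])
            best = max(counts.values())
            # Ties go to the smallest label (an arbitrary, illustrative choice).
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Hypothetical toy graph: two separate triangles of keywords.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f")]
communities = label_propagation(edges)
```

Each node repeatedly adopts the most common label among its neighbors, so densely connected groups converge to a shared label. Real implementations usually randomize the visiting order and tie-breaking, which this sketch skips to stay reproducible.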

ETL

Daily download, cleaning, extraction, and transformation of 150-180 million Wikipedia page views (94 billion in total)
Extracting and transforming Politoscope data for Apache Hive and Spark SQL
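
As a rough illustration of the cleaning and transforming step for page views, the sketch below parses a few sample lines and sums views per project. It assumes a space-separated layout of project code, page title, view count, and response size, which is our understanding of the public hourly dump files; the helper names, the sample lines, and the choice to keep only `en` and `fr` are hypothetical, and the production pipeline ran on Spark rather than in a single Python process.

```python
from collections import defaultdict


def parse_pageview_line(line):
    """Return (project, title, views), or None if the line is malformed."""
    parts = line.split(" ")
    if len(parts) < 3:
        return None
    project, title, views = parts[0], parts[1], parts[2]
    if not views.isdigit():
        return None
    return project, title, int(views)


def total_views_by_project(lines, projects=("en", "fr")):
    """Sum view counts per project, skipping malformed or unwanted lines."""
    totals = defaultdict(int)
    for line in lines:
        record = parse_pageview_line(line.strip())
        if record is None:
            continue
        project, _title, views = record
        if project in projects:
            totals[project] += views
    return dict(totals)


# Made-up sample lines in the assumed format.
sample = [
    "en Main_Page 120 0",
    "fr Wikipédia:Accueil_principal 45 0",
    "en Apache_Spark 30 0",
    "de Hauptseite 10 0",
    "malformed line",
]
totals = total_views_by_project(sample)
```

Keeping the parser separate from the aggregation makes it easy to reuse the same per-line function inside a distributed map step.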

APPLY FOR MULTIVAC DSL

mpanahi · 2019-03-28 09:06:33 UTC