Overview

This demo is to be presented at World of Watson 2016 - "Accelerate Your Data Science Delivery with Integrated Notebooks and IBM BigInsights".

The session link is here: https://www-01.ibm.com/events/global/wow/sessions/#/search/id/DMT-3516

  • The purpose of this Data Science Experience (DSX) project is to show how data from IBM BigInsights on cloud can be analysed using DSX notebooks.
  • This project uses the MovieLens ml-1m dataset (http://grouplens.org/datasets/movielens/) to build a movie recommendation model using PySpark.
  • The ml-1m dataset consists of about 1 million ratings from about 6,000 users on about 4,000 movies. It was released in February 2003.
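
For orientation, the ml-1m files are plain text with fields separated by `::`. The sketch below shows one way to parse a line of ratings.dat (UserID::MovieID::Rating::Timestamp) and a line of movies.dat (MovieID::Title::Genres). The function names `parse_rating` and `parse_movie` are illustrative and are not taken from the project notebooks.

```python
def parse_rating(line):
    """Parse one ratings.dat line: UserID::MovieID::Rating::Timestamp."""
    user_id, movie_id, rating, timestamp = line.strip().split("::")
    return int(user_id), int(movie_id), float(rating), int(timestamp)


def parse_movie(line):
    """Parse one movies.dat line: MovieID::Title::Genres (genres are '|'-separated)."""
    movie_id, title, genres = line.strip().split("::")
    return int(movie_id), title, genres.split("|")


# Sample lines in the ml-1m layout.
sample_rating = "1::1193::5::978300760"
sample_movie = "1::Toy Story (1995)::Animation|Children's|Comedy"

rating_record = parse_rating(sample_rating)
movie_record = parse_movie(sample_movie)
```

In a notebook, the same functions could be passed to `map` on an RDD of text lines once the files have been read into Spark.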

The MovieLens front-end application, where users can rate movies, is available here: https://movielens.org/.
[Screenshot of the MovieLens user interface]

Instructions

The project is split into a number of different notebooks that focus on specific steps:

Step 1 - Provision BigInsights cluster

This notebook shows you how to provision a BigInsights on cloud cluster on Bluemix.
[Notebook link]

Step 2 - Set up BigInsights with MovieLens data

This notebook loads the MovieLens ml-1m dataset onto the cluster.
[Notebook link]

Step 3 - Import data from BigInsights to DSX

In this step, we import the BigInsights ml-1m dataset into DSX.
[Notebook link]

Step 4 - Exploratory analysis

In this notebook, we perform some basic exploratory analysis of the ml-1m dataset before we jump into machine learning.
[Notebook link]
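
As a rough illustration of the kind of checks this step involves, the sketch below computes a few summary statistics from parsed (user, movie, rating) tuples in plain Python. It is an assumption about what "basic exploratory analysis" might cover, not a copy of the notebook; the function name `summarize` is hypothetical, and on the full dataset the same aggregations would more likely run in Spark.

```python
from collections import Counter


def summarize(ratings):
    """Return basic counts and the rating distribution for (user, movie, rating) tuples."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("no ratings to summarize")
    users = {user for user, _, _ in ratings}
    movies = {movie for _, movie, _ in ratings}
    histogram = Counter(rating for _, _, rating in ratings)
    mean_rating = sum(rating for _, _, rating in ratings) / len(ratings)
    return {
        "ratings": len(ratings),
        "users": len(users),
        "movies": len(movies),
        "mean_rating": mean_rating,
        "histogram": dict(histogram),
    }


# Tiny made-up sample, only to show the shape of the input and output.
sample = [(1, 1193, 5.0), (1, 661, 3.0), (2, 1193, 4.0)]
summary = summarize(sample)
```

Checking the number of distinct users and movies and the spread of ratings is a quick way to confirm that the import in the previous step produced the data you expect.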

Step 5 - Train model

Here we use Spark's Machine Learning Library (MLlib) to train a machine learning model on the data.
[Notebook link]
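
A minimal sketch of what training might look like, assuming the notebook uses the RDD-based ALS (alternating least squares) recommender in `pyspark.mllib.recommendation`. The text above only says MLlib, so the choice of ALS, the hyperparameter values and the function names `train_als` and `rmse` are all assumptions made for illustration. The Spark import is deferred into the function so the helper can be read and tried without a Spark installation.

```python
import math


def rmse(pairs):
    """Root mean squared error over (predicted, actual) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("no pairs to evaluate")
    return math.sqrt(sum((pred - actual) ** 2 for pred, actual in pairs) / len(pairs))


def train_als(ratings_rdd, rank=10, iterations=10, reg=0.1, seed=42):
    """Train ALS on an RDD of (user, movie, rating) tuples; return (model, test RMSE).

    The hyperparameter defaults are illustrative, not tuned values.
    """
    from pyspark.mllib.recommendation import ALS, Rating  # requires Spark

    ratings = ratings_rdd.map(lambda r: Rating(*r)).cache()
    training, test = ratings.randomSplit([0.8, 0.2], seed=seed)

    model = ALS.train(training, rank, iterations=iterations, lambda_=reg, seed=seed)

    # Predict the held-out (user, movie) pairs and line them up with the true ratings.
    predictions = model.predictAll(test.map(lambda r: (r.user, r.product))) \
                       .map(lambda r: ((r.user, r.product), r.rating))
    actuals = test.map(lambda r: ((r.user, r.product), r.rating))
    joined = predictions.join(actuals).values()  # RDD of (predicted, actual)

    # Collecting is acceptable for a held-out slice of ml-1m; larger data
    # would call for computing the error on the cluster instead.
    return model, rmse(joined.collect())
```

Holding out part of the data gives an error estimate that can be compared across different values of `rank`, `iterations` and `reg`.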

Step 6 - Predict ratings

In this notebook, we simulate a new user's movie ratings and then use those ratings to predict which movies they would rate highly.
[Notebook link]
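
One common way to handle a new user with MLlib's ALS is to add their ratings to the training data, retrain, and then score the movies they have not rated. The sketch below follows that pattern; whether the notebook does the same is an assumption. `NEW_USER_ID`, `candidate_pairs`, `top_n` and `recommend_for_new_user` are hypothetical names, and the hyperparameters are placeholders.

```python
NEW_USER_ID = 0  # hypothetical id; ml-1m user ids start at 1, so 0 is unused


def candidate_pairs(user_id, all_movie_ids, rated_movie_ids):
    """Build (user, movie) pairs for every movie the user has not rated yet."""
    rated = set(rated_movie_ids)
    return [(user_id, movie) for movie in all_movie_ids if movie not in rated]


def top_n(predictions, n=10):
    """Return the n (movie, predicted_rating) pairs with the highest predicted rating."""
    return sorted(predictions, key=lambda p: p[1], reverse=True)[:n]


def recommend_for_new_user(sc, ratings_rdd, new_user_ratings, all_movie_ids, n=10):
    """Retrain with the new user's (movie, rating) pairs and return their top-n movies.

    ratings_rdd is assumed to hold (user, movie, rating) tuples.
    """
    from pyspark.mllib.recommendation import ALS, Rating  # requires Spark

    new_rdd = sc.parallelize(
        [Rating(NEW_USER_ID, movie, rating) for movie, rating in new_user_ratings]
    )
    model = ALS.train(ratings_rdd.union(new_rdd), 10, iterations=10, lambda_=0.1)

    rated = [movie for movie, _ in new_user_ratings]
    pairs = sc.parallelize(candidate_pairs(NEW_USER_ID, all_movie_ids, rated))
    predictions = model.predictAll(pairs).map(lambda r: (r.product, r.rating)).collect()
    return top_n(predictions, n)
```

Excluding already-rated movies keeps the recommendations to titles the user has not seen in the simulated session.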

Step 7 - Export Spark model to BigInsights

This notebook exports the model built in the previous notebook.
A Scala Spark job is then run on BigInsights that loads the model and predicts a rating for a user.
[Notebook link]

Support

If you have any questions about this project, please contact me at chris.snow@uk.ibm.com

Credits
