
IN QUEST OF AN ALGORITHM TO PREDICT WHICH FILMS SPECIFIC USERS WILL LIKE

Netflix is the largest online DVD rental service in the USA. Users choose online the movies they want to watch and receive a copy at home by mail.

The company wanted to help customers find movies likely to appeal to them, among the thousands available online, based on their previous rentals. To this end, it announced a contest that gave scientists access to its huge database (about 100,000,000 ratings from 500,000 customers on 18,000 movies), challenging them to improve the accuracy of its film recommendation system by 10% or more.
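The contest measured accuracy as the root mean squared error (RMSE) between predicted and actual star ratings, and the 10% target meant a 10% reduction in RMSE relative to Netflix's existing recommender, Cinematch. The arithmetic can be sketched in Python as follows; the function names and toy ratings are hypothetical and are not the contest's official data or scoring code.

```python
import math

def rmse(predicted, actual):
    """Root mean squared error between two equal-length lists of ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

def percent_improvement(baseline_rmse, new_rmse):
    """Relative RMSE reduction of a new model over a baseline, in percent."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Hypothetical toy data: true star ratings (1 to 5) and two sets of predictions.
actual = [4, 3, 5, 2, 1]
baseline = [3, 3, 4, 3, 2]   # errors 1, 0, 1, 1, 1 -> MSE 0.8
improved = [4, 3, 4, 2, 2]   # errors 0, 0, 1, 0, 1 -> MSE 0.4

base_score = rmse(baseline, actual)
new_score = rmse(improved, actual)
print(f"{base_score:.4f}")                                   # 0.8944
print(f"{new_score:.4f}")                                    # 0.6325
print(f"{percent_improvement(base_score, new_score):.2f}")   # 29.29
```

Under this definition a submission reaches the 10% target when its RMSE is at most 0.9 times the baseline RMSE.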

The research team of Nikolaos Ampazis, Assistant Professor of Computer Science at the University of the Aegean, in cooperation with two other high-ranking teams, won 2nd place in this global data analysis competition in 2009.

Applying analytical techniques and machine learning to large-scale data mining problems

ACQUIRING ADVANCED EXPERTISE

The Netflix Prize competition has been very important for computer science research, since the released dataset was by far the largest ratings dataset ever made available to the research community. Meeting this challenge required expertise in data management, advanced adaptive and non-linear machine learning models, and significant innovations in the “intelligent” combination of different models.

The company’s customer-by-movie ratings matrix was extremely sparse: only about 1% of its entries contained a rating, since most customers had rated only a few movies. Ratings therefore had to be treated both as predictor variables and as dependent variables. Surpassing the 10% barrier required pushing existing modeling techniques well beyond their previous limits.
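One widely used way to handle such a sparse matrix in the competition was a latent-factor (matrix factorization) model trained by stochastic gradient descent over the observed ratings only. The Python sketch below is a generic illustration of that technique, not the Feeds2 team's method; the function name `train_mf`, the toy ratings, and the hyperparameters (`k`, `lr`, `reg`, `epochs`) are hypothetical choices. Each rating serves as a training target while the learned factors encode information from the user's and the movie's other ratings, which is one way ratings act as both predictors and dependent variables.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + dot(P[u], Q[i]) by SGD on observed ratings.

    ratings: list of (user, item, rating) triples. Empty cells of the matrix
    are never visited, which is how the model copes with sparsity.
    """
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_u = [0.0] * n_users  # per-user bias
    b_i = [0.0] * n_items  # per-movie bias
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]

    def predict(u, i):
        return mu + b_u[u] + b_i[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # Regularized gradient steps on the biases and the latent factors.
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            for f in range(k):
                p, q = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q - reg * p)
                Q[i][f] += lr * (err * p - reg * q)
    return predict

# Hypothetical toy data: 4 users x 4 movies, only 9 of 16 cells rated.
ratings = [(0, 0, 5), (0, 1, 4), (0, 3, 1), (1, 0, 4), (1, 2, 1),
           (2, 1, 2), (2, 2, 5), (3, 2, 4), (3, 3, 5)]
predict = train_mf(ratings, n_users=4, n_items=4)
print(round(predict(1, 1), 2))  # estimate for a cell that user 1 never rated
```

Real models of this kind were trained on the full dataset with many more factors and tuned hyperparameters; the toy version only shows the mechanics.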

Mr. Ampazis’ team, consisting of himself and his research partner George Tsagas, worked on the problem for 2.5 years as an independent team (“Feeds2”), developing many innovative models and techniques for combining them. Together with two other high-ranking groups, they formed the team “The Ensemble”, which showed that this kind of modeling can shed light on business issues and market dynamics and help predict customer behavior.
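A simple form of combining models is a linear blend whose weights are fitted by least squares on held-out ratings, such as the probe set distributed with the Netflix Prize data. The Python sketch below blends two hypothetical models' predictions; the function name and numbers are illustrative assumptions, and the blending used by Feeds2 and The Ensemble was considerably more elaborate than this.

```python
def fit_linear_blend(preds_a, preds_b, targets):
    """Least-squares weights (w_a, w_b) minimizing sum((t - w_a*a - w_b*b)^2).

    Solves the 2x2 normal equations with Cramer's rule.
    """
    s_aa = sum(a * a for a in preds_a)
    s_bb = sum(b * b for b in preds_b)
    s_ab = sum(a * b for a, b in zip(preds_a, preds_b))
    s_at = sum(a * t for a, t in zip(preds_a, targets))
    s_bt = sum(b * t for b, t in zip(preds_b, targets))
    det = s_aa * s_bb - s_ab * s_ab
    if abs(det) < 1e-12:
        raise ValueError("predictions are (nearly) collinear; weights are not identifiable")
    w_a = (s_at * s_bb - s_ab * s_bt) / det
    w_b = (s_aa * s_bt - s_ab * s_at) / det
    return w_a, w_b

# Hypothetical held-out predictions from two models, with targets built
# as an exact 0.7 / 0.3 mix so the recovered weights are easy to check.
model_a = [3.8, 2.9, 4.6, 1.7]
model_b = [4.2, 3.3, 3.9, 2.4]
held_out = [0.7 * a + 0.3 * b for a, b in zip(model_a, model_b)]

w_a, w_b = fit_linear_blend(model_a, model_b, held_out)
print(round(w_a, 6), round(w_b, 6))  # 0.7 0.3
blended = [w_a * a + w_b * b for a, b in zip(model_a, model_b)]
```

Fitting the weights on held-out data rather than on the training ratings keeps the blend from rewarding models that merely overfit.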

Netflix Prize data analysis award, 2009

As a founding member of “The Ensemble”, Mr. Ampazis’ team won 2nd place. The Ensemble tied the winning team's score with an improvement of 10.06%, but the tie was decided in favor of the earlier submission, and The Ensemble had submitted its results 20 minutes later, in the closing minutes of the contest. Competing independently, Feeds2 took 3rd place on the leaderboard.

The Netflix Prize competition, with its $1 million Grand Prize, lasted more than 2.5 years. It attracted more than 50,000 registered entrants, computer scientists and mathematicians from 186 different countries.

UNIVERSITY OF THE AEGEAN

Nikolaos Ampazis, Assistant Professor, Intelligent Data Exploration and Analysis Laboratory (IDEAL), Department of Financial and Management Engineering

George Tsagas, Researcher

http://www.netflixprize.com/leaderboard
http://labs.fme.aegean.gr/ideal/