UNIVERSITÀ DEGLI STUDI DI MILANO
FACOLTÀ DI SCIENZE MATEMATICHE, FISICHE E NATURALI
DIPARTIMENTO DI TECNOLOGIE DELL'INFORMAZIONE
SCUOLA DI DOTTORATO IN INFORMATICA
Disciplinary sector INF/01

PH.D. THESIS, CYCLE XXIII

SERENDIPITOUS MENTORSHIP IN MUSIC RECOMMENDER SYSTEMS

Eugenio Tacchini

Advisor: Prof. Ernesto Damiani
Director of the Doctoral School: Prof. Ernesto Damiani
Academic Year 2010/2011

Acknowledgements

I would like to thank all the people who helped me during my Ph.D. First of all I would like to thank Prof. Ernesto Damiani, my advisor, not only for his support and the knowledge he imparted to me but also for his ability to understand my needs and for having let me follow my passions; thanks also to all the other people of the SESAR Lab, in particular to Paolo Ceravolo and Gabriele Gianini.

Thanks to Prof. Domenico Ferrari, who gave me the possibility to work in an inspiring context after my graduation, helping me to understand the direction I had to take.

Thanks to Prof. Ken Goldberg for having hosted me in his laboratory, the Berkeley Laboratory for Automation Science and Engineering at the University of California, Berkeley, a place where I learnt a lot; thanks also to all the people of the research group and in particular to Dmitry Berenson and Timmy Siauw for the very fruitful discussions about clustering, path searching and other aspects of my work.

Thanks to all the people who agreed to review my work: Prof. Richard Chbeir, Prof. Ken Goldberg, Prof. Przemysław Kazienko, Prof. Ronald Maier and Prof. Robert Tolksdorf.

Thanks to 7digital, our media partner for the experimental test, and in particular to Filip Denker.

Thanks to Letizia Guagnini for the graphic design of the interface of Mentor.FM and to Simone Magnaschi for the HTML/CSS/JS conversion of the design.
Thanks to Filippo Losi, Leonora Fortunati and Camilla Sampo, who helped me to better understand the different communities behind some of the musical worlds found.

Thanks to the people of stackoverflow.com for their willingness to help others using their impressive skills; thanks in particular to Itamar Katz for the precious suggestions about the optimization of the boolean similarity computation.

Thanks to Marco della Vedova for the suggestions and the fruitful discussions.

Thanks to all the friends who inspired and supported me during these years.

Finally, thanks a lot to my family for supporting me over the years, and thanks again to my father who, when I was 8 years old, bought me a Commodore home computer, by mistake. Considering that computer science became for me first a passion and then a job, I would say it was an excellent example of serendipity.

Abstract

Nowadays the amount of content and products easily available on-line for purchase or fruition is so high that recommender systems represent an important resource for users who want suggestions about items (songs, movies, books, news, products in general...) they might like. For many years, research in the field of recommender systems focused on improving accuracy, i.e. improving the precision with which the systems predict the rating that a given user would give to a given item. In recent years, an increasing number of efforts have been directed towards other important aspects such as novelty, diversity and serendipity of recommendations. In particular, with serendipity, in this context, we refer to the ability of a recommender system to propose recommendations that are both unexpected and liked. Serendipity is likely the aspect which has received the least attention, and it is the one we focus on most in this work.
The aim of this thesis is to propose techniques which can be adopted by recommender system designers in order to increase serendipity while keeping an acceptable level of precision of the recommendations. We work in the domain of music, which is a particularly suitable context for proposing non-obvious recommendations, mainly because bad recommendations cost less than in other domains (listening to a song a user dislikes is not very time-consuming).

The work proposes a collaborative-filtering method to classify artists, based on the Affinity Propagation clustering algorithm and on listening logs as data source. The classification, together with a list of the artists a user likes, is used to detect which musical clusters (called "musical worlds") the user is not familiar with. A technique to synthetically represent each cluster, based on freely chosen keywords (folksonomy), is also presented.

A novel recommendation method based on gradual exposure and on a variation of the user-based collaborative filtering approach is proposed. This method exploits the knowledge of the most eclectic users (we decided to call them "mentors") to choose, from the unfamiliar musical clusters, the ones which are more likely to contain serendipitous music for the active user. Once a target musical cluster has been chosen, a playlist is created, which starts with songs by artists who tend to be borderline with respect to the user's taste and continues with songs by artists who tend to be, gradually, closer to the most representative artist of the target cluster.

A real music recommendation radio has been developed, implementing the proposed techniques and a traditional top-10 item-based recommender. The radio has been used as a validation test, considering the traditional recommender as a baseline to define which recommendations were expected and which ones were unexpected.
The test session suggested that the proposed approach outperforms a method which relies on randomness, both in terms of a novel measure called serendipity cost (the total number of disliked songs over the total number of serendipitous songs) and in terms of cohesion. At the same time, it maintains a total cost (the total number of disliked songs over the total number of liked songs, which can be considered an index of precision) which is much lower than the cost of the random approach and closer to the cost of a traditional item-based recommender system (1.03 for the proposed method, 0.46 for the traditional recommender, 2.77 for the random approach).

The method we proposed to choose and order the intermediate artists in a playlist, based on graph search techniques, is used to gradually expose the user to the target musical world, following the intuition that showing a connection between the target musical world and the music the user is closer to can help him to accept the (unexpected) recommendation. This method, however, can itself be considered an achievement of this work and can be applied not only in this context but whenever a playlist must be produced automatically, given as input the first artist, the last artist and a cohesion constraint (distance in a playlist between an artist and the following one; see footnote 1).
1 Note that cohesion in the literature is usually defined as the average distance in a playlist between a song and the following one, so in this sentence the term is used in a broader sense.

Contents

1 Introduction
Recommender Systems
Definition
Aims
Data sources
Techniques
Content-based
Collaborative-filtering based
User-based
Item-based
Hybrid
A comparison between content-based and collaborative-filtering-based techniques
Recommender Systems in the Music Domain
Peculiarities
Huge items space
Low cost/consumption time
Very high per-item reuse
Contextual and mood usage
Consumed in sequence
Implicit feedback evaluation
Applications
Pandora
Last.fm
Musicovery
The Echo Nest
Motivations, goals and state of the art
Overspecialisation problem
Diversity, novelty, serendipity and user satisfaction
2.3 Problem statement
Related work
Proposed approach
Dataset
Similarity measures
Introduction
Boolean similarity
Pearson Correlation similarity
Folksonomy-based similarities
Introduction to folksonomies
Folksonomies in the music domain
Music folksonomies and Music Information Retrieval / Recommendations
Similarity measures
Artists' classification
Introduction
Affinity Propagation clustering and Musical Worlds detection
The Affinity Propagation algorithm
Application to our Artists' classification problem
How to represent a Musical World
Examples and first discussion
Eclecticism level evaluation
Mentor approach
Playlist generation
Introduction
Playlist generation, given in input the artists
Playlist generation, given the first and the last artist
Introduction
The Artists' graph
Application of the Floyd's Algorithm
Examples
Application: mentor.fm
Introduction
Interface and features
Technologies
Playlists creation and modalities
Normal playlist, random profile
Normal playlist, dj profile
Surprise me! playlist, random profile
Surprise me! playlist, mentor profile
Assignment of user profiles
Evaluation results
Evaluation of Artists' Classification
Metrics
Results
5.2 Evaluation of the surprising suggestions
Metrics
Results
Discussion
Evaluation of the cohesion
Metrics
Results
Discussion
Conclusions, limitations and further research
Contributions
Limitations and future research
References
Appendix A
Appendix B

1 Introduction

1.1 Recommender Systems

Definition

Recommender systems are personalized information agents that provide recommendations: suggestions for items likely to be of use to a user [Burke, 2007]. "Items" is a generic term used in the literature to denote the object of a recommendation; recommender systems can in fact work on very different kinds of items. For example, they can suggest: which books to read; which music to listen to; which movies to watch; which friends to add in a social network. Usually, a recommender system focuses on a specific kind of item and is designed and developed to work effectively on that [Ricci et al., 2010].

Recommender systems emerged as a research area in the 1990s and interest in the field has increased in recent years, also because of the growth of e-commerce web sites, which has multiplied the options available to users, making the process of choosing harder (see footnote 2) [Ricci et al., 2010].

Aims

Recommender systems are used by final users for different kinds of tasks; we summarize here the most common ones, considering the analysis provided by [Herlocker et al., 2004] and [Ricci et al., 2010].

Find some good items. This is probably the most common task: users ask recommender systems to automatically extract, from a collection of items, the subset which is the most interesting for them (according to the predictor algorithm of the recommender system). This problem is often called top-n, because just the first n (e.g. 10) most important items are returned.
An example could be a movie recommender system which recommends to a user, according to his profile, the top 10 movies he would enjoy most.

Find all good items. Sometimes users need recommendations to extract all the items which can be considered interesting for a particular need: imagine for example a prior art search application which helps a user, according to a topic, to find all the related patents, publications and public discussions, extracting them from an archive.

While the difference between the first and the second task may seem trivial, the design of the system can change a lot depending on the task: for the first task the system should above all minimize the number of false positives, preferring accuracy to coverage; for the second task the system should above all minimize the number of false negatives, preferring coverage to accuracy. Let's consider the prior art search example described above: in that scenario, a user can accept a small number of false positives (items the system thinks are relevant even if they are not), but he would hardly accept false negatives, because they would prevent him from satisfying his need (i.e. finding all good items).

2 For some work about the relation between availability of choice and user's benefit see [Schwartz, 2004].

Bundle recommendation. In this scenario the recommender system suggests a group of items. For example, an e-commerce web site selling desktop computers could provide, through a recommendation engine, suggestions about sets of computer components (motherboard, video card, hard disk, etc.) which work well together, according to the preferences of the user on the single items.

Sequence recommendation. In this scenario the user asks for an ordered sequence of items, and the order of the items matters.
An example could be a music playlist recommender system, or a system which recommends a sequence of publications to gradually introduce a user to a topic, depending on the confidence level he has.

Group recommendation. In this scenario the recommender system provides recommendations not for a single user but for a group of people, trying to aggregate the profiles of all the users belonging to the group. For example, a movie recommender system could suggest a movie to a group of friends who meet weekly to watch TV together.

Annotation in context. This was the first recommender system scenario and, even if it is probably no longer a very common task for the typical recommender application, we present it to give a wider overview. In this kind of use, the system presents to a user both the good and the bad items, annotating them according to their relevance for the user. An example could be a news reader which shows all the available items to the users, labeling them with different colours according to how much the user is presumably interested.

Data sources

Data used by recommender systems refer to three kinds of objects: items, users and transactions [Ricci et al., 2010].

Items can be represented in several ways, depending on the recommendation technique used by the system; for example, collaborative-filtering-based recommender systems (see the Collaborative-filtering based section) represent items using the ratings the users gave to them, while content-based recommender systems (see the Content-based section) represent items using some attributes whose values depend on the characteristics of each item.
The representation of users also depends on the technique used by the recommender system: different kinds of user models can be used to describe a user and give, accordingly, personalized recommendations. While a collaborative-filtering-based recommender system simply represents users through the ratings they assigned to items, other approaches could model users according to demographic data or other kinds of knowledge.

Transactions, finally, are interactions between users and items. The typical transaction recorded by recommender systems is a rating that a user assigned to an item. The rating can be explicit (e.g. a numeric score within a fixed range) or implicit, i.e. an evaluation derived from the user's behaviour (e.g. if a user rented movies by a specific director several times, a high rating for that director can be assumed).

Techniques

Recommender systems are usually classified into three main categories: content-based recommender systems, collaborative filtering-based recommender systems and hybrid recommender systems [Adomavicius and Tuzhilin, 2005].

Content-based

Content-based recommender systems rely on the analysis of the items to be suggested: each item (e.g. a document, a song, a movie...) is analyzed by feature extraction techniques in order to represent its content in a specific information space; a document, for example, can be described through a keywords vector. Typically, a profile of the user is created considering the items he rated in the past, and the recommendation process consists in comparing the user profile against the representations of the available items in order to suggest other items similar to the ones the user liked in the past [Ricci et al., 2010].

In order to create the profile of the user, two different techniques can be used: explicit feedback and implicit feedback.
The explicit feedback technique requires the user to evaluate the items, while the implicit feedback technique infers the evaluation from the activity of the user; for example, if a user purchases an item or repeatedly listens to a song, the system can infer positive feedback on the item or on the song [Ricci et al., 2010]. Different strategies can be adopted to get explicit feedback; the most used are: like/dislike: users classify items using a binary rating scale; ratings: users classify items using a numeric scale; text comments: users give descriptive feedback by writing a text comment about an item [Ricci et al., 2010]. The feedback techniques described are not specific to content-based recommender systems and are in fact also used with other types of recommender systems.

Collaborative-filtering based

Collaborative filtering (CF) is the process of filtering or evaluating items through the opinions of other people [Schafer et al., 2007]. Typically, these systems work on a users-items ratings matrix (where, for each user-item pair, the matrix provides, if available, the rating the user gave to the item), trying to estimate the ratings for the user-item pairs not yet available. In detail, the main idea behind collaborative filtering techniques is that the rating of a user u_x for an item i_k should be similar to the one another user u_y gave to the same item if u_x and u_y are similar, i.e. if they rated other items similarly. Following the same assumption from a different perspective, the ratings of a user u_x for two items i_k and i_z should be similar if the two items are similar, i.e. if other users rated these two items similarly [Ricci et al., 2010].
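The two readings of the similarity assumption (similar users, similar items) can be sketched on a small ratings matrix. This is only an illustration, not the implementation used in this work: the ratings are made up, the identifiers u_v, i_w and i_q are hypothetical additions to the symbols used above, and the choice of the Pearson correlation over co-rated entries as similarity measure is an assumption of the example.

```python
from math import sqrt

# Hypothetical users-items ratings matrix stored as nested dicts.
# A missing key means that the rating is not available (e.g. u_x never rated i_q).
RATINGS = {
    "u_x": {"i_k": 5, "i_z": 4, "i_w": 1},
    "u_y": {"i_k": 4, "i_z": 5, "i_w": 2, "i_q": 3},
    "u_v": {"i_k": 1, "i_z": 2, "i_w": 5, "i_q": 4},
}


def pearson(a, b):
    """Pearson correlation between two equally long lists of ratings."""
    n = len(a)
    if n < 2:
        return 0.0  # not enough co-rated entries to measure a correlation
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant ratings carry no correlation information
    return cov / sqrt(var_a * var_b)


def user_similarity(ratings, u, v):
    """Compare two users (two rows) on the items both of them rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    return pearson([ratings[u][i] for i in common],
                   [ratings[v][i] for i in common])


def item_similarity(ratings, i, j):
    """Compare two items (two columns) on the users who rated both of them."""
    common = sorted(u for u in ratings if i in ratings[u] and j in ratings[u])
    return pearson([ratings[u][i] for u in common],
                   [ratings[u][j] for u in common])


if __name__ == "__main__":
    print("u_x vs u_y:", user_similarity(RATINGS, "u_x", "u_y"))
    print("u_x vs u_v:", user_similarity(RATINGS, "u_x", "u_v"))
    print("i_k vs i_z:", item_similarity(RATINGS, "i_k", "i_z"))
    print("i_k vs i_w:", item_similarity(RATINGS, "i_k", "i_w"))
```

With these made-up values, u_x and u_y turn out to be positively correlated while u_x and u_v are negatively correlated, so under the first reading the ratings of u_y would be the ones to trust when estimating a missing rating for u_x. The same computation applied to columns instead of rows gives the item-based reading.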
For some classic work on collaborative filtering, see [Breese et al., 1998], [Sarwar et al., 2001] and [Goldberg et al., 2001].

Collaborative filtering techniques can be classified in two categories: memory-based and model-based [Breese et al., 1998]. In memory-based collaborative filtering recommender systems, the ratings stored in the system are used directly to predict ratings for user-item pairs which are still not available in the system; model-based techniques, instead, use the ratings stored in the system to produce a predictive model and use the model to predict the ratings for user-item pairs not available. Some examples of model-based approaches are Bayesian Clustering, Latent Semantic Analysis and Singular Value Decomposition [Ricci et al., 2010]. Memory-based approaches are further classified in two different categories: user-based and item-based [Candillier et al., 2007, Ricci et al., 2010]; we present these two approaches in detail in the following two paragraphs.

User-based

The user-based approach predicts the rating that a user u_i will give to an item i_x according to the ratings that other users similar to u_i (called "neighbours") gave to the same item. The similarity between the users (i.e. the computation of the neighbours set) is based on the ratings the users gave to other items [Ricci et al., 2010].

In Figures 1, 2 and 3 a toy example of a user-based collaborative filtering recommender system is presented: in our hypothetical movie recommender system, there are five users and four movies, and each user can rate each of the movies using a discrete numerical value in the range 1...5, where 1 means the user does not like the movie and 5 means the user likes the movie very much. The recommender system tries to predict the ratings for the (user, item) pairs which still do not have a rating associated, and proposes the corresponding item to the user if the predicted rating is 4 or 5.
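A setup of this kind can be sketched in Python as follows. The sketch is hedged in several ways: the ratings are made up and are not the values shown in the figures; the user names other than Tom and the titles "Movie A", "Movie B" and "Movie C" are hypothetical; and both the Pearson similarity and the similarity-weighted average over the k = 2 most similar neighbours are assumptions, being just one common way of turning neighbours' ratings into a prediction.

```python
from math import sqrt

TARGET = "Lost in Translation"

# Made-up ratings on the 1...5 scale (NOT the values of the figures).
# Five users, four movies; Tom has not rated the target movie yet.
RATINGS = {
    "Tom":  {"Movie A": 5, "Movie B": 4, "Movie C": 1},
    "Ann":  {"Movie A": 5, "Movie B": 5, "Movie C": 1, TARGET: 5},
    "Bob":  {"Movie A": 4, "Movie B": 4, "Movie C": 2, TARGET: 4},
    "Carl": {"Movie A": 1, "Movie B": 2, "Movie C": 5, TARGET: 1},
    "Dana": {"Movie A": 2, "Movie B": 1, "Movie C": 4, TARGET: 2},
}


def pearson(ratings, u, v):
    """Pearson correlation between users u and v over the items both rated."""
    common = [i for i in ratings[u] if i in ratings[v]]
    n = len(common)
    if n < 2:
        return 0.0
    mean_u = sum(ratings[u][i] for i in common) / n
    mean_v = sum(ratings[v][i] for i in common) / n
    cov = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    std_u = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common))
    std_v = sqrt(sum((ratings[v][i] - mean_v) ** 2 for i in common))
    if std_u == 0 or std_v == 0:
        return 0.0
    return cov / (std_u * std_v)


def predict(ratings, user, item, k=2):
    """Predict the rating of `user` for `item` from the k most similar users."""
    # Neighbour selection: users who rated the item, ranked by similarity,
    # keeping only positively correlated ones.
    candidates = [(pearson(ratings, user, other), other)
                  for other in ratings
                  if other != user and item in ratings[other]]
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return None  # no usable neighbour: no prediction
    # Aggregation (assumption): similarity-weighted average of their ratings.
    total = sum(sim for sim, _ in neighbours)
    return sum(sim * ratings[other][item] for sim, other in neighbours) / total


def should_recommend(predicted, threshold=4):
    """Propose the item when the predicted rating reaches the threshold."""
    return predicted is not None and predicted >= threshold


if __name__ == "__main__":
    p = predict(RATINGS, "Tom", TARGET)
    print("predicted rating:", p, "recommend:", should_recommend(p))
```

With these made-up ratings, the two users most similar to Tom rated the target movie 5 and 4 and are equally similar to him, so the predicted rating is about 4.5 and the movie would be proposed.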
In the instance presented, all the users rated all the movies except for Tom, who rated only three movies out of four. We want to predict the rating that Tom would give to the movie Lost in Translation, in order to understand whether we should recommend it to him or not. This can be accomplished in two steps: 1. Select Tom's nei