A combination of collaborative filtering approaches and contextual information can help overcome unpredictable behavior while taking into account position and layout bias for more effective recommendation.
It’s no exaggeration to say that personalised and contextualised news recommendation is a challenge. By definition, news articles have a very short lifespan: those that are popular today can be ‘old hat’ the next morning, fresh articles are published by the minute, and their novelty can often attract more readers than the previous headlines. As if that wasn’t complicated enough, you need to factor in fairly unpredictable user behavior made up of a complex mix of long-term preferences and short-term needs. Behavior is more often guided by serendipity, curiosity and the surprise effect than by clear, logical intent. Another influencing factor is the spatial context, especially for “local news” items that are really only of interest to readers in a specific geographical area. And let’s not forget that news items from different sources often cover the same events and differ only in focus. So, even if two articles overlap by 98%, it’s difficult to rank one with respect to the other when recommending them to a reader. How to formally define fairness with respect to differing viewpoints, so as to respect and represent social, political and philosophical diversity, remains an open problem.
At Naver (5th biggest search engine with 30 million monthly active users), the AiRS Team (AI-based Recommender Systems*) has been developing personalised news recommendation algorithms for several years. The initial objective was to simultaneously provide readers with very popular, non-personalised suggestions of headlines (typically hot topics and breaking news) and, in a specific part of the web page, display a personalised list of news snippets. The Search and Recommendation research team in Europe began to collaborate with AiRS at the beginning of 2018 on the design of hybrid recommendation algorithms. “Hybrid” means the algorithms combine two families of approaches. “Collaborative Filtering” (CF) approaches exploit co-click patterns between users, so the system is able to recommend interesting articles simply because users similar to you like them. “Content-based” approaches analyse the content and some of the metadata of an article and detect that it is close to others you liked previously.
The recommendation system captures long-term preferences with CF techniques such as tensor factorisation of the “(user x news x context)” tensor, which is regularly updated. This tensor simply represents the history of user-news interactions, recording that user u clicked on news n in context c. For each user, the recommendation system also captures short-term intents by building, in an online and incremental way, a reader profile based on an efficient and effective low-dimensional representation of the sequence of clicked and unclicked news items. Both long-term and short-term preferences are expressed as user relevance scores, which can be considered as personalised attractiveness measures between a user and a candidate news article to be recommended. Other factors, such as the context (time and/or location of the user), the quality of the images associated with the news item and its popularity trend, are then combined with these user-specific multi-temporal relevance scores using an “orchestrator” learning-to-rank algorithm. This orchestrator model uses previous recommendation logs to best predict the news articles with the highest likelihood of being clicked. Diversity is also introduced in the recommendation list by working with clusters of news items that represent the same event(s) or related events but are reported by different publications and from different perspectives. This gives the reader a good spread of the main events or stories that may interest them (inter-cluster variety) with the possibility of diving into them to discover their associated facets and viewpoints (intra-cluster variety).
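To make the tensor idea concrete, here is a minimal sketch of how a factorised “(user x news x context)” tensor can produce a relevance score. It assumes a CP-style (CANDECOMP/PARAFAC) factorisation, which is one common choice and not necessarily the one used in production. The factor values, the two latent dimensions and the names `cp_score` and `rank_news` are all made up for illustration, and the learning of the factors from click logs is left out.

```python
from typing import Dict, List, Sequence


def cp_score(user: Sequence[float], news: Sequence[float], context: Sequence[float]) -> float:
    """Reconstructed tensor cell: sum over k of user[k] * news[k] * context[k]."""
    return sum(u * n * c for u, n, c in zip(user, news, context))


def rank_news(
    user: Sequence[float],
    context: Sequence[float],
    news_factors: Dict[str, Sequence[float]],
) -> List[str]:
    """Order candidate news ids by decreasing reconstructed score."""
    scores = {nid: cp_score(user, vec, context) for nid, vec in news_factors.items()}
    return sorted(scores, key=lambda nid: scores[nid], reverse=True)


# Hypothetical factors with two latent dimensions, loosely "sport" and "politics".
user_u = [0.6, 0.4]
contexts = {"morning": [1.0, 0.2], "evening": [0.2, 1.0]}
news_factors = {"football_results": [1.0, 0.0], "election_update": [0.0, 1.0]}

for name, ctx in contexts.items():
    print(name, rank_news(user_u, ctx, news_factors))
```

With these toy values the same user gets the football article first in the morning context and the election article first in the evening context, which is the kind of context-dependent preference the third tensor mode is meant to capture.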
We can illustrate the underlying techniques by looking at two sub-problems we solved.
The first was to determine, at each step, which part of the user history the algorithm should pay attention to in order to better predict the next “clicked” item. This amounts to determining, from the context, whether the short, medium or long term should be taken into account, and with what weight. We use a relatively common concept in deep learning, that of “Attention Mechanisms”, to capture this kind of flexible temporal dependency. So, in principle, the same model could capture periodic reading patterns, e.g. looking at the football results every Monday morning, as well as very serendipitous navigation, such as accessing an event related to an article that was just read and that made the user want to know more.
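As a rough illustration of the attention idea, the sketch below applies plain scaled dot-product attention to a short click history. It is an assumption-laden toy and not our actual model: the embeddings are invented two-dimensional vectors, the “query” stands in for whatever representation of the current context the real system uses, and the function name `attend` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], history: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Return (attention weights over the history, weighted-sum reader profile)."""
    dim = len(query)
    # Similarity between the current context and each past click, scaled by sqrt(dim).
    scores = [sum(q * h for q, h in zip(query, item)) / math.sqrt(dim) for item in history]
    weights = softmax(scores)
    profile = [sum(w * item[k] for w, item in zip(weights, history)) for k in range(dim)]
    return weights, profile


# Hypothetical embeddings of three past clicks, oldest first.
history = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
# Hypothetical encoding of the current context (e.g. "Monday morning").
query = [2.0, 0.0]

weights, profile = attend(query, history)
print([round(w, 3) for w in weights], [round(p, 3) for p in profile])
```

The weights sum to one, and the history items most aligned with the query dominate the resulting profile regardless of how long ago they were clicked. This is what lets a single model weigh short, medium and long-term history differently depending on the context.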
The second problem is the “position bias”, where the higher-ranked item is clicked more often even when several news items are equally relevant. This is coupled with other biases, such as the “trust” bias (people click on the top item because they trust the system) or the “layout” bias (items with a nice thumbnail are more likely to be clicked). These biases make it hard to derive a true relevance signal from noisy click information. Based on concepts such as counter-factual risk minimisation and “propensity weights” (the way the standard “importance sampling” strategy is implemented in this use case), we’re able to approximately decouple the bias from the true relevance and infer unbiased ranking metrics for new rankers using the historical datasets. This gives us a good idea of how the new rankers will perform in a real setting without running any on-line experiments such as A/B testing or interleaving. It lets us compare different algorithms with the current ranker and measure the relative performance improvements. Of course, once we have a method that’s stable, robust, scalable and that fulfils operational constraints, on-line experiments remain unavoidable.
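The propensity-weighting idea can be sketched as an inverse-propensity-scored (IPS) estimate of a DCG-like metric computed from logged clicks. This is a simplified illustration under strong assumptions and not our production estimator: it models position bias only, it takes the per-position examination propensities as given (in practice they have to be estimated), and the sessions, scores and function names are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

# One logged session: (ranking shown to the user, set of items clicked).
Session = Tuple[List[str], Set[str]]


def ips_dcg(
    sessions: Sequence[Session],
    ranker: Callable[[List[str]], List[str]],
    propensity: Sequence[float],
) -> float:
    """IPS estimate of the DCG the given ranker would obtain on the logged sessions."""
    total = 0.0
    for shown, clicked in sessions:
        new_rank = {item: r for r, item in enumerate(ranker(shown))}
        for logged_rank, item in enumerate(shown):
            if item in clicked:
                gain = 1.0 / math.log2(new_rank[item] + 2)  # DCG discount at the new rank
                total += gain / propensity[logged_rank]     # undo the logged position bias
    return total / len(sessions)


# Assumed probability that a user examines positions 0, 1 and 2.
propensity = [1.0, 0.5, 0.25]
sessions: List[Session] = [(["a", "b", "c"], {"c"}), (["a", "b", "c"], {"a"})]


def current_ranker(shown: List[str]) -> List[str]:
    """Keeps the logged order."""
    return list(shown)


candidate_scores = {"c": 3.0, "a": 2.0, "b": 1.0}  # hypothetical model scores


def candidate_ranker(shown: List[str]) -> List[str]:
    """Re-orders the same items by the hypothetical scores."""
    return sorted(shown, key=lambda item: candidate_scores[item], reverse=True)


print(ips_dcg(sessions, current_ranker, propensity))
print(ips_dcg(sessions, candidate_ranker, propensity))
```

A click at a rarely examined position counts for more, because dividing by a small propensity compensates for how unlikely the user was to see that item at all. On this toy log the candidate ranker gets a higher estimate than the current one because it promotes an item that was clicked despite being shown last.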
*About AiRS: AiRS (AI Recommender System) is applied across a variety of Naver services, including news, blogs, videos and Webtoon (digital comic app). It uses collaborative filtering, deep learning and reinforcement learning to overcome the cold-start problem and to improve the accuracy of the recommendations. AiRS also incorporates large-scale data refinement and serving techniques based on YARN containers, capable of handling up to 10,000 TPS.