Data Science Applications - Recommender Systems

8 important questions on Data Science Applications - Recommender Systems

What is a Rating Matrix?

Rows represent users, columns represent items. Each cell holds the score a user gave an item.

Sparsity problem: say we have 1 million users and 10,000 items. The chance that two users have rated many items in common is very small. A solution is to store the ratings as lists so that empty cells are skipped, and to normalize the scores so they become more comparable.
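
As a rough sketch of that list-based (sparse) storage and normalization idea, the snippet below builds a small compressed sparse rating matrix in Python; the rating triples and matrix shape are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix

# hypothetical (user, item, rating) triples; empty cells are simply not stored
ratings = [(0, 2, 4.0), (0, 5, 2.0), (1, 2, 5.0), (2, 7, 3.0)]
users, items, scores = zip(*ratings)

R = csr_matrix((scores, (users, items)), shape=(3, 10))  # 3 users x 10 items

# normalization: compute each user's mean rating (over rated items only)
# so that users with different rating scales can be compared
user_means = np.asarray(R.sum(axis=1)).ravel() / np.maximum(R.getnnz(axis=1), 1)
print(R.toarray())
print(user_means)
```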

Another common problem is the long tail. Some items are bought a lot, others barely at all. It is difficult to choose between recommending a popular item and differentiating with a less popular but riskier one.

What evaluations are there for Recommender Systems?

Ranking: compare predicted vs. actual rankings (precision metrics are sketched below)
Precision at cut-off k: P(k) = TP / (TP + FP) over the top-k recommendations
Average precision = 1/m * sum over ranks i of [precision(i) * rel(i)], with m the number of relevant items
Mean average precision = 1/|users| * sum over users of the average precision
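
A minimal sketch of these precision metrics, assuming a made-up recommendation list and set of relevant items per user:

```python
def precision_at_k(recommended, relevant, k):
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)  # TP among the top k
    return hits / k                                      # TP / (TP + FP)

def average_precision(recommended, relevant):
    # 1/m * sum over ranks i of precision(i) * rel(i), m = number of relevant items
    hits, total = 0, 0.0
    for i, item in enumerate(recommended, start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def mean_average_precision(per_user):
    # 1/|users| * sum of average precision over all users
    return sum(average_precision(rec, rel) for rec, rel in per_user) / len(per_user)

print(precision_at_k(["a", "b", "c", "d"], {"a", "d"}, k=3))  # 1/3
print(average_precision(["a", "b", "c", "d"], {"a", "d"}))    # (1/1 + 2/4) / 2 = 0.75
```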

Spearman rank order correlation
- measures the degree to which a monotonic relationship exists between predicted and actual ratings.

Kendall's tau
- two items are concordant if the item with the higher actual rating also has the higher predicted rating. A positive tau is good.

Goodman-Kruskal gamma: ignores all tied pairs. Gamma = (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs.
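
These rank correlations can be computed directly with SciPy; gamma is derived from concordant/discordant pair counts. The predicted and actual ratings below are made up.

```python
from itertools import combinations
from scipy.stats import spearmanr, kendalltau

actual    = [5, 3, 4, 2, 1]
predicted = [4, 2, 5, 2, 1]

rho, _ = spearmanr(actual, predicted)   # monotonic relationship
tau, _ = kendalltau(actual, predicted)  # concordant vs. discordant pairs
print(rho, tau)

# Goodman-Kruskal gamma: (C - D) / (C + D), tied pairs are ignored entirely
conc = disc = 0
for (a1, p1), (a2, p2) in combinations(zip(actual, predicted), 2):
    s = (a1 - a2) * (p1 - p2)
    if s > 0:
        conc += 1
    elif s < 0:
        disc += 1
print((conc - disc) / (conc + disc))
```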

What are some other recommender systems criteria?

Diversity: how diverse are recommendations?
User coverage: for what % of users can we make recommendations?
Item coverage: what % of items can we recommend
Serendipity: how surprising are recommendations
Cold start: how does the system work for new users
Scalability: what if high # users/items
Profit Impact (A/B testing): random vs analytical recommendations
Uplift effect: is recommending an item useful
Interpretability: why are items recommended?

What similarity measures are there?

Pearson's correlation coefficient (ρ): between -1 and +1. Each user's mean rating is subtracted, so differences in average rating behavior are taken into account.

Cosine measure: between 0 and 1 (for non-negative ratings). Users are represented as vectors and similarity is based on the angle between them. Differences in average rating behavior are not considered.

Adjusted cosine measure: subtracts each user's average rating before computing the cosine.

Jaccard index (J): used for binary data. J = |A ∩ B| / |A ∪ B| (shared items divided by all items rated by either user).
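
A small NumPy sketch of these measures, assuming two users' ratings over the same co-rated items (for Jaccard, the sets of items each user rated):

```python
import numpy as np

u = np.array([5.0, 3.0, 4.0, 4.0])
v = np.array([3.0, 1.0, 2.0, 3.0])

def pearson(a, b):
    # subtracts each user's mean, so different average rating levels cancel out
    a, b = a - a.mean(), b - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def cosine(a, b):
    # angle between the raw rating vectors; averages are NOT corrected for
    # (the adjusted cosine would subtract each user's mean rating first)
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def jaccard(items_a, items_b):
    # binary data: |intersection| / |union| of the rated item sets
    return len(items_a & items_b) / len(items_a | items_b)

print(pearson(u, v), cosine(u, v), jaccard({1, 2, 3}, {2, 3, 4}))
```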

How does Item-Item collaborative filtering work?

Instead of row (user) similarities we look at column (item) similarities.

Item similarities are considered more stable than user similarities: items are simpler, whereas users typically have multiple tastes. Users may like similar items but each also has their own specific preferences.
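
A minimal item-item sketch, assuming a small made-up rating matrix where 0 means "not rated": compute cosine similarity between item columns over the users who rated both items, then predict a missing rating as a similarity-weighted average of the user's other ratings.

```python
import numpy as np

R = np.array([        # rows = users, columns = items, 0 = not rated
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 5, 4],
], dtype=float)

def item_cosine(R, i, j):
    mask = (R[:, i] > 0) & (R[:, j] > 0)          # users who rated both items
    if not mask.any():
        return 0.0
    a, b = R[mask, i], R[mask, j]
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def predict(R, user, item):
    rated = [j for j in range(R.shape[1]) if R[user, j] > 0 and j != item]
    sims = np.array([item_cosine(R, item, j) for j in rated])
    if sims.sum() == 0:
        return 0.0
    return float(sims @ R[user, rated] / sims.sum())

print(predict(R, user=1, item=1))  # estimate the missing rating
```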

What are the pros and cons of K-nearest neighbor based filtering?

Easy to develop. Easy to explain recommendations. Good performance. Widely studied and applied.

Cold-start problem for new users and items. Performance drops when sparsity increases. Popularity bias. Scaling issues when the number of users/items grows.

What is content filtering?

Recommendations are based on the content of items instead of users' opinions, e.g. same genre, actor, director, theme, etc.

Build an item profile for each item. The profile is a vector of item features. Feature weights can be derived with text-mining heuristics such as TF-IDF.
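
A small content-filtering sketch using TF-IDF item profiles built with scikit-learn; the item descriptions are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

descriptions = [
    "sci-fi space adventure with aliens",
    "romantic comedy set in Paris",
    "space opera with epic battles and aliens",
    "documentary about deep sea life",
]

profiles = TfidfVectorizer().fit_transform(descriptions)  # item-profile vectors
sims = cosine_similarity(profiles[0], profiles).ravel()   # similarity to item 0

# rank the other items by similarity to the item the user liked
ranking = sims.argsort()[::-1]
print([i for i in ranking if i != 0])  # the other space movie comes first
```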

What are the pros and cons of content filtering?

PROS
- no cold-start problem for new items (still exists for new users)
- no sparsity problem 
- easy to explain recommendations
- recommend to users with unique tastes
- recommend new and unpopular items 

CONS
- tagging is expensive
- no information from other users
- hard to make recommendations for new users (no profile yet)
- overspecialization
