
Data Science Applications - Recommender Systems

8 important questions on Data Science Applications - Recommender Systems

What is a Rating Matrix?

Rows are users and columns are items. Each cell holds the score that a user gave to an item.

Sparsity problem: suppose we have 1 million users and 10,000 items. Most cells are empty, so the chance that two users have rated items in common is very small. One remedy is to store the ratings as lists, which skips the empty cells, and to normalize the scores so that they carry more meaning.
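As a rough illustration of the list format and the normalization step, here is a minimal Python sketch. The toy ratings, the function names `to_user_lists` and `mean_center`, and the choice of mean-centering as the normalization are all assumptions made for this example; the notes do not say which normalization is meant.

```python
from collections import defaultdict

# Hypothetical toy ratings as (user, item, score) triples.
# Unrated cells are simply absent, so no memory is spent on them.
ratings = [
    ("ann", "book", 5), ("ann", "film", 3),
    ("bob", "film", 4), ("bob", "game", 2),
]

def to_user_lists(triples):
    """Store only the rated cells: user -> {item: score}."""
    by_user = defaultdict(dict)
    for user, item, score in triples:
        by_user[user][item] = score
    return dict(by_user)

def mean_center(by_user):
    """Assumed normalization: subtract each user's own mean rating."""
    centered = {}
    for user, items in by_user.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {item: score - mean for item, score in items.items()}
    return centered

by_user = to_user_lists(ratings)
centered = mean_center(by_user)
```

After centering, a positive score means "above this user's average" and a negative score means "below it", which makes strict and generous raters easier to compare.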

Another common problem is the long tail: a few items are bought very often, while many are bought only rarely. It is hard to choose between recommending a popular item and differentiating with a less popular but riskier one.

What evaluations are there for Recommender Systems?

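Ranking: compare predicted with actual rankings.
Precision at cut-off k: P(k) = TP / (TP + FP), computed over the top k recommendations.
Average precision: AP = (1/m) * sum over i of [P(i) * rel(i)], where rel(i) is 1 if the item at rank i is relevant and 0 otherwise, and m is the number of relevant items.
Mean average precision: MAP = (1 / number of users) * sum of the average precision over all users.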
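The three precision measures above can be sketched in Python as follows. This is a minimal illustration, assuming binary relevance and assuming that average precision is divided by the number of relevant items; the function names and the example lists are made up for this sketch.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommended items that are relevant: TP / (TP + FP)."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k

def average_precision(ranked, relevant):
    """Sum precision@i over the ranks i that hold a relevant item, divided by m."""
    if not relevant:
        return 0.0
    total = 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:                      # rel(i) = 1
            total += precision_at_k(ranked, relevant, i)
    return total / len(relevant)                  # m = number of relevant items

def mean_average_precision(rankings, relevants):
    """Average of the per-user average precision."""
    scores = [average_precision(r, rel) for r, rel in zip(rankings, relevants)]
    return sum(scores) / len(scores)

# Hypothetical example: two users, each with a ranked list and a set of relevant items.
rankings = [["a", "b", "c", "d"], ["x", "y"]]
relevants = [{"a", "c"}, {"y"}]
```

For the first user, the relevant items sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 = 5/6.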

Spearman rank order correlation
- measures the degree to which a monotonic relationship exists between predicted and actual ratings.

Kendall's tau
- two items are concordant if the item with the higher actual rating also has the higher predicted rating. A positive tau is good: it means concordant pairs outnumber discordant pairs.

Goodman-Kruskal gamma: ignores all tied pairs. Gamma = (A - B) / (A + B), where A is the number of concordant pairs and B is the number of discordant pairs.
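The pair counting behind Kendall's tau and the Goodman-Kruskal gamma can be sketched as below. This is a minimal illustration: the notes do not say which tau variant is meant, so the sketch assumes tau-a (ties stay in the denominator), and the function names and ratings are made up for this example.

```python
from itertools import combinations

def pair_counts(actual, predicted):
    """Count concordant (A), discordant (B) and tied item pairs."""
    concordant = discordant = tied = 0
    for i, j in combinations(range(len(actual)), 2):
        product = (actual[i] - actual[j]) * (predicted[i] - predicted[j])
        if product > 0:
            concordant += 1      # same order in both ratings
        elif product < 0:
            discordant += 1      # opposite order
        else:
            tied += 1            # equal in actual or predicted rating
    return concordant, discordant, tied

def kendall_tau_a(actual, predicted):
    """Assumed variant tau-a: (A - B) divided by the total number of pairs."""
    a, b, t = pair_counts(actual, predicted)
    return (a - b) / (a + b + t)

def goodman_kruskal_gamma(actual, predicted):
    """Gamma: (A - B) / (A + B); tied pairs are left out (undefined if all pairs tie)."""
    a, b, _ = pair_counts(actual, predicted)
    return (a - b) / (a + b)

# Hypothetical ratings for four items.
actual = [5, 4, 3, 3]
predicted = [4.5, 4.6, 2.0, 3.0]
```

Here four pairs are concordant, one is discordant and one is tied, so gamma (3/5) comes out higher than tau-a (3/6) because the tied pair is dropped from its denominator.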

What are some other recommender systems criteria?

Diversity: how diverse are the recommendations?
User coverage: for what % of users can we provide recommendations?
Item coverage: what % of items can we recommend?
Serendipity: how surprising are the recommendations?
Cold start: how does the system work for new users?
Scalability: what happens with a high number of users or items?
Profit impact (A/B testing): random vs analytical recommendations.
Uplift effect: is recommending an item actually useful?
Interpretability: why are items recommended?
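Two of these criteria, user coverage and item coverage, are simple ratios and can be sketched directly. The function names, the dictionary layout (user mapped to a list of recommended items) and the sample data are assumptions for this illustration.

```python
def user_coverage(recommendations, all_users):
    """Share of users who receive at least one recommendation."""
    served = [user for user in all_users if recommendations.get(user)]
    return len(served) / len(all_users)

def item_coverage(recommendations, all_items):
    """Share of catalogue items that appear in at least one recommendation list."""
    recommended = set()
    for items in recommendations.values():
        recommended.update(items)
    return len(recommended & set(all_items)) / len(all_items)

# Hypothetical output of a recommender: "cat" gets an empty list, "dan" gets nothing at all.
recommendations = {"ann": ["A", "B"], "bob": ["A"], "cat": []}
all_users = ["ann", "bob", "cat", "dan"]
all_items = ["A", "B", "C", "D"]
```

In this toy case both coverages are 0.5: two of the four users are served, and two of the four items are ever recommended.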

What similarity measures are there?

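Pearson correlation coefficient (p): between -1 and +1. Each user's ratings are centered on that user's mean, so differences in average rating behavior are taken into account.

Cosine measure: between 0 and 1 for non-negative ratings. Users are represented as vectors, and similarity is based on the angle between them. Differences in average rating behavior are not taken into account.

Adjusted cosine measure: subtracts each user's average rating from the scores before computing the cosine, which corrects for differences in average rating behavior.

Jaccard index (J): used for binary data. J = |A ∩ B| / |A ∪ B|, the number of items in both sets divided by the number of items in either set.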
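A minimal Python sketch of three of these measures follows. It assumes the two rating vectors cover the same items in the same order, and it computes Pearson as the cosine of the mean-centered vectors, which is mathematically the same quantity. The function names and sample data are made up for this example.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two rating vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def pearson(u, v):
    """Pearson correlation: cosine of the vectors after subtracting each mean."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([a - mean_u for a in u], [b - mean_v for b in v])

def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

# Hypothetical users: v rates every item exactly 2 points higher than u.
u = [1, 2, 3]
v = [3, 4, 5]
```

With these vectors `pearson(u, v)` is 1 up to floating-point rounding, because the two users order the items identically, while `cosine(u, v)` stays below 1 because the raw vectors point in slightly different directions.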

How does Item-Item collaborative filtering work?

Instead of rows (users) we look at column (item) similarities.

Item similarities are considered to be more stable than user similarities. Items are simpler, whereas users typically have multiple tastes. Users may like similar items, but each user also has specific differences of their own.
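To make the column-wise view concrete, here is a minimal sketch of an item-item prediction. It assumes plain cosine similarity between item columns over the users who rated both items, and a similarity-weighted average over the k most similar items the user has rated. The rating data, the function names and the default k = 2 are all hypothetical choices for this illustration.

```python
import math

# Hypothetical toy rating matrix: user -> {item: score}; missing entries are unrated.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2},
    "cat": {"A": 1, "B": 2, "C": 5},
    "dan": {"A": 5, "C": 1},          # dan has not rated B
}

def item_similarity(item_i, item_j):
    """Cosine similarity between two item columns over users who rated both."""
    pairs = [(r[item_i], r[item_j]) for r in ratings.values()
             if item_i in r and item_j in r]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm_i = math.sqrt(sum(x * x for x, _ in pairs))
    norm_j = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (norm_i * norm_j)

def predict(user, target, k=2):
    """Similarity-weighted average of the user's ratings on the k most similar items."""
    scored = [(item_similarity(target, item), score)
              for item, score in ratings[user].items() if item != target]
    neighbours = sorted(scored, reverse=True)[:k]
    weight = sum(sim for sim, _ in neighbours)
    if weight == 0:
        return None                   # no usable neighbours
    return sum(sim * score for sim, score in neighbours) / weight
```

Calling `predict("dan", "B")` weights dan's rating of A more heavily than his rating of C, because B's column resembles A's column more than C's.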

What are the pros and cons of K-nearest neighbor based filtering?

Pros: easy to develop, easy to explain recommendations, good performance, widely studied and applied.

Cons: cold start problem for new users and items, performance drops when sparsity increases, popularity bias, scalability issues.

What is content filtering?

Recommendations are based on the content of the items instead of on users' opinions, e.g. same genre, actor, director or theme.

Build a profile for each item; a profile is a vector of item features. Then derive a matching heuristic from text mining, e.g. TF-IDF weights.
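The profile-building step can be sketched as follows. This is a minimal illustration, assuming each item comes with a short keyword description, using one common TF-IDF variant (term frequency times log(N / document frequency)) and cosine similarity as the matching heuristic. The item names, descriptions and function names are made up for this example.

```python
import math
from collections import Counter

# Hypothetical item descriptions (e.g. genre and theme keywords).
items = {
    "alien":   "space horror monster crew",
    "gravity": "space survival drama crew",
    "notting": "romance comedy london",
}

def tfidf_profiles(docs):
    """Build one TF-IDF vector (term -> weight) per item."""
    tokens = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokens)
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    profiles = {}
    for name, words in tokens.items():
        counts = Counter(words)
        profiles[name] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return profiles

def cosine(p, q):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm_p = math.sqrt(sum(w * w for w in p.values()))
    norm_q = math.sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0

def recommend(liked, profiles):
    """Rank the other items by how similar their profile is to the liked item."""
    others = [name for name in profiles if name != liked]
    return sorted(others, key=lambda name: cosine(profiles[liked], profiles[name]),
                  reverse=True)

profiles = tfidf_profiles(items)
```

No ratings from other users are involved: `recommend("alien", profiles)` ranks "gravity" first purely because the two descriptions share the terms "space" and "crew".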

What are the pros and cons of content filtering?

PROS
- no cold start problem for new items (it remains for new users)
- no sparsity problem
- easy to explain recommendations
- recommend to users with unique tastes
- recommend new and unpopular items

CONS
- tagging is expensive
- no information from other users
- cold start problem remains for new users
- overspecialization

The questions on this page originate from the summary of the following study material:

Knowledge Management and Business Intelligence
