Data Science Applications - Recommender Systems
8 important questions on Data Science Applications - Recommender Systems
What is a Rating Matrix?
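A rating matrix has one row per user and one column per item; each cell holds the rating that user gave that item.
Sparsity problem: with, say, 1 million users and 10,000 items, most cells are empty, and there is only a slight chance that two users have rated items in common. One solution is to store the ratings as lists, so that empty cells are skipped. Normalizing the ratings can also make the available scores more informative.
Another common problem is the long tail: some items are bought very often, others barely. It is difficult to choose between recommending a popular item and differentiating with a less popular but riskier one.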
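The list format and the normalization mentioned for the sparsity problem can be sketched as follows. This is a minimal illustration, not a prescribed method: the sample triples and the function names `to_user_lists` and `mean_center` are hypothetical, and subtracting each user's mean is only one possible reading of "normalization".

```python
from collections import defaultdict

# Hypothetical (user, item, rating) triples; only observed ratings are stored.
ratings = [
    ("u1", "i1", 5.0), ("u1", "i3", 3.0),
    ("u2", "i1", 4.0), ("u2", "i2", 2.0),
    ("u3", "i3", 1.0),
]


def to_user_lists(triples):
    """Group ratings per user so that empty cells are never materialized."""
    by_user = defaultdict(dict)
    for user, item, rating in triples:
        by_user[user][item] = rating
    return dict(by_user)


def mean_center(by_user):
    """Subtract each user's mean rating from that user's ratings (assumed normalization)."""
    centered = {}
    for user, items in by_user.items():
        mean = sum(items.values()) / len(items)
        centered[user] = {item: r - mean for item, r in items.items()}
    return centered


by_user = to_user_lists(ratings)
centered = mean_center(by_user)
```

Storing 5 ratings instead of a 3 x 3 grid makes little difference here, but at 1 million users and 10,000 items the dense grid would hold 10 billion cells, almost all of them empty.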
What evaluations are there for Recommender Systems?
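Precision at cut-off k: P(k) = TP / (TP + FP), computed over the top k recommendations
Average precision: AP = (1/m) * sum over k of [P(k) * rel(k)], where rel(k) is 1 if the item at rank k is relevant and 0 otherwise
Mean average precision: MAP = (1 / number of users) * sum of AP over all users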
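The three precision measures above can be sketched in a few lines. This is an illustrative sketch under one assumption that the notes do not state: m is taken to be the number of relevant items for the user. All function names are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """P(k) = TP / (TP + FP) over the top-k recommended items."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    true_positives = sum(1 for item in top_k if item in relevant)
    return true_positives / len(top_k)


def average_precision(recommended, relevant):
    """AP = (1/m) * sum_k P(k) * rel(k), with m assumed to be the number of relevant items."""
    if not relevant:
        return 0.0
    total = 0.0
    for k, item in enumerate(recommended, start=1):
        if item in relevant:  # rel(k) = 1
            total += precision_at_k(recommended, relevant, k)
    return total / len(relevant)


def mean_average_precision(recs_by_user, relevant_by_user):
    """MAP = average of AP over all users that received recommendations."""
    if not recs_by_user:
        return 0.0
    scores = [
        average_precision(recs, relevant_by_user.get(user, set()))
        for user, recs in recs_by_user.items()
    ]
    return sum(scores) / len(scores)
```

For a ranked list a, b, c, d with relevant items a and c, the relevant items sit at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.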
Spearman rank order correlation
- measures the degree to which a monotonic relationship exists between predicted and actual ratings.
Kendall's tau
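- Two items are concordant if the item with the higher actual rating also has the higher predicted rating, and discordant if the order is reversed. A positive tau is good.
Goodman-Kruskal gamma
- ignores all tied pairs. Gamma = (A - B) / (A + B), where A is the number of concordant pairs and B the number of discordant pairs.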
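The three rank-based measures above can be compared on the same data with a small sketch. Two choices here are assumptions rather than something the notes specify: Kendall's tau is computed as the simple tau-a variant (ties count in the denominator), and Spearman's coefficient is computed as the Pearson correlation of average ranks. All function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pair_counts(actual, predicted):
    """Count concordant, discordant and tied item pairs."""
    concordant = discordant = tied = 0
    for (a1, p1), (a2, p2) in combinations(zip(actual, predicted), 2):
        product = (a1 - a2) * (p1 - p2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        else:
            tied += 1
    return concordant, discordant, tied


def kendall_tau(actual, predicted):
    """Tau-a: (concordant - discordant) / total number of pairs."""
    c, d, t = pair_counts(actual, predicted)
    total = c + d + t
    return (c - d) / total if total else 0.0


def goodman_kruskal_gamma(actual, predicted):
    """Gamma = (A - B) / (A + B); tied pairs are ignored."""
    c, d, _ = pair_counts(actual, predicted)
    return (c - d) / (c + d) if (c + d) else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(actual, predicted):
    """Pearson correlation computed on the ranks of both rating lists."""
    x, y = average_ranks(actual), average_ranks(predicted)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0
```

With actual ratings 5, 5, 3 and predictions 4, 3, 2, one pair is tied on the actual rating: tau-a is 2/3, while gamma drops the tie and reports 1.0.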
What are some other recommender systems criteria?
User coverage: for what % of users can recommendations be provided?
Item coverage: what % of items can be recommended?
Serendipity: how surprising are the recommendations?
Cold start: how does the system work for new users?
Scalability: how does the system cope with a high number of users/items?
Profit impact (A/B testing): random vs. analytical recommendations
Uplift effect: is recommending an item useful?
Interpretability: why are items recommended?
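User and item coverage from the list above are simple ratios and can be sketched as follows. The function name `coverage` and its arguments are hypothetical; a user counts as covered here if the system returned at least one recommendation.

```python
def coverage(recs_by_user, all_users, all_items):
    """Return (user coverage, item coverage) as fractions between 0 and 1."""
    users_served = {user for user, recs in recs_by_user.items() if recs}
    items_recommended = {item for recs in recs_by_user.values() for item in recs}
    user_cov = len(users_served & set(all_users)) / len(all_users)
    item_cov = len(items_recommended & set(all_items)) / len(all_items)
    return user_cov, item_cov
```

A system that only ever recommends bestsellers can reach high user coverage while its item coverage stays low.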
What similarity measures are there?
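Cosine measure: users are represented as vectors of ratings, and similarity is based on the angle between them. For non-negative ratings the value lies between 0 and 1. Differences in average rating behavior are not considered.
Adjusted cosine measure: subtracts each user's average rating from the scores before computing the cosine, which corrects for differences in average rating behavior.
Jaccard index (J): used for binary data. J = |A ∩ B| / |A ∪ B|, the size of the intersection divided by the size of the union.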
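The three similarity measures can be sketched side by side. This is a minimal sketch under stated assumptions: `cosine` takes two equal-length rating vectors, `adjusted_cosine` follows the common item-item formulation in which each user's mean is subtracted and only users who rated both items are used, and the data layout (user -> {item: rating}) is hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def adjusted_cosine(ratings, item_i, item_j):
    """Cosine between two items after subtracting each user's mean rating.

    `ratings` maps user -> {item: rating}; only users who rated both items count.
    """
    xs, ys = [], []
    for user_ratings in ratings.values():
        if item_i in user_ratings and item_j in user_ratings:
            mean = sum(user_ratings.values()) / len(user_ratings)
            xs.append(user_ratings[item_i] - mean)
            ys.append(user_ratings[item_j] - mean)
    return cosine(xs, ys)


def jaccard(a, b):
    """|A intersect B| / |A union B| for two sets, e.g. sets of purchased items."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Two users who rate (5, 3) and (4, 2) look nearly identical under plain cosine; after mean-centering, the first item is above each user's average and the second below it, so the two items come out as opposites under adjusted cosine.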
How does Item-Item collaborative filtering work?
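Item-item collaborative filtering recommends items that are similar to the ones a user has already rated highly, where the similarity between two items is computed from the ratings they received. Item similarities are considered to be more stable than user similarities. Items are simpler, whereas users typically have multiple tastes. Users may like similar items, but each user also has their own specific differences.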
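A minimal sketch of item-item prediction, assuming plain cosine similarity over co-rating users and a similarity-weighted average of the user's own ratings. The sample data, the neighbourhood size `k`, and the function names are all hypothetical; real systems differ in the similarity measure and weighting they use.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. User u4 has not rated item B.
ratings = {
    "u1": {"A": 5, "B": 4, "C": 1},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"A": 1, "B": 2, "C": 5},
    "u4": {"A": 5, "C": 1},
}


def item_similarity(ratings, item_i, item_j):
    """Cosine similarity between two items over users who rated both."""
    pairs = [
        (r[item_i], r[item_j])
        for r in ratings.values()
        if item_i in r and item_j in r
    ]
    if not pairs:
        return 0.0
    dot = sum(x * y for x, y in pairs)
    norm = sqrt(sum(x * x for x, _ in pairs)) * sqrt(sum(y * y for _, y in pairs))
    return dot / norm if norm else 0.0


def predict(ratings, user, target, k=2):
    """Weighted average of the user's ratings on the k items most similar to target."""
    rated = ratings[user]
    neighbours = sorted(
        ((item_similarity(ratings, target, item), item) for item in rated if item != target),
        reverse=True,
    )[:k]
    weight = sum(abs(sim) for sim, _ in neighbours)
    if not weight:
        return None  # no usable neighbours, e.g. a cold-start user
    return sum(sim * rated[item] for sim, item in neighbours) / weight


prediction = predict(ratings, "u4", "B")
```

Because B is rated much like A by the other users, the high rating u4 gave A pulls the prediction for B above the midpoint of the scale.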
What are the pros and cons of K-nearest neighbor based filtering?
Cons: cold start problem for new users and items; performance drops as sparsity increases; popularity bias; scaling to many users and items is difficult.
What is content filtering?
Build an item profile for each item. A profile is a vector of item features. For text descriptions, the features can be derived heuristically with text mining, e.g. TF-IDF.
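A TF-IDF item profile can be sketched in plain Python. This is one simple variant among several (term frequency normalized by document length, IDF as log(N / document frequency), whitespace tokenization); the sample descriptions and the function name `tf_idf_profiles` are hypothetical.

```python
from collections import Counter
from math import log


def tf_idf_profiles(descriptions):
    """Map each item to a {term: tf-idf weight} profile built from its description."""
    tokenized = {item: text.lower().split() for item, text in descriptions.items()}
    n_items = len(tokenized)

    # Document frequency: in how many item descriptions does each term occur?
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    profiles = {}
    for item, tokens in tokenized.items():
        counts = Counter(tokens)
        profiles[item] = {
            term: (count / len(tokens)) * log(n_items / doc_freq[term])
            for term, count in counts.items()
        }
    return profiles


# Hypothetical item descriptions.
descriptions = {
    "m1": "space adventure robots",
    "m2": "space romance",
    "m3": "romance comedy",
}
profiles = tf_idf_profiles(descriptions)
```

Terms that occur in few descriptions, such as "robots", get a higher weight than terms shared by several items, such as "space", so the profile emphasizes what distinguishes an item.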
What are the pros and cons of content filtering?
PRO
- no cold start problem for new items (it remains for new users)
- no sparsity problem
- recommendations are easy to explain
- can recommend to users with unique tastes
- can recommend new and unpopular items
CON
- tagging is expensive
- no information from other users
- recommendations for new users remain difficult
- overspecialization