Data Science Applications - Fraud Analytics
12 important questions on Data Science Applications - Fraud Analytics
What are some Social Network definitions?
Edges (links): connections between nodes. Edges can carry extra information, such as the type of relationship, a weight, or a frequency.
A network can then be represented as a sociogram, an adjacency matrix, or an adjacency list.
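As a minimal sketch of the two tabular representations, the snippet below builds an adjacency matrix and an adjacency list from a small edge list. The node names and weights are hypothetical, and the network is assumed to be undirected.

```python
# Hypothetical toy network: edges as (node, node, weight) triples.
edges = [("A", "B", 1.0), ("A", "C", 2.0), ("B", "C", 1.0), ("C", "D", 3.0)]

nodes = sorted({n for u, v, _ in edges for n in (u, v)})
index = {n: i for i, n in enumerate(nodes)}

# Adjacency matrix: entry [i][j] holds the edge weight (0.0 means no edge).
matrix = [[0.0] * len(nodes) for _ in nodes]
# Adjacency list: each node maps to its neighbors and the edge weights.
adj_list = {n: {} for n in nodes}

for u, v, w in edges:
    matrix[index[u]][index[v]] = w
    matrix[index[v]][index[u]] = w  # undirected, so the matrix is symmetric
    adj_list[u][v] = w
    adj_list[v][u] = w
```

The matrix is convenient for dense networks and linear algebra, while the list is usually more compact when most pairs of nodes are not connected.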
What are Network Centrality Measures?
- Geodesic: shortest path between two nodes
- Degree: number of edges
- Closeness: how near a node is to all other nodes, based on its average geodesic distance to them
- Betweenness: number of geodesics that pass through the node
- Graph theoretic center: node with smallest max distance to all other nodes
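The measures above can be sketched with a breadth-first search. This is an illustration under assumptions: the toy graph is hypothetical, connected, undirected and unweighted, and closeness is taken as the inverse of the average geodesic distance (one of several conventions). Betweenness is left out to keep the sketch short.

```python
from collections import deque

# Hypothetical undirected toy graph as an adjacency list.
graph = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}

def geodesic_lengths(graph, source):
    """Shortest-path length (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

# Degree: number of edges attached to each node.
degree = {n: len(nbs) for n, nbs in graph.items()}

closeness = {}
eccentricity = {}
for n in graph:
    dist = geodesic_lengths(graph, n)
    others = [d for m, d in dist.items() if m != n]
    closeness[n] = len(others) / sum(others)  # inverse of the average distance
    eccentricity[n] = max(others)             # largest distance to any other node

# Graph theoretic center: node with the smallest maximum distance.
center = min(eccentricity, key=eccentricity.get)
```

If several nodes tie for the smallest maximum distance, `min` simply returns the first one it encounters.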
What is community mining?
Very useful in fraud cases. For example, find a group of stores that are sensitive to fraud, or a group of people who behave fraudulently due to peer pressure.
Basic methods:
- graph partitioning
- Girvan Newman algorithm
Advanced methods:
- spectral clustering
- directly optimizing Q modularity
- finding communities with overlap
What are Graph Partitioning Approaches?
Iterative bisection
- split the given graph into 2 groups by minimizing the cut size (the number of edges between the groups); alternative criteria are the ratio cut and the min-max cut.
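To illustrate the difference between minimum cut size and ratio cut, here is a brute-force sketch on a hypothetical toy graph. The ratio cut is taken as cut/|A| + cut/|B|, which is one common definition and an assumption here. Enumerating all splits is only feasible for very small graphs.

```python
from itertools import combinations

# Hypothetical toy graph: two triangles joined by one edge, plus a pendant node G.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F"), ("A", "G")]
nodes = sorted({n for e in edges for n in e})

def cut_size(group):
    """Number of edges with exactly one endpoint inside the group."""
    return sum((u in group) != (v in group) for u, v in edges)

def ratio_cut(group):
    """Cut size penalized by group sizes: cut/|A| + cut/|B| (assumed definition)."""
    c = cut_size(group)
    return c / len(group) + c / (len(nodes) - len(group))

# Every split into two non-empty groups, listing the smaller group only.
splits = [set(g) for k in range(1, len(nodes) // 2 + 1)
          for g in combinations(nodes, k)]

best_by_cut = min(splits, key=cut_size)     # may isolate a single node
best_by_ratio = min(splits, key=ratio_cut)  # favors more balanced groups
```

In this example the plain minimum cut splits off the single pendant node, while the ratio cut separates the two triangles, which suggests why size-aware criteria are often preferred.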
What is the Girvan-Newman algorithm?
1. Betweenness of all edges in the network is calculated
2. Edge with highest betweenness is removed
3. Betweenness of all edges affected by removal is recalculated
4. Steps 2 and 3 repeated until no edges remain
This creates a hierarchical network decomposition. A key decision is how to determine the optimal number of communities.
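The procedure can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: it assumes an unweighted, undirected graph without self-loops, uses a Brandes-style breadth-first search for edge betweenness, and for simplicity recalculates the betweenness of all edges after each removal rather than only the affected ones. The toy graph and function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    preds[w] = []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, crediting each edge on the way.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += share
                delta[v] += share
    # Each pair of nodes was counted from both ends, so halve the totals.
    return {e: b / 2.0 for e, b in score.items()}

def components(adj):
    """Connected components of the graph as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Return the successive community splits, from coarse to fine."""
    adj = {v: set(nbs) for v, nbs in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):                       # until no edges remain
        bet = edge_betweenness(adj)                # (re)calculate betweenness
        u, v = max(bet, key=bet.get)               # edge with highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):           # record each new split
            levels.append(comps)
    return levels

# Hypothetical toy graph: two triangles joined by the bridge C-D.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
       "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
levels = girvan_newman(toy)
```

Each entry of `levels` is one level of the hierarchy; choosing which level to keep is the "optimal number of communities" decision mentioned above.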
What is bottom-up community mining?
Communities can be
- complete: each node connected to each other node
- partial: each node connected to at least 1 other node
What is modularity Q?
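The notes give no answer here. As a hedged sketch, the snippet below uses the commonly cited Newman definition, which may differ in detail from the original study material: Q is the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random while keeping every node's degree fixed. The function name and toy data are hypothetical.

```python
def modularity(edges, community):
    """Newman's Q for an undirected, unweighted graph without self-loops.

    Q = sum over communities c of  e_c / m - (d_c / (2 * m)) ** 2
    where m is the number of edges, e_c the number of edges inside c,
    and d_c the summed degree of the nodes in c.
    """
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Observed fraction of edges with both endpoints in the same community.
    inside = sum(community[u] == community[v] for u, v in edges) / m
    # Expected fraction under random placement with fixed degrees.
    degree_sum = {}
    for node, k in degree.items():
        c = community[node]
        degree_sum[c] = degree_sum.get(c, 0) + k
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected

# Hypothetical toy graph: two triangles joined by one edge.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {n: 0 for n in "ABCDEF"}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

A higher Q suggests a better split, so Q is often used to compare candidate partitions of the same network; putting every node in a single community gives Q = 0.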
What are the challenges of Predictive Analytics in social networks?
- data are not independent and identically distributed
- collective inference
- no easy separation into training and test set
Goal is to model behavior that cascades from node to node, much like an epidemic. Markov assumption: the behavior of a node depends only on the behavior of its direct neighbors.
Components
- non relational classifiers
- relational model
- collective inference
What is a Relational Neighbour Classifier?
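- Assumes homophily (connected nodes tend to belong to the same class) and that some class labels are known.
$P(c \mid x) = \frac{1}{Z} \sum_{x_j \in N_x,\; \mathrm{class}(x_j) = c} w(x, x_j)$, where $N_x$ is the set of neighbors of $x$, $w(x, x_j)$ is the weight of the edge between $x$ and $x_j$, and $Z$ is a normalizing constant.
In other words, for a node it looks at how many of its neighbors are fraudulent (F) and how many are non-fraudulent (NF), and turns these counts into a probability.
The Probabilistic Relational Neighbor Classifier takes the class probabilities of the neighbors into account instead of hard labels. So a neighbor contributes, for example, 0.2 and 0.8 rather than 0 or 1.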
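A minimal sketch of both variants, assuming two classes (fraud and non-fraud) and a weighted neighbor dictionary. The function name, node names, weights, and labels are hypothetical. With labels of 1.0 and 0.0 it behaves like the plain relational neighbor classifier; with probabilities in between it behaves like the probabilistic variant.

```python
def relational_neighbor(node, neighbors, fraud_prob):
    """P(fraud | node) as the weighted share of fraud among labeled neighbors.

    neighbors:  node -> {neighbor: edge weight}
    fraud_prob: neighbor -> known label (1.0 / 0.0) or fraud probability
    """
    weighted_fraud = 0.0
    z = 0.0  # normalizing constant: total weight of labeled neighbors
    for nb, w in neighbors[node].items():
        if nb in fraud_prob:  # neighbors without a label are skipped
            weighted_fraud += w * fraud_prob[nb]
            z += w
    return weighted_fraud / z if z > 0 else None

# Hypothetical example: node x has four neighbors, one of them unlabeled.
neighbors = {"x": {"a": 1.0, "b": 1.0, "c": 2.0, "d": 1.0}}

hard_labels = {"a": 1.0, "b": 0.0, "c": 1.0}   # F = 1.0, NF = 0.0
soft_labels = {"a": 0.8, "b": 0.2, "c": 0.5}   # fraud probabilities

p_hard = relational_neighbor("x", neighbors, hard_labels)
p_soft = relational_neighbor("x", neighbors, soft_labels)
```

The probability of the non-fraud class is one minus the returned value; a node with no labeled neighbors gets `None` here, which is a design choice of this sketch.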
What is Relational Logistic Regression?
Local
- features describing a customer's own behavior
Network
- most frequently occurring class among the neighbors
- frequency of each class among the neighbors
- binary indicators indicating class presence
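The three network features can be derived from the neighbors' labels as sketched below. The function name, class codes, and toy data are hypothetical; in practice these values would be combined with the local features and passed to a logistic regression, which is not shown here.

```python
from collections import Counter

def link_features(node, neighbors, label, classes=("F", "NF")):
    """Network features of a node, based on the labels of its neighbors."""
    counts = Counter(label[nb] for nb in neighbors[node] if nb in label)
    features = {}
    # Most frequently occurring class among the labeled neighbors.
    features["mode"] = counts.most_common(1)[0][0] if counts else None
    for c in classes:
        features[f"count_{c}"] = counts[c]        # frequency of the class
        features[f"has_{c}"] = int(counts[c] > 0)  # binary presence indicator
    return features

# Hypothetical example: neighbor d has no known label and is ignored.
neighbors = {"x": ["a", "b", "c", "d"]}
label = {"a": "NF", "b": "NF", "c": "F"}

features = link_features("x", neighbors, label)
```

When two classes are equally frequent, `most_common` returns whichever it encountered first, so a tie-breaking rule may be worth adding.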
What is Social Network Featurization?
Degree: number of connections
Triangles: groups of 3 nodes that are all connected to each other
Hops: number of fraudulent nodes found within a given number of hops
Try to add as many features as you can. Later you can decide which ones are substantive.
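A short sketch of how these three features could be computed for a single node. It assumes an undirected, unweighted graph stored as a dictionary of neighbor sets; the function names, toy graph, and fraud labels are hypothetical.

```python
from collections import deque
from itertools import combinations

def degree(graph, node):
    """Number of connections of the node."""
    return len(graph[node])

def triangles(graph, node):
    """Number of triangles the node takes part in."""
    return sum(1 for a, b in combinations(graph[node], 2) if b in graph[a])

def frauds_within_hops(graph, node, fraud, max_hops):
    """Number of known fraudulent nodes reachable in at most max_hops steps."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for nb in graph[v]:
            if nb not in dist:
                dist[nb] = dist[v] + 1
                queue.append(nb)
    return sum(1 for n in dist if n != node and n in fraud)

# Hypothetical toy graph and fraud labels.
graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E"}, "E": {"D"}}
fraud = {"B", "E"}

features_A = {
    "degree": degree(graph, "A"),
    "triangles": triangles(graph, "A"),
    "frauds_1_hop": frauds_within_hops(graph, "A", fraud, 1),
    "frauds_3_hops": frauds_within_hops(graph, "A", fraud, 3),
}
```

Repeating this for every node yields a feature table that a standard classifier can use alongside the non-network features.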
What is Collective Inference?
PageRank
- probability of visiting a web page. It depends on the pages that link to it and on their number of outgoing links.
The same idea can be used for fraudulent networks.
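The PageRank idea can be sketched as a simple power iteration. This is an illustration under assumptions: the damping factor of 0.85 and the fixed number of iterations are conventional choices, not values from the notes, and the link structure is hypothetical. Every link target must also appear as a key.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; links maps each node to its outgoing targets."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        # Random-jump part: every node gets the same small baseline.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# Hypothetical link structure: A, B and D all point to C.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
most_visited = max(rank, key=rank.get)
```

A node's score grows with the scores of the nodes linking to it, divided by their number of outgoing links. In a fraud setting, one possible adaptation is to let influence start from known fraudulent nodes so that scores reflect exposure to fraud.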