Data Science Applications - Fraud Analytics

12 important questions on Data Science Applications - Fraud Analytics

What are some Social Network definitions?

Nodes: vertices or points
Edges (links): connections between nodes. Edges can carry extra information, such as the type of relationship, a weight, or a frequency.

A network can then be represented as a sociogram, an adjacency matrix, or an adjacency list.
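A minimal sketch of the two tabular representations for a small, hypothetical undirected and weighted network of four customers. The edge list and weights are invented for illustration.

```python
# Hypothetical toy network: customers 0-3, edges are (node, node, weight) tuples.
edges = [(0, 1, 1.0), (0, 2, 3.0), (1, 2, 1.0), (2, 3, 2.0)]
n = 4

# Adjacency matrix: entry [i][j] holds the edge weight (0.0 means no edge).
matrix = [[0.0] * n for _ in range(n)]
for i, j, w in edges:
    matrix[i][j] = w
    matrix[j][i] = w  # undirected network, so the matrix is symmetric

# Adjacency list: each node maps to its neighbors and the corresponding edge weights.
adj_list = {node: {} for node in range(n)}
for i, j, w in edges:
    adj_list[i][j] = w
    adj_list[j][i] = w

for row in matrix:
    print(row)
print(adj_list)
```

The matrix uses n² cells no matter how many edges exist. The adjacency list only stores existing links, which is usually preferable for large, sparse fraud networks.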

What are Network Centrality Measures?

Network centrality measures identify the most important vertices within a network.
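- Geodesic: shortest path between two nodes
- Degree: number of edges attached to a node
- Closeness: based on the average geodesic distance from a node to all other nodes (nodes with shorter distances are more central)
- Betweenness: number of geodesics between other pairs of nodes that pass through the node
- Graph theoretic center: node with the smallest maximum distance to all other nodes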
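The measures above can be computed with breadth-first search. This is a sketch that assumes an unweighted, undirected, connected graph. It defines closeness as the reciprocal of the average geodesic distance, which is one common normalization among several. Betweenness sums, for each pair of other nodes, the fraction of their geodesics that pass through the node. The function names are illustrative.

```python
from collections import deque
from itertools import combinations

def shortest_paths(adj, source):
    """Geodesic distances and number of geodesics from source (unweighted BFS)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]  # every geodesic to u extends to v
    return dist, sigma

def centralities(adj):
    nodes = list(adj)
    info = {s: shortest_paths(adj, s) for s in nodes}  # assumes a connected graph
    dist = {s: info[s][0] for s in nodes}
    sigma = {s: info[s][1] for s in nodes}
    degree = {v: len(adj[v]) for v in nodes}
    # Closeness: reciprocal of the average geodesic distance to all other nodes.
    closeness = {v: (len(nodes) - 1) / sum(dist[v].values()) for v in nodes}
    # Betweenness: share of s-t geodesics that pass through v, summed over pairs.
    betweenness = {v: 0.0 for v in nodes}
    for s, t in combinations(nodes, 2):
        for v in nodes:
            if v not in (s, t) and dist[s][v] + dist[v][t] == dist[s][t]:
                betweenness[v] += sigma[s][v] * sigma[v][t] / sigma[s][t]
    # Graph theoretic center: smallest maximum distance (eccentricity).
    eccentricity = {v: max(dist[v].values()) for v in nodes}
    center = min(nodes, key=eccentricity.get)
    return degree, closeness, betweenness, center

# Hypothetical tree: 0-1, 1-2, 2-3, 2-4, 4-5
adj = {0: [1], 1: [0, 2], 2: [1, 3, 4], 3: [2], 4: [2, 5], 5: [4]}
degree, closeness, betweenness, center = centralities(adj)
print(degree, closeness, betweenness, center)
```

In this example, node 2 has the highest degree, closeness, and betweenness, and it is also the graph theoretic center.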

What is community mining?

Community: a substructure of the graph with dense linkage between members of the community and sparse linkage to nodes outside it.

Community mining is very useful in fraud cases, e.g., to find a group of stores that is susceptible to fraud, or a group of people who behave fraudulently due to peer pressure.

Basic methods:
- graph partitioning
- Girvan Newman algorithm

Advanced methods:
- spectral clustering
- directly optimizing modularity Q
- finding communities with overlap

What are Graph Partitioning Approaches?

Split the whole graph into a predetermined number of clusters, optimizing the ratio between within-community and between-community edges.

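Iterative bisection
- split a given graph into 2 groups using the minimum cut size, or alternatively the ratio cut or min-max cut; repeat on the resulting groups until the desired number of clusters is reached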
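Here is a small sketch of iterative bisection. It searches all 2-way splits exhaustively, which is only feasible for tiny graphs. Real partitioners rely on heuristics or spectral methods instead. The ratio cut is written in one common form, cut/|A| + cut/|B|. The function names and the toy graph are hypothetical.

```python
from itertools import combinations

def cut_size(edges, group_a):
    """Number of edges with exactly one endpoint in group_a."""
    return sum((u in group_a) != (v in group_a) for u, v in edges)

def best_bisection(nodes, edges, criterion="ratio"):
    """Exhaustive search over all splits into two non-empty groups."""
    nodes = list(nodes)
    best, best_score = None, float("inf")
    rest = nodes[1:]  # nodes[0] stays in group A, so each split is scored once
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            group_a = {nodes[0], *extra}
            group_b = set(nodes) - group_a
            cut = cut_size(edges, group_a)
            if criterion == "min":
                score = cut
            else:  # ratio cut: penalizes very unbalanced splits
                score = cut / len(group_a) + cut / len(group_b)
            if score < best_score:
                best, best_score = (group_a, group_b), score
    return best, best_score

def iterative_bisection(nodes, edges, n_clusters):
    """Repeatedly bisect the largest cluster until n_clusters clusters exist."""
    clusters = [set(nodes)]
    while len(clusters) < n_clusters:
        largest = max(clusters, key=len)
        clusters.remove(largest)
        sub_edges = [(u, v) for u, v in edges if u in largest and v in largest]
        (a, b), _ = best_bisection(sorted(largest), sub_edges)
        clusters += [a, b]
    return clusters

# Two triangles (0-1-2 and 3-4-5) joined by the bridge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(best_bisection(range(6), edges))
print(iterative_bisection(range(6), edges, 2))
```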

What is the Girvan-Newman algorithm?

1. Calculate the betweenness of all existing edges
2. Remove the edge with the highest betweenness
3. Recalculate the betweenness of all edges affected by the removal
4. Repeat steps 2 and 3 until no edges remain

This creates a hierarchical network decomposition. A key decision is how to determine the optimal number of communities.
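The steps above can be sketched in plain Python for an unweighted, undirected graph. Edge betweenness is computed with a Brandes-style accumulation. For simplicity, the sketch recomputes betweenness for all edges after every removal, not only for the affected ones. This yields the same ranking but does more work. The generator reports each new split into communities. Helper names are illustrative.

```python
from collections import deque

def edge_betweenness(adj):
    """Brandes-style edge betweenness (unweighted, undirected)."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, geodesic counts, predecessors, visiting order
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate dependencies in reverse BFS order
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1 + delta[w])
                eb[frozenset((u, w))] += share
                delta[u] += share
    return {e: b / 2 for e, b in eb.items()}  # each pair was counted from both ends

def components(adj):
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u])
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Yield the communities each time removing an edge splits the graph further."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    n_comps = len(components(adj))
    while any(adj.values()):
        eb = edge_betweenness(adj)               # steps 1 and 3
        u, v = tuple(max(eb, key=eb.get))        # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comps:
            n_comps = len(comps)
            yield comps

# Two triangles joined by the bridge 2-3
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
for level in girvan_newman(adj):
    print(level)
```

In this example the bridge 2-3 carries every geodesic between the two triangles, so it is removed first. Each yielded level corresponds to one cut of the hierarchy. A quality measure such as modularity Q can then be used to choose among the levels.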

What is bottom-up community mining?

Starts from a single node and adds nodes to the community based on their links. This approach also allows for overlapping communities.

Communities can be:
- complete: each node is connected to every other node in the community
- partial: each node is connected to at least 1 other node in the community
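This is a greedy sketch of bottom-up growth from a seed node. It is one simple way to realize the idea and not a specific published algorithm. The complete variant only admits nodes linked to every current member. The partial variant admits the frontier node with the most links into the community, up to a size limit. Seeding from different nodes can produce overlapping communities. The function names, the `max_size` parameter, and the toy graph are hypothetical.

```python
def grow_complete_community(adj, seed):
    """Greedily grow a complete community (clique) around a seed node."""
    community = {seed}
    for candidate in sorted(adj[seed]):  # only neighbors of the seed can qualify
        if all(candidate in adj[member] for member in community):
            community.add(candidate)
    return community

def grow_partial_community(adj, seed, max_size):
    """Greedily grow a partial community: each new node needs a link into it."""
    community = {seed}
    frontier = set(adj[seed])
    while frontier and len(community) < max_size:
        # Candidate with the most links into the current community
        best = max(sorted(frontier), key=lambda c: len(adj[c] & community))
        community.add(best)
        frontier = (frontier | adj[best]) - community
    return community

# Two triangles sharing node 2: {0, 1, 2} and {2, 3, 4}
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(grow_complete_community(adj, 0))  # community seeded at node 0
print(grow_complete_community(adj, 4))  # community seeded at node 4, overlaps at node 2
print(grow_partial_community(adj, 0, max_size=3))
```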

What is modularity Q?

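A measure used to determine the number of communities. It compares the fraction of edges that fall within communities with the fraction expected if edges were placed at random (keeping node degrees fixed). The stronger the communities, the higher the Q value; values between 0.3 and 0.7 indicate significantly strong community structure.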
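The following sketch computes the standard Newman-Girvan modularity for an unweighted, undirected graph. It uses the per-community form Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c is the number of edges inside community c, and d_c is the sum of the degrees in c. The toy graph is hypothetical.

```python
def modularity(edges, communities):
    """Newman-Girvan modularity Q for an unweighted, undirected graph."""
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    inside = [0] * len(communities)      # L_c: edges within community c
    degree_sum = [0] * len(communities)  # d_c: total degree of community c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(l_c / m - (d_c / (2 * m)) ** 2 for l_c, d_c in zip(inside, degree_sum))

# Two triangles joined by the bridge 2-3
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(edges, [{0, 1, 2}, {3, 4, 5}]))  # about 0.357
print(modularity(edges, [set(range(6))]))         # one big community
```

Putting all nodes into a single community always gives Q = 0. Splitting along the bridge gives 5/14 ≈ 0.357, which falls in the 0.3 to 0.7 range mentioned above.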

What are the challenges of Predictive Analytics in social networks?

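Challenges
- the data are not independent and identically distributed (i.i.d.): linked nodes influence each other
- collective inference is needed, because the predictions for connected nodes depend on each other
- no easy separation into training and test sets, since nodes stay linked across any split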

The goal is to model behavior that cascades from node to node, much like an epidemic. Markov assumption: the behavior of a node depends only on the behavior of its direct neighbors.

Components
- non-relational (local) classifiers
- relational model
- collective inference

What is a Relational Neighbour Classifier?

Assumptions:
- homophily: connected nodes tend to belong to the same class
- the class labels of some nodes are known

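$$P(c \mid x) = \frac{1}{Z} \sum_{x_j \in N(x),\ \mathrm{class}(x_j) = c} w(x, x_j)$$

where N(x) is the neighborhood of node x, w(x, x_j) is the weight of the edge between x and x_j, and Z normalizes the probabilities so that they sum to 1.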

In other words, for a node it sums the (weighted) neighbors labeled fraudulent (F) and those labeled non-fraudulent (NF), and normalizes these sums into a probability.

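The probabilistic relational neighbor classifier uses the class probabilities of the neighbors instead of hard labels, so a neighbor contributes, e.g., 0.2 and 0.8 rather than 1 or 0:

$$P(c \mid x) = \frac{1}{Z} \sum_{x_j \in N(x)} w(x, x_j)\, P(c \mid x_j)$$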
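Both classifiers can be sketched for a single node with two classes, F and NF. The neighbor names, weights, labels, and probabilities are hypothetical. The sketch assumes that at least one neighbor contributes, so that Z > 0.

```python
def relational_neighbor(neighbors, labels):
    """P(c | x) from the known hard labels of x's neighbors.

    neighbors: dict neighbor -> edge weight w(x, x_j)
    labels: dict neighbor -> "F" or "NF" (unlabeled neighbors are skipped)
    """
    scores = {"F": 0.0, "NF": 0.0}
    for nb, w in neighbors.items():
        if nb in labels:
            scores[labels[nb]] += w
    z = sum(scores.values())  # normalizer Z
    return {c: s / z for c, s in scores.items()}

def probabilistic_relational_neighbor(neighbors, p_fraud):
    """P(c | x) from the neighbors' fraud probabilities instead of hard labels."""
    scores = {"F": 0.0, "NF": 0.0}
    for nb, w in neighbors.items():
        scores["F"] += w * p_fraud[nb]
        scores["NF"] += w * (1 - p_fraud[nb])
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

neighbors = {"a": 1.0, "b": 1.0, "c": 2.0}
print(relational_neighbor(neighbors, {"a": "F", "b": "NF", "c": "F"}))      # F: 0.75
print(probabilistic_relational_neighbor(neighbors, {"a": 0.8, "b": 0.2, "c": 0.5}))  # F: 0.5
```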

What is Relational Logistic Regression?

It combines both local and network attributes and is frequently used in industry. Adding these network attributes is also known as featurization or propositionalization.

Local
- attributes describing a customer's behavior
Network
- most frequently occurring class among the neighbors
- number of neighbors of each class
- binary indicators for the presence of each class among the neighbors
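Here is a minimal sketch of relational logistic regression on a hypothetical toy network. Each node gets one invented local attribute and the three network attributes listed above. A plain logistic regression is then fitted with batch gradient descent so that the block has no dependencies. In practice one would use a library implementation. The data, learning rate, and epoch count are assumptions.

```python
import math
from collections import Counter

CLASSES = ("F", "NF")

def network_features(node, adj, labels):
    """Network attributes derived from the known labels of a node's neighbors."""
    counts = Counter(labels[nb] for nb in adj[node] if nb in labels)
    # Most frequently occurring neighbor class (ties broken by Counter order)
    mode_is_fraud = 1.0 if counts and counts.most_common(1)[0][0] == "F" else 0.0
    count_feats = [float(counts[c]) for c in CLASSES]               # neighbors per class
    binary_feats = [1.0 if counts[c] > 0 else 0.0 for c in CLASSES]  # class presence
    return [mode_is_fraud] + count_feats + binary_feats

def predict(w, x):
    """Logistic model: P(fraud) = 1 / (1 + exp(-(w . x + intercept)))."""
    z = sum(wk * v for wk, v in zip(w, x + [1.0]))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss, without regularization."""
    w = [0.0] * (len(X[0]) + 1)  # last weight is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for k, v in enumerate(xi + [1.0]):
                grad[k] += err * v
        w = [wk - lr * g / len(X) for wk, g in zip(w, grad)]
    return w

# Hypothetical data: node 6 is unlabeled and linked to two known fraudsters.
adj = {0: {1, 2, 6}, 1: {0, 2, 6}, 2: {0, 1, 3}, 3: {2, 4, 5},
       4: {3, 5}, 5: {3, 4}, 6: {0, 1}}
labels = {0: "F", 1: "F", 2: "F", 3: "NF", 4: "NF", 5: "NF"}
local = {0: [0.9], 1: [0.8], 2: [0.7], 3: [0.2], 4: [0.1], 5: [0.3], 6: [0.6]}

train_nodes = sorted(labels)
X = [local[n] + network_features(n, adj, labels) for n in train_nodes]
y = [1.0 if labels[n] == "F" else 0.0 for n in train_nodes]
w = train_logistic(X, y)
x_new = local[6] + network_features(6, adj, labels)
print(round(predict(w, x_new), 3))
```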

What is Social Network Featurization?

Featurization refers to mapping neighbor and network characteristics into features and combining them with the local variables for predictive modeling.

Degree: number of connections of a node
Triangles: number of groups of 3 mutually connected nodes that include the node
Hops: number of fraudulent nodes found within a given number of hops

Try to add as many features as you can; later you can decide which ones are substantive.
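The three features above can be sketched for an undirected graph stored as a dict of sets. The toy graph, the set of known fraudsters, and the `fraud_{h}hop` feature names are hypothetical.

```python
from collections import deque
from itertools import combinations

def degree(adj, node):
    return len(adj[node])

def triangles(adj, node):
    """Number of triangles containing the node (pairs of linked neighbors)."""
    return sum(1 for a, b in combinations(adj[node], 2) if b in adj[a])

def frauds_within_hops(adj, node, fraud, max_hops):
    """Number of known fraudulent nodes reachable in at most max_hops hops."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue  # do not expand beyond the hop limit
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(1 for v in dist if v != node and v in fraud)

def featurize(adj, fraud, hops=(1, 2)):
    """One feature dict per node, ready to join with local variables."""
    return {n: {"degree": degree(adj, n), "triangles": triangles(adj, n),
                **{f"fraud_{h}hop": frauds_within_hops(adj, n, fraud, h) for h in hops}}
            for n in adj}

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
fraud = {1, 4}
for node, feats in featurize(adj, fraud).items():
    print(node, feats)
```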

What is Collective Inference?

Given a network initialized by a local model and a relational model, a collective inference method infers a set of class labels or probabilities for the unknown nodes.
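The following is a simplified, relaxation-labeling-style sketch of collective inference. It is one possible scheme, not a specific published method. Unknown nodes start from a local model's fraud probability, supplied here as a hypothetical `prior` dict. Each unknown node is then repeatedly re-estimated with the probabilistic relational neighbor rule until the estimates stop changing. Known labels stay fixed.

```python
def collective_inference(adj, known, prior, max_iter=100, tol=1e-6):
    """Iteratively update P(fraud) of unknown nodes from their neighbors.

    adj: dict node -> dict neighbor -> edge weight
    known: dict node -> 1.0 (fraud) or 0.0 (non-fraud), kept fixed
    prior: dict node -> initial P(fraud) from a local model (hypothetical input)
    """
    p = {n: known[n] if n in known else prior[n] for n in adj}
    for _ in range(max_iter):
        new_p = dict(p)
        for n in adj:
            if n in known:
                continue  # known labels are not updated
            total = sum(adj[n].values())
            if total > 0:
                new_p[n] = sum(w * p[nb] for nb, w in adj[n].items()) / total
        change = max(abs(new_p[n] - p[n]) for n in adj)
        p = new_p
        if change < tol:
            break
    return p

# Chain: fraudster 0 - unknown 1 - unknown 2 - legitimate 3
adj = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
p = collective_inference(adj, known={0: 1.0, 3: 0.0}, prior={1: 0.5, 2: 0.5})
print(p)  # node 1 converges towards 2/3, node 2 towards 1/3
```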

PageRank
- probability of visiting a web page, which depends on the pages linking to it and on their number of outgoing links

The same idea can be used for fraud networks, e.g., to spread fraud influence from known fraudulent nodes to the nodes around them.
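This is a power-iteration sketch of PageRank. It has an optional restart distribution, often called personalized PageRank. A uniform restart gives classic PageRank. Concentrating the restart on known fraudsters scores the other nodes by how easily a random walk reaches them from fraud. The damping factor of 0.85 is a conventional choice, and the chain graph below is hypothetical.

```python
def pagerank(out_links, damping=0.85, restart=None, iterations=100):
    """Power-iteration PageRank.

    out_links: dict node -> list of nodes it links to
    restart: optional dict node -> restart probability (should sum to 1);
             None means uniform restart, i.e. classic PageRank
    """
    nodes = list(out_links)
    n = len(nodes)
    if restart is None:
        restart = {v: 1.0 / n for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1 - damping) * restart.get(v, 0.0) for v in nodes}
        for u in nodes:
            targets = out_links[u]
            if targets:
                share = damping * rank[u] / len(targets)  # split over outgoing links
                for v in targets:
                    new_rank[v] += share
            else:  # dangling node: redistribute its rank via the restart vector
                for v in nodes:
                    new_rank[v] += damping * rank[u] * restart.get(v, 0.0)
        rank = new_rank
    return rank

# Undirected chain a-b-c-d-e (links in both directions); "a" is a known fraudster.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
fraud_scores = pagerank(chain, restart={"a": 1.0})
print({v: round(r, 3) for v, r in fraud_scores.items()})
```

In the chain example, the personalized scores decrease with distance from the known fraudster a, from b down to e.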
