Data Science Applications - Fraud Analytics

12 important questions on Data Science Applications - Fraud Analytics

What are some Social Network definitions?

Nodes: the vertices (points) of the network.
Edges (links): connections between nodes. Edges can carry extra information, such as the type of relationship, a weight, or a frequency.

A network can then be represented as a sociogram, an adjacency matrix, or an adjacency list.
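As a minimal sketch of the two tabular representations, the snippet below builds an adjacency matrix and an adjacency list from an edge list. The function name `build_representations` and the example edges are made up for illustration, and the network is assumed to be undirected and unweighted.

```python
def build_representations(edges):
    """Build an adjacency matrix and adjacency list for an undirected network."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    adj_list = {n: [] for n in nodes}
    for u, v in edges:
        # Undirected: record the link in both directions.
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
        adj_list[u].append(v)
        adj_list[v].append(u)
    return nodes, matrix, adj_list

# Hypothetical example network with four nodes.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
nodes, matrix, adj_list = build_representations(edges)
```

The matrix is convenient for dense networks and linear algebra, while the list is usually more compact for sparse networks.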

What are Network Centrality Measures?

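Network centrality measures identify the most important vertices within a network.
- Geodesic: shortest path between two nodes (the building block for the distance-based measures)
- Degree: number of edges connected to a node
- Closeness: based on the distance from a node to all other nodes; a smaller total distance means a higher closeness
- Betweenness: number of times a node lies on the geodesics between other nodes
- Graph theoretic center: node with the smallest maximum distance to all other nodes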
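Some of the measures listed above can be sketched with a breadth-first search. The example below computes degree, closeness (taken here as the number of other nodes divided by the total geodesic distance, which is one common normalization and an assumption of this sketch) and the graph theoretic center. It assumes a connected, unweighted, undirected network; the names and the example path network are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Geodesic (shortest path) length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def centralities(adj):
    """Degree, closeness and eccentricity per node (connected graph assumed)."""
    result = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        others = [d for n, d in dist.items() if n != node]
        result[node] = {
            "degree": len(adj[node]),
            "closeness": len(others) / sum(others),
            "eccentricity": max(others),  # largest distance to any other node
        }
    return result

# Hypothetical path network 0 - 1 - 2 - 3 - 4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = centralities(adj)
# Graph theoretic center: the node with the smallest eccentricity.
center = min(scores, key=lambda n: scores[n]["eccentricity"])
```

In this path network the middle node is the center, because its largest distance to any other node is the smallest.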

What is community mining?

Community: a substructure of the graph with dense links between the members of the community and sparse links to nodes outside it.

Very useful in fraud cases: for example, to find a group of stores that are sensitive to fraud, or a group of people who behave fraudulently due to peer pressure.

Basic methods:
- graph partitioning
- Girvan Newman algorithm

Advanced methods:
- spectral clustering
- directly optimizing Q modularity
- finding communities with overlap

What are Graph Partitioning Approaches?

Split the whole graph into a predetermined number of clusters, optimizing the ratio between within-community and between-community edges.

Iterative bisection
- split the given graph into 2 groups using the minimum cut size, the ratio cut, or the min-max cut, then repeat on the resulting groups.
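A single bisection step can be sketched by brute force on a tiny network. The snippet below scores every split with the ratio cut, taken here as cut/|A| + cut/|B| (one common definition, assumed for this sketch), and keeps the best one. The function names and the example network are hypothetical, and the exhaustive search only works for very small graphs.

```python
from itertools import combinations

def cut_size(edges, group_a):
    """Number of edges with one endpoint in group_a and one outside it."""
    return sum(1 for u, v in edges if (u in group_a) != (v in group_a))

def ratio_cut(edges, nodes, group_a):
    """Cut size penalized by group sizes, so tiny groups are discouraged."""
    group_b = nodes - group_a
    cut = cut_size(edges, group_a)
    return cut / len(group_a) + cut / len(group_b)

def best_bisection(edges):
    """Try every split into two non-empty groups and keep the lowest ratio cut."""
    nodes = {n for edge in edges for n in edge}
    ordered = sorted(nodes)
    first, rest = ordered[0], ordered[1:]
    best_group, best_score = None, float("inf")
    # Fixing `first` in group A avoids evaluating each split twice.
    for size in range(len(rest)):
        for extra in combinations(rest, size):
            group_a = {first, *extra}
            score = ratio_cut(edges, nodes, group_a)
            if score < best_score:
                best_group, best_score = group_a, score
    return best_group, nodes - best_group, best_score

# Hypothetical network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
group_a, group_b, score = best_bisection(edges)
```

Using the plain minimum cut size instead of the ratio cut tends to split off single nodes, which is why size-aware criteria such as the ratio cut or min-max cut are listed as alternatives.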

What is the Girvan-Newman algorithm?

1. Calculate the betweenness of all existing edges
2. Remove the edge with the highest betweenness
3. Recalculate the betweenness of all edges affected by the removal
4. Repeat steps 2 and 3 until no edges remain

This creates a hierarchical network decomposition. A key decision is how to determine the optimal number of communities.
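The steps above can be sketched in plain Python. This is an illustrative version, not a reference implementation: it recomputes the betweenness of every edge after each removal (simpler than updating only the affected edges), uses a Brandes-style breadth-first accumulation for edge betweenness, and assumes an undirected, unweighted network without self-loops. All function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of geodesics through each edge (fractional when paths tie)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, shortest-path counts and predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, passing path shares to edges.
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Each node pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}

def components(adj):
    """Connected components of the current network."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in part:
                part.add(u)
                stack.extend(adj[u] - part)
        seen |= part
        parts.append(part)
    return parts

def girvan_newman(edges):
    """Return the list of partitions produced as edges are removed."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = [components(adj)]
    while any(adj.values()):                    # step 4: until no edges remain
        scores = edge_betweenness(adj)          # steps 1 and 3
        u, v = max(scores, key=scores.get)      # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels

# Hypothetical network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
levels = girvan_newman(edges)
```

Each entry of `levels` is one level of the hierarchy, from the whole network down to single nodes. Choosing which level to keep is the "optimal number of communities" decision mentioned above.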

What is bottom-up community mining?

Starts with a single node and adds more nodes to the community based on their links. This approach also allows for overlapping communities.

Communities can be
- complete: each node is connected to every other node
- partial: each node is connected to at least 1 other node

What is modularity Q?

A measure used to determine the number of communities. It compares the fraction of within-community edges in the network with the fraction expected if the edges were placed at random. The stronger the community structure, the higher the Q value. Values between 0.3 and 0.7 indicate strong community structure.
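A minimal sketch of the calculation, assuming an undirected, unweighted network and the common formulation Q = sum over communities of (L_c / m) - (d_c / 2m)^2, where m is the total number of edges, L_c the number of edges inside community c, and d_c the summed degree of its nodes. The function name and example network are hypothetical.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted network."""
    m = len(edges)
    community_of = {n: i for i, group in enumerate(communities) for n in group}
    internal = [0] * len(communities)    # L_c: edges inside each community
    degree_sum = [0] * len(communities)  # d_c: total degree of each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
        for i in range(len(communities))
    )

# Hypothetical network: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
q_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
```

Putting every node in one community gives Q = 0, while splitting the two triangles gives a clearly positive value, so Q can be compared across candidate partitions to pick the number of communities.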

What are the challenges of Predictive Analytics in social networks?

Challenges
- data are not independent and identically distributed
- collective inference: the label of one node influences the labels of its neighbors
- no easy separation into training and test sets

The goal is to model behavior that cascades from node to node, much like an epidemic. Markov assumption: the behavior of a node depends only on the behavior of its direct neighbors.

Components
- non-relational (local) classifier
- relational model
- collective inference

What is a Relational Neighbour Classifier?

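Assumptions:
- homophily: connected nodes tend to belong to the same class
- some class labels are known

$P(c \mid x) = \frac{1}{Z} \sum_{x_j \in N_x,\ \mathrm{class}(x_j) = c} w(x, x_j)$

Here $N_x$ is the set of neighbors of node $x$, $w(x, x_j)$ is the weight of the link between $x$ and $x_j$, and $Z$ is a normalizer that makes the probabilities sum to 1.

In other words, for each node the classifier looks at how many neighbors are fraudulent (F) and how many are non-fraudulent (NF), and turns these counts into a probability.

The Probabilistic Relational Neighbor Classifier uses the class probabilities of each neighbor instead of hard labels. So a neighbor does not contribute 1 or 0, but for example 0.2 and 0.8.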
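Both variants can be sketched with one function: a hard label is simply a probability of 1 for one class. The names below are hypothetical, and falling back to a uniform distribution when no neighbor has a known class is an assumption of this sketch, not something stated in the notes.

```python
def relational_neighbor(weights, neighbor_probs, classes=("F", "NF")):
    """Estimate P(c | x) for one node x from its neighbors.

    weights: {neighbor: link weight w(x, neighbor)}
    neighbor_probs: {neighbor: {class: probability}} for neighbors whose
        class is known or estimated; neighbors missing here are skipped.
    """
    totals = {c: 0.0 for c in classes}
    for neighbor, weight in weights.items():
        if neighbor in neighbor_probs:
            for c in classes:
                totals[c] += weight * neighbor_probs[neighbor].get(c, 0.0)
    z = sum(totals.values())  # the normalizer Z
    if z == 0:
        # Assumption: no information, so return a uniform distribution.
        return {c: 1.0 / len(classes) for c in classes}
    return {c: total / z for c, total in totals.items()}

def hard(label, classes=("F", "NF")):
    """Express a known label as a probability distribution."""
    return {c: 1.0 if c == label else 0.0 for c in classes}

# Relational neighbor: one F and two NF neighbors, one unknown neighbor.
weights = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 1.0}
known = {"a": hard("F"), "b": hard("NF"), "c": hard("NF")}
p_hard = relational_neighbor(weights, known)

# Probabilistic variant: neighbors carry class probabilities.
soft = {"a": {"F": 0.8, "NF": 0.2}, "b": {"F": 0.4, "NF": 0.6}}
p_soft = relational_neighbor({"a": 1.0, "b": 1.0}, soft)
```

With equal weights the hard version reduces to counting F and NF neighbors, while the probabilistic version averages the neighbors' probabilities.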

What is Relational Logistic Regression?

It combines both local and network attributes, and is frequently used in industry. Adding these network attributes is also known as featurization or propositionalization.

Local
- attributes describing a customer's behavior
Network
- most frequently occurring class among the neighbors
- frequency of each class among the neighbors
- binary indicators of whether each class is present among the neighbors
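The three network attributes above can be sketched as a small helper that turns the labels of a node's neighbors into columns, which are then merged with the local attributes before fitting any logistic regression. The function name, the feature names and the local attribute `monthly_spend` are hypothetical.

```python
from collections import Counter

def network_features(neighbor_labels, classes=("F", "NF")):
    """Mode, per-class counts and presence indicators of the neighbor classes."""
    counts = Counter(neighbor_labels)
    # Ties are broken in favor of the class listed first in `classes`.
    features = {"mode": max(classes, key=lambda c: counts[c])}
    for c in classes:
        features[f"count_{c}"] = counts[c]
        features[f"has_{c}"] = int(counts[c] > 0)
    return features

# Hypothetical customer with one local attribute and three labeled neighbors.
local = {"monthly_spend": 120.0}
row = {**local, **network_features(["F", "NF", "NF"])}
```

Each `row` is an ordinary flat record, so the network information can be fed to a standard, non-relational learner.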

What is Social Network Featurization?

Refers to mapping neighbor and network characteristics into features and combining them with the local variables for predictive modeling.

Degree: number of connections of a node
Triangles: number of groups of 3 mutually connected nodes that the node is part of
Hops: number of fraudulent nodes found within a given number of hops

Try to add as many features as you can. Later you can decide which ones are substantive.
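The three example features can be sketched for a single node as follows. The sketch assumes an undirected network stored as a dictionary of neighbor sets and a set of known fraudulent nodes; the function name, feature names and example data are hypothetical.

```python
from collections import deque
from itertools import combinations

def featurize(adj, fraud, node, max_hops=2):
    """Degree, triangle count and number of known frauds within max_hops."""
    degree = len(adj[node])
    # A triangle exists when two neighbors of the node are also linked.
    triangles = sum(1 for u, v in combinations(adj[node], 2) if v in adj[u])
    # Breadth-first search that stops expanding at max_hops.
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hops:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    frauds = sum(1 for n in dist if n != node and n in fraud)
    return {"degree": degree, "triangles": triangles, "frauds_in_hops": frauds}

# Hypothetical network: triangle a-b-c with a tail c-d-e; d and e are frauds.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
fraud = {"d", "e"}
features_2 = featurize(adj, fraud, "a", max_hops=2)
features_3 = featurize(adj, fraud, "a", max_hops=3)
```

Varying `max_hops` yields several candidate features per node, which fits the advice to generate many features first and select later.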

What is Collective Inference?

Given a network initialized by a local model and a relational model, a collective inference method infers a set of class labels or probabilities for the unknown nodes.

PageRank
- the probability of visiting a web page. It depends on the pages that link to it and on their number of outgoing links.

The same idea can be used for fraud networks.
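The PageRank idea can be sketched as a simple power iteration. The damping factor of 0.85, the fixed number of iterations and the even spreading of rank from pages without outgoing links are conventional choices assumed for this sketch, not details taken from the notes. The function name and the example links are hypothetical.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power iteration; out_links maps each node to the nodes it links to.

    Every linked node must also appear as a key of out_links.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Random jump: every node receives the same base probability.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                # A node splits its rank evenly over its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a node without outgoing links spreads evenly.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank

# Hypothetical three-page web: A links to B and C, B to C, C back to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Page C ends up with the highest score because it receives links from both other pages, while B only receives half of A's rank.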

The questions on this page originate from the summary of the following study material:

Knowledge Management and Business Intelligence
