Business Intelligence and Data Analytics
13 important questions on Business Intelligence and Data Analytics
What is data warehousing?
- data from operational (OLTP) databases
- and enriched with data from external sources
Organized around the most important subjects of a company instead of applications and systems
What are OLTP databases?
- ensure fast concurrent acces to data for: transactions, concurrency control, recovery techniques
- normalization crucial: must support heavy transaction loads, should prevent update anomalies
Therefore: poor performance for executing complex queries (joining many relational tables together)
What are the key properties of a data warehouse?
- data warehouse offers a series of historical snapshots
Non-volatile:
- new data are always added
- old data remain
- Higher grades + faster learning
- Never study anything twice
- 100% sure, 100% understanding
What are the differences betwen data warehouse database vs. OLTP database?
- support of tactical and strategic decisions
- integrated: internal and external data input
- non volatile: data only read, never changed or deleted
- time dependent: data can be analyzed over time
- subject oriented: data organized around business objects
- data are often aggregated
OLTP
- support of operational processes
- not integrated
- volatile
- time independent
- transaction oriented
- data are detailed
What is the multidimensional model?
- N dimensional space: data cube or hyper cube
Dimension level or granularity level
- determined by the level of detail for each dimension
- dimensions have a hierarchical nature
Facts = cells of the data cube
- measures (value in each cell)
What is the conceptual data warehouse design?
- no well established and universally adopted conceptual model for multidimensional data
- therefore, often data warehouse design is usually directly performed at the logical level
Note
- main goal of conceptual model: communication between stakeholders
- main goal of logical model: implementation
So for datawarehousing; often difficult exercise
What logical data warehousing designs are there?
- data stored in relational tables
- pre computed aggregations stored for performance
- large space required
- OLAP performed with complex SQL statements
- relational DBMS: well standardized + large storage
Multidimensional OLAP (MOLAP)
- data cubes stored in multidimensional arrays
- verry efficient OLAP operations: performant querying
- MOLAP enginers typically support less storage
- often used for smaller "data marts" (<10 dimensions)
Hybrid OLAP (HOLAP)
- combines both previous approaches
- large volumes of data stored in a relational database
- aggregations are kept in a seprate MOLAP store
What relational data warehouse designs are there?
2, types of tables
- facts tables: normalized
- dimension tables: generally not normalized
Snowflake schema
- avoids redudancy by normalizing dimension tables
What is data staging?
- database in which the extracted data undergoes successive modifications
Steps
- data cleaning
- data transformation
- load
- refresh
What is a data warehouse tier?
What architectural variations are there?
Data mart: supporting the decisions of a specific group
1. Direct acces to the data warehouse by client tools
2. Virtual data warehouse = defines a set of views over operational databases, materialized for efficient acces
What types of data warehouse usages are there?
- ad hoc querying an reporting
- online analytival processing (OLAP0
Discovery-oriented tools (data mining)
- knowledge discovery: search for hidden patterns
- supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools
What OLAP operations are there?
- from city -> country
Drill-down
- from year -> month
Slice
- from cube of cities to 1 city
Dice
- from cube to 2 cities and 2 months
What advanced topics are there in data warehousing?
- Trajectory data warehouses
- In memory database systems
- Real time data warehouses
- Semantic web technologies
- Multimedia data warehouses
The question on the page originate from the summary of the following study material:
- A unique study and practice tool
- Never study anything twice again
- Get the grades you hope for
- 100% sure, 100% understanding