OLAP business databases - data warehousing & ETL
7 important questions on OLAP business databases - data warehousing & ETL
Essential features of Inmon's and Kimballs models
What is a data lake?
a database that holds raw data in its narrative format. The data lake approach copies all the data that might be relevant. Whereas the data warehouse approach cleans the data, the data lake approach doesn’t clean the data. Mostly, the two exist next to each other because different employees have different needs. This approach is also called schema on read
Which 4 types of data conversion do we know?
1. Simple: data type conversion
For example:
a. Dd-mm-yyyy hh:mm mmddyy
b. Integer 0/1 Boolean true/false
c. Double/float numeric
2. Complex type conversion
Address format, categorical data
3. Currency conversions
4. Language conversions
- Higher grades + faster learning
- Never study anything twice
- 100% sure, 100% understanding
Which 5 types of ETL cleaning do we know?
1. Data must be validated and filtered garbage in = garbage out (GIGO)
2. Data in the real world is dirty
a. Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
b. Noisy: containing errors or outliers
c. Inconsistent: containing discrepancies in codes or names
Therefore, data quality is critical.
3. It is often neglected or casually handled
4. Problems exposed when data is summarized
5. Prevention is better, but who does the cleaning?
What is solution to unify all the different terms used in descriptions?
Which two types of ETL Load do we know?
1. Integral load:
a. All the records are taken and processes by comparing them with the records in the data warehouse
b. Extreme case: load deletes dw tables and replaces them with complete new versions (= full load)
c. When no time stamps are present it is difficult to do an incremental load.
2. Delta/incremental loada. Only the changed records are taken and processed systematically in the dw
Which 6 ETL tool vendors do we know?
1. Informatica PowerCenter
2. Bluesky Integration Studio
3. SAP Data Integrator
4. IBM WebSphere Data Integration Suite SAS enterprise data integration
5. Oracle Warehouse Builder
6. MS SQL Server Integration Services
The question on the page originate from the summary of the following study material:
- A unique study and practice tool
- Never study anything twice again
- Get the grades you hope for
- 100% sure, 100% understanding