Data Architecture

15 important questions on Data Architecture

What is transient data?

Data that is generated in the background and is not saved to a database unless saving becomes necessary. For example, when your PC crashes, a recovery mechanism saves the transient data, which makes it persistent.

What is exhaust data?

Data created as a by-product of an information system's main function, for example data generated by a smart meter in your house, or clickstream data.

What is derived data?

A data element derived from other data elements, for example through a mathematical or logical transformation. Think of functions in Excel or Tableau.
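As a minimal sketch of the idea, the snippet below derives new data elements from captured ones. The meter readings, the tariff, and the usage threshold are made-up values used only for illustration.

```python
# Captured data elements: hypothetical smart-meter readings in kWh.
readings_kwh = [1.2, 0.8, 1.5, 1.1]
price_per_kwh = 0.30  # assumed tariff, for illustration only

# Derived data elements: each one is computed from other data elements.
total_kwh = sum(readings_kwh)                     # mathematical transformation
total_cost = round(total_kwh * price_per_kwh, 2)  # mathematical transformation
is_high_usage = total_kwh > 4.0                   # logical transformation (assumed threshold)

print(total_kwh, total_cost, is_high_usage)
```

This mirrors a spreadsheet formula: the derived cells hold no new observations, only transformations of existing ones.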

What are the main kinds of data?

  • Form (qualitative and quantitative)
  • Structure (structured, semi-structured, unstructured)
  • Source (captured, derived, exhaust, transient)
  • Producer (primary, secondary, tertiary)
  • Type (indexical, attribute, metadata)

What is unstructured data?

Data that cannot be captured in a spreadsheet, for example images, sensor data, social media posts, or videos.

What are some issues with spreadsheet data?

Spreadsheets merge data storage and data processing in one place.

They are loosely structured and have limited space available, so data volume becomes an issue.

What are the different ways of structuring your data?

  • Tabular
  • Hierarchical (relationships between data points)
  • Matrix (relationships between entities)
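The three structures can be sketched with plain Python containers. All names and values below are hypothetical and only illustrate the shape of each structure.

```python
# Tabular: rows are records, columns are fields.
tabular = [
    {"customer": "Ann", "city": "Utrecht", "orders": 3},
    {"customer": "Bob", "city": "Leiden", "orders": 1},
]

# Hierarchical: parent-child relationships between data points.
hierarchical = {
    "Netherlands": {
        "Utrecht": ["Ann"],
        "Leiden": ["Bob"],
    }
}

# Matrix: relationships between entities, here "has contacted".
entities = ["Ann", "Bob", "Cas"]
matrix = [
    [0, 1, 0],  # Ann has contacted Bob
    [1, 0, 1],  # Bob has contacted Ann and Cas
    [0, 1, 0],  # Cas has contacted Bob
]

def related(a, b):
    """Return True if entity a has a relationship with entity b."""
    return matrix[entities.index(a)][entities.index(b)] == 1

print(related("Ann", "Bob"), related("Ann", "Cas"))
```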

What are the characteristics of operational databases?

  • Data Integration = Low, dedicated to different application environments, such as Point of Sale, Inventory, Accounting and Logistics.
  • Orientation = Transaction oriented (order entry, purchasing, invoicing)
  • Data retention = limited because of performance and security concerns (weeks to months)
  • Volatility = High (recording and fetching single transactions, for example, inserting/updating/deleting fields)
  • Role of time = real-time, where time is an indicative record (represents current values when accessed, such as transaction time stamps)

Exam: What is the difference between operational databases and data warehouses?

An operational database, such as the one behind an ATM network, is managed with OLTP (Online Transaction Processing) and is the source of information for the data warehouse, which is managed with OLAP (Online Analytical Processing).
Where operational databases are concerned with current data, data warehouses are usually concerned with historical data.
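The contrast in workload can be sketched with Python's built-in sqlite3 module. This is an illustration only: the table and its values are made up, and a real data warehouse would normally be a separate system loaded from the operational database, not the same in-memory table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE withdrawals (account TEXT, day TEXT, amount REAL)")

# Operational (OLTP-style) workload: record single transactions as they happen.
with conn:
    conn.execute("INSERT INTO withdrawals VALUES ('A1', '2024-01-01', 50.0)")
    conn.execute("INSERT INTO withdrawals VALUES ('A1', '2024-01-02', 20.0)")
    conn.execute("INSERT INTO withdrawals VALUES ('B7', '2024-01-02', 100.0)")

# OLTP-style read: fetch the most recent transaction for one account.
latest = conn.execute(
    "SELECT amount FROM withdrawals WHERE account = 'A1' "
    "ORDER BY day DESC LIMIT 1"
).fetchone()[0]

# Analytical (OLAP-style) query: aggregate over the full history.
totals = dict(
    conn.execute(
        "SELECT account, SUM(amount) FROM withdrawals GROUP BY account"
    ).fetchall()
)

print(latest, totals)
```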

Exam: What is the difference between dimensions and measures? Provide examples.

Dimensions are qualitative values, such as names, dates, geographical points. They can be used to categorise and segment data, affecting the level of detail in your data.
Measures contain quantitative, numeric values that you can measure and aggregate, such as revenue or quantity sold.
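A small sketch of how the two interact: the dimension decides how the rows are segmented, and the measure is what gets aggregated within each segment. The sales records and the helper name `total_by` are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales records: "region" and "product" are dimensions,
# "revenue" is a measure.
sales = [
    {"region": "North", "product": "Tea", "revenue": 120.0},
    {"region": "North", "product": "Coffee", "revenue": 80.0},
    {"region": "South", "product": "Tea", "revenue": 50.0},
]

def total_by(dimension, measure, rows):
    """Sum a measure within each value of a dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)

by_region = total_by("region", "revenue", sales)
by_product = total_by("product", "revenue", sales)

print(by_region)
print(by_product)
```

Switching the dimension from region to product changes the level of detail, while the measure being summed stays the same.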

What is an approach to deal with big data?

Distributed computing, based on the SAC paradigm.

What is the SAC paradigm?

Split-Apply-Combine.
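A minimal single-machine sketch of the three steps, using made-up clickstream records of the form (page, seconds spent):

```python
from collections import defaultdict

# Hypothetical clickstream records: (page, seconds spent on the page).
clicks = [("home", 3), ("cart", 1), ("home", 5), ("cart", 4), ("help", 2)]

# Split: partition the records into groups by page.
groups = defaultdict(list)
for page, seconds in clicks:
    groups[page].append(seconds)

# Apply: compute the same function (here the mean) on each group independently.
def mean(values):
    return sum(values) / len(values)

per_group = [(page, mean(values)) for page, values in groups.items()]

# Combine: collect the per-group results into one result structure.
mean_seconds = dict(per_group)

print(mean_seconds)
```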

What does distributed computing mean?

Distributing each step of the SAC process across multiple machines or processors is called distributed computing.
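The sketch below hands each chunk to a separate worker. It uses a thread pool from Python's standard library as a stand-in for separate machines, which is an assumption made to keep the example self-contained; the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(values, n_chunks):
    """Split: cut the data into at most n_chunks pieces of similar size."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]

def apply(chunk):
    """Apply: the work done independently on each chunk."""
    return sum(chunk)

def distributed_sum(values, n_workers=4):
    chunks = split(values, n_workers)
    # Each chunk is processed by its own worker (threads here, machines in practice).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(apply, chunks))
    # Combine: merge the partial results into the final answer.
    return sum(partials)

total = distributed_sum(list(range(1, 101)))
print(total)
```

The approach works here because a sum can be computed per chunk and the partial sums merged afterwards; operations without that property need a different combine step.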

What are the elements of the knowledge pyramid of Adler?

DIKW: Data, Information, Knowledge, Wisdom.

What are the three types of data discussed in the 'kinds of data' slide?

  • Indexical
  • Attribute
  • Metadata 
