Big data context
14 important questions on Big data context
What are the differences between dealing with new data versus existing data in research?
• Content analysis and 'big data' analysis both focus on data that already exist.
• Implications for sampling, ethics, validity, and reliability differ.
What does the term "quantitative" mean in content analysis?
- Involves counting occurrences
Why is it important for content analysis to be systematic?
- Systematic approach ensures consistency and reliability
- Rules are established for sampling and analysis
What is the significance of content analysis being objective?
- Objective rules are unambiguous
- Avoids subjective biases
How many steps are involved in doing a content analysis according to Treadwell & Davies?
- Develop a hypothesis
- Define the content to be analyzed
- Sample the content
- Select units for coding
- Develop a coding scheme
- Code the units
- Count occurrences of the coded units
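The last steps above (select units, develop a coding scheme, code the units, count) can be sketched in a few lines of Python. This is only an illustration: the headlines, category names, and keywords are invented and do not come from Treadwell & Davies, and `code_unit` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sample of content: each headline is one coding unit.
headlines = [
    "Government announces new tax plan",
    "Local team wins championship final",
    "Parliament votes on tax reform",
    "Star striker injured before final",
    "Weather stays mild this weekend",
]

# Hypothetical coding scheme: category -> keywords that signal it.
coding_scheme = {
    "politics": {"government", "parliament", "tax"},
    "sports": {"team", "championship", "striker"},
}


def code_unit(text):
    """Assign one unit to the first category whose keywords it contains."""
    words = set(text.lower().split())
    for category, keywords in coding_scheme.items():
        if words & keywords:
            return category
    return "other"  # fallback category for units that match nothing


# Code every unit, then count occurrences per category.
codes = [code_unit(h) for h in headlines]
counts = Counter(codes)
print(counts)
```

Because the rules are fixed in advance, a second coder (or a rerun of the script) would arrive at the same counts, which is what makes the procedure systematic.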
What is an advantage of a dictionary-based approach in text classification?
What is a disadvantage of a dictionary-based approach in text classification?
- May lead to occasional misclassifications, threatening validity
How can the sentences "I’m a huge fan of baseball. I have a big collection of bats." and "I’m a huge fan of stuffed nocturnal animals. I have a big collection of bats" demonstrate a limitation of dictionary-based approaches?
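One plausible reading of this example is that a dictionary-based classifier only checks which words occur, so it cannot tell a baseball bat from the animal. The sketch below assumes a made-up sports dictionary; `SPORTS_TERMS` and `classify` are hypothetical names used for illustration only.

```python
import re

# Hypothetical dictionary for a "sports" category.
SPORTS_TERMS = {"baseball", "bats", "pitcher"}


def classify(text):
    """Label text as 'sports' when any dictionary term occurs in it."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return "sports" if words & SPORTS_TERMS else "not sports"


baseball = "I'm a huge fan of baseball. I have a big collection of bats."
animals = ("I'm a huge fan of stuffed nocturnal animals. "
           "I have a big collection of bats.")

print(classify(baseball))  # sports
print(classify(animals))   # sports, although the text is about animals
```

Both sentences receive the same label because the word "bats" matches the dictionary regardless of its meaning in context.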
What are the coding rules for content analysis?
- All coding units must be assigned to a category, minimizing the 'other' category.
- Categories should be exclusive.
- Each coding unit is allocated to one category.
- Coders can assess multiple aspects of each unit, which are not necessarily mutually exclusive.
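The rules above can be checked mechanically for one aspect of the coding at a time. The following is a minimal sketch, assuming a hypothetical category set and a hypothetical data layout in which each coding unit maps to the list of categories a coder gave it.

```python
# Hypothetical category set for one aspect ("topic") of each unit.
TOPIC_CATEGORIES = {"politics", "sports", "other"}


def check_coding(assignments, categories):
    """Return a list of rule violations; an empty list means no violations."""
    problems = []
    for unit, assigned in assignments.items():
        if len(assigned) == 0:
            problems.append(f"{unit}: not assigned to any category")
        elif len(assigned) > 1:
            problems.append(f"{unit}: assigned to more than one category")
        elif assigned[0] not in categories:
            problems.append(f"{unit}: unknown category {assigned[0]!r}")
    return problems


def other_share(assignments):
    """Fraction of units coded as 'other', which should stay small."""
    n_other = sum(1 for a in assignments.values() if a == ["other"])
    return n_other / len(assignments)


coded = {
    "unit 1": ["politics"],
    "unit 2": ["sports", "politics"],  # breaks the one-category rule
    "unit 3": [],                      # breaks the every-unit-is-coded rule
    "unit 4": ["other"],
}
print(check_coding(coded, TOPIC_CATEGORIES))
print(other_share(coded))
```

A second aspect (for example tone) would get its own category set and its own check, since different aspects of the same unit do not have to exclude each other.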
What are the characteristics of big data often defined by "the three V's"?
- Volume: very large amounts of data
- Variety: includes text, images, audio, video; structured (databases) and unstructured (e.g., chats)
- Velocity: often real-time or with little lag
What does the fourth 'V', Veracity, refer to in the context of big data?
- Not distorted by observer effects or artificial settings
- Interpretation may not always be straightforward
What approach does big data lend itself well to in research?
• Sometimes aided by visualization
• Gain insights from data (induction) rather than test specific predictions on data (deduction)
• Look for correlations instead of causality
What are some opportunities in big data research?
• Reduces risk of error and bias due to larger sample size
• Can uncover unexpected correlations not predicted by theory
• Allows construction of more sophisticated statistical models
• But may lead to spurious correlations
• Risk of overfitting models
What are some challenges faced in big data research?
- Overfitting may occur when a complex statistical model fits existing data well but struggles to accurately predict new data.
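Overfitting can be illustrated with a small synthetic example. In the sketch below, a nearest-neighbour "memorizer" stands in for an overly complex model and a straight line for a simple one. The data are invented, and as a simplifying assumption the new points lie exactly on the underlying trend.

```python
def mse(model, xs, ys):
    """Mean squared error of model predictions against observed values."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_line(xs, ys):
    """Ordinary least squares for a straight line (the simple model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_memorizer(xs, ys):
    """Overly flexible model: return the y of the nearest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


# Existing data: trend y = 2x plus alternating noise of +1 / -1.
train_x = list(range(10))
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]

# New data: points between the old ones, lying on the trend itself.
test_x = [x + 0.25 for x in range(10)]
test_y = [2 * x for x in test_x]

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

print("existing data:", mse(line, train_x, train_y), mse(memorizer, train_x, train_y))
print("new data:     ", mse(line, test_x, test_y), mse(memorizer, test_x, test_y))
```

The memorizer reproduces the existing data with zero error because it has also learned the noise, yet on the new points its error is larger than that of the straight line.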