Tidytext comp

3 important questions on Tidytext comp

What is the tidytext format?

The tidy format is:

Each variable is a column
Each observation is a row
Each type of observational unit is a table

Building on that, the tidytext format is a table that has one token per row, where a token is a meaningful piece of text. A token is most often a word, but it can also be a group of words or punctuation.

How is text converted into a tidy format?

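Use unnest_tokens(). Its arguments are the name of the output column to create and the name of the input column that holds the text, for example data %>% unnest_tokens(word, text). The result is a new data frame where every row holds a single word. By default the words are also converted to lowercase and punctuation is stripped.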
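
A minimal sketch of this conversion, assuming a small made-up data frame. The names text_df, line, and text are hypothetical and only serve the illustration:

```r
library(dplyr)
library(tidytext)

# Hypothetical example data: one row per line of raw text
text_df <- tibble(
  line = 1:2,
  text = c("The quick brown fox", "jumps over the lazy dog.")
)

# unnest_tokens(output, input): create column `word` from column `text`,
# giving one row per word; other columns (here `line`) are kept
tidy_df <- text_df %>%
  unnest_tokens(word, text)

print(tidy_df)
```

Each of the nine words ends up in its own row, lowercased, with the line column recording which input row it came from.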

What are stopwords and how can they be removed from your dataset?

Stop words are words that are not useful for an analysis, typically extremely common words such as “the”, “of”, and “to” in English.
We can remove stop words (kept in the tidytext dataset stop_words) with an anti_join().
If you have another list of stop words that you would rather use, pass that list to anti_join() instead of stop_words.
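
A minimal sketch of both variants, assuming the text has already been tokenized into a column called word. The data frame tidy_df and the custom list my_stop_words are hypothetical names made up for this example:

```r
library(dplyr)
library(tidytext)

# Hypothetical tokenized data: one word per row
tidy_df <- tibble(word = c("the", "quick", "brown", "fox", "of", "to"))

# stop_words ships with tidytext and has the columns `word` and `lexicon`
data("stop_words")

# Keep only the rows whose word does NOT appear in stop_words
cleaned <- tidy_df %>%
  anti_join(stop_words, by = "word")

# The same idea with a custom stop word list (hypothetical contents)
my_stop_words <- tibble(word = c("quick", "fox"))

cleaned_custom <- tidy_df %>%
  anti_join(my_stop_words, by = "word")
```

anti_join() keeps every row of the first table that has no match in the second, so the custom list only needs a column with the same name as the token column.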

The questions on this page originate from the summary of the following study material:

BDS: Big Data Analytics
