- Start with import of data
- Tidy up the data (Data Wrangling)
- Data cleaning cycle (transform - visualise - model)
- After a good result… communicate your findings
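The steps above can be sketched end to end in a few lines of R. This is only a minimal illustration: the inline CSV, the column names, and the BMI calculation are made-up assumptions, not part of any real dataset.

```r
library(tidyverse)

# 1. Import: read a small inline CSV (made-up data; normally you pass a file path).
raw <- read_csv(
  I("name,height_cm,weight_kg\nAda,165,60\nBen,180,81\nCy,172,NA\n"),
  show_col_types = FALSE
)

# 2. Tidy: drop rows that contain missing measurements.
tidy <- raw %>% drop_na()

# 3. Transform: add a derived variable (body mass index).
transformed <- tidy %>%
  mutate(bmi = weight_kg / (height_cm / 100)^2)

# 4. Visualise: scatter plot of weight against height.
p <- ggplot(transformed, aes(x = height_cm, y = weight_kg)) +
  geom_point()

# 5. Model: fit a simple linear model.
fit <- lm(weight_kg ~ height_cm, data = transformed)

# 6. Communicate: report the fitted coefficients.
coef(fit)
```

In a real analysis the transform, visualise, and model steps are usually repeated several times before the results are ready to share.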
What is the tidyverse?
The tidyverse is a collection of R packages designed for data science, providing tools for data manipulation, visualization, and modeling using a consistent and intuitive syntax. Developed primarily by Hadley Wickham and colleagues at RStudio, the tidyverse emphasises the use of "tidy data" principles, where data is structured in a way that each variable forms a column, each observation forms a row, and each type of observational unit forms a table.
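To make the "tidy data" idea concrete, here is a small sketch that reshapes a wide table into a tidy one with `pivot_longer()` from tidyr. The table contents and the column names `country`, `year`, and `cases` are invented for illustration.

```r
library(tidyverse)

# Made-up "wide" data: the variable `year` is hidden in the column names.
wide <- tibble(
  country = c("A", "B"),
  `2019` = c(10, 20),
  `2020` = c(12, 23)
)

# Tidy version: each variable is a column and each observation is a row.
long <- wide %>%
  pivot_longer(
    cols = -country,      # reshape every column except `country`
    names_to = "year",    # old column names become values of `year`
    values_to = "cases"   # old cell values become values of `cases`
  )

long
```

After reshaping, every row describes one country in one year, which is the layout most tidyverse functions expect.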
library(tidyverse)
#> ── Attaching core tidyverse packages ───────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2

This tells you that tidyverse loads nine packages: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, tidyr. These are considered the core of the tidyverse because you’ll use them in almost every analysis.
To check whether updates are available for these packages, run tidyverse_update().

Key Components of the Tidyverse
- Core Packages:
  - ggplot2: A powerful system for creating graphics based on the Grammar of Graphics.
  - dplyr: A package for data manipulation, offering a set of functions to manipulate data frames efficiently.
  - tidyr: Helps tidy messy data, making it easy to reshape and clean data.
  - readr: Provides functions to read rectangular data, such as CSV files, into R.
  - purrr: Enhances R’s functional programming capabilities by providing tools to work with lists and vectors.
  - tibble: An enhanced version of data frames that provides a more modern and user-friendly interface.
  - stringr: Simplifies string manipulation using consistent functions.
  - forcats: Tools for working with factors, which are R’s data structure for categorical data.
  - lubridate: Makes it easier to work with dates and times.
- Supporting Packages:
  - hms: Provides a simple class for storing and manipulating time-of-day values.
  - magrittr: Includes the pipe operator (%>%), which allows for chaining commands in a readable manner.
  - broom: Converts statistical analysis objects from R into tidy data frames.
  - modelr: Provides functions to streamline the process of creating and working with models in a tidy workflow.

- Consistency: All tidyverse functions are designed to work together seamlessly, following the same design philosophy and syntax.
- Ease of Use: Functions are intuitive and easy to learn, especially for users new to R.
- Efficiency: The packages are optimized for performance and designed to handle large datasets.
- Readability: Code written using tidyverse principles is often more readable and easier to debug and maintain.
- Community and Support: A large and active community provides extensive documentation, tutorials, and support.
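As a rough illustration of how the packages work together, the sketch below chains stringr and dplyr functions with the pipe operator. The `sales` table and its values are invented for this example.

```r
library(tidyverse)

# Made-up data with inconsistent region labels.
sales <- tibble(
  region = c(" north", "South ", "north", "south"),
  amount = c(100, 250, 50, 75)
)

summary_tbl <- sales %>%
  # stringr: strip surrounding whitespace and standardise capitalisation
  mutate(region = str_to_title(str_trim(region))) %>%
  # dplyr: total the amounts within each cleaned region
  group_by(region) %>%
  summarise(total = sum(amount), .groups = "drop") %>%
  # dplyr: largest total first
  arrange(desc(total))

summary_tbl
```

Each step takes a data frame as its first argument and returns a data frame, which is what lets functions from different packages be chained in a single pipeline.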
Aside from the packages listed above, there are other relevant R packages that need to be installed as needed. Whenever this kind of error occurs…
library(ggrepel)
#> Error in library(ggrepel) : there is no package called ‘ggrepel’

Just know that you need to run install.packages("ggrepel") to install the package.
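One way to avoid hitting this error mid-script is to check whether a package is installed before attaching it. The helpers below, `missing_packages()` and `install_if_missing()`, are hypothetical names written for this sketch; they rely only on base R's `requireNamespace()` and `install.packages()`.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
install_if_missing <- function(pkgs) {
  needed <- missing_packages(pkgs)
  if (length(needed) > 0) install.packages(needed)
  invisible(needed)
}

# "stats" ships with R, so this call installs nothing.
install_if_missing("stats")
```

Calling `install_if_missing("ggrepel")` before `library(ggrepel)` would install the package only when it is not already available.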
BENEFITS OF tidyverse.