class: center, middle, inverse, title-slide # Intro to R for Data Science ## Beginner’s workshop ### AbdulMajedRaja RS --- # About Me - Studied at **Government College of Technology, Coimbatore** - Bengaluru R user group **Organizer** - R package developer (`coinmarketcapr`, `itunesr`) --- class: inverse # Disclaimer: - This workshop is **NOT** going to make you a Data Scientist **in a day**. - The objective is to help you get a flavor of R and how it is used in Data Science - Thus, it aims to get you ready to embark on your own journey to becoming a Data Scientist who uses R --- # Content: This presentation's content is heavily borrowed from the book [**R for Data Science**](https://r4ds.had.co.nz) by **Garrett Grolemund** and **Hadley Wickham** .center[<img src="images/r4ds_had_cover.png" width="30%">] --- # About R - R is a language and environment for statistical computing and graphics. (Ref: [`r-project.org`](https://www.r-project.org/about.html)) - R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand - R is free and open source - R can be extended (easily) via *packages*. - R is an interpreted language .right[![image](images/Rlogo.png)] --- class: inverse, center, middle # R Interpreter / Console / GUI ## Demo --- # About RStudio - RStudio is a **free and open-source IDE** for R, developed by the company **RStudio, Inc.** - RStudio and its team regularly contribute to the R community by releasing new packages, such as: - `tidyverse` - `shiny` - `knitr` .right[<img src="images/RStudio.png" width="40%">] --- class: inverse, center, middle # RStudio ## Demo --- # R Ecosystem Like `Python`, `R`'s strength lies in its ecosystem.
**Why R?** - R Packages ### Growth <figure> <img src='images/number-of-submitted-packages-to-CRAN.png' width="80%" /> <font size="2"> <figcaption> Source: <a href ="https://gist.github.com/daroczig/3cf06d6db4be2bbe3368">@daroczig</a> </figcaption> </font> </figure> --- class: inverse, center, middle # Basics of R Programming --- # Hello, World! The traditional first step - **Hello, World!**: -- ```r *print("Hello, World!") ``` ``` ## [1] "Hello, World!" ``` <br/> ### .center[That's one small step for a man, one giant leap for mankind] ### .center[Neil Armstrong] --- # Arithmetic Operations ```r 2 + 3 ``` ``` ## [1] 5 ``` ```r 50000 * 42222 ``` ``` ## [1] 2111100000 ``` ```r 2304 / 233 ``` ``` ## [1] 9.888412 ``` ```r (33 + 44) * 232 / 12 ``` ``` ## [1] 1488.667 ``` --- # Assignment Operators ### .center[`<-` **Arrow (Less-than < and Minus - )**] ### .center[`=` **(Equal Sign)**] ```r (x <- 2 + 3) ``` ``` ## [1] 5 ``` ```r *(y = x ** 4) ``` ``` ## [1] 625 ``` ```r 5 * 9 -> a a + 3 ``` ``` ## [1] 48 ``` --- # Objects * The entities R operates on are technically known as `objects`. Example: a numeric vector ```r vector_of_numeric <- c(2,4,5) typeof(vector_of_numeric) ``` ``` ## [1] "double" ``` --- # Vectors - Atomic Vectors - Homogeneous Data Type - logical - integer - double - character - *complex* - *raw* - Lists - (Recursive Vectors) Heterogeneous Data Type - `NULL` is used to represent the absence of a vector **Vectors + Attributes (Additional Metadata) = Augmented vectors** * Factors are built on top of integer vectors. * Dates and date-times are built on top of numeric vectors. * Data frames and tibbles are built on top of lists.
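The "base vector + attributes" idea can be checked directly in the console. The sketch below uses a factor, a Date and a small data frame. The object names (`days_f`, `new_year`, `small_df`) and the date are arbitrary choices for illustration and do not come from the workshop. `typeof()` reveals the underlying base vector, and `attributes()` shows the metadata layered on top of it.

```r
# A factor: integer codes plus `levels` and `class` attributes
days_f <- factor(c("Thu", "Wed", "Sun"),
                 levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
typeof(days_f)        # "integer"
attributes(days_f)    # $levels and $class
as.integer(days_f)    # the underlying codes: 4 3 7

# A Date: a double counting days since 1970-01-01, plus a `class` attribute
new_year <- as.Date("2019-01-01")
typeof(new_year)      # "double"
unclass(new_year)     # 17897

# A data frame: a list of equal-length columns plus names/row.names/class
small_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
typeof(small_df)      # "list"
names(attributes(small_df))
```

Stripping the attributes (for example with `unclass()`) leaves the plain base vector. This is why `sort()` on a factor follows the level order rather than the alphabet.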
--- ### Numeric Vector ```r nummy <- c(2,3,4) nummy_int <- c(1L,2L,3L) ``` ```r typeof(nummy) ``` ``` ## [1] "double" ``` ```r typeof(nummy_int) ``` ``` ## [1] "integer" ``` ```r is.numeric(nummy) ``` ``` ## [1] TRUE ``` ```r is.numeric(nummy_int) ``` ``` ## [1] TRUE ``` ```r is.double(nummy) ``` ``` ## [1] TRUE ``` ```r is.double(nummy_int) ``` ``` ## [1] FALSE ``` --- ### Character Vector ```r types <- c("int","double","character") types ``` ``` ## [1] "int" "double" "character" ``` ```r typeof(types) ``` ``` ## [1] "character" ``` ```r length(types) ``` ``` ## [1] 3 ``` ```r is.numeric(types) ``` ``` ## [1] FALSE ``` ```r is.character(types) ``` ``` ## [1] TRUE ``` --- ### Logical Vector ```r logicals <- c(TRUE,F,TRUE,T, FALSE) logicals ``` ``` ## [1] TRUE FALSE TRUE TRUE FALSE ``` --- # Coercion ## Typecasting - Explicit ```r money_in_chars <- c("20","35","33") typeof(money_in_chars) ``` ``` ## [1] "character" ``` ```r money_money <- as.numeric(money_in_chars) money_money ``` ``` ## [1] 20 35 33 ``` ```r typeof(money_money) ``` ``` ## [1] "double" ``` --- ## Typecasting - Implicit ```r money_money <- as.numeric(money_in_chars) money_money ``` ``` ## [1] 20 35 33 ``` ```r typeof(money_money) ``` ``` ## [1] "double" ``` ```r new_money <- c(money_money,"33") new_money ``` ``` ## [1] "20" "35" "33" "33" ``` ```r typeof(new_money) ``` ``` ## [1] "character" ``` --- # Vector - Accessing ```r month.abb # built-in character vector of month abbreviations ``` ``` ## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" ## [12] "Dec" ``` ```r month.abb[2] ``` ``` ## [1] "Feb" ``` ```r month.abb[4:7] ``` ``` ## [1] "Apr" "May" "Jun" "Jul" ``` ```r month.abb[c(2,5,7,10)] ``` ``` ## [1] "Feb" "May" "Jul" "Oct" ``` --- # Vector Manipulation ## Appending ```r days <- c("Mon","Tue","Wed") days ``` ``` ## [1] "Mon" "Tue" "Wed" ``` ```r week_end <- c("Sat","Sun") more_days <- c(days,"Thu","Fri",week_end) more_days ``` ``` ## [1] "Mon" "Tue" "Wed" "Thu" "Fri" "Sat"
"Sun" ``` --- # Vector - Arithmetic ```r set.seed(122) so_many_numbers <- runif(10, min = 10, max = 100) so_many_numbers ``` ``` ## [1] 91.45185 91.61657 27.14995 13.68211 62.11661 66.31451 71.14383 ## [8] 10.25104 13.21914 63.69918 ``` ```r so_many_numbers * 200 ``` ``` ## [1] 18290.370 18323.314 5429.989 2736.422 12423.322 13262.902 14228.767 ## [8] 2050.209 2643.828 12739.836 ``` --- # Factors * In R, factors are used to work with categorical variables, variables that have a fixed and known set of possible values. * Useful with Characters where non-Alphabetical Ordering is required ```r days <- c("Thu","Wed","Sun") sort(days) ``` ``` ## [1] "Sun" "Thu" "Wed" ``` ```r week_levels <- c("Mon","Tue","Wed","Thu","Fri","Sat","Sun") (days_f <- factor(days, levels = week_levels)) ``` ``` ## [1] Thu Wed Sun ## Levels: Mon Tue Wed Thu Fri Sat Sun ``` ```r sort(days_f) ``` ``` ## [1] Wed Thu Sun ## Levels: Mon Tue Wed Thu Fri Sat Sun ``` --- # List Lists are a step up in complexity from atomic vectors: each element can be any type, not just vectors. 
```r (a_list <- list("abcd",123,1:12,month.abb)) ``` ``` ## [[1]] ## [1] "abcd" ## ## [[2]] ## [1] 123 ## ## [[3]] ## [1] 1 2 3 4 5 6 7 8 9 10 11 12 ## ## [[4]] ## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep" "Oct" "Nov" ## [12] "Dec" ``` --- # List Accessing ```r a_list[[1]] ``` ``` ## [1] "abcd" ``` ```r a_list[[4]][4] ``` ``` ## [1] "Apr" ``` --- # Matrix ```r new_m <- matrix(data = 1:12, nrow = 3) new_m ``` ``` ## [,1] [,2] [,3] [,4] ## [1,] 1 4 7 10 ## [2,] 2 5 8 11 ## [3,] 3 6 9 12 ``` ```r new_m * 20 ``` ``` ## [,1] [,2] [,3] [,4] ## [1,] 20 80 140 200 ## [2,] 40 100 160 220 ## [3,] 60 120 180 240 ``` ```r dim(new_m) ``` ``` ## [1] 3 4 ``` ```r new_m[2,3] ``` ``` ## [1] 8 ``` --- # Dataframe ## Tabular Structure A data frame has * dimensions (`dim()`) * row names (`row.names()`) * column names (`colnames()`) ```r colleges <- c("CIT","GCT","PSG") year <- c(2019,2018,2017) db <- data.frame(college_names = colleges, year_since = year) db ``` ``` ## college_names year_since ## 1 CIT 2019 ## 2 GCT 2018 ## 3 PSG 2017 ``` --- # Dataframe Manipulation ```r db$college_names ``` ``` ## [1] CIT GCT PSG ## Levels: CIT GCT PSG ``` This output comes from R < 4.0, where `data.frame()` defaulted to `stringsAsFactors = TRUE`. From R 4.0.0 on, the default is `FALSE` and `db$college_names` is a plain character vector. ```r db[2,2] <- 1990 db ``` ``` ## college_names year_since ## 1 CIT 2019 ## 2 GCT 1990 ## 3 PSG 2017 ``` ```r db[,"year_since"] ``` ``` ## [1] 2019 1990 2017 ``` --- # Loops & Iterators ## For Loop ```r for (month_name in month.abb[1:4]) { print(paste("This month", month_name, "beautiful!!!")) } ``` ``` ## [1] "This month Jan beautiful!!!" ## [1] "This month Feb beautiful!!!" ## [1] "This month Mar beautiful!!!" ## [1] "This month Apr beautiful!!!" ``` As you move forward, check the family of `apply` functions: `sapply()`, `tapply()`, `lapply()`, `apply()`.
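Here is a hedged sketch of where the `apply` family fits. The `for` loop above is rewritten with `sapply()`, and the other three functions are shown on small inputs. The names `messages`, `lengths_list`, `m` and `species_means` are illustrative only. `iris` ships with R.

```r
# sapply(): apply a function to each element and simplify to a vector
messages <- sapply(month.abb[1:4], function(month_name) {
  paste("This month", month_name, "beautiful!!!")
}, USE.NAMES = FALSE)
messages

# lapply(): always returns a list, one element per input
lengths_list <- lapply(list(a = 1:3, b = letters), length)
lengths_list

# apply(): work over the rows (MARGIN = 1) or columns (MARGIN = 2) of a matrix
m <- matrix(1:12, nrow = 3)
apply(m, 2, sum)   # column sums: 6 15 24 33

# tapply(): apply a function to groups of a vector defined by a factor
species_means <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means      # setosa 5.006, versicolor 5.936, virginica 6.588
```

Unlike the loop, `sapply()` returns the strings as a value instead of only printing them, so the result can be stored and reused.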
For advanced functional programming, refer to the `purrr` package. --- # Logical Operations ## `%in%` operator ```r iris$Species %in% "virginica" ``` ``` ## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [34] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [67] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [78] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [89] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE ## [100] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [111] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [122] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [133] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE ## [144] TRUE TRUE TRUE TRUE TRUE TRUE TRUE ``` --- ## Logical Operators ```r 1:10 > 5 ``` ``` ## [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE ``` ```r 1:10 == 4 ``` ``` ## [1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE ``` ```r !1:10 == 4 ``` ``` ## [1] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE ``` --- # Conditions ```r if (iris$Sepal.Length[2]>5) { print("it is gt 5") } else print("it is not") ``` ``` ## [1] "it is not" ``` ```r if (iris$Sepal.Length>10) {print("hello")} ``` ``` ## Warning in if (iris$Sepal.Length > 10) {: the condition has length > 1 and ## only the first element will be used ``` Since R 4.2.0, a condition of length > 1 in `if` is an error rather than a warning. Use the vectorised `ifelse()`, or reduce the condition to a single value with `any()` or `all()`. ```r ifelse(iris$Sepal.Length > 6, "greater_than_6", "at_most_6") ``` ``` ## [1] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [5] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [9] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [13] "at_most_6" "at_most_6" "at_most_6" "at_most_6"
## [17] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [21] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [25] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [29] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [33] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [37] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [41] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [45] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [49] "at_most_6" "at_most_6" "greater_than_6" "greater_than_6" ## [53] "greater_than_6" "at_most_6" "greater_than_6" "at_most_6" ## [57] "greater_than_6" "at_most_6" "greater_than_6" "at_most_6" ## [61] "at_most_6" "at_most_6" "at_most_6" "greater_than_6" ## [65] "at_most_6" "greater_than_6" "at_most_6" "at_most_6" ## [69] "greater_than_6" "at_most_6" "at_most_6" "greater_than_6" ## [73] "greater_than_6" "greater_than_6" "greater_than_6" "greater_than_6" ## [77] "greater_than_6" "greater_than_6" "at_most_6" "at_most_6" ## [81] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [85] "at_most_6" "at_most_6" "greater_than_6" "greater_than_6" ## [89] "at_most_6" "at_most_6" "at_most_6" "greater_than_6" ## [93] "at_most_6" "at_most_6" "at_most_6" "at_most_6" ## [97] "at_most_6" "greater_than_6" "at_most_6" "at_most_6" ## [101] "greater_than_6" "at_most_6" "greater_than_6" "greater_than_6" ## [105] "greater_than_6" "greater_than_6" "at_most_6" "greater_than_6" ## [109] "greater_than_6" "greater_than_6" "greater_than_6" "greater_than_6" ## [113] "greater_than_6" "at_most_6" "at_most_6" "greater_than_6" ## [117] "greater_than_6" "greater_than_6" "greater_than_6" "at_most_6" ## [121] "greater_than_6" "at_most_6" "greater_than_6" "greater_than_6" ## [125] "greater_than_6" "greater_than_6" "greater_than_6" "greater_than_6" ## [129] "greater_than_6" "greater_than_6" "greater_than_6" "greater_than_6" ## [133] "greater_than_6" "greater_than_6"
"greater_than_6" "greater_than_6" ## [137] "greater_than_6" "greater_than_6" "at_most_6" "greater_than_6" ## [141] "greater_than_6" "greater_than_6" "at_most_6" "greater_than_6" ## [145] "greater_than_6" "greater_than_6" "greater_than_6" "greater_than_6" ## [149] "greater_than_6" "at_most_6" ``` --- # Functions ## Types - Base-R functions (`mean()`, `plot()`, `lm()`) - Package functions (`dplyr::mutate()`, `stringr::str_detect()`) - User-defined functions ```r workshop_hate_message <- function(name = "No one", n = 3) { text_to_print <- paste(name, "hate(s)", "this workshop") for (i in seq_len(n)) { print(text_to_print) } } workshop_hate_message("All of us", 4) ``` ``` ## [1] "All of us hate(s) this workshop" ## [1] "All of us hate(s) this workshop" ## [1] "All of us hate(s) this workshop" ## [1] "All of us hate(s) this workshop" ``` --- # Packages ## Package Installation & Loading ### From CRAN (usually Stable Version) ```r install.packages("itunesr") ``` ### From GitHub (usually Development Version) ```r #install.packages("devtools") devtools::install_github("amrrs/itunesr") ``` ### Loading ```r library(itunesr) ``` --- # Help ## Using `help()` ```r help("runif") ``` ## Using `?`
```r ?sample ``` --- # Help - Example ```r example("for") ``` ``` ## ## for> for(i in 1:5) print(1:i) ## [1] 1 ## [1] 1 2 ## [1] 1 2 3 ## [1] 1 2 3 4 ## [1] 1 2 3 4 5 ## ## for> for(n in c(2,5,10,20,50)) { ## for+ x <- stats::rnorm(n) ## for+ cat(n, ": ", sum(x^2), "\n", sep = "") ## for+ } ## 2: 2.188171 ## 5: 1.936692 ## 10: 15.34038 ## 20: 25.59841 ## 50: 49.75875 ## ## for> f <- factor(sample(letters[1:5], 10, replace = TRUE)) ## ## for> for(i in unique(f)) print(i) ## [1] "c" ## [1] "a" ## [1] "d" ## [1] "b" ## [1] "e" ``` --- # Package Vignettes ```r vignette("dplyr") browseVignettes("dplyr") ``` --- class: inverse, center, middle # Data wrangling and Visualization using Tidyverse --- class: center, middle # Data Science Framework "There are now like, you know, a billion Venn diagrams showing you what data science is. But to me I think the definition is pretty simple. Whenever you're struggling with data, trying to understand what's going on with data, whenever you're trying to turn that **raw data into insight and understanding and discoveries**. I think that's **Data Science.**" - Hadley Wickham <figure> <img src='images/hadley_data_science.png' width="80%" /> <font size="2"> <figcaption> Source: <a href ="https://www.youtube.com/watch?v=cpbtcsGE0OA">Hadley Wickham</a> </figcaption> </font> </figure> --- # Tidyverse - An opinionated collection of R packages designed for data science. - All packages share an underlying design *philosophy, grammar, and data structures*.
```r install.packages("tidyverse") ``` ### tidyverse packages ```r tidyverse::tidyverse_packages() ``` ``` ## [1] "broom" "cli" "crayon" "dplyr" "dbplyr" ## [6] "forcats" "ggplot2" "haven" "hms" "httr" ## [11] "jsonlite" "lubridate" "magrittr" "modelr" "purrr" ## [16] "readr" "readxl\n(>=" "reprex" "rlang" "rstudioapi" ## [21] "rvest" "stringr" "tibble" "tidyr" "xml2" ## [26] "tidyverse" ``` --- # Loading the Library ```r library(tidyverse) ``` ``` ## ── Attaching packages ──────────────────────────────────────────────── tidyverse 1.2.1 ── ``` ``` ## ✔ ggplot2 3.1.0 ✔ purrr 0.2.5 ## ✔ tibble 2.0.0 ✔ dplyr 0.7.8 ## ✔ tidyr 0.8.2 ✔ stringr 1.3.1 ## ✔ readr 1.3.1 ✔ forcats 0.3.0 ``` ``` ## Warning: package 'tibble' was built under R version 3.5.2 ``` ``` ## ── Conflicts ─────────────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() ``` --- # Input Data Reading the dataset (`skip = 1` skips the first row of the file, so the next row, which holds the full question text, is used as the column names): ```r #kaggle <- read_csv("data/kaggle_survey_2018.csv") kaggle <- read_csv("data/kaggle_survey_2018.csv", skip = 1) ``` --- # Basic Stats ### Dimensions (rows, columns) ```r dim(kaggle) ``` ``` ## [1] 23859 395 ``` ```r glimpse(kaggle) ``` ``` ## Observations: 23,859 ## Variables: 395 ## $ `Duration (in seconds)` <dbl> … ## $ `What is your gender? - Selected Choice` <chr> … ## $ `What is your gender? - Prefer to self-describe - Text` <dbl> … ## $ `What is your age (# years)?` <chr> … ## $ `In which country do you currently reside?` <chr> … ## $ `What is the highest level of formal education that you have attained or plan to attain within the next 2 years?` <chr> … ## $ `Which best describes your undergraduate major?
- Selected Choice` <chr> … ## $ `Select the title most similar to your current role (or most recent title if retired): - Selected Choice` <chr> … ## $ `Select the title most similar to your current role (or most recent title if retired): - Other - Text` <dbl> … ## $ `In what industry is your current employer/contract (or your most recent employer if retired)? - Selected Choice` <chr> … ## $ `In what industry is your current employer/contract (or your most recent employer if retired)? - Other - Text` <dbl> … ## $ `How many years of experience do you have in your current role?` <chr> … ## $ `What is your current yearly compensation (approximate $USD)?` <chr> … ## $ `Does your current employer incorporate machine learning methods into their business?` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Analyze and understand data to influence product or business decisions` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build and/or run a machine learning service that operationally improves my product or workflows` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build and/or run the data infrastructure that my business uses for storing, analyzing, and operationalizing data` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Build prototypes to explore applying machine learning to new areas` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Do research that advances the state of the art of machine learning` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - None of these 
activities are an important part of my role at work` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Selected Choice - Other` <chr> … ## $ `Select any activities that make up an important part of your role at work: (Select all that apply) - Other - Text` <dbl> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Selected Choice` <chr> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Basic statistical software (Microsoft Excel, Google Sheets, etc.) - Text` <dbl> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Advanced statistical software (SPSS, SAS, etc.) - Text` <dbl> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Business intelligence software (Salesforce, Tableau, Spotfire, etc.) - Text` <dbl> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Local or hosted development environments (RStudio, JupyterLab, etc.) - Text` <dbl> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Cloud-based data software & APIs (AWS, GCP, Azure, etc.) - Text` <dbl> … ## $ `What is the primary tool that you use at work or school to analyze data? (include text response) - Other - Text` <dbl> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Jupyter/IPython` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? 
(Select all that apply) - Selected Choice - RStudio` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - PyCharm` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Visual Studio Code` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - nteract` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Atom` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - MATLAB` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Visual Studio` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Notepad++` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Sublime Text` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Vim` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? 
(Select all that apply) - Selected Choice - IntelliJ` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Spyder` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - None` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `Which of the following integrated development environments (IDE's) have you used at work or school in the last 5 years? (Select all that apply) - Other - Text` <dbl> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Kaggle Kernels` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Colab` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Notebook` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Domino Datalab` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Datalab` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Paperspace` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? 
(Select all that apply) - Selected Choice - Floydhub` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Crestle` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - JupyterHub/Binder` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - None` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `Which of the following hosted notebooks have you used at work or school in the last 5 years? (Select all that apply) - Other - Text` <dbl> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Platform (GCP)` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Web Services (AWS)` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Microsoft Azure` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Alibaba Cloud` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? 
(Select all that apply) - Selected Choice - I have not used any cloud providers` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `Which of the following cloud computing services have you used at work or school in the last 5 years? (Select all that apply) - Other - Text` <dbl> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Python` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - R` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SQL` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Bash` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Java` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Javascript/Typescript` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Visual Basic/VBA` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - C/C++` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - MATLAB` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Scala` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Julia` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Go` <chr> … ## $ `What programming languages do you use on a regular basis? 
(Select all that apply) - Selected Choice - C#/.NET` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - PHP` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Ruby` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - SAS/STATA` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - None` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `What programming languages do you use on a regular basis? (Select all that apply) - Other - Text` <dbl> … ## $ `What specific programming language do you use most often? - Selected Choice` <chr> … ## $ `What specific programming language do you use most often? - Other - Text` <dbl> … ## $ `What programming language would you recommend an aspiring data scientist to learn first? - Selected Choice` <chr> … ## $ `What programming language would you recommend an aspiring data scientist to learn first? - Other - Text` <dbl> … ## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Scikit-Learn` <chr> … ## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - TensorFlow` <chr> … ## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Keras` <chr> … ## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - PyTorch` <chr> … ## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Spark MLlib` <chr> … ## $ `What machine learning frameworks have you used in the past 5 years? 
(Select all that apply) - Selected Choice - H20` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Fastai` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Mxnet` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Caret` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Xgboost` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - mlr` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Prophet` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - randomForest` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - lightgbm` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - catboost` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - CNTK` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Caffe` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - None` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `What machine learning frameworks have you used in the past 5 years? (Select all that apply) - Other - Text` <dbl> …
## $ `Of the choices that you selected in the previous question, which ML library have you used the most? - Selected Choice` <chr> …
## $ `Of the choices that you selected in the previous question, which ML library have you used the most? - Other - Text` <dbl> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - ggplot2` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Matplotlib` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Altair` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Shiny` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - D3` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Plotly` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Bokeh` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Seaborn` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Geoplotlib` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Leaflet` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Lattice` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - None` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `What data visualization libraries or tools have you used in the past 5 years? (Select all that apply) - Other - Text` <dbl> …
## $ `Of the choices that you selected in the previous question, which specific data visualization library or tool have you used the most? - Selected Choice` <chr> …
## $ `Of the choices that you selected in the previous question, which specific data visualization library or tool have you used the most? - Other - Text` <dbl> …
## $ `Approximately what percent of your time at work or school is spent actively coding?` <chr> …
## $ `How long have you been writing code to analyze data?` <chr> …
## $ `For how many years have you used machine learning methods (at work or in school)?` <chr> …
## $ `Do you consider yourself to be a data scientist?` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - AWS Elastic Compute Cloud (EC2)` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Google Compute Engine` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - AWS Elastic Beanstalk` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Google App Engine` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Google Kubernetes Engine` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - AWS Lambda` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Google Cloud Functions` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - AWS Batch` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Azure Virtual Machines` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Azure Container Service` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Azure Functions` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Azure Event Grid` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Azure Batch` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Azure Kubernetes Service` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - IBM Cloud Virtual Servers` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - IBM Cloud Container Registry` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - IBM Cloud Kubernetes Service` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - IBM Cloud Foundry` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - None` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Selected Choice - Other` <chr> …
## $ `Which of the following cloud computing products have you used at work or school in the last 5 years (Select all that apply)? - Other - Text` <dbl> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Transcribe` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Speech-to-text API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Rekognition` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Vision API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Comprehend` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Natural Language API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Translate` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Translation API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Lex` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Dialogflow Enterprise Edition` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon Rekognition Video` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Video Intelligence API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud AutoML` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Amazon SageMaker` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Machine Learning Engine` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - DataRobot` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - H20 Driverless AI` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Domino Datalab` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - SAS` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Dataiku` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - RapidMiner` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Instabase` <lgl> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Algorithmia` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Dataversity` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Cloudera` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Machine Learning Studio` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Machine Learning Workbench` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Cortana Intelligence Suite` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Bing Speech API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Speaker Recognition API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Computer Vision API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Face API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Video API` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Studio` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Knowledge Catalog` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Assistant` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Discovery` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Text to Speech` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Visual Recognition` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Watson Machine Learning` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Cognitive Services` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - None` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `Which of the following machine learning products have you used at work or school in the last 5 years? (Select all that apply) - Other - Text` <dbl> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Relational Database Service` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Aurora` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud SQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Spanner` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS DynamoDB` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Datastore` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Bigtable` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS SimpleDB` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Microsoft SQL Server` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - MySQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - PostgresSQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - SQLite` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Oracle Database` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Ingres` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Microsoft Access` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - NexusDB` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - SAP IQ` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Fusion Tables` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Database for MySQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Cosmos DB` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure SQL Database` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Database for PostgreSQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud Compose` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud Compose for MySQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud Compose for PostgreSQL` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud Db2` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - None` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `Which of the following relational database products have you used at work or school in the last 5 years? (Select all that apply) - Other - Text` <dbl> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Elastic MapReduce` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Batch` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Dataproc` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Dataflow` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Dataprep` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Kinesis` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google Cloud Pub/Sub` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Athena` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - AWS Redshift` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Google BigQuery` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Teradata` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Microsoft Analysis Services` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Oracle Exadata` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Oracle Warehouse Builder` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - SAP IQ` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Snowflake` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Databricks` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure SQL Data Warehouse` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure HDInsight` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Azure Stream Analytics` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM InfoSphere DataStorage` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud Analytics Engine` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - IBM Cloud Streaming Analytics` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - None` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `Which of the following big data and analytics products have you used at work or school in the last 5 years? (Select all that apply) - Other - Text` <dbl> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Audio Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Categorical Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Genetic Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Geospatial Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Image Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Numerical Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Sensor Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Tabular Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Text Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Time Series Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Video Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Selected Choice - Other Data` <chr> …
## $ `Which types of data do you currently interact with most often at work or school? (Select all that apply) - Other Data - Text` <dbl> …
## $ `What is the type of data that you currently interact with most often at work or school? - Selected Choice` <chr> …
## $ `What is the type of data that you currently interact with most often at work or school? - Other Data - Text` <dbl> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Government websites` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - University research group websites` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Non-profit research group websites` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Dataset aggregator/platform (Socrata, Kaggle Public Datasets Platform, etc.)` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - I collect my own data (web-scraping, etc.)` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Publicly released data from private companies` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Google Search` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Google Dataset Search` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - GitHub` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - None/I do not work with public data` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `Where do you find public datasets? (Select all that apply) - Other - Text` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Gathering data` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Cleaning data` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Visualizing data` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Model building/model selection` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Putting the model into production` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Finding insights in the data and communicating with stakeholders` <dbl> …
## $ `During a typical data science project at work or school, approximately what proportion of your time is devoted to the following? (Answers must add up to 100%) - Other` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - Self-taught` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - Online courses (Coursera, Udemy, edX, etc.)` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - Work` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - University` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - Kaggle competitions` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - Other` <dbl> …
## $ `What percentage of your current machine learning/data science training falls under each category? (Answers must add up to 100%) - Other - Text` <dbl> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udacity` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Coursera` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - edX` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - DataCamp` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - DataQuest` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Kaggle Learn` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Fast.AI` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - developers.google.com` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Udemy` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - TheSchool.AI` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Online University Courses` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - None` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `On which online platforms have you begun or completed data science courses? (Select all that apply) - Other - Text` <dbl> …
## $ `On which online platform have you spent the most amount of time? - Selected Choice` <chr> …
## $ `On which online platform have you spent the most amount of time? - Other - Text` <dbl> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Twitter` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Hacker News` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - r/machinelearning` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Kaggle forums` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Fastai forums` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Siraj Raval YouTube Channel` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - DataTau News Aggregator` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Linear Digressions Podcast` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Cloud AI Adventures (YouTube)` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - FiveThirtyEight.com` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - ArXiv & Preprints` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Journal Publications` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - FastML Blog` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - KDnuggets Blog` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - O'Reilly Data Newsletter` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Partially Derivative Podcast` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - The Data Skeptic Podcast` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Medium Blog Posts` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Towards Data Science Blog` <lgl> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Analytics Vidhya Blog` <lgl> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - None/I do not know` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Selected Choice - Other` <chr> …
## $ `Who/what are your favorite media sources that report on data science topics? (Select all that apply) - Other - Text` <dbl> …
## $ `How do you perceive the quality of online learning platforms and in-person bootcamps as compared to the quality of the education provided by traditional brick and mortar institutions? - Online learning platforms and MOOCs:` <chr> …
## $ `How do you perceive the quality of online learning platforms and in-person bootcamps as compared to the quality of the education provided by traditional brick and mortar institutions? - In-person bootcamps:` <chr> …
## $ `Which better demonstrates expertise in data science: academic achievements or independent projects? - Your views:` <chr> …
## $ `How do you perceive the importance of the following topics? - Fairness and bias in ML algorithms:` <chr> …
## $ `How do you perceive the importance of the following topics? - Being able to explain ML model outputs and/or predictions` <chr> …
## $ `How do you perceive the importance of the following topics? - Reproducibility in data science` <chr> …
## $ `What metrics do you or your organization use to determine whether or not your models were successful? (Select all that apply) - Selected Choice - Revenue and/or business goals` <chr> …
## $ `What metrics do you or your organization use to determine whether or not your models were successful? (Select all that apply) - Selected Choice - Metrics that consider accuracy` <chr> …
## $ `What metrics do you or your organization use to determine whether or not your models were successful? (Select all that apply) - Selected Choice - Metrics that consider unfair bias` <chr> …
## $ `What metrics do you or your organization use to determine whether or not your models were successful? 
(Select all that apply) - Selected Choice - Not applicable (I am not involved with an organization that builds ML models)` <chr> … ## $ `What metrics do you or your organization use to determine whether or not your models were successful? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `What metrics do you or your organization use to determine whether or not your models were successful? (Select all that apply) - Other - Text` <dbl> … ## $ `Approximately what percent of your data projects involved exploring unfair bias in the dataset and/or algorithm?` <chr> … ## $ `What do you find most difficult about ensuring that your algorithms are fair and unbiased? (Select all that apply) - Lack of communication between individuals who collect the data and individuals who analyze the data` <chr> … ## $ `What do you find most difficult about ensuring that your algorithms are fair and unbiased? (Select all that apply) - Difficulty in identifying groups that are unfairly targeted` <chr> … ## $ `What do you find most difficult about ensuring that your algorithms are fair and unbiased? (Select all that apply) - Difficulty in collecting enough data about groups that may be unfairly targeted` <chr> … ## $ `What do you find most difficult about ensuring that your algorithms are fair and unbiased? (Select all that apply) - Difficulty in identifying and selecting the appropriate evaluation metrics` <chr> … ## $ `What do you find most difficult about ensuring that your algorithms are fair and unbiased? (Select all that apply) - I have never found any difficulty in this task` <chr> … ## $ `What do you find most difficult about ensuring that your algorithms are fair and unbiased? (Select all that apply) - I have never performed this task` <chr> … ## $ `In what circumstances would you explore model insights and interpret your model's predictions? 
(Select all that apply) - Only for very important models that are already in production` <chr> … ## $ `In what circumstances would you explore model insights and interpret your model's predictions? (Select all that apply) - For all models right before putting the model in production` <chr> … ## $ `In what circumstances would you explore model insights and interpret your model's predictions? (Select all that apply) - When determining whether it is worth it to put the model into production` <chr> … ## $ `In what circumstances would you explore model insights and interpret your model's predictions? (Select all that apply) - When building a model that was specifically designed to produce such insights` <chr> … ## $ `In what circumstances would you explore model insights and interpret your model's predictions? (Select all that apply) - When first exploring a new ML model or dataset` <chr> … ## $ `In what circumstances would you explore model insights and interpret your model's predictions? (Select all that apply) - I do not explore and interpret model insights and predictions` <chr> … ## $ `Approximately what percent of your data projects involve exploring model insights?` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Examine individual model coefficients` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Examine feature correlations` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Examine feature importances` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? 
(Select all that apply) - Selected Choice - Plot decision boundaries` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Create partial dependence plots` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Dimensionality reduction techniques` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Attention mapping/saliency mapping` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Plot predicted vs. actual results` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Print out a decision tree` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Sensitivity analysis/perturbation importance` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - LIME functions` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - ELI5 functions` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - SHAP functions` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? 
(Select all that apply) - Selected Choice - None/I do not use these model explanation techniques` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `What methods do you prefer for explaining and/or interpreting decisions that are made by ML models? (Select all that apply) - Other - Text` <chr> … ## $ `Do you consider ML models to be "black boxes" with outputs that are difficult or impossible to explain?` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Share code on Github or a similar code-sharing repository` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Share both data and code on Github or a similar code-sharing repository` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Share data, code, and environment using a hosted service (Kaggle Kernels, Google Colaboratory, Amazon SageMaker, etc.)` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Share data, code, and environment using containers (Docker, etc.)` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Share code, data, and environment using virtual machines (VirtualBox, etc.)` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Make sure the code is well documented` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Make sure the code is human-readable` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? 
(Select all that apply) - Selected Choice - Define all random seeds` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Define relative rather than absolute file paths` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Include a text file describing all dependencies` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - None/I do not make my work easy for others to reproduce` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `What tools and methods do you use to make your work easy to reproduce? (Select all that apply) - Other - Text` <dbl> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - Too expensive` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - Too time-consuming` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - Requires too much technical knowledge` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - Afraid that others will use my work without giving proper credit` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - Not enough incentives to share my work` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? 
(Select all that apply) - Selected Choice - I had never considered making my work easier for others to reproduce` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - None of these reasons apply to me` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Selected Choice - Other` <chr> … ## $ `What barriers prevent you from making your work even easier to reuse and reproduce? (Select all that apply) - Other - Text` <dbl> … ``` --- class: inverse, center # Dataset Overview ## Demo on RStudio --- # Data Questions (Business Problem) - What's the percentage of Male and Female respondents? - What are the top 5 countries? --- # dplyr verbs - `mutate()` - adds new variables that are functions of existing variables. - `select()` - picks variables based on their names. - `filter()` - picks cases based on their values. - `summarise()` - reduces multiple values down to a single summary. - `arrange()` - changes the ordering of the rows. --- # Introducing %>% Pipe Operator - The pipe, `%>%`, comes from the magrittr package by Stefan Milton Bache - **Output of LHS** is given as the **input (first argument) of RHS** ### Example ```r kaggle %>% dim() ``` ``` ## [1] 23859 395 ``` Using `%>%` for a single call like this adds little (`kaggle %>% dim()` is equivalent to `dim(kaggle)`), but it shows how the pipe works. --- # Percentage of Male and Female * Column name - `What is your gender? - Selected Choice` ### Pseudo-code - `group_by` the `kaggle` dataframe on column `What is your gender? - Selected Choice` - `count` the values - calculate `percentage` value from the `count`s --- # % of Male and Female - Group By & Count - Method 1 ```r kaggle %>% group_by(`What is your gender? - Selected Choice`) %>% summarise(n = n()) ``` ``` ## # A tibble: 4 x 2 ## `What is your gender?
- Selected Choice` n ## <chr> <int> ## 1 Female 4010 ## 2 Male 19430 ## 3 Prefer not to say 340 ## 4 Prefer to self-describe 79 ``` --- # % of Male and Female - Group By & Count - Method 2 ```r kaggle %>% group_by(`What is your gender? - Selected Choice`) %>% count() ``` ``` ## # A tibble: 4 x 2 ## # Groups: What is your gender? - Selected Choice [4] ## `What is your gender? - Selected Choice` n ## <chr> <int> ## 1 Female 4010 ## 2 Male 19430 ## 3 Prefer not to say 340 ## 4 Prefer to self-describe 79 ``` --- # % of Male and Female - Group By & Count - Sorted ```r kaggle %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% arrange(desc(n)) ``` ``` ## # A tibble: 4 x 2 ## # Groups: What is your gender? - Selected Choice [4] ## `What is your gender? - Selected Choice` n ## <chr> <int> ## 1 Male 19430 ## 2 Female 4010 ## 3 Prefer not to say 340 ## 4 Prefer to self-describe 79 ``` --- # % of Male and Female - Percentage ```r kaggle %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% ungroup() %>% mutate(perc = round(n / sum(n),2)) ``` ``` ## # A tibble: 4 x 3 ## `What is your gender? - Selected Choice` n perc ## <chr> <int> <dbl> ## 1 Female 4010 0.17 ## 2 Male 19430 0.81 ## 3 Prefer not to say 340 0.01 ## 4 Prefer to self-describe 79 0 ``` --- # % of Male and Female - Nice_Looking_Table ```r kaggle %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% ungroup() %>% mutate(perc = round(n / sum(n),2)) %>% knitr::kable(format = "html") ``` <table> <thead> <tr> <th style="text-align:left;"> What is your gender? 
- Selected Choice </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> perc </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 4010 </td> <td style="text-align:right;"> 0.17 </td> </tr> <tr> <td style="text-align:left;"> Male </td> <td style="text-align:right;"> 19430 </td> <td style="text-align:right;"> 0.81 </td> </tr> <tr> <td style="text-align:left;"> Prefer not to say </td> <td style="text-align:right;"> 340 </td> <td style="text-align:right;"> 0.01 </td> </tr> <tr> <td style="text-align:left;"> Prefer to self-describe </td> <td style="text-align:right;"> 79 </td> <td style="text-align:right;"> 0.00 </td> </tr> </tbody> </table> --- class: inverse,center,middle # But, Wait!!! ## Go Back and Check ### Do we have only `Male` and `Female`? --- class: inverse,center,middle # Time for some cleaning ## In the form of `filter()`ing --- # % of Male and Female - Filtered_Nice ```r kaggle %>% filter(`What is your gender? - Selected Choice` %in% c("Male","Female")) %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% ungroup() %>% mutate(perc = round(n / sum(n),2)) %>% knitr::kable(format = "html") ``` <table> <thead> <tr> <th style="text-align:left;"> What is your gender? - Selected Choice </th> <th style="text-align:right;"> n </th> <th style="text-align:right;"> perc </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 4010 </td> <td style="text-align:right;"> 0.17 </td> </tr> <tr> <td style="text-align:left;"> Male </td> <td style="text-align:right;"> 19430 </td> <td style="text-align:right;"> 0.83 </td> </tr> </tbody> </table> --- class: inverse,center,middle # An Awkward column name, isn't it??!
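---
# Filter, Count & Rename - Toy Example

Here is a minimal sketch of the same filter, count, percentage and rename steps on a small made-up tibble, so you can try them without the survey file. The data values and the names `toy` and `toy_summary` are hypothetical. Only the dplyr functions (`filter()`, `count()`, `mutate()`, `rename()`) are the ones used in these slides.

```r
library(dplyr)

# Hypothetical toy data standing in for the Kaggle survey (NOT the real responses);
# the column name mimics the survey's awkward question-style header
toy <- tibble(
  `What is your gender? - Selected Choice` =
    c("Male", "Female", "Male", "Prefer not to say", "Male")
)

toy_summary <- toy %>%
  # keep only the two categories of interest
  filter(`What is your gender? - Selected Choice` %in% c("Male", "Female")) %>%
  # count() on ungrouped data groups by the column and tallies rows in one step
  count(`What is your gender? - Selected Choice`) %>%
  # share of each category among the filtered rows, rounded to 2 decimals
  mutate(perc = round(n / sum(n), 2)) %>%
  # replace the awkward column name with a short, readable one
  rename(Gender = `What is your gender? - Selected Choice`)

toy_summary
```

Here `count()` is called on ungrouped data, so its result is ungrouped and `sum(n)` runs over all rows. After `group_by() %>% count()`, as in the earlier slides, the grouping is kept. That is why those pipelines call `ungroup()` before computing `perc`.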
--- # % of Male and Female - All_Nice_Table ```r library(scales) #for Percentage Formatting ``` ``` ## ## Attaching package: 'scales' ``` ``` ## The following object is masked from 'package:purrr': ## ## discard ``` ``` ## The following object is masked from 'package:readr': ## ## col_factor ``` ```r kaggle %>% filter(`What is your gender? - Selected Choice` %in% c("Male","Female")) %>% group_by(`What is your gender? - Selected Choice`) %>% count() %>% ungroup() %>% mutate(perc = round(n / sum(n),2)) %>% mutate(perc = scales::percent(perc)) %>% rename(Gender = `What is your gender? - Selected Choice`, Count = n, Percentage = perc) %>% knitr::kable(format = "html") ``` <table> <thead> <tr> <th style="text-align:left;"> Gender </th> <th style="text-align:right;"> Count </th> <th style="text-align:left;"> Percentage </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Female </td> <td style="text-align:right;"> 4010 </td> <td style="text-align:left;"> 17.0% </td> </tr> <tr> <td style="text-align:left;"> Male </td> <td style="text-align:right;"> 19430 </td> <td style="text-align:left;"> 83.0% </td> </tr> </tbody> </table> --- # Top 5 Countries * Column name - `In which country do you currently reside?` ### Pseudo-code - `count` number of respondents from each country - `arrange` countries in descending order based on their count value - `top 5` in the list is the output --- # Top 5 Countries - Code ```r kaggle %>% count(`In which country do you currently reside?`) %>% arrange(desc(n)) %>% top_n(5) %>% knitr::kable(format = "html") ``` ``` ## Selecting by n ``` <table> <thead> <tr> <th style="text-align:left;"> In which country do you currently reside? 
</th> <th style="text-align:right;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> United States of America </td> <td style="text-align:right;"> 4716 </td> </tr> <tr> <td style="text-align:left;"> India </td> <td style="text-align:right;"> 4417 </td> </tr> <tr> <td style="text-align:left;"> China </td> <td style="text-align:right;"> 1644 </td> </tr> <tr> <td style="text-align:left;"> Other </td> <td style="text-align:right;"> 1036 </td> </tr> <tr> <td style="text-align:left;"> Russia </td> <td style="text-align:right;"> 879 </td> </tr> </tbody> </table> --- class: inverse,center,middle # Is `Other` a country name??? --- # Top 5 Countries ```r kaggle %>% filter(!`In which country do you currently reside?` %in% "Other") %>% count(`In which country do you currently reside?`) %>% rename(Country = `In which country do you currently reside?`) %>% arrange(desc(n)) %>% top_n(5) %>% knitr::kable(format = "html") ``` ``` ## Selecting by n ``` <table> <thead> <tr> <th style="text-align:left;"> Country </th> <th style="text-align:right;"> n </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> United States of America </td> <td style="text-align:right;"> 4716 </td> </tr> <tr> <td style="text-align:left;"> India </td> <td style="text-align:right;"> 4417 </td> </tr> <tr> <td style="text-align:left;"> China </td> <td style="text-align:right;"> 1644 </td> </tr> <tr> <td style="text-align:left;"> Russia </td> <td style="text-align:right;"> 879 </td> </tr> <tr> <td style="text-align:left;"> Brazil </td> <td style="text-align:right;"> 736 </td> </tr> </tbody> </table> --- class: inverse,center,middle # Table is nice, but a visually appealing plot is Nicer ## 😉 --- # Top 5 Countries - Plot #1 ```r kaggle %>% filter(!`In which country do you currently reside?` %in% "Other") %>% count(`In which country do you currently reside?`) %>% rename(Country = `In which country do you currently reside?`) %>% arrange(desc(n)) %>% top_n(5) %>% ggplot() + 
geom_bar(aes(Country,n), stat = "identity") + coord_flip() + theme_minimal() + labs(title = "Top 5 Countries", subtitle = "Where Kaggle Survey respondents reside", x = "Country", y = "Number of Respondents", caption = "Data Source: Kaggle Survey 2018") ``` --- # Top 5 Countries - Plot #2 ![](presentation_files/figure-html/countries2-1.png)<!-- --> --- # Top 5 Countries - Plot #3 Themed ![](presentation_files/figure-html/countries3-1.png)<!-- --> --- class: inverse, center, middle # Documentation and Reporting using R Markdown ## Demo --- class: inverse, center, middle # Project Demo --- class: center, middle # Object Detection in 3 Lines of R Code ## using Tiny YOLO ### -Project Demo- --- # References - [R for Data Science](https://r4ds.had.co.nz/) - [R-Bloggers](https://www.r-bloggers.com/) --- class: center, middle # Thanks! Slides created via the R package [**xaringan**](https://github.com/yihui/xaringan). The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).