3 Beginner Level R
3.1 Table of Contents
- Introduction to R
- Installing R and RStudio
- Basic Syntax and Operations
- Variables and Data Types
- Vectors
- Matrices and Arrays
- Data Frames
- Lists
- Control Structures
- Functions
- Basic Plotting
- Reading and Writing Data
- Practice Exercises
3.2 Introduction to R
3.2.1 What is R?
R is a powerful programming language specifically designed for:
- Statistical computing and data analysis
- Data visualization and graphics
- Machine learning and predictive modeling
- Reproducible research and reporting
Key Features:
- ✅ Free and open-source
- ✅ 18,000+ packages on CRAN
- ✅ Excellent data visualization (ggplot2)
- ✅ Strong statistical capabilities
- ✅ Active global community
- ✅ Industry standard in data science
3.2.2 Why Learn R?
Career Opportunities:
- Data Scientist
- Statistical Analyst
- Bioinformatician
- Financial Analyst
- Research Scientist
Industries Using R:
- Tech (Google, Facebook, Microsoft)
- Finance (banks, hedge funds)
- Healthcare and pharmaceuticals
- Academia and research
- Government agencies
3.3 Installing R and RStudio
3.3.1 Step 1: Install R
Windows:
1. Visit https://cran.r-project.org/
2. Click "Download R for Windows" → "base"
3. Download and run the .exe installer
4. Follow the installation wizard
macOS:
1. Visit https://cran.r-project.org/
2. Click "Download R for macOS"
3. Download the .pkg file for your system
4. Open the file and follow the installer
Linux (Ubuntu/Debian):
sudo apt update
sudo apt install r-base r-base-dev
3.3.2 Step 2: Install RStudio
- Visit https://posit.co/download/rstudio-desktop/
- Download RStudio Desktop (Free)
- Install for your operating system
3.3.3 Verify Installation
# Check R version
R.version.string
# Output: "R version 4.3.0 (2023-04-21)"
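The exact version string depends on your installation, so a script that needs newer features can compare versions programmatically rather than matching text. A minimal sketch, assuming 4.0.0 as an illustrative threshold (this guide does not state a required version):

```r
# Compare the running R version against a minimum.
# The threshold below is an assumption chosen for illustration.
min_version <- "4.0.0"
is_recent <- getRversion() >= min_version

if (is_recent) {
  message("R ", as.character(getRversion()), " meets the minimum of ", min_version)
} else {
  message("R ", as.character(getRversion()), " is older than ", min_version)
}
```

`getRversion()` returns a version object, so the comparison is made component by component and not as plain text.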
# Check working directory
getwd()
# Set working directory
setwd("/path/to/your/folder")
3.3.4 RStudio Interface
Four Main Panes:
- Source Editor (Top-Left)
  - Write and edit R scripts (.R files)
  - Create R Markdown documents
  - View data tables
- Console (Bottom-Left)
  - Execute R commands interactively
  - View output and messages
  - See errors and warnings
- Environment/History (Top-Right)
  - View loaded objects and variables
  - Browse command history
  - Import datasets
- Files/Plots/Packages/Help (Bottom-Right)
  - Navigate project files
  - View generated plots
  - Manage packages
  - Access documentation
Essential Shortcuts:
| Action | Windows/Linux | macOS |
|---|---|---|
| Run line/selection | Ctrl + Enter | Cmd + Enter |
| Assignment (<-) | Alt + - | Option + - |
| Comment code | Ctrl + Shift + C | Cmd + Shift + C |
| Save script | Ctrl + S | Cmd + S |
| Clear console | Ctrl + L | Ctrl + L |
| Pipe (%>%) | Ctrl + Shift + M | Cmd + Shift + M |
3.4 Basic Syntax and Operations
3.4.1 R as a Calculator
# Arithmetic operations
5 + 3 # Addition: 8
10 - 4 # Subtraction: 6
6 * 7 # Multiplication: 42
20 / 4 # Division: 5
2^3 # Exponentiation: 8
2**3 # Alternative exponentiation: 8
17 %% 5 # Modulo (remainder): 2
17 %/% 5 # Integer division: 3
# Order of operations (PEMDAS)
2 + 3 * 4 # Result: 14
(2 + 3) * 4 # Result: 20
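A few precedence cases tend to surprise newcomers: `^` binds more tightly than a leading minus sign, and `%%` and `%/%` follow the sign of the divisor when negative numbers are involved. A minimal sketch (variable names are illustrative only):

```r
# ^ is applied before the unary minus
neg_square   <- -2^2      # same as -(2^2)
paren_square <- (-2)^2    # parentheses force the minus to apply first

# Modulo takes the sign of the divisor; integer division rounds toward -Inf
mod_negative    <- -7 %% 3
intdiv_negative <- -7 %/% 3

print(c(neg_square, paren_square, mod_negative, intdiv_negative))
# -4  4  2 -3
```

When in doubt, adding parentheses costs nothing and makes the intent explicit.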
# Mathematical functions
sqrt(16) # Square root: 4
abs(-10) # Absolute value: 10
log(10) # Natural log: 2.302585
log10(100) # Base 10 log: 2
exp(1) # e^1: 2.718282
sin(pi/2) # Sine: 1
cos(0) # Cosine: 1
tan(pi/4) # Tangent: 1
round(3.14159, 2) # Round: 3.14
ceiling(3.2) # Round up: 4
floor(3.8) # Round down: 3
3.4.3 Getting Help
# Access function documentation
?mean
help(mean)
# Search for topics
??"linear regression"
help.search("regression")
# Get examples
example(mean)
# View function arguments
args(plot)
# See available methods
methods(plot)
# Find package documentation
help(package = "dplyr") # Requires the package to be installed
3.5 Variables and Data Types
3.5.1 Creating Variables
# Assignment operator: <- (preferred)
x <- 10
y <- 5
name <- "Alice"
# Alternative: = (works but <- is R convention)
x = 10
# Print variable
print(x)
x # Auto-print
# Multiple assignments
a <- b <- c <- 0
# Chained operations
result <- (x + y) * 2
3.5.2 Variable Naming Rules
# Valid names
my_var <- 1
myVar <- 2
my.var <- 3
my_var_123 <- 4
.hidden <- 5 # Starts with dot (hidden variable)
# Invalid names (will cause errors)
# 2var <- 1 # Can't start with number
# my-var <- 2 # Hyphens not allowed
# my var <- 3 # Spaces not allowed
# for <- 4 # Reserved word
Best Practices:
# Use snake_case (recommended for R)
student_count <- 30
average_score <- 85.5
max_temperature <- 98.6
# Be descriptive
total_sales <- 1000000 # Good
ts <- 1000000 # Bad (unclear)
# Use meaningful names
user_age <- 25 # Good
x <- 25 # Bad (not descriptive)
3.5.3 Data Types
3.5.3.1 1. Numeric (Double)
x <- 10.5
class(x) # "numeric"
typeof(x) # "double"
is.numeric(x) # TRUE
# Scientific notation
large_num <- 1.5e6 # 1,500,000
small_num <- 3.2e-4 # 0.00032
3.5.3.2 2. Integer
y <- 5L # L suffix creates integer
class(y) # "integer"
typeof(y) # "integer"
is.integer(y) # TRUE
# Convert to integer
as.integer(10.7) # 10 (truncates)
3.5.3.3 3. Character (String)
name <- "John Doe"
class(name) # "character"
# Single or double quotes
greeting1 <- "Hello"
greeting2 <- 'Hello'
# String operations
paste("Hello", "World") # "Hello World"
paste0("Hello", "World") # "HelloWorld" (no space)
toupper("hello") # "HELLO"
tolower("HELLO") # "hello"
nchar("Hello") # 5 (character count)
substr("Hello World", 1, 5) # "Hello"
3.5.3.4 4. Logical (Boolean)
is_student <- TRUE
is_employed <- FALSE
class(is_student) # "logical"
# Logical operators
TRUE & FALSE # AND: FALSE
TRUE | FALSE # OR: TRUE
!TRUE # NOT: FALSE
# Comparison operators
5 > 3 # TRUE
5 < 3 # FALSE
5 >= 5 # TRUE
5 == 5 # TRUE (equality)
5 != 3 # TRUE (not equal)
3.5.3.5 5. Complex
z <- 3 + 2i
class(z) # "complex"
Re(z) # Real part: 3
Im(z) # Imaginary part: 2
Mod(z) # Modulus: 3.605551
3.5.4 Type Conversion
# Convert between types
as.numeric("123") # 123
as.character(123) # "123"
as.integer(10.7) # 10
as.logical(1) # TRUE
as.logical(0) # FALSE
# Failed conversion
as.numeric("abc") # NA (with warning)
# Check for NA
is.na(NA) # TRUE
3.5.5 Special Values
# NA: Not Available (missing value)
x <- NA
is.na(x) # TRUE
# NULL: Empty/undefined
y <- NULL
is.null(y) # TRUE
length(NULL) # 0
# NaN: Not a Number
z <- 0/0
is.nan(z) # TRUE
# Inf: Infinity
w <- 1/0
is.infinite(w) # TRUE
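Missing values propagate through most calculations, which is why many summary functions accept an `na.rm` argument. A minimal sketch with made-up readings (the `temps` vector is hypothetical):

```r
temps <- c(21.5, NA, 19.0, 23.5)   # hypothetical readings, one missing

mean_with_na    <- mean(temps)                # the NA propagates to the result
mean_without_na <- mean(temps, na.rm = TRUE)  # NA is dropped before averaging
n_missing       <- sum(is.na(temps))          # count of missing entries

# Comparing with == does not detect NA; the comparison itself is NA
na_comparison <- NA == NA

print(mean_with_na)     # NA
print(n_missing)        # 1
print(na_comparison)    # NA
```

Use `is.na()` to test for missing values, since `x == NA` never yields TRUE.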
# Differences
length(NA) # 1
length(NULL) # 0
NA + 5 # NA
NULL + 5 # numeric(0) (an empty vector, not an error)
3.6 Vectors
3.6.1 Creating Vectors
# Using c() function (combine/concatenate)
numbers <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie")
logicals <- c(TRUE, FALSE, TRUE, TRUE)
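A vector holds a single type, so `c()` silently converts mixed inputs to the most general type present. A minimal sketch of that coercion (variable names are illustrative):

```r
mixed_chr <- c(1, "a", TRUE)     # any string turns everything into character
mixed_num <- c(1, TRUE, FALSE)   # logicals become 1 and 0
mixed_dbl <- c(1L, 2.5)          # integer is promoted to double

print(mixed_chr)          # "1" "a" "TRUE"
print(mixed_num)          # 1 1 0
print(typeof(mixed_dbl))  # "double"
```

If you need to keep values of different types together, use a list or a data frame instead of a vector.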
# Sequences
seq1 <- 1:10 # 1 2 3 4 5 6 7 8 9 10
seq2 <- 10:1 # Descending
seq3 <- seq(1, 10, by=2) # 1 3 5 7 9
seq4 <- seq(0, 1, length.out=5) # 0.00 0.25 0.50 0.75 1.00
# Repeated values
rep1 <- rep(1, times=5) # 1 1 1 1 1
rep2 <- rep(c(1,2), times=3) # 1 2 1 2 1 2
rep3 <- rep(c(1,2), each=3) # 1 1 1 2 2 2
rep4 <- rep(1:3, each=2, times=2) # 1 1 2 2 3 3 1 1 2 2 3 3
3.6.2 Vector Properties
x <- c(10, 20, 30, 40, 50)
length(x) # 5
class(x) # "numeric"
typeof(x) # "double"
is.vector(x) # TRUE
3.6.3 Vector Arithmetic
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)
# Element-wise operations
x + y # 11 22 33 44 55
x - y # -9 -18 -27 -36 -45
x * y # 10 40 90 160 250
x / y # 0.1 0.1 0.1 0.1 0.1
x^2 # 1 4 9 16 25
# Scalar operations
x + 10 # 11 12 13 14 15
x * 2 # 2 4 6 8 10
# Vector recycling (shorter vector repeats)
c(1, 2, 3) + c(10, 20) # 11 22 13 (with warning)
3.6.4 Indexing and Subsetting
x <- c(10, 20, 30, 40, 50)
# Positive indexing (1-based!)
x[1] # 10 (first element)
x[3] # 30
x[c(1, 3, 5)] # 10 30 50
x[1:3] # 10 20 30
# Negative indexing (exclude)
x[-1] # 20 30 40 50
x[-c(1, 3)] # 20 40 50
x[-(1:3)] # 40 50
# Logical indexing
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)] # 10 30 50
x[x > 25] # 30 40 50
x[x >= 20 & x <= 40] # 20 30 40
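A logical condition produces a TRUE/FALSE vector that can be stored and reused: to select, to count, or to invert. A minimal sketch with made-up data (the `readings` vector and the 10 to 50 range are assumptions for illustration):

```r
readings <- c(12, 47, 33, 8, 51)            # hypothetical measurements

in_range <- readings >= 10 & readings <= 50  # TRUE TRUE TRUE FALSE FALSE

kept     <- readings[in_range]    # values inside the range
rejected <- readings[!in_range]   # values outside the range
n_kept   <- sum(in_range)         # TRUE counts as 1, so sum() counts matches

print(kept)      # 12 47 33
print(rejected)  # 8 51
print(n_kept)    # 3
```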
# Named vectors
ages <- c(Alice=25, Bob=30, Charlie=35)
ages["Alice"] # 25
ages[c("Alice", "Charlie")] # 25 35
names(ages) # "Alice" "Bob" "Charlie"
3.6.5 Modifying Vectors
x <- c(10, 20, 30, 40, 50)
# Replace elements
x[3] <- 99 # c(10, 20, 99, 40, 50)
x[c(1, 5)] <- 0 # Replace multiple
# Append elements
x <- c(x, 60) # Add to end
x <- c(0, x) # Add to beginning
x <- append(x, 100, after=3) # Insert at position
# Remove elements
x <- x[-1] # Remove first
x <- x[x != 30] # Remove all 30s
3.6.6 Vector Functions
x <- c(3, 7, 2, 9, 1, 5)
# Basic statistics
length(x) # 6
sum(x) # 27
mean(x) # 4.5
median(x) # 4
min(x) # 1
max(x) # 9
range(x) # 1 9
sd(x) # Standard deviation: 3.08
var(x) # Variance: 9.5
quantile(x) # Quartiles
# Sorting
sort(x) # 1 2 3 5 7 9
sort(x, decreasing=TRUE) # 9 7 5 3 2 1
order(x) # Indices for sorting
rank(x) # Ranks
rev(x) # Reverse: 5 1 9 2 7 3
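The difference between `sort()`, `order()`, and `rank()` is easiest to see side by side. `order()` returns positions, which makes it useful for rearranging a second vector in the same way. A minimal sketch (the `tags` vector is a hypothetical companion to `x`):

```r
x    <- c(3, 7, 2, 9, 1, 5)
tags <- c("c", "g", "b", "i", "a", "e")   # hypothetical labels paired with x

idx         <- order(x)    # positions of x from smallest to largest
sorted_x    <- x[idx]      # same result as sort(x)
sorted_tags <- tags[idx]   # tags rearranged to follow x
ranks       <- rank(x)     # rank of each element in its original position

print(idx)          # 5 3 1 6 2 4
print(sorted_tags)  # "a" "b" "c" "e" "g" "i"
print(ranks)        # 3 5 2 6 1 4
```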
# Unique and duplicates
y <- c(1, 2, 2, 3, 3, 3, 4)
unique(y) # 1 2 3 4
duplicated(y) # FALSE FALSE TRUE FALSE TRUE TRUE FALSE
table(y) # Frequency table
# Cumulative functions
cumsum(c(1,2,3,4)) # 1 3 6 10
cumprod(c(1,2,3,4)) # 1 2 6 24
cummin(c(3,1,4,1,5)) # 3 1 1 1 1
cummax(c(3,1,4,1,5)) # 3 3 4 4 5
3.6.7 Vector Operations
# Set operations
a <- c(1, 2, 3, 4, 5)
b <- c(4, 5, 6, 7, 8)
union(a, b) # 1 2 3 4 5 6 7 8
intersect(a, b) # 4 5
setdiff(a, b) # 1 2 3
setequal(a, b) # FALSE
# Element-wise comparison
x <- c(1, 2, 3)
y <- c(3, 2, 1)
x == y # FALSE TRUE FALSE
x > y # FALSE FALSE TRUE
all(x > 0) # TRUE (all elements)
any(x > 2) # TRUE (at least one)
# Which indices
x <- c(10, 25, 30, 15, 40)
which(x > 20) # 2 3 5
which.max(x) # 5
which.min(x) # 1
3.7 Matrices and Arrays
3.7.1 Creating Matrices
# Using matrix() function
mat1 <- matrix(1:12, nrow=3, ncol=4)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12
# Fill by row
mat2 <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12
# Combining vectors
mat3 <- rbind(c(1,2,3), c(4,5,6)) # Row bind
mat4 <- cbind(c(1,2,3), c(4,5,6)) # Column bind
# Identity matrix
diag(3) # 3x3 identity matrix
3.7.2 Matrix Operations
A <- matrix(1:4, nrow=2, ncol=2)
B <- matrix(5:8, nrow=2, ncol=2)
# Element-wise operations
A + B # Addition
A - B # Subtraction
A * B # Element-wise multiplication
A / B # Element-wise division
# Matrix multiplication
A %*% B # True matrix multiplication
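Mixing up `*` and `%*%` is a common mistake because both run without error on square matrices. A minimal sketch using the same A and B, with the results worked out by hand:

```r
A <- matrix(1:4, nrow = 2, ncol = 2)   # columns are (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2, ncol = 2)   # columns are (5, 6) and (7, 8)

elementwise <- A * B     # multiplies matching cells
product     <- A %*% B   # rows of A times columns of B

print(elementwise)
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32

print(product)
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```

For example, the top-left cell of the matrix product is 1*5 + 3*6 = 23, while the element-wise result is simply 1*5 = 5.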
# Transpose
t(A) # Transpose
# Matrix functions
det(A) # Determinant
solve(A) # Inverse (if invertible)
eigen(A) # Eigenvalues and eigenvectors
3.7.3 Matrix Indexing
mat <- matrix(1:12, nrow=3, ncol=4)
# Single element
mat[2, 3] # Row 2, Column 3
# Entire row
mat[2, ] # All of row 2
# Entire column
mat[, 3] # All of column 3
# Submatrix
mat[1:2, 2:3]
# Named dimensions
rownames(mat) <- c("A", "B", "C")
colnames(mat) <- c("W", "X", "Y", "Z")
mat["A", "X"]
3.7.4 Matrix Functions
mat <- matrix(1:12, nrow=3, ncol=4)
nrow(mat) # 3
ncol(mat) # 4
dim(mat) # c(3, 4)
rowSums(mat) # Sum each row
colSums(mat) # Sum each column
rowMeans(mat) # Mean of each row
colMeans(mat) # Mean of each column
# Apply function
apply(mat, 1, sum) # Apply to rows (margin=1)
apply(mat, 2, mean) # Apply to columns (margin=2)
3.8 Data Frames
3.8.1 Creating Data Frames
# Using data.frame()
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 28),
  salary = c(50000, 60000, 75000, 55000),
  employed = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE # Keep strings as character
)
print(df)
View(df) # Opens in viewer pane
3.8.2 Exploring Data Frames
# Structure and summary
str(df) # Data structure
summary(df) # Statistical summary
# dplyr::glimpse(df) # dplyr alternative (uncomment if dplyr is installed)
# Dimensions
nrow(df) # Number of rows
ncol(df) # Number of columns
dim(df) # Both dimensions
# Column names
names(df)
colnames(df)
# First/last rows
head(df, 3) # First 3 rows
tail(df, 2) # Last 2 rows
3.8.3 Accessing Data Frame Elements
# Access columns
df$name # $ operator (returns vector)
df[["name"]] # [[ ]] (returns vector)
df[, "name"] # [ ] (returns vector)
df["name"] # [ ] (returns data frame)
df[, 1] # By index
# Multiple columns
df[, c("name", "age")]
df[, 1:2]
# Access rows
df[1, ] # First row
df[1:2, ] # First two rows
# Specific cell
df[1, "age"] # Row 1, age column
df$age[1] # First element of age
3.8.4 Filtering Data Frames
# Logical subsetting
df[df$age > 28, ]
df[df$employed == TRUE, ]
df[df$salary >= 60000, ]
# Multiple conditions
df[df$age > 25 & df$employed == TRUE, ]
df[df$age < 26 | df$age > 30, ]
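Logical subsetting behaves differently when the column contains missing values: a condition that evaluates to NA produces an extra row filled with NA. A minimal sketch, using a small hypothetical `people` data frame rather than `df`:

```r
people <- data.frame(
  name = c("Ann", "Ben", "Cy"),
  age  = c(31, NA, 24),          # Ben's age is missing
  stringsAsFactors = FALSE
)

by_bracket <- people[people$age > 28, ]                        # keeps an all-NA row
by_which   <- people[which(people$age > 28), ]                 # which() drops NA
by_not_na  <- people[!is.na(people$age) & people$age > 28, ]   # explicit NA check

print(nrow(by_bracket))  # 2
print(nrow(by_which))    # 1
print(by_which$name)     # "Ann"
```

Wrapping the condition in `which()` or adding an `!is.na()` check are two common ways to avoid the stray NA row.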
# Using subset() function
subset(df, age > 28)
subset(df, age > 25 & employed == TRUE)
subset(df, age > 28, select = c(name, salary))
3.8.5 Modifying Data Frames
# Add new column
df$bonus <- df$salary * 0.1
df$age_group <- ifelse(df$age < 30, "Young", "Senior")
# Modify existing column
df$age <- df$age + 1
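# Add new row (it must supply every column df has at this point,
# including bonus and age_group, or rbind() will fail)
new_row <- data.frame(
  name = "Eve",
  age = 32,
  salary = 70000,
  employed = TRUE,
  bonus = 7000,
  age_group = "Senior"
)
df <- rbind(df, new_row)
# Remove column
df$bonus <- NULL
df <- df[, -5] # Remove 5th column (age_group)
# Remove row
df <- df[-5, ] # Remove 5th row
3.8.6 Sorting Data Frames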
# Sort by one column
df[order(df$age), ] # Ascending
df[order(-df$age), ] # Descending
df[order(df$age, decreasing=TRUE), ]
# Sort by multiple columns
df[order(df$employed, -df$salary), ] # employed asc, salary desc
3.9 Lists
3.9.1 Creating Lists
# Lists can contain different types
my_list <- list(
  numbers = c(1, 2, 3, 4, 5),
  name = "John",
  matrix = matrix(1:4, nrow=2),
  df = data.frame(x=1:3, y=4:6),
  nested = list(a=1, b=2)
)
print(my_list)
str(my_list)
3.9.2 Accessing List Elements
# Using [[ ]] returns the element
my_list[[1]] # First element
my_list[["numbers"]] # By name
# Using $ returns the element
my_list$numbers
# Using [ ] returns a sublist
my_list[1] # List containing first element
my_list[c(1, 3)] # List with elements 1 and 3
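The practical consequence of `[ ]` versus `[[ ]]` shows up as soon as you try to compute with the result. A minimal sketch with a hypothetical `settings` list:

```r
settings <- list(size = 10, label = "demo")   # hypothetical example list

as_sublist <- settings["size"]     # still a list (of length 1)
as_value   <- settings[["size"]]   # the number itself

doubled <- as_value * 2            # arithmetic works on the extracted value

# Arithmetic on the sublist fails; try() captures the error instead of stopping
failed <- try(as_sublist * 2, silent = TRUE)

print(class(as_sublist))            # "list"
print(class(as_value))              # "numeric"
print(doubled)                      # 20
print(inherits(failed, "try-error"))  # TRUE
```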
# Nested access
my_list$nested$a
my_list[["df"]]$x
3.9.3 Modifying Lists
# Add elements
my_list$new_item <- "New value"
my_list[[7]] <- c(10, 20) # Append at position 7 (position 6 is now new_item)
# Remove elements
my_list$new_item <- NULL
my_list[[6]] <- NULL
# List functions
length(my_list) # Number of elements
names(my_list) # Element names
3.10 Control Structures
3.10.1 If-Else Statements
# Basic if
x <- 10
if (x > 5) {
  print("x is greater than 5")
}
# If-else
if (x > 15) {
  print("x is large")
} else {
  print("x is not large")
}
# If-else if-else
score <- 75
if (score >= 90) {
  grade <- "A"
} else if (score >= 80) {
  grade <- "B"
} else if (score >= 70) {
  grade <- "C"
} else if (score >= 60) {
  grade <- "D"
} else {
  grade <- "F"
}
# Vectorized ifelse
scores <- c(95, 82, 78, 65)
grades <- ifelse(scores >= 90, "A",
          ifelse(scores >= 80, "B",
          ifelse(scores >= 70, "C",
          ifelse(scores >= 60, "D", "F"))))
3.10.2 For Loops
# Basic for loop
for (i in 1:5) {
  print(i)
}
# Loop over vector
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(paste("I like", fruit))
}
# Loop with index
for (i in seq_along(fruits)) {
  print(paste(i, ":", fruits[i]))
}
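One reason to prefer `seq_along()` over `1:length(x)` is the empty-vector case: `1:0` counts down, so the loop body would run twice with invalid indices. The sketch below also shows preallocating a result vector before filling it in a loop (names are illustrative):

```r
empty <- character(0)

naive_index <- 1:length(empty)    # 1 0: a loop over this would run twice
safe_index  <- seq_along(empty)   # integer(0): a loop over this is skipped

# Preallocate the result, then fill it by index
squares <- numeric(5)
for (i in seq_along(squares)) {
  squares[i] <- i^2
}

print(naive_index)          # 1 0
print(length(safe_index))   # 0
print(squares)              # 1 4 9 16 25
```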
# Nested loops
for (i in 1:3) {
  for (j in 1:3) {
    print(paste(i, "*", j, "=", i*j))
  }
}
3.10.3 While Loops
# Basic while loop
count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}
# While with condition
x <- 1
while (x < 100) {
  print(x)
  x <- x * 2
}
3.10.4 Loop Control
# break: exit loop
for (i in 1:10) {
  if (i == 5) break
  print(i)
}
# next: skip to next iteration
for (i in 1:10) {
  if (i %% 2 == 0) next # Skip even numbers
  print(i)
}
3.11 Functions
3.11.1 Creating Functions
# Basic function
greet <- function() {
  print("Hello!")
}
greet()
# Function with parameters
greet_person <- function(name) {
  print(paste("Hello,", name))
}
greet_person("Alice")
# Function with return value
add_numbers <- function(a, b) {
  result <- a + b
  return(result)
}
sum_result <- add_numbers(5, 3)
# Implicit return (last expression)
multiply <- function(a, b) {
  a * b # Automatically returned
}
3.11.2 Default Parameters
power <- function(base, exponent = 2) {
  base^exponent
}
power(5) # 25 (uses default)
power(5, 3) # 125
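Arguments can be matched by position or by name, and naming them lets you pass them in any order. A minimal sketch that redefines the same `power()` function so the block runs on its own:

```r
power <- function(base, exponent = 2) {
  base^exponent
}

by_position <- power(2, 3)                    # base = 2, exponent = 3
by_name     <- power(exponent = 3, base = 2)  # same call, arguments named
mixed       <- power(2, exponent = 10)        # positional base, named exponent
vectorised  <- power(1:4)                     # default exponent applied to each element

print(by_position)  # 8
print(by_name)      # 8
print(mixed)        # 1024
print(vectorised)   # 1 4 9 16
```

Because `^` is vectorized, the function works on whole vectors without any extra code.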
# Multiple defaults
greet <- function(name = "Guest", time = "day") {
  paste("Good", time, name)
}
greet() # "Good day Guest"
greet("Alice") # "Good day Alice"
greet("Bob", "morning") # "Good morning Bob"
3.11.3 Multiple Return Values
# Return a list
stats <- function(x) {
  list(
    mean = mean(x),
    median = median(x),
    sd = sd(x),
    range = range(x)
  )
}
result <- stats(c(1, 2, 3, 4, 5))
result$mean # 3
result$sd # Standard deviation
3.11.4 Function Documentation
#' Calculate Circle Area
#'
#' @param radius Numeric value for circle radius
#' @return Area of the circle
#' @examples
#' circle_area(5)
circle_area <- function(radius) {
  if (radius < 0) {
    stop("Radius must be non-negative")
  }
  pi * radius^2
}
3.12 Basic Plotting
3.12.1 Scatter Plots
# Basic scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
plot(x, y)
# Customized scatter plot
plot(x, y,
     main = "My Scatter Plot",
     xlab = "X Axis",
     ylab = "Y Axis",
     col = "blue",
     pch = 19, # Point type
     cex = 1.5) # Point size
3.12.2 Line Plots
# Line plot
plot(x, y, type = "l", col = "red", lwd = 2)
# Points and lines
plot(x, y, type = "b", col = "blue")
# Multiple lines (set ylim so the second series is not clipped)
plot(x, y, type = "l", col = "blue", ylim = range(y, y*1.5))
lines(x, y*1.5, col = "red")
legend("topleft", legend = c("Series 1", "Series 2"),
       col = c("blue", "red"), lty = 1)
3.12.3 Bar Plots
# Simple bar plot
counts <- c(5, 10, 15, 20)
barplot(counts)
# Named bars
names(counts) <- c("A", "B", "C", "D")
barplot(counts,
        main = "Bar Chart",
        xlab = "Category",
        ylab = "Count",
        col = "steelblue")
3.12.4 Histograms
# Generate random data
data <- rnorm(1000, mean = 50, sd = 10)
# Histogram
hist(data,
     main = "Histogram",
     xlab = "Value",
     col = "lightblue",
     breaks = 30)
3.12.5 Box Plots
# Box plot
data1 <- rnorm(100, mean = 50)
data2 <- rnorm(100, mean = 60)
data3 <- rnorm(100, mean = 55)
boxplot(data1, data2, data3,
        names = c("Group A", "Group B", "Group C"),
        main = "Box Plot Comparison",
        col = c("red", "blue", "green"))
3.12.6 Saving Plots
# Save as PDF
pdf("myplot.pdf", width = 8, height = 6)
plot(x, y)
dev.off()
# Save as PNG
png("myplot.png", width = 800, height = 600)
plot(x, y)
dev.off()
3.13 Reading and Writing Data
3.13.1 CSV Files
# Read CSV
df <- read.csv("data.csv")
df <- read.csv("data.csv", header = TRUE, stringsAsFactors = FALSE)
# Write CSV
write.csv(df, "output.csv", row.names = FALSE)
3.13.2 Text Files
# Read delimited files
df <- read.table("data.txt", header = TRUE, sep = "\t")
# Write text file
write.table(df, "output.txt", sep = "\t", row.names = FALSE)
3.13.3 Built-in Datasets
# Load built-in datasets
data(mtcars)
head(mtcars)
# See available datasets
data()
# Common datasets
data(iris)
data(airquality)
data(ChickWeight)
3.13.4 RDS Files (R Native Format)
# Save R object
saveRDS(df, "mydata.rds")
# Read R object
df <- readRDS("mydata.rds")
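One practical reason to choose RDS over CSV is that RDS preserves R-specific types such as factors, while CSV stores only text. A minimal round-trip sketch that writes to temporary files so it can run anywhere (the `original` data frame is made up for illustration):

```r
original <- data.frame(
  id    = 1:3,
  level = factor(c("low", "high", "low")),   # a factor column
  stringsAsFactors = FALSE
)

# Round trip through RDS
rds_path <- tempfile(fileext = ".rds")
saveRDS(original, rds_path)
restored <- readRDS(rds_path)

# Round trip through CSV
csv_path <- tempfile(fileext = ".csv")
write.csv(original, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)

print(class(restored$level))   # "factor"
print(class(from_csv$level))   # "character"

# Clean up the temporary files
unlink(c(rds_path, csv_path))
```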
# Save multiple objects
save(df, x, y, file = "myworkspace.RData")
# Load saved objects
load("myworkspace.RData")
3.14 Practice Exercises
3.14.1 Exercise 1: Vector Operations
# Create a vector of ages
ages <- c(23, 45, 67, 89, 12, 34, 56, 78)
# Tasks:
# 1. Find mean, median, and standard deviation
# 2. Count how many ages are above 50
# 3. Create a new vector with ages below 40
# 4. Sort ages in descending order
# Solutions:
mean(ages)
median(ages)
sd(ages)
sum(ages > 50)
young <- ages[ages < 40]
sort(ages, decreasing = TRUE)
3.14.2 Exercise 2: Data Frame Manipulation
# Create a data frame of students
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  math_score = c(85, 92, 78, 95, 88),
  english_score = c(90, 85, 82, 91, 87),
  age = c(20, 21, 19, 22, 20)
)
# Tasks:
# 1. Add a column for average score
# 2. Filter students with average > 85
# 3. Sort by math_score descending
# 4. Find the student with highest total score
# Solutions:
students$avg_score <- (students$math_score + students$english_score) / 2
students[students$avg_score > 85, ]
students[order(-students$math_score), ]
students$total <- students$math_score + students$english_score
students[which.max(students$total), ]
3.14.3 Exercise 3: Functions
# Write a function that:
# 1. Takes a vector of numbers
# 2. Removes outliers (values > 2 SD from mean)
# 3. Returns cleaned vector
remove_outliers <- function(x) {
  mean_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  lower_bound <- mean_x - 2 * sd_x
  upper_bound <- mean_x + 2 * sd_x
  x[x >= lower_bound & x <= upper_bound]
}
# Test
data <- c(1, 2, 3, 4, 5, 100, 200)
remove_outliers(data) # 1 2 3 4 5 100 (only 200 is more than 2 SD from the mean)
3.14.4 Exercise 4: Control Structures
# FizzBuzz: Print numbers 1-100, but:
# - "Fizz" for multiples of 3
# - "Buzz" for multiples of 5
# - "FizzBuzz" for multiples of both
for (i in 1:100) {
  if (i %% 15 == 0) {
    print("FizzBuzz")
  } else if (i %% 3 == 0) {
    print("Fizz")
  } else if (i %% 5 == 0) {
    print("Buzz")
  } else {
    print(i)
  }
}
3.14.5 Exercise 5: Data Analysis Project
# Use the built-in mtcars dataset
data(mtcars)
# Tasks:
# 1. Calculate average MPG by number of cylinders
# 2. Find cars with MPG > 25
# 3. Create a scatter plot of MPG vs Weight
# 4. Identify the most fuel-efficient car
# Solutions:
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mtcars[mtcars$mpg > 25, ]
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "MPG",
     main = "MPG vs Weight")
rownames(mtcars)[which.max(mtcars$mpg)]
3.15 Summary and Next Steps
3.15.1 What You’ve Learned
- ✅ R installation and RStudio interface
- ✅ Basic syntax and data types
- ✅ Vectors, matrices, data frames, and lists
- ✅ Control structures (if/else, loops)
- ✅ Writing functions
- ✅ Basic plotting with base R
- ✅ Reading and writing data files
3.15.2 Next Steps
- Practice Daily: Code for at least 30 minutes every day
- Build Projects: Apply concepts to real datasets
- Move to Intermediate: Continue with 2_Intermediate_Level_R.md
- Join Communities: R4DS Slack, Stack Overflow
- Read Documentation: Use ?function frequently
3.15.3 Recommended Resources
- Books: “R for Data Science” by Hadley Wickham and Garrett Grolemund
- Online: DataCamp, Coursera R courses
- Practice: Kaggle datasets, TidyTuesday challenges
Congratulations! You’ve completed the Beginner Level. Move on to 2_Intermediate_Level_R.md to learn data wrangling, visualization, and advanced techniques.
Keep coding! 🚀