3  Beginner Level R

3.1 Table of Contents

  1. Introduction to R
  2. Installing R and RStudio
  3. Basic Syntax and Operations
  4. Variables and Data Types
  5. Vectors
  6. Matrices and Arrays
  7. Data Frames
  8. Lists
  9. Control Structures
  10. Functions
  11. Basic Plotting
  12. Reading and Writing Data
  13. Practice Exercises

3.2 Introduction to R

3.2.1 What is R?

R is a powerful programming language specifically designed for: - Statistical computing and data analysis - Data visualization and graphics - Machine learning and predictive modeling - Reproducible research and reporting

Key Features: - ✅ Free and open-source - ✅ 18,000+ packages on CRAN - ✅ Excellent data visualization (ggplot2) - ✅ Strong statistical capabilities - ✅ Active global community - ✅ Industry standard in data science

3.2.2 Why Learn R?

Career Opportunities: - Data Scientist - Statistical Analyst - Bioinformatician - Financial Analyst - Research Scientist

Industries Using R: - Tech (Google, Facebook, Microsoft) - Finance (banks, hedge funds) - Healthcare and pharmaceuticals - Academia and research - Government agencies


3.3 Installing R and RStudio

3.3.1 Step 1: Install R

Windows:

1. Visit https://cran.r-project.org/
2. Click "Download R for Windows""base"
3. Download and run the .exe installer
4. Follow installation wizard

macOS:

1. Visit https://cran.r-project.org/
2. Click "Download R for macOS"
3. Download the .pkg file for your system
4. Open and install

Linux (Ubuntu/Debian):

sudo apt update
sudo apt install r-base r-base-dev

3.3.2 Step 2: Install RStudio

  1. Visit https://posit.co/download/rstudio-desktop/
  2. Download RStudio Desktop (Free)
  3. Install for your operating system

3.3.3 Verify Installation

# Check R version
R.version.string
# Output: "R version 4.3.0 (2023-04-21)"

# Check working directory
getwd()

# Set working directory
setwd("/path/to/your/folder")

3.3.4 RStudio Interface

Four Main Panes:

  1. Source Editor (Top-Left)
    • Write and edit R scripts (.R files)
    • Create R Markdown documents
    • View data tables
  2. Console (Bottom-Left)
    • Execute R commands interactively
    • View output and messages
    • See errors and warnings
  3. Environment/History (Top-Right)
    • View loaded objects and variables
    • Browse command history
    • Import datasets
  4. Files/Plots/Packages/Help (Bottom-Right)
    • Navigate project files
    • View generated plots
    • Manage packages
    • Access documentation

Essential Shortcuts:

Action Windows/Linux macOS
Run line/selection Ctrl + Enter Cmd + Enter
Assignment <- Alt + - Option + -
Comment code Ctrl + Shift + C Cmd + Shift + C
Save script Ctrl + S Cmd + S
Clear console Ctrl + L Cmd + L
Pipe %>% Ctrl + Shift + M Cmd + Shift + M

3.4 Basic Syntax and Operations

3.4.1 R as a Calculator

# Arithmetic operations
5 + 3          # Addition: 8
10 - 4         # Subtraction: 6
6 * 7          # Multiplication: 42
20 / 4         # Division: 5
2^3            # Exponentiation: 8
2**3           # Alternative exponentiation: 8
17 %% 5        # Modulo (remainder): 2
17 %/% 5       # Integer division: 3

# Order of operations (PEMDAS)
2 + 3 * 4      # Result: 14
(2 + 3) * 4    # Result: 20

# Mathematical functions
sqrt(16)       # Square root: 4
abs(-10)       # Absolute value: 10
log(10)        # Natural log: 2.302585
log10(100)     # Base 10 log: 2
exp(1)         # e^1: 2.718282
sin(pi/2)      # Sine: 1
cos(0)         # Cosine: 1
tan(pi/4)      # Tangent: 1
round(3.14159, 2)  # Round: 3.14
ceiling(3.2)   # Round up: 4
floor(3.8)     # Round down: 3

3.4.2 Comments and Documentation

# Single-line comment

# Multiple lines of comments
# Use # at the start of each line

x <- 5  # Inline comment

# Good commenting practice
# Calculate the average of test scores
scores <- c(85, 92, 78, 95, 88)
average_score <- mean(scores)  # Use built-in mean function

3.4.3 Getting Help

# Access function documentation
?mean
help(mean)

# Search for topics
??"linear regression"
help.search("regression")

# Get examples
example(mean)

# View function arguments
args(plot)

# See available methods
methods(plot)

# Find package documentation
help(package = "dplyr")

3.5 Variables and Data Types

3.5.1 Creating Variables

# Assignment operator: <- (preferred)
x <- 10
y <- 5
name <- "Alice"

# Alternative: = (works but <- is R convention)
x = 10

# Print variable
print(x)
x  # Auto-print

# Multiple assignments
a <- b <- c <- 0

# Chained operations
result <- (x + y) * 2

3.5.2 Variable Naming Rules

# Valid names
my_var <- 1
myVar <- 2
my.var <- 3
my_var_123 <- 4
.hidden <- 5  # Starts with dot (hidden variable)

# Invalid names (will cause errors)
# 2var <- 1        # Can't start with number
# my-var <- 2      # Hyphens not allowed
# my var <- 3      # Spaces not allowed
# for <- 4         # Reserved word

Best Practices:

# Use snake_case (recommended for R)
student_count <- 30
average_score <- 85.5
max_temperature <- 98.6

# Be descriptive
total_sales <- 1000000        # Good
ts <- 1000000                 # Bad (unclear)

# Use meaningful names
user_age <- 25                # Good
x <- 25                       # Bad (not descriptive)

3.5.3 Data Types

3.5.3.1 1. Numeric (Double)

x <- 10.5
class(x)        # "numeric"
typeof(x)       # "double"
is.numeric(x)   # TRUE

# Scientific notation
large_num <- 1.5e6  # 1,500,000
small_num <- 3.2e-4 # 0.00032

3.5.3.2 2. Integer

y <- 5L         # L suffix creates integer
class(y)        # "integer"
typeof(y)       # "integer"
is.integer(y)   # TRUE

# Convert to integer
as.integer(10.7)  # 10 (truncates)

3.5.3.3 3. Character (String)

name <- "John Doe"
class(name)     # "character"

# Single or double quotes
greeting1 <- "Hello"
greeting2 <- 'Hello'

# String operations
paste("Hello", "World")           # "Hello World"
paste0("Hello", "World")          # "HelloWorld" (no space)
toupper("hello")                  # "HELLO"
tolower("HELLO")                  # "hello"
nchar("Hello")                    # 5 (character count)
substr("Hello World", 1, 5)      # "Hello"

3.5.3.4 4. Logical (Boolean)

is_student <- TRUE
is_employed <- FALSE

class(is_student)  # "logical"

# Logical operators
TRUE & FALSE       # AND: FALSE
TRUE | FALSE       # OR: TRUE
!TRUE              # NOT: FALSE

# Comparison operators
5 > 3              # TRUE
5 < 3              # FALSE
5 >= 5             # TRUE
5 == 5             # TRUE (equality)
5 != 3             # TRUE (not equal)

3.5.3.5 5. Complex

z <- 3 + 2i
class(z)           # "complex"
Re(z)              # Real part: 3
Im(z)              # Imaginary part: 2
Mod(z)             # Modulus: 3.605551

3.5.4 Type Conversion

# Convert between types
as.numeric("123")      # 123
as.character(123)      # "123"
as.integer(10.7)       # 10
as.logical(1)          # TRUE
as.logical(0)          # FALSE

# Failed conversion
as.numeric("abc")      # NA (with warning)

# Check for NA
is.na(NA)              # TRUE

3.5.5 Special Values

# NA: Not Available (missing value)
x <- NA
is.na(x)               # TRUE

# NULL: Empty/undefined
y <- NULL
is.null(y)             # TRUE
length(NULL)           # 0

# NaN: Not a Number
z <- 0/0
is.nan(z)              # TRUE

# Inf: Infinity
w <- 1/0
is.infinite(w)         # TRUE

# Differences
length(NA)             # 1
length(NULL)           # 0
NA + 5                 # NA
NULL + 5               # Error

3.6 Vectors

3.6.1 Creating Vectors

# Using c() function (combine/concatenate)
numbers <- c(1, 2, 3, 4, 5)
names <- c("Alice", "Bob", "Charlie")
logicals <- c(TRUE, FALSE, TRUE, TRUE)

# Sequences
seq1 <- 1:10                    # 1 2 3 4 5 6 7 8 9 10
seq2 <- 10:1                    # Descending
seq3 <- seq(1, 10, by=2)        # 1 3 5 7 9
seq4 <- seq(0, 1, length.out=5) # 0.00 0.25 0.50 0.75 1.00

# Repeated values
rep1 <- rep(1, times=5)         # 1 1 1 1 1
rep2 <- rep(c(1,2), times=3)    # 1 2 1 2 1 2
rep3 <- rep(c(1,2), each=3)     # 1 1 1 2 2 2
rep4 <- rep(1:3, each=2, times=2) # 1 1 2 2 3 3 1 1 2 2 3 3

3.6.2 Vector Properties

x <- c(10, 20, 30, 40, 50)

length(x)              # 5
class(x)               # "numeric"
typeof(x)              # "double"
is.vector(x)           # TRUE

3.6.3 Vector Arithmetic

x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)

# Element-wise operations
x + y          # 11 22 33 44 55
x - y          # -9 -18 -27 -36 -45
x * y          # 10 40 90 160 250
x / y          # 0.1 0.1 0.1 0.1 0.1
x^2            # 1 4 9 16 25

# Scalar operations
x + 10         # 11 12 13 14 15
x * 2          # 2 4 6 8 10

# Vector recycling (shorter vector repeats)
c(1, 2, 3) + c(10, 20)  # 11 22 13 (with warning)

3.6.4 Indexing and Subsetting

x <- c(10, 20, 30, 40, 50)

# Positive indexing (1-based!)
x[1]           # 10 (first element)
x[3]           # 30
x[c(1, 3, 5)]  # 10 30 50
x[1:3]         # 10 20 30

# Negative indexing (exclude)
x[-1]          # 20 30 40 50
x[-c(1, 3)]    # 20 40 50
x[-(1:3)]      # 40 50

# Logical indexing
x[c(TRUE, FALSE, TRUE, FALSE, TRUE)]  # 10 30 50
x[x > 25]      # 30 40 50
x[x >= 20 & x <= 40]  # 20 30 40

# Named vectors
ages <- c(Alice=25, Bob=30, Charlie=35)
ages["Alice"]                    # 25
ages[c("Alice", "Charlie")]      # 25 35
names(ages)                      # "Alice" "Bob" "Charlie"

3.6.5 Modifying Vectors

x <- c(10, 20, 30, 40, 50)

# Replace elements
x[3] <- 99             # c(10, 20, 99, 40, 50)
x[c(1, 5)] <- 0        # Replace multiple

# Append elements
x <- c(x, 60)          # Add to end
x <- c(0, x)           # Add to beginning
x <- append(x, 100, after=3)  # Insert at position

# Remove elements
x <- x[-1]             # Remove first
x <- x[x != 30]        # Remove all 30s

3.6.6 Vector Functions

x <- c(3, 7, 2, 9, 1, 5)

# Basic statistics
length(x)      # 6
sum(x)         # 27
mean(x)        # 4.5
median(x)      # 4
min(x)         # 1
max(x)         # 9
range(x)       # 1 9
sd(x)          # Standard deviation: 2.88
var(x)         # Variance: 8.3
quantile(x)    # Quartiles

# Sorting
sort(x)                        # 1 2 3 5 7 9
sort(x, decreasing=TRUE)       # 9 7 5 3 2 1
order(x)                       # Indices for sorting
rank(x)                        # Ranks
rev(x)                         # Reverse: 5 1 9 2 7 3

# Unique and duplicates
y <- c(1, 2, 2, 3, 3, 3, 4)
unique(y)                      # 1 2 3 4
duplicated(y)                  # FALSE FALSE TRUE FALSE TRUE TRUE FALSE
table(y)                       # Frequency table

# Cumulative functions
cumsum(c(1,2,3,4))             # 1 3 6 10
cumprod(c(1,2,3,4))            # 1 2 6 24
cummin(c(3,1,4,1,5))          # 3 1 1 1 1
cummax(c(3,1,4,1,5))          # 3 3 4 4 5

3.6.7 Vector Operations

# Set operations
a <- c(1, 2, 3, 4, 5)
b <- c(4, 5, 6, 7, 8)

union(a, b)              # 1 2 3 4 5 6 7 8
intersect(a, b)          # 4 5
setdiff(a, b)            # 1 2 3
setequal(a, b)           # FALSE

# Element-wise comparison
x <- c(1, 2, 3)
y <- c(3, 2, 1)
x == y                   # FALSE TRUE FALSE
x > y                    # FALSE FALSE TRUE
all(x > 0)               # TRUE (all elements)
any(x > 2)               # TRUE (at least one)

# Which indices
x <- c(10, 25, 30, 15, 40)
which(x > 20)            # 2 3 5
which.max(x)             # 5
which.min(x)             # 1

3.7 Matrices and Arrays

3.7.1 Creating Matrices

# Using matrix() function
mat1 <- matrix(1:12, nrow=3, ncol=4)
#      [,1] [,2] [,3] [,4]
# [1,]    1    4    7   10
# [2,]    2    5    8   11
# [3,]    3    6    9   12

# Fill by row
mat2 <- matrix(1:12, nrow=3, ncol=4, byrow=TRUE)
#      [,1] [,2] [,3] [,4]
# [1,]    1    2    3    4
# [2,]    5    6    7    8
# [3,]    9   10   11   12

# Combining vectors
mat3 <- rbind(c(1,2,3), c(4,5,6))  # Row bind
mat4 <- cbind(c(1,2,3), c(4,5,6))  # Column bind

# Identity matrix
diag(3)  # 3x3 identity matrix

3.7.2 Matrix Operations

A <- matrix(1:4, nrow=2, ncol=2)
B <- matrix(5:8, nrow=2, ncol=2)

# Element-wise operations
A + B         # Addition
A - B         # Subtraction
A * B         # Element-wise multiplication
A / B         # Element-wise division

# Matrix multiplication
A %*% B       # True matrix multiplication

# Transpose
t(A)          # Transpose

# Matrix functions
det(A)        # Determinant
solve(A)      # Inverse (if invertible)
eigen(A)      # Eigenvalues and eigenvectors

3.7.3 Matrix Indexing

mat <- matrix(1:12, nrow=3, ncol=4)

# Single element
mat[2, 3]     # Row 2, Column 3

# Entire row
mat[2, ]      # All of row 2

# Entire column
mat[, 3]      # All of column 3

# Submatrix
mat[1:2, 2:3]

# Named dimensions
rownames(mat) <- c("A", "B", "C")
colnames(mat) <- c("W", "X", "Y", "Z")
mat["A", "X"]

3.7.4 Matrix Functions

mat <- matrix(1:12, nrow=3, ncol=4)

nrow(mat)     # 3
ncol(mat)     # 4
dim(mat)      # c(3, 4)

rowSums(mat)  # Sum each row
colSums(mat)  # Sum each column
rowMeans(mat) # Mean of each row
colMeans(mat) # Mean of each column

# Apply function
apply(mat, 1, sum)   # Apply to rows (margin=1)
apply(mat, 2, mean)  # Apply to columns (margin=2)

3.8 Data Frames

3.8.1 Creating Data Frames

# Using data.frame()
df <- data.frame(
  name = c("Alice", "Bob", "Charlie", "David"),
  age = c(25, 30, 35, 28),
  salary = c(50000, 60000, 75000, 55000),
  employed = c(TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE  # Keep strings as character
)

print(df)
View(df)  # Opens in viewer pane

3.8.2 Exploring Data Frames

# Structure and summary
str(df)       # Data structure
summary(df)   # Statistical summary
glimpse(df)   # dplyr alternative (if installed)

# Dimensions
nrow(df)      # Number of rows
ncol(df)      # Number of columns
dim(df)       # Both dimensions

# Column names
names(df)
colnames(df)

# First/last rows
head(df, 3)   # First 3 rows
tail(df, 2)   # Last 2 rows

3.8.3 Accessing Data Frame Elements

# Access columns
df$name           # $ operator (returns vector)
df[["name"]]      # [[ ]] (returns vector)
df[, "name"]      # [ ] (returns vector)
df["name"]        # [ ] (returns data frame)
df[, 1]           # By index

# Multiple columns
df[, c("name", "age")]
df[, 1:2]

# Access rows
df[1, ]           # First row
df[1:2, ]         # First two rows

# Specific cell
df[1, "age"]      # Row 1, age column
df$age[1]         # First element of age

3.8.4 Filtering Data Frames

# Logical subsetting
df[df$age > 28, ]
df[df$employed == TRUE, ]
df[df$salary >= 60000, ]

# Multiple conditions
df[df$age > 25 & df$employed == TRUE, ]
df[df$age < 26 | df$age > 30, ]

# Using subset() function
subset(df, age > 28)
subset(df, age > 25 & employed == TRUE)
subset(df, age > 28, select = c(name, salary))

3.8.5 Modifying Data Frames

# Add new column
df$bonus <- df$salary * 0.1
df$age_group <- ifelse(df$age < 30, "Young", "Senior")

# Modify existing column
df$age <- df$age + 1

# Add new row
new_row <- data.frame(
  name = "Eve",
  age = 32,
  salary = 70000,
  employed = TRUE
)
df <- rbind(df, new_row)

# Remove column
df$bonus <- NULL
df <- df[, -4]  # Remove 4th column

# Remove row
df <- df[-5, ]  # Remove 5th row

3.8.6 Sorting Data Frames

# Sort by one column
df[order(df$age), ]                    # Ascending
df[order(-df$age), ]                   # Descending
df[order(df$age, decreasing=TRUE), ]

# Sort by multiple columns
df[order(df$employed, -df$salary), ]   # employed asc, salary desc

3.9 Lists

3.9.1 Creating Lists

# Lists can contain different types
my_list <- list(
  numbers = c(1, 2, 3, 4, 5),
  name = "John",
  matrix = matrix(1:4, nrow=2),
  df = data.frame(x=1:3, y=4:6),
  nested = list(a=1, b=2)
)

print(my_list)
str(my_list)

3.9.2 Accessing List Elements

# Using [[ ]] returns the element
my_list[[1]]           # First element
my_list[["numbers"]]   # By name

# Using $ returns the element
my_list$numbers

# Using [ ] returns a sublist
my_list[1]             # List containing first element
my_list[c(1, 3)]       # List with elements 1 and 3

# Nested access
my_list$nested$a
my_list[["df"]]$x

3.9.3 Modifying Lists

# Add elements
my_list$new_item <- "New value"
my_list[[6]] <- c(10, 20)

# Remove elements
my_list$new_item <- NULL
my_list[[6]] <- NULL

# List functions
length(my_list)        # Number of elements
names(my_list)         # Element names

3.10 Control Structures

3.10.1 If-Else Statements

# Basic if
x <- 10
if (x > 5) {
  print("x is greater than 5")
}

# If-else
if (x > 15) {
  print("x is large")
} else {
  print("x is not large")
}

# If-else if-else
score <- 75
if (score >= 90) {
  grade <- "A"
} else if (score >= 80) {
  grade <- "B"
} else if (score >= 70) {
  grade <- "C"
} else if (score >= 60) {
  grade <- "D"
} else {
  grade <- "F"
}

# Vectorized ifelse
scores <- c(95, 82, 78, 65)
grades <- ifelse(scores >= 90, "A",
          ifelse(scores >= 80, "B",
          ifelse(scores >= 70, "C", "F")))

3.10.2 For Loops

# Basic for loop
for (i in 1:5) {
  print(i)
}

# Loop over vector
fruits <- c("apple", "banana", "cherry")
for (fruit in fruits) {
  print(paste("I like", fruit))
}

# Loop with index
for (i in seq_along(fruits)) {
  print(paste(i, ":", fruits[i]))
}

# Nested loops
for (i in 1:3) {
  for (j in 1:3) {
    print(paste(i, "*", j, "=", i*j))
  }
}

3.10.3 While Loops

# Basic while loop
count <- 1
while (count <= 5) {
  print(count)
  count <- count + 1
}

# While with condition
x <- 1
while (x < 100) {
  print(x)
  x <- x * 2
}

3.10.4 Loop Control

# break: exit loop
for (i in 1:10) {
  if (i == 5) break
  print(i)
}

# next: skip to next iteration
for (i in 1:10) {
  if (i %% 2 == 0) next  # Skip even numbers
  print(i)
}

3.11 Functions

3.11.1 Creating Functions

# Basic function
greet <- function() {
  print("Hello!")
}
greet()

# Function with parameters
greet_person <- function(name) {
  print(paste("Hello,", name))
}
greet_person("Alice")

# Function with return value
add_numbers <- function(a, b) {
  result <- a + b
  return(result)
}
sum_result <- add_numbers(5, 3)

# Implicit return (last expression)
multiply <- function(a, b) {
  a * b  # Automatically returned
}

3.11.2 Default Parameters

power <- function(base, exponent = 2) {
  base^exponent
}

power(5)      # 25 (uses default)
power(5, 3)   # 125

# Multiple defaults
greet <- function(name = "Guest", time = "day") {
  paste("Good", time, name)
}
greet()                    # "Good day Guest"
greet("Alice")            # "Good day Alice"
greet("Bob", "morning")   # "Good morning Bob"

3.11.3 Multiple Return Values

# Return a list
stats <- function(x) {
  list(
    mean = mean(x),
    median = median(x),
    sd = sd(x),
    range = range(x)
  )
}

result <- stats(c(1, 2, 3, 4, 5))
result$mean    # 3
result$sd      # Standard deviation

3.11.4 Function Documentation

#' Calculate Circle Area
#'
#' @param radius Numeric value for circle radius
#' @return Area of the circle
#' @examples
#' circle_area(5)
circle_area <- function(radius) {
  if (radius < 0) {
    stop("Radius must be positive")
  }
  pi * radius^2
}

3.12 Basic Plotting

3.12.1 Scatter Plots

# Basic scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 6)
plot(x, y)

# Customized scatter plot
plot(x, y,
     main = "My Scatter Plot",
     xlab = "X Axis",
     ylab = "Y Axis",
     col = "blue",
     pch = 19,    # Point type
     cex = 1.5)   # Point size

3.12.2 Line Plots

# Line plot
plot(x, y, type = "l", col = "red", lwd = 2)

# Points and lines
plot(x, y, type = "b", col = "blue")

# Multiple lines
plot(x, y, type = "l", col = "blue")
lines(x, y*1.5, col = "red")
legend("topleft", legend = c("Series 1", "Series 2"),
       col = c("blue", "red"), lty = 1)

3.12.3 Bar Plots

# Simple bar plot
counts <- c(5, 10, 15, 20)
barplot(counts)

# Named bars
names(counts) <- c("A", "B", "C", "D")
barplot(counts,
        main = "Bar Chart",
        xlab = "Category",
        ylab = "Count",
        col = "steelblue")

3.12.4 Histograms

# Generate random data
data <- rnorm(1000, mean = 50, sd = 10)

# Histogram
hist(data,
     main = "Histogram",
     xlab = "Value",
     col = "lightblue",
     breaks = 30)

3.12.5 Box Plots

# Box plot
data1 <- rnorm(100, mean = 50)
data2 <- rnorm(100, mean = 60)
data3 <- rnorm(100, mean = 55)

boxplot(data1, data2, data3,
        names = c("Group A", "Group B", "Group C"),
        main = "Box Plot Comparison",
        col = c("red", "blue", "green"))

3.12.6 Saving Plots

# Save as PDF
pdf("myplot.pdf", width = 8, height = 6)
plot(x, y)
dev.off()

# Save as PNG
png("myplot.png", width = 800, height = 600)
plot(x, y)
dev.off()

3.13 Reading and Writing Data

3.13.1 CSV Files

# Read CSV
df <- read.csv("data.csv")
df <- read.csv("data.csv", header = TRUE, stringsAsFactors = FALSE)

# Write CSV
write.csv(df, "output.csv", row.names = FALSE)

3.13.2 Text Files

# Read delimited files
df <- read.table("data.txt", header = TRUE, sep = "\t")

# Write text file
write.table(df, "output.txt", sep = "\t", row.names = FALSE)

3.13.3 Built-in Datasets

# Load built-in datasets
data(mtcars)
head(mtcars)

# See available datasets
data()

# Common datasets
data(iris)
data(airquality)
data(ChickWeight)

3.13.4 RDS Files (R Native Format)

# Save R object
saveRDS(df, "mydata.rds")

# Read R object
df <- readRDS("mydata.rds")

# Save multiple objects
save(df, x, y, file = "myworkspace.RData")

# Load saved objects
load("myworkspace.RData")

3.14 Practice Exercises

3.14.1 Exercise 1: Vector Operations

# Create a vector of ages
ages <- c(23, 45, 67, 89, 12, 34, 56, 78)

# Tasks:
# 1. Find mean, median, and standard deviation
# 2. Count how many ages are above 50
# 3. Create a new vector with ages below 40
# 4. Sort ages in descending order

# Solutions:
mean(ages)
median(ages)
sd(ages)
sum(ages > 50)
young <- ages[ages < 40]
sort(ages, decreasing = TRUE)

3.14.2 Exercise 2: Data Frame Manipulation

# Create a data frame of students
students <- data.frame(
  name = c("Alice", "Bob", "Charlie", "Diana", "Eve"),
  math_score = c(85, 92, 78, 95, 88),
  english_score = c(90, 85, 82, 91, 87),
  age = c(20, 21, 19, 22, 20)
)

# Tasks:
# 1. Add a column for average score
# 2. Filter students with average > 85
# 3. Sort by math_score descending
# 4. Find the student with highest total score

# Solutions:
students$avg_score <- (students$math_score + students$english_score) / 2
students[students$avg_score > 85, ]
students[order(-students$math_score), ]
students$total <- students$math_score + students$english_score
students[which.max(students$total), ]

3.14.3 Exercise 3: Functions

# Write a function that:
# 1. Takes a vector of numbers
# 2. Removes outliers (values > 2 SD from mean)
# 3. Returns cleaned vector

remove_outliers <- function(x) {
  mean_x <- mean(x, na.rm = TRUE)
  sd_x <- sd(x, na.rm = TRUE)
  lower_bound <- mean_x - 2 * sd_x
  upper_bound <- mean_x + 2 * sd_x
  x[x >= lower_bound & x <= upper_bound]
}

# Test
data <- c(1, 2, 3, 4, 5, 100, 200)
remove_outliers(data)

3.14.4 Exercise 4: Control Structures

# FizzBuzz: Print numbers 1-100, but:
# - "Fizz" for multiples of 3
# - "Buzz" for multiples of 5
# - "FizzBuzz" for multiples of both

for (i in 1:100) {
  if (i %% 15 == 0) {
    print("FizzBuzz")
  } else if (i %% 3 == 0) {
    print("Fizz")
  } else if (i %% 5 == 0) {
    print("Buzz")
  } else {
    print(i)
  }
}

3.14.5 Exercise 5: Data Analysis Project

# Use the built-in mtcars dataset
data(mtcars)

# Tasks:
# 1. Calculate average MPG by number of cylinders
# 2. Find cars with MPG > 25
# 3. Create a scatter plot of MPG vs Weight
# 4. Identify the most fuel-efficient car

# Solutions:
aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
mtcars[mtcars$mpg > 25, ]
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "MPG",
     main = "MPG vs Weight")
rownames(mtcars)[which.max(mtcars$mpg)]

3.15 Summary and Next Steps

3.15.1 What You’ve Learned

  • ✅ R installation and RStudio interface
  • ✅ Basic syntax and data types
  • ✅ Vectors, matrices, data frames, and lists
  • ✅ Control structures (if/else, loops)
  • ✅ Writing functions
  • ✅ Basic plotting with base R
  • ✅ Reading and writing data files

3.15.2 Next Steps

  1. Practice Daily: Code for at least 30 minutes every day
  2. Build Projects: Apply concepts to real datasets
  3. Move to Intermediate: Continue with 2_Intermediate_Level_R.md
  4. Join Communities: R4DS Slack, Stack Overflow
  5. Read Documentation: Use ?function frequently