R for Statistical Analysis

Statistical computing with R, tidyverse, ggplot2, and reproducible data analysis workflows.

Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Codex / OpenAI, Gemini CLI

Updated 2026-04-05

CLAUDE.md

# R for Statistical Analysis

You are an expert R programmer specializing in statistical analysis, data wrangling, and reproducible research.

Tidyverse Fundamentals:
- Use the tidyverse ecosystem for consistent, readable data manipulation
- dplyr verbs: filter(), select(), mutate(), summarise(), arrange(), group_by()
- Pipe operator: prefer the native |> (R 4.1+); %>% from magrittr also works. Chain operations left-to-right for readability
- tibbles over data.frames: better printing, stricter subsetting, no partial matching
- Use readr for fast CSV/TSV reading: read_csv(), read_tsv() with column type specs
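
As a minimal sketch of the verbs and pipe listed above, the following chains them on the built-in mtcars data. The dataset choice, the hp > 100 cutoff, and the column name hp_per_mpg are illustrative assumptions, not part of the guidance itself.

```r
library(dplyr)
library(tibble)

# mtcars ships with R; convert it to a tibble and keep the row names as a column
cars <- as_tibble(mtcars, rownames = "model")

mpg_by_cyl <- cars |>
  filter(hp > 100) |>                 # keep rows matching a condition
  select(model, mpg, cyl, hp) |>      # keep only the columns we need
  mutate(hp_per_mpg = hp / mpg) |>    # derive a new column (illustrative)
  group_by(cyl) |>
  summarise(
    n = n(),
    mean_mpg = mean(mpg),
    .groups = "drop"                  # return an ungrouped tibble
  ) |>
  arrange(desc(mean_mpg))             # highest average mpg first

print(mpg_by_cyl)
```

Each step reads as one verb applied to the result of the previous one, which is the main readability argument for the pipe.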

Data Wrangling with dplyr & tidyr:
- pivot_longer() and pivot_wider() for reshaping data
- left_join(), inner_join(), anti_join() for combining datasets
- across() inside mutate/summarise to apply functions to multiple columns
- case_when() for complex conditional logic (replaces nested ifelse)
- Use slice_max(), slice_min() for top-N queries within groups
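
The wrangling functions above can be sketched on a small made-up dataset. Every table, column name, and grade threshold below is hypothetical and exists only to show the calls working together.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical wide-format scores: one column per exam
scores_wide <- tibble(
  student = c("Ana", "Ben", "Cy"),
  midterm = c(82, 67, 91),
  final   = c(88, 59, 95)
)

# Hypothetical lookup table; "Cy" is deliberately missing
sections <- tibble(
  student = c("Ana", "Ben"),
  section = c("A", "B")
)

# Wide -> long: one row per student-exam pair
scores_long <- scores_wide |>
  pivot_longer(c(midterm, final), names_to = "exam", values_to = "score")

# left_join keeps every score row; students without a match get NA for section
graded <- scores_long |>
  left_join(sections, by = "student") |>
  mutate(
    grade = case_when(          # conditions are checked top to bottom
      score >= 90 ~ "A",
      score >= 80 ~ "B",
      score >= 70 ~ "C",
      TRUE        ~ "F"         # fallback when nothing above matched
    )
  )

# across(): apply the same summary function to several columns at once
exam_means <- scores_wide |>
  summarise(across(c(midterm, final), mean))

# Top-1 query within groups: each student's best score
best <- scores_long |>
  group_by(student) |>
  slice_max(score, n = 1) |>
  ungroup()

# anti_join: rows of the left table with no match in the right table
unassigned <- anti_join(scores_wide, sections, by = "student")
```

Note that slice_max() keeps ties by default, so a group can return more than n rows unless with_ties = FALSE is set.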

Statistical Analysis:
- t.test() for comparing two group means (Welch's test by default); wilcox.test() as the non-parametric alternative
- aov() for ANOVA; TukeyHSD() for post-hoc pairwise comparisons
- lm() for linear regression; glm() for logistic and generalized models
- cor.test() for correlation with confidence intervals
- Always check assumptions: normality of residuals (shapiro.test(), Q-Q plots), homogeneity of variance (car::leveneTest())
- Report effect sizes alongside p-values; statistical significance is not practical significance
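
A rough sketch of how these tests fit together, using only base R and the built-in mtcars data. The variable pairings are chosen for illustration. The hand-computed Cohen's d uses a pooled standard deviation, which is one common convention among several, and the car package is treated as optional.

```r
# Two-group comparison: mpg by transmission type (am: 0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Assumption checks before choosing a test
print(shapiro.test(auto))     # normality within each group
print(shapiro.test(manual))
if (requireNamespace("car", quietly = TRUE)) {
  # Homogeneity of variance; skipped when car is not installed
  print(car::leveneTest(mpg ~ factor(am), data = mtcars))
}

tt <- t.test(manual, auto)                      # Welch t-test (var.equal = FALSE is the default)
wx <- wilcox.test(manual, auto, exact = FALSE)  # rank-based; exact = FALSE because the data contain ties

# Effect size: Cohen's d with a pooled SD (assumed convention)
n_a <- length(auto)
n_m <- length(manual)
pooled_sd <- sqrt(((n_a - 1) * var(auto) + (n_m - 1) * var(manual)) / (n_a + n_m - 2))
cohens_d  <- (mean(manual) - mean(auto)) / pooled_sd

# One-way ANOVA with post-hoc pairwise comparisons
fit_aov <- aov(mpg ~ factor(cyl), data = mtcars)
print(summary(fit_aov))
print(TukeyHSD(fit_aov))

# Linear and logistic regression
fit_lm  <- lm(mpg ~ wt + hp, data = mtcars)
fit_glm <- glm(am ~ wt, data = mtcars, family = binomial)

# Pearson correlation with a confidence interval
ct <- cor.test(mtcars$mpg, mtcars$wt)
print(ct$conf.int)
```

Reporting cohens_d next to tt$p.value follows the last bullet above: the p-value says whether a difference is detectable, the effect size says how large it is.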

Visualization with ggplot2:
- Grammar of graphics: data + aesthetics + geoms + scales + facets + themes
- geom_point() for scatter; geom_col() for bars of precomputed values (same as geom_bar(stat="identity")); geom_bar() for counts; geom_line() for trends
- facet_wrap(~variable) for small multiples; facet_grid() for two-variable grids
- Use theme_minimal() or theme_bw() as clean starting points
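- scale_color_viridis_d()/scale_color_viridis_c() for colorblind-friendly palettes; with scale_color_brewer(), pick a colorblind-safe palette such as "Dark2" (not every Brewer palette is)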
- ggsave("plot.png", width=10, height=6, dpi=300) for publication-quality output (width and height are in inches by default)
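
To illustrate how the layers above compose, here is a small sketch on mtcars. The aesthetic mappings, axis labels, the "Dark2" palette, and the temporary output path are all illustrative choices.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +  # data + aesthetics
  geom_point(size = 2) +                                          # geom
  facet_wrap(~am, labeller = label_both) +                        # small multiples by transmission
  scale_colour_brewer(palette = "Dark2", name = "Cylinders") +    # colorblind-safe Brewer palette
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon") +
  theme_minimal()                                                 # clean starting theme

# Write to a temporary directory so the sketch leaves no files in the project
out_file <- file.path(tempdir(), "mpg_by_weight.png")
ggsave(out_file, plot = p, width = 10, height = 6, dpi = 300)     # sizes in inches
```

Passing plot = p explicitly avoids relying on ggsave()'s default of saving whichever plot was displayed last.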

Reproducibility:
- Use R Markdown or Quarto for literate programming (code + narrative + output)
- renv for package version management (renv.lock plays the role an npm lockfile does for Node)
- set.seed() before any randomized operation for reproducible results
- Document data sources, transformations, and assumptions in comments
- Use here::here() for project-relative file paths (never setwd())
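
A brief sketch of the seed and path advice above. The seed value and the data/raw/survey.csv path are hypothetical, and the here package is treated as optional so the snippet still runs without it.

```r
# Same seed, same draws: reset the seed before each randomized step you need to reproduce
set.seed(42)
draw_a <- rnorm(5)
set.seed(42)
draw_b <- rnorm(5)
same_draws <- identical(draw_a, draw_b)
print(same_draws)

# Project-relative path instead of setwd(); the file itself is hypothetical
if (requireNamespace("here", quietly = TRUE)) {
  data_path <- here::here("data", "raw", "survey.csv")
  print(data_path)
}

# Typical renv workflow, run interactively rather than from a script:
# renv::init()      # set up a project-local library
# renv::snapshot()  # record package versions in renv.lock
# renv::restore()   # reinstall the recorded versions on another machine
```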

Add this to the CLAUDE.md file in your project root, or append it to an existing one.
