
Jupyter Notebook Best Practices

Write clean, reproducible, and collaborative Jupyter notebooks for data analysis and research.

Claude Code, Cursor, GitHub Copilot, Windsurf, Cline, Codex / OpenAI, Gemini CLI
Updated 2026-04-05
CLAUDE.md
# Jupyter Notebook Best Practices

You are an expert in Jupyter notebooks, reproducible data analysis, and computational narrative design.

Notebook Structure:
- Start with a title cell (Markdown H1) and a brief description of the analysis
- Import all libraries in the first code cell — never scatter imports through the notebook
- Second cell: configuration constants (file paths, parameters, thresholds)
- Follow a narrative arc: Introduction -> Data Loading -> Exploration -> Analysis -> Conclusions
- End with a summary cell: key findings, next steps, and caveats
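The opening cells of such a notebook might look like this (the constants and paths are purely illustrative):

```python
# Cell 1 (Markdown): "# Sales Funnel Analysis" plus a one-paragraph description.

# Cell 2 (code): all imports in one place, plus environment info.
import sys
import platform
from pathlib import Path

print(f"Python {platform.python_version()} on {sys.platform}")

# Cell 3 (code): configuration constants, grouped so readers see every
# knob up front instead of hunting through the notebook.
DATA_DIR = Path("data")          # read-only inputs
OUTPUT_DIR = Path("output")      # derived artifacts go here, never into data/
RANDOM_SEED = 42
MISSING_THRESHOLD = 0.3          # drop columns with >30% missing values
```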

Cell Best Practices:
- One logical step per cell: load data, clean data, visualize, model — separate cells
- Keep code cells short (under 20 lines); extract helpers into utility modules
- Use Markdown cells liberally to explain WHY, not WHAT (code shows what)
- Display intermediate results: df.shape, df.head(), df.describe() after each transform
- Number sections with Markdown headings for easy navigation

Reproducibility:
- Pin all package versions: use requirements.txt or environment.yml
- Set random seeds: np.random.seed(42), random.seed(42), torch.manual_seed(42)
- Use relative paths with pathlib: Path("data") / "input.csv"
- Record the kernel and Python version in the first cell
- Clear all outputs and run top-to-bottom before sharing (Kernel -> Restart & Run All)
- If the notebook takes more than 5 minutes to run, note the expected runtime at the top
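The seed-setting advice can live in one small function near the top of the notebook; this sketch guards the numpy and torch calls so it also runs in environments where those packages are absent:

```python
import os
import random

def set_seed(seed: int = 42) -> None:
    """Seed every RNG the notebook touches; numpy/torch are optional."""
    random.seed(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # numpy not installed in this environment
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # torch not installed in this environment

set_seed(42)
print(random.random())  # identical value on every top-to-bottom run
```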

Data Handling:
- Never modify source data files; read-only access, write outputs to separate directory
- Use parquet over CSV for large datasets (faster, smaller, typed)
- Cache expensive computations: save intermediate DataFrames to parquet
- Show data shapes and dtypes after loading: helps readers understand the data
- Handle missing data explicitly; document decisions (drop, fill, impute)

Visualization in Notebooks:
- Use %matplotlib inline for static plots, or %matplotlib widget for interactive ones
- Set figure size globally: plt.rcParams['figure.figsize'] = [12, 6]
- Plotly for interactive exploration; Matplotlib/Seaborn for publication figures
- Always add axis labels, titles, and units to every plot
- Use consistent color schemes across the entire notebook

Collaboration:
- Use JupyterLab over classic Jupyter for better multi-file support
- Git-friendly: use jupytext to sync .ipynb with .py files (avoids merge conflicts)
- nbstripout: automatically strip outputs before committing to git
- Review notebooks as HTML exports (nbconvert) — easier than reviewing JSON diffs
- For teams: use Papermill for parameterized notebook execution (batch runs)

Common Anti-Patterns:
- Global state mutation: modifying a DataFrame in cell 5 that cell 3 depends on
- Hidden state: results depend on cell execution order, not top-to-bottom
- Mega-notebooks: 100+ cells that do everything — split into focused notebooks
- No error handling: wrap risky operations in try/except with informative messages
- Copy-paste analysis: extract repeated patterns into functions or separate modules

Add to your project root CLAUDE.md file, or append to an existing one.

Tags

jupyter, notebooks, reproducibility, python, data-analysis, collaboration