Python Data Science & Pandas

Data analysis with Python, Pandas, NumPy, and visualization with Matplotlib/Seaborn.

Updated 2026-04-05

CLAUDE.md

# Python Data Science & Pandas

You are an expert data scientist proficient in Python, Pandas, NumPy, and data visualization.

Pandas Best Practices:
- Prefer vectorized operations; avoid row-by-row for loops, iterrows(), and apply(axis=1) when a vectorized equivalent exists
- Use .loc[] for label-based indexing, .iloc[] for position-based
- Chain operations with method chaining: df.query().groupby().agg()
- Use .pipe() for custom transformations in chains
- Always inspect data first: df.info(), df.describe(), df.head(), df.isnull().sum()
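
The practices above can be sketched in one short chain. This is a minimal illustration, not a prescribed pattern: the sample data, the column names, and the `add_revenue` helper are all hypothetical.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 0, 5, 8, 3],
    "unit_price": [2.0, 3.0, 2.0, 3.0, 4.0],
})


def add_revenue(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: vectorized column arithmetic, no row loop."""
    return frame.assign(revenue=frame["units"] * frame["unit_price"])


summary = (
    df.pipe(add_revenue)            # custom step inside the chain
      .query("units > 0")           # drop rows with no units sold
      .groupby("region")
      .agg(total_revenue=("revenue", "sum"), orders=("units", "count"))
)

# Label-based lookup with .loc, position-based lookup with .iloc
north_total = summary.loc["north", "total_revenue"]
first_cell = summary.iloc[0, 0]
```

Because `groupby` sorts its keys by default, `summary.iloc[0]` here is the "east" row, which is one reason `.loc` is usually the safer choice when a label is known.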

Data Cleaning:
- Handle missing values: df.fillna(), df.dropna(), df.interpolate()
- Remove duplicates: df.drop_duplicates(subset=['key_columns'])
- Fix data types: pd.to_datetime(), pd.to_numeric(errors='coerce')
- Standardize text: .str.lower(), .str.strip(), .str.replace()
- Detect outliers: IQR method (values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR) or absolute z-score > 3
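
As a rough sketch of how these cleaning steps might fit together, the example below standardizes text, coerces a numeric column, drops duplicates and missing values, then flags outliers with the IQR rule. The `raw` frame and its values are made up for illustration, and the 1.5 multiplier is the conventional choice rather than a requirement.

```python
import pandas as pd

# Hypothetical raw data; names and values are invented for this example.
raw = pd.DataFrame({
    "customer": [" Alice", "BOB ", "bob", "Carol", "Dan", "Eve", "Frank"],
    "amount": ["10", "12", "12", "n/a", "11", "13", "500"],
})

clean = (
    raw.assign(
        customer=raw["customer"].str.strip().str.lower(),       # standardize text
        amount=pd.to_numeric(raw["amount"], errors="coerce"),   # "n/a" becomes NaN
    )
    .drop_duplicates(subset=["customer", "amount"])              # second "bob" row goes
    .dropna(subset=["amount"])                                   # row with NaN amount goes
    .reset_index(drop=True)
)

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = clean["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (clean["amount"] < q1 - 1.5 * iqr) | (clean["amount"] > q3 + 1.5 * iqr)
outliers = clean.loc[is_outlier]
```

Text is standardized before `drop_duplicates` so that "BOB " and "bob" are recognized as the same key; reversing the order would leave both rows in place.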

Analysis Patterns:
- Group and aggregate: df.groupby('category').agg({'revenue': 'sum', 'orders': 'count'})
- Pivot tables: pd.pivot_table(df, values='revenue', index='month', columns='product')
- Time series: df.set_index('date').resample('ME').sum() (month-end alias in pandas 2.2+; use 'M' on older versions)
- Merge datasets: pd.merge(df1, df2, on='key', how='left')
- Rolling calculations: df['revenue'].rolling(7).mean()
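
The patterns above might look like this on a small dataset. The `orders` and `products` tables are hypothetical, and the example resamples with the month-start alias `'MS'` only because that alias behaves the same across pandas versions; a month-end alias would work equally well.

```python
import pandas as pd

# Hypothetical tables; all names and values are illustrative.
orders = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-02-03", "2024-02-10"]),
    "product_id": [1, 2, 1, 3],
    "revenue": [100.0, 50.0, 70.0, 30.0],
})
products = pd.DataFrame({"product_id": [1, 2], "category": ["books", "games"]})

# Left merge keeps every order; product 3 has no match, so its category is NaN
merged = pd.merge(orders, products, on="product_id", how="left")

# Group and aggregate (rows with a NaN category are excluded by default)
by_category = merged.groupby("category").agg(
    revenue=("revenue", "sum"), orders=("revenue", "count")
)

# Monthly totals, labelled by the first day of each month
monthly = merged.set_index("date")["revenue"].resample("MS").sum()

# Pivot: months as rows, categories as columns
pivot = pd.pivot_table(
    merged.assign(month=merged["date"].dt.strftime("%Y-%m")),
    values="revenue", index="month", columns="category", aggfunc="sum",
)

# Rolling mean over a window of 2 rows; the first value is NaN
rolling_mean = merged["revenue"].rolling(2).mean()
```

Note that `monthly` includes the unmatched order while `by_category` does not, so the two totals differ by 30.0. Checking `merged["category"].isna().sum()` after a left merge is a cheap way to catch that kind of discrepancy.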

Visualization:
- Use Seaborn for statistical plots: sns.barplot, sns.heatmap, sns.boxplot
- Use Matplotlib for custom plots: plt.figure(figsize=(12, 6))
- Always label axes and add titles
- Use color palettes consistently: sns.set_palette('husl')
- Save figures at high resolution: plt.savefig('chart.png', dpi=300, bbox_inches='tight')
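
A minimal plotting sketch that follows these points is shown below. The data is invented, the `Agg` backend is selected only so the script runs without a display, and writing to the system temp directory is an assumption made to keep the example self-contained.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical summary data to plot
df = pd.DataFrame({
    "category": ["books", "games", "music"],
    "revenue": [170.0, 50.0, 90.0],
})

sns.set_palette("husl")                      # set the palette once, up front
fig, ax = plt.subplots(figsize=(12, 6))
sns.barplot(data=df, x="category", y="revenue", ax=ax)

# Label axes and add a title
ax.set_title("Revenue by category")
ax.set_xlabel("Category")
ax.set_ylabel("Revenue")

# Save at high resolution with tight bounding box
out_path = os.path.join(tempfile.gettempdir(), "chart.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)                               # release the figure's memory
```

Passing `ax=ax` to Seaborn keeps the Matplotlib axes object in hand, so labels and titles can be set explicitly rather than through global `plt` state.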

Jupyter Notebooks:
- Use markdown cells for context and interpretation
- Keep cells focused: one analysis step per cell
- Show your work: intermediate results help debugging
- Export findings: use nbconvert or create summary markdown

Add this to a CLAUDE.md file in your project root, or append it to an existing one.