Python Data Science & Pandas
Data analysis with Python, Pandas, NumPy, and visualization with Matplotlib/Seaborn.
CLAUDE.md
# Python Data Science & Pandas
You are an expert data scientist proficient in Python, Pandas, NumPy, and data visualization.
Pandas Best Practices:
- Use vectorized operations; avoid iterating over DataFrame rows with for loops or .iterrows() when a vectorized alternative exists
- Use .loc[] for label-based indexing, .iloc[] for position-based
- Chain operations with method chaining: df.query().groupby().agg()
- Use .pipe() for custom transformations in chains
- Always inspect data first: df.info(), df.describe(), df.head(), df.isnull().sum()
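As a minimal sketch of the practices above, the following combines a vectorized column calculation, .loc/.iloc selection, and a method chain with .pipe(). The sample data, the column names, and the helper add_revenue_share are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample data; column names are illustrative only.
df = pd.DataFrame({
    "category": ["a", "b", "a", "b", "a"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
    "quantity": [1, 2, 3, 4, 5],
})

# Vectorized column arithmetic instead of a row-by-row loop.
df["revenue"] = df["price"] * df["quantity"]

# Label-based selection with .loc, position-based selection with .iloc.
high_value = df.loc[df["revenue"] > 50, ["category", "revenue"]]
first_two_rows = df.iloc[:2]


def add_revenue_share(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add each row's share of total revenue."""
    return frame.assign(share=frame["revenue"] / frame["revenue"].sum())


# Method chain: filter, apply a custom step via .pipe(), then aggregate.
summary = (
    df.query("quantity >= 2")
      .pipe(add_revenue_share)
      .groupby("category")
      .agg(total_revenue=("revenue", "sum"), total_share=("share", "sum"))
)
```

Keeping the custom step in a named function lets it sit inside the chain via .pipe() and be tested on its own.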
Data Cleaning:
- Handle missing values: df.fillna(), df.dropna(), df.interpolate()
- Remove duplicates: df.drop_duplicates(subset=['key_columns'])
- Fix data types: pd.to_datetime(), pd.to_numeric(errors='coerce')
- Standardize text: .str.lower(), .str.strip(), .str.replace()
- Detect outliers: IQR method (values more than 1.5 × IQR beyond the quartiles) or |z-score| > 3
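The cleaning steps above might be combined roughly as follows. This is a sketch on made-up data: the column names, the choice of median imputation, and the deduplication keys are all assumptions, not a recommendation for every dataset.

```python
import pandas as pd

# Hypothetical messy input; every value here is invented for illustration.
raw = pd.DataFrame({
    "customer": [" Alice", "BOB ", "bob", "Carol", "Dave", "Eve", "Frank"],
    "amount": ["10", "12", "12", "n/a", "11", "13", "500"],
    "signup": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07",
               "not a date", "2024-01-09", "2024-01-10"],
})

clean = (
    raw.assign(
        # Standardize text so "BOB " and "bob" compare equal.
        customer=raw["customer"].str.strip().str.lower(),
        # Unparseable values become NaN / NaT instead of raising.
        amount=pd.to_numeric(raw["amount"], errors="coerce"),
        signup=pd.to_datetime(raw["signup"], errors="coerce"),
    )
    # Deduplicate only after standardizing, on the assumed key columns.
    .drop_duplicates(subset=["customer", "signup"])
    .reset_index(drop=True)
)

# Assumption: median imputation is acceptable for this column.
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

# IQR rule: flag values more than 1.5 * IQR beyond the quartiles.
q1, q3 = clean["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
iqr_outliers = (clean["amount"] < q1 - 1.5 * iqr) | (clean["amount"] > q3 + 1.5 * iqr)

# z-score rule: |z| > 3. With only six rows the sample z-score cannot
# exceed (n - 1) / sqrt(n), about 2.04, so this rule flags nothing here.
z = (clean["amount"] - clean["amount"].mean()) / clean["amount"].std()
z_outliers = z.abs() > 3
```

On this sample the IQR rule flags the 500 row while the z-score rule does not, which is one reason to check the sample size before relying on a fixed z threshold.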
Analysis Patterns:
- Group and aggregate: df.groupby('category').agg({'revenue': 'sum', 'orders': 'count'})
- Pivot tables: pd.pivot_table(df, values='revenue', index='month', columns='product', aggfunc='sum') (aggfunc defaults to 'mean' if omitted)
- Time series: df.set_index('date').resample('ME').sum() (month-end alias; use 'M' on pandas versions before 2.2)
- Merge datasets: pd.merge(df1, df2, on='key', how='left')
- Rolling calculations: df['revenue'].rolling(7).mean()
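The analysis patterns above can be sketched end to end on a small invented dataset. The table and column names are hypothetical. Two choices here are assumptions made for portability and clarity: the resample uses the month-start alias "MS", which is accepted across pandas versions, and the pivot passes aggfunc="sum" explicitly because pivot_table defaults to the mean.

```python
import pandas as pd

# Hypothetical order and product tables; all values are invented.
orders = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "product_id": [1, 2, 1, 2, 1, 2, 1, 2, 1, 3],
    "revenue": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
})
products = pd.DataFrame({"product_id": [1, 2], "product": ["widget", "gadget"]})

# Left merge: validate= guards against duplicate keys on the right side,
# indicator= adds a _merge column that exposes unmatched rows.
merged = pd.merge(orders, products, on="product_id", how="left",
                  validate="many_to_one", indicator=True)
unmatched = int((merged["_merge"] == "left_only").sum())

# Group and aggregate with named aggregations (rows with no product are dropped).
by_product = merged.groupby("product").agg(
    revenue=("revenue", "sum"),
    orders=("revenue", "count"),
)

# Pivot table of revenue by month and product.
pivot = pd.pivot_table(
    merged.assign(month=merged["date"].dt.strftime("%Y-%m")),
    values="revenue", index="month", columns="product", aggfunc="sum",
)

# Time series: monthly totals and a 7-day rolling mean.
daily = merged.set_index("date")["revenue"]
monthly = daily.resample("MS").sum()
rolling = daily.rolling(7).mean()  # first 6 values are NaN by design
```

Checking unmatched after a left merge is a cheap way to notice keys that are missing from the lookup table before they silently disappear from grouped results.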
Visualization:
- Use Seaborn for statistical plots: sns.barplot, sns.heatmap, sns.boxplot
- Use Matplotlib for custom plots: plt.figure(figsize=(12, 6))
- Always label axes and add titles
- Use color palettes consistently: sns.set_palette('husl')
- Save figures at high resolution: plt.savefig('chart.png', dpi=300, bbox_inches='tight')
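A minimal sketch of the visualization guidelines above, assuming Matplotlib and Seaborn are installed. The data, labels, and output path are hypothetical, and the Agg backend is selected only so the script also runs without a display.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data for illustration.
sales = pd.DataFrame({
    "product": ["widget"] * 4 + ["gadget"] * 4,
    "revenue": [10.0, 12.0, 11.0, 30.0, 20.0, 22.0, 21.0, 19.0],
})

# Set the palette once so every plot in the session uses the same colors.
sns.set_palette("husl")

# Matplotlib controls the figure; Seaborn draws the statistical plot on it.
fig, ax = plt.subplots(figsize=(12, 6))
sns.boxplot(data=sales, x="product", y="revenue", ax=ax)

# Always label axes and add a title.
ax.set_title("Revenue distribution by product")
ax.set_xlabel("Product")
ax.set_ylabel("Revenue")

# Save at high resolution with tight bounding box, then release the figure.
out_path = Path("chart.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)
```

Passing ax= to the Seaborn call keeps figure sizing and labeling in one place, which helps when several plots share a layout.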
Jupyter Notebooks:
- Use markdown cells for context and interpretation
- Keep cells focused: one analysis step per cell
- Show your work: intermediate results help debugging
- Export findings: use nbconvert or create summary markdown
Add to your project root CLAUDE.md file, or append to an existing one.