Reproducible Analysis
Best practices for code organization, documentation, and version control
What You'll Learn
- Why reproducibility matters
- Project structure
- Requirements files
- Documentation (Docstrings & Markdown)
- Version control basics (Git)
Why Reproducibility?
"It works on my machine" is not good enough.
- Collaboration: Others need to run your code.
- Future You: In six months, you will have forgotten what you did and why.
- Trust: Science requires verification.
Project Structure
A standard structure helps everyone find the data, code, and documentation without asking.
my_project/
├── data/
│   ├── raw/              # Immutable original data
│   └── processed/        # Cleaned data
├── notebooks/            # Jupyter notebooks for exploration
├── src/                  # Reusable Python scripts
│   ├── __init__.py
│   ├── data_cleaning.py
│   └── modeling.py
├── requirements.txt      # Dependencies
├── README.md             # Project overview
└── .gitignore            # Files to ignore
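The layout above can also be created programmatically. The following is a minimal sketch, not a standard tool: the `scaffold` function and the `DIRECTORIES` / `FILES` constants are hypothetical names introduced here, and the file list simply mirrors the tree shown above.

```python
from pathlib import Path

# Directory and file names copied from the tree above; adjust to your project.
DIRECTORIES = ["data/raw", "data/processed", "notebooks", "src"]
FILES = [
    "src/__init__.py",
    "src/data_cleaning.py",
    "src/modeling.py",
    "requirements.txt",
    "README.md",
    ".gitignore",
]


def scaffold(root):
    """Create the project skeleton under `root` and return it as a Path.

    Existing directories and files are left untouched, so calling this
    twice on the same folder is harmless.
    """
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for file_name in FILES:
        (root / file_name).touch(exist_ok=True)
    return root
```

Calling `scaffold("my_project")` from the folder where the project should live would create empty placeholder files that you then fill in.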
Managing Dependencies
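Always record the libraries your project depends on, ideally with exact versions, so that others can recreate the same environment.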
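One benefit of listing exact versions is that the list can be checked mechanically. The sketch below assumes a requirements file whose lines look like `name==version`; `parse_pins` and `check_environment` are hypothetical helper names, and the parsing is deliberately simplified (it skips unpinned entries and does not handle extras such as `name[extra]==1.0`).

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {name: version} for lines of the form name==version."""
    pins = {}
    for raw in requirements_text.splitlines():
        # Drop trailing comments and environment markers (simplification).
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # blank line, comment, or unpinned entry
        name, version = line.split("==", 1)
        pins[name.strip()] = version.strip()
    return pins


def check_environment(pins):
    """Return a list of mismatch messages; an empty list means all pins match."""
    problems = []
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (wanted {wanted})")
            continue
        if installed != wanted:
            problems.append(f"{name}: installed {installed}, wanted {wanted}")
    return problems
```

A typical use would be to read `requirements.txt`, pass its text to `parse_pins`, and print whatever `check_environment` returns before running an analysis.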
Creating requirements.txt:
terminal
pip freeze > requirements.txt
Installing from requirements:
terminal
pip install -r requirements.txt
Documentation
- Code comments: Explain why the code does something, not what it does.
- Docstrings: Describe what a function does, its arguments, and its return value.
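As a small illustration of the "why, not what" rule, the sketch below contrasts the two comment styles. The `mean_rating` function and the survey scenario are hypothetical, invented only for this example.

```python
def mean_rating(ratings):
    # A "what" comment would just restate the code: "filter out None values".
    # A "why" comment records the reasoning instead:
    # None means the respondent skipped the question, so treating it as 0
    # would pull the average down unfairly.
    answered = [r for r in ratings if r is not None]
    # With no answers at all there is no meaningful average to report.
    return sum(answered) / len(answered) if answered else None
```

The second style is the one that stays useful, because the reasoning cannot be recovered from the code alone.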
code.py
def calculate_metrics(y_true, y_pred):
    """
    Calculates MSE and R2 score.

    Args:
        y_true (array): Actual values
        y_pred (array): Predicted values

    Returns:
        dict: Dictionary containing MSE and R2
    """
    pass
README.md:
- Project Title
- Description
- Installation instructions
- Usage examples
- Credits
Version Control (Git)
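- git init: Start tracking the project folder.
- git add .: Stage all changes in the current directory.
- git commit -m "message": Save a snapshot with a descriptive message.
- git push: Upload your commits to a remote such as GitHub or GitLab.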
Important: Add data/ and .env to your .gitignore file! Never commit large data or passwords.
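The .gitignore advice above can be checked before the first commit. This is a minimal sketch under a simplifying assumption: it only looks for each required pattern as an exact line of its own, so equivalent spellings such as `/data/` are not recognized. `REQUIRED_PATTERNS` and `missing_ignore_patterns` are hypothetical names introduced here.

```python
# Patterns named in the note above.
REQUIRED_PATTERNS = ("data/", ".env")


def missing_ignore_patterns(gitignore_text, required=REQUIRED_PATTERNS):
    """Return the required patterns that do not appear as their own line."""
    present = set()
    for raw in gitignore_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):  # skip blanks and comments
            present.add(line)
    return [pattern for pattern in required if pattern not in present]
```

Passing the contents of your `.gitignore` to `missing_ignore_patterns` and getting back an empty list suggests both entries are in place; anything returned is worth adding before running git add.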
Next Steps
Let's make our insights pop with advanced visualizations!