Version Control with Git

Learning Objectives

By the end of this reading, you will be able to:

  • Understand the fundamental concepts of version control systems
  • Use Git for tracking changes in your codebase
  • Create and manage branches effectively
  • Merge and rebase branches appropriately
  • Implement common Git workflows
  • Collaborate with teams using distributed version control

Introduction

Version control is a system that records changes to files over time, allowing you to recall specific versions later. Git is a distributed version control system that has become the industry standard for software development. Unlike centralized systems, Git gives every developer a complete copy of the repository, including its full history.

Why Version Control?

  1. History Tracking: See what changed, when, and by whom
  2. Collaboration: Multiple developers can work simultaneously
  3. Backup: Every clone is a full backup
  4. Experimentation: Try new features without affecting stable code
  5. Rollback: Revert to previous versions if something breaks

Core Concepts

Repository

A repository (repo) is a directory that contains your project files and the complete history of changes. Git stores this information in a hidden .git directory.
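
Because the `.git` directory marks the top of a repository, a tool can locate the repository root by walking up from any subdirectory. The sketch below illustrates that idea; `find_repo_root` is a hypothetical helper name, not part of Git.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Return the nearest directory at or above `start` that contains `.git`."""
    current = start.resolve()
    for candidate in [current, *current.parents]:
        # `.git` is usually a directory, but it can also be a file
        # (for example in linked worktrees), so only test existence.
        if (candidate / ".git").exists():
            return candidate
    return None  # `start` is not inside a repository
```

Calling `find_repo_root(Path.cwd())` from inside `my_project/src` would be expected to return the path of `my_project`.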

# Example: Project structure with Git
"""
my_project/
├── .git/              # Git metadata and history
├── src/
│   ├── main.py
│   └── utils.py
├── tests/
│   └── test_main.py
├── .gitignore         # Files to ignore
└── README.md
"""

Commits

A commit is a snapshot of your repository at a specific point in time. Each commit has:

  • A unique SHA-1 hash identifier
  • Author information
  • Timestamp
  • Commit message describing the changes
  • Reference to parent commit(s)

# Creating commits
git add main.py utils.py          # Stage files
git commit -m "Add utility functions for data processing"

# View commit history
git log --oneline --graph --all
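
To make the hash identifier less abstract, the sketch below computes the SHA-1 ID that Git assigns to a file's contents (a "blob" object). Commit IDs are derived in the same way, from a header plus the commit's own content. The function name `git_blob_id` is chosen here for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object ID Git assigns to a blob with this content."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The same content always yields the same ID, and any change to the content yields a different one, which is why a hash can identify a snapshot.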

The Three States

A file tracked by Git is in one of three main states:

  1. Modified: Changed in the working directory but not yet staged
  2. Staged: Marked for inclusion in the next commit
  3. Committed: Safely stored in the local repository
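
The three states can be modeled with a small toy class. `ToyRepo` and its method names are invented for this sketch and say nothing about how Git is implemented; the model also ignores cases such as a file that is staged and then edited again.

```python
class ToyRepo:
    """A toy model of Git's three areas (illustration only)."""

    def __init__(self):
        self.working = {}    # path -> content in the working directory
        self.staged = {}     # path -> content marked for the next commit
        self.committed = {}  # path -> content in the latest commit

    def edit(self, path, content):
        self.working[path] = content

    def add(self, path):
        # Roughly what `git add <path>` does: copy to the staging area.
        self.staged[path] = self.working[path]

    def commit(self):
        # Roughly what `git commit` does: record everything that is staged.
        self.committed.update(self.staged)
        self.staged.clear()

    def state(self, path):
        content = self.working[path]
        if self.staged.get(path) == content:
            return "staged"
        if self.committed.get(path) == content:
            return "committed"
        return "modified"


repo = ToyRepo()
repo.edit("main.py", "print('v1')")
print(repo.state("main.py"))  # modified
repo.add("main.py")
print(repo.state("main.py"))  # staged
repo.commit()
print(repo.state("main.py"))  # committed
```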

# Example: Understanding file states
"""
Working Directory     Staging Area        Repository
----------------      ------------        ----------
main.py (modified) -> main.py (staged) -> main.py (committed)
utils.py (new)     -> utils.py (staged)-> utils.py (committed)
config.py (modified)
"""

Branching

Branches allow you to diverge from the main line of development and work independently without affecting the main codebase.

Creating and Switching Branches

# Create a new branch
git branch feature/user-authentication

# Switch to the branch
git checkout feature/user-authentication

# Create and switch in one command
git checkout -b feature/payment-integration

# Modern Git (2.23+)
git switch feature/user-authentication
git switch -c feature/new-feature

Branch Strategy Example

"""
Branch Naming Conventions:

main/master        - Production-ready code
develop            - Integration branch for features
feature/NAME       - New features
bugfix/NAME        - Bug fixes
hotfix/NAME        - Urgent production fixes
release/VERSION    - Release preparation

Example:
feature/user-login
bugfix/memory-leak
hotfix/security-patch
release/v2.0.0
"""

Visualizing Branches

# Example: Branch visualization
"""
main:     A---B---C---F---G
               \       /
feature:        D-----E

A, B, C: Commits on main
D, E: Commits on feature branch
F: Merge commit
G: New commit on main
"""

Merging

Merging integrates changes from one branch into another.

Fast-Forward Merge

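Occurs when the target branch has no new commits since the feature branch was created. Git simply moves the target branch pointer forward to the tip of the feature branch; no merge commit is needed.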
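
Put differently, a fast-forward is possible when the tip of the target branch is an ancestor of the tip of the branch being merged. A minimal sketch of that check follows, assuming history is given as a dictionary mapping each commit to its parents; the function names are hypothetical.

```python
def is_ancestor(parents, ancestor, commit):
    """Return True if `ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def can_fast_forward(parents, target_tip, source_tip):
    """The target can fast-forward if its tip is an ancestor of the source tip."""
    return is_ancestor(parents, target_tip, source_tip)


# main ends at C; the feature branch adds D and E on top of C.
linear = {"B": ["A"], "C": ["B"], "D": ["C"], "E": ["D"]}
print(can_fast_forward(linear, "C", "E"))  # True

# main gained F after the feature branch split off at B.
diverged = {"B": ["A"], "C": ["B"], "F": ["C"], "D": ["B"], "E": ["D"]}
print(can_fast_forward(diverged, "F", "E"))  # False
```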

# Fast-forward merge example
git checkout main
git merge feature/simple-update

# Force a merge commit even during fast-forward
git merge --no-ff feature/simple-update

# Fast-forward visualization
"""
Before:
main:     A---B---C
                   \
feature:            D---E

After (fast-forward):
main:     A---B---C---D---E
"""

Three-Way Merge

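Occurs when both branches have new commits since they diverged. Git combines the two branch tips with their common ancestor and records the result in a merge commit with two parents.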
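
The "three" in three-way merge refers to the common ancestor (base) and the two branch tips. The sketch below applies the basic decision rule to whole files, represented as a dictionary of path to content. This is a simplification: Git applies the same idea to regions within a file as well. The function name is hypothetical.

```python
def three_way_merge(base, ours, theirs):
    """Merge two dicts of path -> content against their common ancestor.

    Returns (merged, conflicts), where `conflicts` lists the paths that
    were changed in different ways on both sides.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (or neither changed)
            result = o
        elif o == b:      # only their side changed
            result = t
        elif t == b:      # only our side changed
            result = o
        else:             # both sides changed, differently
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"a.py": "1", "b.py": "1", "c.py": "1"}
ours = {"a.py": "2", "b.py": "1", "c.py": "2"}
theirs = {"a.py": "1", "b.py": "3", "c.py": "3"}
print(three_way_merge(base, ours, theirs))
# ({'a.py': '2', 'b.py': '3'}, ['c.py'])
```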

# Three-way merge
git checkout main
git merge feature/complex-feature

# Three-way merge visualization
"""
Before:
main:     A---B---C---F
               \
feature:        D---E

After (merge commit M):
main:     A---B---C---F---M
               \           /
feature:        D---------E
"""

Handling Merge Conflicts

# Example: Conflict in calculator.py
"""
<<<<<<< HEAD
def calculate_total(items):
    return sum(item.price * item.quantity for item in items)
=======
def calculate_total(items):
    total = 0
    for item in items:
        total += item.price * item.quantity * (1 - item.discount)
    return total
>>>>>>> feature/add-discounts
"""

# Resolution
def calculate_total(items):
    """Calculate total price including discounts."""
    total = 0
    for item in items:
        discount = getattr(item, 'discount', 0)
        total += item.price * item.quantity * (1 - discount)
    return total
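
When reviewing a conflict it can help to look at the two sides separately. The sketch below splits the first conflict block of a file into "ours" (between `<<<<<<<` and `=======`) and "theirs" (between `=======` and `>>>>>>>`). `split_conflict` is a hypothetical helper, and it assumes the default two-section marker style shown above.

```python
def split_conflict(text):
    """Return (ours, theirs) as lists of lines from the first conflict block."""
    ours, theirs, section = [], [], None
    for line in text.splitlines():
        if line.startswith("<<<<<<<"):
            section = ours              # start collecting our side
        elif line.startswith("=======") and section is ours:
            section = theirs            # switch to their side
        elif line.startswith(">>>>>>>") and section is theirs:
            break                       # end of the first conflict block
        elif section is not None:
            section.append(line)
    return ours, theirs


conflicted = """<<<<<<< HEAD
x = 1
=======
x = 2
>>>>>>> feature/add-discounts
"""
print(split_conflict(conflicted))  # (['x = 1'], ['x = 2'])
```

After editing the file into its resolved form, all marker lines must be removed before the file is staged and the merge is committed.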

Rebasing

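Rebasing replays the commits of one branch on top of another base commit, producing a linear history. Interactive rebasing can also reorder, combine, or drop commits.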
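
A rebased commit receives a new hash because a commit's ID depends on its parent. The toy model below makes this visible. It hashes only the parent ID and a change label, whereas a real Git commit ID also covers the tree, author, committer, and timestamps; the function names are invented for this sketch.

```python
import hashlib


def commit_id(parent_id, change):
    """Toy commit ID: a short hash over the parent ID and the change label."""
    return hashlib.sha1(f"{parent_id}\n{change}".encode()).hexdigest()[:7]


def replay(changes, base_id):
    """Apply `changes` in order on top of `base_id` and return the new IDs."""
    ids, parent = [], base_id
    for change in changes:
        parent = commit_id(parent, change)
        ids.append(parent)
    return ids


original = replay(["E", "F"], base_id="commit-B")  # feature branched from B
rebased = replay(["E", "F"], base_id="commit-D")   # same changes, replayed on D
print(original != rebased)  # True: same changes, different IDs
```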

Basic Rebase

# Rebase feature branch onto main
git checkout feature/my-feature
git rebase main

# Interactive rebase (last 3 commits)
git rebase -i HEAD~3

# Rebase visualization
"""
Before:
main:     A---B---C---D
               \
feature:        E---F

After rebase:
main:     A---B---C---D
                       \
feature:                E'---F'

Note: E' and F' are new commits with same changes but different hashes
"""

Merge vs Rebase

"""
MERGE:
Pros:
- Preserves complete history
- Safe for public branches
- Easy to understand
Cons:
- Creates merge commits
- Non-linear history

REBASE:
Pros:
- Clean, linear history
- Easier to follow
- No merge commits
Cons:
- Rewrites history (dangerous for shared branches)
- Can be complex with conflicts

Golden Rule: Never rebase public/shared branches!
"""

Interactive Rebase

# Interactive rebase allows you to:
# - Reword commit messages
# - Squash commits together
# - Reorder commits
# - Drop commits

git rebase -i HEAD~4

# Example interactive rebase file:
# pick a1b2c3d Add user model
# squash e4f5a6b Fix typo in user model
# reword c7d8e9f Update user validation
# drop 0a1b2c3 Experimental feature
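
The effect of such a todo list on the resulting history can be approximated in a few lines. This is a simplified simulation over commit messages only: `apply_todo` is a hypothetical name, and for `reword` the given message is assumed to be the new wording (Git would open an editor instead).

```python
def apply_todo(todo):
    """Apply a simplified rebase todo list and return the resulting messages.

    `todo` is a list of (action, message) pairs, oldest commit first.
    """
    result = []
    for action, message in todo:
        if action in ("pick", "reword"):
            result.append(message)
        elif action == "squash":
            if not result:
                raise ValueError("squash needs a previous commit")
            # Fold this commit into the previous one, keeping both messages.
            result[-1] = result[-1] + "\n\n" + message
        elif action == "drop":
            continue  # the commit is left out of the new history
        else:
            raise ValueError(f"unknown action: {action}")
    return result


todo = [
    ("pick", "Add user model"),
    ("squash", "Fix typo in user model"),
    ("reword", "Update user validation"),
    ("drop", "Experimental feature"),
]
print(len(apply_todo(todo)))  # 2
```

Four commits go in and two come out: the typo fix is folded into the first commit and the experimental commit disappears.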

Git Workflows

Centralized Workflow

Simple workflow for small teams.

"""
Everyone works on main branch:

Developer 1: A---B---C
Developer 2:     D---E

Merged:     A---B---C---D---E
"""

Feature Branch Workflow

Each feature gets its own branch.

"""
main:           A---B-------F---G
                     \     /
feature-1:            C---D
                      \
feature-2:             E (still in progress)

Process:
1. Create feature branch from main
2. Work on feature
3. Merge back to main when complete
4. Delete feature branch
"""

# Feature branch workflow
git checkout main
git pull origin main
git checkout -b feature/user-profile
# ... make changes ...
git add .
git commit -m "Add user profile page"
git push origin feature/user-profile
# ... create pull request ...
# ... after merge ...
git checkout main
git pull origin main
git branch -d feature/user-profile

Gitflow Workflow

Structured workflow for release management.

"""
main:       A-----------H-------K
                       /       /
develop:    B---C---E---F---I---J
                 \     /
feature:          D---

Branch types:
- main: Production code
- develop: Integration branch
- feature/*: New features
- release/*: Release preparation
- hotfix/*: Production fixes
"""

# Gitflow example
# Start new feature
git checkout develop
git checkout -b feature/shopping-cart

# Finish feature
git checkout develop
git merge feature/shopping-cart
git branch -d feature/shopping-cart

# Start release
git checkout -b release/1.0.0 develop
# ... version bump, bug fixes ...
git checkout main
git merge release/1.0.0
git tag -a v1.0.0 -m "Release version 1.0.0"
git checkout develop
git merge release/1.0.0
git branch -d release/1.0.0

Forking Workflow

Common in open source projects.

"""
Original Repo:  main: A---B---C---D

Fork (Developer 1): main: A---B---C---D---E
                                         \
                          feature:         F---G

Process:
1. Fork repository
2. Clone your fork
3. Create feature branch
4. Push to your fork
5. Create pull request to original repo
"""

Essential Git Commands

Configuration

# Set user information
git config --global user.name "Your Name"
git config --global user.email "your.email@example.com"

# Set default editor
git config --global core.editor "vim"

# View configuration
git config --list

Basic Operations

# Initialize repository
git init

# Clone repository
git clone https://github.com/user/repo.git

# Check status
git status

# Add files to staging
git add file.txt           # Specific file
git add *.py               # Pattern
git add .                  # All changes in the current directory and below

# Commit changes
git commit -m "Message"
git commit -am "Message"   # Add and commit tracked files

# View history
git log
git log --oneline
git log --graph --all --decorate

# View changes
git diff                   # Unstaged changes
git diff --staged          # Staged changes
git diff main..feature     # Between branches

Remote Operations

# Add remote
git remote add origin https://github.com/user/repo.git

# View remotes
git remote -v

# Fetch changes
git fetch origin

# Pull changes (fetch + merge)
git pull origin main

# Push changes
git push origin main
git push -u origin feature  # Set upstream

# Delete remote branch
git push origin --delete feature/old-feature

Undoing Changes

# Unstage file
git reset HEAD file.txt
git restore --staged file.txt   # Modern Git (2.23+)

# Discard changes in working directory
git checkout -- file.txt
git restore file.txt        # Modern Git

# Amend last commit
git commit --amend

# Undo commit (keep changes)
git reset --soft HEAD~1

# Undo commit (discard changes; uncommitted work in tracked files is lost too)
git reset --hard HEAD~1

# Revert commit (create new commit)
git revert abc123

Best Practices

1. Write Meaningful Commit Messages

# Good commit messages
"""
feat: Add user authentication with JWT tokens

Implement login, logout, and token refresh endpoints.
Add middleware for protected routes.
Include unit tests for auth service.

Closes #123
"""

# Bad commit messages
"""
Fixed stuff
WIP
Updated files
asdf
"""

# Format:
# <type>: <subject>
#
# <body>
#
# <footer>

# Types: feat, fix, docs, style, refactor, test, chore
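
A format like this can be checked automatically, for example from a commit hook. The sketch below is one possible checker. The function name and the 72-character subject limit are assumptions made for illustration, not rules stated above.

```python
import re

ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "chore")
SUBJECT_RE = re.compile(r"^(?P<type>[a-z]+): (?P<subject>\S.*)$")


def check_commit_message(message, max_subject_length=72):
    """Return a list of problems found in `message` (empty if none)."""
    lines = message.splitlines()
    if not lines:
        return ["message is empty"]
    problems = []
    match = SUBJECT_RE.match(lines[0])
    if not match:
        problems.append("subject line should look like '<type>: <subject>'")
    elif match.group("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {match.group('type')}")
    if len(lines[0]) > max_subject_length:  # assumed limit
        problems.append("subject line is too long")
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line should be blank")
    return problems


print(check_commit_message("fix: Handle empty cart\n\nCloses #123"))  # []
print(check_commit_message("Fixed stuff"))
# ["subject line should look like '<type>: <subject>'"]
```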

2. Commit Often, Perfect Later

# Make frequent commits during development
git commit -m "WIP: Add basic user model"
git commit -m "WIP: Add validation"
git commit -m "WIP: Add tests"

# Clean up before merging with interactive rebase
git rebase -i HEAD~3
# Squash into single meaningful commit

3. Use .gitignore

# .gitignore example
"""
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
env/
venv/
.venv

# IDEs
.vscode/
.idea/
*.swp
*.swo

# OS
.DS_Store
Thumbs.db

# Project specific
config/local_settings.py
*.log
.env
secrets.json
"""

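To get a feel for how such patterns select files, the sketch below matches paths against a list of patterns using shell-style wildcards. It is deliberately simplified: real `.gitignore` rules also cover negation with `!`, anchoring, `**`, and other details, and `is_ignored` is a hypothetical helper rather than anything Git provides.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path, patterns):
    """Simplified check of a relative POSIX path against ignore patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        if pattern.endswith("/"):
            # Directory pattern: match any directory component of the path.
            if any(fnmatchcase(part, pattern[:-1]) for part in parts[:-1]):
                return True
        elif "/" in pattern:
            # Pattern containing a slash: compare against the whole path.
            if fnmatchcase(path, pattern):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Plain pattern: match any single component of the path.
            return True
    return False


patterns = ["__pycache__/", "*.py[cod]", ".env", "*.log"]
print(is_ignored("src/__pycache__/main.cpython-311.pyc", patterns))  # True
print(is_ignored("src/main.py", patterns))                           # False
```
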
4. Branch Naming Conventions

"""
Use descriptive, hierarchical names:

feature/user-authentication
feature/payment-integration
bugfix/login-error
hotfix/security-vulnerability
release/v2.0.0
docs/api-documentation

Avoid:
- Special characters except -, /, and .
- Spaces
- Ambiguous names (temp, fix, stuff)
"""

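A convention like this is easy to enforce with a small check. The regular expression below is one possible encoding of the names listed above; the exact rule (lowercase words separated by `-` or `.` after a known prefix) is an assumption, so adjust it to your team's convention.

```python
import re

# Assumed rule: known prefix, a slash, then lowercase words joined by - or .
BRANCH_RE = re.compile(
    r"^(feature|bugfix|hotfix|release|docs)/[a-z0-9]+(?:[.-][a-z0-9]+)*$"
)


def is_valid_branch_name(name):
    """Return True for long-lived branches or well-formed prefixed names."""
    return name in ("main", "develop") or bool(BRANCH_RE.match(name))


print(is_valid_branch_name("feature/user-authentication"))  # True
print(is_valid_branch_name("release/v2.0.0"))               # True
print(is_valid_branch_name("temp"))                         # False
```
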
Practical Example: Complete Workflow

# Example: Adding a new feature
"""
Project: E-commerce platform
Feature: Product recommendation system
"""

# 1. Start from updated main
# git checkout main
# git pull origin main

# 2. Create feature branch
# git checkout -b feature/product-recommendations

# 3. Implement feature
# File: recommendations.py
class RecommendationEngine:
    """Generate product recommendations based on user history."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.history = self._load_user_history()

    def _load_user_history(self):
        """Load user purchase and browse history."""
        # Placeholder: a real implementation would query a data store.
        return []

    def get_recommendations(self, count=5):
        """
        Get personalized product recommendations.

        Args:
            count (int): Maximum number of recommendations to return

        Returns:
            list: List of recommended product IDs
        """
        # Implementation using collaborative filtering
        similar_users = self._find_similar_users()
        recommendations = self._aggregate_preferences(similar_users)
        return recommendations[:count]

    def _find_similar_users(self):
        """Find users with similar purchase patterns."""
        # Placeholder: returns no users until implemented.
        return []

    def _aggregate_preferences(self, users):
        """Aggregate product preferences from similar users."""
        # Placeholder: returns an empty list so callers can slice the result.
        return []

# 4. Write tests
# File: test_recommendations.py
import pytest
from recommendations import RecommendationEngine

class TestRecommendationEngine:
    def test_initialization(self):
        engine = RecommendationEngine(user_id=1)
        assert engine.user_id == 1

    def test_get_recommendations_returns_list(self):
        engine = RecommendationEngine(user_id=1)
        recs = engine.get_recommendations(count=3)
        assert isinstance(recs, list)
        assert len(recs) <= 3

    def test_recommendations_are_unique(self):
        engine = RecommendationEngine(user_id=1)
        recs = engine.get_recommendations(count=10)
        assert len(recs) == len(set(recs))

# 5. Commit changes
# git add recommendations.py test_recommendations.py
# git commit -m "feat: Add product recommendation engine
#
# Implement collaborative filtering for personalized recommendations.
# Include comprehensive test suite.
#
# Related to #456"

# 6. Push to remote
# git push -u origin feature/product-recommendations

# 7. Create pull request (on GitHub/GitLab)
# 8. Code review and address feedback
# 9. Merge to main
# 10. Update local main and delete the feature branch
# git checkout main
# git pull origin main
# git branch -d feature/product-recommendations

Exercises

Basic Exercises

  1. Repository Setup

    • Initialize a Git repository
    • Create a Python project with at least 3 files
    • Make your first commit
    • Check the commit history
  2. Basic Workflow

    • Modify two files
    • Stage only one file
    • Commit the staged file
    • Check status and diff for remaining changes
    • Commit the second file
  3. Branching Practice

    • Create a new branch called feature/calculator
    • Add a simple calculator function
    • Switch back to main
    • Merge the feature branch

Intermediate Exercises

  1. Merge Conflict Resolution

    • Create two branches from main
    • Modify the same line in the same file in both branches
    • Merge one branch to main
    • Attempt to merge the second branch (will conflict)
    • Resolve the conflict and complete the merge
  2. Gitflow Simulation

    • Set up main and develop branches
    • Create a feature branch from develop
    • Implement a feature (e.g., user authentication)
    • Merge back to develop
    • Create a release branch
    • Merge release to both main and develop
  3. Interactive Rebase

    • Create 5 small commits
    • Use interactive rebase to squash them into 2 meaningful commits
    • Reword one commit message

Advanced Exercises

  1. Complex Workflow

    • Simulate a team environment with multiple features
    • Create 3 feature branches
    • Make commits on each branch
    • Handle conflicts during merging
    • Use rebase to maintain clean history
  2. Cherry-Pick Scenario

    • Create a feature branch with multiple commits
    • Identify one specific commit that's needed urgently
    • Cherry-pick that commit to main
    • Handle any conflicts
    • Research: git cherry-pick <commit-hash>
  3. Recovering Lost Work

    • Make several commits
    • Reset hard to an earlier commit (losing recent work)
    • Use git reflog to find lost commits
    • Recover the lost work
    • Research: git reflog and git reset --hard <commit>
  4. Bisect Debugging

    • Create a repository with 10 commits
    • Introduce a bug in one of the middle commits
    • Use git bisect to identify which commit introduced the bug
    • Research: git bisect start, git bisect good, git bisect bad
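
The idea behind bisecting is a binary search over the commit history. The sketch below shows the search itself, independent of Git. It assumes the last commit is bad and that every commit after the first bad one is also bad; `first_bad_commit` and `is_bad` are hypothetical names.

```python
def first_bad_commit(commits, is_bad):
    """Binary-search `commits` (oldest first) for the first bad one."""
    low, high = 0, len(commits) - 1
    while low < high:
        mid = (low + high) // 2
        if is_bad(commits[mid]):
            high = mid       # the first bad commit is mid or earlier
        else:
            low = mid + 1    # the first bad commit is after mid
    return commits[low]


commits = [f"c{i}" for i in range(1, 11)]  # c1 .. c10
# Pretend the bug was introduced in c6 and is present ever since.
print(first_bad_commit(commits, lambda c: int(c[1:]) >= 6))  # c6
```

With 10 commits the search needs at most 4 checks, which is why bisecting scales well to long histories.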

Common Pitfalls

1. Working Directly on Main

# Bad
"""
main: A---B---C---D (all your work)
"""

# Good
"""
main:    A-----------D (merges only)
          \         /
feature:   B-------C
"""

2. Committing Large Binaries

# Avoid committing large files:
# - Videos, large images
# - Database dumps
# - Compiled binaries
# - Dependencies (use package managers)

# Use Git LFS for necessary large files
# Or store in external storage with references
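
A quick scan before committing can catch large files early. The sketch below lists files above a size limit while skipping the `.git` directory. The 5 MiB default and the function name are assumptions for illustration; pick a limit that suits your project.

```python
import os


def find_large_files(root, limit_bytes=5 * 1024 * 1024):
    """Yield (path, size) for files under `root` larger than `limit_bytes`."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own metadata directory.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            if size > limit_bytes:
                yield path, size


if __name__ == "__main__":
    for path, size in find_large_files("."):
        print(f"{size / 1024 / 1024:.1f} MiB  {path}")
```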

3. Not Pulling Before Pushing

# Pushing when the remote has commits you don't have locally is rejected
# git push origin main
# ! [rejected] main -> main (fetch first)

# Always pull first
git pull origin main
# Resolve any conflicts
git push origin main

Summary

Version control with Git is essential for modern software development. Key takeaways:

  1. Commits are snapshots of your codebase at specific points in time
  2. Branches allow parallel development without affecting main code
  3. Merging integrates changes from different branches
  4. Rebasing creates linear history but rewrites commits
  5. Workflows provide structure for team collaboration
  6. Best practices include meaningful commits, frequent commits, and proper branching

Git enables:

  • Effective collaboration across teams
  • Safe experimentation with features
  • Complete history tracking
  • Ability to rollback changes
  • Code review through pull requests

Mastering Git takes practice, but the investment pays dividends in productivity and code quality.

Next Reading

Continue to 02-sdlc.md to learn about the Software Development Lifecycle and how teams plan, develop, and deliver software projects.