API Authentication & Pagination
Learn to handle API authentication and work with paginated data
Why Authentication?
APIs need to know who is making requests. Authentication proves your identity.
Reasons for authentication:
- Limit how many requests you can make
- Track usage
- Protect private data
- Charge for usage (paid APIs)
Common Authentication Methods
1. API Key
Simplest method. You get a key when you sign up.
import requests
API_KEY = "your_api_key_here"
headers = {
    "X-API-Key": API_KEY
}
response = requests.get(
    "https://api.example.com/data",
    headers=headers
)
Or in the URL, as a query parameter:
params = {
    "api_key": API_KEY
}
response = requests.get(
    "https://api.example.com/data",
    params=params
)
2. Bearer Token
Token-based authentication (OAuth, JWT).
import requests
ACCESS_TOKEN = "your_access_token_here"
headers = {
    "Authorization": "Bearer " + ACCESS_TOKEN
}
response = requests.get(
    "https://api.example.com/data",
    headers=headers
)
What "Bearer" means: The standard scheme for sending access tokens. Whoever holds ("bears") the token is granted access, so keep it secret.
3. Basic Authentication
Username and password (less common now).
import requests
response = requests.get(
    "https://api.example.com/data",
    auth=("username", "password")
)
What this does: requests Base64-encodes username:password and sends it in the Authorization header. Base64 is an encoding, not encryption, so only use this over HTTPS.
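To make the encoding step concrete, here is a minimal sketch that builds the same header value by hand using only the standard library. The helper name basic_auth_header is made up for this illustration; in practice the auth parameter does this for you.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value for HTTP Basic authentication."""
    # Join the credentials with a colon, then Base64-encode the bytes.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return "Basic " + token


print(basic_auth_header("username", "password"))
# Basic dXNlcm5hbWU6cGFzc3dvcmQ=
```

Anyone who sees this header can decode it back to the original username and password, which is why Basic authentication should never be sent over plain HTTP.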
Environment Variables for API Keys
Never hardcode API keys in your code!
Create a .env file (and keep it out of version control):
API_KEY=your_actual_key_here
API_SECRET=your_secret_here
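Use in Python (the variable must already be set in your environment; os.environ does not read the .env file by itself):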
import os
import requests
API_KEY = os.environ.get("API_KEY")
if not API_KEY:
    print("API key not found!")
else:
    headers = {"X-API-Key": API_KEY}
    response = requests.get(
        "https://api.example.com/data",
        headers=headers
    )
Or use python-dotenv, which loads the .env file into the environment:
from dotenv import load_dotenv
import os
import requests
load_dotenv()
API_KEY = os.getenv("API_KEY")
Install dotenv:
pip install python-dotenv
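A missing key is easier to debug when the program stops immediately instead of sending unauthenticated requests. The sketch below shows one way to do that. The helper name get_required_env is an assumption for illustration, not part of os or python-dotenv.

```python
import os


def get_required_env(name):
    """Return an environment variable's value, or raise if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Usage would look like API_KEY = get_required_env("API_KEY"), placed after load_dotenv() if you rely on a .env file.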
What is Pagination?
When data has thousands of items, APIs don't send everything at once. They break it into pages.
Why pagination:
- Faster responses
- Less data transfer
- Prevents server overload
Common pagination styles:
- Page number (page 1, 2, 3...)
- Cursor-based (next token)
- Offset-based (skip first N items)
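Page-number and cursor styles each get their own section, so here is a short sketch of the offset-based style. It assumes the API accepts offset and limit parameters (names vary between APIs). The fetch_page callable and the fake_fetch stand-in are hypothetical, used so the example runs without a network connection.

```python
def fetch_all_by_offset(fetch_page, limit=10):
    """Collect every item from an offset-paginated source.

    fetch_page(offset, limit) is a hypothetical callable that returns a list.
    """
    all_items = []
    offset = 0
    while True:
        items = fetch_page(offset, limit)
        if not items:
            break
        all_items.extend(items)
        if len(items) < limit:
            break  # a short page means this was the last one
        offset += len(items)  # skip everything fetched so far
    return all_items


# Stand-in for a real API call, so the sketch is runnable offline.
DATA = list(range(25))


def fake_fetch(offset, limit):
    return DATA[offset:offset + limit]


print(len(fetch_all_by_offset(fake_fetch, limit=10)))  # 25
```

With requests, fetch_page could wrap a call such as requests.get(url, params={"offset": offset, "limit": limit}).json(), assuming the API returns a plain list.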
Page Number Pagination
Most common style. Request specific page numbers.
import requests
page = 1
per_page = 10
params = {
    "page": page,
    "per_page": per_page
}
response = requests.get(
    "https://api.example.com/posts",
    params=params
)
data = response.json()
print("Page", page, "items:", len(data))
What per_page means: How many items to return per page.
Getting All Pages
Loop through all pages until no more data.
import requests
all_items = []
page = 1
while True:
    params = {"page": page, "per_page": 10}
    response = requests.get(
        "https://api.example.com/posts",
        params=params
    )
    if response.status_code != 200:
        break
    items = response.json()
    if not items:
        break
    all_items.extend(items)
    print("Got page", page, ":", len(items), "items")
    page = page + 1
print("Total items:", len(all_items))
What this does:
- Starts at page 1
- Gets items from each page
- Adds to all_items list
- Stops when no more items
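The same loop can be wrapped in a generator with a safety cap, so a misbehaving API that never returns an empty page cannot keep the program running forever. This is a sketch only: iter_pages and its fetch_page argument are hypothetical names, and the cap of 1000 pages is an arbitrary assumption.

```python
def iter_pages(fetch_page, max_pages=1000):
    """Yield items page by page, stopping on an empty page or after max_pages."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            return  # no more data
        yield from items


# Stand-in for a real API: two pages of data, then nothing.
PAGES = {1: ["a", "b"], 2: ["c"]}


def fake_fetch(page):
    return PAGES.get(page, [])


print(list(iter_pages(fake_fetch)))  # ['a', 'b', 'c']
```

With requests, fetch_page could be a small function that calls requests.get with the page parameter and returns response.json().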
Pagination with Link Headers
Some APIs, such as GitHub, return the next page's URL in the Link response header.
import requests
url = "https://api.github.com/users/octocat/repos"
all_repos = []
while url:
    response = requests.get(url)
    if response.status_code != 200:
        break
    repos = response.json()
    all_repos.extend(repos)
    if "next" in response.links:
        url = response.links["next"]["url"]
    else:
        url = None
print("Total repos:", len(all_repos))
What response.links does: Parses the Link response header into a dictionary keyed by relation (next, prev, last, and so on).
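To show what kind of data response.links works from, here is a simplified, standard-library-only sketch that parses a Link header string. It is not how requests implements it and it ignores edge cases. The function name parse_link_header and the example URLs are assumptions for illustration.

```python
import re


def parse_link_header(value):
    """Parse a Link header into a dict keyed by relation name (simplified)."""
    links = {}
    # Each entry looks like: <URL>; rel="name"
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="([^"]+)"', value):
        url, rel = match.groups()
        links[rel] = {"url": url, "rel": rel}
    return links


header = (
    '<https://api.example.com/repos?page=2>; rel="next", '
    '<https://api.example.com/repos?page=5>; rel="last"'
)
links = parse_link_header(header)
print(links["next"]["url"])  # https://api.example.com/repos?page=2
```

When the last page is reached, the header no longer contains a next entry, which is why the loop above ends once "next" is missing from response.links.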
Cursor-Based Pagination
Uses a cursor token returned by the API to request the next set of results.
import requests
all_items = []
cursor = None
while True:
    params = {"limit": 20}
    if cursor:
        params["cursor"] = cursor
    response = requests.get(
        "https://api.example.com/data",
        params=params
    )
    if response.status_code != 200:
        break
    data = response.json()
    items = data["items"]
    all_items.extend(items)
    cursor = data.get("next_cursor")
    if not cursor:
        break
print("Total items:", len(all_items))
What cursor does: Points to a position in the dataset. It is more reliable than page numbers when items are added or removed while you are paging.
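The reliability difference is easiest to see with a small simulation. In this sketch the "API" is just a list of IDs sorted newest first, and the cursor is simply the last ID seen. Real APIs usually return an opaque token instead, so treat both helper functions as illustrative assumptions.

```python
def page_by_number(items, page, per_page):
    """Return one page using page-number pagination."""
    start = (page - 1) * per_page
    return items[start:start + per_page]


def page_by_cursor(items, after_id, limit):
    """Return up to `limit` items older than after_id (items are newest first)."""
    older = [i for i in items if after_id is None or i < after_id]
    return older[:limit]


items = [5, 4, 3, 2, 1]  # IDs, newest first

first_by_number = page_by_number(items, 1, 2)     # [5, 4]
first_by_cursor = page_by_cursor(items, None, 2)  # [5, 4]

items.insert(0, 6)  # a new item arrives between the two requests

second_by_number = page_by_number(items, 2, 2)
second_by_cursor = page_by_cursor(items, first_by_cursor[-1], 2)

print(second_by_number)  # [4, 3] -> item 4 is returned twice
print(second_by_cursor)  # [3, 2] -> continues exactly where page one ended
```

With page numbers, the new item shifts everything by one position, so item 4 shows up on both pages. The cursor is tied to an item rather than a position, so it is unaffected.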
Rate Limiting
APIs limit how many requests you can make.
Common limits:
- 100 requests per hour
- 1000 requests per day
- 10 requests per second
Check headers for rate limit info (header names vary by API; X-RateLimit-* is a common convention):
import requests
response = requests.get(
    "https://api.example.com/data",
    headers={"Authorization": "Bearer token"}
)
print("Limit:", response.headers.get("X-RateLimit-Limit"))
print("Remaining:", response.headers.get("X-RateLimit-Remaining"))
print("Reset:", response.headers.get("X-RateLimit-Reset"))
What these mean:
- Limit: Maximum requests allowed
- Remaining: How many left
- Reset: When the limit resets (usually a Unix timestamp in seconds)
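These headers can be turned into a decision about whether, and for how long, to pause. The sketch below assumes X-RateLimit-Reset holds a Unix timestamp in seconds, which is common but not universal, so check your API's documentation. Both function names are hypothetical.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how many seconds remain until the rate limit resets.

    Assumes X-RateLimit-Reset is a Unix timestamp in seconds.
    """
    reset = headers.get("X-RateLimit-Reset")
    if reset is None:
        return 0.0
    if now is None:
        now = time.time()
    # Never return a negative wait if the reset time has already passed.
    return max(0.0, float(reset) - now)


def should_pause(headers):
    """Return True when the headers say no requests are left."""
    remaining = headers.get("X-RateLimit-Remaining")
    return remaining is not None and int(remaining) == 0


example = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
if should_pause(example):
    print("Wait", seconds_until_reset(example, now=1700000000), "seconds")
# Wait 60.0 seconds
```

With requests, you would pass response.headers to both functions and call time.sleep() with the result before sending the next request.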
Handling Rate Limits
import requests
import time
def make_request_with_rate_limit(url, headers):
    response = requests.get(url, headers=headers)
    if response.status_code == 429:
        print("Rate limit hit. Waiting...")
        # Retry-After gives the wait in seconds; default to 60 if it is missing
        retry_after = int(response.headers.get("Retry-After", 60))
        time.sleep(retry_after)
        # Try the same request again after waiting
        return make_request_with_rate_limit(url, headers)
    return response
response = make_request_with_rate_limit(
    "https://api.example.com/data",
    {"Authorization": "Bearer token"}
)
What status 429 means: Too many requests. You need to wait before sending more.
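A variation worth considering is a loop with a retry cap, so the program gives up instead of retrying forever. The sketch below also copes with a Retry-After value that is an HTTP date rather than a number, by falling back to a default wait. The function get_with_retries and its send and sleep arguments are hypothetical, chosen so the logic can be tried without a real API.

```python
import time


def get_with_retries(send, max_retries=3, default_wait=60, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    send is a hypothetical zero-argument callable returning an object with
    .status_code and .headers, for example:
        lambda: requests.get(url, headers=headers)
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_retries:
            break  # out of retries; hand the 429 back to the caller
        retry_after = str(response.headers.get("Retry-After", ""))
        # Retry-After may also be an HTTP date; use the default wait then.
        wait = int(retry_after) if retry_after.isdigit() else default_wait
        sleep(wait)
    return response
```

The caller still needs to check the returned status code, because the last response may be a 429 if every attempt was rate limited.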
Practice Example
The scenario: Fetch all repositories of a GitHub user, using authentication and pagination.
import requests
import os
import time
GITHUB_TOKEN = os.getenv("GITHUB_TOKEN")
if not GITHUB_TOKEN:
    print("Please set GITHUB_TOKEN environment variable")
    exit()
headers = {
    "Authorization": "Bearer " + GITHUB_TOKEN,
    "Accept": "application/vnd.github+json"
}
username = "octocat"
url = "https://api.github.com/users/" + username + "/repos"
all_repos = []
page = 1
while True:
    params = {
        "per_page": 30,
        "page": page
    }
    # str() is required here: adding an int and a str raises TypeError
    print("Fetching page", str(page) + "...")
    response = requests.get(url, headers=headers, params=params)
    if response.status_code != 200:
        print("Error:", response.status_code)
        break
    remaining = response.headers.get("X-RateLimit-Remaining")
    print("Rate limit remaining:", remaining)
    repos = response.json()
    if not repos:
        break
    all_repos.extend(repos)
    page = page + 1
    time.sleep(1)
print()
print("Total repositories:", len(all_repos))
print()
print("Repository names:")
for repo in all_repos:
    print("-", repo["name"], "(" + str(repo["stargazers_count"]) + " stars)")
What this program does:
- Gets GitHub token from environment
- Sets up authentication headers
- Loops through all pages
- Checks rate limit remaining
- Adds small delay between requests
- Shows all repos with star counts
OAuth 2.0 Flow
For APIs that use OAuth (like Google, Facebook).
Basic flow:
- Direct user to authorization URL
- User logs in and approves
- Get authorization code
- Exchange code for access token
- Use access token in requests
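The first step of the flow, building the authorization URL, can be sketched with the standard library. The query parameters shown (response_type, client_id, redirect_uri, scope, state) are the standard OAuth 2.0 authorization-code parameters. The endpoint URL, redirect URL, scope value, and function name are placeholders assumed for illustration, since each provider documents its own.

```python
import secrets
from urllib.parse import urlencode


def build_authorization_url(authorize_url, client_id, redirect_uri, scope, state):
    """Build the URL the user is sent to in order to log in and approve access."""
    params = {
        "response_type": "code",  # ask for an authorization code
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # random value, checked again when the user returns
    }
    return authorize_url + "?" + urlencode(params)


state = secrets.token_urlsafe(16)
url = build_authorization_url(
    "https://oauth.example.com/authorize",  # placeholder endpoint
    "your_client_id",
    "https://myapp.example.com/callback",   # placeholder redirect URL
    "read",                                 # placeholder scope
    state,
)
print(url)
```

After the user approves, the provider redirects to redirect_uri with code and state in the query string. Compare the returned state with the one you generated before exchanging the code. Many providers also expect the same redirect_uri to be sent again in the token request.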
Getting access token:
import requests
token_url = "https://oauth.example.com/token"
data = {
    "grant_type": "authorization_code",
    "code": "authorization_code_here",
    "client_id": "your_client_id",
    "client_secret": "your_client_secret"
}
response = requests.post(token_url, data=data)
if response.status_code == 200:
    tokens = response.json()
    access_token = tokens["access_token"]
    # Not every provider returns a refresh token, so use .get()
    refresh_token = tokens.get("refresh_token")
    print("Access token obtained")
Using access token:
headers = {"Authorization": "Bearer " + access_token}
response = requests.get("https://api.example.com/data", headers=headers)
Key Points to Remember
Use environment variables for API keys and secrets. Never hardcode them in your code.
Add authentication to requests using headers (most common) or auth parameter.
Pagination breaks large datasets into pages. Loop through pages to get all data.
Check the response headers or body for pagination info (next page URL, cursor, etc.).
Respect rate limits. Check headers for limit info and handle 429 status code.
Common Mistakes
Mistake 1: Hardcoding API keys
API_KEY = "abc123"  # DON'T DO THIS!
Use environment variables.
Mistake 2: Not handling rate limits
while True:
    response = requests.get(url)  # Will hit rate limit!
Add delays or check rate limit headers.
Mistake 3: Assuming all data in one response
response = requests.get(url)
data = response.json()  # Might be only first page!
Check for pagination.
Mistake 4: Infinite pagination loop
while True:
    response = requests.get(url, params={"page": page})
    # Forgot to check if empty or add break condition!
What's Next?
Congratulations! You've completed Module 3: Data Import/Export. You now know how to work with CSV, Excel, SQL databases, JSON, XML, and APIs. These skills let you get data from anywhere and use it in your Python programs.