Instagram has over 2 billion monthly active users and is one of the richest sources of public data on the internet — brand mentions, influencer metrics, hashtag trends, competitor analysis, and market research signals. But scraping it without the right setup leads to instant IP bans and CAPTCHAs.
This guide covers every practical method for extracting Instagram data in 2026, from the official Graph API to custom Python scrapers with proxy rotation. We include working code examples and explain exactly which proxy type to use for each approach.
Businesses and researchers scrape Instagram for legitimate purposes including:

- Brand and mention monitoring
- Influencer vetting and campaign measurement
- Hashtag and trend analysis
- Competitor analysis
- Market research signals
Instagram employs aggressive anti-bot measures that make scraping challenging without proxies:

- Strict per-IP rate limits that trigger within a few dozen requests
- Automatic IP bans for datacenter ranges and flagged addresses
- CAPTCHA challenges on suspicious traffic
- Login walls on many public pages
- Browser fingerprinting to detect headless browsers and automation tools
This is why proxy rotation is essential for any Instagram scraping project beyond a few requests. Without it, your IP gets flagged within minutes.
The Instagram Graph API is the official way to access Instagram data. It is best for accessing your own business account data or accounts that have authorized your app.
Create an app and generate a long-lived access token at developers.facebook.com, then query the API:

```python
import requests

ACCESS_TOKEN = "your_long_lived_token"
INSTAGRAM_ACCOUNT_ID = "your_account_id"
BASE_URL = "https://graph.facebook.com/v19.0"

# Get your recent media
def get_recent_posts(account_id, limit=25):
    url = f"{BASE_URL}/{account_id}/media"
    params = {
        "fields": "id,caption,media_type,timestamp,like_count,comments_count,permalink",
        "limit": limit,
        "access_token": ACCESS_TOKEN,
    }
    response = requests.get(url, params=params)
    return response.json()

# Get hashtag posts
def search_hashtag(hashtag_name):
    # Step 1: Get the hashtag ID
    search_url = f"{BASE_URL}/ig_hashtag_search"
    params = {
        "q": hashtag_name,
        "user_id": INSTAGRAM_ACCOUNT_ID,
        "access_token": ACCESS_TOKEN,
    }
    result = requests.get(search_url, params=params).json()
    hashtag_id = result["data"][0]["id"]

    # Step 2: Get recent media for that hashtag
    media_url = f"{BASE_URL}/{hashtag_id}/recent_media"
    media_params = {
        "user_id": INSTAGRAM_ACCOUNT_ID,
        "fields": "id,caption,media_type,timestamp,permalink",
        "access_token": ACCESS_TOKEN,
    }
    return requests.get(media_url, params=media_params).json()

posts = get_recent_posts(INSTAGRAM_ACCOUNT_ID)
for post in posts.get("data", []):
    print(f"{post['timestamp']} - Likes: {post.get('like_count', 'N/A')}")
```
Limitations: The Graph API only works for accounts you own or manage. You cannot scrape competitor profiles, discover random public posts, or access data at scale through this method. For broader data collection, you need the methods below.
For scraping public profiles, posts, and hashtags at scale, you need a custom scraper with rotating proxies. Instagram exposes some data through public-facing web endpoints that return JSON.
First, configure SpyderProxy residential proxies for rotation. Residential IPs are essential here because Instagram blocks datacenter IP ranges aggressively.
```python
import requests
import time
import random

# SpyderProxy residential proxy configuration
PROXY_HOST = "geo.spyderproxy.com"
PROXY_PORT = 11000
PROXY_USER = "your_username"
PROXY_PASS = "your_password"

def get_proxy():
    """Return a proxy dict with a fresh rotating session."""
    session_id = random.randint(100000, 999999)
    proxy_url = f"http://{PROXY_USER}-session-{session_id}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
    return {"http": proxy_url, "https": proxy_url}

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
}
```
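`get_proxy()` picks a new session ID on every call, so each request exits through a different IP. When a target expects several consecutive requests from the same visitor (a paginated fetch, for example), you can hold one session ID across requests instead. A minimal sketch, assuming the same `-session-<id>` username convention shown above:

```python
import random

PROXY_HOST = "geo.spyderproxy.com"
PROXY_PORT = 11000

def make_sticky_proxy(user, password):
    """Build a proxy dict that reuses one session ID, keeping the same exit IP."""
    session_id = random.randint(100000, 999999)
    proxy_url = f"http://{user}-session-{session_id}:{password}@{PROXY_HOST}:{PROXY_PORT}"
    return {"http": proxy_url, "https": proxy_url}

# Reuse the same dict for a sequence of related requests:
sticky = make_sticky_proxy("your_username", "your_password")
# requests.get(url1, proxies=sticky); requests.get(url2, proxies=sticky)
```

Rotate per request for independent lookups; stick to one session when requests must look like one browsing session.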
```python
def scrape_instagram_profile(username):
    """Scrape public Instagram profile data."""
    url = f"https://www.instagram.com/api/v1/users/web_profile_info/?username={username}"
    headers = {
        **HEADERS,
        "X-IG-App-ID": "936619743392459",
        "X-Requested-With": "XMLHttpRequest",
    }
    for attempt in range(3):
        try:
            proxy = get_proxy()
            response = requests.get(url, headers=headers, proxies=proxy, timeout=15)
            if response.status_code == 200:
                data = response.json()
                user = data["data"]["user"]
                return {
                    "username": user["username"],
                    "full_name": user["full_name"],
                    "biography": user["biography"],
                    "followers": user["edge_followed_by"]["count"],
                    "following": user["edge_follow"]["count"],
                    "posts_count": user["edge_owner_to_timeline_media"]["count"],
                    "is_verified": user["is_verified"],
                    "is_business": user["is_business_account"],
                    "profile_pic": user["profile_pic_url_hd"],
                }
            elif response.status_code == 429:
                print(f"Rate limited, rotating proxy (attempt {attempt + 1})")
                time.sleep(random.uniform(3, 7))
            else:
                print(f"Status {response.status_code}, retrying...")
                time.sleep(2)
        except requests.exceptions.RequestException as e:
            print(f"Request error: {e}")
            time.sleep(2)
    return None

# Example usage
profile = scrape_instagram_profile("instagram")
if profile:
    print(f"@{profile['username']} - {profile['followers']:,} followers")
```
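The retry loop above sleeps a fixed 2 to 7 seconds between attempts. Under sustained rate limiting, exponential backoff with jitter spreads retries out more effectively. A sketch you could drop in place of the fixed sleeps; the `base` and `cap` values are illustrative, not Instagram-specific:

```python
import random

def backoff_delay(attempt, base=2.0, cap=60.0):
    """Full-jitter backoff: random delay in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

# attempt 0 -> up to 2s, attempt 1 -> up to 4s, attempt 2 -> up to 8s, ...
# Use time.sleep(backoff_delay(attempt)) inside the retry loop.
```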
```python
def scrape_user_posts(username, max_posts=50):
    """Scrape recent posts from a public Instagram profile."""
    url = f"https://www.instagram.com/api/v1/users/web_profile_info/?username={username}"
    headers = {
        **HEADERS,
        "X-IG-App-ID": "936619743392459",
        "X-Requested-With": "XMLHttpRequest",
    }
    proxy = get_proxy()
    response = requests.get(url, headers=headers, proxies=proxy, timeout=15)
    if response.status_code != 200:
        return []
    data = response.json()
    edges = data["data"]["user"]["edge_owner_to_timeline_media"]["edges"]
    posts = []
    for edge in edges[:max_posts]:
        node = edge["node"]
        # Posts without a caption have an empty edges list, so guard the lookup
        caption_edges = node.get("edge_media_to_caption", {}).get("edges", [])
        posts.append({
            "shortcode": node["shortcode"],
            "caption": caption_edges[0]["node"]["text"] if caption_edges else "",
            "likes": node.get("edge_liked_by", {}).get("count", 0),
            "comments": node.get("edge_media_to_comment", {}).get("count", 0),
            "timestamp": node["taken_at_timestamp"],
            "is_video": node["is_video"],
            "display_url": node["display_url"],
            "url": f"https://www.instagram.com/p/{node['shortcode']}/",
        })
    return posts

posts = scrape_user_posts("natgeo", max_posts=12)
for p in posts:
    print(f"Likes: {p['likes']:,} | Comments: {p['comments']:,} | {p['url']}")
```
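To collect many profiles in one run, wrap the scraper in a loop that adds a random polite delay between targets (proxy rotation already happens per request inside the functions above). A sketch with a generic `scrape_fn` parameter, so it works with either scraper; the helper name is ours, not a library API:

```python
import time
import random

def scrape_batch(usernames, scrape_fn, min_delay=2.0, max_delay=5.0):
    """Call scrape_fn for each username with a random delay between requests."""
    results = {}
    for i, username in enumerate(usernames):
        results[username] = scrape_fn(username)
        if i < len(usernames) - 1:  # no need to sleep after the last target
            time.sleep(random.uniform(min_delay, max_delay))
    return results

# Example: results = scrape_batch(["instagram", "natgeo"], scrape_instagram_profile)
```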
For scraping content that requires JavaScript rendering or login sessions, Playwright with proxy rotation provides the most reliable approach.
```python
from playwright.sync_api import sync_playwright
import random

def scrape_hashtag_playwright(hashtag, proxy_user, proxy_pass):
    """Scrape a hashtag page using Playwright with a rotating proxy session."""
    session_id = random.randint(100000, 999999)
    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://geo.spyderproxy.com:11000",
                "username": f"{proxy_user}-session-{session_id}",
                "password": proxy_pass,
            },
        )
        page = browser.new_page()
        page.set_extra_http_headers({"Accept-Language": "en-US,en;q=0.9"})
        url = f"https://www.instagram.com/explore/tags/{hashtag}/"
        page.goto(url, wait_until="networkidle", timeout=30000)
        # Extract the fully rendered HTML
        content = page.content()
        browser.close()
        return content
```
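Once you have the rendered HTML, you can pull structured bits out of it. Instagram pages have historically embedded a text summary in the `og:description` meta tag; that is an assumption that may change at any time, so treat this regex as a sketch rather than a stable parser:

```python
import re

def extract_og_description(html):
    """Pull the og:description meta tag content out of rendered HTML, if present."""
    match = re.search(r'<meta\s+property="og:description"\s+content="([^"]*)"', html)
    return match.group(1) if match else None

sample = '<head><meta property="og:description" content="1M Followers, 10 Following"></head>'
print(extract_og_description(sample))  # → 1M Followers, 10 Following
```

For anything beyond a single tag, a real HTML parser such as BeautifulSoup is more robust than regex.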
Your proxy choice determines whether your scraper succeeds or gets blocked immediately. Here is what works for Instagram:
| Proxy Type | Instagram Success Rate | Best For | SpyderProxy Price |
|---|---|---|---|
| Premium Residential | 95%+ | Profile scraping, post extraction | From $2.75/GB |
| LTE Mobile | 99%+ | Highest success rate, login sessions | From $2/proxy |
| Budget Residential | 85-90% | High-volume hashtag monitoring | From $1.75/GB |
| Datacenter | Below 30% | Not recommended for Instagram | From $3.55/mo |
Recommendation: Use Premium Residential for most scraping tasks. For accounts that require login, or targets with the strictest anti-bot defenses, use LTE Mobile proxies: they use real carrier IPs that are nearly impossible for Instagram to distinguish from real mobile users.
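The table above reduces to a simple lookup you can bake into a scraper config. The task names here are illustrative labels, not an API:

```python
# Maps the use cases from the table above to a recommended proxy type.
PROXY_FOR_TASK = {
    "profile_scraping": "premium_residential",
    "post_extraction": "premium_residential",
    "login_sessions": "lte_mobile",
    "hashtag_monitoring": "budget_residential",
}

def recommend_proxy(task):
    """Default to premium residential when a task isn't listed."""
    return PROXY_FOR_TASK.get(task, "premium_residential")

print(recommend_proxy("login_sessions"))  # → lte_mobile
```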
```python
import csv
import json

def save_to_csv(profiles, filename="instagram_data.csv"):
    """Save scraped profiles to CSV."""
    if not profiles:
        return
    keys = profiles[0].keys()
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(profiles)
    print(f"Saved {len(profiles)} profiles to {filename}")

def save_to_json(data, filename="instagram_data.json"):
    """Save scraped data to JSON."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved data to {filename}")
```
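If the same profile gets scraped more than once across runs, deduplicate before saving. A small sketch keyed on `username`, the field used in the profile dicts above:

```python
def dedupe_profiles(profiles, key="username"):
    """Keep the first occurrence of each profile, preserving order."""
    seen = set()
    unique = []
    for p in profiles:
        if p[key] not in seen:
            seen.add(p[key])
            unique.append(p)
    return unique

rows = [{"username": "natgeo"}, {"username": "instagram"}, {"username": "natgeo"}]
print(len(dedupe_profiles(rows)))  # → 2
```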
Before scraping Instagram, understand the legal landscape:

- Scraping publicly accessible data is generally permitted under US case law, but rulings continue to evolve
- Automated collection violates Instagram's Terms of Service and can get accounts or IPs banned
- Privacy regulations such as GDPR and CCPA apply whenever you collect personal data

Always scrape responsibly. Only collect publicly available data, implement reasonable rate limits, and comply with applicable privacy laws.
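"Reasonable rate limits" can be enforced in code rather than left to discipline. A minimal sketch of a fixed-interval limiter; the 30-requests-per-minute figure is an arbitrary example, not a documented Instagram threshold:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, max_per_minute=30):
        self.interval = 60.0 / max_per_minute
        self.last_call = 0.0

    def wait(self):
        """Sleep just long enough to honor the configured rate."""
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_call = time.monotonic()

limiter = RateLimiter(max_per_minute=30)
# Call limiter.wait() before every request in your scraping loop.
```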
**Can you scrape Instagram without proxies?**
Technically yes, but you will be rate-limited within a few dozen requests. Any serious data collection project requires rotating residential proxies to avoid IP bans.

**Is it legal to scrape Instagram?**
Scraping publicly accessible Instagram data is generally legal under US law, but it may violate Instagram's Terms of Service. Always consult legal counsel for your specific use case and jurisdiction.

**Which proxy type works best for Instagram?**
Residential proxies deliver the best balance of success rate and cost. For the highest success rate, LTE mobile proxies are unmatched because they use real carrier IPs. See our mobile proxy comparison for details.

**How many profiles can you scrape per day?**
With properly rotating residential proxies, you can scrape thousands of profiles per day. The key is using a new IP for each request and adding random delays between 2 and 5 seconds.

**Can Instagram detect headless browsers?**
Yes, Instagram uses browser fingerprinting to detect automation tools. Playwright with stealth plugins and residential proxies provides the best evasion. Read our guide on rotating proxies with Python for more techniques.
Scraping Instagram effectively in 2026 requires the right combination of tools: Python for the scraping logic, residential or mobile proxies for IP rotation, and smart request patterns to avoid detection. Start with the official Graph API for your own account data, and use the proxy-based methods for public data collection at scale.
Ready to start? Get SpyderProxy residential proxies with 120M+ IPs and built-in rotation. For Instagram specifically, we recommend starting with our Premium Residential plan or LTE Mobile proxies for the highest success rates.