spyderproxy

How to Scrape Instagram in 2026: Profiles, Posts & Hashtags

By Alex R. | Published Apr 07, 2026 | 14 min read

Instagram has over 2 billion monthly active users and is one of the richest sources of public data on the internet: brand mentions, influencer metrics, hashtag trends, competitor analysis, and market research signals. Scraping it without the right setup, however, quickly leads to IP bans and CAPTCHAs.

This guide covers every practical method for extracting Instagram data in 2026, from the official Graph API to custom Python scrapers with proxy rotation. We include working code examples and explain exactly which proxy type to use for each approach.

Why Scrape Instagram?

Businesses and researchers scrape Instagram for legitimate purposes including:

  • Influencer marketing — Analyze follower counts, engagement rates, and posting frequency before partnering with creators
  • Brand monitoring — Track mentions of your brand across posts, stories, and comments
  • Competitor analysis — Monitor competitor posting strategies, hashtag usage, and audience growth
  • Hashtag research — Find trending hashtags in your niche and track their performance over time
  • Market research — Understand consumer sentiment from comments and captions
  • Academic research — Study social media behavior, trends, and content virality

Instagram’s Anti-Scraping Protections

Instagram employs aggressive anti-bot measures that make scraping challenging without proxies:

  • Rate limiting — Requests from a single IP are throttled quickly; depending on the endpoint, limits can kick in after a few dozen to a few hundred calls
  • IP blocking — Repeated requests trigger temporary or permanent IP bans
  • Login walls — Many endpoints require authentication, and Instagram monitors login patterns
  • Browser fingerprinting — Headless browsers can be identified from signals exposed to JavaScript, such as the navigator.webdriver flag that automated browsers set
  • CAPTCHA challenges — Suspicious traffic triggers image verification challenges

This is why proxy rotation is essential for any Instagram scraping project beyond a few requests. Without it, your IP gets flagged within minutes.

Method 1: Instagram Graph API (Official)

The Instagram Graph API is the official way to access Instagram data. It is best for accessing your own business account data or accounts that have authorized your app.

What You Can Access

  • Your own business/creator account metrics
  • Media objects (posts, reels, stories) on accounts you manage
  • Comments on your posts
  • Basic profile information
  • Hashtag search (limited to 30 unique hashtags per 7-day window)

Setup Steps

  1. Create a Facebook Developer account at developers.facebook.com
  2. Create a new app and add the Instagram Graph API product
  3. Connect your Instagram Business or Creator account
  4. Generate a long-lived access token
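Step 4 usually starts from a short-lived user token (for example, one issued by the Graph API Explorer). The sketch below swaps it for a long-lived token using the Graph API's `fb_exchange_token` grant. It assumes the Facebook Login flow this guide uses, relies only on the standard library, and treats the app ID, app secret, and API version as placeholders you must supply.

```python
import json
import urllib.parse
import urllib.request

GRAPH_VERSION = "v19.0"  # same version as the example below; adjust to your app's version


def build_token_exchange_url(app_id: str, app_secret: str, short_lived_token: str) -> str:
    """Build the Graph API URL that exchanges a short-lived user token for a long-lived one."""
    params = {
        "grant_type": "fb_exchange_token",
        "client_id": app_id,
        "client_secret": app_secret,
        "fb_exchange_token": short_lived_token,
    }
    return (
        f"https://graph.facebook.com/{GRAPH_VERSION}/oauth/access_token?"
        + urllib.parse.urlencode(params)
    )


def exchange_for_long_lived_token(app_id: str, app_secret: str, short_lived_token: str) -> dict:
    """Call the exchange endpoint and return the decoded JSON (access_token, expires_in, ...)."""
    url = build_token_exchange_url(app_id, app_secret, short_lived_token)
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Run the exchange server-side only, since the request carries your app secret, and keep the resulting token out of source control.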

Python Example

import requests

ACCESS_TOKEN = "your_long_lived_token"
INSTAGRAM_ACCOUNT_ID = "your_account_id"
BASE_URL = "https://graph.facebook.com/v19.0"

# Get your recent media
def get_recent_posts(account_id, limit=25):
    url = f"{BASE_URL}/{account_id}/media"
    params = {
        "fields": "id,caption,media_type,timestamp,like_count,comments_count,permalink",
        "limit": limit,
        "access_token": ACCESS_TOKEN,
    }
    response = requests.get(url, params=params, timeout=15)
    # Surface expired tokens or missing permissions instead of parsing an error body
    response.raise_for_status()
    return response.json()

# Get hashtag posts (each lookup counts toward the 30-hashtags-per-7-days limit)
def search_hashtag(hashtag_name):
    # Step 1: Resolve the hashtag name to its ID
    search_url = f"{BASE_URL}/ig_hashtag_search"
    params = {
        "q": hashtag_name,
        "user_id": INSTAGRAM_ACCOUNT_ID,
        "access_token": ACCESS_TOKEN,
    }
    response = requests.get(search_url, params=params, timeout=15)
    response.raise_for_status()
    results = response.json().get("data", [])
    if not results:
        return {"data": []}  # no matching hashtag
    hashtag_id = results[0]["id"]

    # Step 2: Get recent media for that hashtag
    media_url = f"{BASE_URL}/{hashtag_id}/recent_media"
    media_params = {
        "user_id": INSTAGRAM_ACCOUNT_ID,
        "fields": "id,caption,media_type,timestamp,permalink",
        "access_token": ACCESS_TOKEN,
    }
    response = requests.get(media_url, params=media_params, timeout=15)
    response.raise_for_status()
    return response.json()

posts = get_recent_posts(INSTAGRAM_ACCOUNT_ID)
for post in posts.get("data", []):
    print(f"{post['timestamp']} - Likes: {post.get('like_count', 'N/A')}")
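`get_recent_posts` returns only the first page of results. Graph API list responses are cursor-paginated: when more results exist, the JSON carries a `paging.next` URL. The sketch below follows those links; the fetch function is injectable so the loop can be tested offline, and the `max_pages` guard is an arbitrary safety limit, not an API parameter.

```python
import json
import urllib.request
from typing import Callable, Iterator


def fetch_json(url: str) -> dict:
    """Fetch a URL and decode its JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=15) as resp:
        return json.loads(resp.read().decode("utf-8"))


def iter_graph_pages(first_url: str, fetch: Callable[[str], dict] = fetch_json,
                     max_pages: int = 10) -> Iterator[dict]:
    """Yield items from every page of a Graph API edge by following paging.next."""
    url = first_url
    pages = 0
    while url and pages < max_pages:
        payload = fetch(url)
        yield from payload.get("data", [])
        # paging.next is absent on the last page, which ends the loop
        url = payload.get("paging", {}).get("next")
        pages += 1
```

The first URL must already include `fields`, `limit`, and `access_token` in its query string; the `next` links returned by the API typically carry those parameters forward.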

Limitations: The Graph API only works for accounts you own or manage. You cannot scrape competitor profiles, discover random public posts, or access data at scale through this method. For broader data collection, you need the methods below.

Method 2: Scraping Instagram with Python & Proxies

For scraping public profiles, posts, and hashtags at scale, you need a custom scraper with rotating proxies. Instagram's web app loads much of its data from internal endpoints that return JSON; because they are undocumented, field names and access rules can change without notice.

Setting Up Proxy Rotation

First, configure SpyderProxy residential proxies for rotation. Residential IPs are essential here because Instagram blocks datacenter IP ranges aggressively.

import requests
import time
import random

# SpyderProxy residential proxy configuration
PROXY_HOST = "geo.spyderproxy.com"
PROXY_PORT = 11000
PROXY_USER = "your_username"
PROXY_PASS = "your_password"

def get_proxy():
    """Returns proxy dict with session rotation"""
    session_id = random.randint(100000, 999999)
    proxy_url = f"http://{PROXY_USER}-session-{session_id}:{PROXY_PASS}@{PROXY_HOST}:{PROXY_PORT}"
    return {"http": proxy_url, "https": proxy_url}

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    # The endpoints below return JSON, so accept any content type
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.9",
    # Omit "br": requests only decodes Brotli when the brotli package is installed
    "Accept-Encoding": "gzip, deflate",
}
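`get_proxy()` interpolates the username and password directly into the proxy URL, which breaks if the password contains characters such as `@`, `:` or `/`. A safer variant percent-encodes the credentials. The `-session-<id>` username suffix is copied from the example above; confirm the exact session syntax in your provider's dashboard before relying on it.

```python
import random
import urllib.parse
from typing import Optional

PROXY_HOST = "geo.spyderproxy.com"
PROXY_PORT = 11000


def build_proxy_url(user: str, password: str, session_id: Optional[int] = None) -> str:
    """Return an http:// proxy URL with percent-encoded credentials.

    With session-based rotation, a new session_id usually means a new exit IP,
    while reusing the same id keeps a sticky session.
    """
    if session_id is None:
        session_id = random.randint(100000, 999999)
    username = urllib.parse.quote(f"{user}-session-{session_id}", safe="")
    secret = urllib.parse.quote(password, safe="")
    return f"http://{username}:{secret}@{PROXY_HOST}:{PROXY_PORT}"


def proxies_for(user: str, password: str, session_id: Optional[int] = None) -> dict:
    """Proxy mapping in the shape requests expects for its proxies= argument."""
    url = build_proxy_url(user, password, session_id)
    return {"http": url, "https": url}
```

requests decodes percent-encoded proxy credentials before sending them, so the encoded form works as a drop-in replacement for the URL built in `get_proxy()`.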

Scraping a Public Profile

def scrape_instagram_profile(username):
    """Scrape public Instagram profile data (uses HEADERS and get_proxy() from above)"""
    url = f"https://www.instagram.com/api/v1/users/web_profile_info/?username={username}"

    headers = {
        **HEADERS,
        # App ID sent by Instagram's web client; requests without it are typically rejected
        "X-IG-App-ID": "936619743392459",
        "X-Requested-With": "XMLHttpRequest",
    }

    for attempt in range(3):
        try:
            proxy = get_proxy()  # new proxy session on every attempt
            response = requests.get(url, headers=headers, proxies=proxy, timeout=15)

            if response.status_code == 200:
                user = (response.json().get("data") or {}).get("user")
                if user is None:
                    return None  # no such account, or the response format changed
                return {
                    "username": user["username"],
                    "full_name": user["full_name"],
                    "biography": user["biography"],
                    "followers": user["edge_followed_by"]["count"],
                    "following": user["edge_follow"]["count"],
                    "posts_count": user["edge_owner_to_timeline_media"]["count"],
                    "is_verified": user["is_verified"],
                    "is_business": user["is_business_account"],
                    "profile_pic": user["profile_pic_url_hd"],
                }
            elif response.status_code == 429:
                print(f"Rate limited, retrying with a new proxy session (attempt {attempt + 1})")
                time.sleep(random.uniform(3, 7))
            else:
                print(f"Status {response.status_code}, retrying...")
                time.sleep(2)
        except (requests.exceptions.RequestException, ValueError) as e:
            # ValueError covers non-JSON bodies, such as an HTML login page
            print(f"Request error: {e}")
            time.sleep(2)

    return None

# Example usage
profile = scrape_instagram_profile("instagram")
if profile:
    print(f"@{profile['username']} - {profile['followers']:,} followers")
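To run `scrape_instagram_profile` over a list of accounts, a small driver can add the random 2–5 second delay recommended in the best practices below and record failures for a later retry. The fetch function is passed in (in practice, `scrape_instagram_profile`) so the loop can be tested without network access; `scrape_many` is a hypothetical helper name.

```python
import random
import time
from typing import Callable, Optional, Sequence


def scrape_many(usernames: Sequence[str], fetch: Callable[[str], Optional[dict]],
                min_delay: float = 2.0, max_delay: float = 5.0,
                sleep: Callable[[float], None] = time.sleep):
    """Fetch profiles one at a time with a random pause between requests.

    Returns (results, failed): the scraped profiles and the usernames that returned None.
    """
    results, failed = [], []
    for i, username in enumerate(usernames):
        profile = fetch(username)
        if profile is None:
            failed.append(username)
        else:
            results.append(profile)
        if i < len(usernames) - 1:  # no need to wait after the last request
            sleep(random.uniform(min_delay, max_delay))
    return results, failed
```

Usage: `profiles, failed = scrape_many(["natgeo", "nasa"], scrape_instagram_profile)`.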

Scraping Posts from a Profile

def first_caption(node):
    """Return the post's caption text, or "" when it has no caption"""
    edges = node.get("edge_media_to_caption", {}).get("edges", [])
    return edges[0]["node"].get("text", "") if edges else ""


def scrape_user_posts(username, max_posts=50):
    """Scrape recent posts from a public Instagram profile.

    web_profile_info embeds only the first page of the timeline (typically 12 posts),
    so max_posts values above that return no extra posts.
    """
    url = f"https://www.instagram.com/api/v1/users/web_profile_info/?username={username}"

    headers = {
        **HEADERS,
        "X-IG-App-ID": "936619743392459",
        "X-Requested-With": "XMLHttpRequest",
    }

    try:
        response = requests.get(url, headers=headers, proxies=get_proxy(), timeout=15)
        user = response.json()["data"]["user"] if response.status_code == 200 else None
    except (requests.exceptions.RequestException, ValueError, KeyError, TypeError):
        return []  # network error, non-JSON body (e.g. a login page), or unexpected shape

    if not user:
        return []

    edges = user["edge_owner_to_timeline_media"]["edges"]

    posts = []
    for edge in edges[:max_posts]:
        node = edge["node"]
        posts.append({
            "shortcode": node["shortcode"],
            "caption": first_caption(node),
            "likes": node.get("edge_liked_by", {}).get("count", 0),
            "comments": node.get("edge_media_to_comment", {}).get("count", 0),
            "timestamp": node["taken_at_timestamp"],
            "is_video": node["is_video"],
            "display_url": node["display_url"],
            "url": f"https://www.instagram.com/p/{node['shortcode']}/",
        })

    return posts

posts = scrape_user_posts("natgeo", max_posts=12)
for p in posts:
    print(f"  Likes: {p['likes']:,} | Comments: {p['comments']:,} | {p['url']}")
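For influencer vetting, these numbers are usually condensed into an engagement rate. One common definition (not the only one) is average likes plus comments per post, divided by follower count. The sketch below combines the output of `scrape_user_posts` with the `followers` value from `scrape_instagram_profile`; note that missing like counts default to 0 in `scrape_user_posts` and will pull the rate down.

```python
def engagement_rate(posts: list[dict], followers: int) -> float:
    """Average (likes + comments) per post as a percentage of followers.

    Returns 0.0 when there are no posts or no followers, to avoid division by zero.
    """
    if not posts or followers <= 0:
        return 0.0
    total = sum(p.get("likes", 0) + p.get("comments", 0) for p in posts)
    return total / len(posts) / followers * 100
```

Usage: `engagement_rate(scrape_user_posts("natgeo"), profile["followers"])`.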

Method 3: Headless Browser with Playwright

For pages that need JavaScript rendering or a logged-in session, a headless browser such as Playwright, routed through the same rotating proxies, is often more reliable than raw HTTP requests.

from playwright.sync_api import sync_playwright
import random

def scrape_hashtag_playwright(hashtag, proxy_user, proxy_pass):
    """Render a hashtag page through a proxy session and return its HTML.

    Logged-out visitors are often redirected to the login page, so check
    the returned HTML before parsing it.
    """
    session_id = random.randint(100000, 999999)

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=True,
            proxy={
                "server": "http://geo.spyderproxy.com:11000",
                "username": f"{proxy_user}-session-{session_id}",
                "password": proxy_pass,
            },
        )
        try:
            context = browser.new_context(
                locale="en-US",
                extra_http_headers={"Accept-Language": "en-US,en;q=0.9"},
            )
            page = context.new_page()

            url = f"https://www.instagram.com/explore/tags/{hashtag}/"
            # networkidle waits until there are no network connections for 500 ms
            page.goto(url, wait_until="networkidle", timeout=30000)

            # Return the rendered HTML; extracting fields from it is a separate step
            return page.content()
        finally:
            browser.close()  # runs even if navigation times out
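`scrape_hashtag_playwright` returns rendered HTML rather than structured data. One lightweight way to pull a summary out of it is to read the page's `<meta>` tags (such as Open Graph `og:` tags) with the standard-library `HTMLParser`. Which tags Instagram emits, and what they contain, varies by page type and changes over time, so treat any specific tag name as an assumption to check against a real response.

```python
from html.parser import HTMLParser


class MetaTagParser(HTMLParser):
    """Collect <meta property=... content=...> and <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = attributes.get("property") or attributes.get("name")
        if key and "content" in attributes:
            self.meta[key] = attributes["content"]


def extract_meta(html: str) -> dict:
    """Return a dict mapping meta tag property/name to its content value."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta
```

`HTMLParser` unescapes attribute values, so entities such as `&amp;` come back as plain characters.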

Which Proxy Type Should You Use?

Your proxy choice determines whether your scraper succeeds or gets blocked immediately. Here is what works for Instagram:

| Proxy Type | Instagram Success Rate | Best For | SpyderProxy Price |
| --- | --- | --- | --- |
| Premium Residential | 95%+ | Profile scraping, post extraction | From $2.75/GB |
| LTE Mobile | 99%+ | Highest success rate, login sessions | From $2/proxy |
| Budget Residential | 85-90% | High-volume hashtag monitoring | From $1.75/GB |
| Datacenter | Below 30% | Not recommended for Instagram | From $3.55/mo |

Recommendation: Use Premium Residential for most scraping tasks. For logged-in sessions or targets with the strictest anti-bot checks, use LTE Mobile proxies. They route traffic through real carrier IPs, which are hard for Instagram to block because many genuine mobile users share the same addresses.
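Because residential plans are billed per GB, it helps to estimate traffic before choosing one. The sketch multiplies request count by an average response size. The 500 KB figure in the example is a hypothetical placeholder: a JSON API call is far lighter than a fully rendered browser page with images, and whether your provider counts GB in decimal or binary units is also an assumption.

```python
def estimate_bandwidth_cost(requests_count: int, avg_kb_per_request: float,
                            price_per_gb: float) -> tuple[float, float]:
    """Return (gigabytes, cost) using decimal units: 1 GB = 1,000,000 KB."""
    gigabytes = requests_count * avg_kb_per_request / 1_000_000
    return gigabytes, gigabytes * price_per_gb


# Hypothetical workload: 10,000 requests at ~500 KB each on a $2.75/GB plan
gb, cost = estimate_bandwidth_cost(10_000, 500, 2.75)
print(f"{gb:.1f} GB -> ${cost:.2f}")  # prints: 5.0 GB -> $13.75
```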

Best Practices for Instagram Scraping

  1. Rotate IPs per request — Use a new proxy session for each profile or page you scrape
  2. Add random delays — Wait 2–5 seconds between requests to mimic human behavior
  3. Rotate User-Agents — Use a pool of real browser User-Agent strings
  4. Handle rate limits gracefully — On 429 responses, back off exponentially and switch proxies
  5. Respect robots.txt — Check which paths the site disallows, and only collect publicly accessible data
  6. Cache results — Store scraped data locally to avoid re-fetching the same profiles
  7. Use residential proxies — Datacenter IPs are blocked almost immediately on Instagram
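Practices 1, 3, and 4 can be combined in one retry helper. The sketch uses "full jitter", a common variant of exponential backoff in which each wait is drawn uniformly between zero and an exponentially growing cap, and picks a random User-Agent per attempt. The `send` callable is a hypothetical hook you supply; it should open a fresh proxy session on each call (for example with `get_proxy()`) so every retry leaves from a new IP. The User-Agent pool is illustrative and should be kept current.

```python
import random
import time
from typing import Callable, Optional

# Illustrative pool; replace with current, real browser User-Agent strings
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
]


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Full-jitter backoff: a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


def fetch_with_backoff(send: Callable[[str], tuple[int, object]], max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep
                       ) -> Optional[tuple[int, object]]:
    """Call send(user_agent) until it returns a non-429 status or attempts run out.

    send should return (status_code, body) and use a new proxy session per call.
    Returns None if every attempt was rate limited.
    """
    for attempt in range(max_attempts):
        status, body = send(random.choice(USER_AGENTS))
        if status != 429:
            return status, body
        sleep(backoff_delay(attempt))
    return None
```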

Saving Scraped Data

import csv
import json

def save_to_csv(profiles, filename="instagram_data.csv"):
    """Save scraped profiles to CSV"""
    if not profiles:
        return

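    # Collect every key so rows with extra fields don't make DictWriter raise ValueError
    keys = list(dict.fromkeys(k for p in profiles for k in p))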
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(profiles)
    print(f"Saved {len(profiles)} profiles to {filename}")

def save_to_json(data, filename="instagram_data.json"):
    """Save scraped data to JSON"""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
    print(f"Saved data to {filename}")
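Caching results (best practice 6) can be as simple as a JSON file keyed by username, with a timestamp per entry so a profile is re-fetched only after it goes stale. This is a minimal sketch: the file layout, the helper names, and the TTL you pass in are assumptions, and a database becomes a better fit once the cache grows large.

```python
import json
import os
import time
from typing import Any, Optional


def load_cache(path: str) -> dict:
    """Load the cache file, or return an empty cache if it does not exist yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_cache(cache: dict, path: str) -> None:
    """Write the whole cache back to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f, ensure_ascii=False)


def get_cached(cache: dict, key: str, max_age_seconds: float,
               now: Optional[float] = None) -> Any:
    """Return the cached value if it is younger than max_age_seconds, else None."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry and now - entry["fetched_at"] < max_age_seconds:
        return entry["value"]
    return None


def put_cached(cache: dict, key: str, value: Any, now: Optional[float] = None) -> None:
    """Store a value together with the time it was fetched."""
    cache[key] = {"fetched_at": time.time() if now is None else now, "value": value}
```

Check `get_cached` before calling `scrape_instagram_profile`, and call `put_cached` plus `save_cache` after a successful fetch.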

Legal Considerations

Before scraping Instagram, understand the legal landscape:

  • Terms of Service — Instagram’s ToS prohibits automated data collection. Violating ToS can result in account suspension but is generally a civil matter, not criminal
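  • Public data — US courts have held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (see hiQ Labs v. LinkedIn), although hiQ was ultimately found to have breached LinkedIn's user agreement, so contract claims remain a risk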
  • GDPR/CCPA — If you collect personal data from EU or California residents, ensure compliance with data protection regulations
  • Rate of access — Excessive scraping that degrades service could constitute a computer fraud violation in some jurisdictions

Always scrape responsibly. Only collect publicly available data, implement reasonable rate limits, and comply with applicable privacy laws.

Frequently Asked Questions

Can I scrape Instagram without proxies?

Technically yes, but you will be rate-limited within a few dozen requests. Any serious data collection project requires rotating residential proxies to avoid IP bans.

Is it legal to scrape Instagram?

Scraping publicly accessible Instagram data is generally legal under US law, but it may violate Instagram’s Terms of Service. Always consult legal counsel for your specific use case and jurisdiction.

What is the best proxy type for Instagram?

Residential proxies deliver the best balance of success rate and cost. For the highest success rate, LTE mobile proxies are unmatched because they use real carrier IPs. See our mobile proxy comparison for details.

How many profiles can I scrape per day?

With properly rotating residential proxies, you can scrape thousands of profiles per day. The key is using a new IP for each request and adding random delays between 2 and 5 seconds.
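The arithmetic behind "thousands per day" is simple: a random delay drawn uniformly from 2–5 seconds averages 3.5 seconds, so one sequential worker handles roughly 86,400 / (3.5 + request time) requests per day. The 0.5 second request time below is a hypothetical figure, and retries, failures, and rate limits will push real numbers lower.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def profiles_per_day(avg_delay_s: float, avg_request_s: float, workers: int = 1) -> int:
    """Upper-bound estimate of daily requests for sequential workers with fixed pacing."""
    return int(SECONDS_PER_DAY / (avg_delay_s + avg_request_s) * workers)


# Random delays of 2-5 s average 3.5 s; assume ~0.5 s per request through the proxy
print(profiles_per_day(3.5, 0.5))  # prints: 21600
```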

Does Instagram detect headless browsers?

Yes, Instagram uses browser fingerprinting to detect automation tools. Playwright combined with a stealth plugin and residential proxies makes automation harder to detect, though no setup is undetectable. Read our guide on rotating proxies with Python for more techniques.

Conclusion

Scraping Instagram effectively in 2026 requires the right combination of tools: Python for the scraping logic, residential or mobile proxies for IP rotation, and smart request patterns to avoid detection. Start with the official Graph API for your own account data, and use the proxy-based methods for public data collection at scale.

Ready to start? Get SpyderProxy residential proxies with 120M+ IPs and built-in rotation. For Instagram specifically, we recommend starting with our Premium Residential plan or LTE Mobile proxies for the highest success rates.