# Scraping Guide
Xeepy provides powerful, flexible scraping capabilities for X/Twitter. This guide covers all scraping features with detailed examples.
## Overview
Xeepy can scrape virtually any public data from X/Twitter:
- Scrape all replies to any tweet
- Get detailed user profile information
- Extract follower lists with metadata
- Get who a user follows
- Scrape user tweets and timelines
- Unroll and extract full threads
- Search tweets with advanced filters
- Scrape tweets by hashtag
- Extract images and videos
- Scrape list members and tweets
## Quick Start
```python
from xeepy import Xeepy

async with Xeepy() as x:
    # Scrape 100 replies to a tweet
    replies = await x.scrape.replies(
        "https://x.com/elonmusk/status/1234567890",
        limit=100
    )

    # Export to CSV
    x.export.to_csv(replies, "replies.csv")
```
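The snippets in this guide show only the body of an async context, so they must run inside an async function. A minimal sketch of a standalone script, using exactly the calls from the Quick Start above wrapped in an `asyncio` entry point:

```python
import asyncio

from xeepy import Xeepy


async def main():
    async with Xeepy() as x:
        # Same Quick Start calls as above, inside an async entry point
        replies = await x.scrape.replies(
            "https://x.com/elonmusk/status/1234567890",
            limit=100
        )
        x.export.to_csv(replies, "replies.csv")


# Run the async entry point
asyncio.run(main())
```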
## Common Patterns
### Scrape with Progress
```python
async with Xeepy() as x:
    async for tweet in x.scrape.tweets_stream("username", limit=1000):
        print(f"Got tweet: {tweet.text[:50]}...")

        # Process each tweet as it comes
        await process_tweet(tweet)
```
### Scrape Multiple Users
```python
async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    for user in users:
        tweets = await x.scrape.tweets(user, limit=100)
        x.export.to_csv(tweets, f"{user}_tweets.csv")
```
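If one account in the list is protected, suspended, or temporarily unreachable, you usually don't want the whole loop to abort. A minimal variation of the pattern above that skips failed users; the broad `except Exception` is a placeholder, since the exact exception types raised depend on how the failure surfaces:

```python
async with Xeepy() as x:
    users = ["user1", "user2", "user3"]

    for user in users:
        try:
            tweets = await x.scrape.tweets(user, limit=100)
        except Exception as exc:  # placeholder: narrow to Xeepy's own error types if known
            print(f"Skipping {user}: {exc}")
            continue

        x.export.to_csv(tweets, f"{user}_tweets.csv")
```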
### Handle Large Datasets
```python
async with Xeepy() as x:
    # Scrape in batches to avoid memory issues
    async for batch in x.scrape.followers_batched("popular_user", batch_size=100):
        # Process and save each batch
        x.export.append_csv(batch, "followers.csv")
        print(f"Processed {len(batch)} followers")
```
## Rate Limiting
Xeepy automatically handles rate limiting to protect your account:
```python
async with Xeepy() as x:
    # Default: 20 requests/minute (safe)
    replies = await x.scrape.replies(url, limit=1000)

    # Customize rate limit
    x.config.rate_limit.requests_per_minute = 30
```
> **Be Respectful:** Higher rate limits increase detection risk. Stick to the defaults unless you have a specific need.
## Data Models
All scraped data uses typed models for consistency:
```python
# Tweet model
reply.id          # Tweet ID
reply.text        # Tweet content
reply.author      # User model
reply.created_at  # Datetime
reply.likes       # Like count
reply.retweets    # Retweet count
reply.replies     # Reply count
reply.url         # Tweet URL

# User model
user.id                # User ID
user.username          # Handle (without @)
user.name              # Display name
user.bio               # Bio/description
user.followers_count   # Follower count
user.following_count   # Following count
user.tweet_count       # Total tweet count
user.verified          # Verified status
user.created_at        # Account creation date
```
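Because these are plain attributes, scraped results can be filtered or sorted with ordinary Python before export. A small sketch using only the fields listed above, assuming the export helpers accept any list of these models (as they do for scrape results elsewhere in this guide):

```python
async with Xeepy() as x:
    replies = await x.scrape.replies(url, limit=500)

    # Keep only replies with at least 100 likes, most-liked first
    popular = sorted(
        (r for r in replies if r.likes >= 100),
        key=lambda r: r.likes,
        reverse=True,
    )

    x.export.to_csv(popular, "popular_replies.csv")
```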
## Export Options
Every scraping function integrates with export:
```python
async with Xeepy() as x:
    data = await x.scrape.replies(url, limit=100)

    # Multiple export formats
    x.export.to_csv(data, "data.csv")
    x.export.to_json(data, "data.json")
    x.export.to_excel(data, "data.xlsx")
    x.export.to_parquet(data, "data.parquet")

    # Database export
    await x.export.to_database(data, "sqlite:///data.db")
```
## Best Practices
- **Start small** - Test with `limit=10` before scaling up
- **Use caching** - Avoid re-scraping the same data
- **Respect rate limits** - Don't disable built-in protections
- **Handle errors** - Network issues happen; use try/except (see the sketch after this list)
- **Store incrementally** - Save data as you scrape for large jobs
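The last two points combine naturally: append each batch as it arrives and guard the loop so a transient failure doesn't lose what you've already collected. A minimal sketch built from the batched-followers pattern shown earlier; the broad `except Exception` is a placeholder for whatever errors Xeepy surfaces:

```python
async with Xeepy() as x:
    saved = 0
    try:
        async for batch in x.scrape.followers_batched("popular_user", batch_size=100):
            # Append each batch so progress is kept even if the run stops early
            x.export.append_csv(batch, "followers.csv")
            saved += len(batch)
    except Exception as exc:  # placeholder: narrow to Xeepy's own error types if known
        print(f"Stopped early after saving {saved} followers: {exc}")
```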
## Detailed Guides
Choose a specific scraping topic:
- Replies Scraping - Extract conversation threads
- Profile Scraping - Get user details
- Followers Scraping - Build follower lists
- Tweet Scraping - Get user timelines
- Search Scraping - Find tweets by query
- Hashtag Scraping - Monitor hashtag activity
- Thread Unrolling - Extract full threads
- Media Scraping - Download images/videos