# Scraping User Tweets
Extract tweets from any public Twitter profile to analyze content strategy, engagement patterns, and posting behavior.
## Overview
The tweets scraper retrieves a user's timeline including tweet text, media, engagement metrics, and metadata. This data powers content analysis, competitive research, and historical tweet archiving.
## Use Cases
- Content Analysis: Study what content performs best for an account
- Competitive Intelligence: Monitor competitor posting strategies
- Historical Archive: Back up tweets for research or compliance
- Engagement Research: Identify high-performing tweet patterns
## Basic Usage
```python
import asyncio
from xeepy import Xeepy

async def scrape_user_tweets():
    async with Xeepy() as x:
        # Get recent tweets from a user
        tweets = await x.scrape.tweets("elonmusk", limit=100)

        for tweet in tweets:
            print(f"{tweet.created_at}: {tweet.text[:50]}...")
            print(f"  Likes: {tweet.likes} | RTs: {tweet.retweets}")

        # Export for analysis
        x.export.to_csv(tweets, "user_tweets.csv")

asyncio.run(scrape_user_tweets())
```
## Advanced Filtering
```python
async def filtered_tweet_scrape():
    async with Xeepy() as x:
        # Scrape with advanced filters
        tweets = await x.scrape.tweets(
            username="techcrunch",
            limit=500,
            include_retweets=False,  # Original tweets only
            include_replies=False,   # No reply tweets
            min_likes=100,           # Only popular tweets
            media_only=False,        # Include text-only tweets
            since="2024-01-01",      # Date range start
            until="2024-12-31",      # Date range end
        )

        # Analyze engagement
        total_engagement = sum(t.likes + t.retweets for t in tweets)
        avg_engagement = total_engagement / len(tweets) if tweets else 0
        print(f"Average engagement: {avg_engagement:.1f}")

asyncio.run(filtered_tweet_scrape())
```
## Extracting Tweet Components
```python
async def extract_tweet_details():
    async with Xeepy() as x:
        tweets = await x.scrape.tweets("username", limit=50)

        for tweet in tweets:
            # Access all tweet properties
            print(f"ID: {tweet.id}")
            print(f"Text: {tweet.text}")
            print(f"Created: {tweet.created_at}")
            print(f"Likes: {tweet.likes}")
            print(f"Retweets: {tweet.retweets}")
            print(f"Replies: {tweet.reply_count}")
            print(f"Quotes: {tweet.quote_count}")
            print(f"Views: {tweet.views}")

            # Media attachments
            if tweet.media:
                for media in tweet.media:
                    print(f"Media: {media.type} - {media.url}")

            # Hashtags and mentions
            print(f"Hashtags: {tweet.hashtags}")
            print(f"Mentions: {tweet.mentions}")
            print(f"URLs: {tweet.urls}")

asyncio.run(extract_tweet_details())
```
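These per-tweet fields combine naturally into aggregate statistics. As a minimal sketch, assuming `tweet.hashtags` is a list of strings as shown above, the following counts the most frequent hashtags across a scrape:

```python
import asyncio
from collections import Counter

from xeepy import Xeepy

async def top_hashtags(username: str, limit: int = 200):
    async with Xeepy() as x:
        tweets = await x.scrape.tweets(username, limit=limit)

    # Flatten every tweet's hashtag list into one frequency table
    counts = Counter(tag.lower() for t in tweets for tag in t.hashtags)
    for tag, n in counts.most_common(10):
        print(f"#{tag}: {n}")

asyncio.run(top_hashtags("username"))
```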
## Batch Processing
```python
async def batch_tweet_analysis():
    async with Xeepy() as x:
        accounts = ["user1", "user2", "user3"]
        all_tweets = []

        for account in accounts:
            tweets = await x.scrape.tweets(account, limit=200)
            all_tweets.extend(tweets)
            print(f"Scraped {len(tweets)} tweets from @{account}")

        # Aggregate analysis
        top_tweets = sorted(all_tweets, key=lambda t: t.likes, reverse=True)[:10]
        print("\nTop 10 tweets by likes:")
        for t in top_tweets:
            print(f"@{t.author.username}: {t.likes} likes - {t.text[:40]}...")

asyncio.run(batch_tweet_analysis())
```
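The loop above scrapes accounts one at a time, which is the gentlest option for rate limits. If you need shorter wall-clock time and can accept more rate-limit risk, a bounded-concurrency variant is a reasonable sketch; the `Semaphore` and `asyncio.gather` usage is plain asyncio, and it assumes a single `Xeepy` session can serve concurrent calls:

```python
import asyncio

from xeepy import Xeepy

async def batch_scrape_concurrent(accounts, per_account=200, max_concurrent=2):
    sem = asyncio.Semaphore(max_concurrent)  # cap simultaneous scrapes

    async with Xeepy() as x:
        async def scrape_one(account):
            async with sem:
                return await x.scrape.tweets(account, limit=per_account)

        batches = await asyncio.gather(*(scrape_one(a) for a in accounts))

    # Flatten per-account batches into a single list
    return [t for batch in batches for t in batch]

tweets = asyncio.run(batch_scrape_concurrent(["user1", "user2", "user3"]))
print(f"Scraped {len(tweets)} tweets total")
```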
## Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| `username` | `str` | required | Target username |
| `limit` | `int` | `100` | Maximum tweets to retrieve |
| `include_retweets` | `bool` | `True` | Include retweets |
| `include_replies` | `bool` | `True` | Include reply tweets |
| `min_likes` | `int` | `0` | Minimum likes filter |
| `media_only` | `bool` | `False` | Only tweets with media |
| `since` | `str` | `None` | Start date (YYYY-MM-DD) |
| `until` | `str` | `None` | End date (YYYY-MM-DD) |
**Performance Optimization:** For accounts with thousands of tweets, use date ranges to paginate through history efficiently. This prevents timeout issues and reduces memory usage.
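A minimal sketch of that approach, using only the `since`/`until` parameters documented above (the yearly windows and the per-window `limit` are illustrative choices):

```python
async def scrape_history(username: str, years=range(2020, 2025)):
    all_tweets = []
    async with Xeepy() as x:
        # One bounded request per calendar year keeps each page small
        for year in years:
            batch = await x.scrape.tweets(
                username,
                limit=1000,  # illustrative per-window cap
                since=f"{year}-01-01",
                until=f"{year}-12-31",
            )
            all_tweets.extend(batch)
            print(f"{year}: {len(batch)} tweets")
    return all_tweets
```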
**Tweet Availability:** Twitter limits how far back you can retrieve tweets. Very old tweets may not be accessible through the standard timeline.
## Content Analysis Example
```python
async def analyze_content_strategy():
    async with Xeepy() as x:
        tweets = await x.scrape.tweets("username", limit=500)
        if not tweets:
            print("No tweets found")
            return  # avoid dividing by zero below

        # Categorize by content type
        with_media = [t for t in tweets if t.media]
        with_links = [t for t in tweets if t.urls]
        questions = [t for t in tweets if "?" in t.text]

        print(f"Tweets with media: {len(with_media)} ({len(with_media)/len(tweets)*100:.1f}%)")
        print(f"Tweets with links: {len(with_links)} ({len(with_links)/len(tweets)*100:.1f}%)")
        print(f"Questions asked: {len(questions)}")

asyncio.run(analyze_content_strategy())
```
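To check whether those categories actually earn more engagement, you could extend `analyze_content_strategy` after the category lists are built. This sketch reuses the `tweets` and `with_media` variables from the function above and only the `likes` attribute shown earlier:

```python
def avg_likes(ts):
    # Mean likes, guarding against empty categories
    return sum(t.likes for t in ts) / len(ts) if ts else 0.0

text_only = [t for t in tweets if not t.media]
print(f"Avg likes with media: {avg_likes(with_media):.1f}")
print(f"Avg likes text-only:  {avg_likes(text_only):.1f}")
```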
## Best Practices
- Use Date Ranges: For historical analysis, specify the `since` and `until` parameters
- Filter Retweets: Set `include_retweets=False` for original-content analysis
- Monitor Rate Limits: Large scrapes may hit rate limits; add delays between requests (see the sketch after this list)
- Store Raw Data: Export raw JSON for future analysis flexibility (also covered in the sketch below)
- Respect ToS: Use scraped data responsibly and ethically
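A minimal sketch combining the rate-limit and raw-data advice above. The five-second delay and the serialized fields are illustrative choices, and the hand-rolled JSON dump is a fallback pattern rather than a Xeepy API (the exporter documented on this page is `x.export.to_csv`):

```python
import asyncio
import json

from xeepy import Xeepy

async def archive_accounts(accounts, path="tweets_raw.json", delay_s=5):
    rows = []
    async with Xeepy() as x:
        for account in accounts:
            tweets = await x.scrape.tweets(account, limit=200)
            # Serialize a hand-picked subset of the fields shown earlier
            rows += [
                {
                    "id": t.id,
                    "text": t.text,
                    "created_at": str(t.created_at),
                    "likes": t.likes,
                    "retweets": t.retweets,
                }
                for t in tweets
            ]
            await asyncio.sleep(delay_s)  # simple pacing between accounts

    with open(path, "w") as f:
        json.dump(rows, f, indent=2)

asyncio.run(archive_accounts(["user1", "user2"]))
```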