Live — 370+ endpoints
BuyCrowds API
Unified social intelligence across 22+ platforms. One call, every creator. Public data, BYOK tokens, zero credential storage.
Privacy & Data Notice
BuyCrowds retrieves only publicly available data. For private data (Instagram insights, YouTube analytics), users provide their own API tokens (BYOK). OAuth tokens are supplied by the user and used in-memory only — we never store credentials.
Token Vault (BYOK — Bring Your Own Keys)
Paste your API tokens here to unlock private data on Try It buttons. Tokens are stored in sessionStorage only — they disappear when you close this tab. BuyCrowds never stores or transmits your keys.
Quick Start
3 examples
Get started in seconds. All public endpoints require no authentication.
Unified Profile — all platforms in one call
curl https://api.buycrowds.com/v1/public/profile/elonmusk
Flex Check — fun social credibility score
curl https://api.buycrowds.com/v1/public/flexcheck/mkbhd
Deep Data — maximum extraction (BYOK for private platforms)
curl "https://api.buycrowds.com/v1/public/deep/tiktok/charlidamelio"
Unified Profile
2 endpoints
Aggregate profile data across all platforms in a single call. Pass BYOK params to enrich with private data.
BYOK params (optional):
ig_token, ig_user_id (Instagram Graph API token from Facebook Developer), youtube_key (YouTube Data API v3 key from Google Cloud Console), twitter_token (Twitter API Bearer Token from developer.twitter.com).
GET/v1/public/profile/:usernameUnified profile summary across all found platforms PUBLIC
GET/v1/public/profile/:username/fullFull unified profile with all available data PUBLIC
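A hedged sketch of enriching the full unified profile with the BYOK params listed above (all token values are placeholders; include only the ones you have):
import requests

params = {
    "ig_token": "IG_GRAPH_API_TOKEN",       # placeholder Instagram Graph API token
    "ig_user_id": "YOUR_IG_USER_ID",        # placeholder
    "youtube_key": "YT_DATA_API_V3_KEY",    # placeholder
    "twitter_token": "TWITTER_BEARER",      # placeholder
}
r = requests.get("https://api.buycrowds.com/v1/public/profile/mkbhd/full",
                 params=params, timeout=60)
print(r.json())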
Identity & Fun
14 endpoints
Fun analytics, identity resolution, and social personality endpoints.
GET/v1/public/whoami/:usernameCross-platform identity summary PUBLIC
GET/v1/public/whereami/:usernameWhich platforms this username exists on PUBLIC
GET/v1/public/wheniwas/:usernameAccount creation timeline across platforms PUBLIC
GET/v1/public/mynumbers/:usernameAggregated follower/following counts PUBLIC
GET/v1/public/stalker/:usernameDeep-dive social activity report PUBLIC
GET/v1/public/flexcheck/:usernameSocial credibility and flex score PUBLIC
GET/v1/public/compare/:user1/:user2Head-to-head comparison of two users PUBLIC
GET/v1/public/receipts/:usernameProof of social presence and achievements PUBLIC
GET/v1/public/roast/:usernameHumorous roast based on social data PUBLIC
GET/v1/public/vibe/:usernameVibe check and personality analysis PUBLIC
GET/v1/public/resolve/:usernameResolve username to platform identities PUBLIC
GET/v1/public/find/:usernameFind user across all platforms PUBLIC
GET/v1/public/crossrefCross-reference different usernames per platform PUBLIC
GET/v1/public/all/:usernameAll public data for a username in one call PUBLIC
Analytics
7 endpoints
Cross-platform analytics: engagement rates, growth tracking, audience insights, and content performance.
GET/v1/public/analytics/engagement/:usernameEngagement rate across platforms PUBLIC
GET/v1/public/analytics/growth/:usernameGrowth metrics and trajectory PUBLIC
GET/v1/public/analytics/benchmark/:username1/:username2Benchmark two users side by side PUBLIC
GET/v1/public/analytics/summary/:usernameAnalytics overview summary PUBLIC
GET/v1/public/analytics/audience/:usernameAudience demographics and composition PUBLIC
GET/v1/public/analytics/content/:usernameContent performance analysis PUBLIC
GET/v1/public/analytics/history/:usernameHistorical analytics over time PUBLIC
Social Graph
3 endpoints
Map social connections, discover shared networks, and visualize relationship graphs.
GET/v1/public/graph/connections/:usernameSocial connections graph for a user PUBLIC
GET/v1/public/graph/network/:username1/:username2Shared network between two users PUBLIC
POST/v1/public/graph/mapMap a custom network of usernames PUBLIC
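A sketch of the map endpoint; the exact body schema isn't documented here, so the {"usernames": [...]} shape is an assumption:
import requests

# Assumed body shape: a list of usernames to map into one network graph.
body = {"usernames": ["mkbhd", "ijustine", "mrwhosetheboss"]}
r = requests.post("https://api.buycrowds.com/v1/public/graph/map",
                  json=body, timeout=60)
print(r.json())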
Search
10 endpoints
Search across individual platforms. Pass ?q=query to each endpoint.
GET/v1/public/search/instagramSearch Instagram users PUBLIC
GET/v1/public/search/youtubeSearch YouTube channels/videos PUBLIC
GET/v1/public/search/twitterSearch Twitter/X users PUBLIC
GET/v1/public/search/spotifySearch Spotify artists/tracks PUBLIC
GET/v1/public/search/redditSearch Reddit posts PUBLIC
GET/v1/public/search/reddit/subredditsSearch subreddits by name PUBLIC
GET/v1/public/search/bluesky/postsSearch Bluesky posts PUBLIC
GET/v1/public/search/bluesky/usersSearch Bluesky users PUBLIC
GET/v1/public/search/tiktokSearch TikTok users PUBLIC
GET/v1/public/search/facebookSearch Facebook pages PUBLIC
Mega Search
2 endpoints
Search across ALL platforms simultaneously. Returns aggregated results from every supported platform.
GET/v1/public/megasearch/:querySearch all platforms for a query PUBLIC
GET/v1/public/megasearch/person/:nameFind a person across all platforms PUBLIC
Cross-Data Analysis
2 endpoints
Cross-platform data analysis. Compare audiences, find overlaps, and correlate activity.
POST/v1/public/cross/analyzeCross-platform analysis from multiple sources PUBLIC
GET/v1/public/cross/overlap/:user1/:user2Audience overlap between two users PUBLIC
Deep Data (BYOK)
1 endpoint
Maximum data extraction for any platform. For private platforms, pass your own API token via the ?token= query param.
BYOK: Pass ?token=YOUR_TOKEN for platforms that require authentication. For Spotify, use ?client_id=X&client_secret=Y. Tokens are used in-memory only and never stored.
GET/v1/public/deep/:platform/:usernameDeep data extraction for any platform BYOK
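A minimal sketch of both auth styles mentioned above (all credentials and IDs are placeholders):
import requests

BASE = "https://api.buycrowds.com/v1/public/deep"

# Token-authenticated platform (placeholder token):
r = requests.get(f"{BASE}/instagram/charlidamelio",
                 params={"token": "IG_ACCESS_TOKEN"}, timeout=60)

# Spotify uses client credentials instead of a single token:
r2 = requests.get(f"{BASE}/spotify/SPOTIFY_ARTIST_ID",
                  params={"client_id": "SPOTIFY_CLIENT_ID",
                          "client_secret": "SPOTIFY_CLIENT_SECRET"}, timeout=60)
print(r.json(), r2.json())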
Data Extraction (Scraping)
5 endpoints
Maximum data extraction via scraping. Returns raw structured data from public profiles.
GET/v1/public/data/instagram/post/:shortcodeScrape a single Instagram post PUBLIC
GET/v1/public/data/instagram/:usernameScrape Instagram profile data PUBLIC
GET/v1/public/data/facebook/:pageScrape Facebook page data PUBLIC
GET/v1/public/data/tiktok/:usernameScrape TikTok profile data PUBLIC
GET/v1/public/data/twitter/:usernameScrape Twitter/X profile data PUBLIC
Per-Platform: TikTok
6 endpoints
GET/v1/public/tiktok/:username/videosUser's recent videos PUBLIC
GET/v1/public/tiktok/:username/infoUser profile information PUBLIC
GET/v1/public/tiktok/:username/followersFollower count and data PUBLIC
GET/v1/public/tiktok/:username/likesLiked videos list PUBLIC
GET/v1/public/tiktok/trendingCurrently trending TikTok content PUBLIC
GET/v1/public/tiktok/video/:video_idDetails for a specific video PUBLIC
Per-Platform: Bluesky
10 endpoints
GET/v1/public/bluesky/:handle/postsUser's recent posts PUBLIC
GET/v1/public/bluesky/:handle/followersList of followers PUBLIC
GET/v1/public/bluesky/:handle/followingAccounts this user follows PUBLIC
GET/v1/public/bluesky/:handle/likesPosts liked by this user PUBLIC
GET/v1/public/bluesky/:handle/feedUser's feed timeline PUBLIC
GET/v1/public/bluesky/:handle/listsUser's curated lists PUBLIC
GET/v1/public/bluesky/:handle/blocksBlocked accounts PUBLIC
GET/v1/public/bluesky/:handle/repostsUser's reposts PUBLIC
GET/v1/public/bluesky/post/:uri/likesLikes on a specific post PUBLIC
GET/v1/public/bluesky/post/:uri/repostsReposts of a specific post PUBLIC
Per-Platform: Reddit
10 endpoints
GET/v1/public/reddit/:username/postsUser's submitted posts PUBLIC
GET/v1/public/reddit/:username/commentsUser's comment history PUBLIC
GET/v1/public/reddit/:username/trophiesUser's Reddit trophies PUBLIC
GET/v1/public/reddit/:username/aboutUser profile information PUBLIC
GET/v1/public/reddit/:username/awardsAwards received by user PUBLIC
GET/v1/public/reddit/:username/karmaKarma breakdown by subreddit PUBLIC
GET/v1/public/reddit/subreddit/:nameSubreddit information PUBLIC
GET/v1/public/reddit/subreddit/:name/hotHot posts in a subreddit PUBLIC
GET/v1/public/reddit/subreddit/:name/topTop posts in a subreddit PUBLIC
GET/v1/public/reddit/subreddit/:name/newNew posts in a subreddit PUBLIC
Per-Platform: YouTube
8 endpoints
BYOK: Pass ?key=YOUR_API_KEY (YouTube Data API v3 key from Google Cloud Console). For analytics, pass ?token=OAUTH_TOKEN (OAuth2 token with YouTube Analytics scope).
GET/v1/public/youtube/:channel/videosChannel's uploaded videos BYOK
GET/v1/public/youtube/:channel/playlistsChannel's playlists BYOK
GET/v1/public/youtube/:channel/infoChannel information and stats BYOK
GET/v1/public/youtube/:channel/analyticsChannel analytics (requires OAuth token) BYOK
GET/v1/public/youtube/:channel/commentsRecent comments on channel videos BYOK
GET/v1/public/youtube/:channel/subscribersSubscriber count and data BYOK
GET/v1/public/youtube/searchSearch YouTube videos/channels BYOK
GET/v1/public/youtube/video/:video_id/commentsComments on a specific video BYOK
Per-Platform: Instagram
8 endpoints
BYOK: Pass ?token=IG_ACCESS_TOKEN&ig_user_id=YOUR_IG_USER_ID. Get your token from Facebook Developer > Instagram Graph API. Required for insights, stories, and private data.
GET/v1/public/instagram/:username/postsRecent posts (public scraping or BYOK) BYOK
GET/v1/public/instagram/:username/profileProfile info and metrics BYOK
GET/v1/public/instagram/:username/insightsAccount insights (requires token) BYOK
GET/v1/public/instagram/:username/storiesCurrent stories (requires token) BYOK
GET/v1/public/instagram/:username/reelsUser's reels BYOK
GET/v1/public/instagram/:username/taggedPosts user is tagged in BYOK
GET/v1/public/instagram/:username/mentionsPosts mentioning this user BYOK
GET/v1/public/instagram/hashtag/searchSearch hashtags BYOK
Per-Platform: Twitter / X
6 endpoints
BYOK: Pass ?token=BEARER_TOKEN. Get your Bearer Token from developer.twitter.com > Projects & Apps > Keys and Tokens.
GET/v1/public/twitter/:username/tweetsUser's recent tweets BYOK
GET/v1/public/twitter/:username/metricsFollower count and profile metrics BYOK
GET/v1/public/twitter/:username/followersList of followers BYOK
GET/v1/public/twitter/:username/followingAccounts the user follows BYOK
GET/v1/public/twitter/:username/likesUser's liked tweets BYOK
GET/v1/public/twitter/:username/listsUser's Twitter lists BYOK
Per-Platform: Facebook
9 endpoints
BYOK: Pass ?token=PAGE_ACCESS_TOKEN. Get it from Facebook Developer > Graph API Explorer or your App's Page tokens. Required for insights and private page data.
GET/v1/public/facebook/:page_id/infoPage information and metrics BYOK
GET/v1/public/facebook/:page_id/postsPage's recent posts BYOK
GET/v1/public/facebook/:page_id/insightsPage insights and analytics BYOK
GET/v1/public/facebook/:page_id/eventsPage events BYOK
GET/v1/public/facebook/:page_id/photosPage photos BYOK
GET/v1/public/facebook/:page_id/videosPage videos BYOK
GET/v1/public/facebook/:page_id/ratingsPage ratings and reviews BYOK
GET/v1/public/facebook/post/:post_id/commentsComments on a specific post BYOK
GET/v1/public/facebook/searchSearch Facebook pages BYOK
Per-Platform: Spotify
8 endpoints
BYOK: Pass ?client_id=X&client_secret=Y. Get credentials from Spotify Developer Dashboard > Create App > Settings.
GET/v1/public/spotify/:artist_id/tracksArtist's top tracks BYOK
GET/v1/public/spotify/:artist_id/albumsArtist's albums BYOK
GET/v1/public/spotify/:artist_id/relatedRelated artists BYOK
GET/v1/public/spotify/:artist_id/infoArtist information BYOK
GET/v1/public/spotify/searchSearch artists, tracks, albums BYOK
GET/v1/public/spotify/playlist/:idPlaylist details and tracks BYOK
GET/v1/public/spotify/track/:idTrack details and audio features BYOK
GET/v1/public/spotify/album/:idAlbum details and tracks BYOK
Per-Platform: Deezer
9 endpoints
Deezer's API is fully public — no authentication required.
GET/v1/public/deezer/artist/:artist_idArtist information PUBLIC
GET/v1/public/deezer/artist/:artist_id/topArtist's top tracks PUBLIC
GET/v1/public/deezer/artist/:artist_id/albumsArtist's albums PUBLIC
GET/v1/public/deezer/artist/:artist_id/relatedRelated artists PUBLIC
GET/v1/public/deezer/searchSearch artists, tracks, albums PUBLIC
GET/v1/public/deezer/user/:user_idUser profile data PUBLIC
GET/v1/public/deezer/track/:track_idTrack details PUBLIC
GET/v1/public/deezer/album/:album_idAlbum details PUBLIC
GET/v1/public/deezer/playlist/:playlist_idPlaylist details PUBLIC
Per-Platform: LinkedIn
5 endpoints
BYOK: Pass ?token=ACCESS_TOKEN. Get it from LinkedIn Developer Portal > My Apps > Auth. Requires OAuth 2.0 with r_liteprofile or r_organization_social scopes.
GET/v1/public/linkedin/:username/profileUser profile data BYOK
GET/v1/public/linkedin/:username/postsUser's recent posts BYOK
GET/v1/public/linkedin/:username/connectionsUser's connections count BYOK
GET/v1/public/linkedin/org/:org_idOrganization/company page data BYOK
GET/v1/public/linkedin/searchSearch LinkedIn profiles BYOK
Per-Platform: Twitch
6 endpoints
BYOK: Pass ?client_id=X&token=Y. Get them from Twitch Developer Console > Register Application. Use the Client Credentials flow for the OAuth token.
GET/v1/public/twitch/:username/channelChannel information BYOK
GET/v1/public/twitch/:username/streamsCurrent/recent streams BYOK
GET/v1/public/twitch/:username/followersFollower list BYOK
GET/v1/public/twitch/:username/videosPast broadcasts and VODs BYOK
GET/v1/public/twitch/:username/clipsTop clips BYOK
GET/v1/public/twitch/searchSearch Twitch channels BYOK
Per-Platform: Discord
3 endpoints
BYOK: Pass ?token=BOT_TOKEN. Get it from Discord Developer Portal > Bot > Token. The bot must be in the server to access guild data.
GET/v1/public/discord/user/:user_idDiscord user profile BYOK
GET/v1/public/discord/guild/:guild_idServer/guild information BYOK
GET/v1/public/discord/guild/:guild_id/channelsList of channels in a guild BYOK
Per-Platform: Telegram
2 endpoints
BYOK: Pass ?bot_token=BOT_TOKEN. Create a bot via @BotFather on Telegram and use the provided token.
GET/v1/public/telegram/channel/:chat_idChannel/group information BYOK
GET/v1/public/telegram/channel/:chat_id/membersMember count BYOK
Per-Platform: WhatsApp Business
8 endpoints
BYOK: Pass ?token=ACCESS_TOKEN. Get it from Meta Business Suite > WhatsApp > API Setup. Requires WhatsApp Business API access.
GET/v1/public/whatsapp/:phone_number_id/profileBusiness profile information BYOK
GET/v1/public/whatsapp/:phone_number_id/infoPhone number details BYOK
GET/v1/public/whatsapp/account/:waba_id/numbersList phone numbers in account BYOK
GET/v1/public/whatsapp/account/:waba_id/templatesMessage templates BYOK
POST/v1/public/whatsapp/:phone_number_id/sendSend a message via WhatsApp BYOK
GET/v1/public/whatsapp/account/:waba_id/analyticsAccount analytics BYOK
GET/v1/public/whatsapp/webhookWebhook verification (GET) BYOK
POST/v1/public/whatsapp/webhookWebhook event receiver (POST) BYOK
Per-Platform: Mastodon
3 endpoints
Mastodon is fully public and federated — no authentication required. Use the user@instance.social format for the handle.
GET/v1/public/mastodon/:handle/postsUser's recent posts/toots PUBLIC
GET/v1/public/mastodon/:handle/followersList of followers PUBLIC
GET/v1/public/mastodon/:handle/followingAccounts this user follows PUBLIC
Per-Platform: Kick
2 endpoints
Kick data is publicly accessible — no authentication required.
GET/v1/public/kick/:username/infoChannel/user information PUBLIC
GET/v1/public/kick/:username/clipsTop clips from channel PUBLIC
Per-Platform: Pinterest
5 endpoints
GET/v1/public/pinterest/:username/pinsUser's recent pins PUBLIC
GET/v1/public/pinterest/:username/boardsUser's boards PUBLIC
GET/v1/public/pinterest/:username/followersFollower list PUBLIC
GET/v1/public/pinterest/:username/followingAccounts followed PUBLIC
GET/v1/public/pinterest/searchSearch pins and boards PUBLIC
Per-Platform: Threads
1 endpoint
GET/v1/public/threads/:username/postsUser's recent Threads posts PUBLIC
Per-Platform: SoundCloud
3 endpoints
BYOK: Pass ?client_id=X. Get it from the SoundCloud Developer Portal (apps). Required for API access.
GET/v1/public/soundcloud/:username/profileUser profile information BYOK
GET/v1/public/soundcloud/:username/tracksUser's uploaded tracks BYOK
GET/v1/public/soundcloud/searchSearch tracks and users BYOK
Content Fetch
8 endpoints
Fetch individual content items (posts, videos, tweets) by their ID or URL.
GET/v1/public/fetch?url=URLUniversal content fetch by URL PUBLIC
GET/v1/public/instagram/post/:shortcodeFetch Instagram post by shortcode PUBLIC
GET/v1/public/instagram/reel/:shortcodeFetch Instagram reel by shortcode PUBLIC
GET/v1/public/youtube/video/:video_idFetch YouTube video details PUBLIC
GET/v1/public/reddit/post/:post_idFetch Reddit post and comments PUBLIC
GET/v1/public/bluesky/post/:handle/:rkeyFetch Bluesky post PUBLIC
GET/v1/public/twitter/tweet/:tweet_idFetch a tweet by ID PUBLIC
GET/v1/public/tiktok/video/detail/:video_idFetch TikTok video by ID PUBLIC
Batch Operations
3 endpoints
Process multiple users/checks in a single request. Send a JSON body with a list of usernames.
POST/v1/public/batch/profilesFetch profiles for multiple usernames at once PUBLIC
POST/v1/public/batch/checkCheck username availability across platforms PUBLIC
POST/v1/public/batch/compareCompare multiple users at once PUBLIC
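A sketch of a batch call; the doc says to send a JSON body with a list of usernames, so the exact key name ("usernames") is an assumption:
import requests

body = {"usernames": ["mkbhd", "pewdiepie", "charlidamelio"]}
r = requests.post("https://api.buycrowds.com/v1/public/batch/profiles",
                  json=body, timeout=120)
print(r.json())  # response structure not shown here; inspect before relying on fields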
Export
4 endpoints
Export profile data in different formats for reports, integrations, or sharing.
GET/v1/public/export/csv/:usernameExport profile data as CSV PUBLIC
GET/v1/public/export/json/:usernameExport profile data as JSON PUBLIC
GET/v1/public/export/markdown/:usernameExport profile data as Markdown PUBLIC
GET/v1/public/export/card/:usernameGenerate a shareable profile card PUBLIC
Trends
5 endpoints
Discover what's trending on each platform right now.
GET/v1/public/trends/redditTrending subreddits and posts PUBLIC
GET/v1/public/trends/blueskyTrending Bluesky content PUBLIC
GET/v1/public/trends/githubTrending GitHub repositories PUBLIC
GET/v1/public/trends/deezerTrending tracks on Deezer PUBLIC
GET/v1/public/trends/spotifyTrending on Spotify PUBLIC
Influence Scoring
3 endpoints
Calculate influence scores, categorize influencers, and rank them on leaderboards.
GET/v1/public/influence/:usernameInfluence score for a user PUBLIC
GET/v1/public/influence/category/:usernameInfluencer category classification PUBLIC
GET/v1/public/influence/leaderboardLeaderboard (?users=user1,user2,user3) PUBLIC
Hashtag Analysis
5 endpoints
Analyze hashtag performance and reach across platforms.
GET/v1/public/hashtag/instagram/:tagInstagram hashtag volume and top posts PUBLIC
GET/v1/public/hashtag/tiktok/:tagTikTok hashtag views and trending PUBLIC
GET/v1/public/hashtag/reddit/:tagReddit posts with this tag/keyword PUBLIC
GET/v1/public/hashtag/bluesky/:tagBluesky posts with this hashtag PUBLIC
GET/v1/public/hashtag/youtube/:tagYouTube videos with this tag PUBLIC
Timeline
1 endpoint
Unified cross-platform activity timeline for a user.
GET/v1/public/timeline/:usernameChronological timeline across all platforms PUBLIC
Verification
2 endpoints
Verify account authenticity and cross-platform identity consistency.
GET/v1/public/verify/:usernameVerification status across platforms PUBLIC
GET/v1/public/verify/cross/:usernameCross-platform identity verification PUBLIC
Reports
2 endpoints
Generate comprehensive reports with insights and recommendations.
GET/v1/public/report/:usernameFull analytical report PUBLIC
GET/v1/public/report/:username/strengthsStrengths and opportunities analysis PUBLIC
Link-in-Bio
2 endpoints
Auto-generated link-in-bio pages from cross-platform data.
GET/v1/public/bio/:usernameLink-in-bio data (JSON) PUBLIC
GET/v1/public/bio/:username/htmlRendered HTML link-in-bio page PUBLIC
Monitoring
2 endpoints
Monitor platform health and account status.
GET/v1/public/monitor/status/:platform/:usernameAccount status on a specific platform PUBLIC
GET/v1/public/monitor/healthPlatform health and availability PUBLIC
Embeds & Widgets
3 endpoints
Embeddable widgets and badges for websites. Also supports oEmbed protocol.
GET/v1/public/embed/:platform/:usernameEmbeddable profile widget PUBLIC
GET/v1/public/embed/:platform/:username/badgeSmall profile badge for embedding PUBLIC
GET/v1/public/oembedoEmbed protocol endpoint PUBLIC
Diff & Snapshots
2 endpoints
Take snapshots of profiles and compare changes over time.
GET/v1/public/diff/:platform/:usernameTake a snapshot of current state PUBLIC
POST/v1/public/diff/compareCompare two snapshots PUBLIC
Quick Connect OAuth
3 endpoints
Simplified OAuth flow for connecting social accounts without full dashboard registration.
GET/v1/auth/providersList available OAuth providers PUBLIC
POST/v1/auth/connect/:platformInitiate OAuth connection to a platform PUBLIC
GET/v1/auth/callbackOAuth callback handler PUBLIC
AI Generation (BYOK)
10 endpoints
AI-powered content generation and analysis. Requires API Key authentication.
BYOK + API Key: These endpoints require an API Key (header x-api-key) and may use your own AI provider keys passed in the request body. AI tokens are used in-memory only.
POST/v1/ai/generate-postGenerate a social media post API KEY
POST/v1/ai/generate-captionGenerate a caption for media API KEY
POST/v1/ai/generate-hashtagsGenerate relevant hashtags API KEY
POST/v1/ai/generate-imageGenerate an image for a post API KEY
POST/v1/ai/analyze-profileAI analysis of a social profile API KEY
POST/v1/ai/generate-bioGenerate an optimized bio API KEY
POST/v1/ai/content-calendarGenerate a content calendar API KEY
POST/v1/ai/reply-suggestionsSuggest replies to comments API KEY
POST/v1/ai/trend-analysisAI-powered trend analysis API KEY
POST/v1/ai/competitor-reportAI competitor analysis report API KEY
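A hedged sketch of an AI generation call: the x-api-key header is documented above, but the body fields (topic, platform, openai_api_key) are illustrative assumptions, not a confirmed schema:
import requests

headers = {"x-api-key": "YOUR_BUYCROWDS_API_KEY"}
body = {
    "topic": "spring product launch",        # assumed field name
    "platform": "instagram",                 # assumed field name
    "openai_api_key": "YOUR_PROVIDER_KEY",   # BYOK AI key, used in-memory only per the note above
}
r = requests.post("https://api.buycrowds.com/v1/ai/generate-post",
                  headers=headers, json=body, timeout=120)
print(r.json())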
Dashboard (JWT Auth)
28 endpoints
Dashboard API for managing your BuyCrowds account. Requires a JWT token from POST /api/auth/login. Pass it as Authorization: Bearer TOKEN.
Auth
POST/api/auth/registerCreate a new account PUBLIC
POST/api/auth/loginLogin and get JWT token PUBLIC
GET/api/auth/meCurrent user info JWT
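The documented flow in a short sketch: log in, grab the JWT, and pass it as a Bearer token (the login body field names and the "token" response key are assumptions):
import requests

BASE = "https://api.buycrowds.com"

# 1. Login. Field names assumed; adjust to the actual register/login schema.
login = requests.post(f"{BASE}/api/auth/login",
                      json={"email": "you@example.com", "password": "YOUR_PASSWORD"}).json()
token = login["token"]  # assumed key for the returned JWT

# 2. Authenticated dashboard call, as documented: Authorization: Bearer TOKEN.
me = requests.get(f"{BASE}/api/auth/me",
                  headers={"Authorization": f"Bearer {token}"})
print(me.json())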
Social Networks (BYOK Credentials)
GET/api/social-networksList connected networks JWT
POST/api/social-networksAdd a social network JWT
GET/api/social-networks/:idGet network details JWT
PATCH/api/social-networks/:idUpdate network config JWT
DELETE/api/social-networks/:idRemove a network JWT
POST/api/social-networks/:network/auth-urlGet OAuth URL for a network JWT
Social Accounts
GET/api/social-accountsList linked social accounts JWT
GET/api/social-accounts/:idAccount details JWT
DELETE/api/social-accounts/:idUnlink an account JWT
GET/api/social-accounts/:id/metricsAccount metrics JWT
GET/api/social-accounts/pending/:session_tokenCheck pending OAuth connection JWT
POST/api/social-accounts/pending/:session_token/finalizeFinalize OAuth connection JWT
Press Kits
GET/api/press-kitsList your press kits JWT
POST/api/press-kitsCreate a press kit JWT
PATCH/api/press-kits/:idUpdate a press kit JWT
DELETE/api/press-kits/:idDelete a press kit JWT
GET/api/pk/:slugPublic press kit by slug PUBLIC
Posts & Content
POST/api/postsCreate a multipost JWT
GET/api/postsList your posts JWT
GET/api/posts/:idGet post details JWT
DELETE/api/posts/:idDelete a post JWT
API Keys & Webhooks
GET/api/api-keysList your API keys JWT
POST/api/api-keysCreate an API key JWT
DELETE/api/api-keys/:idRevoke an API key JWT
GET/api/webhooksList webhooks JWT
POST/api/webhooksCreate a webhook JWT
DELETE/api/webhooks/:idDelete a webhook JWT
Connections (Legacy)
GET/api/connectionsList platform connections JWT
GET/api/connections/:idConnection details JWT
DELETE/api/connections/:platformDisconnect a platform JWT
GET/api/connections/:id/metrics/:metric_typeConnection metrics JWT
POST/api/oauth/:platform/callbackOAuth callback for platform JWT
API v1 (API Key + Rate Limited)
18 endpoints
External API for integrations. Requires the x-api-key header. Rate limited per key. Create keys at POST /api/api-keys.
Accounts
GET/v1/accountsList connected accounts API KEY
GET/v1/accounts/:idAccount details API KEY
Posts
POST/v1/postsCreate a post API KEY
GET/v1/postsList posts API KEY
GET/v1/posts/:idGet post by ID API KEY
PATCH/v1/posts/:idUpdate a post API KEY
DELETE/v1/posts/:idDelete a post API KEY
GET/v1/posts/:id/resultsPost publishing results API KEY
POST/v1/posts/:id/retryRetry failed post API KEY
Media
POST/v1/media/uploadUpload media file API KEY
GET/v1/mediaList uploaded media API KEY
GET/v1/media/:idMedia details API KEY
DELETE/v1/media/:idDelete media API KEY
Other
GET/v1/platformsSupported platforms info API KEY
GET/v1/usageCurrent API usage stats API KEY
GET/v1/quotaLive quota — rate limit buckets + MTD cost + today cost + budget cap + daily cap + credits + burn summary + cache savings API KEY
One call, everything you need to self-throttle. Returns the tier's canonical ceilings (per minute/hour/day), current bucket counts (read-only Hammer probe), month-to-date spend in USD, linear projected end-of-month cost, budget cap status (configured? hard_enforce? headroom? over_cap? projected_over_cap?), and next reset timestamps for each window.
Budget enforcement (402 Payment Required): when you configure PUT /v1/full-scrape/budget-cap with hard_enforce: true, every subsequent /v1 request is guarded at the plug layer. If month_to_date_usd >= monthly_usd_cap, the request short-circuits with a 402 response and a structured error payload pointing to PUT /v1/full-scrape/budget-cap (raise or disable) or GET /v1/public/cost/tiers (upgrade). The cap is checked via a 30s ETS cache over a single indexed sum(cost_usd) query — negligible per-request overhead.
Response headers (on EVERY /v1 call):
• x-ratelimit-tier, x-ratelimit-limit, x-ratelimit-remaining — current tier + per-minute window
• x-budget-cap-usd, x-budget-spent-usd, x-budget-remaining-usd, x-budget-used-pct — only when a cap is configured
• x-cost-category, x-cost-marginal-usd, x-tier — marginal cost of this request
Use case: a dashboard polling /v1/quota every 10s can render a full cost + rate-limit widget without hitting any other endpoint. A production client can read the x-budget-* headers from any response and pre-emptively back off before hitting the hard cap.
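A sketch of that pre-emptive back-off pattern, reading the documented headers off a /v1 response (the safety thresholds and sleep durations are arbitrary choices):
import time
import requests

def guarded_get(url, api_key):
    r = requests.get(url, headers={"x-api-key": api_key}, timeout=30)
    if r.status_code == 402:
        # Hard budget cap hit: the structured error points at the budget-cap / tiers endpoints.
        raise RuntimeError(f"budget cap exceeded: {r.json()}")
    # Headers documented above; values arrive as strings.
    remaining = int(r.headers.get("x-ratelimit-remaining", "1"))
    budget_left = float(r.headers.get("x-budget-remaining-usd", "inf"))
    if remaining == 0:
        time.sleep(60)       # minute window exhausted: wait for the next window
    if budget_left < 1.0:    # arbitrary safety margin before the hard cap
        time.sleep(300)
    return r

r = guarded_get("https://api.buycrowds.com/v1/quota", "YOUR_API_KEY")
print(r.json())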
GET/v1/creator/:slugCreator data by slug API KEY
POST/v1/profiles/scoreScore a profile (Super-Premium) API KEY
GET/v1/profiles/scoring-objectivesAvailable scoring objectives API KEY
MCP Server (Model Context Protocol)
1 endpoint
Model Context Protocol server. Supports tools/list, initialize, and tools/call methods via JSON-RPC 2.0.
POST/mcpMCP JSON-RPC handler (tools/list, initialize, tools/call) PUBLIC
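A minimal JSON-RPC 2.0 sketch against the documented methods (initialize, tools/list, tools/call):
import requests

rpc = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
r = requests.post("https://api.buycrowds.com/mcp", json=rpc, timeout=30)
print(r.json())  # JSON-RPC result listing the available tools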
OAuth Integration System
8 endpoints
Multi-provider OAuth integration management. Create integrations for any supported platform and manage tokens.
GET/v1/oauth/providersList available OAuth providers API KEY
POST/v1/oauth/integrationsCreate an OAuth integration API KEY
GET/v1/oauth/integrationsList your integrations API KEY
DELETE/v1/oauth/integrations/:idDelete an integration API KEY
POST/v1/oauth/integrations/:id/authorizeGet authorization URL API KEY
POST/v1/oauth/integrations/:id/refreshRefresh OAuth token API KEY
POST/v1/oauth/integrations/:id/testTest integration connectivity API KEY
GET/v1/oauth/callbackOAuth callback (redirect target) PUBLIC
Vault (Secret Management)
3 endpoints
Manage BYOK secrets securely. Store, list, and revoke connector credentials.
POST/v1/vault/secretsStore a BYOK secret API KEY
GET/v1/vault/secretsList stored secrets (metadata only) API KEY
DELETE/v1/vault/secrets/:connector_idRevoke a stored secret API KEY
Per-Platform: GitHub
12 endpoints
GitHub data is mostly public. No auth required for public repos/profiles.
GET/v1/public/github/:username/reposUser's repositories PUBLIC
GET/v1/public/github/:username/followersUser's followers PUBLIC
GET/v1/public/github/:username/followingAccounts user follows PUBLIC
GET/v1/public/github/:username/eventsRecent public events PUBLIC
GET/v1/public/github/:username/starredStarred repositories PUBLIC
GET/v1/public/github/:username/orgsOrganizations PUBLIC
GET/v1/public/github/:username/gistsPublic gists PUBLIC
GET/v1/public/github/:username/languagesProgramming languages used PUBLIC
GET/v1/public/github/:username/contributionsContribution graph data PUBLIC
GET/v1/public/github/:username/socialSocial accounts linked to GitHub PUBLIC
GET/v1/public/github/repo/:owner/:repoRepository details PUBLIC
GET/v1/public/github/repo/:owner/:repo/contributorsRepository contributors PUBLIC
Network Analysis
3 endpoints
Analyze social networks, find overlaps, and get connection suggestions.
GET/v1/public/network/:usernameSocial network graph PUBLIC
GET/v1/public/network/overlap/:user1/:user2Network overlap between two users PUBLIC
GET/v1/public/network/suggest/:usernameSuggested connections PUBLIC
Media Analysis
2 endpoints
GET/v1/public/media/summary/:usernameMedia content summary across platforms PUBLIC
GET/v1/public/media/calendar/:usernameContent publishing calendar PUBLIC
Collaboration
2 endpoints
Find collaboration matches and compatibility between creators.
GET/v1/public/collab/match/:user1/:user2Collaboration compatibility score PUBLIC
POST/v1/public/collab/findFind potential collaborators PUBLIC
Brand Safety
2 endpoints
GET/v1/public/brand-safety/:usernameBrand safety check PUBLIC
GET/v1/public/brand-safety/:username/detailedDetailed brand safety report PUBLIC
Audience Analysis
2 endpoints
GET/v1/public/audience/:usernameAudience profile and demographics PUBLIC
GET/v1/public/audience/compare/:user1/:user2Compare audiences of two users PUBLIC
SEO Analysis
2 endpoints
GET/v1/public/seo/:usernameSocial SEO score PUBLIC
GET/v1/public/seo/:username/backlinksSocial backlinks analysis PUBLIC
Pricing & ROI
3 endpoints
GET/v1/public/pricing/:usernameEstimated sponsorship pricing PUBLIC
GET/v1/public/pricing/roi/:usernameROI estimate for campaigns PUBLIC
GET/v1/public/pricing/compare/:user1/:user2Compare pricing between creators PUBLIC
Campaign Planning
2 endpoints
POST/v1/public/campaign/planPlan an influencer campaign PUBLIC
GET/v1/public/campaign/suggest/:usernameSuggest campaign strategies PUBLIC
AI Analysis (Public)
3 endpoints
AI-powered public analysis endpoints. No auth required.
GET/v1/public/ai/persona/:usernameAI persona analysis PUBLIC
GET/v1/public/ai/predict/:usernameGrowth prediction PUBLIC
GET/v1/public/ai/content-ideas/:usernameContent ideas generation PUBLIC
Reputation
2 endpoints
GET/v1/public/reputation/:usernameReputation score PUBLIC
GET/v1/public/reputation/:username/historyReputation history over time PUBLIC
Fake Detection
2 endpoints
GET/v1/public/fake-check/:usernameAnalyze account authenticity PUBLIC
GET/v1/public/fake-check/compare/:user1/:user2Compare authenticity of two accounts PUBLIC
Growth Hacking
3 endpoints
GET/v1/public/growth/tips/:usernamePersonalized growth tips PUBLIC
GET/v1/public/growth/best-time/:usernameBest times to post PUBLIC
GET/v1/public/growth/hashtags/:usernameHashtag strategy recommendations PUBLIC
Competitor Analysis
2 endpoints
GET/v1/public/competitor/:usernameCompetitor analysis PUBLIC
GET/v1/public/competitor/gap/:user1/:user2Gap analysis between competitors PUBLIC
More Tools (Sentiment, Links, Quiz, Benchmarks, Portfolio, Monetization, Niche, Stats, Power, Digest & more)
45 endpoints
Sentiment
GET/v1/public/sentiment/:usernameSentiment analysis of user's content PUBLIC
Link Tracker
GET/v1/public/links/:usernameScan all links in user's bios and posts PUBLIC
Quiz / Fun Analysis
GET/v1/public/quiz/social-iq/:usernameSocial IQ quiz based on data PUBLIC
GET/v1/public/quiz/personality/:usernameSocial personality type PUBLIC
Benchmarks
GET/v1/public/benchmark/industry/:usernameIndustry benchmark comparison PUBLIC
GET/v1/public/benchmark/percentile/:usernamePercentile ranking PUBLIC
Portfolio / Media Kit
GET/v1/public/portfolio/:usernameAuto-generated portfolio PUBLIC
GET/v1/public/portfolio/:username/media-kitMedia kit for brands PUBLIC
Rate Check
GET/v1/public/rate-checkAPI rate limit status PUBLIC
Monetization
GET/v1/public/monetization/:usernameMonetization opportunities PUBLIC
GET/v1/public/monetization/:username/revenueRevenue estimate PUBLIC
Crosspost Strategy
GET/v1/public/crosspost/:usernameCrosspost strategy recommendations PUBLIC
GET/v1/public/crosspost/:username/scheduleOptimal crosspost schedule PUBLIC
Archive / Snapshots
GET/v1/public/archive/:usernameSnapshot all platform data PUBLIC
GET/v1/public/archive/:username/compareCompare historical snapshots PUBLIC
Demographics
GET/v1/public/demographics/:usernameEstimated audience demographics PUBLIC
Niche Detection
GET/v1/public/niche/:usernameDetect creator's niche PUBLIC
GET/v1/public/niche/suggestions/:nicheGet creators in a niche PUBLIC
Namecheck
GET/v1/public/namecheck/:usernameCheck username availability everywhere PUBLIC
GET/v1/public/namecheck/suggest/:usernameSuggest available username variations PUBLIC
Stats
GET/v1/public/stats/platformsPlatform statistics overview PUBLIC
GET/v1/public/stats/apiAPI usage statistics PUBLIC
Power Score
GET/v1/public/power/:usernameCalculate cross-platform power score PUBLIC
Daily Digest
GET/v1/public/digest/:usernameDaily activity digest PUBLIC
Similar Creators
GET/v1/public/similar/:usernameFind similar creators PUBLIC
Engagement Calculator
GET/v1/public/engagement/:usernameCalculate engagement rate PUBLIC
Achievements
GET/v1/public/achievements/:usernameCheck social achievements/milestones PUBLIC
Viral Potential
GET/v1/public/viral/:usernameViral potential score PUBLIC
Rate Card
GET/v1/public/rate-card/:usernameGenerate sponsorship rate card PUBLIC
Platform Compare
GET/v1/public/platform-compare/:usernameCompare performance across platforms PUBLIC
Summary
GET/v1/public/summary/:usernameOne-liner profile summary PUBLIC
GET/v1/public/summary/:username/pitchElevator pitch for creator PUBLIC
Media Kit HTML
GET/v1/public/media-kit/:usernameRendered HTML media kit PUBLIC
Consistency Check
GET/v1/public/consistency/:usernameCheck branding consistency PUBLIC
Clone Detection
GET/v1/public/clone-check/:usernameDetect impersonator/clone accounts PUBLIC
Worth Estimation
GET/v1/public/worth/:usernameEstimate total account worth PUBLIC
Account Age
GET/v1/public/account-age/:usernameCheck account age across platforms PUBLIC
Social DNA
GET/v1/public/dna/:usernameAnalyze social DNA and behavior patterns PUBLIC
Security Shield
GET/v1/public/shield/:usernameSecurity and privacy scan PUBLIC
Fanbase Analysis
GET/v1/public/fanbase/:usernameAnalyze fanbase composition PUBLIC
Social Wrapped
GET/v1/public/wrapped/:usernameGenerate yearly social media wrapped PUBLIC
Generic Platform Data
GET/v1/public/:platform/:usernamePublic data for any platform PUBLIC
Cost & Subscription
19 endpoints
Full transparency on what the API costs to run and what you'll pay.
Subscription tiers, rate limits, amortized infrastructure costs per request,
and a live estimator so you can model your monthly bill before you commit.
Pricing model: Free tier is subsidized by paid plans.
Scraper-backed endpoints (Apify) carry the highest marginal cost.
BYOK AI endpoints are zero marginal cost to us — you pay your model provider directly.
Every /v1 response ships with cost headers:
• x-cost-category — which billing bucket the endpoint hit
• x-cost-marginal-usd — our amortized marginal cost for the call
• x-tier — your resolved subscription tier
Clients can track spend out-of-band without an extra billing round-trip.
Subscription Tiers
GET/v1/public/cost/tiersAll subscription tiers with pricing, limits & features PUBLIC
Rate Limits
GET/v1/public/cost/rate-limitsPer-tier rate limits, response headers & enforcement rules PUBLIC
Infrastructure Costs (amortized)
GET/v1/public/cost/infrastructureCompute, DB, bandwidth, scraper & monitoring cost per 1k requests PUBLIC
Monthly Estimator
GET/v1/public/cost/estimate?tier=pro&requests_per_day=5000&scraper_ratio=0.3Project your monthly bill given usage & tier PUBLIC
Cost Breakdown by Endpoint Category
GET/v1/public/cost/breakdownWeighted cost per request across endpoint categories PUBLIC
Machine-readable Rate Card (SDK-friendly)
GET/v1/public/cost/rate-cardCanonical pricing spec — tiers, categories, infra & credits in one stable payload PUBLIC
Tier Comparison (side-by-side)
GET/v1/public/cost/compare?tiers=free,starter,pro,business&requests_per_day=10000Compare multiple tiers at a given usage level — highlights the cheapest fit PUBLIC
Subscription & Billing Info
GET/v1/public/cost/subscriptionBilling provider, payment methods, trial, cancellation & refund policy PUBLIC
Your Actual Usage & Bill (API key required)
Authenticated endpoints — pass your API key via the X-API-Key header.
These read from the live rate-limit buckets and project a personalized monthly bill.
GET/v1/cost/my-usageLive counters for your minute/hour/day buckets + full scrape credits, % of limit & health indicator API KEY
GET/v1/cost/my-billProjected monthly bill based on your observed traffic API KEY
GET/v1/cost/my-summaryUnified dashboard — requests + credits + bill + tier recommendation in one call API KEY
GET/v1/cost/upgrade-recommendationBest-fit tier for your actual usage pattern (factors in full scrape credits too) API KEY
GET/v1/cost/tier-recommendationBidirectional tier recommender — classifies direction (upgrade/downgrade/optimal) with monthly savings/extra cost API KEY
GET/v1/cost/tier-recommendationsFull cost matrix per tier: subscription + projected overage = total. Marks ineligible (Free blocks overage). Picks cheapest eligible API KEY
GET/v1/cost/rate-limit-forecastPredictive exhaustion: given current req/sec velocity, how many seconds until minute/hour/day limit is hit. Identifies binding window API KEY
GET/v1/cost/tier-safety-check?tier=starter&days=30Backward-looking: would tier X have worked for the last N days? Peak rate vs target limits, cost delta, overage blocked check API KEY
GET/v1/cost/labels-breakdown?since=2026-04-01T00:00:00Z&limit=50Spend by metadata.labels (unnested server-side). One event with multiple labels contributes to each bucket — labels are orthogonal tags API KEY
GET/v1/cost/hourly-timeseries?hours=48&cost_center=X&fill_gaps=trueHour-bucketed spend time series via date_trunc. Optional center filter. fill_gaps=true inserts zero rows for quiet hours API KEY
GET/v1/cost/spike-detector?hours=168&baseline_hours=120&threshold=3.0Z-score anomaly detection on hourly spend. Flags detection-window hours where |z| ≥ threshold API KEY
GET/v1/cost/center-drift?current_days=7&baseline_days=7&min_delta_pct=5Per-center share_pct drift between two back-to-back windows. Flags new/dropped centers + biggest movers sorted by |delta| API KEY
GET/v1/cost/hourly-timeseries/export?hours=168&cost_center=XCSV download of iter 131 hourly series. Always fill_gaps=true. Content-Disposition attachment, max 720 rows API KEY
GET/v1/cost/platform-breakdown?since=2026-04-01T00:00:00Z&cost_center=XSpend aggregated by metadata.platform (instagram/tiktok/etc). Optional cost_center drill-down. Unknown bucket for missing tags API KEY
GET/v1/cost/platform-center-crosstab?since=2026-04-01T00:00:00Z2D matrix: platform × cost_center spend. Sorted axes + row/col totals + grand total. Heatmap-ready API KEY
GET/v1/cost/histogram?bucket_count=10&since=2026-04-01T00:00:00ZCost-per-event distribution via Postgres width_bucket. Min/max/avg/p50/p95/p99 + N bucketed ranges + skew flag API KEY
GET/v1/cost/top-expensive?limit=20&cost_center=X&event_type=full_scrape_overageTop-N scrapes ordered by cost_usd DESC. Drill-down pair with /histogram to identify heavy-tail offenders API KEY
GET/v1/cost/top-usernames?limit=20&cost_center=XAggregated top-N most scraped handles. GROUP BY metadata.username with event count, total cost, avg cost, share_pct API KEY
GET/v1/cost/refund-rate-by-center?days=30&min_scrapes=5Reliability-as-money per center. SQL pivot of scrapes vs refunds, refund_rate_pct + healthy/watch/degraded/critical tier API KEY
GET/v1/cost/pareto?threshold_pct=80 80/20 concentration analysis by cost_center. Cumulative share + pareto_80_20 flag + interpretation string API KEY
GET/v1/cost/refund-rate-by-platform?days=30&min_scrapes=5Upstream provider reliability signal. Per-platform refund rate + tier + worst_platform callout API KEY
POST/v1/cost/project-scenarioProspective calculator. Body {scenarios: [{name, daily_scrapes, price_per_scrape_usd, days}]}. Checks each against current headroom API KEY
POST/v1/cost/estimate-batchData-driven per-username estimate. Body {usernames[], default_price_usd?, days?}. Uses historical avg cost, falls back to default for new handles API KEY
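A sketch of the documented project-scenario body (values illustrative):
import requests

body = {"scenarios": [
    {"name": "onboarding-wave", "daily_scrapes": 40,
     "price_per_scrape_usd": 0.05, "days": 30},
]}
r = requests.post("https://api.buycrowds.com/v1/cost/project-scenario",
                  headers={"X-API-Key": "YOUR_API_KEY"}, json=body, timeout=30)
print(r.json())  # each scenario is checked against current headroom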
GET/v1/cost/healthCompressed status gauge: healthy/warning/critical/exceeded + used_pct + binding_cap + one-liner. Widget-ready polling endpoint API KEY
GET/v1/cost/top-usernames/export?limit=200&cost_center=XCSV export of iter 141 top-usernames. Columns: username, count, total, avg, share_pct, first/last event_at API KEY
GET/v1/cost/cap-etaCap exhaustion predictor: ETA for monthly + daily caps at current velocity. Identifies binding cap. Pure compute via CapExhaustionPredictor module API KEY
GET/v1/cost/paceBudget pace tracker: actual vs linear-pace expected at this point in month. Status way_ahead/ahead/on_pace/behind/way_behind + ahead_or_behind_days API KEY
GET/v1/cost/volatility?days=30Spend volatility analyzer: stddev + coefficient of variation + class (stable/moderate/volatile/chaotic). Pure compute via SpendVolatility module API KEY
GET/v1/cost/forecast-bands?days=14&forecast_days=7Linear regression spend forecast with 95% prediction intervals. Slope, intercept, r_squared, residual_se + per-day expected/lower/upper bands API KEY
GET/v1/cost/top-expensive/export?limit=200&cost_center=XCSV export of iter 140 top-expensive events. RFC 4180 escaping. Content-Disposition attachment. Max 200 rows API KEY
GET/v1/cost/proration?target_tier=pro&switch_date=2026-04-15Prorated subscription delta for a mid-month tier switch API KEY
GET/v1/cost/alerts-log?event_type=budget_cap_threshold_crossed&since=2026-04-01T00:00:00Z&limit=50Historical log of cost alert fires — threshold crossings + daily digests + delivery outcomes API KEY
Cost alert fire history (iter 108). Every time the CostAlerts GenServer decides to fire — whether it's a credit threshold crossing (iter 63), a budget threshold crossing, or a daily digest (iter 93) — a row is written to the new cost_alert_fires table. This endpoint queries that history with filters.
Distinct from /webhook-deliveries: /v1/full-scrape/webhook-deliveries (iter 86) shows Oban retry queue state — which attempts are scheduled, retrying, or discarded. /cost/alerts-log is the PRE-Oban ledger: "at this timestamp, our GenServer decided to fire an alert, and after the delivery attempt the outcome was X". The two together give you the full picture — alerts-log for business-logic audit, webhook-deliveries for network-layer debugging.
Response shape: {count, summary: {credit_threshold_crossed: N, budget_cap_threshold_crossed: M, daily_spend_digest: K}, filters, fires: [{id, event_type, threshold_level, payload, fired_at, delivery_outcome, inserted_at}], retention_days: 60, note}.
delivery_outcome field: stamped asynchronously after the webhook delivery attempt completes. Possible values: "delivered" (200 OK on first try), "retry_scheduled" (first attempt failed, the Oban retry queue picked it up), "error:<reason>" (retry queue insertion itself failed — rare). null if the row was just inserted and the async delivery Task hasn't completed yet.
Retention: 60 days, pruned hourly by RetentionSweeper (iter 44) alongside snapshots and billing_events. Indexed on (api_key_id, fired_at) and (api_key_id, event_type) for efficient filtering.
Use case — compliance audit: "show me all budget threshold crossings for April". One call: GET /v1/cost/alerts-log?event_type=budget_cap_threshold_crossed&since=2026-04-01T00:00:00Z&until=2026-05-01T00:00:00Z. The response lists every fire with payload + outcome. The auditor cross-references against Slack channel logs to verify alerts were acted upon.
Use case — silence investigation: "why didn't I get the 80% alert last week?". Call with a filter for the expected window — if a row exists with delivery_outcome: "error:...", the alert fired but delivery failed (inspect webhook URL health). If no row exists, the threshold wasn't crossed in the stored state (inspect the last_fired dedup logic).
Tier switch proration (iter 105). /cost/tier-recommendation tells you WHICH tier fits your usage. This endpoint tells you WHAT the mid-month switch would cost in dollars — the prorated delta between the two tier fees for the remaining days of the current cycle. Pure compute, no state change.
Proration math:
• days_remaining = days_in_month - switch_date.day + 1 (inclusive of the switch day)
• proration_factor = days_remaining / days_in_month
• current_tier_unused_refund = current_fee × proration_factor — credit for unused time on the current tier
• target_tier_prorated_charge = target_fee × proration_factor — charge for the new tier for the remaining days
• delta_usd = target_charge - current_refund — positive means you owe extra, negative means you get a refund
Response shape: includes all intermediate math values so UIs can render a breakdown, plus a direction classifier (upgrade / downgrade / no_change / lateral), owes_delta? / receives_refund? booleans for fast branching, and a ready-to-render call_to_action string.
Enterprise special case: the Enterprise tier has no fixed monthly fee (price_monthly_usd: nil). Prorating to/from Enterprise returns special_case: "contact_sales" — proration math is negotiated directly, not computed automatically.
Parameters: target_tier (required, one of free/starter/pro/business/enterprise) and switch_date (optional, YYYY-MM-DD, defaults to today UTC).
Billing policy caveat: this endpoint calculates the MATHEMATICAL proration. Actual billing depends on your payment provider's policy — some providers bill the prorated delta immediately, others roll it into the next cycle, others offer credit that gets applied to future invoices. Treat this as "what the fair delta would be", not "what gets charged to my card tonight".
Use case — upgrade confirmation dialog: a user on Starter hits a rate limit and the UI pops an "upgrade to Pro" dialog. Instead of "upgrade for $99/month" (which is misleading mid-cycle), the UI calls /v1/cost/proration?target_tier=pro and renders "Upgrading today costs $35 prorated ($70 - $0 refund). Full monthly fee of $99 starts next cycle." — the user sees the true mid-month impact, not a misleading full-month number.
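The proration formulas above as a small reference sketch (illustrative fees only; the endpoint remains the source of truth for real tier prices):
from calendar import monthrange
from datetime import date

def proration(current_fee, target_fee, switch_date):
    days_in_month = monthrange(switch_date.year, switch_date.month)[1]
    days_remaining = days_in_month - switch_date.day + 1   # inclusive of switch day
    factor = days_remaining / days_in_month
    refund = current_fee * factor      # credit for unused time on the current tier
    charge = target_fee * factor       # charge for the new tier for the remaining days
    return charge - refund             # positive: you owe extra; negative: refund

# Illustrative fees; real prices come from /v1/public/cost/tiers.
print(proration(29.0, 99.0, date(2026, 4, 15)))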
Tier recommendation with direction (iter 88). The existing /upgrade-recommendation returns the best-fit tier but conflates both directions — a Pro user with light usage gets the same "you should change" signal as a Free user who needs to upgrade. /tier-recommendation is explicit: it classifies the direction and surfaces concrete savings or extra cost.
Direction classification:
• upgrade — the recommended tier is more expensive than current. The current tier is bottlenecking actual usage.
• downgrade — the recommended tier is cheaper than current. You're overpaying for headroom you don't use.
• optimal — already on the cheapest tier that fits your pattern. No change recommended.
• enterprise — projected volume exceeds all self-serve tiers (contact sales).
Response shape: {current_tier, current_monthly_price_usd, recommended_tier, recommended_monthly_price_usd, direction, delta_usd, monthly_savings_usd, extra_cost_usd, projected_daily_requests, monthly_full_scrapes, recommendation_detail, call_to_action, note}. Use monthly_savings_usd to render a "downgrade and save $X" banner, extra_cost_usd for an "upgrade for $Y more/month" CTA.
Math source: both endpoints share the same Billing.recommend_tier/2 internal function that ranks tiers by monthly cost and filters out options that would exceed daily rate limits or full scrape quota. The difference is purely presentation — /tier-recommendation adds direction classification on top of the same source of truth.
Use case — Q2 cost review: the CFO asks "can we save on subscriptions?". An operator runs GET /v1/cost/tier-recommendation across all API keys and surfaces every one where direction: "downgrade". Each entry shows exact monthly savings. The agency defers the change to a safe moment (mid-month, after the current billing cycle settles), then the user-facing dashboard renders "Your usage fits Starter — save $70/month by downgrading".
Reliability score (rolling success rate)
GET/v1/cost/reliability?days=30Rolling success rate + upstream health classification + top failing usernames from billing events API KEY
"Is my upstream healthy?" Answers that in one call. Pure
aggregation over billing events — no new state, no new tables. Counts
Response fields:
•
•
•
•
•
•
•
Health thresholds:
•
•
•
Insufficient data safeguard: below 10 attempts in the window, the endpoint returns
Use case — operator dashboard: agency dashboard polls
:full_scrape_consume + :full_scrape_overage as total attempts,
:full_scrape_refund as failures (iter 54/55 refund model), computes the ratio over a
configurable window (default 30 days, max 90).Response fields:
•
stats.total_attempts / successful / refunded — raw counts•
stats.success_rate / success_rate_pct — 0.0-1.0 and human percent•
stats.refund_rate / refund_rate_pct — the inverse signal•
health — one of :insufficient_data (fewer than 10 attempts),
:healthy (≥98%), :degraded (≥90%), :poor (<90%)•
top_failing_usernames — array of up to 10 usernames grouped by refund count,
with refund_count and total_refund_usd per username. SQL GROUP BY on
metadata->>'username' scoped to the refund event type•
narrative — human-readable strings ready for Slack/dashboard banner, context-aware
per health bucket•
drill_down — direct links to the events log and CSV export pre-filtered to the
failure window, for deeper investigationHealth thresholds:
•
healthy — 98%+ success rate, flaky recovery is working. Acceptable steady-state.•
degraded — 90-98% success rate. Investigate trends; something upstream is softer
than normal. Common causes: platform rate-limiting, specific accounts going private, Apify actor
version drift.•
poor — <90% success rate. Critical — expect follow-up from the CostAlerts
webhook (iter 63) as budget/refund numbers also trip. Consider freezing scheduled runs until
resolved.Insufficient data safeguard: below 10 attempts in the window, the endpoint returns
health: "insufficient_data" instead of a misleading
rate. A 0/1 refund shouldn't read as 0% success.Use case — operator dashboard: agency dashboard polls
/v1/cost/reliability every few minutes. The narrative banner shows current state.
When it flips to degraded or poor, the top_failing_usernames list
surfaces exactly which creators are causing the drop — so the operator can freeze that creator's
schedules, contact the client, or investigate via the drill_down links without hunting through
raw logs.
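The health classification described above as a small client-side sketch (thresholds taken from the list; the counts are illustrative):
def classify(total_attempts, refunded):
    # Mirrors the documented buckets: <10 attempts is insufficient data,
    # >=98% healthy, >=90% degraded, otherwise poor.
    if total_attempts < 10:
        return "insufficient_data"
    success_rate = (total_attempts - refunded) / total_attempts
    if success_rate >= 0.98:
        return "healthy"
    if success_rate >= 0.90:
        return "degraded"
    return "poor"

print(classify(250, 12))  # 95.2% success -> "degraded"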
Period comparison (month-over-month)
GET/v1/cost/compare-periods?from=2026-03&to=2026-04Month-over-month comparison — totals + per-cost-center deltas + top movers + narrative API KEY
"How does this month compare to last month?" Answers the
single most common question in agency reporting. One call, zero client-side aggregation.
Defaults:
Response shape:
•
•
•
•
•
Use case — monthly client review: agency calls
Reconciles with:
Defaults:
from = previous month,
to = current month. Override with ?from=YYYY-MM&to=YYYY-MM. Year range
is 2000-2100, month 1-12. Invalid format returns 400 with a format hint.Response shape:
•
from / to — period descriptors with year, month, label, start, end•
totals.{from, to, delta} — each delta has {from, to, delta, delta_pct} for:
event_count, gross_cost_usd, refund_cost_usd,
net_cost_usd, included_count, overage_count,
refunded_count, scheduled_run_count•
cost_centers.{from, to, deltas} — per-center aggregate + delta array sorted by
delta_usd descending. Each center delta includes
{cost_center, from_cost_usd, to_cost_usd, delta_usd, delta_pct, appeared, disappeared}.
appeared/disappeared flags catch centers that onboarded or churned between
periods.•
top_movers.{gainers, losers} — top 5 centers with biggest positive/negative delta
(sorted by magnitude). Agencies render "growth" and "churn" columns directly from these.•
narrative — array of human-readable strings ready for Slack/email/dashboard banner,
e.g. "Spending up 23.4% month-over-month (+$42.15). Biggest gainer: client-acme (+$28.50)."Use case — monthly client review: agency calls
/v1/cost/compare-periods on the 1st and pastes the narrative into a Slack channel for
each client. The top_movers object feeds a "growth leaders / churn alerts" table in
the internal dashboard. appeared: true centers get a welcome email; disappeared: true
gets a retention ping.Reconciles with:
/v1/cost/invoice for absolute
numbers of either period individually, /v1/cost/centers for live current-month state,
and /v1/cost/burn-down for the forward-looking projection. Together they form the
complete monthly reporting pipeline.
Daily digest (on-demand spend summary)
GET/v1/cost/digest?date=2026-04-08Same payload as the iter 93 daily_digest webhook, returned as an API response for any date API KEY
On-demand variant of the iter 93 webhook digest. The daily_digest flag on CostAlert (iter 93) fires a webhook once per day automatically. This endpoint returns the SAME payload as an API response — useful for:
• Agencies that want to pull digests on their own schedule (e.g. business hours, not UTC midnight)
• Historical backfill — compute yesterday's or last Monday's digest retroactively
• Dashboards that prefer pull over webhook delivery
• Debugging / testing before enabling the auto-digest webhook
Date parameter: ?date=YYYY-MM-DD. Defaults to yesterday UTC. Bounded by the 60-day billing event retention window (iter 44 RetentionSweeper) — older dates return 400 invalid_date. Future dates are also rejected.
Response shape: {api_key_id, date, period: {since, until}, totals: {event_count, total_cost_usd}, by_type: {full_scrape_consume, full_scrape_overage, full_scrape_refund, scrape_scheduled_run} each with {count, cost_usd}, top_cost_centers[] (up to 5), top_creators[] (up to 5, iter 80 aggregator)}. Slightly richer than the webhook payload — it includes top_creators because the on-demand endpoint can afford the extra query.
Composition with /cost/overview: the overview (iter 92) shows MTD totals across the whole month. The digest shows a SINGLE DAY's breakdown. Agencies wanting a "today vs yesterday" comparison call both and render them side by side.
Use case — custom delivery cadence: an agency wants the digest to arrive in their ops Slack at 9am local time (not UTC midnight). They run their own cron that calls GET /v1/cost/digest?date=<yesterday> at 9am local, formats the response, and posts to Slack. The iter 93 webhook firing at 00-06 UTC is optional — some agencies disable it and drive everything through this endpoint.
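A sketch of that pull-based cadence: fetch yesterday's digest and post it to a Slack incoming webhook (the webhook URL and message formatting are your own; the totals fields come from the response shape above):
import requests
from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()
digest = requests.get("https://api.buycrowds.com/v1/cost/digest",
                      params={"date": yesterday},
                      headers={"X-API-Key": "YOUR_API_KEY"}, timeout=30).json()

text = (f"Spend on {yesterday}: ${digest['totals']['total_cost_usd']} "
        f"across {digest['totals']['event_count']} events")
requests.post("https://hooks.slack.com/services/XXX/YYY/ZZZ",  # your Slack webhook
              json={"text": text}, timeout=10)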
Overview dashboard (morning standup in one call)
GET/v1/cost/overviewTop-level dashboard — tier + MTD + cap + top centers + top creators + operational counters + savings + rate limit in one call API KEY
The single call for "how are we doing?". The cost
stack has 15+ specialized endpoints — this one stitches the most-used pieces into a unified
dashboard response. Perfect for the morning standup view that surfaces everything important
without navigating between endpoints.
Aggregates 9 sources:
• tier — id, name, subscription_fee_usd (from Billing.get_tier)
• mtd — spent_usd, event_count, included_scrapes, overage_scrapes, refunds {count, total_usd}
• budget_cap — configured flag, cap_usd, hard_enforce, spent_usd, headroom_usd, used_pct (or stub if no cap configured)
• top_cost_centers — top 3 by MTD spend via iter 52 aggregator
• top_creators — top 3 by MTD spend via iter 80 aggregator
• operational — quarantined_creator_count (iter 69), pending_deferred_batches (iter 78)
• rate_limit — tier_limits, current buckets, remaining (iter 50 source)
• savings_headline — cached_hits_30d, estimated_cached_savings_usd, refunds_mtd_usd
• drill_down — ready-to-use URLs for every specialized endpoint (invoice, burn-down, reliability, savings, activity, tier-recommendation)
Response is a projection, not a cache. Every sub-query runs live against current state — no stored aggregate, no invalidation concerns. Each source is indexed SQL so the full dashboard response typically returns in <200ms even with months of history.
Drill-down pattern: the overview gives you the headline numbers. Each metric has a corresponding specialized endpoint that digs deeper — the
drill_down block in the response lists them with pre-filled URLs scoped to
the current window. Click-through from "top creators: @neymarjr $48" to
GET /v1/cost/creators?limit=20 gives you the full ranked list; from "pending
deferred batches: 3" to GET /v1/full-scrape/deferred gives you the list of
actual job IDs.
Use case — operator dashboard home: agency's internal dashboard loads this endpoint on page load. The top banner shows "Pro tier · MTD $187.50 / $500 cap · 8 refunds recovered · 2 pending overnight batches". All the widgets below that feed off the same JSON — no need to call /invoice + /cost/centers + /creators + /quarantine separately on page load. One request, full situational awareness.
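A sketch of that page-load call; the jq paths are an assumption about the exact response nesting and may need adjusting against a real payload:
curl -s "https://api.buycrowds.com/v1/cost/overview" \
  -H "X-API-Key: ak_xxx" \
  | jq '{tier: .tier.name, mtd_spent: .mtd.spent_usd, cap: .budget_cap.cap_usd, headroom: .budget_cap.headroom_usd}'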
Forward projection (day-by-day chart data)
GET/v1/cost/projection?days=14Day-by-day projected spend over a forward horizon — combines daily burn rate + recurring schedule fires API KEY
Chart-ready forward projection (iter 110).
Complement to
/v1/cost/burn-down (iter 62) which gives a single exhaustion
date. This endpoint returns per-day rows for the next N days (default 14, max 90),
ready to drop into a chart library or spreadsheet.
Projection sources:
1. Daily burn rate baseline — computed as mtd_spent / day_of_month. Applied to every future day as the constant "ad-hoc traffic" baseline.
2. Recurring schedule fires — walks each active schedule's
next_run_at forward by interval_seconds and counts how many
times it would fire on each target day. Per-fire cost is the marginal cost (not overage
price).
Per-day row shape:
{date, estimated_new_spend_usd, schedule_fires, schedule_cost_usd, daily_burn_baseline_usd, cumulative_projected_usd, over_daily_cap?, over_monthly_cap?}.
The cumulative field is MTD spent + projected new spend through that day — useful for "when will I break $500?" questions.
Cap breach detection:
over_monthly_cap? and over_daily_cap? flags are computed
against the currently-configured budget cap (iter 50 + iter 106). If any day in the
projection flips to true, the top-level
totals.projected_cap_breach_date field surfaces the first breach date.
Zero projections → null (safely under cap).
Methodology transparency: the response includes a
methodology block explaining the linear model and its limitations:
doesn't extrapolate trend acceleration, doesn't account for cache hit ratio,
schedule fires counted at marginal cost not overage price. Users can cross-reference
this with actuals after the window closes to gauge drift.
Complements — there are now three forecasting endpoints:
• /v1/cost/forecast (legacy) — credit quota projection
• /v1/cost/burn-down (iter 62) — single exhaustion date across global + per-center caps
• /v1/cost/projection (iter 110) — day-by-day chart data over a forward window
Use case — CFO dashboard: operations team renders a 14-day stacked bar chart with
daily_burn_baseline_usd as the base, schedule
contribution as a second segment, and a horizontal line at the daily cap. Days where
over_daily_cap? flips to true get a red highlight. Clicking a date drills
down to that day's schedules via /v1/full-scrape/schedules.
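A sketch of the chart-feed call — 14 days is the default, so ?days= is included only to make the horizon explicit:
curl "https://api.buycrowds.com/v1/cost/projection?days=14" \
  -H "X-API-Key: ak_xxx"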
Burn-down forecast (when will I hit my cap?)
GET/v1/cost/burn-downProjects current burn rate against budget cap + per-cost-center sub-caps, with actionable daily spend recommendation API KEY
The "when will I hit my cap?" question, answered. Distinct
from
/v1/cost/forecast which projects credit-quota exhaustion (tier-based); this
endpoint projects dollar burn against your configured budget cap (iter 50 + 59). Single call,
no parameters.
Per-cap forecast fields:
• cap_usd — the ceiling (global or per-center)
• current_spent_usd — MTD spend against this cap
• daily_avg_usd — mtd_spent / day_of_month
• projected_eom_spent_usd — daily_avg × days_in_month (linear)
• headroom_usd — cap − current
• days_until_exhausted — headroom / daily_avg, nil if burn rate is zero
• projected_exhaustion_date — the projected exhaustion as an absolute calendar date
• recommended_daily_spend_usd — headroom / days_remaining, the burn rate that lands exactly at the cap on the last day of the month
• over_cap? / projected_over_cap? — binary flags for UI badges
Per-cost-center rollup: each sub-cap configured via
per_center_caps gets its own forecast, sorted by days_until_exhausted
ascending (most-at-risk first). Frozen centers (iter 61) are included but marked
frozen: true. The top-level highest_risk_center field pulls out the
non-frozen center projected to exhaust soonest — the one that needs attention.
Headlines: human-readable strings suitable for Slack alerts or dashboard banners. Examples:
• "At current burn rate, you'll exceed your global cap of $500 on 2026-04-22. Reduce daily spend to $12.50 to stay under cap."
• "Cost center 'client-acme' projected to exceed its sub-cap on 2026-04-18."
• "On track — projected spend stays under all configured caps this month." (default when nothing is alarming)
Methodology: linear extrapolation from month-to-date average. Documented in the
methodology section of the response so consumers know
what they're rendering. Front-loaded or seasonal spend patterns can mislead a linear projection —
the forecast improves with more days of data.
Use case: a CFO dashboard runs
/v1/cost/burn-down
every morning. When global.projected_over_cap? flips to true, the dashboard Slacks
the team with the headline. Agencies using per-center sub-caps get early warning on a specific
client before that client's sub-cap gets hit.
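A sketch of that morning check; the .headlines field name is an assumption about where the human-readable strings live in the response:
curl -s "https://api.buycrowds.com/v1/cost/burn-down" \
  -H "X-API-Key: ak_xxx" \
  | jq -r '.headlines[]'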
Savings dashboard (ROI of the cost stack)
GET/v1/cost/savings?year=2026&month=4Quantifies dollars saved by cached hits, refunds, and subscription coverage vs. paying live overage for everything API KEY
What did the cost stack actually save you? This endpoint turns
every layer built in iters 50-57 into a single dollar number. Three savings sources:
1. Cached scrape hits (iter 57). Every
?max_age_seconds=N cache hit avoids one live scrape. Counted in a 30-day Hammer sliding
bucket via Credits.record_cache_hit/2 (incremented in both the single-scrape fast-path
and batch pre-scan). Savings = count × full_scrape_overage_price_usd.
marginal_cost_saved_usd separately reports our internal savings (Apify + DB write).
2. Refunds (iters 54/55). Every :full_scrape_refund billing event is a dollar you didn't pay for a scrape that failed. Sum of absolute refund amounts.
3. Subscription coverage (always-on). Every :full_scrape_consume event (included-tier scrape) would have cost overage_price if you had no included quota. This is the hidden value of the subscription itself. Surfaced as subscription_coverage.usd_saved_vs_all_overage.
Headline number:
hypothetical_vs_actual.dollars_saved_usd = cached savings + refund savings + coverage savings.
savings_ratio_pct = savings / hypothetical gross cost (what you'd have paid without the cost stack). An agency caching aggressively on Pro tier typically sees 60–85% savings ratio.
Drill-down links: the response includes machine-readable pointers back to /v1/cost/invoice, /v1/cost/centers, receipt endpoints, and /v1/quota so auditors can trace any line from headline to source event.
Window caveat: refund + coverage savings are month-scoped (via
BillingEvents.monthly_invoice_breakdown). Cached hits live in a 30-day sliding
window and are labelled accordingly — historical months before the sliding window's start may
under-report cache savings. This is intentional: we don't log cache hits as billing events (they're
zero-cost by design), so no DB persistence.
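A sketch of the month-end ROI pull (year/month values illustrative):
curl "https://api.buycrowds.com/v1/cost/savings?year=2026&month=4" \
  -H "X-API-Key: ak_xxx"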
Monthly invoice (accounting statement)
GET/v1/cost/invoice?year=2026&month=4Canonical monthly statement — subscription + usage + overage + refunds + cost centers, one call API KEY
The capstone of the cost stack. Single call that pulls from
every layer built in iters 50-55: subscription tier fee, billing events (included / overage /
refunded / scheduled_run), cost center attribution, and the live tier metadata. Defaults to the
current UTC month; pass
?year=2026&month=3 to pull historical periods.
Response shape:
• invoice_id — inv_<api_key_id>_YYYYMM, stable across calls
• period — {year, month, start, end, is_current_month}
• subscription — {tier, tier_name, fee_usd, billing_cycle}
• usage.full_scrape — {included_count, overage_count, refunded_count, scheduled_run_count, net_full_scrapes}
• cost_breakdown — {subscription_usd, overage_usd, refunds_usd, gross_usage_usd, net_usage_usd, total_due_usd, currency}
• cost_centers[] — per-center aggregate with share_pct (scoped to the invoice period window)
• reconciliation.how_to_drill_down — links to receipt, events, quota, and centers endpoints for audit
• format — {version: "1", generated_at}
total_due_usd math:
subscription_usd + overage_usd + refunds_usd (refunds carry negative sign so they
subtract naturally). Included-tier scrapes log billing events at our internal marginal cost for
accounting reconciliation but contribute $0 to the customer-facing bill — they're
covered by the subscription fee.
Use case: agency fires 300 scrapes across 5 clients in April, tagged with
cost_center=client-*. At month-end, one call to
GET /v1/cost/invoice?year=2026&month=4 gives everything the accountant needs —
fee ($99 Pro tier), 42 overage scrapes ($21.00), 3 failed refunds (−$1.50), net $118.50,
plus per-client breakdown. Drop it into the accounting system or render as PDF client-side.
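A sketch of that month-end call; the jq path follows the cost_breakdown bullet above, though the exact nesting is an assumption:
curl -s "https://api.buycrowds.com/v1/cost/invoice?year=2026&month=4" \
  -H "X-API-Key: ak_xxx" \
  | jq '.cost_breakdown.total_due_usd'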
Per-creator cost breakdown (cost drivers)
GET/v1/cost/creators?since=2026-04-01T00:00:00Z&limit=20&cost_center=client-acmeTop-N creators by total spend — per-creator breakdown aggregated from billing events API KEY
"Who are my biggest cost drivers?" Aggregates billing
events by
metadata->>'username' — one SQL GROUP BY query returns per-creator
counts + totals + first/last scrape timestamps, sorted by cost descending. Defaults to
month-to-date, max 20 creators per call (overridable up to 200).
Response shape: per-creator
{username, total_scrapes, included_count, overage_count, refunded_count,
total_cost_usd, share_pct, first_scrape_at, last_scrape_at}. Share_pct is each
creator's fraction of the total window cost — a quick read on concentration. Refunded events
carry negative cost_usd so they net out of total_cost_usd naturally (a creator with 3 scrapes
all refunded shows up with cost $0.00, not $1.50).
Filters:
• ?since= / ?until= — ISO8601 UTC bounds (defaults: start of current UTC month → now)
• ?limit= — top-N (default 20, max 200)
• ?cost_center= — scope to a single client. Lets you answer "of client-acme's $45 spend this month, which creators were it on?"
Totals block: creator_count (distinct usernames), total_scrapes (sum across all creators), total_cost_usd. When filtered by cost_center, total_cost_usd should match the corresponding entry in
/v1/cost/centers for the
same window — the two endpoints cross-check.
Use case — invoice drill-down: agency generates a monthly invoice for client-acme with
/v1/cost/invoice. The client asks "what did
you actually scrape for us?". One call:
GET /v1/cost/creators?since=2026-04-01T00:00:00Z&cost_center=client-acme
returns a ready-to-paste list of every creator they were billed for, with per-creator scrape
counts and costs. Directly maps to the invoice line items without digging through the raw
event log.
Cost center attribution (agency billing)
GET/v1/cost/centersMonth-to-date cost aggregated by cost_center tag — per-client billing reconciliation for agencies API KEY
Tag every scrape with an attribution code, then reconcile per client.
Pass
cost_center=<tag> as a body/query param on
POST /v1/full-scrape/:username or POST /v1/full-scrape/batch, OR set the
X-Cost-Center: <tag> request header. Tag is validated (max 64 chars,
[A-Za-z0-9_.-]); invalid tags are silently dropped (attribution is opt-in — never rejects
the request). Stored in the billing event's metadata.cost_center JSONB field.
/v1/cost/centers response: groups MTD events by
metadata.cost_center via indexed JSONB aggregate, bucketing untagged events under
unattributed. Returns per-center
{event_count, total_cost_usd, share_pct, first_event_at, last_event_at} sorted by cost
descending, plus top-level totals. Optional ?since=...&until=... overrides the
default month-to-date window.
Use case: agency scrapes 50 creators for 3 different clients across a month. Each scrape tagged
cost_center=client-acme /
cost_center=client-beta / cost_center=client-gamma. At month-end, one call
to /v1/cost/centers gives a ready-to-invoice per-client cost breakdown. Combine with
/v1/full-scrape/jobs/:id/receipt for line-item reconciliation.
Example:
curl -X POST "https://.../v1/full-scrape/batch" \
-H "X-API-Key: ak_xxx" \
-H "X-Cost-Center: client-acme" \
-d '{"usernames": ["neymarjr", "lewis.hamilton"]}'
Billing event log (audit trail)
GET/v1/cost/events?event_type=full_scrape_consume&label=q2-launch&cost_center=client-acme&username=neymarjr&limit=50&since=2026-04-01T00:00:00ZPaginated log of every credit consumption — filter by event_type, label, cost_center, username, since API KEY
Scrape labels — free-form tags orthogonal to cost_center (iter 66).
Every scrape / batch POST now accepts an optional
labels body param: a string array
(or comma-separated string). Each label: max 40 chars, [A-Za-z0-9_.-], up to 10 per
call. Invalid labels silently dropped. Stored in metadata.labels JSONB array of the
resulting billing event.
Labels vs cost_center:
• cost_center (iter 52) — single string, attribution. Enforceable via sub-caps (iter 59), freezable (iter 61), aggregated in /v1/cost/centers. One per scrape.
• labels (iter 66) — array of strings, cross-cutting tags. Query-only, no enforcement. Multiple per scrape. Use for campaigns, purposes, team names, feature flags.
Example: a scrape can carry
cost_center: "client-acme" AND
labels: ["q2-launch", "urgent", "neymar-campaign"] simultaneously. Billing for
client-acme comes from cost_center; cross-cutting analysis ("all q2-launch spend across clients")
comes from labels.
Query filters on /v1/cost/events:
• ?label=q2-launch — matches events via JSONB ? operator on metadata->'labels'. Indexable if you add a GIN index on (metadata->'labels').
• ?cost_center=client-acme — matches events via metadata->>'cost_center' = $1. Faster than JSONB array ops since it's a scalar.
• ?username=neymarjr (iter 82) — matches events via metadata->>'username' = $1. Normalized lowercase + trimmed so ?username=NeymarJR and ?username=neymarjr return the same rows. Composes with the other filters — e.g. ?username=neymarjr&event_type=full_scrape_refund&since=2026-04-01T00:00:00Z returns every refund on @neymarjr in April.
• All filters can be combined with ?event_type= and ?since= for tight drill-downs. The CSV export endpoint (/v1/cost/events/export) honors all the same filters.
Username filter cross-references: use
/v1/cost/creators (iter 80) for the aggregated view of spend per creator, and
/v1/cost/events?username=<x> (iter 82) for the event-level timeline of one
specific creator. The two endpoints give you top-down (who are my biggest spenders) and
bottom-up (every individual scrape for @x) lenses on the same data.
Use case — campaign ROI: agency runs a 3-month Q2 campaign across 5 clients. Every scrape tagged with
labels: ["q2-launch"] plus the
client-specific cost_center. At end of campaign:
GET /v1/cost/events?label=q2-launch&since=2026-04-01T00:00:00Z returns every
event touched by the campaign regardless of which client paid for it — perfect for "what did the
Q2 launch cost us in total" reports.
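A sketch of the tagging + query round trip, reusing the batch example from the cost-center section — the label values are illustrative:
# tag the scrapes with a cost_center and cross-cutting labels
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "X-API-Key: ak_xxx" \
  -H "X-Cost-Center: client-acme" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "labels": ["q2-launch", "urgent"]}'
# later: pull every event the campaign touched, regardless of which client paid
curl "https://api.buycrowds.com/v1/cost/events?label=q2-launch&since=2026-04-01T00:00:00Z" \
  -H "X-API-Key: ak_xxx"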
GET/v1/cost/events/summaryAggregate: totals, per-type counts, per-day breakdown — ready for billing charts API KEY
GET/v1/cost/events/export?format=csv&since=...&until=...&label=...&cost_center=...Stream billing events as CSV or JSON — same filters as /events, chunked response for accounting exports API KEY
Memory-efficient accounting dump. Uses
Ecto.Repo.stream/2 (pages 500 rows at a time) inside a Repo.transaction,
piped into a Phoenix chunked HTTP response. Millions of rows without blowing RAM. Same infrastructure
as iter 46's snapshot export.
Filters (iter 67): all the familiar knobs from
/v1/cost/events plus streaming scope —
?since= / ?until= (DateTime bounds),
?event_type= (full_scrape_consume / overage / refund / scheduled_run),
?label= (JSONB array match on metadata->'labels'), and
?cost_center= (scalar string match on metadata->>'cost_center').
All filters are cumulative — ?event_type=full_scrape_overage&cost_center=client-acme&label=q2-launch
gives you a pre-sliced export ready to drop into an invoice line.
CSV columns (iter 67 extended):
id, occurred_at, event_type, mode,
cost_usd, resource_id, metadata_username,
metadata_tier, metadata_cost_center, metadata_labels
(pipe-joined, e.g. q2-launch|urgent). RFC 4180 escaped. One row per event.
JSON format: alternative
?format=json emits a
JSON array — each element is the full event object including nested metadata. Use
for programmatic consumption where CSV flattening would lose the labels array shape.
Headers: CSV responses come with
content-disposition: attachment; filename="billing_events_YYYY-MM-DD.csv" so a browser
download starts automatically.
Use case — monthly reconciliation:
GET /v1/cost/events/export?format=csv&since=2026-04-01&until=2026-05-01 gives
the accountant a full month's events ready to drop into QuickBooks. Per-client filtered dump:
?cost_center=client-acme. Per-campaign:
?label=q2-launch. Reconciles exactly with
/v1/cost/invoice?year=2026&month=4 totals since both read the same event log.
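A sketch of the accountant's download — -o saves the chunked CSV straight to disk; the filename and filters are illustrative:
curl -o billing_events_2026-04.csv \
  "https://api.buycrowds.com/v1/cost/events/export?format=csv&since=2026-04-01&until=2026-05-01&cost_center=client-acme" \
  -H "X-API-Key: ak_xxx"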
Spend forecasting (derived from event log)
GET/v1/cost/forecastMTD cost, projected EOM, days until quota exhausted, anomaly signal, threshold crossings API KEY
GET/v1/cost/forecast/daily?days=30&include_forecast=trueChart-ready daily cost breakdown with forecast trailing line for rest of month API KEY
Proactive credit alerts (reactive webhooks)
POST/v1/cost/alertsRegister an alert webhook that fires when you cross a credit threshold API KEY
GET/v1/cost/alertsRead current alert config + last_fired state per threshold API KEY
DELETE/v1/cost/alertsRemove the alert config (stops future webhooks) API KEY
Reactive alerts — no polling needed. Register a webhook URL and thresholds,
and BuyCrowds fires an HMAC-signed POST the moment your credit usage crosses any of them.
Body:
{"webhook_url": "https://...", "thresholds": [0.5, 0.8, 0.95, 1.0], "cooldown_seconds": 300}
— thresholds default to [0.5, 0.8, 0.95, 1.0] if omitted. Max 10 thresholds, values in (0, 1].
Cooldown 0-86400s, default 300 (prevents spam when multiple thresholds cross in quick succession).
Dedup: each threshold fires at most once per monthly cycle.
last_fired resets automatically when the UTC month rolls over.
Payload shape:
event: "credit_threshold_crossed", threshold,
usage (credits used/quota/pct), action_hint, occurred_at.
Signed with X-BuyCrowds-Signature: sha256=<hmac-sha256(api_key.key, body)>.
Architecture: subscribes to the internal
billing:events PubSub topic.
Every credit consume triggers a check; delivery happens under a supervised Task, so slow webhooks never block consumption.
No periodic polling — purely event-driven.
Budget cap threshold webhooks (iter 63): the SAME config also fires webhooks when your dollar budget cap crosses thresholds — distinct signal from credit-quota crossings. Credit thresholds answer "am I running out of tier allowance?"; budget thresholds answer "am I running out of the dollar cap I set?". Both fire independently using the same threshold list (
thresholds: [0.5, 0.8, 0.95, 1.0]) but track dedup state in
parallel budget_last_fired so crossing the 80% credit alert doesn't consume the 80%
budget alert.
Budget payload shape:
event: "budget_cap_threshold_crossed",
threshold: {level, label},
budget: {monthly_usd_cap, spent_usd, remaining_usd, ratio, pct_used},
action_hint (contextual: "approaching budget cap — consider freezing cost centers"
at 80%, "budget cap exhausted — all new spend blocked" at 100%),
occurred_at. Same HMAC signature header.
Required config: budget threshold alerts only fire when BOTH a CostAlert config (this endpoint) AND a BudgetCap config (
PUT /v1/full-scrape/budget-cap) exist. No budget cap = no budget alerts.
Missing CostAlert = no webhook destination, so no alerts of either kind.
Cycle reset: both
last_fired and
budget_last_fired reset together at the start of each UTC month.
Cooldown: the
cooldown_seconds window applies to
both alert types globally — firing a credit alert cools down budget alerts too (and vice versa)
to prevent spam when multiple thresholds trip in quick succession.
Daily spend digest (iter 93): opt-in via
daily_digest: true on the cost alert config. Fires a scheduled webhook once
per day (UTC) with yesterday's spend summary. Distinct signal from the event-driven
threshold alerts — digests always fire, regardless of whether any threshold was crossed.
Useful for agencies that want a predictable morning report regardless of spend patterns.
Digest payload:
event: "daily_spend_digest", api_key_id,
date (yesterday ISO8601 date), period: {since, until} (UTC day
bounds), totals: {event_count, total_cost_usd},
by_type: {full_scrape_consume, full_scrape_overage, full_scrape_refund, scrape_scheduled_run}
each with {count, cost_usd}, and top_cost_centers[] (up to 5,
via iter 52 aggregator). Same HMAC signature header as the other alert types.
Scheduling: the CostAlerts GenServer ticks every 6 hours checking which configs need a digest. A config gets its digest sent the FIRST tick on a new UTC day (e.g. if the last tick was 23:00 UTC and the next is 05:00 UTC, that's when the digest fires for that day). Dedup via
last_digest_sent_at field — a digest only fires when the stored date
differs from today, so crash-recovery never double-fires.
Latency note: because of the 6h tick granularity, digests typically arrive between 00:00 and 06:00 UTC (up to 6 hours after midnight). Predictable enough for daily operations but not suitable for time-critical alerts — use threshold crossings (event-driven) for that.
Use case — morning cost report: agency's ops team starts the day checking Slack. CostAlert has
daily_digest: true + webhook
pointing to a Slack channel. Each morning before 6 AM UTC (2 AM EST, 7 AM CET), the
digest arrives with yesterday's totals + top 5 cost centers. Team opens Slack, sees the
headline "yesterday: 42 scrapes, $8.50, top: client-acme $5.00, client-beta $2.50", and
decides whether to dig deeper via /v1/cost/overview drill-down.
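A sketch of the setup plus a receiving-side signature check. The webhook URL is a placeholder, and the check assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body keyed with your API key, per the payload description above:
# register thresholds + the daily digest against your Slack-bridge endpoint
curl -X POST "https://api.buycrowds.com/v1/cost/alerts" \
  -H "X-API-Key: ak_xxx" \
  -d '{"webhook_url": "https://hooks.example.com/buycrowds", "thresholds": [0.5, 0.8, 0.95, 1.0], "daily_digest": true}'
# receiver side: recompute the signature over the raw body ($BODY) with your key ($API_KEY)
expected="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$API_KEY" | awk '{print $NF}')"
# compare $expected against the X-BuyCrowds-Signature header before trusting the payload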
Pure compute, zero new state. Reads the billing event log and projects.
Linear extrapolation from month-to-date daily average — intentionally simple and explainable.
/forecast returns:
current (MTD cost, credits used, quota %),
projection (daily avg, projected EOM, days until exhausted, will-exhaust-before-EOM flag),
thresholds (which of 50/80/95/100% have been crossed this cycle),
anomaly (today vs rolling 7d average, flagged if today ≥ 2× average AND ≥ 3 events).
/forecast/daily returns: array of day entries with
is_forecast flag. Historical days show real event counts and cost; forecast days extrapolate
from the historical average. Ideal for a stacked area chart — different color for historical vs forecast.
Use case: render a "you're on track for $X this month" widget, alert users before they blow their quota, detect cost anomalies from runaway schedules.
Append-only audit log. Every credit consumption (full scrape single, batch, scheduled run)
writes an event with
event_type, cost_usd, mode (included/overage), and metadata
(username, tier, schedule_id…). Events live 30 days in-memory (ETS).
Filters on
/events: event_type (full_scrape_consume,
full_scrape_overage, scrape_scheduled_run), since (ISO 8601), limit (1–200).
Summary shape:
total_events, total_cost_usd,
by_type, by_mode, by_day — everything a dashboard needs in one call.
Use case: monthly reconciliation, dispute resolution, showing users a "credits used this month" graph, integrating with their own accounting systems.
Full Scrape On-Demand
53 endpoints
The nuclear option. One call hits every platform fresh (bypasses cache),
runs every deep fetcher in parallel, and returns a unified payload with everything we can extract.
Billed as credits against a monthly quota — overage charged per-request.
When to use it: one-shot enrichment for a new creator, real-time competitor checks,
press-kit snapshots that need to be timestamped. Every other cheap/cached endpoint should be
preferred when freshness isn't critical.
Credit quotas (per month):
Free: 3 · Starter: 50 · Pro: 300 · Business: 2000 · Enterprise: unlimited
Overage price: $0.50 per scrape · Our marginal cost: ~$0.42 per scrape (Apify + DB + bandwidth)
Preview (public — no credit consumed)
GET/v1/public/full-scrape/preview/:username?tier=pro&used=42What would be scraped, quota status, overage flag PUBLIC
Quota check (authed, cheap — no scrape)
GET/v1/full-scrape/quotaCredit state only — used/quota/remaining + status (healthy/low/exhausted/overage_only) API KEY
Run single scrape (authed — consumes credit)
POST/v1/full-scrape/:usernameRun the full scrape — deducts one credit, returns unified payload + per-platform cost breakdown API KEY
Optional body params:
platforms (comma-list to restrict scope),
timeout_ms (5000–120000), byok (map of platform → credentials),
async (true → returns 202 + job id instead of blocking).
402 Payment Required if quota exhausted and tier has no overage allowance (Free tier is hard-blocked).
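A sketch of a scoped async run using the body params above (values illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/neymarjr" \
  -H "X-API-Key: ak_xxx" \
  -d '{"platforms": "instagram,tiktok", "timeout_ms": 60000, "async": true}'
# async: true returns 202 + a job id — poll GET /v1/full-scrape/jobs/:id (recommended interval: 2s)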
Historical job archive (iter 71)
Receipts and replays now work for jobs older than 1 hour.
Until iter 71, completed/failed jobs lived in ETS with a 1-hour TTL — once swept, the
/jobs/:id, /jobs/:id/receipt, and /jobs/:id/replay
endpoints all returned 404. An agency reviewing last week's batch had to reconstruct it from
billing events manually.
How the archive works. When a job transitions to a terminal state (
:complete, :failed, :cancelled), the
Jobs GenServer spawns a fire-and-forget Task that mirrors the job record to a new
job_archives Postgres table. The ETS path stays unchanged — it's still the hot
store, still 1-hour TTL, still microsecond lookups. The archive is a warm fallback.
Transparent fallback.
Jobs.get_for/2 now checks ETS first; on miss, falls through to
JobArchives.get_for/2. The controller calls the same function and never needs to
know whether the job came from hot or warm storage. Both return the same map shape, plus the
archived copy carries an extra archived: true flag so auditors can distinguish.
Transition-triggered archive. Archiving happens ONCE per job, on the first patch that flips status from non-terminal to terminal. Subsequent patches (refund stamping, webhook delivery) don't trigger a re-archive — the
terminal_transition?/2 guard inside the GenServer cast detects the flip. This
avoids hammering the DB when a job has multiple mutations in quick succession.
Retention.
RetentionSweeper (iter 44) gained a third sweep target: it now also deletes
job_archives rows whose finished_at is older than 30 days. Hourly tick. Zero new
infrastructure — rides on the same GenServer that already sweeps snapshots + billing_events.
Default retention can be tuned via JobArchives.archive_retention_days/0.
What's persisted. Everything receipt/replay need:
id, api_key_id, status, batch, username, batch_usernames, result, error, cost_center,
replay_of, refunded_at, refund_amount_usd, refund_reason, webhook_url, plus the
lifecycle timestamps. In-flight fields like pid, progress, and
expires_at are dropped — they're meaningless for a finished job.
Safety. All archive writes wrapped in try/rescue and spawned under
Task.Supervisor, so a DB hiccup never crashes the Jobs GenServer
and never blocks the caller. Lost archive writes are acceptable — ETS is still the source of
truth for the 1-hour window.
Use case: agency processes a 20-creator batch for client-acme on Monday. Tuesday morning, the accountant wants the receipt — pre-iter 71 this was gone. Now
GET /v1/full-scrape/jobs/fs_abc123/receipt pulls from the archive seamlessly.
Similarly,
POST /v1/full-scrape/jobs/fs_abc123/replay can re-run last week's batch without
re-specifying usernames.
Async job polling (authed)
GET/v1/full-scrape/jobs/:idPoll an async scrape job — status, elapsed, result when complete, webhook delivery outcome API KEY
GET/v1/full-scrape/jobs/:id/receiptItemized post-hoc receipt — links job to billing events, cost breakdown, margin, formatted for accounting exports API KEY
POST/v1/full-scrape/jobs/:id/refundManual refund for a :failed single-scrape job — idempotent, issues credit + negative billing event API KEY
POST/v1/full-scrape/jobs/:id/scheduleCreate recurring Schedules entries from a historical job — one-shot → recurring conversion API KEY
POST/v1/full-scrape/jobs/:id/noteAppend a free-form operational note to an archived job API KEY
Scrape job notes (iter 94). Attach operational
annotations to archived jobs post-hoc. Use cases: linking to external ticket IDs, documenting
why a batch had issues, leaving context for future audits, noting transfers/refunds done
manually.
Body:
POST /v1/full-scrape/jobs/fs_abc123/note {"note": "Wrong cost_center, moved to client-beta via /transfer-center", "label": "fix-note"}
• note — required, 1-1000 chars, free-form text
• label — optional, max 40 chars, for UI categorization (e.g. "fix-note", "jira-BC-1234", "client-request")
Storage shape: notes accumulate as a JSONB array on the
job_archives row. Each entry:
{at: "2026-04-09T14:32:00Z", note: "...", label: "..." | null}. New notes are
appended to the end of the array — history is preserved, no deletion.
Visibility: notes appear in the
notes
field of the GET /v1/full-scrape/jobs/:id response once the job is archived
(terminal state + within 30-day retention). They're also included in
/v1/full-scrape/jobs/history listings.Scope limitation: only archived jobs can be annotated. Active ETS jobs (in the 1-hour hot store) return 404 — wait for the job to transition to terminal state (iter 71 archives on first terminal transition). This keeps the hot path free of additional mutation concerns.
Use case — manual correction audit trail: agency realizes last week's batch was tagged with the wrong
cost_center. They fix it
via POST /v1/full-scrape/budget-cap/rename-center, then append a note to the
original job:
POST /v1/full-scrape/jobs/fs_xxx/note {"note": "Migrated cost_center from client-acme-old to client-acme on 2026-04-09 via rename-center. See billing events for the rewrite window.", "label": "attribution-fix"}.
When someone audits the archive a month later, the note explains the historical drift.
POST/v1/full-scrape/jobs/:id/replayRe-run a historical job with the same usernames + cost_center — fresh job_id, credits charged normally API KEY
POST/v1/full-scrape/jobs/replay-bulkReplay up to 20 jobs in a single call — per-item error isolation, composes with 30-day archive API KEY
POST/v1/full-scrape/jobs/:id/resumeRe-run ONLY the failed/unprocessed items from a batch — preserves successes, charges credits only for the remainder API KEY
Resume vs replay (iter 90). Replay (iter 60) re-runs
EVERYTHING from a historical job — full credit charge again. Resume runs only the items that
didn't succeed the first time. Use replay when you need fresh data across the whole batch;
use resume when a batch was partially successful and you just want to finish the job.
Classification logic: the controller loads the original batch job (via iter 71 archive fallback, so 30 days of history), then walks
batch_usernames against result.results to classify each entry:
• succeeded — present in results with error == nil. Skipped during resume, preserved from the original job.
• failed — present in results with error != nil. Included in resume (these are the retries).
• unprocessed — in batch_usernames but NOT in result.results. This happens when a batch was cancelled mid-loop, crashed before completing, or the Task got killed. Included in resume.
Response shape:
{resumed: true, original_job_id, new_job_id, batch_size, succeeded_count, failed_count,
unprocessed_count, resumed_count, succeeded_usernames, resumed_usernames}. The
succeeded_usernames list tells you which items were preserved — no credits
charged, data already in place from the original job. resumed_usernames is what
the new job is actually scraping.
Nothing-to-resume case: if every username in the original batch succeeded (clean 100% run), resume returns
422 nothing_to_resume with a hint pointing to /replay for a full re-run.
Single-scrape jobs: not supported. Single jobs are all-or-nothing — either the whole thing worked or the whole thing got refunded. Resume only makes sense for batches. Returns
422 resume_only_supports_batch.
Use case — mid-batch cancellation: ops team spotted an auth issue during a 20-creator batch and cancelled it at 12/20 completed. Five days later the auth is fixed. Instead of re-running all 20 (wasting credits on the 12 that succeeded), they call
POST /v1/full-scrape/jobs/fs_abc123/resume. New job picks up the 8
remaining creators with fresh data; the 12 successful ones remain intact in the original
archive. Credits consumed: 8 instead of 20.Use case — flaky upstream recovery: a batch finished with 15/20 successful and 5 refunded (per iter 55 auto-refund). Resume re-runs just the 5 failed ones with a chance at a better outcome — the retry layer (iter 64/75) absorbs transient retries, the quarantine system (iter 69) flags persistent failures, and the agency saves credits compared to a full replay.
One-shot → recurring (iter 100):
POST /v1/full-scrape/jobs/:id/schedule converts a historical job into
recurring Schedules entries without retyping usernames or params. Body:
{"interval_seconds": 86400, "platforms": "instagram,tiktok", "webhook_url": "https://...", "template": "..."}.Batch fan-out: schedules are per-username (one Schedules row per creator), so scheduling a batch job creates N schedules — one per
batch_usernames entry. Per-username success/failure surfaces in the response
schedules[] array so the caller sees which creators got scheduled and which
hit quota/duplicate errors.Response shape:
{original_job_id, interval_seconds, usernames_count, created_count, failed_count,
schedules: [{username, status, schedule_id, next_run_at}, {username, status: "failed",
error}, ...]}. Successful schedules remain committed even if others fail —
individual validation errors don't roll back the batch.Archive-aware: works for jobs in the ETS hot store (last 1h) OR the DB archive (last 30 days) via the iter 71 transparent fallback. So an agency that ran a Monday batch can come back on Friday and schedule the same creators for weekly refreshes with one call.
Use case — baseline → recurring: agency runs an initial 20-creator batch for a new client to establish baseline metrics. Once validated, they convert it to weekly refresh via
POST /v1/full-scrape/jobs/fs_abc/schedule {"interval_seconds": 604800}.
Response returns 20 schedule_ids, one per creator, all running on 7-day intervals.
Zero retyping, consistent cost_center carry-over, direct link from ad-hoc batch → ongoing
refresh.
Bulk replay — refresh a group of jobs in one call.
Iter 60's
/jobs/:id/replay works one job at a time; iter 73's
/jobs/replay-bulk batches up to 20 originals in a single request. Each replay runs
as an independent async job — the caller gets back the new job_ids and polls them individually.
Body:
{"job_ids": ["fs_abc", "fs_def", ...], "cost_center": "optional_override",
"platforms": "optional_override", "webhook_url": "optional", "max_age_seconds": 86400}
Max 20 job_ids per bulk call — same cap as
/batch. Overrides apply to ALL replays
in the bulk — if you need per-job overrides, use individual
/jobs/:id/replay calls.
Per-item error isolation. One failed lookup or authorization doesn't block the others. Each entry in the
replays response array is
either {original_id, new_job_id, status: "enqueued", replay_of, archived_source} or
{original_id, status: "error", error: ""}. Top-level
enqueued + errors counts give a quick rollup.
Archive-aware. Original job_ids up to 30 days old work automatically — the replay path loads each job via
Jobs.get_for/2 which falls back
to JobArchives (iter 71) on ETS miss. The response's archived_source: true
flag tells you when the source came from warm storage vs the hot 1h ETS path.
Credit semantics. Bulk replay consumes credits normally — it's not a free retry. Each original's usernames get charged as if you were running the batch fresh. Budget cap + rate limit enforcement still applies per-replay. A bulk call that would collectively exceed your cap fails individual replays with 402-equivalent errors, but other replays continue.
Use case — weekly refresh of a client portfolio: agency tracks 20 creators for client-acme and 20 for client-beta via two separate weekly batches. On Monday morning, they list the previous week's terminal jobs via
GET /v1/full-scrape/jobs/history?status=complete&since=2026-03-31, collect the
job_ids, and POST them to /jobs/replay-bulk with
?max_age_seconds=604800 — any creator still fresh from the original run hits the
cached path, and only stale creators get re-scraped. Zero params re-entered, 7-day cache
preserved automatically.
Always async. Like single replay (iter 60), bulk replay doesn't have a sync mode — the caller gets job_ids immediately and polls each. Sync bulk would mean holding an HTTP connection open for potentially minutes per job × 20 jobs; not a reasonable HTTP shape.
Replay model. One-shot "refresh this data" button for any job,
regardless of its terminal status (complete / failed / cancelled / refunded). The replay loads the
original job record, inherits the usernames (single or batch) and cost_center, and dispatches a
brand-new async job with a fresh id. Credits are charged normally for the replay — this is not a
free retry, just a convenience shortcut that saves the caller from re-specifying params.
Body (all optional — defaults pulled from the original):
• cost_center — override the original's attribution tag
• platforms — override scrape_opts platforms (e.g. "instagram,tiktok")
• webhook_url — fire a new webhook on replay completion
• max_age_seconds — combine replay with conditional caching (iter 57) to get "refresh this batch, but skip any username I already have fresh data on"
Always async. Replay returns 202 with a new
job_id
regardless of whether the original was sync or async. Polling semantics match any normal async job.
The new job record carries replay_of: "<original_id>" so downstream consumers
(receipts, dashboards) can link back.
Quote tokens are NOT inherited. The quoted price from the original pre-flight is long expired by replay time — forcing a new pre-flight just to retain the lock would defeat the one-click replay promise. If you need price certainty on the replay, run a fresh pre-flight with the same usernames/cost_center and include the token in a regular batch call instead.
Use case — scheduled data refresh: an agency runs a 20-creator batch every Monday for a client. Instead of storing the usernames list in their own DB, they record the Monday job_id and replay it the following Monday — zero client-side state, zero params to re-specify,
cost_center stays consistent. Combine with ?max_age_seconds=86400
to skip any creator that was already scraped in the last 24h by some other flow.
Not supported: sync batches (which don't create a Jobs record) cannot be replayed — they return no job_id to replay against. Use async batches (
"async": true) if you want replay capability down the line.
GET/v1/full-scrape/jobs?status=pending&limit=20List active jobs (last 1h, ETS) with per-status summary API KEY
GET/v1/full-scrape/jobs/history?status=failed&cost_center=client-acme&since=2026-04-01T00:00:00Z&limit=50List historical (archived, last 30d) jobs with filters on status, cost_center, batch, time range API KEY
GET/v1/full-scrape/jobs/export?format=csv&cost_center=client-acme&since=2026-04-01T00:00:00ZStream archived jobs as CSV or JSON — accounting-ready export, memory-efficient API KEY
Job archive export (iter 109). Streams archived
jobs as CSV (one row per job) or JSON (preserves nested result map). Same memory-efficient
pattern as the iter 46 snapshot export — uses
Ecto.Repo.stream/2 inside a
transaction, chunks 200 rows at a time, pipes into the Phoenix chunked response.
CSV columns:
id, status,
batch, username, batch_size,
successful_count, failed_count, total_cost_usd,
cost_center, orchestration_id, replay_of,
started_at, finished_at, refunded_at,
refund_amount_usd, refund_reason. Ordered ascending by
finished_at.
JSON format: alternative
?format=json emits a JSON array — each element is a full job object
including the nested result map (per-username scrape results, billing
breakdown, etc.). Use for programmatic consumption where CSV flattening would lose
detail.
Filters (all cumulative):
• ?since= / ?until= — ISO8601 bounds on finished_at
• ?cost_center= — exact match
• ?status= — complete / failed / cancelled
• ?batch=true|false — single vs batch jobs
Use case — monthly reconciliation: finance needs an end-of-month CSV of every batch job for each client.
GET /v1/full-scrape/jobs/export?format=csv&cost_center=client-acme&since=2026-04-01T00:00:00Z&until=2026-05-01T00:00:00Z
returns the full month's jobs ready to drop into QuickBooks. Cross-references with
/v1/cost/invoice totals for double-entry sanity checking — per-job rows
here should sum to the invoice line items.
Pairs with iter 67 events export: jobs export is a higher-level view (one row per batch job), events export is a lower-level view (one row per billing event). For line-item drill-down use events; for job-level audit use this.
Historical jobs — queryable warm store. Iter 71 persisted
terminal-state jobs to
job_archives; iter 72 makes that table queryable. Agencies
can now list "all failed jobs for client-acme last week" in one call instead of iterating through
billing events.
Filters (all optional, all composable):
• ?status= — complete / failed / cancelled (terminal states only — pending/running live in ETS)
• ?cost_center= — exact match on the stored attribution tag
• ?since= / ?until= — ISO8601 bounds on finished_at
• ?batch=true|false — single vs batch jobs
• ?limit= — default 20, max 200
Response shape:
jobs[] (same shape as live
/jobs list + archived: true flag), count, summary
(counts by status scoped to the same since/until window, via a single GROUP BY query), echoed
filters, and retention_days telling the caller how far back the archive
goes (default 30).
Indexes: the iter 71 migration added
(api_key_id, finished_at), (api_key_id, status), and
(finished_at) indexes. Typical queries
(WHERE api_key_id = $1 AND finished_at >= $2 ORDER BY finished_at DESC LIMIT N)
are index scans — cheap even at months of accumulated history.
Use case — end-of-week review:
GET /v1/full-scrape/jobs/history?status=failed&since=2026-04-01T00:00:00Z returns
every job that failed in the first week of April. Drill into each via
/jobs/:id/receipt and /jobs/:id/refund (both now archive-aware from iter
71). Feed the list into a Slack thread for the ops standup.
DELETE/v1/full-scrape/jobs/:idCancel a pending or running async scrape — kills the task, credit NOT refunded API KEY
Refund model — fairness for on-demand billing. Full scrape on-demand
charges up front at enqueue time. When a scrape fails (network error, platform down, upstream crash),
you shouldn't pay for it. Two refund paths:
1. Automatic refund on async failure. When an async single-scrape job is marked
:failed, the controller immediately calls Credits.refund/2
(incrementing an offsetting refund counter in the sliding 30-day bucket) AND writes a
:full_scrape_refund billing event with negative cost_usd, so
/v1/quota, /v1/cost/centers, and all MTD aggregates net out automatically.
The job record is stamped with refunded_at, refund_amount_usd, and
refund_reason: "auto_on_failure:<err>".2. Manual refund via POST /v1/full-scrape/jobs/:id/refund. Idempotent — calling twice returns the existing refund record with
already_refunded: true
instead of double-refunding. Returns 400 for batch jobs (partial-failure batches need per-username
resolution via the receipt endpoint), 422 for jobs not in :failed state. On success
returns the fresh credits_after_refund effective count so clients can update their UI
without re-polling /v1/quota.How the math nets out:
Credits.read/2 returns max(raw_consumed - refunded, 0) from two separate
Hammer buckets — the consume counter and the refund counter. BillingEvents.mtd_cost_usd/1
does sum(cost_usd) over the month, and negative refund events naturally subtract.
No DB migration required.Sync failures: for sync-mode scrapes (
async=false),
an upstream crash raises a 500 and the credit stays consumed — refund manually via the endpoint above
once you identify the failed job. This asymmetry is intentional: sync scrapes don't have a persistent
job record until the response lands, so auto-refund hooks can't attach cleanly.
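A toy model of that netting, with invented dollar values; the max(raw_consumed - refunded, 0) rule and the negative refund events are exactly as described above:

```python
# Toy model of the credit/billing netting described above (values invented).

def effective_credits(raw_consumed: int, refunded: int) -> int:
    # Credits.read/2 semantics: consume counter minus refund counter, floored at 0.
    return max(raw_consumed - refunded, 0)

def mtd_cost_usd(billing_events: list[dict]) -> float:
    # BillingEvents.mtd_cost_usd/1 semantics: plain sum; refund events carry negative cost_usd.
    return sum(e["cost_usd"] for e in billing_events)

events = [
    {"type": "full_scrape_consume", "cost_usd": 0.50},
    {"type": "full_scrape_consume", "cost_usd": 0.50},
    {"type": "full_scrape_refund", "cost_usd": -0.50},  # auto refund after a failed async job
]

assert effective_credits(raw_consumed=2, refunded=1) == 1
assert abs(mtd_cost_usd(events) - 0.50) < 1e-9
```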
Receipt endpoint — canonical accounting view of a scrape job.
Combines the in-memory job record (timeline, results, status) with billing events from the DB that
fall within the job's
started_at → finished_at window (±2s buffer). Matched by api_key_id + event type (full_scrape_consume / full_scrape_overage). For batch jobs, events are indexed by metadata.username to pin each line item to its specific scrape.
Response shape: receipt_id (rcpt_<job_id>), job_id, tier, status, batch, usernames, timeline (created_at / started_at / finished_at / duration_ms), items[] (per-username: mode, charged_usd, our_marginal_cost_usd, platforms, cost_breakdown), totals (item_count, total_charged_usd, our_marginal_cost_usd, gross_margin_usd), billing_events_linked (count + window), format (version, currency, generated_at).
Closes the loop: pre-flight → batch → receipt. Pre-flight projects, batch commits, receipt reconciles. The gross_margin_usd line (total_charged − our_marginal_cost_usd) gives operators visibility into per-job unit economics.
Jobs live 1h in-memory (ETS-backed, volatile across deploys). Credit is consumed at enqueue time — polling a
non-existent or expired job_id does not refund the credit. Jobs are scoped to the owning API key —
another key polling the same id gets 403. Recommended poll interval: 2s.
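A polling sketch that respects the 2s interval and the 1h lifetime; the GET /jobs/:id status route and the X-API-Key header name are assumptions (only the cancel, receipt, and refund job routes are listed here):

```python
# Sketch: poll an async scrape job every 2s until it reaches a terminal state.
# Assumptions: GET {BASE}/jobs/{id} returns the job record with a "status" field; X-API-Key header.
import time
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def poll_job(job_id: str, interval_s: float = 2.0, max_wait_s: float = 3600.0) -> dict:
    deadline = time.monotonic() + max_wait_s  # jobs expire from ETS after ~1h anyway
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=10)
        if resp.status_code == 403:
            raise PermissionError("job belongs to another API key")
        if resp.status_code == 404:
            raise LookupError("job unknown or expired (credit is NOT refunded on expiry)")
        job = resp.json()
        if job.get("status") in ("complete", "failed", "cancelled"):
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state within the polling window")
```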
Idempotency keys (retry safety)
GET/v1/full-scrape/idempotency-keysList cached idempotency keys for the calling api_key — audit-only, bodies omitted API KEY
Pass
Idempotency-Key: <unique> on credit-consuming POSTs (/v1/full-scrape/:username, /batch, /schedules) and BuyCrowds will replay the exact cached response on any retry within 24h — no duplicate credit consumption.
Replay indicator: replayed responses carry X-Idempotency-Replay: true. Fresh responses carry X-Idempotency-Key: <echo> of what was stored.
Scope: keys are scoped per API key — different customers can reuse the same literal.
Constraints: printable ASCII, max 255 chars. Malformed keys return 400.
Not cached: 4xx/5xx responses (you can retry after fixing the request). Concurrent requests with the same key are not serialized — use for retry safety, not concurrency control.
Audit listing (iter 87): GET /v1/full-scrape/idempotency-keys returns the active cached keys for your api_key — useful for debugging "did my request go through?". ETS scan via :ets.foldl filtered by the first element of each key's tuple ({api_key_id, key}), so other tenants are invisible. Expired entries are filtered out at query time even if the sweeper hasn't cleaned them yet.
Response shape: per-entry {key, status, created_at, expires_at, age_seconds}, sorted by created_at descending. Response bodies are intentionally omitted — they can be multi-KB each and this is audit, not replay. To replay a cached response, re-submit the original POST with the same Idempotency-Key header.
Use case — retry debugging: client sees a timeout on a batch POST. Was the batch consumed? Call GET /v1/full-scrape/idempotency-keys, find the key, verify status: 200 + age_seconds: 15. The batch DID go through — the client can safely re-submit the same key to replay the cached response (same job_id) without double-charging credits.
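A retry-safe batch submission sketch using the documented Idempotency-Key header and replay indicator; the batch body shape and the X-API-Key header are assumptions:

```python
# Sketch: retry-safe credit-consuming POST with one Idempotency-Key reused across retries.
# Header names are documented above; the batch body fields and X-API-Key are assumptions.
import uuid
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def submit_batch(usernames: list[str], retries: int = 3) -> dict:
    key = str(uuid.uuid4())  # one key per logical request, reused on every retry
    for _ in range(retries):
        try:
            resp = requests.post(f"{BASE}/batch",
                                 json={"usernames": usernames},
                                 headers={**HEADERS, "Idempotency-Key": key},
                                 timeout=30)
            resp.raise_for_status()
            if resp.headers.get("X-Idempotency-Replay") == "true":
                print("replayed cached response, no extra credits consumed")
            return resp.json()
        except requests.Timeout:
            continue  # safe: the same key replays the cached response if the first POST landed
    raise RuntimeError("batch submission failed after retries")
```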
Snapshots (30-day metric history)
GET/v1/full-scrape/snapshots?username=neymarjr&limit=30List condensed metric snapshots — auto-captured on every scrape, 30-day retention API KEY
GET/v1/full-scrape/snapshots/:idSingle snapshot with full per-platform metric breakdown API KEY
POST/v1/full-scrape/snapshots/:id/extendExtend a snapshot's TTL beyond the default 30 days (max 365 days from capture) — pin reference data API KEY
POST/v1/full-scrape/snapshots/:id/reset-expiryRestore a snapshot's TTL back to the default 30 days from capture API KEY
Snapshot TTL extension (iter 85). By default every
snapshot has a 30-day TTL enforced by the RetentionSweeper (iter 44) — rows past
expires_at get deleted on the hourly sweep. For reference data you want to keep longer (quarterly comparison baselines, client-approved historical states, audit records), extend the TTL to prevent the sweep.
Body: POST /v1/full-scrape/snapshots/:id/extend {"additional_days": 60}. Adds additional_days to the current expires_at. Clamped to 365 days from the original capture — you can't extend indefinitely. The response echoes the new expires_at + max_possible_expires_at + a clamped_to_max? flag if the extension hit the ceiling.
Ownership-scoped: 404 for snapshots owned by another api_key. 422 exceeds_max_ttl when the snapshot is already past the 365-day ceiling (no-op request).
Reset: POST /v1/full-scrape/snapshots/:id/reset-expiry restores the TTL back to default (captured_at + 30 days). If the default TTL has already passed by the time you reset, the snapshot is marked for sweep in 1 hour as a grace period (response carries grace_period_applied?: true).
Use case — quarterly client baselines: agency captures a snapshot of client-acme's creators on March 31 to establish the Q1 baseline. The default TTL would sweep it at end of April, but they need to reference it in the Q2 review on June 30. One call: POST /v1/full-scrape/snapshots/snap_xxx/extend {"additional_days": 120}. The snapshot now survives until end of July, available via GET /v1/full-scrape/snapshots/snap_xxx for the entire review window. After the review, call reset-expiry and let the sweep reclaim the storage naturally.
Use case — audit preservation: a snapshot captured during an anomaly investigation needs to be preserved for legal/compliance reasons. Extend to 365d and document the snapshot_id in the investigation ticket. The retention sweeper won't touch it until the investigation closes.
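A small pin-then-release sketch for a baseline snapshot, using the extend and reset-expiry routes as documented (auth header illustrative):

```python
# Sketch: extend a baseline snapshot's TTL for a review window, then reset it afterwards.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # illustrative header name

def pin_snapshot(snapshot_id: str, additional_days: int = 120) -> dict:
    resp = requests.post(f"{BASE}/snapshots/{snapshot_id}/extend",
                         json={"additional_days": additional_days},
                         headers=HEADERS, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if body.get("clamped_to_max?"):
        print("extension hit the 365-day ceiling:", body.get("max_possible_expires_at"))
    return body

def release_snapshot(snapshot_id: str) -> dict:
    # Restore the default 30-day TTL once the review is done.
    resp = requests.post(f"{BASE}/snapshots/{snapshot_id}/reset-expiry", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()
```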
POST/v1/full-scrape/snapshots/diffCompare two snapshots — same diff engine as job diff, works across weeks API KEY
GET/v1/full-scrape/snapshots/timeseries?username=neymarjr&metric=followers&days=30Chart-ready time series of a single metric across snapshots API KEY
GET/v1/full-scrape/snapshots/trends?username=neymarjr&metric=followers&days=30Linear regression trend analysis — per-platform rate of change + confidence + classification API KEY
GET/v1/full-scrape/snapshots/anomalies?username=neymarjr&metric=followers&days=30&sigma=2.0Detect spike/drop anomalies — points beyond N sigma from the regression line API KEY
Catches what trend lines smooth over. A creator growing at 5k/week with
one 150k follower spike on April 5 looks "steady" to the trend endpoint. The anomalies endpoint flags
that exact snapshot as a
:spike event with residual_sigma: 4.2, severity: :high.
Algorithm: compute OLS regression line, measure residual at each sample (actual - predicted), compute std dev of residuals, flag any point where |residual| ≥ sigma × std.
Query params: username (required), metric (default followers), days (3-90, default 30), platform (filter), sigma (1.0-10.0, default 2.0 — higher = stricter, fewer anomalies).
Response: list of anomalies sorted by severity with direction (:spike/:drop), severity (:low/:medium/:high based on σ magnitude), actual, predicted, residual, captured_at, snapshot_id.
Use case: power a "what happened" view on the dashboard. Creator went viral? Got hacked? Deleted a chunk of posts? The anomaly endpoint finds the exact date. Pairs with /snapshots/:id to inspect the full per-platform state at the anomaly moment.
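A local re-implementation of that residual test, handy for sanity-checking the endpoint's output against your own snapshot history (the data points below are invented):

```python
# Sketch: OLS regression line + sigma-based residual flagging, mirroring the algorithm above.
from statistics import mean, pstdev

def detect_anomalies(samples: list[tuple[int, float]], sigma: float = 2.0) -> list[dict]:
    """samples: (day_index, value) pairs; flags points with |residual| >= sigma * std."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs) or 1e-9
    slope = sum((x - x_bar) * (y - y_bar) for x, y in samples) / denom
    intercept = y_bar - slope * x_bar
    residuals = [y - (slope * x + intercept) for x, y in samples]
    std = pstdev(residuals) or 1e-9
    return [
        {"day": x, "actual": y, "predicted": round(slope * x + intercept, 1),
         "residual_sigma": round(r / std, 2), "direction": "spike" if r > 0 else "drop"}
        for (x, y), r in zip(samples, residuals)
        if abs(r) >= sigma * std
    ]

# Invented example: steady ~5k/day growth with one 150k spike on day 5.
history = [(d, 1_000_000 + 5_000 * d) for d in range(14)]
history[5] = (5, history[5][1] + 150_000)
print(detect_anomalies(history, sigma=2.0))  # only the day-5 spike is flagged
```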
The "is this creator trending up or down?" answer.
Runs ordinary least squares regression over the last N days of snapshot history for a username.
Returns per-platform rate of change (per day and per week), direction, R² confidence, and status classification.
Status classification:
• :trending — R² ≥ 0.3 AND rate ≥ 1% per week → real signal
• :flat — low rate or low confidence → no meaningful change
• :noisy — R² < 0.1 but non-trivial slope → data too erratic to trust
• :insufficient_data — fewer than 3 samples in window
Query params: username (required), metric (default followers, one of followers/following/posts/engagement_rate/video_count/subscribers), days (3-90, default 30), platform (filter to one), all_metrics=true (compute trends for all canonical metrics in one shot).
Aggregate shape: fastest_growing, fastest_declining, counts by status, total_slope_per_day. Ready for a "growth dashboard" card.
Use case: tell your users "Creator X gained 5% followers on Instagram this week with high confidence, but their Twitter presence is declining at -2%/week."
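A rough local version of the classification, applying the documented thresholds to a slope and R² you computed yourself; the exact cutoffs behind "low rate" and "non-trivial slope" are the endpoint's own, so the values used here for :flat vs :noisy are approximations:

```python
# Sketch: trend status classification from slope + R², using the thresholds documented above.
# The :noisy "non-trivial slope" cutoff below is an illustrative approximation.

def classify_trend(slope_per_week: float, current_value: float,
                   r_squared: float, n_samples: int) -> str:
    if n_samples < 3:
        return "insufficient_data"
    rate_pct_per_week = 100.0 * slope_per_week / max(current_value, 1.0)
    if r_squared >= 0.3 and abs(rate_pct_per_week) >= 1.0:
        return "trending"
    if r_squared < 0.1 and abs(rate_pct_per_week) >= 0.5:  # assumed cutoff for "non-trivial"
        return "noisy"
    return "flat"

print(classify_trend(60_000, 1_200_000, r_squared=0.62, n_samples=14))  # trending (+5%/week)
print(classify_trend(1_000, 1_200_000, r_squared=0.05, n_samples=14))   # flat
```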
The durable counterpart to jobs. Jobs live 1h (async polling window);
snapshots live 30 days (historical analysis window). Every scrape — ad-hoc, batch, or scheduled —
auto-captures a snapshot with condensed metrics (followers, following, posts, engagement_rate, video_count, subscribers).
The full result payload is NOT stored — only the numbers that matter for trend analysis.
Use cases:
• Daily/weekly monitoring via schedules: snapshot every run → diff last 7 days → "gained 5k followers"
• Chart rendering: /snapshots/timeseries returns a flat array of (captured_at, platform, value) sorted by date — feeds directly into a line/area chart
• Cross-week diffs: /snapshots/diff reuses the same diff engine as /full-scrape/diff but works on snapshots that outlive the 1h job TTL
Storage: ETS, 30-day sliding, periodic cleanup every 15 min. Volatile across deploy. ~22KB per user for a full 30-day history — fits comfortably.
Budget cap (hard enforce monthly ceiling)
GET/v1/full-scrape/budget-capRead your current budget cap config, or default state if unset API KEY
GET/v1/full-scrape/budget-cap/history?limit=100&since=2026-01-01T00:00:00ZCompliance audit trail — every PUT to /budget-cap is snapshotted; returns newest-first with change_source + change_note API KEY
GET/v1/full-scrape/budget-cap/history/diff?from=bcv_xxx&to=bcv_yyyField-by-field delta between two versions. Omit both for "what changed in the most recent edit?" API KEY
POST/v1/full-scrape/budget-cap/history/restoreRollback current config to a prior bcv_* version — logs a new version with change_source=rollback API KEY
GET/v1/full-scrape/budget-cap/burn-ratePer-center burn rate + days-until-empty + EOM projection. Worst-first triage, status flags: healthy/warning/critical/exceeded/frozen API KEY
GET/v1/full-scrape/budget-cap/recommendations?reserve_pct=10&min_move_usd=1Rebalance suggestions: donors (healthy + headroom) → recipients (critical/exceeded). Pipe directly into /transfer-allocation API KEY
POST/v1/full-scrape/budget-cap/recommendations/applyBundle-execute all rebalance suggestions atomically. Default dry_run=true — body {dry_run:false, max_moves:50, change_note} to commit API KEY
POST/v1/full-scrape/batch-size-advisorGiven price_per_scrape_usd + optional cost_center + count_requested, returns max batch under each cap + binding constraint. Pure read API KEY
POST/v1/full-scrape/batch-size-advisor/multiMulti-center version: body {centers: [{cost_center, count_requested}]}. Sequential fill deducts running cost from shared global/daily caps. Order = priority API KEY
GET/v1/full-scrape/budget-cap/center/:center/activity?since=2026-04-01T00:00:00Z&limit=100Unified timeline for one cost_center: billing_events + cost_alert_fires merged, sorted desc, tagged by type API KEY
GET/v1/full-scrape/budget-cap/center/:center/report-cardOne-call dashboard: config + burn + 7d trend + recent spend + 30d alert summary + cross-references API KEY
GET/v1/full-scrape/budget-cap/centers/report-cardsBulk compact report card for every sub-capped center. Sorted worst-first. Avoids N+1 on agency dashboards API KEY
POST/v1/full-scrape/budget-cap/what-ifReplays last N days of billing events against a proposed cap config. Reports blocked counts, first breach timestamps, daily burn curve API KEY
POST/v1/full-scrape/preflightUnified go/no-go gate. Body {usernames[], cost_center?, estimated_price_per_scrape_usd?}. Runs rate+credits+monthly+daily+sub-cap+frozen. Active reservations subtracted from headroom API KEY
POST/v1/full-scrape/budget-cap/reservationsCreate short-lived budget reservation. Body {amount_usd, cost_center?, ttl_seconds? (max 3600)}. Closes preflight race condition API KEY
GET/v1/full-scrape/budget-cap/reservationsList active reservations with TTL countdown + total reserved USD API KEY
GET/v1/full-scrape/budget-cap/reservations/statsPool stats: count/USD by center, TTL histogram, at-risk (<60s) count, oldest/newest, avg remaining TTL API KEY
POST/v1/full-scrape/budget-cap/reservations/:id/commitFinalize a reservation. Removes from active pool; actual spend tracked via billing_events API KEY
POST/v1/full-scrape/budget-cap/reservations/:id/releaseRelease reservation without committing — returns budget to available pool API KEY
POST/v1/full-scrape/budget-cap/reservations/:id/extendBump reservation TTL by {additional_seconds} (max 3600) — keeps same id and amount API KEY
POST/v1/full-scrape/budget-cap/reservations/release-allBulk release every active reservation for this api_key — emergency cleanup after aborted pipeline API KEY
GET/v1/full-scrape/budget-cap/presetsList built-in cap preset library: conservative / balanced / aggressive / dev / pause API KEY
POST/v1/full-scrape/budget-cap/presets/:name/applyApply a preset. Preserves per_center_caps + frozen list. Logs version with change_source=preset_applied:<name> API KEY
PUT/v1/full-scrape/budget-capSet or replace the monthly USD cap with optional hard enforcement API KEY
DELETE/v1/full-scrape/budget-capRemove the cap — no more enforcement, back to unlimited recurring spend API KEY
POST/v1/full-scrape/budget-cap/freezeFreeze a cost_center — pause all spending tagged with that attribution until unfrozen API KEY
POST/v1/full-scrape/budget-cap/unfreezeUnfreeze a cost_center — spending resumes immediately, sub-cap config preserved API KEY
POST/v1/full-scrape/budget-cap/freeze-bulkBulk freeze up to 50 cost_centers in one call — per-center error isolation API KEY
POST/v1/full-scrape/budget-cap/unfreeze-bulkBulk unfreeze — counterpart of freeze-bulk API KEY
POST/v1/full-scrape/budget-cap/transfer-allocationMove $ between two sub-caps without re-specifying the whole per_center_caps map API KEY
Targeted allocation rebalance (iter 103). The
existing
PUT /v1/full-scrape/budget-cap requires a full per_center_caps map — to move $20 from client-acme to client-beta, callers had to read the current config, modify two entries, and PUT the whole map back. Iter 103 adds a targeted helper for the common "rebalance between centers" case: POST /v1/full-scrape/budget-cap/transfer-allocation {"from": "client-acme", "to": "client-beta", "amount_usd": 20}
Semantics:
• from must have an existing sub-cap entry in per_center_caps AND value ≥ amount (422 errors otherwise)
• to may already exist (value gets bumped by amount) or be new (creates with value = amount)
• from after transfer: if result is 0, the entry is dropped from the map entirely (signal that the center no longer has a dedicated sub-cap)
Response shape: {from, to, amount_usd, from_before, from_after, to_before, to_after, source_dropped, updated_per_center_caps, note}. The before/after pair makes it auditable — you can trace exactly what changed.
Historical billing untouched: transfer only reshapes FUTURE enforcement. Past billing events remain attributed to their original cost_center — use POST /v1/full-scrape/budget-cap/transfer-center (iter 76) if you want to reattribute historical events too.
Atomic: single DB UPDATE via the changeset — either both entries update or neither (no partial state).
Use case — mid-month rebalance: agency's monthly budget is $500 split 50/50 between client-acme and client-beta ($250 each). Mid-month, client-beta pitches a new campaign needing extra headroom, client-acme is comfortably under. Operator calls POST /v1/full-scrape/budget-cap/transfer-allocation {"from": "client-acme", "to": "client-beta", "amount_usd": 50}. Result: client-acme → $200, client-beta → $300. One call, no re-specifying the map, fully auditable.
POST/v1/full-scrape/budget-cap/rename-centerRename a cost_center across config + historical billing events — rewrites JSONB, preserves sub-cap + freeze state API KEY
POST/v1/full-scrape/budget-cap/transfer-centerMove billing events within a time window from one cost_center to another — granular, config-preserving API KEY
The guardrail for set-and-forget users.
You set a monthly ceiling in USD. When a new recurring operation would push projected monthly cost above
the cap, the API returns
402 Payment Required instead of creating it.
Body: monthly_usd_cap (required, 0-100000), daily_usd_cap (optional, 0-10000, iter 106 velocity limiter), hard_enforce (default true — false = warning only).
Daily cap (iter 106 — velocity limiter): optional companion to monthly_usd_cap. Caps spend for the current UTC day independently. Prevents "burn the entire monthly budget in 2 days" scenarios when a new campaign goes viral or a flaky upstream generates retry storms. Max $10,000/day. Nullable — agencies without velocity concerns can use only the monthly cap.
Enforcement order: global monthly cap → daily cap → frozen center → per-center sub-cap. Daily cap fails with 402 daily_spend_cap_exceeded containing daily_cap_usd, today_spent_usd, and a resolution block with three paths (wait for UTC midnight reset, raise cap, remove cap). Response headers: x-daily-cap-usd and x-daily-spent-usd.
Window resets at UTC midnight. The cap is computed via BillingEvents.today_cost_usd/1 — a scalar SQL sum of cost_usd for events with occurred_at >= today_utc_start. Same indexing as the MTD check, so it's equally cheap per-request.
Scope: applies to recurring scheduled operations only (schedules + bulk watchlist scheduling). Ad-hoc scrapes remain governed by tier quota + overage — they consume credits but don't check the cap. The cap is specifically a guardrail for "I set 40 schedules and forgot".
Hard vs soft enforce:
• hard_enforce=true — over-cap schedule creation is rejected with 402. Use this when you can't tolerate a surprise bill.
• hard_enforce=false — over-cap creation proceeds but the over_cap? flag in /recurring-cost surfaces the warning. Good for alerting without blocking.
Checked on: single POST /schedules (new schedule), bulk POST /watchlists/:id/schedules (atomic — all or nothing). Cap is re-projected on every write; mutations above cap are rejected upfront.
Visible in projection: GET /recurring-cost now includes a budget_cap section with cap_configured, cap_usd, hard_enforce, over_cap?, headroom_usd, overage_usd. UI can render "you've used $150 of your $200 cap" without extra roundtrips.
Example: User on Pro ($99 base, 1500 credits quota) sets monthly_usd_cap=150. They have 3 schedules projected to add $45/month overage, total projection $144 ($99 base + $45 overage). Next schedule creation projects $165 total → 402. User must raise cap, delete a schedule, or extend intervals.
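A defensive schedule-creation sketch that branches on the documented 402 reasons (daily cap, sub-cap, frozen center, global cap). The POST /schedules body fields, the error-payload key carrying the reason code, and the X-API-Key header are assumptions:

```python
# Sketch: create a schedule and branch on the documented 402 reason codes.
# Assumptions: schedule body fields, an "error" key holding the reason code, X-API-Key header.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def create_schedule(username: str, interval_seconds: int, cost_center: str | None = None):
    headers = dict(HEADERS)
    if cost_center:
        headers["X-Cost-Center"] = cost_center
    resp = requests.post(f"{BASE}/schedules",
                         json={"username": username, "interval_seconds": interval_seconds},
                         headers=headers, timeout=10)
    if resp.status_code == 402:
        reason = resp.json().get("error")  # field name assumed
        if reason == "daily_spend_cap_exceeded":
            print("daily cap hit:", resp.headers.get("x-daily-cap-usd"),
                  "spent today:", resp.headers.get("x-daily-spent-usd"))
        elif reason in ("cost_center_cap_exceeded", "cost_center_frozen"):
            print("center blocked:", resp.headers.get("X-Budget-Center"))
        else:
            print("monthly budget cap would be exceeded; raise the cap or trim schedules")
        return None
    resp.raise_for_status()
    return resp.json()
```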
Per-cost-center sub-caps (iter 59): the PUT body now accepts an optional per_center_caps map that scopes sub-ceilings to specific cost_center tags (iter 52 attribution). Example: PUT /v1/full-scrape/budget-cap {"monthly_usd_cap": 500, "per_center_caps": {"client-acme": 50, "client-beta": 100}}
Enforcement: if a request carries X-Cost-Center: client-acme and that center's MTD cost has already hit $50, the budget enforcement plug returns 402 cost_center_cap_exceeded in addition to the global cap check. The response includes cost_center, cap_usd, spent_usd, overage_usd, and a resolution block pointing to three fixes (raise sub-cap, remove it, or drop the header). Response headers: X-Budget-Center, X-Budget-Center-Cap-Usd, X-Budget-Center-Spent-Usd.
Pre-flight integration: when a cost_center is set on a pre-flight request, the report includes a new cost_center_sub_cap section mirroring the existing budget_cap section. Both are evaluated — a projection that fits under the global cap but exceeds a sub-cap is blocked with the new reason would_exceed_cost_center_sub_cap. Agencies get a single call that says "this batch would fit globally BUT client-acme is already at $48 and your batch would push it to $52".
Sub-cap math: per-center MTD cost is computed via BillingEvents.cost_for_center/2 — an indexed SQL aggregate over metadata->>'cost_center' = '<tag>' for the current UTC month. One scalar query; cheap enough to run on every request.
Hard enforce honors sub-caps: setting hard_enforce: false disables BOTH global and per-center enforcement (warning-only mode). Sub-caps never bypass the global hard_enforce toggle.
Cost center freeze/unfreeze (iter 61): temporarily pause all spending for a specific cost_center without deleting the sub-cap or losing attribution history. POST /v1/full-scrape/budget-cap/freeze {"cost_center": "client-acme"} adds the tag to the frozen_cost_centers array. From that moment, any request carrying X-Cost-Center: client-acme is rejected with 402 cost_center_frozen — the frozen check runs BEFORE the sub-cap check, so frozen state blocks independently of spend. POST /v1/full-scrape/budget-cap/unfreeze {"cost_center": "client-acme"} restores spending immediately. Both endpoints are idempotent.
Freeze vs delete: freezing preserves the sub-cap config (per_center_caps["client-acme"] stays put) — perfect for "pause this client for Q2 then resume in Q3". Deleting the sub-cap loses the ceiling config. The freeze list can hold up to 100 centers.
Use cases: client went delinquent → freeze until payment clears. Audit investigation → freeze to stop new activity. End-of-quarter budget cutoff → freeze all clients, unfreeze on the 1st. Contract dispute → freeze without losing the sub-cap you negotiated.
Bulk freeze/unfreeze (iter 101): for incident response and end-of-quarter cutoffs, use the bulk variants: POST /v1/full-scrape/budget-cap/freeze-bulk {"cost_centers": ["client-acme", "client-beta", "client-gamma"]}. Up to 50 centers per call. Each center is validated independently (regex [A-Za-z0-9_.-]+, max 64 chars); invalid tags are silently dropped before freeze. Response includes per-center status so the caller can see what was frozen vs what had errors (e.g. no budget cap configured).
Idempotency: bulk freeze is idempotent — re-calling on already-frozen centers is a no-op reported as status: "frozen". Same for unfreeze-bulk. Safe to retry from scripts without checking current state.
Incident response use case: detection system spots anomalous spend across multiple clients. Ops team calls POST /v1/full-scrape/budget-cap/freeze-bulk with the full client list to freeze everything at once. Investigation happens. When cleared, they call /unfreeze-bulk with the same list. Two API calls total vs N individual freeze calls + manual tracking of which got frozen.
Pre-flight surfaces it: when you pre-flight a batch tagged with a frozen center, the cost_center_sub_cap section returns {frozen: true, blocks_request?: true} and blocking_reasons includes "cost_center_frozen" — distinct from "would_exceed_cost_center_sub_cap" so client code can surface the right message. Response headers on 402: X-Budget-Center: <tag> and X-Budget-Center-Frozen: true.
Cost center rename (iter 70): POST /v1/full-scrape/budget-cap/rename-center {"from": "client-acme-old", "to": "client-acme"} consolidates a renamed client across BOTH the budget cap config AND all historical billing events. Uses PostgreSQL jsonb_set to rewrite metadata->'cost_center' in-place for every matching event, preserving labels, username, tier and any other metadata keys untouched. The per_center_caps sub-cap entry gets its key renamed, and frozen_cost_centers string array entries get substituted + deduped.
Response: {from, to, rewritten_events: 87, config_updated: true, cap: {...updated cap map...}}. Idempotent — calling with from == to is a no-op returning 0.
Merge behavior: if the destination name already has a sub-cap configured, the OLD value is silently dropped (the pre-existing destination wins). This prevents a rename from overwriting a carefully tuned sub-cap with an old/stale one. The frozen list is de-duplicated naturally via Enum.uniq.
Historical reporting reconciliation: after a rename, every report endpoint (/v1/cost/invoice, /v1/cost/centers, /v1/cost/compare-periods, /v1/cost/events/export) automatically reads the rewritten events and shows the new name retroactively. Month-over-month comparisons stay consistent — the old name disappears from all time periods in one shot.
Use case — client rebrand: agency had cost_center=acme-corp. Client went through a merger and is now globex-industries. One call renames all past spend so the monthly invoice PDF sent to the client shows the CURRENT name for BOTH historical and current events. No manual reconciliation, no dangling labels, no duplicate tracking during transition.
Time-scoped transfer (iter 76): POST /v1/full-scrape/budget-cap/transfer-center {"from": "client-acme", "to": "client-acme-prelaunch", "since": "2026-04-01T00:00:00Z", "until": "2026-04-07T23:59:59Z"} moves ONLY events within the window — distinct from rename (iter 70) which rewrites ALL history. Leaves per_center_caps + frozen_cost_centers config untouched (those are current state, not retro attribution).
Rename vs transfer:
• rename-center — wholesale, rewrites ALL matching history + config. Use when a client is permanently renamed (merger, rebrand).
• transfer-center — time-scoped, rewrites ONLY events in the window, config preserved. Use when campaigns get re-categorized retroactively (e.g. "the first week of April was actually the prelaunch, not the main campaign").
Transfer response shape: {from, to, window: {since, until}, rewritten_events: N, idempotent_noop: false, note}. Returns 0 rewritten when from == to (idempotent). 400 on invalid datetimes or reversed window (since > until).
Use case — campaign re-categorization: agency ran 50 scrapes tagged client-acme throughout April. On April 15, finance team decides the first week was actually the pre-launch budget, not the main campaign. One call: transfer-center {from: "client-acme", to: "client-acme-prelaunch", since: "2026-04-01", until: "2026-04-07T23:59:59Z"}. The monthly invoice now splits cleanly, and week 2+ events remain under the main tag untouched.
Dry-run simulator (iter 81): POST /v1/full-scrape/budget-cap/simulate with a proposed {monthly_usd_cap, per_center_caps} body. Returns a projection of what WOULD happen if those caps were applied NOW — zero mutation, pure preview. Each scope (global + each proposed per-center entry) gets a forecast block: {proposed_cap_usd, current_spent_usd, would_block_now?, projected_eom_spent_usd, headroom_usd, days_until_exhausted, projected_exhaustion_date, recommended_daily_spend_usd, projected_over_cap?, currently_over_cap?}.
Why this matters: before applying a tighter cap, agencies need to know which clients would be immediately 402'd. Blindly lowering the cap from $500 to $300 mid-month could block every in-flight workflow. The simulator surfaces this upfront with a concrete breakdown: "client-acme is already at $48 spent vs proposed $40 — applying this config would break client-acme's schedule immediately".
Recommendations block: human-readable strings ready for Slack/dashboard, context-aware:
• "ALERT: your current MTD spend already exceeds the proposed global cap..."
• "At current burn, your projected EOM ($X) exceeds the proposed cap. Reduce daily to $Y to fit."
• "Cost center 'client-acme' is ALREADY over the proposed sub-cap..."
• "Cost center 'client-beta' projected to exceed its sub-cap on 2026-04-22 at current burn."
• "Global cap fits your current burn rate with headroom." (default when all is OK)
Use case — quarterly budget review: CFO wants to cut scraping spend 40% next quarter. Operator runs the simulator with the proposed new caps, gets a list of exactly which clients/centers would be immediately blocked and which would exhaust mid-month. The team then negotiates the cuts with the client or defers some schedules before committing the new config via PUT /v1/full-scrape/budget-cap.
Auto-allocate (iter 83): POST /v1/full-scrape/budget-cap/auto-allocate computes a proposed per_center_caps map by distributing a new global cap proportionally to each center's historical spend in a basis window. Body: {"monthly_usd_cap": 500, "basis": "last_month" | "last_30_days", "reserve_pct": 10, "min_per_center_usd": 5}.
Allocation math (see the sketch below):
1. Query summary_by_cost_center for the basis window
2. Filter to eligible centers (positive spend, excludes "unattributed")
3. allocatable = monthly_usd_cap × (1 - reserve_pct / 100) — the reserve is held back from allocation as a buffer (default 10% of the global cap)
4. For each center: proposed_cap = allocatable × (center_spend / total_spend)
5. Enforce min_per_center_usd floor by bumping small allocations up
6. If floor bumps push total over allocatable, rescale proportionally to fit
Response shape: {proposal: true, basis: {mode, since, until, total_historical_cost_usd, eligible_center_count}, inputs: {monthly_usd_cap, reserve_pct, reserve_amount_usd, allocatable_usd, min_per_center_usd}, allocations: [{cost_center, historical_cost_usd, historical_event_count, historical_share_pct, proposed_cap_usd, bumped_to_minimum}, ...], per_center_caps: {cost_center: cap_usd, ...}, overshoot_note, next_steps}.
Pure preview: no mutation. The per_center_caps field is structured to drop straight into /simulate (preview impact) or PUT /budget-cap (commit).
Full planning workflow (iters 81 + 83):
1. POST /budget-cap/auto-allocate — get proposed allocation based on history
2. POST /budget-cap/simulate — preview impact of the proposal
3. Optionally iterate (adjust reserve_pct, basis, min_per_center, or edit the per_center_caps map)
4. PUT /budget-cap — commit
5. GET /cost/burn-down — monitor the new caps going forward
Reserve_pct rationale: 10% default means only 90% of the global cap gets distributed to per-center sub-caps. The other 10% sits as a global buffer that catches untagged overages, one-off emergency scrapes, or anything not bound to a sub-cap. Set to 0 for strict allocation (every dollar pinned to a center); set to 30 for loose allocation with headroom.
Use case — month-end rebalance: agency finishes April, reviews actuals. Some clients grew, some shrank. Instead of manually recomputing sub-caps, they call POST /budget-cap/auto-allocate {monthly_usd_cap: 500, basis: "last_month"}. The result reflects April's actual weights. They simulate it, tweak one or two, and commit in under a minute. Next month's budget is automatically rebalanced to match actual usage.
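A standalone sketch of steps 1-6 of the allocation math above; the spend figures are invented and stand in for the summary_by_cost_center query:

```python
# Sketch: proportional sub-cap allocation with reserve, per-center floor, and rescale-to-fit.
# Mirrors the six allocation steps above; spend numbers are invented.

def auto_allocate(monthly_usd_cap: float, spend_by_center: dict[str, float],
                  reserve_pct: float = 10, min_per_center_usd: float = 5.0) -> dict[str, float]:
    eligible = {c: s for c, s in spend_by_center.items() if s > 0 and c != "unattributed"}
    total_spend = sum(eligible.values())
    allocatable = monthly_usd_cap * (1 - reserve_pct / 100)
    # Proportional share, with small allocations bumped up to the floor.
    caps = {c: max(allocatable * s / total_spend, min_per_center_usd) for c, s in eligible.items()}
    overshoot = sum(caps.values())
    if overshoot > allocatable:  # floor bumps pushed us over: rescale proportionally to fit
        scale = allocatable / overshoot
        caps = {c: v * scale for c, v in caps.items()}
    return {c: round(v, 2) for c, v in caps.items()}

spend = {"client-acme": 180.0, "client-beta": 60.0, "client-gamma": 3.0, "unattributed": 12.0}
print(auto_allocate(500, spend))
# {'client-acme': 333.33, 'client-beta': 111.11, 'client-gamma': 5.56}  (reserve of $50 held back)
```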
Recurring cost projection (budget sanity check)
GET/v1/full-scrape/recurring-costTotal projected monthly cost from all your schedules and digest schedules, with per-item breakdown API KEY
POST/v1/full-scrape/recurring-cost/simulateWhat-if projection — add/remove/change schedules hypothetically and see the delta API KEY
GET/v1/full-scrape/recurring-cost/compare-tiersProject current recurring state against every tier — cheapest-fit recommendation API KEY
GET/v1/full-scrape/recurring-cost/historical-tiers?days=30Retrospective tier analysis — replays past billing events across all tiers API KEY
"Was I on the right tier for the past 30 days?"
Reads your actual
BillingEvents from the window, counts total credits consumed, and replays that volume against every tier's pro-rated quota + overage model to tell you which tier would have been cheapest in hindsight.
Different from /compare-tiers: that's prospective (current schedules → future projection). This is retrospective (past events → what-would-have-been cost). Both answer "which tier?" but from opposite ends of the timeline.
Window: days query param (1-90, default 30). Reads from the BillingEvents log, which retains 30 days of history sliding.
Per-tier fields:
• base_pro_rated_usd — tier base price scaled to window length
• overage_usd — what overage would have cost on this tier
• total_would_have_paid_usd = base + overage
• savings_vs_actual_usd — positive = would have saved, negative = would have cost more
• would_have_fit? — whether the tier accommodates overage
• window_quota — tier quota scaled to window days
• overage_credits — credits above the pro-rated quota
• reason — human-readable explanation
Top-level insights: total_actual_cost_usd (what you paid), best_retrospective_tier, insight (human-readable summary like "You paid $103.74 on Pro. Pro was optimal — any other tier would have cost more").
Use case: monthly billing review. "Should I have been on Business last month?" Returns concrete savings calculation. Combines with /compare-tiers for "where I was vs where I should go".
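A toy version of that replay with a hypothetical tier table; none of the base prices, quotas, or overage rates below come from the docs, they only illustrate the pro-rate-then-overage arithmetic:

```python
# Sketch: "what would each tier have cost for the credits I actually used in this window?"
# Tier prices, quotas and overage rates are HYPOTHETICAL; the real values come from the API.

TIERS = {
    "starter":  {"base_monthly_usd": 29.0,  "monthly_quota": 300,  "overage_per_credit_usd": 0.15},
    "pro":      {"base_monthly_usd": 99.0,  "monthly_quota": 1500, "overage_per_credit_usd": 0.10},
    "business": {"base_monthly_usd": 299.0, "monthly_quota": 6000, "overage_per_credit_usd": 0.08},
}

def replay(credits_used: int, window_days: int, actual_cost_usd: float) -> list[dict]:
    scale = window_days / 30.44  # pro-rate to the window using the 30.44-day month model
    rows = []
    for name, t in TIERS.items():
        window_quota = t["monthly_quota"] * scale
        base_pro_rated = t["base_monthly_usd"] * scale
        overage_credits = max(credits_used - window_quota, 0)
        total = base_pro_rated + overage_credits * t["overage_per_credit_usd"]
        rows.append({"tier": name,
                     "total_would_have_paid_usd": round(total, 2),
                     "savings_vs_actual_usd": round(actual_cost_usd - total, 2)})
    return sorted(rows, key=lambda r: r["total_would_have_paid_usd"])

for row in replay(credits_used=1550, window_days=30, actual_cost_usd=103.74):
    print(row)
```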
"Should I upgrade?" Answered precisely.
This endpoint projects your actual current recurring load (real schedules, real digest schedules, real credit usage)
against every BuyCrowds tier and returns a sorted table with savings vs current and a concrete recommendation.
Different from /cost/compare: that endpoint takes an abstract "requests_per_day" parameter. This one uses your actual state — no guessing, no projections based on hypothetical load.
Per-tier response fields:
• tier, base_monthly_usd, projected_extra_usd, projected_total_usd
• savings_vs_current_usd — negative = would cost more, positive = would save
• fits? — boolean "does this tier accommodate my current load?"
• quota_fits? — credit quota check
• rate_limit_ok? — daily rate limit check
• reason — human-readable fit explanation
• full_scrapes_quota — tier's monthly credit allowance
Top-level: current_tier, current_total_usd, cheapest_fit (tier id), recommendation (human-readable string).
Enterprise handling: tier has nil pricing → fits?: :unknown, projected_total_usd: nil, reason: "custom-priced, contact sales". Doesn't corrupt the recommendation for the self-serve tiers.
Use case: Pro user sees recurring-cost showing $174/month ($99 base + $75 overage). Calls compare-tiers → sees Business would be $299 (no overage but more headroom). Recommendation: "Stay on Pro — Business would cost $125 more without meaningful quota gain at current load". Decision made in one call.
Pure compute over existing stores. Same pattern as the other /recurring-cost/* endpoints — no new state, no side effects.
Non-destructive preview. Propose mutations in the body, see the hypothetical
projection, decide whether to commit. Nothing is changed. Pure compute.
Body (all fields optional):
• add_schedules — array of hypothetical schedules: [{"username": "alice", "interval_seconds": 86400, "template": "influencer_kit"}, ...]
• remove_schedule_ids — array of existing schedule ids to simulate removing: ["fss_abc", "fss_xyz"]
• change_intervals — map of schedule id → new interval: {"fss_abc": 3600}
Response shape:
• hypothetical — same structure as GET /recurring-cost but computed with the mutations applied
• current_baseline — actual current state for comparison
• delta — runs/credits/cost changes (positive = increase, negative = decrease)
• mutations_applied — counts of added/removed/changed
• would_exceed_cap? — boolean, true if hypothetical projection crosses your budget cap
Use case: "I want to add 20 creators to my TopAthletes watchlist as daily schedules. Will I stay under my $200 budget cap?" → call simulate with 20 hypothetical add_schedules, check would_exceed_cap?, commit or adjust before hitting the real endpoints.
Bulk scenario planning: combine removals + additions to see the net impact of a portfolio rotation: "remove 5 old creators, add 10 new ones at 6h interval instead of 24h".
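A pre-commit sketch of that check, building the simulate body from a list of new creators and gating on would_exceed_cap? (auth header illustrative, body fields as documented above):

```python
# Sketch: "will adding these daily schedules blow my budget cap?" asked before committing.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # illustrative header name

def would_fit(new_usernames: list[str], interval_seconds: int = 86_400) -> bool:
    body = {"add_schedules": [{"username": u, "interval_seconds": interval_seconds}
                              for u in new_usernames]}
    resp = requests.post(f"{BASE}/recurring-cost/simulate", json=body, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    sim = resp.json()
    print("projected delta:", sim["delta"], "| exceeds cap:", sim["would_exceed_cap?"])
    return not sim["would_exceed_cap?"]

if would_fit([f"creator_{i:02d}" for i in range(20)]):
    print("safe to create the schedules for real")
```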
"How much will my set-and-forget actually cost me this month?"
Single pass over all your schedules + digest schedules, extrapolates to monthly cost based on interval,
sums up the damage.
Response sections:
• current — credits used this cycle, quota, headroom
• schedules — per-schedule and aggregate runs/month, credits/month, internal cost, overage cost
• digest_schedules — delivery counts (no credits, just webhook bandwidth)
• budget — base_monthly_usd + projected_extra_usd = projected_total_usd, plus headroom and will_exhaust_quota? flag with days_until_exhaust countdown
Month model: 30.44 days (365.25 / 12) — realistic average that handles 30- and 31-day months without user confusion.
Mode hints per schedule: "fits in quota", "may trigger overage when combined with ad-hoc usage", or "included (unlimited tier)". Lets the UI flag risky schedules without doing tier math client-side.
Use case: Pro user has 8 daily schedules + 2 weekly digests. Opens this endpoint before creating a 9th schedule. Sees "3500 runs/month projected, quota 1500, extra cost $400". Decides to upgrade to Business instead of blowing $400 in overage.
Pure compute, zero new state. Composes Schedules + DigestSchedules + Billing + Credits. Same pattern as /cost/forecast but focused on recurring-only spend.
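The projection in miniature, using the 30.44-day month model; one credit per run and the quota figure are placeholders:

```python
# Sketch: runs-per-month projection with the 30.44-day month model described above.
# Assumes one credit per run; the quota value is a placeholder.

DAYS_PER_MONTH = 365.25 / 12  # 30.44

def runs_per_month(interval_seconds: int) -> float:
    return DAYS_PER_MONTH * 86_400 / interval_seconds

intervals = [21_600] * 25  # e.g. 25 creators scraped every 6 hours
projected_runs = sum(runs_per_month(i) for i in intervals)
quota = 1500
overage_credits = max(projected_runs - quota, 0)
print(f"{projected_runs:.0f} runs/month projected, quota {quota}, overage {overage_credits:.0f} credits")
```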
Scrape templates (reusable presets)
POST/v1/full-scrape/templatesCreate a named preset — platforms, timeout, byok defaults API KEY
GET/v1/full-scrape/templates?include_system=trueList your templates (optionally including built-in system presets) API KEY
GET/v1/full-scrape/templates/systemList the 4 always-available built-in system presets API KEY
GET/v1/full-scrape/templates/:idSingle template (works for both user ids and _system:name ids) API KEY
PATCH/v1/full-scrape/templates/:idUpdate preset fields (rename, change platforms, etc.) API KEY
DELETE/v1/full-scrape/templates/:idRemove a template — does not affect existing schedules that referenced it API KEY
Define once, reuse everywhere. Create a template like
"influencer_kit"
(platforms: instagram+tiktok+youtube) and reference it in any scrape call via ?template=influencer_kit
or {"template": "influencer_kit"} in the body. Works on single scrapes, batches, schedules,
and watchlist scrapes.Resolution order: caller's explicit params ALWAYS win over template defaults. Template is a fallback. Pass
platforms explicitly and the template's platforms are ignored.Lookup: reference by id (
tpl_abc) or by name (influencer_kit).
Name lookups are scoped per api_key — two different customers can both have a template called "default".Body params:
name (required, alphanumeric + underscore/dash, 1-64 chars, unique per key),
description (optional, max 300 chars),
platforms (list of platform atoms or "all"),
timeout_ms (5000-120000, optional),
byok_defaults (map of platform → credentials, optional).Limits: max 20 templates per api_key. BYOK tokens stored in template are echoed back by key name only (not value) — credentials never leave via GET.
Built-in system templates: 4 presets shipped with the platform, accessible by every api_key without any setup:
•
_system:default — all platforms, 45s timeout•
_system:influencer_kit — Instagram + TikTok + YouTube•
_system:tech_creator — GitHub + Reddit + Twitter•
_system:music_creator — Spotify + YouTube + TikTok + TwitterReference any of these by name:
{"template": "_system:influencer_kit"}. Use GET /templates/system
to list them, or GET /templates?include_system=true to see user + system in one list. Zero setup needed for a first-use user.Works across: single
POST /full-scrape/:username (sync/async),
batch POST /full-scrape/batch, scheduled scrapes POST /full-scrape/schedules, and
bulk watchlist scheduling POST /watchlists/:id/schedules. Scheduled scrapes persist the template
reference — the Scheduler tick re-resolves on every run, so updates to the template propagate to all future
fires automatically. Mutable templates, immutable schedules.Example:
POST /v1/full-scrape/neymarjr?async=true&template=influencer_kit runs an async scrape
with the template's platforms + timeout, no need to re-list them every time.Delete semantics: deleting a template does NOT cascade to referencing schedules. Those schedules keep running with caller-only defaults (no platforms filter = scrape all). Graceful degradation — you don't break 50 schedules by removing a misnamed template.
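A create-then-use sketch with the documented template body params; the auth header name is illustrative:

```python
# Sketch: create a reusable template once, then reference it by name on every scrape.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # illustrative header name

# Create the preset (name, description, platforms, timeout_ms are documented body params).
requests.post(f"{BASE}/templates",
              json={"name": "influencer_kit",
                    "description": "IG + TikTok + YouTube, 60s timeout",
                    "platforms": ["instagram", "tiktok", "youtube"],
                    "timeout_ms": 60_000},
              headers=HEADERS, timeout=10).raise_for_status()

# Reference it on an async scrape; explicit params would still override the template defaults.
job = requests.post(f"{BASE}/neymarjr",
                    params={"async": "true", "template": "influencer_kit"},
                    headers=HEADERS, timeout=10).json()
print(job)
```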
Watchlists (portfolio abstraction)
POST/v1/full-scrape/watchlistsCreate a named collection of usernames with optional tags and description API KEY
POST/v1/full-scrape/watchlists/from-top-creators?limit=20&since=2026-04-01T00:00:00ZAuto-populate a watchlist with the top-N creators by cost from a billing window API KEY
Auto-populate from spend history (iter 107).
Composition of iter 80 (
cost_per_creator aggregator) + Watchlists.create. Instead of hand-picking usernames, let the endpoint infer them from actual spend — "the creators I already pay the most for". Useful for:
• Creating a "biggest cost drivers" list to monitor with tighter observation (anomaly alerts, schedules, manual review)
• Bootstrapping a client baseline — at the end of a pilot month, auto-populate a watchlist from everyone actually scraped and lock it in as the official list
• Discovering unexpected top spenders — "wait, why is @xyz in my top 20?" surfaces creators that leaked into the workflow without explicit tracking
Parameters: name (default "Top creators"), limit (1-100, default 20), since/until (ISO8601, default month-to-date), cost_center (optional filter — scope to one client's top creators).
Response shape: {watchlist: {id, name, usernames, created_at, ...}, seeded_from: {creator_count, since, until, cost_center_filter, creators: [...]}, note}. The seeded_from block includes the full aggregator result (cost per creator, event count, etc.) so the caller can see WHY each creator made the list.
Next steps — the response note lists three:
• POST /v1/full-scrape/watchlists/:id/scrape — scrape everyone in the new list as a single batch
• POST /v1/full-scrape/watchlists/:id/schedules — create recurring schedules for each member
• DELETE /v1/full-scrape/watchlists/:id — remove if the auto-populated list isn't what you wanted
Use case — end-of-month baseline: agency onboards client-acme in March. By end of month, they've scraped 150 different creators. Some were intentional tracking, some were ad-hoc explorations. To lock in the official baseline for April, they call POST /v1/full-scrape/watchlists/from-top-creators?cost_center=client-acme&limit=50&since=2026-03-01T00:00:00Z. The top 50 by spend (the ones actually being monitored regularly) become the official watchlist. Outliers and one-off scrapes don't make the cut. Next month's recurring schedules fire off this watchlist.
GET/v1/full-scrape/watchlistsList your watchlists with member counts and limits API KEY
GET/v1/full-scrape/watchlists/:idSingle watchlist with full member list and metadata API KEY
PATCH/v1/full-scrape/watchlists/:idRename, re-tag, or incrementally add/remove members via add_usernames/remove_usernames API KEY
DELETE/v1/full-scrape/watchlists/:idDelete a watchlist (members remain scrapeable individually) API KEY
POST/v1/full-scrape/watchlists/:id/scrapeBatch-scrape every member — supports async + webhook, reuses batch infra API KEY
GET/v1/full-scrape/watchlists/:id/trends?metric=followers&days=30Portfolio trend rollup — top growers, top decliners, total velocity API KEY
GET/v1/full-scrape/watchlists/:id/digest?days=7&metric=followersFull portfolio briefing — snapshots + trends + anomalies + billing + schedule health in one call API KEY
The daily briefing endpoint. Answers "what happened with my creators in the last N days?"
in a single composed response. Everything an agency account manager needs in one tab.
Response sections:
• snapshots — count per member, active vs silent
• trends — top 10 movers, top 10 laggards, status counts
• anomalies — total + by_severity/by_direction + 10 most extreme
• billing — total cost + per-member breakdown (filtered to watchlist members)
• schedules — active/inactive counts + members without coverage + last errors
Query params: days (1-30, default 7), metric (default followers). Scoped per api_key.
Performance: ~4×N per-member reads where N = watchlist size. Typical latency < 50ms at max 50 members. Cacheable via Idempotency-Key header for expensive windows.
Use case: Monday morning — manager opens the dashboard, hits the digest endpoint for their 3 watchlists, sees: "TopAthletes gained 1.2M total followers this week with 2 high-severity spikes (neymarjr went viral), NewClients had 3 schedule errors, ContentBrands spent $18.60 in included credits."
POST/v1/full-scrape/watchlists/:id/schedulesBulk-create schedules for every member in one call (Pro+ only) API KEY
The agency-level abstraction. Individual endpoints work on one username at a time.
Watchlists let you treat 10-50 creators as a portfolio — scrape them all with one call, see aggregate growth
trends, create schedules in bulk, track performance rollups.
Composition, not duplication. Under the hood, watchlist endpoints delegate to the existing primitives: /scrape calls the batch endpoint with the member list, /schedules loops Schedules.create, /trends calls Trends.analyze per member and rolls up the results. No new scraping logic, no new billing logic.
Limits: max 10 watchlists per API key, max 50 members per watchlist, max 100 chars in name, max 500 chars in description, max 20 tags.
Trend rollup shape: total_slope_per_day, total_slope_per_week, top_growers (5 best growers by slope), top_decliners (5 worst), counts per status (trending/flat/noisy/insufficient). Ready for an "agency dashboard" card.
Incremental membership updates: use add_usernames or remove_usernames in PATCH to avoid sending the full member list on every edit.
Use case: "I manage 25 creators, I want to scrape them all daily, see which ones are trending up this week, and get alerted when any of them has a viral event." → create watchlist, bulk-schedule all members daily, register anomaly alert with username="*". Three API calls and you're done.
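For example, an incremental membership edit might look like this (watchlist id and auth header are illustrative):
curl -X PATCH "https://api.buycrowds.com/v1/full-scrape/watchlists/wl_123" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"add_usernames": ["newcreator"], "remove_usernames": ["churned_creator"]}'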
Webhook delivery health
GET/v1/full-scrape/webhook-healthList health records for all your webhook subjects (schedules, digest_schedules, anomaly_alerts, cost_alerts) API KEY
POST/v1/full-scrape/webhook-health/resetReset consecutive_failures after fixing a broken webhook URL API KEY
POST/v1/full-scrape/webhook-health/testFire a synthetic test payload sync — validates a URL without waiting for the next tick API KEY
GET/v1/full-scrape/webhook-deliveries?subject_type=cost_alert&limit=50Attempt-level inspection of the Oban retry queue — see individual delivery states, not just health aggregates API KEY
Attempt-level visibility (iter 86). The /webhook-health endpoint gives you the aggregate "is this subject healthy or dead" signal. When you need to dig into WHY a subject is degrading — which retries are still in flight, which attempt just failed, what error came back from the upstream — /webhook-deliveries queries the Oban webhook_retry queue directly and returns per-attempt state.
Scoping: queries are filtered via args->>'api_key_id' = <caller> in the oban_jobs table (same JSONB pattern as iter 78 deferred batch listing). No cross-tenant leakage possible — Oban job IDs from other api_keys are invisible.
States returned: scheduled (waiting for next backoff), available (ready to run, waiting for worker), retryable (failed, will retry), executing (running now), discarded (exhausted all 5 attempts), cancelled (manually cancelled). Filter by ?subject_type=cost_alert / anomaly_alert / schedule / digest_schedule / deferred_batch to narrow down.
Response shape: per-attempt {oban_job_id, state, subject_type, subject_id, webhook_url, attempt, max_attempts, scheduled_at, inserted_at, attempted_at, last_error}. The last_error is extracted from Oban's errors column — typically an HTTP status code, timeout, or DNS failure. Plus a top-level by_state count breakdown for quick dashboard rendering.
Debugging workflow:
1. /webhook-health → spot a subject with elevated consecutive_failures
2. /webhook-deliveries?subject_type=cost_alert → see the actual in-flight retries + last errors
3. Fix the root cause (DNS, TLS cert, server response code)
4. /webhook-health/test → verify with a synthetic payload
5. /webhook-health/reset → re-enable the subject (clears consecutive_failures)
Use case — midnight Slack alert: your CostAlerts stop firing overnight. Morning coffee, you call GET /v1/full-scrape/webhook-deliveries?subject_type=cost_alert and see 3 deliveries in discarded state with last_error: "502 Bad Gateway" pointing at your Slack webhook. Slack had an outage — you reset the subject, fire a test, and the retry queue drains normally on the next alert trigger.
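The first two steps of that workflow as sketched calls (the Authorization header is illustrative):
# Aggregate health check, then attempt-level drill-down for cost alerts
curl "https://api.buycrowds.com/v1/full-scrape/webhook-health" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"
curl "https://api.buycrowds.com/v1/full-scrape/webhook-deliveries?subject_type=cost_alert&limit=50" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"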
Centralized health tracking across 4 webhook systems.
Every webhook delivery (scrape, digest, anomaly alert, cost alert) is recorded against a
(subject_type, subject_id) tuple in a shared store. Track failure counts, last success/failure timestamps, and auto-disable behavior for dead endpoints.
Status lifecycle:
• :healthy — last delivery succeeded, or no failures recorded
• :degraded — 1-4 consecutive failures, still retrying on next tick
• :dead — 5+ consecutive failures, source config auto-disabled to stop wasting resources
Auto-disable behavior: when a subject crosses 5 consecutive failures, the corresponding config (schedule, digest_schedule, anomaly_alert, cost_alert) gets enabled: false automatically. Ticks still check, but skip dead subjects without firing. User must manually reset.
Reset flow: fix your webhook URL → POST /webhook-health/reset with {"subject_type": "digest_schedule", "subject_id": "dsch_xxx"} → also PATCH enabled=true on the source config to resume deliveries.
Subject types: schedule, digest_schedule, anomaly_alert, cost_alert (for cost alerts, subject_id = api_key_id).
Response fields: consecutive_failures, total_failures, total_successes, last_success_at, last_failure_at, last_failure_reason, auto_disabled_at, recent_attempts (ring buffer of last 20 attempts with at, outcome, reason). Ordered by severity then recency.
Test endpoint: POST /webhook-health/test fires a synthetic event: "webhook_test" payload synchronously and returns the delivery outcome with latency. Two modes:
• By subject: body {"subject_type": "digest_schedule", "subject_id": "dsch_xxx"} — looks up the existing config, fires through its webhook_url, records the attempt in health tracking
• By raw URL: body {"webhook_url": "https://my-new-endpoint.com/hook"} — validates + fires a test payload to an arbitrary URL (pre-commit validation, not recorded in health)
Recovery flow: PATCH webhook_url → test it → reset health → PATCH enabled=true. Four calls total, no more waiting days for the next tick to validate your fix.
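That recovery flow, sketched against a broken digest schedule (dsch_xxx, the replacement URL, and the auth header are placeholders; the enabled toggle on PATCH is assumed from the pause/resume description above):
curl -X PATCH "https://api.buycrowds.com/v1/full-scrape/digest-schedules/dsch_xxx" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"webhook_url": "https://hooks.example.com/new-endpoint"}'
curl -X POST "https://api.buycrowds.com/v1/full-scrape/webhook-health/test" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"subject_type": "digest_schedule", "subject_id": "dsch_xxx"}'
curl -X POST "https://api.buycrowds.com/v1/full-scrape/webhook-health/reset" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"subject_type": "digest_schedule", "subject_id": "dsch_xxx"}'
curl -X PATCH "https://api.buycrowds.com/v1/full-scrape/digest-schedules/dsch_xxx" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"enabled": true}'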
Scheduled digest delivery (push-based briefings)
POST/v1/full-scrape/digest-schedulesRegister a recurring digest delivery — every N seconds, HMAC-signed webhook with the full watchlist digest API KEY
GET/v1/full-scrape/digest-schedulesList your digest schedules with limits and run counts API KEY
GET/v1/full-scrape/digest-schedules/:idSingle digest schedule with payload example and signature header docs API KEY
PATCH/v1/full-scrape/digest-schedules/:idPause/resume or change interval, days_window, webhook_url, metric API KEY
DELETE/v1/full-scrape/digest-schedules/:idStop future digest deliveries API KEY
Push-based version of
/watchlists/:id/digest. Register once, get the full portfolio briefing delivered to your webhook on whatever schedule you want. Most common use case: interval_seconds=604800 (weekly) pointed at a Slack channel.
Zero credit consumption. Digest is pure compute over existing stores (snapshots, trends, anomalies, billing, schedules). Delivery costs are just webhook bandwidth.
Body: watchlist_id (required, owned by you), webhook_url (required, https only), interval_seconds (required, min 3600 = 1 hour), days_window (1-30, default 7 — how much history the digest covers), metric (default followers — which metric trends use).
Limits: Pro tier or higher required, max 5 digest schedules per API key, min interval 1 hour.
Architecture: DigestScheduler GenServer ticks every minute, finds due schedules via DigestSchedules.due_now/0, dispatches each in its own supervised Task that composes the digest and delivers the webhook. A slow webhook cannot block other digests — isolation per config.
Payload wrapping: event: "watchlist_digest", digest_schedule_id, watchlist_id, digest (the full briefing payload identical to GET /watchlists/:id/digest), delivered_at.
Use case: "Every Monday 9am send my agency watchlist digest to this Slack webhook." Set it once, forget it, receive briefings forever. Set-and-forget monitoring at the portfolio level.
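A weekly Slack registration sketched with the documented body params (watchlist id, Slack URL, and auth header are placeholders):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/digest-schedules" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"watchlist_id": "wl_123", "webhook_url": "https://hooks.slack.com/services/XXX", "interval_seconds": 604800, "days_window": 7, "metric": "followers"}'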
Proactive anomaly webhooks (event-driven)
POST/v1/full-scrape/anomaly-alertsRegister a webhook that fires whenever a captured snapshot contains a new anomaly API KEY
GET/v1/full-scrape/anomaly-alertsList your registered anomaly alerts with fire counts API KEY
GET/v1/full-scrape/anomaly-alerts/:idSingle alert config with payload example and signature header docs API KEY
DELETE/v1/full-scrape/anomaly-alerts/:idRemove an anomaly alert config API KEY
Pull → push. The
/snapshots/anomalies endpoint requires polling. This one registers a webhook and pushes the same data to you event-driven as snapshots land.
Architecture: subscribes to the internal snapshots:captured PubSub topic. Every snapshot insertion triggers anomaly detection over the last 30 days; a webhook fires per NEW anomaly (dedup via the fired_snapshot_ids set, capped at 500 entries).
Body: username (required, or "*" for any), metric (default followers), sigma_threshold (1.0-10.0, default 2.5), severity_filter (low/medium/high, default low), platforms (comma list or "all"), cooldown_seconds (0-86400, default 300 — prevents webhook spam during bursts), trigger_type (point_anomaly/changepoint, default point_anomaly — see below), webhook_url (required, https only, signed with HMAC-SHA256).
Trigger types:
• point_anomaly (default): residual-based outlier detection. Fires when today's value deviates ≥ sigma_threshold σ from the OLS-predicted value. Good for catching viral spikes, creator hacks, bulk deletions — point-in-time shocks.
• changepoint: RSS-minimizing split detection (recursive binary segmentation). Fires when a creator's growth trajectory shifts — acceleration, deceleration, or reversal — not just one outlier day. Better for "this creator is losing steam" or "this creator just started going viral" style signals. Dedupes via last_changepoint_at: only fires for changepoints strictly newer than the last one seen. Payload event becomes growth_changepoint_detected and includes pre/post slopes (per day + per week), slope_change, rss_improvement, and direction (acceleration/deceleration/reversal).
Cooldown behavior: after firing, the config is "cooling down" for N seconds. Any new anomalies detected during that window are silently skipped. Prevents your Slack channel from being flooded when a creator has 10 simultaneous platform anomalies during a viral event. Set cooldown_seconds: 0 to fire every detection (legacy behavior).
Payload (point_anomaly): event: "snapshot_anomaly_detected", alert_id, username, full anomaly (platform, direction, severity, residual_sigma, actual, predicted), triggering_snapshot, occurred_at.
Payload (changepoint): event: "growth_changepoint_detected", alert_id, username, full changepoint (platform, direction, changepoint_at, pre_slope_per_day, post_slope_per_day, pre_slope_per_week, post_slope_per_week, slope_change, slope_change_pct, rss_improvement, pre_samples, post_samples), triggering_snapshot, occurred_at.
Use case: "Tell my Slack when @neymarjr has a viral event" → schedule daily scrape + anomaly alert with severity=medium + webhook to Slack → automated creator intel, zero polling.
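A registration sketch for that Slack use case (webhook URL and auth header are placeholders):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/anomaly-alerts" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"username": "neymarjr", "metric": "followers", "severity_filter": "medium", "trigger_type": "point_anomaly", "cooldown_seconds": 300, "webhook_url": "https://hooks.slack.com/services/XXX"}'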
Scrape diff (compare two jobs over time)
POST/v1/full-scrape/diffCompare two completed scrape jobs — per-platform deltas, aggregate follower change, direction counts API KEY
The killer feature for scheduled scrapes. Pass two
job_ids of completed scrapes and get back a structured diff showing exactly what changed between them.
Body: {"job_id_a": "fs_xxx", "job_id_b": "fs_yyy"} — convention is A = older, B = newer, but not enforced. Both jobs must belong to you and be :complete.
Per-platform changes detected: :changed (metrics diffed), :added (only in newer), :removed (only in older), :recovered (failed → ok), :failed_in_b (ok → failed), :still_failing.
Metrics extracted: followers, following, posts, engagement_rate, video_count, subscribers. Each delta has {from, to, abs, pct, direction}. K/M suffixed strings ("12.5K") are parsed automatically.
Aggregate shape: total_follower_change, total_follower_change_pct, counts of platforms improved/declined/flat/added/removed, plus status transitions. Ready to render a "since last week" summary card.
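For example, diffing last week's scheduled run against this week's (job ids come from your own job history; auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/diff" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"job_id_a": "fs_xxx", "job_id_b": "fs_yyy"}'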
Recurring schedules (Pro+ only)
POST/v1/full-scrape/schedulesCreate a recurring scrape — runs every N seconds, fires webhook on completion API KEY
GET/v1/full-scrape/schedulesList your schedules with tier eligibility info API KEY
PATCH/v1/full-scrape/schedules/:idPause/resume, change interval or webhook url API KEY
DELETE/v1/full-scrape/schedules/:idDelete a schedule — stops future runs API KEY
POST/v1/full-scrape/schedules/pause-allBulk-disable every schedule for the api_key — emergency stop for incident response API KEY
POST/v1/full-scrape/schedules/resume-allBulk re-enable counterpart of pause-all — recurrence timing resumes from each schedule's next_run_at API KEY
Bulk pause/resume (iter 102). Single UPDATE query toggles enabled on every schedule for the calling api_key. Idempotent — safe to retry. Returns affected_count so the caller knows how many rows changed.
Pairs with /budget-cap/freeze-bulk (iter 101) for a complete emergency stop during incidents:
1. POST /v1/full-scrape/budget-cap/freeze-bulk {"cost_centers": [...]} — all enforcement flips to 402 for the frozen clients
2. POST /v1/full-scrape/schedules/pause-all — all recurring jobs stop firing new runs
3. Investigation happens
4. POST /v1/full-scrape/schedules/resume-all — schedules pick up from where they left off (recurrence timing isn't reset; each schedule's individual next_run_at governs the first run after resume)
5. POST /v1/full-scrape/budget-cap/unfreeze-bulk {"cost_centers": [...]} — spending resumes
Total: 4 API calls to pause and resume across N clients, regardless of how many schedules or cost centers are involved.
In-flight runs caveat: schedules that were ALREADY enqueued into the Oban cron queue before the pause may still fire. Pause prevents FUTURE enqueues but doesn't cancel runs mid-execution. To cancel an in-flight batch, use DELETE /v1/full-scrape/jobs/:id (single) or DELETE /v1/full-scrape/orchestrations/:id (bulk, iter 99).
Individual control still available: PATCH /v1/full-scrape/schedules/:id continues working for fine-grained per-schedule toggles. Pause-all is an operator-level convenience, not a replacement.
Body params for POST /schedules:
username (required) — the creator to scrape
interval_seconds (required, min 300) — how often to re-run
webhook_url (optional) — https URL that receives each result (HMAC-signed)
platforms (optional) — restrict scope, e.g. "instagram,tiktok"
Constraints: Pro tier or higher required, max 10 schedules per API key, min interval 5 min. Each run consumes one credit. If credit quota is exhausted, the schedule records an error but stays enabled and retries next cycle.
Volatility: Schedules are ETS-backed and lost on BEAM restart. DB persistence is future work. The scheduler ticks every 30 seconds, so drift from the exact interval can be up to 30s.
Use case: "Track @neymarjr every 6 hours and webhook my Slack" → daily monitoring without polling.
Webhook callbacks (eliminate polling)
Pass
webhook_url (https only, no localhost/private IPs) when submitting an async scrape and BuyCrowds will POST the result to that URL when the job finishes — success or failure.
Payload signing: Every delivery carries an X-BuyCrowds-Signature: sha256=<hex> header where the hex is HMAC-SHA256(api_key.key, raw_body). Verify it before trusting the payload.
Other headers: X-BuyCrowds-Event (full_scrape.completed / .failed), X-BuyCrowds-Job-Id, X-BuyCrowds-Delivery-Attempt (1 or 2).
Retries: 1 retry after 500ms on network error or non-2xx response. Your handler must be idempotent — we don't do deduplication.
Timeout: 10s per attempt. Slow handlers get marked as failed in the jobs store but the result is still readable via polling.
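A minimal verification sketch with openssl, assuming the raw webhook body was saved to payload.json, $BUYCROWDS_API_KEY holds the signing key, and $received_signature holds the X-BuyCrowds-Signature header value:
# Recompute the HMAC over the raw body and compare it to the received header
expected="sha256=$(openssl dgst -sha256 -hmac "$BUYCROWDS_API_KEY" < payload.json | awk '{print $NF}')"
[ "$expected" = "$received_signature" ] && echo "signature ok" || echo "reject payload"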
Activity timeline (unified chronological feed)
GET/v1/full-scrape/activity?since=2026-04-09T00:00:00Z&cost_center=client-acme&limit=100Chronologically merged feed — billing events + job archives in one sorted stream API KEY
"What happened?" Merges two data sources into one
time-sorted feed:
• Billing events — per-scrape charges, overages, refunds, scheduled runs (iter 50+)
• Job archives — terminal-state batches from both immediate (iter 71) and deferred (iter 79) execution paths
Why merge them: /cost/events shows individual event charges. /jobs/history shows batch-level job state. Neither alone tells the full story of "what happened in the last hour" — you see charges without knowing which batch they belonged to, or batches without the intermediate charge timeline. Activity feed interleaves both for operator dashboards and compliance review.
Each entry: {source, occurred_at, summary, ...}. The source tag is either "billing_event" or "job_archive", letting UIs render distinct icons. The summary string is a human-readable headline for the entry ("Scraped @neymarjr · client-acme", "Batch complete — 20 user(s)", "Refunded @lewis.hamilton").
Defaults + filters: default window is the last 24 hours. Override with ?since=...&until=... (ISO8601 UTC). Filter by ?cost_center=client-acme to scope to a single client. ?limit= caps entries (default 100, max 500).
counts_by_source rollup: response includes {billing_event: N, job_archive: M} for quick metric rendering.
Use case — compliance audit: legal asks "show me everything that happened for client-acme between April 1 and April 7". One call: GET /v1/full-scrape/activity?since=2026-04-01T00:00:00Z&until=2026-04-07T23:59:59Z&cost_center=client-acme&limit=500. Returns every scrape, every batch, every refund in chronological order — ready to paste into a compliance report without stitching events + jobs manually.
Use case — operator live dashboard: ops team's live view polls /v1/full-scrape/activity?since=<5min ago>&limit=50 every 30 seconds. The feed shows scrape activity + batch completions in real time. When something unusual happens (spike of refunds, deferred batch fires), it shows up in the feed without requiring a separate query per source.
Creator dashboard (unified operational view)
GET/v1/full-scrape/creators/:usernameOne-call dashboard — quarantine status + latest snapshot + 30d activity + recommendation API KEY
GET/v1/full-scrape/creators/compare?usernames=a,b,cSide-by-side snapshot comparison for 2-5 creators — zero credits, pure cache read API KEY
Creator comparison (iter 95). Pass 2-5 usernames and
get a side-by-side view of their latest snapshots. Uses the same
fresh_snapshot/3 helper as iter 57 (max_age) and iter 84 (creator dashboard) so results reflect the same cache state — zero credits, zero billing events, zero Apify calls.
Parameters: ?usernames=neymarjr,lewis.hamilton,zendaya — comma-separated, max 5, duplicates deduped. Minimum 2 (for a single creator use /creators/:username instead).
Response shape: {requested, with_snapshot, without_snapshot, shared_platforms, comparisons, missing_usernames, note}. Each comparison entry is either {username, snapshot: {id, captured_at, age_seconds, per_platform}} or {username, snapshot: null, hint}.
shared_platforms: computed as the intersection of platforms across all creators that DO have a snapshot. Lets the client render only the columns that exist for everyone — avoids the "tiktok metrics for 2 creators, instagram for 3" display problem. Empty list if nothing is shared.
Missing snapshots: creators without a snapshot in the last 30 days appear in missing_usernames and their comparison entry has snapshot: null + a hint pointing to POST /v1/full-scrape/<username>. The caller sees which ones need a fresh scrape before the comparison is complete.
Use case — creator vetting: agency considering 3 potential creators for a campaign. Call /v1/full-scrape/creators/compare?usernames=creator_a,creator_b,creator_c. Response returns side-by-side follower counts, engagement rates, and post counts across their shared platforms. Account manager compares in 10 seconds without running individual scrapes (if they're already cached from recent scheduled jobs).
Pairs with /creators/:username: use compare for quick multi-creator views, and drill into a single creator via the dashboard endpoint when you need quarantine status, activity history, and scrape recommendation.
Everything about a creator in one call. Composes
quarantine state (iter 69), latest cached snapshot (iter 57), 30-day event history (iter 82
username filter), reliability math (iter 68), and a scrape recommendation into a single
response. Agencies reviewing a specific creator get operational context without stitching
5 endpoints together.
Response shape:
• username — normalized (lowercase, trimmed)
• quarantine — {flagged, details: {refund_count, total_refund_usd, first_refund_at, last_refund_at}}
• latest_snapshot — most recent non-expired snapshot (up to 30 days old), or null. Includes id, captured_at, age_seconds, and the full per_platform blob
• activity_30d — {total_events, total_attempts, included, overage, refunded, total_cost_usd, success_rate, success_rate_pct, first_event_at, last_event_at}
• recommendation — {action, severity, reason, fresh_snapshot_available}
• drill_down — ready-to-use URLs for events log, CSV export, and cached probe scoped to this creator
Recommendation decision tree (priority order):
1. skip (severity high) — creator is quarantined. Reason explains refund count + how to fix.
2. use_cached (severity low) — fresh snapshot within 5 minutes exists. Hints at the ?max_age_seconds=300 query to hit the zero-credit cache path.
3. proceed_with_caution (severity medium) — success rate under 70% in the last 30 days. Scraping will likely trigger refunds — investigate upstream before committing.
4. proceed (severity low) — creator is healthy, no quarantine, no recent failures.
Use case — pre-flight check before ad-hoc scrape: account manager gets a request from client "can you grab fresh data on @neymarjr?". Instead of firing a scrape blind, they GET /v1/full-scrape/creators/neymarjr first. Response says "use_cached — fresh snapshot from 2 minutes ago". They return the cached data to the client at zero cost, zero credits consumed. Or if the response says "skip — quarantined with 5 refunds", they warn the client that the account needs attention before re-scraping.
Pairs with /v1/cost/creators: the creators aggregator (iter 80) gives you the top-N view of your entire portfolio. This endpoint drills into ONE specific creator. Click through an aggregator row → single-creator dashboard for full context.
Creator quarantine (skip known-bad creators)
GET/v1/full-scrape/quarantine?window_days=7&min_refunds=3List creators with 3+ refunds in the last 7 days — derived live from billing events API KEY
Pay attention, not credits. When a creator has been failing
repeatedly (account went private, platform blocking, stale handle), you're burning credits +
triggering refunds on every batch that includes them. Quarantine surfaces those creators so you
can drop them from future batches before they cost you anything.
Derivation: pure query over :full_scrape_refund events. A creator is flagged when they accumulated min_refunds (default 3) refunds in the last window_days (default 7). Both configurable via query params — max 30 days / 20 refunds. No stored state; the flag auto-expires as refund events slide out of the window.
Response shape: per-creator {username, refund_count, total_refund_usd, first_refund_at, last_refund_at} sorted by refund_count descending. Plus a policy section showing the current thresholds and a usage section explaining how to act on the list.
Pre-flight integration: every pre-flight response now includes a quarantine section at the top level — flagged_in_batch lists which of the batch's usernames are currently flagged, and each per_username entry gains quarantined: true/false + quarantine_info with refund history. Agencies see what's risky before they commit.
Opt-in skip on batch: add ?skip_quarantined=true to POST /v1/full-scrape/batch and the controller drops flagged creators from the batch BEFORE the authorization check + BEFORE credit consumption. Skipped creators cost nothing (no credit, no scrape, no refund). If every username in the batch is quarantined, the endpoint returns 422 all_usernames_quarantined with the list of dropped creators. If you want to force-run despite the warning, omit the flag — quarantine is advisory by default, not enforcing.
Clearing: no manual clear endpoint. Quarantine is derived, not stored. Fix the underlying issue (account restored, platform block lifted) and wait for the 7-day window to slide past the last refund event. The creator drops off the list naturally as the sliding window advances. If you need an immediate override, change the thresholds via query params on the /quarantine endpoint for your specific call — the batch skip-quarantined filter uses the default thresholds.
Use case — weekly refresh protection: an agency runs a weekly 20-creator batch for each client. Over time, 2-3 creators reliably fail (private accounts, stale handles). Adding ?skip_quarantined=true to the batch call means the weekly refresh automatically stops burning credits on those creators, and a Slack alert wired to /v1/full-scrape/quarantine tells the account manager which clients need handle updates.
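Sketched: check the quarantine list, then run the weekly batch with the skip flag (usernames and auth header illustrative):
curl "https://api.buycrowds.com/v1/full-scrape/quarantine?window_days=7&min_refunds=3" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch?skip_quarantined=true" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "cost_center": "client-acme"}'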
Conditional on-demand (freshness-aware, credit-saving)
GET/v1/full-scrape/cached/:username?max_age_seconds=3600Zero-credit cache probe — returns the latest snapshot if fresh, 404 if not API KEY
Pay only for stale data. Every full-scrape endpoint now
accepts a freshness window via
max_age_seconds / max_age_minutes / max_age_hours (1s min, 30d max). When a snapshot exists within that window, the request is served from cache with zero credits consumed, zero billing events, zero Apify calls.
Single scrape: POST /v1/full-scrape/neymarjr?max_age_seconds=600 — returns the cached snapshot if captured in the last 10 minutes, otherwise runs a fresh live scrape and bills normally. The cached response includes cached: true, cached_snapshot.{id, captured_at, age_seconds, per_platform}, billing.credit_used: false, and a savings_usd_vs_live_scrape line (estimates the overage charge avoided if you were past quota).
Batch scrape: POST /v1/full-scrape/batch {"usernames": [...], "max_age_seconds": 3600}. The controller pre-scans all usernames, splits them into cached vs stale, and only runs the scrape loop for the stale ones. Cached results are stitched into the same results array with mode: "cached". Top-level adds cached_count, scraped_count, savings: {credits_saved, overage_scrapes_avoided, overage_charge_avoided_usd, our_marginal_cost_saved_usd}. If all usernames hit cache: all_cached: true, zero auth check, immediate return.
Async batch interaction: when the async flag is combined with max_age_seconds, only the stale subset is enqueued as a scrape job. The 202 response carries stale_to_scrape, cached_hits, and stale_usernames[] so the client knows exactly what's running in background vs what already landed instantly. When the job completes, the final result merges cached + scraped into one unified results list.
Quote token interaction: if you use a price-lock token with max_age_seconds, pre-flight must be called with the SAME max_age_seconds so the quoted usernames hash matches the stale subset at commit time. Otherwise batch returns 422 quote_token_usernames_mismatch.
GET /v1/full-scrape/cached/:username — pure cache probe. Default max_age is 7 days if not specified. Returns the snapshot or 404. Use this as the cheapest possible way to answer "do I already have this?" before even looking at quote tokens or pre-flight.
Use case — dashboard polling: a client-facing dashboard polls POST /v1/full-scrape/neymarjr?max_age_seconds=300 every 60 seconds. First hit runs a live scrape and caches the snapshot. The next 4 polls (within the 5-minute window) return the cached snapshot at zero cost. 80% credit savings with no client logic changes.
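A probe-then-scrape sketch (auth header illustrative):
# Zero-credit probe first; only run the freshness-bounded scrape if nothing recent exists
curl "https://api.buycrowds.com/v1/full-scrape/cached/neymarjr?max_age_seconds=3600" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"
curl -X POST "https://api.buycrowds.com/v1/full-scrape/neymarjr?max_age_seconds=3600" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"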
Pre-flight (dry-run cost + feasibility projection)
POST/v1/full-scrape/pre-flightDry-run a batch scrape — returns full cost + feasibility report without consuming credits API KEY
Zero-side-effect cost projection. Body:
{"usernames": ["neymarjr", "lewis.hamilton", ...]} (max 20). Returns per-username
feasibility (cached snapshot available? simulated mode — included/overage/quota_exhausted — with charge),
aggregate credit accounting (already_used_this_month, remaining_before,
remaining_after_if_proceed, included_units, overage_units,
quota_exhausted_units), total cost projection (charged vs our marginal),
budget_cap feasibility check (projected_spent_after_usd,
headroom_after_usd, would_exceed_cap?, blocks_request?),
rate-limit headroom (per_minute_remaining, headroom_after), and a
top-level can_proceed boolean with human-readable blocking_reasons.NO credits consumed. NO billing events logged. NO jobs created. Feasibility is evaluated against LIVE state at this exact moment — credit counter, MTD cost aggregate, rate-limit bucket, budget cap. Call this right before dispatching the real batch to know exactly what you're committing to. The canonical pattern for accounting-grade on-demand scraping:
pre-flight → human review → batch.Use case: you're about to fire a 20-creator monthly refresh for a client. Pre-flight tells you "15 included, 5 overage ($2.50), projected MTD $87.42 against $200 cap, can_proceed: true". You log that projection, fire the batch, and reconcile against the receipt endpoint afterward.
Price-lock quote tokens (atomic plan → commit): when
can_proceed: true, the response includes a quote section with a
Phoenix.Token-signed quote_token valid for 300 seconds. The token binds
(api_key_id, usernames_hash, cost_center, total_charged_usd) with HMAC-SHA256 over the
app's secret_key_base. Pass the token back to POST /v1/full-scrape/batch as
"quote_token": "..." and the batch commits at the quoted price — even if your credit
balance shifted between pre-flight and commit. The batch response echoes
quote.locked: true with the quoted vs actual cost.Quote token error cases:
•
410 quote_token_expired — token is past its 300s max_age; re-run pre-flight•
422 quote_token_invalid — signature failed (tampered / wrong environment)•
403 quote_token_api_key_mismatch — different api_key than the one that issued the quote•
422 quote_token_usernames_mismatch — batch usernames differ from the quoted list (hash mismatch)•
422 quote_token_cost_center_mismatch — cost_center at commit disagrees with quoteThe
quote_token is optional: omit it and the batch runs at live-state pricing (no
lock). Include it and the batch enforces strict redemption — no silent degradation if anything drifts.
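The plan → commit handshake sketched (quote_token value abbreviated; auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/pre-flight" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"]}'
# If can_proceed is true, redeem the returned quote_token within 300 seconds
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "quote_token": "SFMyNTY..."}'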
Batch scrape (authed — atomic multi-username, sync or async)
POST/v1/full-scrape/batchScrape up to 20 usernames in one call — all-or-nothing credit check, optionally async API KEY
Body:
{"usernames": ["user1", "user2", ...], "platforms": "instagram,tiktok", "async": true, "webhook_url": "https://...", "cost_center": "client-acme", "quote_token": "SFMyNTY..."}Max 20 usernames per batch. The upfront authorization check is atomic — if your tier can't afford the whole batch (Free hits quota), zero credits are consumed and 402 is returned. Once the batch is authorized, credits are consumed per-item as scrapes run.
Async mode (
async: true): returns 202 with a job_id immediately.
Task processes usernames serially in background; progress field on the job updates as each
username completes (GET /v1/full-scrape/jobs/:id shows {"completed": 12, "total": 20, "percent": 60.0}).
Single webhook fires on batch completion with full aggregate result — event type is full_scrape.batch_completed.Per-item failure isolation (iter 55): individual scrape failures inside a batch no longer crash the whole batch. Each failed username auto-refunds its credit (increments the refund bucket + writes a negative
full_scrape_refund billing event with
metadata refund_reason: "auto_batch_item_failure"), and the batch continues. The
per-item result carries
{error: "<msg>", refunded: true, refund_amount_usd: -0.50, scrape: null, cost_usd: 0.0}
while successful items keep their normal shape.Batch aggregate adds (iter 55):
successful_count, failed_count, refunded_count,
refund_total_usd,
billing.refunded_overage_count,
billing.net_overage_count, and
billing.net_overage_charge_usd — the customer's real bill after refunds net through.
Included-tier scrapes that fail also refund (against the sliding credit bucket) even though they
have zero dollar charge, so your quota is fully protected against flaky upstreams.Smart retry before refund (iter 64): before giving up and triggering the per-item auto-refund, each scrape call is retried up to 2 times with exponential backoff (1s, 2s). Max added latency per failing item: 3 seconds. First attempt is always fast — only transient failures pay the retry cost. Per-item result now carries an
attempts field (1 = first try succeeded, 2 = recovered on first retry, 3 = recovered on
second retry or final failure).Retry telemetry on the batch aggregate:
retries.items_that_retried,
retries.total_retry_attempts,
retries.recovered_by_retry (successes that would have been refunds without the retry),
and
retries.saved_refunds_usd (the dollar value the retry layer prevented from being
refunded). Operators can track this field to see how much the retry layer is earning the customer —
a healthy upstream should see near-zero saved_refunds_usd; a flaky one might recover
$5-10/day automatically.Sync batch latency note: sync batches now incur up to 3s extra per failed item before the retry layer gives up. A 20-item batch with one flaky username can take 3s longer to respond. If that's unacceptable, use async mode — the retry layer runs in background and the caller polls.
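An async submission and a progress poll, sketched (the fs_xxx job id stands in for the one returned by the 202; auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "async": true, "webhook_url": "https://hooks.example.com/batch-done", "cost_center": "client-acme"}'
curl "https://api.buycrowds.com/v1/full-scrape/jobs/fs_xxx" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"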
Retry config override (iter 75): pass optional max_retries=N (integer 0-5) on scrape/batch to tune the retry budget per request. Clamped to [0, 5] at the plug boundary — out-of-range values are silently corrected.
Backoff schedule per max_retries:
• 0 — [] — no retries, fail immediately on first error (fastest, lowest tolerance)
• 1 — [1s] — max +1s added latency
• 2 — [1s, 2s] — default, +3s max
• 3 — [1s, 2s, 4s] — +7s max
• 4 — [1s, 2s, 4s, 15s] — +22s max
• 5 — [1s, 2s, 4s, 15s, 60s] — +82s max, extreme-reliability
Use cases:
• Latency-sensitive dashboards (max_retries=0): fail-fast, surface errors immediately, client can retry with its own backoff strategy.
• Default batch workloads (max_retries=2): balanced, absorbs most transient blips without blowing HTTP budgets.
• High-stakes monthly refresh (max_retries=5): an agency generating the monthly client report will tolerate +82s to maximize success rate over the whole batch.
Refund semantics preserved: items that fail all their retries still go through the iter 55 auto-refund path. The difference is just how many attempts happen before giving up. The per-item attempts field in the response reflects the actual count, and retries.saved_refunds_usd still tells operators how much the retry layer recovered.
Deferred one-shot scrapes (iter 77): pass an optional run_at (ISO8601 UTC) on the batch body and the controller hands the batch off to an Oban-scheduled worker instead of running it inline. Must-have fields: run_at (must be in the future) + webhook_url (required — deferred batches can't be polled because the scheduled fire time is typically beyond the 1-hour ETS TTL and the job is tracked in Oban, not Jobs).
Request shape: {"usernames": [...], "run_at": "2026-04-10T03:00:00Z", "webhook_url": "https://...", "cost_center": "...", "labels": [...], "platforms": "..."}. Past timestamps are silently rejected (run_at parses to nil and the batch runs inline instead).
Response 202: {deferred: true, oban_job_id, scheduled_at, batch_size, cost_center, webhook_url, webhook_event_on_completion: "deferred_batch_completed", webhook_event_on_failure: "deferred_batch_failed"}. The oban_job_id is the handle for cancellation (see the iter 78 management endpoints below) — the Jobs ETS system is NOT involved until the worker actually fires and creates per-item billing events.
Auth at RUN time, not SUBMIT time. Critical design choice: when a deferred batch fires hours or days later, the caller's state (quota, budget cap, sub-caps, frozen centers) may have shifted. The worker re-evaluates authorization at fire time and fires a deferred_batch_failed webhook with the specific reason (quota_exhausted_at_run_time, cost_center_frozen: client-acme, budget_cap_exceeded_at_run_time) instead of silently consuming credits from a stale authorization. Agencies setting up overnight runs get honest feedback on whether their state drifted.
On successful fire: the worker iterates the usernames serially, consumes credits per-item (matching the normal batch semantics), handles per-item failures via the same refund path (iter 55), logs billing events with metadata.deferred: true so reports can distinguish deferred spend from live spend, then delivers a deferred_batch_completed webhook with {event, oban_job_id, scheduled_at, completed_at, elapsed_ms, batch_size, successful_count, failed_count, results[]}.
Oban config: dedicated queue :deferred_scrape with concurrency 5 (lower than webhook_retry's 20 because each job runs a full batch of up to 20 scrapes serially — total concurrent scrapes ≈ 100).
Use case — off-peak weekly refresh: agency runs a 30-creator weekly batch. Upstream platforms rate-limit harder during business hours, so they want the batch to run at 3am UTC. One call: POST /v1/full-scrape/batch {"usernames": [...], "run_at": "2026-04-10T03:00:00Z", "webhook_url": "https://ops.agency.com/batch-done"}. The batch sleeps in Oban until 3am, fires, and the ops webhook receives the aggregated result before the team starts their day.
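That deferred call sketched in full (auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "run_at": "2026-04-10T03:00:00Z", "webhook_url": "https://ops.agency.com/batch-done", "cost_center": "client-acme"}'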
Management endpoints (iter 78):
• GET /v1/full-scrape/deferred — lists your pending deferred batches. Queries the oban_jobs table scoped to worker = DeferredBatch + state in (scheduled, available, retryable) + args->>'api_key_id' = <caller> (JSONB containment). Returns {count, pending: [{oban_job_id, state, scheduled_at, batch_size, cost_center, webhook_url, replay_of, max_retries, max_age_seconds, inserted_at}, ...]} ordered by scheduled_at ASC.
• DELETE /v1/full-scrape/deferred/:oban_job_id — cancel a pending deferred batch before it fires. Ownership is verified by checking args.api_key_id matches the caller. Errors: 404 deferred_job_not_found, 403 deferred_job_owned_by_another_api_key, 422 deferred_job_already_fired (job already transitioned past a cancellable state).
Why no post-fire cancel: once a deferred batch fires, it runs through the normal per-item process loop — per-item failures auto-refund (iter 55), the whole-batch webhook fires when done. If you need to "cancel" a fired batch's effects, use the refund path on individual jobs after-the-fact. Deferred cancel is pre-fire only.
Unified history via archive (iter 79): fired deferred batches are now mirrored to the job_archives table (iter 71) with an id of the form def_<oban_id>. This closes the audit gap — both successful runs (status: complete) and auth-failures at run time (status: failed) get archived. Downstream effects:
• GET /v1/full-scrape/jobs/history now surfaces deferred runs alongside ETS-sourced batches. Filter by ?cost_center= or ?status=failed to find them.
• GET /v1/full-scrape/jobs/def_42/receipt works because the receipt endpoint falls back to archive on ETS miss (iter 71).
• POST /v1/full-scrape/jobs/def_42/replay and POST /v1/full-scrape/jobs/replay-bulk work too — bulk weekly refreshes can mix ETS job ids and deferred job ids freely.
• The def_ prefix is visually distinct from the ETS-sourced fs_ prefix, so operators can tell execution path at a glance in logs and dashboards.
Billing events still carry the flag: per-scrape billing events logged by the deferred worker stamp metadata.deferred: true (distinct from the job_archives record), so /v1/cost/events/export filtered CSV dumps can separate deferred spend from immediate spend at the event granularity — useful when an agency wants "what did we spend on overnight batches this month" broken out from the main usage.
credits_used_this_month_after_batch now pulls directly from Credits.read/2 so it reflects the net effective count (consumed − refunded) the moment the batch finishes, not a cached pre-refund counter.
Auto-split for >20 usernames (iter 91): pass auto_split: true along with up to 100 usernames and the controller will chunk them into up to 5 sub-batches of 20, enqueue each as an independent async batch, and return the array of new job_ids. Always async — sync execution of 100 scrapes sequentially would blow every HTTP budget. Each chunk runs its own cached-split + quarantine + retry + refund pipeline independently.
Incompatible with quote_token: quote tokens are issued for a fixed username list; chunking breaks that contract. Passing both returns 400 auto_split_incompatible_with_quote_token. If you need a price lock per chunk, pre-flight each chunk separately and submit 5 individual quoted batches.
Response shape: {auto_split: true, total_usernames, chunk_count, chunk_size, enqueued, errors, chunks: [{chunk_index, job_id, usernames, batch_size, status}, ...], cost_center, note}. Top-level enqueued/errors counts give a quick rollup; per-chunk status is either "enqueued" or "error" with an error reason if the enqueue failed (e.g. Jobs store unreachable).
Webhook behavior: if you pass webhook_url, it fires once per chunk — plan for 5 delivery attempts if you're running a full 100-username auto-split. The webhooks are independent; there is no aggregated "all-chunks-done" webhook. To correlate, track the chunk job_ids client-side and wait for all N to deliver.
cost_center propagation: the caller-provided cost_center (or X-Cost-Center header) is attached to EVERY chunk — all 100 scrapes attribute to the same client. Iter 59 per-center sub-caps still enforce globally, so if any chunk would push the cumulative spend over the sub-cap, that chunk's credits get blocked at the normal enforcement layer. Chunks that already ran stay committed.
Use case — bulk initial onboarding: new enterprise client with 75 creators to track. Agency calls POST /v1/full-scrape/batch with {"usernames": [...75...], "auto_split": true, "cost_center": "client-bigcorp"}. Response returns 4 chunk job_ids (20 + 20 + 20 + 15). Ops dashboard polls all 4 until complete, then the initial baseline is established with a single API call from the caller's perspective.
Orchestration tracking (iter 97): auto-split now generates a shared orchestration_id (format orch_<base32>) attached to all sub-jobs. The 202 response includes orchestration_id at the top level AND an orchestration_poll_url pointing to the aggregated lookup.
Aggregated view: GET /v1/full-scrape/orchestrations/:id returns all sub-jobs sharing the id plus an aggregate rollup: {orchestration_id, sub_job_count, statuses: {complete, failed, cancelled}, totals: {usernames, successful, failed}, sub_jobs: [...], cost_center, note}. Each sub_jobs entry has {id, status, batch_size, successful_count, failed_count, finished_at, started_at, cost_center}. Drill into individual sub-jobs via GET /v1/full-scrape/jobs/:id for full per-chunk detail (results array, notes, receipt).
Merged view (iter 98): orchestration lookup now merges live ETS jobs (pending/running) with archived jobs (terminal state). Real-time progress is visible the moment chunks start executing — no more waiting for terminal state. Each sub-job entry carries a source tag ("ets" for live, "archive" for historical). ETS wins on id collision so callers always see the most current state. Response adds active_count (ETS entries in pending/running) and archived_count (terminal in DB) for quick status rollup, plus a progress field per ETS entry showing {completed, total} live counters.
Bulk cancel (iter 99): DELETE /v1/full-scrape/orchestrations/:id cancels all non-terminal sub-jobs sharing the orchestration_id in a single call. Iterates the ETS sub-jobs via Jobs.list_by_orchestration, calls the existing per-job cancel logic (Task.Supervisor.terminate_child + Jobs.mark_cancelled) on each, and returns an aggregate {sub_job_count, cancelled_count, already_terminal_count, sub_jobs}. Terminal jobs are left untouched and appear as action: "already_terminal".
Credit refund semantics: bulk cancel does NOT refund credits. Same policy as single-job cancel — compute + bandwidth were already paid up to the cancellation point. For refunds on partial-failure items, use POST /v1/full-scrape/jobs/:id/refund per sub-job after cancellation. If you need the "refund everything" semantic, cancel first + loop over the aggregated response firing manual refunds per eligible sub-job.
404 semantics: returns 404 no_active_sub_jobs when no ETS entries match — could mean the orchestration already finished (check GET), the id is wrong, or the chunks already expired past the 1-hour ETS TTL (auto-archived). Delete is only meaningful on running orchestrations.
Use case — abort runaway batch: agency submits a 75-creator auto-split and realizes mid-execution that the wrong creators were selected. Instead of cancelling 4 sub-jobs individually via DELETE /jobs/:id, they call DELETE /v1/full-scrape/orchestrations/orch_abc123. All 4 running chunks are cancelled in one call. Any chunk that had already finished naturally is reported as already_terminal and left alone.
Orchestration replay (iter 104): POST /v1/full-scrape/orchestrations/:id/replay re-runs an entire historical orchestration as a fresh auto-split batch. The controller loads all sub-jobs (merged ETS + archive), combines their batch_usernames into a single deduped list, and delegates to the normal batch path with auto_split=true. A BRAND NEW orchestration_id is generated for the replay run.
What gets inherited: cost_center and webhook_url are pulled from the first sub-job of the original orchestration. If you need to override them, call POST /v1/full-scrape/batch directly with a custom auto_split body — this endpoint is a convenience shortcut for the common "run it again" workflow.
Replay linkage: the replay batch carries replay_of: "<original orchestration_id>" in the generated sub-jobs so archives can trace the chain. The response format matches /v1/full-scrape/batch with auto_split=true, including the new orchestration_id at the top level.
Credit cost: replay is NOT a free re-run — credits are charged normally for each scrape in the new orchestration, same as any regular batch. If the goal is to refresh cached data, call POST /v1/full-scrape/batch directly and pass max_age_seconds for conditional caching per chunk.
Use case — weekly refresh: agency runs a 75-creator auto-split on Monday for client-bigcorp. Following Monday, they call POST /v1/full-scrape/orchestrations/orch_abc123/replay. A new orchestration runs the same 75 creators with fresh data, keyed to a new orchestration_id. Zero retyping of usernames, cost_center, or webhook — all inherited from the original archive.
Indexed lookup: the iter 97 migration added a (api_key_id, orchestration_id) index, so even large histories return the aggregate quickly. Scoped to the caller's api_key — ownership enforced at the SQL layer.
Cost envelope (iter 74): pass an optional max_spend_usd param on the batch body and the controller projects the total charge for the stale subset BEFORE any authorization check or credit consume. If the projection exceeds the envelope, the request short-circuits with 402 batch_exceeds_max_spend and zero side effects — no credits consumed, no billing events, no snapshots. Cached hits don't count against the envelope (they cost nothing), so max_spend only applies to the portion that would actually be scraped live.
Envelope vs pre-flight+quote_token. Pre-flight + quote_token (iter 51/53) gives you a LOCKED price via a two-call handshake. Envelope (iter 74) gives you a HARD CEILING in a single call — no separate pre-flight round-trip. Use envelope when you know your max tolerance; use pre-flight+quote when you need to surface the projection to the end user before committing. Both paths can be combined: envelope as a safety belt, quote_token as the price lock.
402 response shape: {error, max_spend_usd, projected_spend_usd, overage_usd, breakdown: {included_units, overage_units, quota_exhausted_units, overage_price_usd, quota, already_used}, hint}. The breakdown tells the caller exactly why — maybe they're partway through their quota so more of the batch would be overage than they expected.
Use case — safety belt on weekly refresh: agency runs POST /v1/full-scrape/batch with 20 creators + max_spend_usd: 5.00. Under normal conditions they're inside quota and the batch costs $0 (envelope untouched). If they've been consuming heavier than usual and 15 of the 20 would be overage, the projection ($7.50) exceeds the envelope ($5.00) → 402, no charge. Agency sees the breakdown, raises the envelope to $10 or removes 5 creators, retries. Zero accidental spend.
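The safety belt sketched on a small batch (usernames and auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "max_spend_usd": 5.00, "cost_center": "client-acme"}'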
Tier-aware performance:
Free: 6 concurrent / 30s timeout · Starter: 8 / 30s · Pro: 12 / 25s · Business: 16 / 20s · Enterprise: 20 / 15s
Pro+ tiers get higher concurrency and tighter timeouts — the scrape finishes faster under load.