Live — 370+ endpoints
BuyCrowds API
Unified social intelligence across 22+ platforms. One call, every creator. Public data, BYOK tokens, zero credential storage.
Privacy & Data Notice
BuyCrowds retrieves only publicly available data. For private data (Instagram insights, YouTube analytics), users provide their own API tokens (BYOK). OAuth tokens are supplied by the user and used in-memory only — we never store credentials.
Token Vault (BYOK — Bring Your Own Keys)
Paste your API tokens here to unlock private data on Try It buttons. Tokens are stored in sessionStorage only — they disappear when you close this tab. BuyCrowds never stores or transmits your keys.
Quick Start
3 examples
Get started in seconds. All public endpoints require no authentication.
Unified Profile — all platforms in one call
curl https://api.buycrowds.com/v1/public/profile/elonmusk
Flex Check — fun social credibility score
curl https://api.buycrowds.com/v1/public/flexcheck/mkbhd
Deep Data — maximum extraction (BYOK for private platforms)
curl "https://api.buycrowds.com/v1/public/deep/tiktok/charlidamelio"
Unified Profile
2 endpoints
Aggregate profile data across all platforms in a single call. Pass BYOK params to enrich with private data.
BYOK params (optional):
ig_token, ig_user_id (Instagram Graph API token from Facebook Developer), youtube_key (YouTube Data API v3 key from Google Cloud Console), twitter_token (Twitter API Bearer Token from developer.twitter.com).
GET/v1/public/profile/:usernameUnified profile summary across all found platforms PUBLIC
GET/v1/public/profile/:username/fullFull unified profile with all available data PUBLIC
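A hedged sketch of enriching the full unified profile with the BYOK params listed above (all token values are placeholders; include only the ones you have):
import requests

params = {
    "ig_token": "IG_GRAPH_API_TOKEN",       # placeholder Instagram Graph API token
    "ig_user_id": "YOUR_IG_USER_ID",        # placeholder
    "youtube_key": "YT_DATA_API_V3_KEY",    # placeholder
    "twitter_token": "TWITTER_BEARER",      # placeholder
}
r = requests.get("https://api.buycrowds.com/v1/public/profile/mkbhd/full",
                 params=params, timeout=60)
print(r.json())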
Identity & Fun
14 endpoints
Fun analytics, identity resolution, and social personality endpoints.
GET/v1/public/whoami/:usernameCross-platform identity summary PUBLIC
GET/v1/public/whereami/:usernameWhich platforms this username exists on PUBLIC
GET/v1/public/wheniwas/:usernameAccount creation timeline across platforms PUBLIC
GET/v1/public/mynumbers/:usernameAggregated follower/following counts PUBLIC
GET/v1/public/stalker/:usernameDeep-dive social activity report PUBLIC
GET/v1/public/flexcheck/:usernameSocial credibility and flex score PUBLIC
GET/v1/public/compare/:user1/:user2Head-to-head comparison of two users PUBLIC
GET/v1/public/receipts/:usernameProof of social presence and achievements PUBLIC
GET/v1/public/roast/:usernameHumorous roast based on social data PUBLIC
GET/v1/public/vibe/:usernameVibe check and personality analysis PUBLIC
GET/v1/public/resolve/:usernameResolve username to platform identities PUBLIC
GET/v1/public/find/:usernameFind user across all platforms PUBLIC
GET/v1/public/crossrefCross-reference different usernames per platform PUBLIC
GET/v1/public/all/:usernameAll public data for a username in one call PUBLIC
Analytics
7 endpoints
Cross-platform analytics: engagement rates, growth tracking, audience insights, and content performance.
GET/v1/public/analytics/engagement/:usernameEngagement rate across platforms PUBLIC
GET/v1/public/analytics/growth/:usernameGrowth metrics and trajectory PUBLIC
GET/v1/public/analytics/benchmark/:username1/:username2Benchmark two users side by side PUBLIC
GET/v1/public/analytics/summary/:usernameAnalytics overview summary PUBLIC
GET/v1/public/analytics/audience/:usernameAudience demographics and composition PUBLIC
GET/v1/public/analytics/content/:usernameContent performance analysis PUBLIC
GET/v1/public/analytics/history/:usernameHistorical analytics over time PUBLIC
Social Graph
3 endpoints
Map social connections, discover shared networks, and visualize relationship graphs.
GET/v1/public/graph/connections/:usernameSocial connections graph for a user PUBLIC
GET/v1/public/graph/network/:username1/:username2Shared network between two users PUBLIC
POST/v1/public/graph/mapMap a custom network of usernames PUBLIC
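A sketch of the map endpoint; the exact body schema isn't documented here, so the {"usernames": [...]} shape is an assumption:
import requests

# Assumed body shape: a list of usernames to map into one network graph.
body = {"usernames": ["mkbhd", "ijustine", "mrwhosetheboss"]}
r = requests.post("https://api.buycrowds.com/v1/public/graph/map",
                  json=body, timeout=60)
print(r.json())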
Search
10 endpoints
Search across individual platforms. Pass ?q=query to each endpoint.
GET/v1/public/search/instagramSearch Instagram users PUBLIC
GET/v1/public/search/youtubeSearch YouTube channels/videos PUBLIC
GET/v1/public/search/twitterSearch Twitter/X users PUBLIC
GET/v1/public/search/spotifySearch Spotify artists/tracks PUBLIC
GET/v1/public/search/redditSearch Reddit posts PUBLIC
GET/v1/public/search/reddit/subredditsSearch subreddits by name PUBLIC
GET/v1/public/search/bluesky/postsSearch Bluesky posts PUBLIC
GET/v1/public/search/bluesky/usersSearch Bluesky users PUBLIC
GET/v1/public/search/tiktokSearch TikTok users PUBLIC
GET/v1/public/search/facebookSearch Facebook pages PUBLIC
Mega Search
2 endpoints
Search across ALL platforms simultaneously. Returns aggregated results from every supported platform.
GET/v1/public/megasearch/:querySearch all platforms for a query PUBLIC
GET/v1/public/megasearch/person/:nameFind a person across all platforms PUBLIC
Cross-Data Analysis
2 endpoints
Cross-platform data analysis. Compare audiences, find overlaps, and correlate activity.
POST/v1/public/cross/analyzeCross-platform analysis from multiple sources PUBLIC
GET/v1/public/cross/overlap/:user1/:user2Audience overlap between two users PUBLIC
Deep Data (BYOK)
1 endpoint
Maximum data extraction for any platform. For private platforms, pass your own API token via the ?token= query param.
BYOK: Pass ?token=YOUR_TOKEN for platforms that require authentication. For Spotify, use ?client_id=X&client_secret=Y. Tokens are used in-memory only and never stored.
GET/v1/public/deep/:platform/:usernameDeep data extraction for any platform BYOK
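A minimal sketch of both auth styles mentioned above (all credentials and IDs are placeholders):
import requests

BASE = "https://api.buycrowds.com/v1/public/deep"

# Token-authenticated platform (placeholder token):
r = requests.get(f"{BASE}/instagram/charlidamelio",
                 params={"token": "IG_ACCESS_TOKEN"}, timeout=60)

# Spotify uses client credentials instead of a single token:
r2 = requests.get(f"{BASE}/spotify/SPOTIFY_ARTIST_ID",
                  params={"client_id": "SPOTIFY_CLIENT_ID",
                          "client_secret": "SPOTIFY_CLIENT_SECRET"}, timeout=60)
print(r.json(), r2.json())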
Data Extraction (Scraping)
5 endpoints
Maximum data extraction via scraping. Returns raw structured data from public profiles.
GET/v1/public/data/instagram/post/:shortcodeScrape a single Instagram post PUBLIC
GET/v1/public/data/instagram/:usernameScrape Instagram profile data PUBLIC
GET/v1/public/data/facebook/:pageScrape Facebook page data PUBLIC
GET/v1/public/data/tiktok/:usernameScrape TikTok profile data PUBLIC
GET/v1/public/data/twitter/:usernameScrape Twitter/X profile data PUBLIC
Per-Platform: TikTok
6 endpoints
GET/v1/public/tiktok/:username/videosUser's recent videos PUBLIC
GET/v1/public/tiktok/:username/infoUser profile information PUBLIC
GET/v1/public/tiktok/:username/followersFollower count and data PUBLIC
GET/v1/public/tiktok/:username/likesLiked videos list PUBLIC
GET/v1/public/tiktok/trendingCurrently trending TikTok content PUBLIC
GET/v1/public/tiktok/video/:video_idDetails for a specific video PUBLIC
Per-Platform: Bluesky
10 endpoints
GET/v1/public/bluesky/:handle/postsUser's recent posts PUBLIC
GET/v1/public/bluesky/:handle/followersList of followers PUBLIC
GET/v1/public/bluesky/:handle/followingAccounts this user follows PUBLIC
GET/v1/public/bluesky/:handle/likesPosts liked by this user PUBLIC
GET/v1/public/bluesky/:handle/feedUser's feed timeline PUBLIC
GET/v1/public/bluesky/:handle/listsUser's curated lists PUBLIC
GET/v1/public/bluesky/:handle/blocksBlocked accounts PUBLIC
GET/v1/public/bluesky/:handle/repostsUser's reposts PUBLIC
GET/v1/public/bluesky/post/:uri/likesLikes on a specific post PUBLIC
GET/v1/public/bluesky/post/:uri/repostsReposts of a specific post PUBLIC
Per-Platform: Reddit
10 endpoints
GET/v1/public/reddit/:username/postsUser's submitted posts PUBLIC
GET/v1/public/reddit/:username/commentsUser's comment history PUBLIC
GET/v1/public/reddit/:username/trophiesUser's Reddit trophies PUBLIC
GET/v1/public/reddit/:username/aboutUser profile information PUBLIC
GET/v1/public/reddit/:username/awardsAwards received by user PUBLIC
GET/v1/public/reddit/:username/karmaKarma breakdown by subreddit PUBLIC
GET/v1/public/reddit/subreddit/:nameSubreddit information PUBLIC
GET/v1/public/reddit/subreddit/:name/hotHot posts in a subreddit PUBLIC
GET/v1/public/reddit/subreddit/:name/topTop posts in a subreddit PUBLIC
GET/v1/public/reddit/subreddit/:name/newNew posts in a subreddit PUBLIC
Per-Platform: YouTube
8 endpoints
BYOK: Pass ?key=YOUR_API_KEY (YouTube Data API v3 key from Google Cloud Console). For analytics, pass ?token=OAUTH_TOKEN (OAuth2 token with YouTube Analytics scope).
GET/v1/public/youtube/:channel/videosChannel's uploaded videos BYOK
GET/v1/public/youtube/:channel/playlistsChannel's playlists BYOK
GET/v1/public/youtube/:channel/infoChannel information and stats BYOK
GET/v1/public/youtube/:channel/analyticsChannel analytics (requires OAuth token) BYOK
GET/v1/public/youtube/:channel/commentsRecent comments on channel videos BYOK
GET/v1/public/youtube/:channel/subscribersSubscriber count and data BYOK
GET/v1/public/youtube/searchSearch YouTube videos/channels BYOK
GET/v1/public/youtube/video/:video_id/commentsComments on a specific video BYOK
Per-Platform: Instagram
8 endpoints
BYOK: Pass ?token=IG_ACCESS_TOKEN&ig_user_id=YOUR_IG_USER_ID. Get your token from Facebook Developer > Instagram Graph API. Required for insights, stories, and private data.
GET/v1/public/instagram/:username/postsRecent posts (public scraping or BYOK) BYOK
GET/v1/public/instagram/:username/profileProfile info and metrics BYOK
GET/v1/public/instagram/:username/insightsAccount insights (requires token) BYOK
GET/v1/public/instagram/:username/storiesCurrent stories (requires token) BYOK
GET/v1/public/instagram/:username/reelsUser's reels BYOK
GET/v1/public/instagram/:username/taggedPosts user is tagged in BYOK
GET/v1/public/instagram/:username/mentionsPosts mentioning this user BYOK
GET/v1/public/instagram/hashtag/searchSearch hashtags BYOK
Per-Platform: Twitter / X
6 endpoints
BYOK: Pass ?token=BEARER_TOKEN. Get your Bearer Token from developer.twitter.com > Projects & Apps > Keys and Tokens.
GET/v1/public/twitter/:username/tweetsUser's recent tweets BYOK
GET/v1/public/twitter/:username/metricsFollower count and profile metrics BYOK
GET/v1/public/twitter/:username/followersList of followers BYOK
GET/v1/public/twitter/:username/followingAccounts the user follows BYOK
GET/v1/public/twitter/:username/likesUser's liked tweets BYOK
GET/v1/public/twitter/:username/listsUser's Twitter lists BYOK
Per-Platform: Facebook
9 endpoints
BYOK: Pass ?token=PAGE_ACCESS_TOKEN. Get it from Facebook Developer > Graph API Explorer or your App's Page tokens. Required for insights and private page data.
GET/v1/public/facebook/:page_id/infoPage information and metrics BYOK
GET/v1/public/facebook/:page_id/postsPage's recent posts BYOK
GET/v1/public/facebook/:page_id/insightsPage insights and analytics BYOK
GET/v1/public/facebook/:page_id/eventsPage events BYOK
GET/v1/public/facebook/:page_id/photosPage photos BYOK
GET/v1/public/facebook/:page_id/videosPage videos BYOK
GET/v1/public/facebook/:page_id/ratingsPage ratings and reviews BYOK
GET/v1/public/facebook/post/:post_id/commentsComments on a specific post BYOK
GET/v1/public/facebook/searchSearch Facebook pages BYOK
Per-Platform: Spotify
8 endpoints
BYOK: Pass ?client_id=X&client_secret=Y. Get credentials from Spotify Developer Dashboard > Create App > Settings.
GET/v1/public/spotify/:artist_id/tracksArtist's top tracks BYOK
GET/v1/public/spotify/:artist_id/albumsArtist's albums BYOK
GET/v1/public/spotify/:artist_id/relatedRelated artists BYOK
GET/v1/public/spotify/:artist_id/infoArtist information BYOK
GET/v1/public/spotify/searchSearch artists, tracks, albums BYOK
GET/v1/public/spotify/playlist/:idPlaylist details and tracks BYOK
GET/v1/public/spotify/track/:idTrack details and audio features BYOK
GET/v1/public/spotify/album/:idAlbum details and tracks BYOK
Per-Platform: Deezer
9 endpoints
Deezer's API is fully public — no authentication required.
GET/v1/public/deezer/artist/:artist_idArtist information PUBLIC
GET/v1/public/deezer/artist/:artist_id/topArtist's top tracks PUBLIC
GET/v1/public/deezer/artist/:artist_id/albumsArtist's albums PUBLIC
GET/v1/public/deezer/artist/:artist_id/relatedRelated artists PUBLIC
GET/v1/public/deezer/searchSearch artists, tracks, albums PUBLIC
GET/v1/public/deezer/user/:user_idUser profile data PUBLIC
GET/v1/public/deezer/track/:track_idTrack details PUBLIC
GET/v1/public/deezer/album/:album_idAlbum details PUBLIC
GET/v1/public/deezer/playlist/:playlist_idPlaylist details PUBLIC
Per-Platform: LinkedIn
5 endpoints
BYOK: Pass ?token=ACCESS_TOKEN. Get it from LinkedIn Developer Portal > My Apps > Auth. Requires OAuth 2.0 with r_liteprofile or r_organization_social scopes.
GET/v1/public/linkedin/:username/profileUser profile data BYOK
GET/v1/public/linkedin/:username/postsUser's recent posts BYOK
GET/v1/public/linkedin/:username/connectionsUser's connections count BYOK
GET/v1/public/linkedin/org/:org_idOrganization/company page data BYOK
GET/v1/public/linkedin/searchSearch LinkedIn profiles BYOK
Per-Platform: Twitch
6 endpoints
BYOK: Pass ?client_id=X&token=Y. Get them from Twitch Developer Console > Register Application. Use the Client Credentials flow for the OAuth token.
GET/v1/public/twitch/:username/channelChannel information BYOK
GET/v1/public/twitch/:username/streamsCurrent/recent streams BYOK
GET/v1/public/twitch/:username/followersFollower list BYOK
GET/v1/public/twitch/:username/videosPast broadcasts and VODs BYOK
GET/v1/public/twitch/:username/clipsTop clips BYOK
GET/v1/public/twitch/searchSearch Twitch channels BYOK
Per-Platform: Discord
3 endpoints
BYOK: Pass ?token=BOT_TOKEN. Get it from Discord Developer Portal > Bot > Token. The bot must be in the server to access guild data.
GET/v1/public/discord/user/:user_idDiscord user profile BYOK
GET/v1/public/discord/guild/:guild_idServer/guild information BYOK
GET/v1/public/discord/guild/:guild_id/channelsList of channels in a guild BYOK
Per-Platform: Telegram
2 endpoints
BYOK: Pass ?bot_token=BOT_TOKEN. Create a bot via @BotFather on Telegram and use the provided token.
GET/v1/public/telegram/channel/:chat_idChannel/group information BYOK
GET/v1/public/telegram/channel/:chat_id/membersMember count BYOK
Per-Platform: WhatsApp Business
8 endpoints
BYOK: Pass ?token=ACCESS_TOKEN. Get it from Meta Business Suite > WhatsApp > API Setup. Requires WhatsApp Business API access.
GET/v1/public/whatsapp/:phone_number_id/profileBusiness profile information BYOK
GET/v1/public/whatsapp/:phone_number_id/infoPhone number details BYOK
GET/v1/public/whatsapp/account/:waba_id/numbersList phone numbers in account BYOK
GET/v1/public/whatsapp/account/:waba_id/templatesMessage templates BYOK
POST/v1/public/whatsapp/:phone_number_id/sendSend a message via WhatsApp BYOK
GET/v1/public/whatsapp/account/:waba_id/analyticsAccount analytics BYOK
GET/v1/public/whatsapp/webhookWebhook verification (GET) BYOK
POST/v1/public/whatsapp/webhookWebhook event receiver (POST) BYOK
Per-Platform: Mastodon
3 endpoints
Mastodon is fully public and federated — no authentication required. Use the user@instance.social format for the handle.
GET/v1/public/mastodon/:handle/postsUser's recent posts/toots PUBLIC
GET/v1/public/mastodon/:handle/followersList of followers PUBLIC
GET/v1/public/mastodon/:handle/followingAccounts this user follows PUBLIC
Per-Platform: Kick
2 endpoints
Kick data is publicly accessible — no authentication required.
GET/v1/public/kick/:username/infoChannel/user information PUBLIC
GET/v1/public/kick/:username/clipsTop clips from channel PUBLIC
Per-Platform: Pinterest
5 endpoints
GET/v1/public/pinterest/:username/pinsUser's recent pins PUBLIC
GET/v1/public/pinterest/:username/boardsUser's boards PUBLIC
GET/v1/public/pinterest/:username/followersFollower list PUBLIC
GET/v1/public/pinterest/:username/followingAccounts followed PUBLIC
GET/v1/public/pinterest/searchSearch pins and boards PUBLIC
Per-Platform: Threads
1 endpoint
GET/v1/public/threads/:username/postsUser's recent Threads posts PUBLIC
Per-Platform: SoundCloud
3 endpoints
BYOK: Pass ?client_id=X. Get it from the SoundCloud Developer Portal (apps). Required for API access.
GET/v1/public/soundcloud/:username/profileUser profile information BYOK
GET/v1/public/soundcloud/:username/tracksUser's uploaded tracks BYOK
GET/v1/public/soundcloud/searchSearch tracks and users BYOK
Content Fetch
8 endpoints
Fetch individual content items (posts, videos, tweets) by their ID or URL.
GET/v1/public/fetch?url=URLUniversal content fetch by URL PUBLIC
GET/v1/public/instagram/post/:shortcodeFetch Instagram post by shortcode PUBLIC
GET/v1/public/instagram/reel/:shortcodeFetch Instagram reel by shortcode PUBLIC
GET/v1/public/youtube/video/:video_idFetch YouTube video details PUBLIC
GET/v1/public/reddit/post/:post_idFetch Reddit post and comments PUBLIC
GET/v1/public/bluesky/post/:handle/:rkeyFetch Bluesky post PUBLIC
GET/v1/public/twitter/tweet/:tweet_idFetch a tweet by ID PUBLIC
GET/v1/public/tiktok/video/detail/:video_idFetch TikTok video by ID PUBLIC
Batch Operations
3 endpoints
Process multiple users/checks in a single request. Send a JSON body with a list of usernames.
POST/v1/public/batch/profilesFetch profiles for multiple usernames at once PUBLIC
POST/v1/public/batch/checkCheck username availability across platforms PUBLIC
POST/v1/public/batch/compareCompare multiple users at once PUBLIC
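A sketch of a batch call; the doc says to send a JSON body with a list of usernames, so the exact key name ("usernames") is an assumption:
import requests

body = {"usernames": ["mkbhd", "pewdiepie", "charlidamelio"]}
r = requests.post("https://api.buycrowds.com/v1/public/batch/profiles",
                  json=body, timeout=120)
print(r.json())  # response structure not shown here; inspect before relying on fields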
Export
4 endpoints
Export profile data in different formats for reports, integrations, or sharing.
GET/v1/public/export/csv/:usernameExport profile data as CSV PUBLIC
GET/v1/public/export/json/:usernameExport profile data as JSON PUBLIC
GET/v1/public/export/markdown/:usernameExport profile data as Markdown PUBLIC
GET/v1/public/export/card/:usernameGenerate a shareable profile card PUBLIC
Trends
5 endpoints
Discover what's trending on each platform right now.
GET/v1/public/trends/redditTrending subreddits and posts PUBLIC
GET/v1/public/trends/blueskyTrending Bluesky content PUBLIC
GET/v1/public/trends/githubTrending GitHub repositories PUBLIC
GET/v1/public/trends/deezerTrending tracks on Deezer PUBLIC
GET/v1/public/trends/spotifyTrending on Spotify PUBLIC
Influence Scoring
3 endpoints
Calculate influence scores, categorize influencers, and rank them on leaderboards.
GET/v1/public/influence/:usernameInfluence score for a user PUBLIC
GET/v1/public/influence/category/:usernameInfluencer category classification PUBLIC
GET/v1/public/influence/leaderboardLeaderboard (?users=user1,user2,user3) PUBLIC
Hashtag Analysis
5 endpoints
Analyze hashtag performance and reach across platforms.
GET/v1/public/hashtag/instagram/:tagInstagram hashtag volume and top posts PUBLIC
GET/v1/public/hashtag/tiktok/:tagTikTok hashtag views and trending PUBLIC
GET/v1/public/hashtag/reddit/:tagReddit posts with this tag/keyword PUBLIC
GET/v1/public/hashtag/bluesky/:tagBluesky posts with this hashtag PUBLIC
GET/v1/public/hashtag/youtube/:tagYouTube videos with this tag PUBLIC
Timeline
1 endpoint
Unified cross-platform activity timeline for a user.
GET/v1/public/timeline/:usernameChronological timeline across all platforms PUBLIC
Verification
2 endpoints
Verify account authenticity and cross-platform identity consistency.
GET/v1/public/verify/:usernameVerification status across platforms PUBLIC
GET/v1/public/verify/cross/:usernameCross-platform identity verification PUBLIC
Reports
2 endpoints
Generate comprehensive reports with insights and recommendations.
GET/v1/public/report/:usernameFull analytical report PUBLIC
GET/v1/public/report/:username/strengthsStrengths and opportunities analysis PUBLIC
Link-in-Bio
2 endpoints
Auto-generated link-in-bio pages from cross-platform data.
GET/v1/public/bio/:usernameLink-in-bio data (JSON) PUBLIC
GET/v1/public/bio/:username/htmlRendered HTML link-in-bio page PUBLIC
Monitoring
2 endpoints
Monitor platform health and account status.
GET/v1/public/monitor/status/:platform/:usernameAccount status on a specific platform PUBLIC
GET/v1/public/monitor/healthPlatform health and availability PUBLIC
Embeds & Widgets
3 endpoints
Embeddable widgets and badges for websites. Also supports oEmbed protocol.
GET/v1/public/embed/:platform/:usernameEmbeddable profile widget PUBLIC
GET/v1/public/embed/:platform/:username/badgeSmall profile badge for embedding PUBLIC
GET/v1/public/oembedoEmbed protocol endpoint PUBLIC
Diff & Snapshots
2 endpoints
Take snapshots of profiles and compare changes over time.
GET/v1/public/diff/:platform/:usernameTake a snapshot of current state PUBLIC
POST/v1/public/diff/compareCompare two snapshots PUBLIC
Quick Connect OAuth
3 endpoints
Simplified OAuth flow for connecting social accounts without full dashboard registration.
GET/v1/auth/providersList available OAuth providers PUBLIC
POST/v1/auth/connect/:platformInitiate OAuth connection to a platform PUBLIC
GET/v1/auth/callbackOAuth callback handler PUBLIC
AI Generation (BYOK)
10 endpoints
AI-powered content generation and analysis. Requires API Key authentication.
BYOK + API Key: These endpoints require an API Key (header x-api-key) and may use your own AI provider keys passed in the request body. AI tokens are used in-memory only.
POST/v1/ai/generate-postGenerate a social media post API KEY
POST/v1/ai/generate-captionGenerate a caption for media API KEY
POST/v1/ai/generate-hashtagsGenerate relevant hashtags API KEY
POST/v1/ai/generate-imageGenerate an image for a post API KEY
POST/v1/ai/analyze-profileAI analysis of a social profile API KEY
POST/v1/ai/generate-bioGenerate an optimized bio API KEY
POST/v1/ai/content-calendarGenerate a content calendar API KEY
POST/v1/ai/reply-suggestionsSuggest replies to comments API KEY
POST/v1/ai/trend-analysisAI-powered trend analysis API KEY
POST/v1/ai/competitor-reportAI competitor analysis report API KEY
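A hedged sketch of an AI generation call: the x-api-key header is documented above, but the body fields (topic, platform, openai_api_key) are illustrative assumptions, not a confirmed schema:
import requests

headers = {"x-api-key": "YOUR_BUYCROWDS_API_KEY"}
body = {
    "topic": "spring product launch",        # assumed field name
    "platform": "instagram",                 # assumed field name
    "openai_api_key": "YOUR_PROVIDER_KEY",   # BYOK AI key, used in-memory only per the note above
}
r = requests.post("https://api.buycrowds.com/v1/ai/generate-post",
                  headers=headers, json=body, timeout=120)
print(r.json())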
Dashboard (JWT Auth)
28 endpoints
Dashboard API for managing your BuyCrowds account. Requires a JWT token from POST /api/auth/login. Pass it as Authorization: Bearer TOKEN.
Auth
POST/api/auth/registerCreate a new account PUBLIC
POST/api/auth/loginLogin and get JWT token PUBLIC
GET/api/auth/meCurrent user info JWT
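The documented flow in a short sketch: log in, grab the JWT, and pass it as a Bearer token (the login body field names and the "token" response key are assumptions):
import requests

BASE = "https://api.buycrowds.com"

# 1. Login. Field names assumed; adjust to the actual register/login schema.
login = requests.post(f"{BASE}/api/auth/login",
                      json={"email": "you@example.com", "password": "YOUR_PASSWORD"}).json()
token = login["token"]  # assumed key for the returned JWT

# 2. Authenticated dashboard call, as documented: Authorization: Bearer TOKEN.
me = requests.get(f"{BASE}/api/auth/me",
                  headers={"Authorization": f"Bearer {token}"})
print(me.json())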
Social Networks (BYOK Credentials)
GET/api/social-networksList connected networks JWT
POST/api/social-networksAdd a social network JWT
GET/api/social-networks/:idGet network details JWT
PATCH/api/social-networks/:idUpdate network config JWT
DELETE/api/social-networks/:idRemove a network JWT
POST/api/social-networks/:network/auth-urlGet OAuth URL for a network JWT
Social Accounts
GET/api/social-accountsList linked social accounts JWT
GET/api/social-accounts/:idAccount details JWT
DELETE/api/social-accounts/:idUnlink an account JWT
GET/api/social-accounts/:id/metricsAccount metrics JWT
GET/api/social-accounts/pending/:session_tokenCheck pending OAuth connection JWT
POST/api/social-accounts/pending/:session_token/finalizeFinalize OAuth connection JWT
Press Kits
GET/api/press-kitsList your press kits JWT
POST/api/press-kitsCreate a press kit JWT
PATCH/api/press-kits/:idUpdate a press kit JWT
DELETE/api/press-kits/:idDelete a press kit JWT
GET/api/pk/:slugPublic press kit by slug PUBLIC
Posts & Content
POST/api/postsCreate a multipost JWT
GET/api/postsList your posts JWT
GET/api/posts/:idGet post details JWT
DELETE/api/posts/:idDelete a post JWT
API Keys & Webhooks
GET/api/api-keysList your API keys JWT
POST/api/api-keysCreate an API key JWT
DELETE/api/api-keys/:idRevoke an API key JWT
GET/api/webhooksList webhooks JWT
POST/api/webhooksCreate a webhook JWT
DELETE/api/webhooks/:idDelete a webhook JWT
Connections (Legacy)
GET/api/connectionsList platform connections JWT
GET/api/connections/:idConnection details JWT
DELETE/api/connections/:platformDisconnect a platform JWT
GET/api/connections/:id/metrics/:metric_typeConnection metrics JWT
POST/api/oauth/:platform/callbackOAuth callback for platform JWT
API v1 (API Key + Rate Limited)
18 endpoints
External API for integrations. Requires the x-api-key header. Rate limited per key. Create keys at POST /api/api-keys.
Accounts
GET/v1/accountsList connected accounts API KEY
GET/v1/accounts/:idAccount details API KEY
Posts
POST/v1/postsCreate a post API KEY
GET/v1/postsList posts API KEY
GET/v1/posts/:idGet post by ID API KEY
PATCH/v1/posts/:idUpdate a post API KEY
DELETE/v1/posts/:idDelete a post API KEY
GET/v1/posts/:id/resultsPost publishing results API KEY
POST/v1/posts/:id/retryRetry failed post API KEY
Media
POST/v1/media/uploadUpload media file API KEY
GET/v1/mediaList uploaded media API KEY
GET/v1/media/:idMedia details API KEY
DELETE/v1/media/:idDelete media API KEY
Other
GET/v1/platformsSupported platforms info API KEY
GET/v1/usageCurrent API usage stats API KEY
GET/v1/quotaLive quota — rate limit buckets + MTD cost + today cost + budget cap + daily cap + credits + burn summary + cache savings API KEY
One call, everything you need to self-throttle. Returns the tier's canonical ceilings (per minute/hour/day), current bucket counts (read-only Hammer probe), month-to-date spend in USD, linear projected end-of-month cost, budget cap status (configured? hard_enforce? headroom? over_cap? projected_over_cap?), and next reset timestamps for each window.
Budget enforcement (402 Payment Required): when you configure PUT /v1/full-scrape/budget-cap with hard_enforce: true, every subsequent /v1 request is guarded at the plug layer. If month_to_date_usd >= monthly_usd_cap, the request short-circuits with a 402 response and a structured error payload pointing to PUT /v1/full-scrape/budget-cap (raise or disable) or GET /v1/public/cost/tiers (upgrade). The cap is checked via a 30s ETS cache over a single indexed sum(cost_usd) query — negligible per-request overhead.
Response headers (on EVERY /v1 call):
• x-ratelimit-tier, x-ratelimit-limit, x-ratelimit-remaining — current tier + per-minute window
• x-budget-cap-usd, x-budget-spent-usd, x-budget-remaining-usd, x-budget-used-pct — only when a cap is configured
• x-cost-category, x-cost-marginal-usd, x-tier — marginal cost of this request
Use case: a dashboard polling /v1/quota every 10s can render a full cost + rate-limit widget without hitting any other endpoint. A production client can read the x-budget-* headers from any response and pre-emptively back off before hitting the hard cap.
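A sketch of that pre-emptive back-off pattern, reading the documented headers off a /v1 response (the safety thresholds and sleep durations are arbitrary choices):
import time
import requests

def guarded_get(url, api_key):
    r = requests.get(url, headers={"x-api-key": api_key}, timeout=30)
    if r.status_code == 402:
        # Hard budget cap hit: the structured error points at the budget-cap / tiers endpoints.
        raise RuntimeError(f"budget cap exceeded: {r.json()}")
    # Headers documented above; values arrive as strings.
    remaining = int(r.headers.get("x-ratelimit-remaining", "1"))
    budget_left = float(r.headers.get("x-budget-remaining-usd", "inf"))
    if remaining == 0:
        time.sleep(60)       # minute window exhausted: wait for the next window
    if budget_left < 1.0:    # arbitrary safety margin before the hard cap
        time.sleep(300)
    return r

r = guarded_get("https://api.buycrowds.com/v1/quota", "YOUR_API_KEY")
print(r.json())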
GET/v1/creator/:slugCreator data by slug API KEY
POST/v1/profiles/scoreScore a profile (Super-Premium) API KEY
GET/v1/profiles/scoring-objectivesAvailable scoring objectives API KEY
MCP Server (Model Context Protocol)
1 endpoint
Model Context Protocol server. Supports tools/list, initialize, and tools/call methods via JSON-RPC 2.0.
POST/mcpMCP JSON-RPC handler (tools/list, initialize, tools/call) PUBLIC
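A minimal JSON-RPC 2.0 sketch against the documented methods (initialize, tools/list, tools/call):
import requests

rpc = {"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}}
r = requests.post("https://api.buycrowds.com/mcp", json=rpc, timeout=30)
print(r.json())  # JSON-RPC result listing the available tools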
OAuth Integration System
8 endpoints
Multi-provider OAuth integration management. Create integrations for any supported platform and manage tokens.
GET/v1/oauth/providersList available OAuth providers API KEY
POST/v1/oauth/integrationsCreate an OAuth integration API KEY
GET/v1/oauth/integrationsList your integrations API KEY
DELETE/v1/oauth/integrations/:idDelete an integration API KEY
POST/v1/oauth/integrations/:id/authorizeGet authorization URL API KEY
POST/v1/oauth/integrations/:id/refreshRefresh OAuth token API KEY
POST/v1/oauth/integrations/:id/testTest integration connectivity API KEY
GET/v1/oauth/callbackOAuth callback (redirect target) PUBLIC
Vault (Secret Management)
3 endpoints
Manage BYOK secrets securely. Store, list, and revoke connector credentials.
POST/v1/vault/secretsStore a BYOK secret API KEY
GET/v1/vault/secretsList stored secrets (metadata only) API KEY
DELETE/v1/vault/secrets/:connector_idRevoke a stored secret API KEY
Per-Platform: GitHub
12 endpoints
GitHub data is mostly public. No auth required for public repos/profiles.
GET/v1/public/github/:username/reposUser's repositories PUBLIC
GET/v1/public/github/:username/followersUser's followers PUBLIC
GET/v1/public/github/:username/followingAccounts user follows PUBLIC
GET/v1/public/github/:username/eventsRecent public events PUBLIC
GET/v1/public/github/:username/starredStarred repositories PUBLIC
GET/v1/public/github/:username/orgsOrganizations PUBLIC
GET/v1/public/github/:username/gistsPublic gists PUBLIC
GET/v1/public/github/:username/languagesProgramming languages used PUBLIC
GET/v1/public/github/:username/contributionsContribution graph data PUBLIC
GET/v1/public/github/:username/socialSocial accounts linked to GitHub PUBLIC
GET/v1/public/github/repo/:owner/:repoRepository details PUBLIC
GET/v1/public/github/repo/:owner/:repo/contributorsRepository contributors PUBLIC
Network Analysis
3 endpoints
Analyze social networks, find overlaps, and get connection suggestions.
GET/v1/public/network/:usernameSocial network graph PUBLIC
GET/v1/public/network/overlap/:user1/:user2Network overlap between two users PUBLIC
GET/v1/public/network/suggest/:usernameSuggested connections PUBLIC
Media Analysis
2 endpoints
GET/v1/public/media/summary/:usernameMedia content summary across platforms PUBLIC
GET/v1/public/media/calendar/:usernameContent publishing calendar PUBLIC
Collaboration
2 endpoints
Find collaboration matches and compatibility between creators.
GET/v1/public/collab/match/:user1/:user2Collaboration compatibility score PUBLIC
POST/v1/public/collab/findFind potential collaborators PUBLIC
Brand Safety
2 endpoints
GET/v1/public/brand-safety/:usernameBrand safety check PUBLIC
GET/v1/public/brand-safety/:username/detailedDetailed brand safety report PUBLIC
Audience Analysis
2 endpoints
GET/v1/public/audience/:usernameAudience profile and demographics PUBLIC
GET/v1/public/audience/compare/:user1/:user2Compare audiences of two users PUBLIC
SEO Analysis
2 endpoints
GET/v1/public/seo/:usernameSocial SEO score PUBLIC
GET/v1/public/seo/:username/backlinksSocial backlinks analysis PUBLIC
Pricing & ROI
3 endpoints
GET/v1/public/pricing/:usernameEstimated sponsorship pricing PUBLIC
GET/v1/public/pricing/roi/:usernameROI estimate for campaigns PUBLIC
GET/v1/public/pricing/compare/:user1/:user2Compare pricing between creators PUBLIC
Campaign Planning
2 endpoints
POST/v1/public/campaign/planPlan an influencer campaign PUBLIC
GET/v1/public/campaign/suggest/:usernameSuggest campaign strategies PUBLIC
AI Analysis (Public)
3 endpoints
AI-powered public analysis endpoints. No auth required.
GET/v1/public/ai/persona/:usernameAI persona analysis PUBLIC
GET/v1/public/ai/predict/:usernameGrowth prediction PUBLIC
GET/v1/public/ai/content-ideas/:usernameContent ideas generation PUBLIC
Reputation
2 endpoints
GET/v1/public/reputation/:usernameReputation score PUBLIC
GET/v1/public/reputation/:username/historyReputation history over time PUBLIC
Fake Detection
2 endpoints
GET/v1/public/fake-check/:usernameAnalyze account authenticity PUBLIC
GET/v1/public/fake-check/compare/:user1/:user2Compare authenticity of two accounts PUBLIC
Growth Hacking
3 endpoints
GET/v1/public/growth/tips/:usernamePersonalized growth tips PUBLIC
GET/v1/public/growth/best-time/:usernameBest times to post PUBLIC
GET/v1/public/growth/hashtags/:usernameHashtag strategy recommendations PUBLIC
Competitor Analysis
2 endpoints
GET/v1/public/competitor/:usernameCompetitor analysis PUBLIC
GET/v1/public/competitor/gap/:user1/:user2Gap analysis between competitors PUBLIC
More Tools (Sentiment, Links, Quiz, Benchmarks, Portfolio, Monetization, Niche, Stats, Power, Digest & more)
45 endpoints
Sentiment
GET/v1/public/sentiment/:usernameSentiment analysis of user's content PUBLIC
Link Tracker
GET/v1/public/links/:usernameScan all links in user's bios and posts PUBLIC
Quiz / Fun Analysis
GET/v1/public/quiz/social-iq/:usernameSocial IQ quiz based on data PUBLIC
GET/v1/public/quiz/personality/:usernameSocial personality type PUBLIC
Benchmarks
GET/v1/public/benchmark/industry/:usernameIndustry benchmark comparison PUBLIC
GET/v1/public/benchmark/percentile/:usernamePercentile ranking PUBLIC
Portfolio / Media Kit
GET/v1/public/portfolio/:usernameAuto-generated portfolio PUBLIC
GET/v1/public/portfolio/:username/media-kitMedia kit for brands PUBLIC
Rate Check
GET/v1/public/rate-checkAPI rate limit status PUBLIC
Monetization
GET/v1/public/monetization/:usernameMonetization opportunities PUBLIC
GET/v1/public/monetization/:username/revenueRevenue estimate PUBLIC
Crosspost Strategy
GET/v1/public/crosspost/:usernameCrosspost strategy recommendations PUBLIC
GET/v1/public/crosspost/:username/scheduleOptimal crosspost schedule PUBLIC
Archive / Snapshots
GET/v1/public/archive/:usernameSnapshot all platform data PUBLIC
GET/v1/public/archive/:username/compareCompare historical snapshots PUBLIC
Demographics
GET/v1/public/demographics/:usernameEstimated audience demographics PUBLIC
Niche Detection
GET/v1/public/niche/:usernameDetect creator's niche PUBLIC
GET/v1/public/niche/suggestions/:nicheGet creators in a niche PUBLIC
Namecheck
GET/v1/public/namecheck/:usernameCheck username availability everywhere PUBLIC
GET/v1/public/namecheck/suggest/:usernameSuggest available username variations PUBLIC
Stats
GET/v1/public/stats/platformsPlatform statistics overview PUBLIC
GET/v1/public/stats/apiAPI usage statistics PUBLIC
Power Score
GET/v1/public/power/:usernameCalculate cross-platform power score PUBLIC
Daily Digest
GET/v1/public/digest/:usernameDaily activity digest PUBLIC
Similar Creators
GET/v1/public/similar/:usernameFind similar creators PUBLIC
Engagement Calculator
GET/v1/public/engagement/:usernameCalculate engagement rate PUBLIC
Achievements
GET/v1/public/achievements/:usernameCheck social achievements/milestones PUBLIC
Viral Potential
GET/v1/public/viral/:usernameViral potential score PUBLIC
Rate Card
GET/v1/public/rate-card/:usernameGenerate sponsorship rate card PUBLIC
Platform Compare
GET/v1/public/platform-compare/:usernameCompare performance across platforms PUBLIC
Summary
GET/v1/public/summary/:usernameOne-liner profile summary PUBLIC
GET/v1/public/summary/:username/pitchElevator pitch for creator PUBLIC
Media Kit HTML
GET/v1/public/media-kit/:usernameRendered HTML media kit PUBLIC
Consistency Check
GET/v1/public/consistency/:usernameCheck branding consistency PUBLIC
Clone Detection
GET/v1/public/clone-check/:usernameDetect impersonator/clone accounts PUBLIC
Worth Estimation
GET/v1/public/worth/:usernameEstimate total account worth PUBLIC
Account Age
GET/v1/public/account-age/:usernameCheck account age across platforms PUBLIC
Social DNA
GET/v1/public/dna/:usernameAnalyze social DNA and behavior patterns PUBLIC
Security Shield
GET/v1/public/shield/:usernameSecurity and privacy scan PUBLIC
Fanbase Analysis
GET/v1/public/fanbase/:usernameAnalyze fanbase composition PUBLIC
Social Wrapped
GET/v1/public/wrapped/:usernameGenerate yearly social media wrapped PUBLIC
Generic Platform Data
GET/v1/public/:platform/:usernamePublic data for any platform PUBLIC
Cost & Subscription
19 endpoints
Full transparency on what the API costs to run and what you'll pay.
Subscription tiers, rate limits, amortized infrastructure costs per request,
and a live estimator so you can model your monthly bill before you commit.
Pricing model: Free tier is subsidized by paid plans.
Scraper-backed endpoints (Apify) carry the highest marginal cost.
BYOK AI endpoints are zero marginal cost to us — you pay your model provider directly.
Every /v1 response ships with cost headers:
• x-cost-category — which billing bucket the endpoint hit
• x-cost-marginal-usd — our amortized marginal cost for the call
• x-tier — your resolved subscription tier
Clients can track spend out-of-band without an extra billing round-trip.
Subscription Tiers
GET/v1/public/cost/tiersAll subscription tiers with pricing, limits & features PUBLIC
Rate Limits
GET/v1/public/cost/rate-limitsPer-tier rate limits, response headers & enforcement rules PUBLIC
Infrastructure Costs (amortized)
GET/v1/public/cost/infrastructureCompute, DB, bandwidth, scraper & monitoring cost per 1k requests PUBLIC
Monthly Estimator
GET/v1/public/cost/estimate?tier=pro&requests_per_day=5000&scraper_ratio=0.3Project your monthly bill given usage & tier PUBLIC
Cost Breakdown by Endpoint Category
GET/v1/public/cost/breakdownWeighted cost per request across endpoint categories PUBLIC
Machine-readable Rate Card (SDK-friendly)
GET/v1/public/cost/rate-cardCanonical pricing spec — tiers, categories, infra & credits in one stable payload PUBLIC
Tier Comparison (side-by-side)
GET/v1/public/cost/compare?tiers=free,starter,pro,business&requests_per_day=10000Compare multiple tiers at a given usage level — highlights the cheapest fit PUBLIC
Subscription & Billing Info
GET/v1/public/cost/subscriptionBilling provider, payment methods, trial, cancellation & refund policy PUBLIC
Your Actual Usage & Bill (API key required)
Authenticated endpoints — pass your API key via the X-API-Key header.
These read from the live rate-limit buckets and project a personalized monthly bill.
GET/v1/cost/my-usageLive counters for your minute/hour/day buckets + full scrape credits, % of limit & health indicator API KEY
GET/v1/cost/my-billProjected monthly bill based on your observed traffic API KEY
GET/v1/cost/my-summaryUnified dashboard — requests + credits + bill + tier recommendation in one call API KEY
GET/v1/cost/upgrade-recommendationBest-fit tier for your actual usage pattern (factors in full scrape credits too) API KEY
GET/v1/cost/tier-recommendationBidirectional tier recommender — classifies direction (upgrade/downgrade/optimal) with monthly savings/extra cost API KEY
GET/v1/cost/tier-recommendationsFull cost matrix per tier: subscription + projected overage = total. Marks ineligible (Free blocks overage). Picks cheapest eligible API KEY
GET/v1/cost/rate-limit-forecastPredictive exhaustion: given current req/sec velocity, how many seconds until minute/hour/day limit is hit. Identifies binding window API KEY
GET/v1/cost/tier-safety-check?tier=starter&days=30Backward-looking: would tier X have worked for the last N days? Peak rate vs target limits, cost delta, overage blocked check API KEY
GET/v1/cost/labels-breakdown?since=2026-04-01T00:00:00Z&limit=50Spend by metadata.labels (unnested server-side). One event with multiple labels contributes to each bucket — labels are orthogonal tags API KEY
GET/v1/cost/hourly-timeseries?hours=48&cost_center=X&fill_gaps=trueHour-bucketed spend time series via date_trunc. Optional center filter. fill_gaps=true inserts zero rows for quiet hours API KEY
GET/v1/cost/spike-detector?hours=168&baseline_hours=120&threshold=3.0Z-score anomaly detection on hourly spend. Flags detection-window hours where |z| ≥ threshold API KEY
GET/v1/cost/center-drift?current_days=7&baseline_days=7&min_delta_pct=5Per-center share_pct drift between two back-to-back windows. Flags new/dropped centers + biggest movers sorted by |delta| API KEY
GET/v1/cost/hourly-timeseries/export?hours=168&cost_center=XCSV download of iter 131 hourly series. Always fill_gaps=true. Content-Disposition attachment, max 720 rows API KEY
GET/v1/cost/platform-breakdown?since=2026-04-01T00:00:00Z&cost_center=XSpend aggregated by metadata.platform (instagram/tiktok/etc). Optional cost_center drill-down. Unknown bucket for missing tags API KEY
GET/v1/cost/platform-center-crosstab?since=2026-04-01T00:00:00Z2D matrix: platform × cost_center spend. Sorted axes + row/col totals + grand total. Heatmap-ready API KEY
GET/v1/cost/histogram?bucket_count=10&since=2026-04-01T00:00:00ZCost-per-event distribution via Postgres width_bucket. Min/max/avg/p50/p95/p99 + N bucketed ranges + skew flag API KEY
GET/v1/cost/top-expensive?limit=20&cost_center=X&event_type=full_scrape_overageTop-N scrapes ordered by cost_usd DESC. Drill-down pair with /histogram to identify heavy-tail offenders API KEY
GET/v1/cost/top-usernames?limit=20&cost_center=XAggregated top-N most scraped handles. GROUP BY metadata.username with event count, total cost, avg cost, share_pct API KEY
GET/v1/cost/refund-rate-by-center?days=30&min_scrapes=5Reliability-as-money per center. SQL pivot of scrapes vs refunds, refund_rate_pct + healthy/watch/degraded/critical tier API KEY
GET/v1/cost/pareto?threshold_pct=80 80/20 concentration analysis by cost_center. Cumulative share + pareto_80_20 flag + interpretation string API KEY
GET/v1/cost/refund-rate-by-platform?days=30&min_scrapes=5Upstream provider reliability signal. Per-platform refund rate + tier + worst_platform callout API KEY
POST/v1/cost/project-scenarioProspective calculator. Body {scenarios: [{name, daily_scrapes, price_per_scrape_usd, days}]}. Checks each against current headroom API KEY
POST/v1/cost/estimate-batchData-driven per-username estimate. Body {usernames[], default_price_usd?, days?}. Uses historical avg cost, falls back to default for new handles API KEY
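A sketch of the documented project-scenario body (values illustrative):
import requests

body = {"scenarios": [
    {"name": "onboarding-wave", "daily_scrapes": 40,
     "price_per_scrape_usd": 0.05, "days": 30},
]}
r = requests.post("https://api.buycrowds.com/v1/cost/project-scenario",
                  headers={"X-API-Key": "YOUR_API_KEY"}, json=body, timeout=30)
print(r.json())  # each scenario is checked against current headroom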
GET/v1/cost/healthCompressed status gauge: healthy/warning/critical/exceeded + used_pct + binding_cap + one-liner. Widget-ready polling endpoint API KEY
GET/v1/cost/top-usernames/export?limit=200&cost_center=XCSV export of iter 141 top-usernames. Columns: username, count, total, avg, share_pct, first/last event_at API KEY
GET/v1/cost/cap-etaCap exhaustion predictor: ETA for monthly + daily caps at current velocity. Identifies binding cap. Pure compute via CapExhaustionPredictor module API KEY
GET/v1/cost/paceBudget pace tracker: actual vs linear-pace expected at this point in month. Status way_ahead/ahead/on_pace/behind/way_behind + ahead_or_behind_days API KEY
GET/v1/cost/volatility?days=30Spend volatility analyzer: stddev + coefficient of variation + class (stable/moderate/volatile/chaotic). Pure compute via SpendVolatility module API KEY
GET/v1/cost/forecast-bands?days=14&forecast_days=7Linear regression spend forecast with 95% prediction intervals. Slope, intercept, r_squared, residual_se + per-day expected/lower/upper bands API KEY
GET/v1/cost/top-expensive/export?limit=200&cost_center=XCSV export of iter 140 top-expensive events. RFC 4180 escaping. Content-Disposition attachment. Max 200 rows API KEY
GET/v1/cost/proration?target_tier=pro&switch_date=2026-04-15Prorated subscription delta for a mid-month tier switch API KEY
GET/v1/cost/alerts-log?event_type=budget_cap_threshold_crossed&since=2026-04-01T00:00:00Z&limit=50Historical log of cost alert fires — threshold crossings + daily digests + delivery outcomes API KEY
Cost alert fire history (iter 108). Every time the CostAlerts GenServer decides to fire — whether it's a credit threshold crossing (iter 63), a budget threshold crossing, or a daily digest (iter 93) — a row is written to the new cost_alert_fires table. This endpoint queries that history with filters.
Distinct from /webhook-deliveries: /v1/full-scrape/webhook-deliveries (iter 86) shows Oban retry queue state — which attempts are scheduled, retrying, or discarded. /cost/alerts-log is the PRE-Oban ledger: "at this timestamp, our GenServer decided to fire an alert, and after the delivery attempt the outcome was X". The two together give you the full picture — alerts-log for business-logic audit, webhook-deliveries for network-layer debugging.
Response shape: {count, summary: {credit_threshold_crossed: N, budget_cap_threshold_crossed: M, daily_spend_digest: K}, filters, fires: [{id, event_type, threshold_level, payload, fired_at, delivery_outcome, inserted_at}], retention_days: 60, note}.
delivery_outcome field: stamped asynchronously after the webhook delivery attempt completes. Possible values: "delivered" (200 OK on first try), "retry_scheduled" (first attempt failed, the Oban retry queue picked it up), "error:<reason>" (retry queue insertion itself failed — rare). null if the row was just inserted and the async delivery Task hasn't completed yet.
Retention: 60 days, pruned hourly by RetentionSweeper (iter 44) alongside snapshots and billing_events. Indexed on (api_key_id, fired_at) and (api_key_id, event_type) for efficient filtering.
Use case — compliance audit: "show me all budget threshold crossings for April". One call: GET /v1/cost/alerts-log?event_type=budget_cap_threshold_crossed&since=2026-04-01T00:00:00Z&until=2026-05-01T00:00:00Z. The response lists every fire with payload + outcome. The auditor cross-references against Slack channel logs to verify alerts were acted upon.
Use case — silence investigation: "why didn't I get the 80% alert last week?". Call with a filter for the expected window — if a row exists with delivery_outcome: "error:...", the alert fired but delivery failed (inspect webhook URL health). If no row exists, the threshold wasn't crossed in the stored state (inspect the last_fired dedup logic).
Tier switch proration (iter 105). /cost/tier-recommendation tells you WHICH tier fits your usage. This endpoint tells you WHAT the mid-month switch would cost in dollars — the prorated delta between the two tier fees for the remaining days of the current cycle. Pure compute, no state change.
Proration math:
• days_remaining = days_in_month - switch_date.day + 1 (inclusive of the switch day)
• proration_factor = days_remaining / days_in_month
• current_tier_unused_refund = current_fee × proration_factor — credit for unused time on the current tier
• target_tier_prorated_charge = target_fee × proration_factor — charge for the new tier for the remaining days
• delta_usd = target_charge - current_refund — positive means you owe extra, negative means you get a refund
Response shape: includes all intermediate math values so UIs can render a breakdown, plus a direction classifier (upgrade / downgrade / no_change / lateral), owes_delta? / receives_refund? booleans for fast branching, and a ready-to-render call_to_action string.
Enterprise special case: the Enterprise tier has no fixed monthly fee (price_monthly_usd: nil). Prorating to/from Enterprise returns special_case: "contact_sales" — proration math is negotiated directly, not computed automatically.
Parameters: target_tier (required, one of free/starter/pro/business/enterprise) and switch_date (optional, YYYY-MM-DD, defaults to today UTC).
Billing policy caveat: this endpoint calculates the MATHEMATICAL proration. Actual billing depends on your payment provider's policy — some providers bill the prorated delta immediately, others roll it into the next cycle, others offer credit that gets applied to future invoices. Treat this as "what the fair delta would be", not "what gets charged to my card tonight".
Use case — upgrade confirmation dialog: a user on Starter hits a rate limit and the UI pops an "upgrade to Pro" dialog. Instead of "upgrade for $99/month" (which is misleading mid-cycle), the UI calls /v1/cost/proration?target_tier=pro and renders "Upgrading today costs $35 prorated ($70 - $0 refund). Full monthly fee of $99 starts next cycle." — the user sees the true mid-month impact, not a misleading full-month number.
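The proration formulas above as a small reference sketch (illustrative fees only; the endpoint remains the source of truth for real tier prices):
from calendar import monthrange
from datetime import date

def proration(current_fee, target_fee, switch_date):
    days_in_month = monthrange(switch_date.year, switch_date.month)[1]
    days_remaining = days_in_month - switch_date.day + 1   # inclusive of switch day
    factor = days_remaining / days_in_month
    refund = current_fee * factor      # credit for unused time on the current tier
    charge = target_fee * factor       # charge for the new tier for the remaining days
    return charge - refund             # positive: you owe extra; negative: refund

# Illustrative fees; real prices come from /v1/public/cost/tiers.
print(proration(29.0, 99.0, date(2026, 4, 15)))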
Tier recommendation with direction (iter 88). The existing /upgrade-recommendation returns the best-fit tier but conflates both directions — a Pro user with light usage gets the same "you should change" signal as a Free user who needs to upgrade. /tier-recommendation is explicit: it classifies the direction and surfaces concrete savings or extra cost.
Direction classification:
• upgrade — the recommended tier is more expensive than current. The current tier is bottlenecking actual usage.
• downgrade — the recommended tier is cheaper than current. You're overpaying for headroom you don't use.
• optimal — already on the cheapest tier that fits your pattern. No change recommended.
• enterprise — projected volume exceeds all self-serve tiers (contact sales).
Response shape: {current_tier, current_monthly_price_usd, recommended_tier, recommended_monthly_price_usd, direction, delta_usd, monthly_savings_usd, extra_cost_usd, projected_daily_requests, monthly_full_scrapes, recommendation_detail, call_to_action, note}. Use monthly_savings_usd to render a "downgrade and save $X" banner, extra_cost_usd for an "upgrade for $Y more/month" CTA.
Math source: both endpoints share the same Billing.recommend_tier/2 internal function that ranks tiers by monthly cost and filters out options that would exceed daily rate limits or full scrape quota. The difference is purely presentation — /tier-recommendation adds direction classification on top of the same source of truth.
Use case — Q2 cost review: the CFO asks "can we save on subscriptions?". An operator runs GET /v1/cost/tier-recommendation across all API keys and surfaces every one where direction: "downgrade". Each entry shows exact monthly savings. The agency defers the change to a safe moment (mid-month, after the current billing cycle settles), then the user-facing dashboard renders "Your usage fits Starter — save $70/month by downgrading".
Reliability score (rolling success rate)
GET/v1/cost/reliability?days=30Rolling success rate + upstream health classification + top failing usernames from billing events API KEY
"Is my upstream healthy?" Answers that in one call. Pure
aggregation over billing events — no new state, no new tables. Counts
Response fields:
•
•
•
•
•
•
•
Health thresholds:
•
•
•
Insufficient data safeguard: below 10 attempts in the window, the endpoint returns
Use case — operator dashboard: agency dashboard polls
:full_scrape_consume + :full_scrape_overage as total attempts,
:full_scrape_refund as failures (iter 54/55 refund model), computes the ratio over a
configurable window (default 30 days, max 90).Response fields:
•
stats.total_attempts / successful / refunded — raw counts•
stats.success_rate / success_rate_pct — 0.0-1.0 and human percent•
stats.refund_rate / refund_rate_pct — the inverse signal•
health — one of :insufficient_data (fewer than 10 attempts),
:healthy (≥98%), :degraded (≥90%), :poor (<90%)•
top_failing_usernames — array of up to 10 usernames grouped by refund count,
with refund_count and total_refund_usd per username. SQL GROUP BY on
metadata->>'username' scoped to the refund event type•
narrative — human-readable strings ready for Slack/dashboard banner, context-aware
per health bucket•
drill_down — direct links to the events log and CSV export pre-filtered to the
failure window, for deeper investigationHealth thresholds:
•
healthy — 98%+ success rate, flaky recovery is working. Acceptable steady-state.•
degraded — 90-98% success rate. Investigate trends; something upstream is softer
than normal. Common causes: platform rate-limiting, specific accounts going private, Apify actor
version drift.•
poor — <90% success rate. Critical — expect follow-up from the CostAlerts
webhook (iter 63) as budget/refund numbers also trip. Consider freezing scheduled runs until
resolved.Insufficient data safeguard: below 10 attempts in the window, the endpoint returns
health: "insufficient_data" instead of a misleading
rate. A 0/1 refund shouldn't read as 0% success.Use case — operator dashboard: agency dashboard polls
/v1/cost/reliability every few minutes. The narrative banner shows current state.
When it flips to degraded or poor, the top_failing_usernames list
surfaces exactly which creators are causing the drop — so the operator can freeze that creator's
schedules, contact the client, or investigate via the drill_down links without hunting through
raw logs.
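The health classification described above as a small client-side sketch (thresholds taken from the list; the counts are illustrative):
def classify(total_attempts, refunded):
    # Mirrors the documented buckets: <10 attempts is insufficient data,
    # >=98% healthy, >=90% degraded, otherwise poor.
    if total_attempts < 10:
        return "insufficient_data"
    success_rate = (total_attempts - refunded) / total_attempts
    if success_rate >= 0.98:
        return "healthy"
    if success_rate >= 0.90:
        return "degraded"
    return "poor"

print(classify(250, 12))  # 95.2% success -> "degraded"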
Period comparison (month-over-month)
GET/v1/cost/compare-periods?from=2026-03&to=2026-04Month-over-month comparison — totals + per-cost-center deltas + top movers + narrative API KEY
"How does this month compare to last month?" Answers the
single most common question in agency reporting. One call, zero client-side aggregation.
Defaults:
Response shape:
•
•
•
•
•
Use case — monthly client review: agency calls
Reconciles with:
Defaults:
from = previous month,
to = current month. Override with ?from=YYYY-MM&to=YYYY-MM. Year range
is 2000-2100, month 1-12. Invalid format returns 400 with a format hint.Response shape:
•
from / to — period descriptors with year, month, label, start, end•
totals.{from, to, delta} — each delta has {from, to, delta, delta_pct} for:
event_count, gross_cost_usd, refund_cost_usd,
net_cost_usd, included_count, overage_count,
refunded_count, scheduled_run_count•
cost_centers.{from, to, deltas} — per-center aggregate + delta array sorted by
delta_usd descending. Each center delta includes
{cost_center, from_cost_usd, to_cost_usd, delta_usd, delta_pct, appeared, disappeared}.
appeared/disappeared flags catch centers that onboarded or churned between
periods.•
top_movers.{gainers, losers} — top 5 centers with biggest positive/negative delta
(sorted by magnitude). Agencies render "growth" and "churn" columns directly from these.•
narrative — array of human-readable strings ready for Slack/email/dashboard banner,
e.g. "Spending up 23.4% month-over-month (+$42.15). Biggest gainer: client-acme (+$28.50)."Use case — monthly client review: agency calls
/v1/cost/compare-periods on the 1st and pastes the narrative into a Slack channel for
each client. The top_movers object feeds a "growth leaders / churn alerts" table in
the internal dashboard. appeared: true centers get a welcome email; disappeared: true
gets a retention ping.Reconciles with:
/v1/cost/invoice for absolute
numbers of either period individually, /v1/cost/centers for live current-month state,
and /v1/cost/burn-down for the forward-looking projection. Together they form the
complete monthly reporting pipeline.
Daily digest (on-demand spend summary)
GET/v1/cost/digest?date=2026-04-08Same payload as the iter 93 daily_digest webhook, returned as an API response for any date API KEY
On-demand variant of the iter 93 webhook digest. The daily_digest flag on CostAlert (iter 93) fires a webhook once per day automatically. This endpoint returns the SAME payload as an API response — useful for:
• Agencies that want to pull digests on their own schedule (e.g. business hours, not UTC midnight)
• Historical backfill — compute yesterday's or last Monday's digest retroactively
• Dashboards that prefer pull over webhook delivery
• Debugging / testing before enabling the auto-digest webhook
Date parameter: ?date=YYYY-MM-DD. Defaults to yesterday UTC. Bounded by the 60-day billing event retention window (iter 44 RetentionSweeper) — older dates return 400 invalid_date. Future dates are also rejected.
Response shape: {api_key_id, date, period: {since, until}, totals: {event_count, total_cost_usd}, by_type: {full_scrape_consume, full_scrape_overage, full_scrape_refund, scrape_scheduled_run} each with {count, cost_usd}, top_cost_centers[] (up to 5), top_creators[] (up to 5, iter 80 aggregator)}. Slightly richer than the webhook payload — it includes top_creators because the on-demand endpoint can afford the extra query.
Composition with /cost/overview: the overview (iter 92) shows MTD totals across the whole month. The digest shows a SINGLE DAY's breakdown. Agencies wanting a "today vs yesterday" comparison call both and render them side by side.
Use case — custom delivery cadence: an agency wants the digest to arrive in their ops Slack at 9am local time (not UTC midnight). They run their own cron that calls GET /v1/cost/digest?date=<yesterday> at 9am local, formats the response, and posts to Slack. The iter 93 webhook firing at 00-06 UTC is optional — some agencies disable it and drive everything through this endpoint.
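A sketch of that pull-based cadence: fetch yesterday's digest and post it to a Slack incoming webhook (the webhook URL and message formatting are your own; the totals fields come from the response shape above):
import requests
from datetime import date, timedelta

yesterday = (date.today() - timedelta(days=1)).isoformat()
digest = requests.get("https://api.buycrowds.com/v1/cost/digest",
                      params={"date": yesterday},
                      headers={"X-API-Key": "YOUR_API_KEY"}, timeout=30).json()

text = (f"Spend on {yesterday}: ${digest['totals']['total_cost_usd']} "
        f"across {digest['totals']['event_count']} events")
requests.post("https://hooks.slack.com/services/XXX/YYY/ZZZ",  # your Slack webhook
              json={"text": text}, timeout=10)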
Overview dashboard (morning standup in one call)
GET/v1/cost/overviewTop-level dashboard — tier + MTD + cap + top centers + top creators + operational counters + savings + rate limit in one call API KEY
The single call for "how are we doing?". The cost
stack has 15+ specialized endpoints — this one stitches the most-used pieces into a unified
dashboard response. Perfect for the morning standup view that surfaces everything important
without navigating between endpoints.
Aggregates 9 sources:
• tier — id, name, subscription_fee_usd (from Billing.get_tier)
• mtd — spent_usd, event_count, included_scrapes, overage_scrapes, refunds {count, total_usd}
• budget_cap — configured flag, cap_usd, hard_enforce, spent_usd, headroom_usd, used_pct (or stub if no cap configured)
• top_cost_centers — top 3 by MTD spend via iter 52 aggregator
• top_creators — top 3 by MTD spend via iter 80 aggregator
• operational — quarantined_creator_count (iter 69), pending_deferred_batches (iter 78)
• rate_limit — tier_limits, current buckets, remaining (iter 50 source)
• savings_headline — cached_hits_30d, estimated_cached_savings_usd, refunds_mtd_usd
• drill_down — ready-to-use URLs for every specialized endpoint (invoice, burn-down, reliability, savings, activity, tier-recommendation)
Response is a projection, not a cache. Every sub-query runs live against current state — no stored aggregate, no invalidation concerns. Each source is indexed SQL so the full dashboard response typically returns in <200ms even with months of history.
Drill-down pattern: the overview gives you the headline numbers. Each metric has a corresponding specialized endpoint that digs deeper — the
drill_down block in the response lists them with pre-filled URLs scoped to
the current window. Click-through from "top creators: @neymarjr $48" to
GET /v1/cost/creators?limit=20 gives you the full ranked list; from "pending
deferred batches: 3" to GET /v1/full-scrape/deferred gives you the list of
actual job IDs.
Use case — operator dashboard home: agency's internal dashboard loads this endpoint on page load. The top banner shows "Pro tier · MTD $187.50 / $500 cap · 8 refunds recovered · 2 pending overnight batches". All the widgets below that feed off the same JSON — no need to call /invoice + /cost/centers + /creators + /quarantine separately on page load. One request, full situational awareness.
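A sketch of that page-load call; the jq paths are an assumption about the exact response nesting and may need adjusting against a real payload:
curl -s "https://api.buycrowds.com/v1/cost/overview" \
  -H "X-API-Key: ak_xxx" \
  | jq '{tier: .tier.name, mtd_spent: .mtd.spent_usd, cap: .budget_cap.cap_usd, headroom: .budget_cap.headroom_usd}'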
Forward projection (day-by-day chart data)
GET/v1/cost/projection?days=14Day-by-day projected spend over a forward horizon — combines daily burn rate + recurring schedule fires API KEY
Chart-ready forward projection (iter 110).
Complement to
/v1/cost/burn-down (iter 62) which gives a single exhaustion
date. This endpoint returns per-day rows for the next N days (default 14, max 90),
ready to drop into a chart library or spreadsheet.
Projection sources:
1. Daily burn rate baseline — computed as mtd_spent / day_of_month. Applied to every future day as the constant "ad-hoc traffic" baseline.
2. Recurring schedule fires — walks each active schedule's
next_run_at forward by interval_seconds and counts how many
times it would fire on each target day. Per-fire cost is the marginal cost (not overage
price).
Per-day row shape:
{date, estimated_new_spend_usd, schedule_fires, schedule_cost_usd, daily_burn_baseline_usd, cumulative_projected_usd, over_daily_cap?, over_monthly_cap?}.
The cumulative field is MTD spent + projected new spend through that day — useful for "when will I break $500?" questions.
Cap breach detection:
over_monthly_cap? and over_daily_cap? flags are computed
against the currently-configured budget cap (iter 50 + iter 106). If any day in the
projection flips to true, the top-level
totals.projected_cap_breach_date field surfaces the first breach date.
Zero projections → null (safely under cap).
Methodology transparency: the response includes a
methodology block explaining the linear model and its limitations:
doesn't extrapolate trend acceleration, doesn't account for cache hit ratio,
schedule fires counted at marginal cost not overage price. Users can cross-reference
this with actuals after the window closes to gauge drift.
Complements — there are now three forecasting endpoints:
• /v1/cost/forecast (legacy) — credit quota projection
• /v1/cost/burn-down (iter 62) — single exhaustion date across global + per-center caps
• /v1/cost/projection (iter 110) — day-by-day chart data over a forward window
Use case — CFO dashboard: operations team renders a 14-day stacked bar chart with
daily_burn_baseline_usd as the base, schedule
contribution as a second segment, and a horizontal line at the daily cap. Days where
over_daily_cap? flips to true get a red highlight. Clicking a date drills
down to that day's schedules via /v1/full-scrape/schedules.
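A sketch of the chart-feed call — 14 days is the default, so ?days= is included only to make the horizon explicit:
curl "https://api.buycrowds.com/v1/cost/projection?days=14" \
  -H "X-API-Key: ak_xxx"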
Burn-down forecast (when will I hit my cap?)
GET/v1/cost/burn-downProjects current burn rate against budget cap + per-cost-center sub-caps, with actionable daily spend recommendation API KEY
The "when will I hit my cap?" question, answered. Distinct
from
/v1/cost/forecast which projects credit-quota exhaustion (tier-based); this
endpoint projects dollar burn against your configured budget cap (iter 50 + 59). Single call,
no parameters.
Per-cap forecast fields:
• cap_usd — the ceiling (global or per-center)
• current_spent_usd — MTD spend against this cap
• daily_avg_usd — mtd_spent / day_of_month
• projected_eom_spent_usd — daily_avg × days_in_month (linear)
• headroom_usd — cap − current
• days_until_exhausted — headroom / daily_avg, nil if burn rate is zero
• projected_exhaustion_date — the projected exhaustion as an absolute calendar date
• recommended_daily_spend_usd — headroom / days_remaining, the burn rate that lands exactly at the cap on the last day of the month
• over_cap? / projected_over_cap? — binary flags for UI badges
Per-cost-center rollup: each sub-cap configured via
per_center_caps gets its own forecast, sorted by days_until_exhausted
ascending (most-at-risk first). Frozen centers (iter 61) are included but marked
frozen: true. The top-level highest_risk_center field pulls out the
non-frozen center projected to exhaust soonest — the one that needs attention.
Headlines: human-readable strings suitable for Slack alerts or dashboard banners. Examples:
• "At current burn rate, you'll exceed your global cap of $500 on 2026-04-22. Reduce daily spend to $12.50 to stay under cap."
• "Cost center 'client-acme' projected to exceed its sub-cap on 2026-04-18."
• "On track — projected spend stays under all configured caps this month." (default when nothing is alarming)
Methodology: linear extrapolation from month-to-date average. Documented in the
methodology section of the response so consumers know
what they're rendering. Front-loaded or seasonal spend patterns can mislead a linear projection —
the forecast improves with more days of data.
Use case: a CFO dashboard runs
/v1/cost/burn-down
every morning. When global.projected_over_cap? flips to true, the dashboard Slacks
the team with the headline. Agencies using per-center sub-caps get early warning on a specific
client before that client's sub-cap gets hit.
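A sketch of that morning check; the .headlines field name is an assumption about where the human-readable strings live in the response:
curl -s "https://api.buycrowds.com/v1/cost/burn-down" \
  -H "X-API-Key: ak_xxx" \
  | jq -r '.headlines[]'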
Savings dashboard (ROI of the cost stack)
GET/v1/cost/savings?year=2026&month=4Quantifies dollars saved by cached hits, refunds, and subscription coverage vs. paying live overage for everything API KEY
What did the cost stack actually save you? This endpoint turns
every layer built in iters 50-57 into a single dollar number. Three savings sources:
1. Cached scrape hits (iter 57). Every
?max_age_seconds=N cache hit avoids one live scrape. Counted in a 30-day Hammer sliding
bucket via Credits.record_cache_hit/2 (incremented in both the single-scrape fast-path
and batch pre-scan). Savings = count × full_scrape_overage_price_usd.
marginal_cost_saved_usd separately reports our internal savings (Apify + DB write).
2. Refunds (iters 54/55). Every :full_scrape_refund billing event is a dollar you didn't pay for a scrape that failed. Sum of absolute refund amounts.
3. Subscription coverage (always-on). Every :full_scrape_consume event (included-tier scrape) would have cost overage_price if you had no included quota. This is the hidden value of the subscription itself. Surfaced as subscription_coverage.usd_saved_vs_all_overage.
Headline number:
hypothetical_vs_actual.dollars_saved_usd = cached savings + refund savings + coverage savings.
savings_ratio_pct = savings / hypothetical gross cost (what you'd have paid without the cost stack). An agency caching aggressively on Pro tier typically sees 60–85% savings ratio.
Drill-down links: the response includes machine-readable pointers back to /v1/cost/invoice, /v1/cost/centers, receipt endpoints, and /v1/quota so auditors can trace any line from headline to source event.
Window caveat: refund + coverage savings are month-scoped (via
BillingEvents.monthly_invoice_breakdown). Cached hits live in a 30-day sliding
window and are labelled accordingly — historical months before the sliding window's start may
under-report cache savings. This is intentional: we don't log cache hits as billing events (they're
zero-cost by design), so no DB persistence.
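A sketch of the month-end ROI pull (year/month values illustrative):
curl "https://api.buycrowds.com/v1/cost/savings?year=2026&month=4" \
  -H "X-API-Key: ak_xxx"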
Monthly invoice (accounting statement)
GET/v1/cost/invoice?year=2026&month=4Canonical monthly statement — subscription + usage + overage + refunds + cost centers, one call API KEY
The capstone of the cost stack. Single call that pulls from
every layer built in iters 50-55: subscription tier fee, billing events (included / overage /
refunded / scheduled_run), cost center attribution, and the live tier metadata. Defaults to the
current UTC month; pass
?year=2026&month=3 to pull historical periods.
Response shape:
• invoice_id — inv_<api_key_id>_YYYYMM, stable across calls
• period — {year, month, start, end, is_current_month}
• subscription — {tier, tier_name, fee_usd, billing_cycle}
• usage.full_scrape — {included_count, overage_count, refunded_count, scheduled_run_count, net_full_scrapes}
• cost_breakdown — {subscription_usd, overage_usd, refunds_usd, gross_usage_usd, net_usage_usd, total_due_usd, currency}
• cost_centers[] — per-center aggregate with share_pct (scoped to the invoice period window)
• reconciliation.how_to_drill_down — links to receipt, events, quota, and centers endpoints for audit
• format — {version: "1", generated_at}
total_due_usd math:
subscription_usd + overage_usd + refunds_usd (refunds carry negative sign so they
subtract naturally). Included-tier scrapes log billing events at our internal marginal cost for
accounting reconciliation but contribute $0 to the customer-facing bill — they're
covered by the subscription fee.
Use case: agency fires 300 scrapes across 5 clients in April, tagged with
cost_center=client-*. At month-end, one call to
GET /v1/cost/invoice?year=2026&month=4 gives everything the accountant needs —
fee ($99 Pro tier), 42 overage scrapes ($21.00), 3 failed refunds (−$1.50), net $118.50,
plus per-client breakdown. Drop it into the accounting system or render as PDF client-side.
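A sketch of that month-end call; the jq path follows the cost_breakdown bullet above, though the exact nesting is an assumption:
curl -s "https://api.buycrowds.com/v1/cost/invoice?year=2026&month=4" \
  -H "X-API-Key: ak_xxx" \
  | jq '.cost_breakdown.total_due_usd'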
Per-creator cost breakdown (cost drivers)
GET/v1/cost/creators?since=2026-04-01T00:00:00Z&limit=20&cost_center=client-acmeTop-N creators by total spend — per-creator breakdown aggregated from billing events API KEY
"Who are my biggest cost drivers?" Aggregates billing
events by
metadata->>'username' — one SQL GROUP BY query returns per-creator
counts + totals + first/last scrape timestamps, sorted by cost descending. Defaults to
month-to-date, max 20 creators per call (overridable up to 200).
Response shape: per-creator
{username, total_scrapes, included_count, overage_count, refunded_count,
total_cost_usd, share_pct, first_scrape_at, last_scrape_at}. Share_pct is each
creator's fraction of the total window cost — a quick read on concentration. Refunded events
carry negative cost_usd so they net out of total_cost_usd naturally (a creator with 3 scrapes
all refunded shows up with cost $0.00, not $1.50).
Filters:
• ?since= / ?until= — ISO8601 UTC bounds (defaults: start of current UTC month → now)
• ?limit= — top-N (default 20, max 200)
• ?cost_center= — scope to a single client. Lets you answer "of client-acme's $45 spend this month, which creators were it on?"
Totals block: creator_count (distinct usernames), total_scrapes (sum across all creators), total_cost_usd. When filtered by cost_center, total_cost_usd should match the corresponding entry in
/v1/cost/centers for the
same window — the two endpoints cross-check.
Use case — invoice drill-down: agency generates a monthly invoice for client-acme with
/v1/cost/invoice. The client asks "what did
you actually scrape for us?". One call:
GET /v1/cost/creators?since=2026-04-01T00:00:00Z&cost_center=client-acme
returns a ready-to-paste list of every creator they were billed for, with per-creator scrape
counts and costs. Directly maps to the invoice line items without digging through the raw
event log.
Cost center attribution (agency billing)
GET/v1/cost/centersMonth-to-date cost aggregated by cost_center tag — per-client billing reconciliation for agencies API KEY
Tag every scrape with an attribution code, then reconcile per client.
Pass
cost_center=<tag> as a body/query param on
POST /v1/full-scrape/:username or POST /v1/full-scrape/batch, OR set the
X-Cost-Center: <tag> request header. Tag is validated (max 64 chars,
[A-Za-z0-9_.-]); invalid tags are silently dropped (attribution is opt-in — never rejects
the request). Stored in the billing event's metadata.cost_center JSONB field.
/v1/cost/centers response: groups MTD events by
metadata.cost_center via indexed JSONB aggregate, bucketing untagged events under
unattributed. Returns per-center
{event_count, total_cost_usd, share_pct, first_event_at, last_event_at} sorted by cost
descending, plus top-level totals. Optional ?since=...&until=... overrides the
default month-to-date window.
Use case: agency scrapes 50 creators for 3 different clients across a month. Each scrape tagged
cost_center=client-acme /
cost_center=client-beta / cost_center=client-gamma. At month-end, one call
to /v1/cost/centers gives a ready-to-invoice per-client cost breakdown. Combine with
/v1/full-scrape/jobs/:id/receipt for line-item reconciliation.
Example:
curl -X POST "https://.../v1/full-scrape/batch" \
-H "X-API-Key: ak_xxx" \
-H "X-Cost-Center: client-acme" \
-d '{"usernames": ["neymarjr", "lewis.hamilton"]}'
Billing event log (audit trail)
GET/v1/cost/events?event_type=full_scrape_consume&label=q2-launch&cost_center=client-acme&username=neymarjr&limit=50&since=2026-04-01T00:00:00ZPaginated log of every credit consumption — filter by event_type, label, cost_center, username, since API KEY
Scrape labels — free-form tags orthogonal to cost_center (iter 66).
Every scrape / batch POST now accepts an optional
labels body param: a string array
(or comma-separated string). Each label: max 40 chars, [A-Za-z0-9_.-], up to 10 per
call. Invalid labels silently dropped. Stored in metadata.labels JSONB array of the
resulting billing event.
Labels vs cost_center:
• cost_center (iter 52) — single string, attribution. Enforceable via sub-caps (iter 59), freezable (iter 61), aggregated in /v1/cost/centers. One per scrape.
• labels (iter 66) — array of strings, cross-cutting tags. Query-only, no enforcement. Multiple per scrape. Use for campaigns, purposes, team names, feature flags.
Example: a scrape can carry
cost_center: "client-acme" AND
labels: ["q2-launch", "urgent", "neymar-campaign"] simultaneously. Billing for
client-acme comes from cost_center; cross-cutting analysis ("all q2-launch spend across clients")
comes from labels.
Query filters on /v1/cost/events:
• ?label=q2-launch — matches events via JSONB ? operator on metadata->'labels'. Indexable if you add a GIN index on (metadata->'labels').
• ?cost_center=client-acme — matches events via metadata->>'cost_center' = $1. Faster than JSONB array ops since it's a scalar.
• ?username=neymarjr (iter 82) — matches events via metadata->>'username' = $1. Normalized lowercase + trimmed so ?username=NeymarJR and ?username=neymarjr return the same rows. Composes with the other filters — e.g. ?username=neymarjr&event_type=full_scrape_refund&since=2026-04-01T00:00:00Z returns every refund on @neymarjr in April.
• All filters can be combined with ?event_type= and ?since= for tight drill-downs. The CSV export endpoint (/v1/cost/events/export) honors all the same filters.
Username filter cross-references: use
/v1/cost/creators (iter 80) for the aggregated view of spend per creator, and
/v1/cost/events?username=<x> (iter 82) for the event-level timeline of one
specific creator. The two endpoints give you top-down (who are my biggest spenders) and
bottom-up (every individual scrape for @x) lenses on the same data.
Use case — campaign ROI: agency runs a 3-month Q2 campaign across 5 clients. Every scrape tagged with
labels: ["q2-launch"] plus the
client-specific cost_center. At end of campaign:
GET /v1/cost/events?label=q2-launch&since=2026-04-01T00:00:00Z returns every
event touched by the campaign regardless of which client paid for it — perfect for "what did the
Q2 launch cost us in total" reports.
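A sketch of the tagging + query round trip, reusing the batch example from the cost-center section — the label values are illustrative:
# tag the scrapes with a cost_center and cross-cutting labels
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "X-API-Key: ak_xxx" \
  -H "X-Cost-Center: client-acme" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "labels": ["q2-launch", "urgent"]}'
# later: pull every event the campaign touched, regardless of which client paid
curl "https://api.buycrowds.com/v1/cost/events?label=q2-launch&since=2026-04-01T00:00:00Z" \
  -H "X-API-Key: ak_xxx"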
GET/v1/cost/events/summaryAggregate: totals, per-type counts, per-day breakdown — ready for billing charts API KEY
GET/v1/cost/events/export?format=csv&since=...&until=...&label=...&cost_center=...Stream billing events as CSV or JSON — same filters as /events, chunked response for accounting exports API KEY
Memory-efficient accounting dump. Uses
Ecto.Repo.stream/2 (pages 500 rows at a time) inside a Repo.transaction,
piped into a Phoenix chunked HTTP response. Millions of rows without blowing RAM. Same infrastructure
as iter 46's snapshot export.
Filters (iter 67): all the familiar knobs from
/v1/cost/events plus streaming scope —
?since= / ?until= (DateTime bounds),
?event_type= (full_scrape_consume / overage / refund / scheduled_run),
?label= (JSONB array match on metadata->'labels'), and
?cost_center= (scalar string match on metadata->>'cost_center').
All filters are cumulative — ?event_type=full_scrape_overage&cost_center=client-acme&label=q2-launch
gives you a pre-sliced export ready to drop into an invoice line.
CSV columns (iter 67 extended):
id, occurred_at, event_type, mode,
cost_usd, resource_id, metadata_username,
metadata_tier, metadata_cost_center, metadata_labels
(pipe-joined, e.g. q2-launch|urgent). RFC 4180 escaped. One row per event.
JSON format: alternative
?format=json emits a
JSON array — each element is the full event object including nested metadata. Use
for programmatic consumption where CSV flattening would lose the labels array shape.
Headers: CSV responses come with
content-disposition: attachment; filename="billing_events_YYYY-MM-DD.csv" so a browser
download starts automatically.
Use case — monthly reconciliation:
GET /v1/cost/events/export?format=csv&since=2026-04-01&until=2026-05-01 gives
the accountant a full month's events ready to drop into QuickBooks. Per-client filtered dump:
?cost_center=client-acme. Per-campaign:
?label=q2-launch. Reconciles exactly with
/v1/cost/invoice?year=2026&month=4 totals since both read the same event log.
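A sketch of the accountant's download — -o saves the chunked CSV straight to disk; the filename and filters are illustrative:
curl -o billing_events_2026-04.csv \
  "https://api.buycrowds.com/v1/cost/events/export?format=csv&since=2026-04-01&until=2026-05-01&cost_center=client-acme" \
  -H "X-API-Key: ak_xxx"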
Spend forecasting (derived from event log)
GET/v1/cost/forecastMTD cost, projected EOM, days until quota exhausted, anomaly signal, threshold crossings API KEY
GET/v1/cost/forecast/daily?days=30&include_forecast=trueChart-ready daily cost breakdown with forecast trailing line for rest of month API KEY
Proactive credit alerts (reactive webhooks)
POST/v1/cost/alertsRegister an alert webhook that fires when you cross a credit threshold API KEY
GET/v1/cost/alertsRead current alert config + last_fired state per threshold API KEY
DELETE/v1/cost/alertsRemove the alert config (stops future webhooks) API KEY
Reactive alerts — no polling needed. Register a webhook URL and thresholds,
and BuyCrowds fires an HMAC-signed POST the moment your credit usage crosses any of them.
Body:
{"webhook_url": "https://...", "thresholds": [0.5, 0.8, 0.95, 1.0], "cooldown_seconds": 300}
— thresholds default to [0.5, 0.8, 0.95, 1.0] if omitted. Max 10 thresholds, values in (0, 1].
Cooldown 0-86400s, default 300 (prevents spam when multiple thresholds cross in quick succession).
Dedup: each threshold fires at most once per monthly cycle.
last_fired resets automatically when the UTC month rolls over.
Payload shape:
event: "credit_threshold_crossed", threshold,
usage (credits used/quota/pct), action_hint, occurred_at.
Signed with X-BuyCrowds-Signature: sha256=<hmac-sha256(api_key.key, body)>.
Architecture: subscribes to the internal
billing:events PubSub topic.
Every credit consume triggers a check; delivery happens under a supervised Task, so slow webhooks never block consumption.
No periodic polling — purely event-driven.
Budget cap threshold webhooks (iter 63): the SAME config also fires webhooks when your dollar budget cap crosses thresholds — distinct signal from credit-quota crossings. Credit thresholds answer "am I running out of tier allowance?"; budget thresholds answer "am I running out of the dollar cap I set?". Both fire independently using the same threshold list (
thresholds: [0.5, 0.8, 0.95, 1.0]) but track dedup state in
parallel budget_last_fired so crossing the 80% credit alert doesn't consume the 80%
budget alert.
Budget payload shape:
event: "budget_cap_threshold_crossed",
threshold: {level, label},
budget: {monthly_usd_cap, spent_usd, remaining_usd, ratio, pct_used},
action_hint (contextual: "approaching budget cap — consider freezing cost centers"
at 80%, "budget cap exhausted — all new spend blocked" at 100%),
occurred_at. Same HMAC signature header.
Required config: budget threshold alerts only fire when BOTH a CostAlert config (this endpoint) AND a BudgetCap config (
PUT /v1/full-scrape/budget-cap) exist. No budget cap = no budget alerts.
Missing CostAlert = no webhook destination, so no alerts of either kind.
Cycle reset: both
last_fired and
budget_last_fired reset together at the start of each UTC month.
Cooldown: the
cooldown_seconds window applies to
both alert types globally — firing a credit alert cools down budget alerts too (and vice versa)
to prevent spam when multiple thresholds trip in quick succession.
Daily spend digest (iter 93): opt-in via
daily_digest: true on the cost alert config. Fires a scheduled webhook once
per day (UTC) with yesterday's spend summary. Distinct signal from the event-driven
threshold alerts — digests always fire, regardless of whether any threshold was crossed.
Useful for agencies that want a predictable morning report regardless of spend patterns.
Digest payload:
event: "daily_spend_digest", api_key_id,
date (yesterday ISO8601 date), period: {since, until} (UTC day
bounds), totals: {event_count, total_cost_usd},
by_type: {full_scrape_consume, full_scrape_overage, full_scrape_refund, scrape_scheduled_run}
each with {count, cost_usd}, and top_cost_centers[] (up to 5,
via iter 52 aggregator). Same HMAC signature header as the other alert types.
Scheduling: the CostAlerts GenServer ticks every 6 hours checking which configs need a digest. A config gets its digest sent the FIRST tick on a new UTC day (e.g. if the last tick was 23:00 UTC and the next is 05:00 UTC, that's when the digest fires for that day). Dedup via
last_digest_sent_at field — a digest only fires when the stored date
differs from today, so crash-recovery never double-fires.
Latency note: because of the 6h tick granularity, digests typically arrive between 00:00 and 06:00 UTC (up to 6 hours after midnight). Predictable enough for daily operations but not suitable for time-critical alerts — use threshold crossings (event-driven) for that.
Use case — morning cost report: agency's ops team starts the day checking Slack. CostAlert has
daily_digest: true + webhook
pointing to a Slack channel. Each morning before 6 AM UTC (2 AM EST, 7 AM CET), the
digest arrives with yesterday's totals + top 5 cost centers. Team opens Slack, sees the
headline "yesterday: 42 scrapes, $8.50, top: client-acme $5.00, client-beta $2.50", and
decides whether to dig deeper via /v1/cost/overview drill-down.
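A sketch of the setup plus a receiving-side signature check. The webhook URL is a placeholder, and the check assumes the signature is the hex-encoded HMAC-SHA256 of the raw request body keyed with your API key, per the payload description above:
# register thresholds + the daily digest against your Slack-bridge endpoint
curl -X POST "https://api.buycrowds.com/v1/cost/alerts" \
  -H "X-API-Key: ak_xxx" \
  -d '{"webhook_url": "https://hooks.example.com/buycrowds", "thresholds": [0.5, 0.8, 0.95, 1.0], "daily_digest": true}'
# receiver side: recompute the signature over the raw body ($BODY) with your key ($API_KEY)
expected="sha256=$(printf '%s' "$BODY" | openssl dgst -sha256 -hmac "$API_KEY" | awk '{print $NF}')"
# compare $expected against the X-BuyCrowds-Signature header before trusting the payload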
Pure compute, zero new state. Reads the billing event log and projects.
Linear extrapolation from month-to-date daily average — intentionally simple and explainable.
/forecast returns:
current (MTD cost, credits used, quota %),
projection (daily avg, projected EOM, days until exhausted, will-exhaust-before-EOM flag),
thresholds (which of 50/80/95/100% have been crossed this cycle),
anomaly (today vs rolling 7d average, flagged if today ≥ 2× average AND ≥ 3 events).
/forecast/daily returns: array of day entries with
is_forecast flag. Historical days show real event counts and cost; forecast days extrapolate
from the historical average. Ideal for a stacked area chart — different color for historical vs forecast.
Use case: render a "you're on track for $X this month" widget, alert users before they blow their quota, detect cost anomalies from runaway schedules.
Append-only audit log. Every credit consumption (full scrape single, batch, scheduled run)
writes an event with
event_type, cost_usd, mode (included/overage), and metadata
(username, tier, schedule_id…). Events live 30 days in-memory (ETS).
Filters on
/events: event_type (full_scrape_consume,
full_scrape_overage, scrape_scheduled_run), since (ISO 8601), limit (1–200).
Summary shape:
total_events, total_cost_usd,
by_type, by_mode, by_day — everything a dashboard needs in one call.
Use case: monthly reconciliation, dispute resolution, showing users a "credits used this month" graph, integrating with their own accounting systems.
Full Scrape On-Demand
53 endpoints
The nuclear option. One call hits every platform fresh (bypasses cache),
runs every deep fetcher in parallel, and returns a unified payload with everything we can extract.
Billed as credits against a monthly quota — overage charged per-request.
When to use it: one-shot enrichment for a new creator, real-time competitor checks,
press-kit snapshots that need to be timestamped. Every other cheap/cached endpoint should be
preferred when freshness isn't critical.
Credit quotas (per month):
Free: 3 · Starter: 50 · Pro: 300 · Business: 2000 · Enterprise: unlimited
Overage price: $0.50 per scrape · Our marginal cost: ~$0.42 per scrape (Apify + DB + bandwidth)
Preview (public — no credit consumed)
GET/v1/public/full-scrape/preview/:username?tier=pro&used=42What would be scraped, quota status, overage flag PUBLIC
Quota check (authed, cheap — no scrape)
GET/v1/full-scrape/quotaCredit state only — used/quota/remaining + status (healthy/low/exhausted/overage_only) API KEY
Run single scrape (authed — consumes credit)
POST/v1/full-scrape/:usernameRun the full scrape — deducts one credit, returns unified payload + per-platform cost breakdown API KEY
Optional body params:
platforms (comma-list to restrict scope),
timeout_ms (5000–120000), byok (map of platform → credentials),
async (true → returns 202 + job id instead of blocking).
402 Payment Required if quota exhausted and tier has no overage allowance (Free tier is hard-blocked).
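A sketch of a scoped async run using the body params above (values illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/neymarjr" \
  -H "X-API-Key: ak_xxx" \
  -d '{"platforms": "instagram,tiktok", "timeout_ms": 60000, "async": true}'
# async: true returns 202 + a job id — poll GET /v1/full-scrape/jobs/:id (recommended interval: 2s)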
Historical job archive (iter 71)
Receipts and replays now work for jobs older than 1 hour.
Until iter 71, completed/failed jobs lived in ETS with a 1-hour TTL — once swept, the
/jobs/:id, /jobs/:id/receipt, and /jobs/:id/replay
endpoints all returned 404. An agency reviewing last week's batch had to reconstruct it from
billing events manually.
How the archive works. When a job transitions to a terminal state (
:complete, :failed, :cancelled), the
Jobs GenServer spawns a fire-and-forget Task that mirrors the job record to a new
job_archives Postgres table. The ETS path stays unchanged — it's still the hot
store, still 1-hour TTL, still microsecond lookups. The archive is a warm fallback.
Transparent fallback.
Jobs.get_for/2 now checks ETS first; on miss, falls through to
JobArchives.get_for/2. The controller calls the same function and never needs to
know whether the job came from hot or warm storage. Both return the same map shape, plus the
archived copy carries an extra archived: true flag so auditors can distinguish.
Transition-triggered archive. Archiving happens ONCE per job, on the first patch that flips status from non-terminal to terminal. Subsequent patches (refund stamping, webhook delivery) don't trigger a re-archive — the
terminal_transition?/2 guard inside the GenServer cast detects the flip. This
avoids hammering the DB when a job has multiple mutations in quick succession.
Retention.
RetentionSweeper (iter 44) gained a third sweep target: it now also deletes
job_archives rows whose finished_at is older than 30 days. Hourly tick. Zero new
infrastructure — rides on the same GenServer that already sweeps snapshots + billing_events.
Default retention can be tuned via JobArchives.archive_retention_days/0.
What's persisted. Everything receipt/replay need:
id, api_key_id, status, batch, username, batch_usernames, result, error, cost_center,
replay_of, refunded_at, refund_amount_usd, refund_reason, webhook_url, plus the
lifecycle timestamps. In-flight fields like pid, progress, and
expires_at are dropped — they're meaningless for a finished job.
Safety. All archive writes wrapped in try/rescue and spawned under
Task.Supervisor, so a DB hiccup never crashes the Jobs GenServer
and never blocks the caller. Lost archive writes are acceptable — ETS is still the source of
truth for the 1-hour window.
Use case: agency processes a 20-creator batch for client-acme on Monday. Tuesday morning, the accountant wants the receipt — pre-iter 71 this was gone. Now
GET /v1/full-scrape/jobs/fs_abc123/receipt pulls from the archive seamlessly.
Similarly,
POST /v1/full-scrape/jobs/fs_abc123/replay can re-run last week's batch without
re-specifying usernames.
Async job polling (authed)
GET/v1/full-scrape/jobs/:idPoll an async scrape job — status, elapsed, result when complete, webhook delivery outcome API KEY
GET/v1/full-scrape/jobs/:id/receiptItemized post-hoc receipt — links job to billing events, cost breakdown, margin, formatted for accounting exports API KEY
POST/v1/full-scrape/jobs/:id/refundManual refund for a :failed single-scrape job — idempotent, issues credit + negative billing event API KEY
POST/v1/full-scrape/jobs/:id/scheduleCreate recurring Schedules entries from a historical job — one-shot → recurring conversion API KEY
POST/v1/full-scrape/jobs/:id/noteAppend a free-form operational note to an archived job API KEY
Scrape job notes (iter 94). Attach operational
annotations to archived jobs post-hoc. Use cases: linking to external ticket IDs, documenting
why a batch had issues, leaving context for future audits, noting transfers/refunds done
manually.
Body:
POST /v1/full-scrape/jobs/fs_abc123/note {"note": "Wrong cost_center, moved to client-beta via /transfer-center", "label": "fix-note"}
• note — required, 1-1000 chars, free-form text
• label — optional, max 40 chars, for UI categorization (e.g. "fix-note", "jira-BC-1234", "client-request")
Storage shape: notes accumulate as a JSONB array on the
job_archives row. Each entry:
{at: "2026-04-09T14:32:00Z", note: "...", label: "..." | null}. New notes are
appended to the end of the array — history is preserved, no deletion.
Visibility: notes appear in the
notes
field of the GET /v1/full-scrape/jobs/:id response once the job is archived
(terminal state + within 30-day retention). They're also included in
/v1/full-scrape/jobs/history listings.Scope limitation: only archived jobs can be annotated. Active ETS jobs (in the 1-hour hot store) return 404 — wait for the job to transition to terminal state (iter 71 archives on first terminal transition). This keeps the hot path free of additional mutation concerns.
Use case — manual correction audit trail: agency realizes last week's batch was tagged with the wrong
cost_center. They fix it
via POST /v1/full-scrape/budget-cap/rename-center, then append a note to the
original job:
POST /v1/full-scrape/jobs/fs_xxx/note {"note": "Migrated cost_center from client-acme-old to client-acme on 2026-04-09 via rename-center. See billing events for the rewrite window.", "label": "attribution-fix"}.
When someone audits the archive a month later, the note explains the historical drift.
POST/v1/full-scrape/jobs/:id/replayRe-run a historical job with the same usernames + cost_center — fresh job_id, credits charged normally API KEY
POST/v1/full-scrape/jobs/replay-bulkReplay up to 20 jobs in a single call — per-item error isolation, composes with 30-day archive API KEY
POST/v1/full-scrape/jobs/:id/resumeRe-run ONLY the failed/unprocessed items from a batch — preserves successes, charges credits only for the remainder API KEY
Resume vs replay (iter 90). Replay (iter 60) re-runs
EVERYTHING from a historical job — full credit charge again. Resume runs only the items that
didn't succeed the first time. Use replay when you need fresh data across the whole batch;
use resume when a batch was partially successful and you just want to finish the job.
Classification logic: the controller loads the original batch job (via iter 71 archive fallback, so 30 days of history), then walks
batch_usernames against result.results to classify each entry:
• succeeded — present in results with error == nil. Skipped during resume, preserved from the original job.
• failed — present in results with error != nil. Included in resume (these are the retries).
• unprocessed — in batch_usernames but NOT in result.results. This happens when a batch was cancelled mid-loop, crashed before completing, or the Task got killed. Included in resume.
Response shape:
{resumed: true, original_job_id, new_job_id, batch_size, succeeded_count, failed_count,
unprocessed_count, resumed_count, succeeded_usernames, resumed_usernames}. The
succeeded_usernames list tells you which items were preserved — no credits
charged, data already in place from the original job. resumed_usernames is what
the new job is actually scraping.
Nothing-to-resume case: if every username in the original batch succeeded (clean 100% run), resume returns
422 nothing_to_resume with a hint pointing to /replay for a full re-run.
Single-scrape jobs: not supported. Single jobs are all-or-nothing — either the whole thing worked or the whole thing got refunded. Resume only makes sense for batches. Returns
422 resume_only_supports_batch.
Use case — mid-batch cancellation: ops team spotted an auth issue during a 20-creator batch and cancelled it at 12/20 completed. Five days later the auth is fixed. Instead of re-running all 20 (wasting credits on the 12 that succeeded), they call
POST /v1/full-scrape/jobs/fs_abc123/resume. New job picks up the 8
remaining creators with fresh data; the 12 successful ones remain intact in the original
archive. Credits consumed: 8 instead of 20.Use case — flaky upstream recovery: a batch finished with 15/20 successful and 5 refunded (per iter 55 auto-refund). Resume re-runs just the 5 failed ones with a chance at a better outcome — the retry layer (iter 64/75) absorbs transient retries, the quarantine system (iter 69) flags persistent failures, and the agency saves credits compared to a full replay.
One-shot → recurring (iter 100):
POST /v1/full-scrape/jobs/:id/schedule converts a historical job into
recurring Schedules entries without retyping usernames or params. Body:
{"interval_seconds": 86400, "platforms": "instagram,tiktok", "webhook_url": "https://...", "template": "..."}.Batch fan-out: schedules are per-username (one Schedules row per creator), so scheduling a batch job creates N schedules — one per
batch_usernames entry. Per-username success/failure surfaces in the response
schedules[] array so the caller sees which creators got scheduled and which
hit quota/duplicate errors.Response shape:
{original_job_id, interval_seconds, usernames_count, created_count, failed_count,
schedules: [{username, status, schedule_id, next_run_at}, {username, status: "failed",
error}, ...]}. Successful schedules remain committed even if others fail —
individual validation errors don't roll back the batch.Archive-aware: works for jobs in the ETS hot store (last 1h) OR the DB archive (last 30 days) via the iter 71 transparent fallback. So an agency that ran a Monday batch can come back on Friday and schedule the same creators for weekly refreshes with one call.
Use case — baseline → recurring: agency runs an initial 20-creator batch for a new client to establish baseline metrics. Once validated, they convert it to weekly refresh via
POST /v1/full-scrape/jobs/fs_abc/schedule {"interval_seconds": 604800}.
Response returns 20 schedule_ids, one per creator, all running on 7-day intervals.
Zero retyping, consistent cost_center carry-over, direct link from ad-hoc batch → ongoing
refresh.
Bulk replay — refresh a group of jobs in one call.
Iter 60's
/jobs/:id/replay works one job at a time; iter 73's
/jobs/replay-bulk batches up to 20 originals in a single request. Each replay runs
as an independent async job — the caller gets back the new job_ids and polls them individually.
Body:
{"job_ids": ["fs_abc", "fs_def", ...], "cost_center": "optional_override",
"platforms": "optional_override", "webhook_url": "optional", "max_age_seconds": 86400}
Max 20 job_ids per bulk call — same cap as
/batch. Overrides apply to ALL replays
in the bulk — if you need per-job overrides, use individual
/jobs/:id/replay calls.
Per-item error isolation. One failed lookup or authorization doesn't block the others. Each entry in the
replays response array is
either {original_id, new_job_id, status: "enqueued", replay_of, archived_source} or
{original_id, status: "error", error: ""}. Top-level
enqueued + errors counts give a quick rollup.
Archive-aware. Original job_ids up to 30 days old work automatically — the replay path loads each job via
Jobs.get_for/2 which falls back
to JobArchives (iter 71) on ETS miss. The response's archived_source: true
flag tells you when the source came from warm storage vs the hot 1h ETS path.
Credit semantics. Bulk replay consumes credits normally — it's not a free retry. Each original's usernames get charged as if you were running the batch fresh. Budget cap + rate limit enforcement still applies per-replay. A bulk call that would collectively exceed your cap fails individual replays with 402-equivalent errors, but other replays continue.
Use case — weekly refresh of a client portfolio: agency tracks 20 creators for client-acme and 20 for client-beta via two separate weekly batches. On Monday morning, they list the previous week's terminal jobs via
GET /v1/full-scrape/jobs/history?status=complete&since=2026-03-31, collect the
job_ids, and POST them to /jobs/replay-bulk with
?max_age_seconds=604800 — any creator still fresh from the original run hits the
cached path, and only stale creators get re-scraped. Zero params re-entered, 7-day cache
preserved automatically.
Always async. Like single replay (iter 60), bulk replay doesn't have a sync mode — the caller gets job_ids immediately and polls each. Sync bulk would mean holding an HTTP connection open for potentially minutes per job × 20 jobs; not a reasonable HTTP shape.
Replay model. One-shot "refresh this data" button for any job,
regardless of its terminal status (complete / failed / cancelled / refunded). The replay loads the
original job record, inherits the usernames (single or batch) and cost_center, and dispatches a
brand-new async job with a fresh id. Credits are charged normally for the replay — this is not a
free retry, just a convenience shortcut that saves the caller from re-specifying params.
Body (all optional — defaults pulled from the original):
• cost_center — override the original's attribution tag
• platforms — override scrape_opts platforms (e.g. "instagram,tiktok")
• webhook_url — fire a new webhook on replay completion
• max_age_seconds — combine replay with conditional caching (iter 57) to get "refresh this batch, but skip any username I already have fresh data on"
Always async. Replay returns 202 with a new
job_id
regardless of whether the original was sync or async. Polling semantics match any normal async job.
The new job record carries replay_of: "<original_id>" so downstream consumers
(receipts, dashboards) can link back.
Quote tokens are NOT inherited. The quoted price from the original pre-flight is long expired by replay time — forcing a new pre-flight just to retain the lock would defeat the one-click replay promise. If you need price certainty on the replay, run a fresh pre-flight with the same usernames/cost_center and include the token in a regular batch call instead.
Use case — scheduled data refresh: an agency runs a 20-creator batch every Monday for a client. Instead of storing the usernames list in their own DB, they record the Monday job_id and replay it the following Monday — zero client-side state, zero params to re-specify,
cost_center stays consistent. Combine with ?max_age_seconds=86400
to skip any creator that was already scraped in the last 24h by some other flow.
Not supported: sync batches (which don't create a Jobs record) cannot be replayed — they return no job_id to replay against. Use async batches (
"async": true) if you want replay capability down the line.
GET/v1/full-scrape/jobs?status=pending&limit=20List active jobs (last 1h, ETS) with per-status summary API KEY
GET/v1/full-scrape/jobs/history?status=failed&cost_center=client-acme&since=2026-04-01T00:00:00Z&limit=50List historical (archived, last 30d) jobs with filters on status, cost_center, batch, time range API KEY
GET/v1/full-scrape/jobs/export?format=csv&cost_center=client-acme&since=2026-04-01T00:00:00ZStream archived jobs as CSV or JSON — accounting-ready export, memory-efficient API KEY
Job archive export (iter 109). Streams archived
jobs as CSV (one row per job) or JSON (preserves nested result map). Same memory-efficient
pattern as the iter 46 snapshot export — uses
Ecto.Repo.stream/2 inside a
transaction, chunks 200 rows at a time, pipes into the Phoenix chunked response.
CSV columns:
id, status,
batch, username, batch_size,
successful_count, failed_count, total_cost_usd,
cost_center, orchestration_id, replay_of,
started_at, finished_at, refunded_at,
refund_amount_usd, refund_reason. Ordered ascending by
finished_at.
JSON format: alternative
?format=json emits a JSON array — each element is a full job object
including the nested result map (per-username scrape results, billing
breakdown, etc.). Use for programmatic consumption where CSV flattening would lose
detail.
Filters (all cumulative):
• ?since= / ?until= — ISO8601 bounds on finished_at
• ?cost_center= — exact match
• ?status= — complete / failed / cancelled
• ?batch=true|false — single vs batch jobs
Use case — monthly reconciliation: finance needs an end-of-month CSV of every batch job for each client.
GET /v1/full-scrape/jobs/export?format=csv&cost_center=client-acme&since=2026-04-01T00:00:00Z&until=2026-05-01T00:00:00Z
returns the full month's jobs ready to drop into QuickBooks. Cross-references with
/v1/cost/invoice totals for double-entry sanity checking — per-job rows
here should sum to the invoice line items.
Pairs with iter 67 events export: jobs export is a higher-level view (one row per batch job), events export is a lower-level view (one row per billing event). For line-item drill-down use events; for job-level audit use this.
Historical jobs — queryable warm store. Iter 71 persisted
terminal-state jobs to
job_archives; iter 72 makes that table queryable. Agencies
can now list "all failed jobs for client-acme last week" in one call instead of iterating through
billing events.
Filters (all optional, all composable):
• ?status= — complete / failed / cancelled (terminal states only — pending/running live in ETS)
• ?cost_center= — exact match on the stored attribution tag
• ?since= / ?until= — ISO8601 bounds on finished_at
• ?batch=true|false — single vs batch jobs
• ?limit= — default 20, max 200
Response shape:
jobs[] (same shape as live
/jobs list + archived: true flag), count, summary
(counts by status scoped to the same since/until window, via a single GROUP BY query), echoed
filters, and retention_days telling the caller how far back the archive
goes (default 30).
Indexes: the iter 71 migration added
(api_key_id, finished_at), (api_key_id, status), and
(finished_at) indexes. Typical queries
(WHERE api_key_id = $1 AND finished_at >= $2 ORDER BY finished_at DESC LIMIT N)
are index scans — cheap even at months of accumulated history.
Use case — end-of-week review:
GET /v1/full-scrape/jobs/history?status=failed&since=2026-04-01T00:00:00Z returns
every job that failed in the first week of April. Drill into each via
/jobs/:id/receipt and /jobs/:id/refund (both now archive-aware from iter
71). Feed the list into a Slack thread for the ops standup.
DELETE/v1/full-scrape/jobs/:idCancel a pending or running async scrape — kills the task, credit NOT refunded API KEY
Refund model — fairness for on-demand billing. Full scrape on-demand
charges up front at enqueue time. When a scrape fails (network error, platform down, upstream crash),
you shouldn't pay for it. Two refund paths:
1. Automatic refund on async failure. When an async single-scrape job is marked
:failed, the controller immediately calls Credits.refund/2
(incrementing an offsetting refund counter in the sliding 30-day bucket) AND writes a
:full_scrape_refund billing event with negative cost_usd, so
/v1/quota, /v1/cost/centers, and all MTD aggregates net out automatically.
The job record is stamped with refunded_at, refund_amount_usd, and
refund_reason: "auto_on_failure:<err>".2. Manual refund via POST /v1/full-scrape/jobs/:id/refund. Idempotent — calling twice returns the existing refund record with
already_refunded: true
instead of double-refunding. Returns 400 for batch jobs (partial-failure batches need per-username
resolution via the receipt endpoint), 422 for jobs not in :failed state. On success
returns the fresh credits_after_refund effective count so clients can update their UI
without re-polling /v1/quota.How the math nets out:
Credits.read/2 returns max(raw_consumed - refunded, 0) from two separate
Hammer buckets — the consume counter and the refund counter. BillingEvents.mtd_cost_usd/1
does sum(cost_usd) over the month, and negative refund events naturally subtract.
No DB migration required.Sync failures: for sync-mode scrapes (
async=false),
an upstream crash raises a 500 and the credit stays consumed — refund manually via the endpoint above
once you identify the failed job. This asymmetry is intentional: sync scrapes don't have a persistent
job record until the response lands, so auto-refund hooks can't attach cleanly.
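A toy model of that netting, with invented dollar values; the max(raw_consumed - refunded, 0) rule and the negative refund events are exactly as described above:

```python
# Toy model of the credit/billing netting described above (values invented).

def effective_credits(raw_consumed: int, refunded: int) -> int:
    # Credits.read/2 semantics: consume counter minus refund counter, floored at 0.
    return max(raw_consumed - refunded, 0)

def mtd_cost_usd(billing_events: list[dict]) -> float:
    # BillingEvents.mtd_cost_usd/1 semantics: plain sum; refund events carry negative cost_usd.
    return sum(e["cost_usd"] for e in billing_events)

events = [
    {"type": "full_scrape_consume", "cost_usd": 0.50},
    {"type": "full_scrape_consume", "cost_usd": 0.50},
    {"type": "full_scrape_refund", "cost_usd": -0.50},  # auto refund after a failed async job
]

assert effective_credits(raw_consumed=2, refunded=1) == 1
assert abs(mtd_cost_usd(events) - 0.50) < 1e-9
```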
Receipt endpoint — canonical accounting view of a scrape job.
Combines the in-memory job record (timeline, results, status) with billing events from the DB that
fall within the job's
started_at → finished_at window (±2s buffer). Matched by api_key_id + event type (full_scrape_consume / full_scrape_overage). For batch jobs, events are indexed by metadata.username to pin each line item to its specific scrape.
Response shape: receipt_id (rcpt_<job_id>), job_id, tier, status, batch, usernames, timeline (created_at / started_at / finished_at / duration_ms), items[] (per-username: mode, charged_usd, our_marginal_cost_usd, platforms, cost_breakdown), totals (item_count, total_charged_usd, our_marginal_cost_usd, gross_margin_usd), billing_events_linked (count + window), format (version, currency, generated_at).
Closes the loop: pre-flight → batch → receipt. Pre-flight projects, batch commits, receipt reconciles. The gross_margin_usd line (total_charged − our_marginal_cost_usd) gives operators visibility into per-job unit economics.
Jobs live 1h in-memory (ETS-backed, volatile across deploys). Credit is consumed at enqueue time — polling a
non-existent or expired job_id does not refund the credit. Jobs are scoped to the owning API key —
another key polling the same id gets 403. Recommended poll interval: 2s.
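A polling sketch that respects the 2s interval and the 1h lifetime; the GET /jobs/:id status route and the X-API-Key header name are assumptions (only the cancel, receipt, and refund job routes are listed here):

```python
# Sketch: poll an async scrape job every 2s until it reaches a terminal state.
# Assumptions: GET {BASE}/jobs/{id} returns the job record with a "status" field; X-API-Key header.
import time
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def poll_job(job_id: str, interval_s: float = 2.0, max_wait_s: float = 3600.0) -> dict:
    deadline = time.monotonic() + max_wait_s  # jobs expire from ETS after ~1h anyway
    while time.monotonic() < deadline:
        resp = requests.get(f"{BASE}/jobs/{job_id}", headers=HEADERS, timeout=10)
        if resp.status_code == 403:
            raise PermissionError("job belongs to another API key")
        if resp.status_code == 404:
            raise LookupError("job unknown or expired (credit is NOT refunded on expiry)")
        job = resp.json()
        if job.get("status") in ("complete", "failed", "cancelled"):
            return job
        time.sleep(interval_s)
    raise TimeoutError("job did not reach a terminal state within the polling window")
```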
Idempotency keys (retry safety)
GET/v1/full-scrape/idempotency-keysList cached idempotency keys for the calling api_key — audit-only, bodies omitted API KEY
Pass
Idempotency-Key: <unique> on credit-consuming POSTs (/v1/full-scrape/:username, /batch, /schedules) and BuyCrowds will replay the exact cached response on any retry within 24h — no duplicate credit consumption.
Replay indicator: replayed responses carry X-Idempotency-Replay: true. Fresh responses carry X-Idempotency-Key: <echo> of what was stored.
Scope: keys are scoped per API key — different customers can reuse the same literal.
Constraints: printable ASCII, max 255 chars. Malformed keys return 400.
Not cached: 4xx/5xx responses (you can retry after fixing the request). Concurrent requests with the same key are not serialized — use for retry safety, not concurrency control.
Audit listing (iter 87): GET /v1/full-scrape/idempotency-keys returns the active cached keys for your api_key — useful for debugging "did my request go through?". ETS scan via :ets.foldl filtered by the first element of each key's tuple ({api_key_id, key}), so other tenants are invisible. Expired entries are filtered out at query time even if the sweeper hasn't cleaned them yet.
Response shape: per-entry {key, status, created_at, expires_at, age_seconds}, sorted by created_at descending. Response bodies are intentionally omitted — they can be multi-KB each and this is audit, not replay. To replay a cached response, re-submit the original POST with the same Idempotency-Key header.
Use case — retry debugging: client sees a timeout on a batch POST. Was the batch consumed? Call GET /v1/full-scrape/idempotency-keys, find the key, verify status: 200 + age_seconds: 15. The batch DID go through — the client can safely re-submit the same key to replay the cached response (same job_id) without double-charging credits.
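A retry-safe batch submission sketch using the documented Idempotency-Key header and replay indicator; the batch body shape and the X-API-Key header are assumptions:

```python
# Sketch: retry-safe credit-consuming POST with one Idempotency-Key reused across retries.
# Header names are documented above; the batch body fields and X-API-Key are assumptions.
import uuid
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def submit_batch(usernames: list[str], retries: int = 3) -> dict:
    key = str(uuid.uuid4())  # one key per logical request, reused on every retry
    for _ in range(retries):
        try:
            resp = requests.post(f"{BASE}/batch",
                                 json={"usernames": usernames},
                                 headers={**HEADERS, "Idempotency-Key": key},
                                 timeout=30)
            resp.raise_for_status()
            if resp.headers.get("X-Idempotency-Replay") == "true":
                print("replayed cached response, no extra credits consumed")
            return resp.json()
        except requests.Timeout:
            continue  # safe: the same key replays the cached response if the first POST landed
    raise RuntimeError("batch submission failed after retries")
```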
Snapshots (30-day metric history)
GET/v1/full-scrape/snapshots?username=neymarjr&limit=30List condensed metric snapshots — auto-captured on every scrape, 30-day retention API KEY
GET/v1/full-scrape/snapshots/:idSingle snapshot with full per-platform metric breakdown API KEY
POST/v1/full-scrape/snapshots/:id/extendExtend a snapshot's TTL beyond the default 30 days (max 365 days from capture) — pin reference data API KEY
POST/v1/full-scrape/snapshots/:id/reset-expiryRestore a snapshot's TTL back to the default 30 days from capture API KEY
Snapshot TTL extension (iter 85). By default every
snapshot has a 30-day TTL enforced by the RetentionSweeper (iter 44) — rows past
expires_at get deleted on the hourly sweep. For reference data you want to keep longer (quarterly comparison baselines, client-approved historical states, audit records), extend the TTL to prevent the sweep.
Body: POST /v1/full-scrape/snapshots/:id/extend {"additional_days": 60}. Adds additional_days to the current expires_at. Clamped to 365 days from the original capture — you can't extend indefinitely. The response echoes the new expires_at + max_possible_expires_at + a clamped_to_max? flag if the extension hit the ceiling.
Ownership-scoped: 404 for snapshots owned by another api_key. 422 exceeds_max_ttl when the snapshot is already past the 365-day ceiling (no-op request).
Reset: POST /v1/full-scrape/snapshots/:id/reset-expiry restores the TTL back to default (captured_at + 30 days). If the default TTL has already passed by the time you reset, the snapshot is marked for sweep in 1 hour as a grace period (response carries grace_period_applied?: true).
Use case — quarterly client baselines: agency captures a snapshot of client-acme's creators on March 31 to establish the Q1 baseline. The default TTL would sweep it at end of April, but they need to reference it in the Q2 review on June 30. One call: POST /v1/full-scrape/snapshots/snap_xxx/extend {"additional_days": 120}. The snapshot now survives until end of July, available via GET /v1/full-scrape/snapshots/snap_xxx for the entire review window. After the review, call reset-expiry and let the sweep reclaim the storage naturally.
Use case — audit preservation: a snapshot captured during an anomaly investigation needs to be preserved for legal/compliance reasons. Extend to 365d and document the snapshot_id in the investigation ticket. The retention sweeper won't touch it until the investigation closes.
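A small pin-then-release sketch for a baseline snapshot, using the extend and reset-expiry routes as documented (auth header illustrative):

```python
# Sketch: extend a baseline snapshot's TTL for a review window, then reset it afterwards.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # illustrative header name

def pin_snapshot(snapshot_id: str, additional_days: int = 120) -> dict:
    resp = requests.post(f"{BASE}/snapshots/{snapshot_id}/extend",
                         json={"additional_days": additional_days},
                         headers=HEADERS, timeout=10)
    resp.raise_for_status()
    body = resp.json()
    if body.get("clamped_to_max?"):
        print("extension hit the 365-day ceiling:", body.get("max_possible_expires_at"))
    return body

def release_snapshot(snapshot_id: str) -> dict:
    # Restore the default 30-day TTL once the review is done.
    resp = requests.post(f"{BASE}/snapshots/{snapshot_id}/reset-expiry", headers=HEADERS, timeout=10)
    resp.raise_for_status()
    return resp.json()
```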
POST/v1/full-scrape/snapshots/diffCompare two snapshots — same diff engine as job diff, works across weeks API KEY
GET/v1/full-scrape/snapshots/timeseries?username=neymarjr&metric=followers&days=30Chart-ready time series of a single metric across snapshots API KEY
GET/v1/full-scrape/snapshots/trends?username=neymarjr&metric=followers&days=30Linear regression trend analysis — per-platform rate of change + confidence + classification API KEY
GET/v1/full-scrape/snapshots/anomalies?username=neymarjr&metric=followers&days=30&sigma=2.0Detect spike/drop anomalies — points beyond N sigma from the regression line API KEY
Catches what trend lines smooth over. A creator growing at 5k/week with
one 150k follower spike on April 5 looks "steady" to the trend endpoint. The anomalies endpoint flags
that exact snapshot as a
:spike event with residual_sigma: 4.2, severity: :high.
Algorithm: compute OLS regression line, measure residual at each sample (actual - predicted), compute std dev of residuals, flag any point where |residual| ≥ sigma × std.
Query params: username (required), metric (default followers), days (3-90, default 30), platform (filter), sigma (1.0-10.0, default 2.0 — higher = stricter, fewer anomalies).
Response: list of anomalies sorted by severity with direction (:spike/:drop), severity (:low/:medium/:high based on σ magnitude), actual, predicted, residual, captured_at, snapshot_id.
Use case: power a "what happened" view on the dashboard. Creator went viral? Got hacked? Deleted a chunk of posts? The anomaly endpoint finds the exact date. Pairs with /snapshots/:id to inspect the full per-platform state at the anomaly moment.
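A local re-implementation of that residual test, handy for sanity-checking the endpoint's output against your own snapshot history (the data points below are invented):

```python
# Sketch: OLS regression line + sigma-based residual flagging, mirroring the algorithm above.
from statistics import mean, pstdev

def detect_anomalies(samples: list[tuple[int, float]], sigma: float = 2.0) -> list[dict]:
    """samples: (day_index, value) pairs; flags points with |residual| >= sigma * std."""
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_bar, y_bar = mean(xs), mean(ys)
    denom = sum((x - x_bar) ** 2 for x in xs) or 1e-9
    slope = sum((x - x_bar) * (y - y_bar) for x, y in samples) / denom
    intercept = y_bar - slope * x_bar
    residuals = [y - (slope * x + intercept) for x, y in samples]
    std = pstdev(residuals) or 1e-9
    return [
        {"day": x, "actual": y, "predicted": round(slope * x + intercept, 1),
         "residual_sigma": round(r / std, 2), "direction": "spike" if r > 0 else "drop"}
        for (x, y), r in zip(samples, residuals)
        if abs(r) >= sigma * std
    ]

# Invented example: steady ~5k/day growth with one 150k spike on day 5.
history = [(d, 1_000_000 + 5_000 * d) for d in range(14)]
history[5] = (5, history[5][1] + 150_000)
print(detect_anomalies(history, sigma=2.0))  # only the day-5 spike is flagged
```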
The "is this creator trending up or down?" answer.
Runs ordinary least squares regression over the last N days of snapshot history for a username.
Returns per-platform rate of change (per day and per week), direction, R² confidence, and status classification.
Status classification:
• :trending — R² ≥ 0.3 AND rate ≥ 1% per week → real signal
• :flat — low rate or low confidence → no meaningful change
• :noisy — R² < 0.1 but non-trivial slope → data too erratic to trust
• :insufficient_data — fewer than 3 samples in window
Query params: username (required), metric (default followers, one of followers/following/posts/engagement_rate/video_count/subscribers), days (3-90, default 30), platform (filter to one), all_metrics=true (compute trends for all canonical metrics in one shot).
Aggregate shape: fastest_growing, fastest_declining, counts by status, total_slope_per_day. Ready for a "growth dashboard" card.
Use case: tell your users "Creator X gained 5% followers on Instagram this week with high confidence, but their Twitter presence is declining at -2%/week."
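A rough local version of the classification, applying the documented thresholds to a slope and R² you computed yourself; the exact cutoffs behind "low rate" and "non-trivial slope" are the endpoint's own, so the values used here for :flat vs :noisy are approximations:

```python
# Sketch: trend status classification from slope + R², using the thresholds documented above.
# The :noisy "non-trivial slope" cutoff below is an illustrative approximation.

def classify_trend(slope_per_week: float, current_value: float,
                   r_squared: float, n_samples: int) -> str:
    if n_samples < 3:
        return "insufficient_data"
    rate_pct_per_week = 100.0 * slope_per_week / max(current_value, 1.0)
    if r_squared >= 0.3 and abs(rate_pct_per_week) >= 1.0:
        return "trending"
    if r_squared < 0.1 and abs(rate_pct_per_week) >= 0.5:  # assumed cutoff for "non-trivial"
        return "noisy"
    return "flat"

print(classify_trend(60_000, 1_200_000, r_squared=0.62, n_samples=14))  # trending (+5%/week)
print(classify_trend(1_000, 1_200_000, r_squared=0.05, n_samples=14))   # flat
```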
The durable counterpart to jobs. Jobs live 1h (async polling window);
snapshots live 30 days (historical analysis window). Every scrape — ad-hoc, batch, or scheduled —
auto-captures a snapshot with condensed metrics (followers, following, posts, engagement_rate, video_count, subscribers).
The full result payload is NOT stored — only the numbers that matter for trend analysis.
Use cases:
• Daily/weekly monitoring via schedules: snapshot every run → diff last 7 days → "gained 5k followers"
• Chart rendering: /snapshots/timeseries returns a flat array of (captured_at, platform, value) sorted by date — feeds directly into a line/area chart
• Cross-week diffs: /snapshots/diff reuses the same diff engine as /full-scrape/diff but works on snapshots that outlive the 1h job TTL
Storage: ETS, 30-day sliding, periodic cleanup every 15 min. Volatile across deploy. ~22KB per user for a full 30-day history — fits comfortably.
Budget cap (hard enforce monthly ceiling)
GET/v1/full-scrape/budget-capRead your current budget cap config, or default state if unset API KEY
GET/v1/full-scrape/budget-cap/history?limit=100&since=2026-01-01T00:00:00ZCompliance audit trail — every PUT to /budget-cap is snapshotted; returns newest-first with change_source + change_note API KEY
GET/v1/full-scrape/budget-cap/history/diff?from=bcv_xxx&to=bcv_yyyField-by-field delta between two versions. Omit both for "what changed in the most recent edit?" API KEY
POST/v1/full-scrape/budget-cap/history/restoreRollback current config to a prior bcv_* version — logs a new version with change_source=rollback API KEY
GET/v1/full-scrape/budget-cap/burn-ratePer-center burn rate + days-until-empty + EOM projection. Worst-first triage, status flags: healthy/warning/critical/exceeded/frozen API KEY
GET/v1/full-scrape/budget-cap/recommendations?reserve_pct=10&min_move_usd=1Rebalance suggestions: donors (healthy + headroom) → recipients (critical/exceeded). Pipe directly into /transfer-allocation API KEY
POST/v1/full-scrape/budget-cap/recommendations/applyBundle-execute all rebalance suggestions atomically. Default dry_run=true — body {dry_run:false, max_moves:50, change_note} to commit API KEY
POST/v1/full-scrape/batch-size-advisorGiven price_per_scrape_usd + optional cost_center + count_requested, returns max batch under each cap + binding constraint. Pure read API KEY
POST/v1/full-scrape/batch-size-advisor/multiMulti-center version: body {centers: [{cost_center, count_requested}]}. Sequential fill deducts running cost from shared global/daily caps. Order = priority API KEY
GET/v1/full-scrape/budget-cap/center/:center/activity?since=2026-04-01T00:00:00Z&limit=100Unified timeline for one cost_center: billing_events + cost_alert_fires merged, sorted desc, tagged by type API KEY
GET/v1/full-scrape/budget-cap/center/:center/report-cardOne-call dashboard: config + burn + 7d trend + recent spend + 30d alert summary + cross-references API KEY
GET/v1/full-scrape/budget-cap/centers/report-cardsBulk compact report card for every sub-capped center. Sorted worst-first. Avoids N+1 on agency dashboards API KEY
POST/v1/full-scrape/budget-cap/what-ifReplays last N days of billing events against a proposed cap config. Reports blocked counts, first breach timestamps, daily burn curve API KEY
POST/v1/full-scrape/preflightUnified go/no-go gate. Body {usernames[], cost_center?, estimated_price_per_scrape_usd?}. Runs rate+credits+monthly+daily+sub-cap+frozen. Active reservations subtracted from headroom API KEY
POST/v1/full-scrape/budget-cap/reservationsCreate short-lived budget reservation. Body {amount_usd, cost_center?, ttl_seconds? (max 3600)}. Closes preflight race condition API KEY
GET/v1/full-scrape/budget-cap/reservationsList active reservations with TTL countdown + total reserved USD API KEY
GET/v1/full-scrape/budget-cap/reservations/statsPool stats: count/USD by center, TTL histogram, at-risk (<60s) count, oldest/newest, avg remaining TTL API KEY
POST/v1/full-scrape/budget-cap/reservations/:id/commitFinalize a reservation. Removes from active pool; actual spend tracked via billing_events API KEY
POST/v1/full-scrape/budget-cap/reservations/:id/releaseRelease reservation without committing — returns budget to available pool API KEY
POST/v1/full-scrape/budget-cap/reservations/:id/extendBump reservation TTL by {additional_seconds} (max 3600) — keeps same id and amount API KEY
POST/v1/full-scrape/budget-cap/reservations/release-allBulk release every active reservation for this api_key — emergency cleanup after aborted pipeline API KEY
GET/v1/full-scrape/budget-cap/presetsList built-in cap preset library: conservative / balanced / aggressive / dev / pause API KEY
POST/v1/full-scrape/budget-cap/presets/:name/applyApply a preset. Preserves per_center_caps + frozen list. Logs version with change_source=preset_applied:<name> API KEY
PUT/v1/full-scrape/budget-capSet or replace the monthly USD cap with optional hard enforcement API KEY
DELETE/v1/full-scrape/budget-capRemove the cap — no more enforcement, back to unlimited recurring spend API KEY
POST/v1/full-scrape/budget-cap/freezeFreeze a cost_center — pause all spending tagged with that attribution until unfrozen API KEY
POST/v1/full-scrape/budget-cap/unfreezeUnfreeze a cost_center — spending resumes immediately, sub-cap config preserved API KEY
POST/v1/full-scrape/budget-cap/freeze-bulkBulk freeze up to 50 cost_centers in one call — per-center error isolation API KEY
POST/v1/full-scrape/budget-cap/unfreeze-bulkBulk unfreeze — counterpart of freeze-bulk API KEY
POST/v1/full-scrape/budget-cap/transfer-allocationMove $ between two sub-caps without re-specifying the whole per_center_caps map API KEY
Targeted allocation rebalance (iter 103). The
existing
PUT /v1/full-scrape/budget-cap requires a full per_center_caps map — to move $20 from client-acme to client-beta, callers had to read the current config, modify two entries, and PUT the whole map back. Iter 103 adds a targeted helper for the common "rebalance between centers" case: POST /v1/full-scrape/budget-cap/transfer-allocation {"from": "client-acme", "to": "client-beta", "amount_usd": 20}
Semantics:
• from must have an existing sub-cap entry in per_center_caps AND value ≥ amount (422 errors otherwise)
• to may already exist (value gets bumped by amount) or be new (creates with value = amount)
• from after transfer: if result is 0, the entry is dropped from the map entirely (signal that the center no longer has a dedicated sub-cap)
Response shape: {from, to, amount_usd, from_before, from_after, to_before, to_after, source_dropped, updated_per_center_caps, note}. The before/after pair makes it auditable — you can trace exactly what changed.
Historical billing untouched: transfer only reshapes FUTURE enforcement. Past billing events remain attributed to their original cost_center — use POST /v1/full-scrape/budget-cap/transfer-center (iter 76) if you want to reattribute historical events too.
Atomic: single DB UPDATE via the changeset — either both entries update or neither (no partial state).
Use case — mid-month rebalance: agency's monthly budget is $500 split 50/50 between client-acme and client-beta ($250 each). Mid-month, client-beta pitches a new campaign needing extra headroom, client-acme is comfortably under. Operator calls POST /v1/full-scrape/budget-cap/transfer-allocation {"from": "client-acme", "to": "client-beta", "amount_usd": 50}. Result: client-acme → $200, client-beta → $300. One call, no re-specifying the map, fully auditable.
POST/v1/full-scrape/budget-cap/rename-centerRename a cost_center across config + historical billing events — rewrites JSONB, preserves sub-cap + freeze state API KEY
POST/v1/full-scrape/budget-cap/transfer-centerMove billing events within a time window from one cost_center to another — granular, config-preserving API KEY
The guardrail for set-and-forget users.
You set a monthly ceiling in USD. When a new recurring operation would push projected monthly cost above
the cap, the API returns
402 Payment Required instead of creating it.
Body: monthly_usd_cap (required, 0-100000), daily_usd_cap (optional, 0-10000, iter 106 velocity limiter), hard_enforce (default true — false = warning only).
Daily cap (iter 106 — velocity limiter): optional companion to monthly_usd_cap. Caps spend for the current UTC day independently. Prevents "burn the entire monthly budget in 2 days" scenarios when a new campaign goes viral or a flaky upstream generates retry storms. Max $10,000/day. Nullable — agencies without velocity concerns can use only the monthly cap.
Enforcement order: global monthly cap → daily cap → frozen center → per-center sub-cap. Daily cap fails with 402 daily_spend_cap_exceeded containing daily_cap_usd, today_spent_usd, and a resolution block with three paths (wait for UTC midnight reset, raise cap, remove cap). Response headers: x-daily-cap-usd and x-daily-spent-usd.
Window resets at UTC midnight. The cap is computed via BillingEvents.today_cost_usd/1 — a scalar SQL sum of cost_usd for events with occurred_at >= today_utc_start. Same indexing as the MTD check, so it's equally cheap per-request.
Scope: applies to recurring scheduled operations only (schedules + bulk watchlist scheduling). Ad-hoc scrapes remain governed by tier quota + overage — they consume credits but don't check the cap. The cap is specifically a guardrail for "I set 40 schedules and forgot".
Hard vs soft enforce:
• hard_enforce=true — over-cap schedule creation is rejected with 402. Use this when you can't tolerate a surprise bill.
• hard_enforce=false — over-cap creation proceeds but the over_cap? flag in /recurring-cost surfaces the warning. Good for alerting without blocking.
Checked on: single POST /schedules (new schedule), bulk POST /watchlists/:id/schedules (atomic — all or nothing). Cap is re-projected on every write; mutations above cap are rejected upfront.
Visible in projection: GET /recurring-cost now includes a budget_cap section with cap_configured, cap_usd, hard_enforce, over_cap?, headroom_usd, overage_usd. UI can render "you've used $150 of your $200 cap" without extra roundtrips.
Example: User on Pro ($99 base, 1500 credits quota) sets monthly_usd_cap=150. They have 3 schedules projected to add $45/month overage, total projection $144 ($99 base + $45 overage). Next schedule creation projects $165 total → 402. User must raise cap, delete a schedule, or extend intervals.
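A defensive schedule-creation sketch that branches on the documented 402 reasons (daily cap, sub-cap, frozen center, global cap). The POST /schedules body fields, the error-payload key carrying the reason code, and the X-API-Key header are assumptions:

```python
# Sketch: create a schedule and branch on the documented 402 reason codes.
# Assumptions: schedule body fields, an "error" key holding the reason code, X-API-Key header.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}

def create_schedule(username: str, interval_seconds: int, cost_center: str | None = None):
    headers = dict(HEADERS)
    if cost_center:
        headers["X-Cost-Center"] = cost_center
    resp = requests.post(f"{BASE}/schedules",
                         json={"username": username, "interval_seconds": interval_seconds},
                         headers=headers, timeout=10)
    if resp.status_code == 402:
        reason = resp.json().get("error")  # field name assumed
        if reason == "daily_spend_cap_exceeded":
            print("daily cap hit:", resp.headers.get("x-daily-cap-usd"),
                  "spent today:", resp.headers.get("x-daily-spent-usd"))
        elif reason in ("cost_center_cap_exceeded", "cost_center_frozen"):
            print("center blocked:", resp.headers.get("X-Budget-Center"))
        else:
            print("monthly budget cap would be exceeded; raise the cap or trim schedules")
        return None
    resp.raise_for_status()
    return resp.json()
```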
Per-cost-center sub-caps (iter 59): the PUT body now accepts an optional per_center_caps map that scopes sub-ceilings to specific cost_center tags (iter 52 attribution). Example: PUT /v1/full-scrape/budget-cap {"monthly_usd_cap": 500, "per_center_caps": {"client-acme": 50, "client-beta": 100}}
Enforcement: if a request carries X-Cost-Center: client-acme and that center's MTD cost has already hit $50, the budget enforcement plug returns 402 cost_center_cap_exceeded in addition to the global cap check. The response includes cost_center, cap_usd, spent_usd, overage_usd, and a resolution block pointing to three fixes (raise sub-cap, remove it, or drop the header). Response headers: X-Budget-Center, X-Budget-Center-Cap-Usd, X-Budget-Center-Spent-Usd.
Pre-flight integration: when a cost_center is set on a pre-flight request, the report includes a new cost_center_sub_cap section mirroring the existing budget_cap section. Both are evaluated — a projection that fits under the global cap but exceeds a sub-cap is blocked with the new reason would_exceed_cost_center_sub_cap. Agencies get a single call that says "this batch would fit globally BUT client-acme is already at $48 and your batch would push it to $52".
Sub-cap math: per-center MTD cost is computed via BillingEvents.cost_for_center/2 — an indexed SQL aggregate over metadata->>'cost_center' = '<tag>' for the current UTC month. One scalar query; cheap enough to run on every request.
Hard enforce honors sub-caps: setting hard_enforce: false disables BOTH global and per-center enforcement (warning-only mode). Sub-caps never bypass the global hard_enforce toggle.
Cost center freeze/unfreeze (iter 61): temporarily pause all spending for a specific cost_center without deleting the sub-cap or losing attribution history. POST /v1/full-scrape/budget-cap/freeze {"cost_center": "client-acme"} adds the tag to the frozen_cost_centers array. From that moment, any request carrying X-Cost-Center: client-acme is rejected with 402 cost_center_frozen — the frozen check runs BEFORE the sub-cap check, so frozen state blocks independently of spend. POST /v1/full-scrape/budget-cap/unfreeze {"cost_center": "client-acme"} restores spending immediately. Both endpoints are idempotent.
Freeze vs delete: freezing preserves the sub-cap config (per_center_caps["client-acme"] stays put) — perfect for "pause this client for Q2 then resume in Q3". Deleting the sub-cap loses the ceiling config. The freeze list can hold up to 100 centers.
Use cases: client went delinquent → freeze until payment clears. Audit investigation → freeze to stop new activity. End-of-quarter budget cutoff → freeze all clients, unfreeze on the 1st. Contract dispute → freeze without losing the sub-cap you negotiated.
Bulk freeze/unfreeze (iter 101): for incident response and end-of-quarter cutoffs, use the bulk variants: POST /v1/full-scrape/budget-cap/freeze-bulk {"cost_centers": ["client-acme", "client-beta", "client-gamma"]}. Up to 50 centers per call. Each center is validated independently (regex [A-Za-z0-9_.-]+, max 64 chars); invalid tags are silently dropped before freeze. Response includes per-center status so the caller can see what was frozen vs what had errors (e.g. no budget cap configured).
Idempotency: bulk freeze is idempotent — re-calling on already-frozen centers is a no-op reported as status: "frozen". Same for unfreeze-bulk. Safe to retry from scripts without checking current state.
Incident response use case: detection system spots anomalous spend across multiple clients. Ops team calls POST /v1/full-scrape/budget-cap/freeze-bulk with the full client list to freeze everything at once. Investigation happens. When cleared, they call /unfreeze-bulk with the same list. Two API calls total vs N individual freeze calls + manual tracking of which got frozen.
Pre-flight surfaces it: when you pre-flight a batch tagged with a frozen center, the cost_center_sub_cap section returns {frozen: true, blocks_request?: true} and blocking_reasons includes "cost_center_frozen" — distinct from "would_exceed_cost_center_sub_cap" so client code can surface the right message. Response headers on 402: X-Budget-Center: <tag> and X-Budget-Center-Frozen: true.
Cost center rename (iter 70): POST /v1/full-scrape/budget-cap/rename-center {"from": "client-acme-old", "to": "client-acme"} consolidates a renamed client across BOTH the budget cap config AND all historical billing events. Uses PostgreSQL jsonb_set to rewrite metadata->'cost_center' in-place for every matching event, preserving labels, username, tier and any other metadata keys untouched. The per_center_caps sub-cap entry gets its key renamed, and frozen_cost_centers string array entries get substituted + deduped.
Response: {from, to, rewritten_events: 87, config_updated: true, cap: {...updated cap map...}}. Idempotent — calling with from == to is a no-op returning 0.
Merge behavior: if the destination name already has a sub-cap configured, the OLD value is silently dropped (the pre-existing destination wins). This prevents a rename from overwriting a carefully tuned sub-cap with an old/stale one. The frozen list is de-duplicated naturally via Enum.uniq.
Historical reporting reconciliation: after a rename, every report endpoint (/v1/cost/invoice, /v1/cost/centers, /v1/cost/compare-periods, /v1/cost/events/export) automatically reads the rewritten events and shows the new name retroactively. Month-over-month comparisons stay consistent — the old name disappears from all time periods in one shot.
Use case — client rebrand: agency had cost_center=acme-corp. Client went through a merger and is now globex-industries. One call renames all past spend so the monthly invoice PDF sent to the client shows the CURRENT name for BOTH historical and current events. No manual reconciliation, no dangling labels, no duplicate tracking during transition.
Time-scoped transfer (iter 76): POST /v1/full-scrape/budget-cap/transfer-center {"from": "client-acme", "to": "client-acme-prelaunch", "since": "2026-04-01T00:00:00Z", "until": "2026-04-07T23:59:59Z"} moves ONLY events within the window — distinct from rename (iter 70) which rewrites ALL history. Leaves per_center_caps + frozen_cost_centers config untouched (those are current state, not retro attribution).
Rename vs transfer:
• rename-center — wholesale, rewrites ALL matching history + config. Use when a client is permanently renamed (merger, rebrand).
• transfer-center — time-scoped, rewrites ONLY events in the window, config preserved. Use when campaigns get re-categorized retroactively (e.g. "the first week of April was actually the prelaunch, not the main campaign").
Transfer response shape: {from, to, window: {since, until}, rewritten_events: N, idempotent_noop: false, note}. Returns 0 rewritten when from == to (idempotent). 400 on invalid datetimes or reversed window (since > until).
Use case — campaign re-categorization: agency ran 50 scrapes tagged client-acme throughout April. On April 15, finance team decides the first week was actually the pre-launch budget, not the main campaign. One call: transfer-center {from: "client-acme", to: "client-acme-prelaunch", since: "2026-04-01", until: "2026-04-07T23:59:59Z"}. The monthly invoice now splits cleanly, and week 2+ events remain under the main tag untouched.
Dry-run simulator (iter 81): POST /v1/full-scrape/budget-cap/simulate with a proposed {monthly_usd_cap, per_center_caps} body. Returns a projection of what WOULD happen if those caps were applied NOW — zero mutation, pure preview. Each scope (global + each proposed per-center entry) gets a forecast block: {proposed_cap_usd, current_spent_usd, would_block_now?, projected_eom_spent_usd, headroom_usd, days_until_exhausted, projected_exhaustion_date, recommended_daily_spend_usd, projected_over_cap?, currently_over_cap?}.
Why this matters: before applying a tighter cap, agencies need to know which clients would be immediately 402'd. Blindly lowering the cap from $500 to $300 mid-month could block every in-flight workflow. The simulator surfaces this upfront with a concrete breakdown: "client-acme is already at $48 spent vs proposed $40 — applying this config would break client-acme's schedule immediately".
Recommendations block: human-readable strings ready for Slack/dashboard, context-aware:
• "ALERT: your current MTD spend already exceeds the proposed global cap..."
• "At current burn, your projected EOM ($X) exceeds the proposed cap. Reduce daily to $Y to fit."
• "Cost center 'client-acme' is ALREADY over the proposed sub-cap..."
• "Cost center 'client-beta' projected to exceed its sub-cap on 2026-04-22 at current burn."
• "Global cap fits your current burn rate with headroom." (default when all is OK)
Use case — quarterly budget review: CFO wants to cut scraping spend 40% next quarter. Operator runs the simulator with the proposed new caps, gets a list of exactly which clients/centers would be immediately blocked and which would exhaust mid-month. The team then negotiates the cuts with the client or defers some schedules before committing the new config via PUT /v1/full-scrape/budget-cap.
Auto-allocate (iter 83): POST /v1/full-scrape/budget-cap/auto-allocate computes a proposed per_center_caps map by distributing a new global cap proportionally to each center's historical spend in a basis window. Body: {"monthly_usd_cap": 500, "basis": "last_month" | "last_30_days", "reserve_pct": 10, "min_per_center_usd": 5}.
Allocation math (see the sketch below):
1. Query summary_by_cost_center for the basis window
2. Filter to eligible centers (positive spend, excludes "unattributed")
3. allocatable = monthly_usd_cap × (1 - reserve_pct / 100) — the reserve is held back from allocation as a buffer (default 10% of the global cap)
4. For each center: proposed_cap = allocatable × (center_spend / total_spend)
5. Enforce min_per_center_usd floor by bumping small allocations up
6. If floor bumps push total over allocatable, rescale proportionally to fit
Response shape: {proposal: true, basis: {mode, since, until, total_historical_cost_usd, eligible_center_count}, inputs: {monthly_usd_cap, reserve_pct, reserve_amount_usd, allocatable_usd, min_per_center_usd}, allocations: [{cost_center, historical_cost_usd, historical_event_count, historical_share_pct, proposed_cap_usd, bumped_to_minimum}, ...], per_center_caps: {cost_center: cap_usd, ...}, overshoot_note, next_steps}.
Pure preview: no mutation. The per_center_caps field is structured to drop straight into /simulate (preview impact) or PUT /budget-cap (commit).
Full planning workflow (iters 81 + 83):
1. POST /budget-cap/auto-allocate — get proposed allocation based on history
2. POST /budget-cap/simulate — preview impact of the proposal
3. Optionally iterate (adjust reserve_pct, basis, min_per_center, or edit the per_center_caps map)
4. PUT /budget-cap — commit
5. GET /cost/burn-down — monitor the new caps going forward
Reserve_pct rationale: 10% default means only 90% of the global cap gets distributed to per-center sub-caps. The other 10% sits as a global buffer that catches untagged overages, one-off emergency scrapes, or anything not bound to a sub-cap. Set to 0 for strict allocation (every dollar pinned to a center); set to 30 for loose allocation with headroom.
Use case — month-end rebalance: agency finishes April, reviews actuals. Some clients grew, some shrank. Instead of manually recomputing sub-caps, they call POST /budget-cap/auto-allocate {monthly_usd_cap: 500, basis: "last_month"}. The result reflects April's actual weights. They simulate it, tweak one or two, and commit in under a minute. Next month's budget is automatically rebalanced to match actual usage.
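A standalone sketch of steps 1-6 of the allocation math above; the spend figures are invented and stand in for the summary_by_cost_center query:

```python
# Sketch: proportional sub-cap allocation with reserve, per-center floor, and rescale-to-fit.
# Mirrors the six allocation steps above; spend numbers are invented.

def auto_allocate(monthly_usd_cap: float, spend_by_center: dict[str, float],
                  reserve_pct: float = 10, min_per_center_usd: float = 5.0) -> dict[str, float]:
    eligible = {c: s for c, s in spend_by_center.items() if s > 0 and c != "unattributed"}
    total_spend = sum(eligible.values())
    allocatable = monthly_usd_cap * (1 - reserve_pct / 100)
    # Proportional share, with small allocations bumped up to the floor.
    caps = {c: max(allocatable * s / total_spend, min_per_center_usd) for c, s in eligible.items()}
    overshoot = sum(caps.values())
    if overshoot > allocatable:  # floor bumps pushed us over: rescale proportionally to fit
        scale = allocatable / overshoot
        caps = {c: v * scale for c, v in caps.items()}
    return {c: round(v, 2) for c, v in caps.items()}

spend = {"client-acme": 180.0, "client-beta": 60.0, "client-gamma": 3.0, "unattributed": 12.0}
print(auto_allocate(500, spend))
# {'client-acme': 333.33, 'client-beta': 111.11, 'client-gamma': 5.56}  (reserve of $50 held back)
```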
Recurring cost projection (budget sanity check)
GET/v1/full-scrape/recurring-costTotal projected monthly cost from all your schedules and digest schedules, with per-item breakdown API KEY
POST/v1/full-scrape/recurring-cost/simulateWhat-if projection — add/remove/change schedules hypothetically and see the delta API KEY
GET/v1/full-scrape/recurring-cost/compare-tiersProject current recurring state against every tier — cheapest-fit recommendation API KEY
GET/v1/full-scrape/recurring-cost/historical-tiers?days=30Retrospective tier analysis — replays past billing events across all tiers API KEY
"Was I on the right tier for the past 30 days?"
Reads your actual
BillingEvents from the window, counts total credits consumed, and replays that volume against every tier's pro-rated quota + overage model to tell you which tier would have been cheapest in hindsight.
Different from /compare-tiers: that's prospective (current schedules → future projection). This is retrospective (past events → what-would-have-been cost). Both answer "which tier?" but from opposite ends of the timeline.
Window: days query param (1-90, default 30). Reads from the BillingEvents log, which retains 30 days of history sliding.
Per-tier fields:
• base_pro_rated_usd — tier base price scaled to window length
• overage_usd — what overage would have cost on this tier
• total_would_have_paid_usd = base + overage
• savings_vs_actual_usd — positive = would have saved, negative = would have cost more
• would_have_fit? — whether the tier accommodates overage
• window_quota — tier quota scaled to window days
• overage_credits — credits above the pro-rated quota
• reason — human-readable explanation
Top-level insights: total_actual_cost_usd (what you paid), best_retrospective_tier, insight (human-readable summary like "You paid $103.74 on Pro. Pro was optimal — any other tier would have cost more").
Use case: monthly billing review. "Should I have been on Business last month?" Returns concrete savings calculation. Combines with /compare-tiers for "where I was vs where I should go".
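A toy version of that replay with a hypothetical tier table; none of the base prices, quotas, or overage rates below come from the docs, they only illustrate the pro-rate-then-overage arithmetic:

```python
# Sketch: "what would each tier have cost for the credits I actually used in this window?"
# Tier prices, quotas and overage rates are HYPOTHETICAL; the real values come from the API.

TIERS = {
    "starter":  {"base_monthly_usd": 29.0,  "monthly_quota": 300,  "overage_per_credit_usd": 0.15},
    "pro":      {"base_monthly_usd": 99.0,  "monthly_quota": 1500, "overage_per_credit_usd": 0.10},
    "business": {"base_monthly_usd": 299.0, "monthly_quota": 6000, "overage_per_credit_usd": 0.08},
}

def replay(credits_used: int, window_days: int, actual_cost_usd: float) -> list[dict]:
    scale = window_days / 30.44  # pro-rate to the window using the 30.44-day month model
    rows = []
    for name, t in TIERS.items():
        window_quota = t["monthly_quota"] * scale
        base_pro_rated = t["base_monthly_usd"] * scale
        overage_credits = max(credits_used - window_quota, 0)
        total = base_pro_rated + overage_credits * t["overage_per_credit_usd"]
        rows.append({"tier": name,
                     "total_would_have_paid_usd": round(total, 2),
                     "savings_vs_actual_usd": round(actual_cost_usd - total, 2)})
    return sorted(rows, key=lambda r: r["total_would_have_paid_usd"])

for row in replay(credits_used=1550, window_days=30, actual_cost_usd=103.74):
    print(row)
```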
"Should I upgrade?" Answered precisely.
This endpoint projects your actual current recurring load (real schedules, real digest schedules, real credit usage)
against every BuyCrowds tier and returns a sorted table with savings vs current and a concrete recommendation.
Different from /cost/compare: that endpoint takes an abstract "requests_per_day" parameter. This one uses your actual state — no guessing, no projections based on hypothetical load.
Per-tier response fields:
• tier, base_monthly_usd, projected_extra_usd, projected_total_usd
• savings_vs_current_usd — negative = would cost more, positive = would save
• fits? — boolean "does this tier accommodate my current load?"
• quota_fits? — credit quota check
• rate_limit_ok? — daily rate limit check
• reason — human-readable fit explanation
• full_scrapes_quota — tier's monthly credit allowance
Top-level: current_tier, current_total_usd, cheapest_fit (tier id), recommendation (human-readable string).
Enterprise handling: tier has nil pricing → fits?: :unknown, projected_total_usd: nil, reason: "custom-priced, contact sales". Doesn't corrupt the recommendation for the self-serve tiers.
Use case: Pro user sees recurring-cost showing $174/month ($99 base + $75 overage). Calls compare-tiers → sees Business would be $299 (no overage but more headroom). Recommendation: "Stay on Pro — Business would cost $125 more without meaningful quota gain at current load". Decision made in one call.
Pure compute over existing stores. Same pattern as the other /recurring-cost/* endpoints — no new state, no side effects.
Non-destructive preview. Propose mutations in the body, see the hypothetical
projection, decide whether to commit. Nothing is changed. Pure compute.
Body (all fields optional):
• add_schedules — array of hypothetical schedules: [{"username": "alice", "interval_seconds": 86400, "template": "influencer_kit"}, ...]
• remove_schedule_ids — array of existing schedule ids to simulate removing: ["fss_abc", "fss_xyz"]
• change_intervals — map of schedule id → new interval: {"fss_abc": 3600}
Response shape:
• hypothetical — same structure as GET /recurring-cost but computed with the mutations applied
• current_baseline — actual current state for comparison
• delta — runs/credits/cost changes (positive = increase, negative = decrease)
• mutations_applied — counts of added/removed/changed
• would_exceed_cap? — boolean, true if hypothetical projection crosses your budget cap
Use case: "I want to add 20 creators to my TopAthletes watchlist as daily schedules. Will I stay under my $200 budget cap?" → call simulate with 20 hypothetical add_schedules, check would_exceed_cap?, commit or adjust before hitting the real endpoints.
Bulk scenario planning: combine removals + additions to see the net impact of a portfolio rotation: "remove 5 old creators, add 10 new ones at 6h interval instead of 24h".
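A pre-commit sketch of that check, building the simulate body from a list of new creators and gating on would_exceed_cap? (auth header illustrative, body fields as documented above):

```python
# Sketch: "will adding these daily schedules blow my budget cap?" asked before committing.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # illustrative header name

def would_fit(new_usernames: list[str], interval_seconds: int = 86_400) -> bool:
    body = {"add_schedules": [{"username": u, "interval_seconds": interval_seconds}
                              for u in new_usernames]}
    resp = requests.post(f"{BASE}/recurring-cost/simulate", json=body, headers=HEADERS, timeout=10)
    resp.raise_for_status()
    sim = resp.json()
    print("projected delta:", sim["delta"], "| exceeds cap:", sim["would_exceed_cap?"])
    return not sim["would_exceed_cap?"]

if would_fit([f"creator_{i:02d}" for i in range(20)]):
    print("safe to create the schedules for real")
```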
"How much will my set-and-forget actually cost me this month?"
Single pass over all your schedules + digest schedules, extrapolates to monthly cost based on interval,
sums up the damage.
Response sections:
• current — credits used this cycle, quota, headroom
• schedules — per-schedule and aggregate runs/month, credits/month, internal cost, overage cost
• digest_schedules — delivery counts (no credits, just webhook bandwidth)
• budget — base_monthly_usd + projected_extra_usd = projected_total_usd, plus headroom and will_exhaust_quota? flag with days_until_exhaust countdown
Month model: 30.44 days (365.25 / 12) — realistic average that handles 30- and 31-day months without user confusion.
Mode hints per schedule: "fits in quota", "may trigger overage when combined with ad-hoc usage", or "included (unlimited tier)". Lets the UI flag risky schedules without doing tier math client-side.
Use case: Pro user has 8 daily schedules + 2 weekly digests. Opens this endpoint before creating a 9th schedule. Sees "3500 runs/month projected, quota 1500, extra cost $400". Decides to upgrade to Business instead of blowing $400 in overage.
Pure compute, zero new state. Composes Schedules + DigestSchedules + Billing + Credits. Same pattern as /cost/forecast but focused on recurring-only spend.
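The projection in miniature, using the 30.44-day month model; one credit per run and the quota figure are placeholders:

```python
# Sketch: runs-per-month projection with the 30.44-day month model described above.
# Assumes one credit per run; the quota value is a placeholder.

DAYS_PER_MONTH = 365.25 / 12  # 30.44

def runs_per_month(interval_seconds: int) -> float:
    return DAYS_PER_MONTH * 86_400 / interval_seconds

intervals = [21_600] * 25  # e.g. 25 creators scraped every 6 hours
projected_runs = sum(runs_per_month(i) for i in intervals)
quota = 1500
overage_credits = max(projected_runs - quota, 0)
print(f"{projected_runs:.0f} runs/month projected, quota {quota}, overage {overage_credits:.0f} credits")
```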
Scrape templates (reusable presets)
POST/v1/full-scrape/templatesCreate a named preset — platforms, timeout, byok defaults API KEY
GET/v1/full-scrape/templates?include_system=trueList your templates (optionally including built-in system presets) API KEY
GET/v1/full-scrape/templates/systemList the 4 always-available built-in system presets API KEY
GET/v1/full-scrape/templates/:idSingle template (works for both user ids and _system:name ids) API KEY
PATCH/v1/full-scrape/templates/:idUpdate preset fields (rename, change platforms, etc.) API KEY
DELETE/v1/full-scrape/templates/:idRemove a template — does not affect existing schedules that referenced it API KEY
Define once, reuse everywhere. Create a template like
"influencer_kit"
(platforms: instagram+tiktok+youtube) and reference it in any scrape call via ?template=influencer_kit
or {"template": "influencer_kit"} in the body. Works on single scrapes, batches, schedules,
and watchlist scrapes.Resolution order: caller's explicit params ALWAYS win over template defaults. Template is a fallback. Pass
platforms explicitly and the template's platforms are ignored.Lookup: reference by id (
tpl_abc) or by name (influencer_kit).
Name lookups are scoped per api_key — two different customers can both have a template called "default".Body params:
name (required, alphanumeric + underscore/dash, 1-64 chars, unique per key),
description (optional, max 300 chars),
platforms (list of platform atoms or "all"),
timeout_ms (5000-120000, optional),
byok_defaults (map of platform → credentials, optional).Limits: max 20 templates per api_key. BYOK tokens stored in template are echoed back by key name only (not value) — credentials never leave via GET.
Built-in system templates: 4 presets shipped with the platform, accessible by every api_key without any setup:
•
_system:default — all platforms, 45s timeout•
_system:influencer_kit — Instagram + TikTok + YouTube•
_system:tech_creator — GitHub + Reddit + Twitter•
_system:music_creator — Spotify + YouTube + TikTok + TwitterReference any of these by name:
{"template": "_system:influencer_kit"}. Use GET /templates/system
to list them, or GET /templates?include_system=true to see user + system in one list. Zero setup needed for a first-use user.Works across: single
POST /full-scrape/:username (sync/async),
batch POST /full-scrape/batch, scheduled scrapes POST /full-scrape/schedules, and
bulk watchlist scheduling POST /watchlists/:id/schedules. Scheduled scrapes persist the template
reference — the Scheduler tick re-resolves on every run, so updates to the template propagate to all future
fires automatically. Mutable templates, immutable schedules.Example:
POST /v1/full-scrape/neymarjr?async=true&template=influencer_kit runs an async scrape
with the template's platforms + timeout, no need to re-list them every time.Delete semantics: deleting a template does NOT cascade to referencing schedules. Those schedules keep running with caller-only defaults (no platforms filter = scrape all). Graceful degradation — you don't break 50 schedules by removing a misnamed template.
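A create-then-use sketch with the documented template body params; the auth header name is illustrative:

```python
# Sketch: create a reusable template once, then reference it by name on every scrape.
import requests

BASE = "https://api.buycrowds.com/v1/full-scrape"
HEADERS = {"X-API-Key": "YOUR_API_KEY"}  # illustrative header name

# Create the preset (name, description, platforms, timeout_ms are documented body params).
requests.post(f"{BASE}/templates",
              json={"name": "influencer_kit",
                    "description": "IG + TikTok + YouTube, 60s timeout",
                    "platforms": ["instagram", "tiktok", "youtube"],
                    "timeout_ms": 60_000},
              headers=HEADERS, timeout=10).raise_for_status()

# Reference it on an async scrape; explicit params would still override the template defaults.
job = requests.post(f"{BASE}/neymarjr",
                    params={"async": "true", "template": "influencer_kit"},
                    headers=HEADERS, timeout=10).json()
print(job)
```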
Watchlists (portfolio abstraction)
POST/v1/full-scrape/watchlistsCreate a named collection of usernames with optional tags and description API KEY
POST/v1/full-scrape/watchlists/from-top-creators?limit=20&since=2026-04-01T00:00:00ZAuto-populate a watchlist with the top-N creators by cost from a billing window API KEY
Auto-populate from spend history (iter 107).
Composition of iter 80 (
cost_per_creator aggregator) + Watchlists.create. Instead of hand-picking usernames, let the endpoint infer them from actual spend — "the creators I already pay the most for". Useful for:
• Creating a "biggest cost drivers" list to monitor with tighter observation (anomaly alerts, schedules, manual review)
• Bootstrapping a client baseline — at the end of a pilot month, auto-populate a watchlist from everyone actually scraped and lock it in as the official list
• Discovering unexpected top spenders — "wait, why is @xyz in my top 20?" surfaces creators that leaked into the workflow without explicit tracking
Parameters: name (default "Top creators"), limit (1-100, default 20), since/until (ISO8601, default month-to-date), cost_center (optional filter — scope to one client's top creators).
Response shape: {watchlist: {id, name, usernames, created_at, ...}, seeded_from: {creator_count, since, until, cost_center_filter, creators: [...]}, note}. The seeded_from block includes the full aggregator result (cost per creator, event count, etc.) so the caller can see WHY each creator made the list.
Next steps — the response note lists three:
• POST /v1/full-scrape/watchlists/:id/scrape — scrape everyone in the new list as a single batch
• POST /v1/full-scrape/watchlists/:id/schedules — create recurring schedules for each member
• DELETE /v1/full-scrape/watchlists/:id — remove if the auto-populated list isn't what you wanted
Use case — end-of-month baseline: agency onboards client-acme in March. By end of month, they've scraped 150 different creators. Some were intentional tracking, some were ad-hoc explorations. To lock in the official baseline for April, they call POST /v1/full-scrape/watchlists/from-top-creators?cost_center=client-acme&limit=50&since=2026-03-01T00:00:00Z. The top 50 by spend (the ones actually being monitored regularly) become the official watchlist. Outliers and one-off scrapes don't make the cut. Next month's recurring schedules fire off this watchlist.
GET/v1/full-scrape/watchlistsList your watchlists with member counts and limits API KEY
GET/v1/full-scrape/watchlists/:idSingle watchlist with full member list and metadata API KEY
PATCH/v1/full-scrape/watchlists/:idRename, re-tag, or incrementally add/remove members via add_usernames/remove_usernames API KEY
DELETE/v1/full-scrape/watchlists/:idDelete a watchlist (members remain scrapeable individually) API KEY
POST/v1/full-scrape/watchlists/:id/scrapeBatch-scrape every member — supports async + webhook, reuses batch infra API KEY
GET/v1/full-scrape/watchlists/:id/trends?metric=followers&days=30Portfolio trend rollup — top growers, top decliners, total velocity API KEY
GET/v1/full-scrape/watchlists/:id/digest?days=7&metric=followersFull portfolio briefing — snapshots + trends + anomalies + billing + schedule health in one call API KEY
The daily briefing endpoint. Answers "what happened with my creators in the last N days?"
in a single composed response. Everything an agency account manager needs in one tab.
Response sections:
• snapshots — count per member, active vs silent
• trends — top 10 movers, top 10 laggards, status counts
• anomalies — total + by_severity/by_direction + 10 most extreme
• billing — total cost + per-member breakdown (filtered to watchlist members)
• schedules — active/inactive counts + members without coverage + last errors
Query params: days (1-30, default 7), metric (default followers). Scoped per api_key.
Performance: ~4×N per-member reads where N = watchlist size. Typical latency < 50ms at max 50 members. Cacheable via Idempotency-Key header for expensive windows.
Use case: Monday morning — manager opens the dashboard, hits the digest endpoint for their 3 watchlists, sees: "TopAthletes gained 1.2M total followers this week with 2 high-severity spikes (neymarjr went viral), NewClients had 3 schedule errors, ContentBrands spent $18.60 in included credits."
POST/v1/full-scrape/watchlists/:id/schedulesBulk-create schedules for every member in one call (Pro+ only) API KEY
The agency-level abstraction. Individual endpoints work on one username at a time.
Watchlists let you treat 10-50 creators as a portfolio — scrape them all with one call, see aggregate growth
trends, create schedules in bulk, track performance rollups.
Composition, not duplication. Under the hood, watchlist endpoints delegate to the existing primitives: /scrape calls the batch endpoint with the member list, /schedules loops Schedules.create, /trends calls Trends.analyze per member and rolls up the results. No new scraping logic, no new billing logic.
Limits: max 10 watchlists per API key, max 50 members per watchlist, max 100 chars in name, max 500 chars in description, max 20 tags.
Trend rollup shape: total_slope_per_day, total_slope_per_week, top_growers (5 best growers by slope), top_decliners (5 worst), counts per status (trending/flat/noisy/insufficient). Ready for an "agency dashboard" card.
Incremental membership updates: use add_usernames or remove_usernames in PATCH to avoid sending the full member list on every edit.
Use case: "I manage 25 creators, I want to scrape them all daily, see which ones are trending up this week, and get alerted when any of them has a viral event." → create watchlist, bulk-schedule all members daily, register anomaly alert with username="*". Three API calls and you're done.
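For example, an incremental membership edit might look like this (watchlist id and auth header are illustrative):
curl -X PATCH "https://api.buycrowds.com/v1/full-scrape/watchlists/wl_123" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"add_usernames": ["newcreator"], "remove_usernames": ["churned_creator"]}'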
Webhook delivery health
GET/v1/full-scrape/webhook-healthList health records for all your webhook subjects (schedules, digest_schedules, anomaly_alerts, cost_alerts) API KEY
POST/v1/full-scrape/webhook-health/resetReset consecutive_failures after fixing a broken webhook URL API KEY
POST/v1/full-scrape/webhook-health/testFire a synthetic test payload sync — validates a URL without waiting for the next tick API KEY
GET/v1/full-scrape/webhook-deliveries?subject_type=cost_alert&limit=50Attempt-level inspection of the Oban retry queue — see individual delivery states, not just health aggregates API KEY
Attempt-level visibility (iter 86). The /webhook-health endpoint gives you the aggregate "is this subject healthy or dead" signal. When you need to dig into WHY a subject is degrading — which retries are still in flight, which attempt just failed, what error came back from the upstream — /webhook-deliveries queries the Oban webhook_retry queue directly and returns per-attempt state.
Scoping: queries are filtered via args->>'api_key_id' = <caller> in the oban_jobs table (same JSONB pattern as iter 78 deferred batch listing). No cross-tenant leakage possible — Oban job IDs from other api_keys are invisible.
States returned: scheduled (waiting for next backoff), available (ready to run, waiting for worker), retryable (failed, will retry), executing (running now), discarded (exhausted all 5 attempts), cancelled (manually cancelled). Filter by ?subject_type=cost_alert / anomaly_alert / schedule / digest_schedule / deferred_batch to narrow down.
Response shape: per-attempt {oban_job_id, state, subject_type, subject_id, webhook_url, attempt, max_attempts, scheduled_at, inserted_at, attempted_at, last_error}. The last_error is extracted from Oban's errors column — typically an HTTP status code, timeout, or DNS failure. Plus a top-level by_state count breakdown for quick dashboard rendering.
Debugging workflow:
1. /webhook-health → spot a subject with elevated consecutive_failures
2. /webhook-deliveries?subject_type=cost_alert → see the actual in-flight retries + last errors
3. Fix the root cause (DNS, TLS cert, server response code)
4. /webhook-health/test → verify with a synthetic payload
5. /webhook-health/reset → re-enable the subject (clears consecutive_failures)
Use case — midnight Slack alert: your CostAlerts stop firing overnight. Morning coffee, you call GET /v1/full-scrape/webhook-deliveries?subject_type=cost_alert and see 3 deliveries in discarded state with last_error: "502 Bad Gateway" pointing at your Slack webhook. Slack had an outage — you reset the subject, fire a test, and the retry queue drains normally on the next alert trigger.
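The first two steps of that workflow as sketched calls (the Authorization header is illustrative):
# Aggregate health check, then attempt-level drill-down for cost alerts
curl "https://api.buycrowds.com/v1/full-scrape/webhook-health" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"
curl "https://api.buycrowds.com/v1/full-scrape/webhook-deliveries?subject_type=cost_alert&limit=50" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"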
Centralized health tracking across 4 webhook systems.
Every webhook delivery (scrape, digest, anomaly alert, cost alert) is recorded against a
(subject_type, subject_id) tuple in a shared store. Track failure counts, last success/failure timestamps, and auto-disable behavior for dead endpoints.
Status lifecycle:
• :healthy — last delivery succeeded, or no failures recorded
• :degraded — 1-4 consecutive failures, still retrying on next tick
• :dead — 5+ consecutive failures, source config auto-disabled to stop wasting resources
Auto-disable behavior: when a subject crosses 5 consecutive failures, the corresponding config (schedule, digest_schedule, anomaly_alert, cost_alert) gets enabled: false automatically. Ticks still check, but skip dead subjects without firing. User must manually reset.
Reset flow: fix your webhook URL → POST /webhook-health/reset with {"subject_type": "digest_schedule", "subject_id": "dsch_xxx"} → also PATCH enabled=true on the source config to resume deliveries.
Subject types: schedule, digest_schedule, anomaly_alert, cost_alert (for cost alerts, subject_id = api_key_id).
Response fields: consecutive_failures, total_failures, total_successes, last_success_at, last_failure_at, last_failure_reason, auto_disabled_at, recent_attempts (ring buffer of last 20 attempts with at, outcome, reason). Ordered by severity then recency.
Test endpoint: POST /webhook-health/test fires a synthetic event: "webhook_test" payload synchronously and returns the delivery outcome with latency. Two modes:
• By subject: body {"subject_type": "digest_schedule", "subject_id": "dsch_xxx"} — looks up the existing config, fires through its webhook_url, records the attempt in health tracking
• By raw URL: body {"webhook_url": "https://my-new-endpoint.com/hook"} — validates + fires a test payload to an arbitrary URL (pre-commit validation, not recorded in health)
Recovery flow: PATCH webhook_url → test it → reset health → PATCH enabled=true. Four calls total, no more waiting days for the next tick to validate your fix.
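That recovery flow, sketched against a broken digest schedule (dsch_xxx, the replacement URL, and the auth header are placeholders; the enabled toggle on PATCH is assumed from the pause/resume description above):
curl -X PATCH "https://api.buycrowds.com/v1/full-scrape/digest-schedules/dsch_xxx" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"webhook_url": "https://hooks.example.com/new-endpoint"}'
curl -X POST "https://api.buycrowds.com/v1/full-scrape/webhook-health/test" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"subject_type": "digest_schedule", "subject_id": "dsch_xxx"}'
curl -X POST "https://api.buycrowds.com/v1/full-scrape/webhook-health/reset" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"subject_type": "digest_schedule", "subject_id": "dsch_xxx"}'
curl -X PATCH "https://api.buycrowds.com/v1/full-scrape/digest-schedules/dsch_xxx" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"enabled": true}'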
Scheduled digest delivery (push-based briefings)
POST/v1/full-scrape/digest-schedulesRegister a recurring digest delivery — every N seconds, HMAC-signed webhook with the full watchlist digest API KEY
GET/v1/full-scrape/digest-schedulesList your digest schedules with limits and run counts API KEY
GET/v1/full-scrape/digest-schedules/:idSingle digest schedule with payload example and signature header docs API KEY
PATCH/v1/full-scrape/digest-schedules/:idPause/resume or change interval, days_window, webhook_url, metric API KEY
DELETE/v1/full-scrape/digest-schedules/:idStop future digest deliveries API KEY
Push-based version of
/watchlists/:id/digest. Register once, get the full portfolio briefing delivered to your webhook on whatever schedule you want. Most common use case: interval_seconds=604800 (weekly) pointed at a Slack channel.
Zero credit consumption. Digest is pure compute over existing stores (snapshots, trends, anomalies, billing, schedules). Delivery costs are just webhook bandwidth.
Body: watchlist_id (required, owned by you), webhook_url (required, https only), interval_seconds (required, min 3600 = 1 hour), days_window (1-30, default 7 — how much history the digest covers), metric (default followers — which metric trends use).
Limits: Pro tier or higher required, max 5 digest schedules per API key, min interval 1 hour.
Architecture: DigestScheduler GenServer ticks every minute, finds due schedules via DigestSchedules.due_now/0, dispatches each in its own supervised Task that composes the digest and delivers the webhook. A slow webhook cannot block other digests — isolation per config.
Payload wrapping: event: "watchlist_digest", digest_schedule_id, watchlist_id, digest (the full briefing payload identical to GET /watchlists/:id/digest), delivered_at.
Use case: "Every Monday 9am send my agency watchlist digest to this Slack webhook." Set it once, forget it, receive briefings forever. Set-and-forget monitoring at the portfolio level.
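A weekly Slack registration sketched with the documented body params (watchlist id, Slack URL, and auth header are placeholders):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/digest-schedules" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"watchlist_id": "wl_123", "webhook_url": "https://hooks.slack.com/services/XXX", "interval_seconds": 604800, "days_window": 7, "metric": "followers"}'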
Proactive anomaly webhooks (event-driven)
POST/v1/full-scrape/anomaly-alertsRegister a webhook that fires whenever a captured snapshot contains a new anomaly API KEY
GET/v1/full-scrape/anomaly-alertsList your registered anomaly alerts with fire counts API KEY
GET/v1/full-scrape/anomaly-alerts/:idSingle alert config with payload example and signature header docs API KEY
DELETE/v1/full-scrape/anomaly-alerts/:idRemove an anomaly alert config API KEY
Pull → push. The
/snapshots/anomalies endpoint requires polling. This one registers a webhook and pushes the same data to you event-driven as snapshots land.
Architecture: subscribes to the internal snapshots:captured PubSub topic. Every snapshot insertion triggers anomaly detection over the last 30 days; a webhook fires per NEW anomaly (dedup via the fired_snapshot_ids set, capped at 500 entries).
Body: username (required, or "*" for any), metric (default followers), sigma_threshold (1.0-10.0, default 2.5), severity_filter (low/medium/high, default low), platforms (comma list or "all"), cooldown_seconds (0-86400, default 300 — prevents webhook spam during bursts), trigger_type (point_anomaly/changepoint, default point_anomaly — see below), webhook_url (required, https only, signed with HMAC-SHA256).
Trigger types:
• point_anomaly (default): residual-based outlier detection. Fires when today's value deviates ≥ sigma_threshold σ from the OLS-predicted value. Good for catching viral spikes, creator hacks, bulk deletions — point-in-time shocks.
• changepoint: RSS-minimizing split detection (recursive binary segmentation). Fires when a creator's growth trajectory shifts — acceleration, deceleration, or reversal — not just one outlier day. Better for "this creator is losing steam" or "this creator just started going viral" style signals. Dedupes via last_changepoint_at: only fires for changepoints strictly newer than the last one seen. Payload event becomes growth_changepoint_detected and includes pre/post slopes (per day + per week), slope_change, rss_improvement, and direction (acceleration/deceleration/reversal).
Cooldown behavior: after firing, the config is "cooling down" for N seconds. Any new anomalies detected during that window are silently skipped. Prevents your Slack channel from being flooded when a creator has 10 simultaneous platform anomalies during a viral event. Set cooldown_seconds: 0 to fire every detection (legacy behavior).
Payload (point_anomaly): event: "snapshot_anomaly_detected", alert_id, username, full anomaly (platform, direction, severity, residual_sigma, actual, predicted), triggering_snapshot, occurred_at.
Payload (changepoint): event: "growth_changepoint_detected", alert_id, username, full changepoint (platform, direction, changepoint_at, pre_slope_per_day, post_slope_per_day, pre_slope_per_week, post_slope_per_week, slope_change, slope_change_pct, rss_improvement, pre_samples, post_samples), triggering_snapshot, occurred_at.
Use case: "Tell my Slack when @neymarjr has a viral event" → schedule daily scrape + anomaly alert with severity=medium + webhook to Slack → automated creator intel, zero polling.
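A registration sketch for that Slack use case (webhook URL and auth header are placeholders):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/anomaly-alerts" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"username": "neymarjr", "metric": "followers", "severity_filter": "medium", "trigger_type": "point_anomaly", "cooldown_seconds": 300, "webhook_url": "https://hooks.slack.com/services/XXX"}'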
Scrape diff (compare two jobs over time)
POST/v1/full-scrape/diffCompare two completed scrape jobs — per-platform deltas, aggregate follower change, direction counts API KEY
The killer feature for scheduled scrapes. Pass two
job_ids of completed scrapes and get back a structured diff showing exactly what changed between them.
Body: {"job_id_a": "fs_xxx", "job_id_b": "fs_yyy"} — convention is A = older, B = newer, but not enforced. Both jobs must belong to you and be :complete.
Per-platform changes detected: :changed (metrics diffed), :added (only in newer), :removed (only in older), :recovered (failed → ok), :failed_in_b (ok → failed), :still_failing.
Metrics extracted: followers, following, posts, engagement_rate, video_count, subscribers. Each delta has {from, to, abs, pct, direction}. K/M suffixed strings ("12.5K") are parsed automatically.
Aggregate shape: total_follower_change, total_follower_change_pct, counts of platforms improved/declined/flat/added/removed, plus status transitions. Ready to render a "since last week" summary card.
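For example, diffing last week's scheduled run against this week's (job ids come from your own job history; auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/diff" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"job_id_a": "fs_xxx", "job_id_b": "fs_yyy"}'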
Recurring schedules (Pro+ only)
POST/v1/full-scrape/schedulesCreate a recurring scrape — runs every N seconds, fires webhook on completion API KEY
GET/v1/full-scrape/schedulesList your schedules with tier eligibility info API KEY
PATCH/v1/full-scrape/schedules/:idPause/resume, change interval or webhook url API KEY
DELETE/v1/full-scrape/schedules/:idDelete a schedule — stops future runs API KEY
POST/v1/full-scrape/schedules/pause-allBulk-disable every schedule for the api_key — emergency stop for incident response API KEY
POST/v1/full-scrape/schedules/resume-allBulk re-enable counterpart of pause-all — recurrence timing resumes from each schedule's next_run_at API KEY
Bulk pause/resume (iter 102). Single UPDATE query toggles enabled on every schedule for the calling api_key. Idempotent — safe to retry. Returns affected_count so the caller knows how many rows changed.
Pairs with /budget-cap/freeze-bulk (iter 101) for a complete emergency stop during incidents:
1. POST /v1/full-scrape/budget-cap/freeze-bulk {"cost_centers": [...]} — all enforcement flips to 402 for the frozen clients
2. POST /v1/full-scrape/schedules/pause-all — all recurring jobs stop firing new runs
3. Investigation happens
4. POST /v1/full-scrape/schedules/resume-all — schedules pick up from where they left off (recurrence timing isn't reset; each schedule's individual next_run_at governs the first run after resume)
5. POST /v1/full-scrape/budget-cap/unfreeze-bulk {"cost_centers": [...]} — spending resumes
Total: 4 API calls to pause and resume across N clients, regardless of how many schedules or cost centers are involved.
In-flight runs caveat: schedules that were ALREADY enqueued into the Oban cron queue before the pause may still fire. Pause prevents FUTURE enqueues but doesn't cancel runs mid-execution. To cancel an in-flight batch, use DELETE /v1/full-scrape/jobs/:id (single) or DELETE /v1/full-scrape/orchestrations/:id (bulk, iter 99).
Individual control still available: PATCH /v1/full-scrape/schedules/:id continues working for fine-grained per-schedule toggles. Pause-all is an operator-level convenience, not a replacement.
Body params for POST /schedules:
username (required) — the creator to scrape
interval_seconds (required, min 300) — how often to re-run
webhook_url (optional) — https URL that receives each result (HMAC-signed)
platforms (optional) — restrict scope, e.g. "instagram,tiktok"
Constraints: Pro tier or higher required, max 10 schedules per API key, min interval 5 min. Each run consumes one credit. If credit quota is exhausted, the schedule records an error but stays enabled and retries next cycle.
Volatility: Schedules are ETS-backed and lost on BEAM restart. DB persistence is future work. The scheduler ticks every 30 seconds, so drift from the exact interval can be up to 30s.
Use case: "Track @neymarjr every 6 hours and webhook my Slack" → daily monitoring without polling.
Webhook callbacks (eliminate polling)
Pass
webhook_url (https only, no localhost/private IPs) when submitting an async scrape and BuyCrowds will POST the result to that URL when the job finishes — success or failure.
Payload signing: Every delivery carries an X-BuyCrowds-Signature: sha256=<hex> header where the hex is HMAC-SHA256(api_key.key, raw_body). Verify it before trusting the payload.
Other headers: X-BuyCrowds-Event (full_scrape.completed / .failed), X-BuyCrowds-Job-Id, X-BuyCrowds-Delivery-Attempt (1 or 2).
Retries: 1 retry after 500ms on network error or non-2xx response. Your handler must be idempotent — we don't do deduplication.
Timeout: 10s per attempt. Slow handlers get marked as failed in the jobs store but the result is still readable via polling.
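A minimal verification sketch with openssl, assuming the raw webhook body was saved to payload.json, $BUYCROWDS_API_KEY holds the signing key, and $received_signature holds the X-BuyCrowds-Signature header value:
# Recompute the HMAC over the raw body and compare it to the received header
expected="sha256=$(openssl dgst -sha256 -hmac "$BUYCROWDS_API_KEY" < payload.json | awk '{print $NF}')"
[ "$expected" = "$received_signature" ] && echo "signature ok" || echo "reject payload"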
Activity timeline (unified chronological feed)
GET/v1/full-scrape/activity?since=2026-04-09T00:00:00Z&cost_center=client-acme&limit=100Chronologically merged feed — billing events + job archives in one sorted stream API KEY
"What happened?" Merges two data sources into one
time-sorted feed:
• Billing events — per-scrape charges, overages, refunds, scheduled runs (iter 50+)
• Job archives — terminal-state batches from both immediate (iter 71) and deferred (iter 79) execution paths
Why merge them: /cost/events shows individual event charges. /jobs/history shows batch-level job state. Neither alone tells the full story of "what happened in the last hour" — you see charges without knowing which batch they belonged to, or batches without the intermediate charge timeline. Activity feed interleaves both for operator dashboards and compliance review.
Each entry: {source, occurred_at, summary, ...}. The source tag is either "billing_event" or "job_archive", letting UIs render distinct icons. The summary string is a human-readable headline for the entry ("Scraped @neymarjr · client-acme", "Batch complete — 20 user(s)", "Refunded @lewis.hamilton").
Defaults + filters: default window is the last 24 hours. Override with ?since=...&until=... (ISO8601 UTC). Filter by ?cost_center=client-acme to scope to a single client. ?limit= caps entries (default 100, max 500).
counts_by_source rollup: response includes {billing_event: N, job_archive: M} for quick metric rendering.
Use case — compliance audit: legal asks "show me everything that happened for client-acme between April 1 and April 7". One call: GET /v1/full-scrape/activity?since=2026-04-01T00:00:00Z&until=2026-04-07T23:59:59Z&cost_center=client-acme&limit=500. Returns every scrape, every batch, every refund in chronological order — ready to paste into a compliance report without stitching events + jobs manually.
Use case — operator live dashboard: ops team's live view polls /v1/full-scrape/activity?since=<5min ago>&limit=50 every 30 seconds. The feed shows scrape activity + batch completions in real time. When something unusual happens (spike of refunds, deferred batch fires), it shows up in the feed without requiring a separate query per source.
Creator dashboard (unified operational view)
GET/v1/full-scrape/creators/:usernameOne-call dashboard — quarantine status + latest snapshot + 30d activity + recommendation API KEY
GET/v1/full-scrape/creators/compare?usernames=a,b,cSide-by-side snapshot comparison for 2-5 creators — zero credits, pure cache read API KEY
Creator comparison (iter 95). Pass 2-5 usernames and
get a side-by-side view of their latest snapshots. Uses the same
fresh_snapshot/3 helper as iter 57 (max_age) and iter 84 (creator dashboard) so results reflect the same cache state — zero credits, zero billing events, zero Apify calls.
Parameters: ?usernames=neymarjr,lewis.hamilton,zendaya — comma-separated, max 5, duplicates deduped. Minimum 2 (for a single creator use /creators/:username instead).
Response shape: {requested, with_snapshot, without_snapshot, shared_platforms, comparisons, missing_usernames, note}. Each comparison entry is either {username, snapshot: {id, captured_at, age_seconds, per_platform}} or {username, snapshot: null, hint}.
shared_platforms: computed as the intersection of platforms across all creators that DO have a snapshot. Lets the client render only the columns that exist for everyone — avoids the "tiktok metrics for 2 creators, instagram for 3" display problem. Empty list if nothing is shared.
Missing snapshots: creators without a snapshot in the last 30 days appear in missing_usernames and their comparison entry has snapshot: null + a hint pointing to POST /v1/full-scrape/<username>. The caller sees which ones need a fresh scrape before the comparison is complete.
Use case — creator vetting: agency considering 3 potential creators for a campaign. Call /v1/full-scrape/creators/compare?usernames=creator_a,creator_b,creator_c. Response returns side-by-side follower counts, engagement rates, and post counts across their shared platforms. Account manager compares in 10 seconds without running individual scrapes (if they're already cached from recent scheduled jobs).
Pairs with /creators/:username: use compare for quick multi-creator views, and drill into a single creator via the dashboard endpoint when you need quarantine status, activity history, and scrape recommendation.
Everything about a creator in one call. Composes
quarantine state (iter 69), latest cached snapshot (iter 57), 30-day event history (iter 82
username filter), reliability math (iter 68), and a scrape recommendation into a single
response. Agencies reviewing a specific creator get operational context without stitching
5 endpoints together.
Response shape:
• username — normalized (lowercase, trimmed)
• quarantine — {flagged, details: {refund_count, total_refund_usd, first_refund_at, last_refund_at}}
• latest_snapshot — most recent non-expired snapshot (up to 30 days old), or null. Includes id, captured_at, age_seconds, and the full per_platform blob
• activity_30d — {total_events, total_attempts, included, overage, refunded, total_cost_usd, success_rate, success_rate_pct, first_event_at, last_event_at}
• recommendation — {action, severity, reason, fresh_snapshot_available}
• drill_down — ready-to-use URLs for events log, CSV export, and cached probe scoped to this creator
Recommendation decision tree (priority order):
1. skip (severity high) — creator is quarantined. Reason explains refund count + how to fix.
2. use_cached (severity low) — fresh snapshot within 5 minutes exists. Hints at the ?max_age_seconds=300 query to hit the zero-credit cache path.
3. proceed_with_caution (severity medium) — success rate under 70% in the last 30 days. Scraping will likely trigger refunds — investigate upstream before committing.
4. proceed (severity low) — creator is healthy, no quarantine, no recent failures.
Use case — pre-flight check before ad-hoc scrape: account manager gets a request from client "can you grab fresh data on @neymarjr?". Instead of firing a scrape blind, they GET /v1/full-scrape/creators/neymarjr first. Response says "use_cached — fresh snapshot from 2 minutes ago". They return the cached data to the client at zero cost, zero credits consumed. Or if the response says "skip — quarantined with 5 refunds", they warn the client that the account needs attention before re-scraping.
Pairs with /v1/cost/creators: the creators aggregator (iter 80) gives you the top-N view of your entire portfolio. This endpoint drills into ONE specific creator. Click through an aggregator row → single-creator dashboard for full context.
Creator quarantine (skip known-bad creators)
GET/v1/full-scrape/quarantine?window_days=7&min_refunds=3List creators with 3+ refunds in the last 7 days — derived live from billing events API KEY
Pay attention, not credits. When a creator has been failing
repeatedly (account went private, platform blocking, stale handle), you're burning credits +
triggering refunds on every batch that includes them. Quarantine surfaces those creators so you
can drop them from future batches before they cost you anything.
Derivation: pure query over :full_scrape_refund events. A creator is flagged when they accumulated min_refunds (default 3) refunds in the last window_days (default 7). Both configurable via query params — max 30 days / 20 refunds. No stored state; the flag auto-expires as refund events slide out of the window.
Response shape: per-creator {username, refund_count, total_refund_usd, first_refund_at, last_refund_at} sorted by refund_count descending. Plus a policy section showing the current thresholds and a usage section explaining how to act on the list.
Pre-flight integration: every pre-flight response now includes a quarantine section at the top level — flagged_in_batch lists which of the batch's usernames are currently flagged, and each per_username entry gains quarantined: true/false + quarantine_info with refund history. Agencies see what's risky before they commit.
Opt-in skip on batch: add ?skip_quarantined=true to POST /v1/full-scrape/batch and the controller drops flagged creators from the batch BEFORE the authorization check + BEFORE credit consumption. Skipped creators cost nothing (no credit, no scrape, no refund). If every username in the batch is quarantined, the endpoint returns 422 all_usernames_quarantined with the list of dropped creators. If you want to force-run despite the warning, omit the flag — quarantine is advisory by default, not enforcing.
Clearing: no manual clear endpoint. Quarantine is derived, not stored. Fix the underlying issue (account restored, platform block lifted) and wait for the 7-day window to slide past the last refund event. The creator drops off the list naturally as the sliding window advances. If you need an immediate override, change the thresholds via query params on the /quarantine endpoint for your specific call — the batch skip-quarantined filter uses the default thresholds.
Use case — weekly refresh protection: an agency runs a weekly 20-creator batch for each client. Over time, 2-3 creators reliably fail (private accounts, stale handles). Adding ?skip_quarantined=true to the batch call means the weekly refresh automatically stops burning credits on those creators, and a Slack alert wired to /v1/full-scrape/quarantine tells the account manager which clients need handle updates.
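Sketched: check the quarantine list, then run the weekly batch with the skip flag (usernames and auth header illustrative):
curl "https://api.buycrowds.com/v1/full-scrape/quarantine?window_days=7&min_refunds=3" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch?skip_quarantined=true" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "cost_center": "client-acme"}'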
Conditional on-demand (freshness-aware, credit-saving)
GET/v1/full-scrape/cached/:username?max_age_seconds=3600Zero-credit cache probe — returns the latest snapshot if fresh, 404 if not API KEY
Pay only for stale data. Every full-scrape endpoint now
accepts a freshness window via
max_age_seconds / max_age_minutes / max_age_hours (1s min, 30d max). When a snapshot exists within that window, the request is served from cache with zero credits consumed, zero billing events, zero Apify calls.
Single scrape: POST /v1/full-scrape/neymarjr?max_age_seconds=600 — returns the cached snapshot if captured in the last 10 minutes, otherwise runs a fresh live scrape and bills normally. The cached response includes cached: true, cached_snapshot.{id, captured_at, age_seconds, per_platform}, billing.credit_used: false, and a savings_usd_vs_live_scrape line (estimates the overage charge avoided if you were past quota).
Batch scrape: POST /v1/full-scrape/batch {"usernames": [...], "max_age_seconds": 3600}. The controller pre-scans all usernames, splits them into cached vs stale, and only runs the scrape loop for the stale ones. Cached results are stitched into the same results array with mode: "cached". Top-level adds cached_count, scraped_count, savings: {credits_saved, overage_scrapes_avoided, overage_charge_avoided_usd, our_marginal_cost_saved_usd}. If all usernames hit cache: all_cached: true, zero auth check, immediate return.
Async batch interaction: when the async flag is combined with max_age_seconds, only the stale subset is enqueued as a scrape job. The 202 response carries stale_to_scrape, cached_hits, and stale_usernames[] so the client knows exactly what's running in background vs what already landed instantly. When the job completes, the final result merges cached + scraped into one unified results list.
Quote token interaction: if you use a price-lock token with max_age_seconds, pre-flight must be called with the SAME max_age_seconds so the quoted usernames hash matches the stale subset at commit time. Otherwise batch returns 422 quote_token_usernames_mismatch.
GET /v1/full-scrape/cached/:username — pure cache probe. Default max_age is 7 days if not specified. Returns the snapshot or 404. Use this as the cheapest possible way to answer "do I already have this?" before even looking at quote tokens or pre-flight.
Use case — dashboard polling: a client-facing dashboard polls POST /v1/full-scrape/neymarjr?max_age_seconds=300 every 60 seconds. First hit runs a live scrape and caches the snapshot. The next 4 polls (within the 5-minute window) return the cached snapshot at zero cost. 80% credit savings with no client logic changes.
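A probe-then-scrape sketch (auth header illustrative):
# Zero-credit probe first; only run the freshness-bounded scrape if nothing recent exists
curl "https://api.buycrowds.com/v1/full-scrape/cached/neymarjr?max_age_seconds=3600" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"
curl -X POST "https://api.buycrowds.com/v1/full-scrape/neymarjr?max_age_seconds=3600" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"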
Pre-flight (dry-run cost + feasibility projection)
POST/v1/full-scrape/pre-flightDry-run a batch scrape — returns full cost + feasibility report without consuming credits API KEY
Zero-side-effect cost projection. Body:
{"usernames": ["neymarjr", "lewis.hamilton", ...]} (max 20). Returns per-username
feasibility (cached snapshot available? simulated mode — included/overage/quota_exhausted — with charge),
aggregate credit accounting (already_used_this_month, remaining_before,
remaining_after_if_proceed, included_units, overage_units,
quota_exhausted_units), total cost projection (charged vs our marginal),
budget_cap feasibility check (projected_spent_after_usd,
headroom_after_usd, would_exceed_cap?, blocks_request?),
rate-limit headroom (per_minute_remaining, headroom_after), and a
top-level can_proceed boolean with human-readable blocking_reasons.NO credits consumed. NO billing events logged. NO jobs created. Feasibility is evaluated against LIVE state at this exact moment — credit counter, MTD cost aggregate, rate-limit bucket, budget cap. Call this right before dispatching the real batch to know exactly what you're committing to. The canonical pattern for accounting-grade on-demand scraping:
pre-flight → human review → batch.Use case: you're about to fire a 20-creator monthly refresh for a client. Pre-flight tells you "15 included, 5 overage ($2.50), projected MTD $87.42 against $200 cap, can_proceed: true". You log that projection, fire the batch, and reconcile against the receipt endpoint afterward.
Price-lock quote tokens (atomic plan → commit): when
can_proceed: true, the response includes a quote section with a
Phoenix.Token-signed quote_token valid for 300 seconds. The token binds
(api_key_id, usernames_hash, cost_center, total_charged_usd) with HMAC-SHA256 over the
app's secret_key_base. Pass the token back to POST /v1/full-scrape/batch as
"quote_token": "..." and the batch commits at the quoted price — even if your credit
balance shifted between pre-flight and commit. The batch response echoes
quote.locked: true with the quoted vs actual cost.Quote token error cases:
•
410 quote_token_expired — token is past its 300s max_age; re-run pre-flight•
422 quote_token_invalid — signature failed (tampered / wrong environment)•
403 quote_token_api_key_mismatch — different api_key than the one that issued the quote•
422 quote_token_usernames_mismatch — batch usernames differ from the quoted list (hash mismatch)•
422 quote_token_cost_center_mismatch — cost_center at commit disagrees with quoteThe
quote_token is optional: omit it and the batch runs at live-state pricing (no
lock). Include it and the batch enforces strict redemption — no silent degradation if anything drifts.
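The plan → commit handshake sketched (quote_token value abbreviated; auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/pre-flight" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"]}'
# If can_proceed is true, redeem the returned quote_token within 300 seconds
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "quote_token": "SFMyNTY..."}'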
Batch scrape (authed — atomic multi-username, sync or async)
POST/v1/full-scrape/batchScrape up to 20 usernames in one call — all-or-nothing credit check, optionally async API KEY
Body:
{"usernames": ["user1", "user2", ...], "platforms": "instagram,tiktok", "async": true, "webhook_url": "https://...", "cost_center": "client-acme", "quote_token": "SFMyNTY..."}Max 20 usernames per batch. The upfront authorization check is atomic — if your tier can't afford the whole batch (Free hits quota), zero credits are consumed and 402 is returned. Once the batch is authorized, credits are consumed per-item as scrapes run.
Async mode (
async: true): returns 202 with a job_id immediately.
Task processes usernames serially in background; progress field on the job updates as each
username completes (GET /v1/full-scrape/jobs/:id shows {"completed": 12, "total": 20, "percent": 60.0}).
Single webhook fires on batch completion with full aggregate result — event type is full_scrape.batch_completed.Per-item failure isolation (iter 55): individual scrape failures inside a batch no longer crash the whole batch. Each failed username auto-refunds its credit (increments the refund bucket + writes a negative
full_scrape_refund billing event with
metadata refund_reason: "auto_batch_item_failure"), and the batch continues. The
per-item result carries
{error: "<msg>", refunded: true, refund_amount_usd: -0.50, scrape: null, cost_usd: 0.0}
while successful items keep their normal shape.Batch aggregate adds (iter 55):
successful_count, failed_count, refunded_count,
refund_total_usd,
billing.refunded_overage_count,
billing.net_overage_count, and
billing.net_overage_charge_usd — the customer's real bill after refunds net through.
Included-tier scrapes that fail also refund (against the sliding credit bucket) even though they
have zero dollar charge, so your quota is fully protected against flaky upstreams.Smart retry before refund (iter 64): before giving up and triggering the per-item auto-refund, each scrape call is retried up to 2 times with exponential backoff (1s, 2s). Max added latency per failing item: 3 seconds. First attempt is always fast — only transient failures pay the retry cost. Per-item result now carries an
attempts field (1 = first try succeeded, 2 = recovered on first retry, 3 = recovered on
second retry or final failure).Retry telemetry on the batch aggregate:
retries.items_that_retried,
retries.total_retry_attempts,
retries.recovered_by_retry (successes that would have been refunds without the retry),
and
retries.saved_refunds_usd (the dollar value the retry layer prevented from being
refunded). Operators can track this field to see how much the retry layer is earning the customer —
a healthy upstream should see near-zero saved_refunds_usd; a flaky one might recover
$5-10/day automatically.Sync batch latency note: sync batches now incur up to 3s extra per failed item before the retry layer gives up. A 20-item batch with one flaky username can take 3s longer to respond. If that's unacceptable, use async mode — the retry layer runs in background and the caller polls.
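An async submission and a progress poll, sketched (the fs_xxx job id stands in for the one returned by the 202; auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "async": true, "webhook_url": "https://hooks.example.com/batch-done", "cost_center": "client-acme"}'
curl "https://api.buycrowds.com/v1/full-scrape/jobs/fs_xxx" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY"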
Retry config override (iter 75): pass optional max_retries=N (integer 0-5) on scrape/batch to tune the retry budget per request. Clamped to [0, 5] at the plug boundary — out-of-range values are silently corrected.
Backoff schedule per max_retries:
• 0 — [] — no retries, fail immediately on first error (fastest, lowest tolerance)
• 1 — [1s] — max +1s added latency
• 2 — [1s, 2s] — default, +3s max
• 3 — [1s, 2s, 4s] — +7s max
• 4 — [1s, 2s, 4s, 15s] — +22s max
• 5 — [1s, 2s, 4s, 15s, 60s] — +82s max, extreme-reliability
Use cases:
• Latency-sensitive dashboards (max_retries=0): fail-fast, surface errors immediately, client can retry with its own backoff strategy.
• Default batch workloads (max_retries=2): balanced, absorbs most transient blips without blowing HTTP budgets.
• High-stakes monthly refresh (max_retries=5): an agency generating the monthly client report will tolerate +82s to maximize success rate over the whole batch.
Refund semantics preserved: items that fail all their retries still go through the iter 55 auto-refund path. The difference is just how many attempts happen before giving up. The per-item attempts field in the response reflects the actual count, and retries.saved_refunds_usd still tells operators how much the retry layer recovered.
Deferred one-shot scrapes (iter 77): pass an optional run_at (ISO8601 UTC) on the batch body and the controller hands the batch off to an Oban-scheduled worker instead of running it inline. Must-have fields: run_at (must be in the future) + webhook_url (required — deferred batches can't be polled because the scheduled fire time is typically beyond the 1-hour ETS TTL and the job is tracked in Oban, not Jobs).
Request shape: {"usernames": [...], "run_at": "2026-04-10T03:00:00Z", "webhook_url": "https://...", "cost_center": "...", "labels": [...], "platforms": "..."}. Past timestamps are silently rejected (run_at parses to nil and the batch runs inline instead).
Response 202: {deferred: true, oban_job_id, scheduled_at, batch_size, cost_center, webhook_url, webhook_event_on_completion: "deferred_batch_completed", webhook_event_on_failure: "deferred_batch_failed"}. The oban_job_id is the handle for cancellation (see the iter 78 management endpoints below) — the Jobs ETS system is NOT involved until the worker actually fires and creates per-item billing events.
Auth at RUN time, not SUBMIT time. Critical design choice: when a deferred batch fires hours or days later, the caller's state (quota, budget cap, sub-caps, frozen centers) may have shifted. The worker re-evaluates authorization at fire time and fires a deferred_batch_failed webhook with the specific reason (quota_exhausted_at_run_time, cost_center_frozen: client-acme, budget_cap_exceeded_at_run_time) instead of silently consuming credits from a stale authorization. Agencies setting up overnight runs get honest feedback on whether their state drifted.
On successful fire: the worker iterates the usernames serially, consumes credits per-item (matching the normal batch semantics), handles per-item failures via the same refund path (iter 55), logs billing events with metadata.deferred: true so reports can distinguish deferred spend from live spend, then delivers a deferred_batch_completed webhook with {event, oban_job_id, scheduled_at, completed_at, elapsed_ms, batch_size, successful_count, failed_count, results[]}.
Oban config: dedicated queue :deferred_scrape with concurrency 5 (lower than webhook_retry's 20 because each job runs a full batch of up to 20 scrapes serially — total concurrent scrapes ≈ 100).
Use case — off-peak weekly refresh: agency runs a 30-creator weekly batch. Upstream platforms rate-limit harder during business hours, so they want the batch to run at 3am UTC. One call: POST /v1/full-scrape/batch {"usernames": [...], "run_at": "2026-04-10T03:00:00Z", "webhook_url": "https://ops.agency.com/batch-done"}. The batch sleeps in Oban until 3am, fires, and the ops webhook receives the aggregated result before the team starts their day.
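That deferred call sketched in full (auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "run_at": "2026-04-10T03:00:00Z", "webhook_url": "https://ops.agency.com/batch-done", "cost_center": "client-acme"}'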
Management endpoints (iter 78):
• GET /v1/full-scrape/deferred — lists your pending deferred batches. Queries the oban_jobs table scoped to worker = DeferredBatch + state in (scheduled, available, retryable) + args->>'api_key_id' = <caller> (JSONB containment). Returns {count, pending: [{oban_job_id, state, scheduled_at, batch_size, cost_center, webhook_url, replay_of, max_retries, max_age_seconds, inserted_at}, ...]} ordered by scheduled_at ASC.
• DELETE /v1/full-scrape/deferred/:oban_job_id — cancel a pending deferred batch before it fires. Ownership is verified by checking args.api_key_id matches the caller. Errors: 404 deferred_job_not_found, 403 deferred_job_owned_by_another_api_key, 422 deferred_job_already_fired (job already transitioned past a cancellable state).
Why no post-fire cancel: once a deferred batch fires, it runs through the normal per-item process loop — per-item failures auto-refund (iter 55), the whole-batch webhook fires when done. If you need to "cancel" a fired batch's effects, use the refund path on individual jobs after-the-fact. Deferred cancel is pre-fire only.
Unified history via archive (iter 79): fired deferred batches are now mirrored to the job_archives table (iter 71) with an id of the form def_<oban_id>. This closes the audit gap — both successful runs (status: complete) and auth-failures at run time (status: failed) get archived. Downstream effects:
• GET /v1/full-scrape/jobs/history now surfaces deferred runs alongside ETS-sourced batches. Filter by ?cost_center= or ?status=failed to find them.
• GET /v1/full-scrape/jobs/def_42/receipt works because the receipt endpoint falls back to archive on ETS miss (iter 71).
• POST /v1/full-scrape/jobs/def_42/replay and POST /v1/full-scrape/jobs/replay-bulk work too — bulk weekly refreshes can mix ETS job ids and deferred job ids freely.
• The def_ prefix is visually distinct from the ETS-sourced fs_ prefix, so operators can tell execution path at a glance in logs and dashboards.
Billing events still carry the flag: per-scrape billing events logged by the deferred worker stamp metadata.deferred: true (distinct from the job_archives record), so /v1/cost/events/export filtered CSV dumps can separate deferred spend from immediate spend at the event granularity — useful when an agency wants "what did we spend on overnight batches this month" broken out from the main usage.
credits_used_this_month_after_batch now pulls directly from Credits.read/2 so it reflects the net effective count (consumed − refunded) the moment the batch finishes, not a cached pre-refund counter.
Auto-split for >20 usernames (iter 91): pass auto_split: true along with up to 100 usernames and the controller will chunk them into up to 5 sub-batches of 20, enqueue each as an independent async batch, and return the array of new job_ids. Always async — sync execution of 100 scrapes sequentially would blow every HTTP budget. Each chunk runs its own cached-split + quarantine + retry + refund pipeline independently.
Incompatible with quote_token: quote tokens are issued for a fixed username list; chunking breaks that contract. Passing both returns 400 auto_split_incompatible_with_quote_token. If you need a price lock per chunk, pre-flight each chunk separately and submit 5 individual quoted batches.
Response shape: {auto_split: true, total_usernames, chunk_count, chunk_size, enqueued, errors, chunks: [{chunk_index, job_id, usernames, batch_size, status}, ...], cost_center, note}. Top-level enqueued/errors counts give a quick rollup; per-chunk status is either "enqueued" or "error" with an error reason if the enqueue failed (e.g. Jobs store unreachable).
Webhook behavior: if you pass webhook_url, it fires once per chunk — plan for 5 delivery attempts if you're running a full 100-username auto-split. The webhooks are independent; there is no aggregated "all-chunks-done" webhook. To correlate, track the chunk job_ids client-side and wait for all N to deliver.
cost_center propagation: the caller-provided cost_center (or X-Cost-Center header) is attached to EVERY chunk — all 100 scrapes attribute to the same client. Iter 59 per-center sub-caps still enforce globally, so if any chunk would push the cumulative spend over the sub-cap, that chunk's credits get blocked at the normal enforcement layer. Chunks that already ran stay committed.
Use case — bulk initial onboarding: new enterprise client with 75 creators to track. Agency calls POST /v1/full-scrape/batch with {"usernames": [...75...], "auto_split": true, "cost_center": "client-bigcorp"}. Response returns 4 chunk job_ids (20 + 20 + 20 + 15). Ops dashboard polls all 4 until complete, then the initial baseline is established with a single API call from the caller's perspective.
Orchestration tracking (iter 97): auto-split now generates a shared orchestration_id (format orch_<base32>) attached to all sub-jobs. The 202 response includes orchestration_id at the top level AND an orchestration_poll_url pointing to the aggregated lookup.
Aggregated view: GET /v1/full-scrape/orchestrations/:id returns all sub-jobs sharing the id plus an aggregate rollup: {orchestration_id, sub_job_count, statuses: {complete, failed, cancelled}, totals: {usernames, successful, failed}, sub_jobs: [...], cost_center, note}. Each sub_jobs entry has {id, status, batch_size, successful_count, failed_count, finished_at, started_at, cost_center}. Drill into individual sub-jobs via GET /v1/full-scrape/jobs/:id for full per-chunk detail (results array, notes, receipt).
Merged view (iter 98): orchestration lookup now merges live ETS jobs (pending/running) with archived jobs (terminal state). Real-time progress is visible the moment chunks start executing — no more waiting for terminal state. Each sub-job entry carries a source tag ("ets" for live, "archive" for historical). ETS wins on id collision so callers always see the most current state. Response adds active_count (ETS entries in pending/running) and archived_count (terminal in DB) for quick status rollup, plus a progress field per ETS entry showing {completed, total} live counters.
Bulk cancel (iter 99): DELETE /v1/full-scrape/orchestrations/:id cancels all non-terminal sub-jobs sharing the orchestration_id in a single call. Iterates the ETS sub-jobs via Jobs.list_by_orchestration, calls the existing per-job cancel logic (Task.Supervisor.terminate_child + Jobs.mark_cancelled) on each, and returns an aggregate {sub_job_count, cancelled_count, already_terminal_count, sub_jobs}. Terminal jobs are left untouched and appear as action: "already_terminal".
Credit refund semantics: bulk cancel does NOT refund credits. Same policy as single-job cancel — compute + bandwidth were already paid up to the cancellation point. For refunds on partial-failure items, use POST /v1/full-scrape/jobs/:id/refund per sub-job after cancellation. If you need the "refund everything" semantic, cancel first + loop over the aggregated response firing manual refunds per eligible sub-job.
404 semantics: returns 404 no_active_sub_jobs when no ETS entries match — could mean the orchestration already finished (check GET), the id is wrong, or the chunks already expired past the 1-hour ETS TTL (auto-archived). Delete is only meaningful on running orchestrations.
Use case — abort runaway batch: agency submits a 75-creator auto-split and realizes mid-execution that the wrong creators were selected. Instead of cancelling 4 sub-jobs individually via DELETE /jobs/:id, they call DELETE /v1/full-scrape/orchestrations/orch_abc123. All 4 running chunks are cancelled in one call. Any chunk that had already finished naturally is reported as already_terminal and left alone.
Orchestration replay (iter 104): POST /v1/full-scrape/orchestrations/:id/replay re-runs an entire historical orchestration as a fresh auto-split batch. The controller loads all sub-jobs (merged ETS + archive), combines their batch_usernames into a single deduped list, and delegates to the normal batch path with auto_split=true. A BRAND NEW orchestration_id is generated for the replay run.
What gets inherited: cost_center and webhook_url are pulled from the first sub-job of the original orchestration. If you need to override them, call POST /v1/full-scrape/batch directly with a custom auto_split body — this endpoint is a convenience shortcut for the common "run it again" workflow.
Replay linkage: the replay batch carries replay_of: "<original orchestration_id>" in the generated sub-jobs so archives can trace the chain. The response format matches /v1/full-scrape/batch with auto_split=true, including the new orchestration_id at the top level.
Credit cost: replay is NOT a free re-run — credits are charged normally for each scrape in the new orchestration, same as any regular batch. If the goal is to refresh cached data, call POST /v1/full-scrape/batch directly and pass max_age_seconds for conditional caching per chunk.
Use case — weekly refresh: agency runs a 75-creator auto-split on Monday for client-bigcorp. Following Monday, they call POST /v1/full-scrape/orchestrations/orch_abc123/replay. A new orchestration runs the same 75 creators with fresh data, keyed to a new orchestration_id. Zero retyping of usernames, cost_center, or webhook — all inherited from the original archive.
Indexed lookup: the iter 97 migration added a (api_key_id, orchestration_id) index, so even large histories return the aggregate quickly. Scoped to the caller's api_key — ownership enforced at the SQL layer.
Cost envelope (iter 74): pass an optional max_spend_usd param on the batch body and the controller projects the total charge for the stale subset BEFORE any authorization check or credit consume. If the projection exceeds the envelope, the request short-circuits with 402 batch_exceeds_max_spend and zero side effects — no credits consumed, no billing events, no snapshots. Cached hits don't count against the envelope (they cost nothing), so max_spend only applies to the portion that would actually be scraped live.
Envelope vs pre-flight+quote_token. Pre-flight + quote_token (iter 51/53) gives you a LOCKED price via a two-call handshake. Envelope (iter 74) gives you a HARD CEILING in a single call — no separate pre-flight round-trip. Use envelope when you know your max tolerance; use pre-flight+quote when you need to surface the projection to the end user before committing. Both paths can be combined: envelope as a safety belt, quote_token as the price lock.
402 response shape: {error, max_spend_usd, projected_spend_usd, overage_usd, breakdown: {included_units, overage_units, quota_exhausted_units, overage_price_usd, quota, already_used}, hint}. The breakdown tells the caller exactly why — maybe they're partway through their quota so more of the batch would be overage than they expected.
Use case — safety belt on weekly refresh: agency runs POST /v1/full-scrape/batch with 20 creators + max_spend_usd: 5.00. Under normal conditions they're inside quota and the batch costs $0 (envelope untouched). If they've been consuming heavier than usual and 15 of the 20 would be overage, the projection ($7.50) exceeds the envelope ($5.00) → 402, no charge. Agency sees the breakdown, raises the envelope to $10 or removes 5 creators, retries. Zero accidental spend.
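The safety belt sketched on a small batch (usernames and auth header illustrative):
curl -X POST "https://api.buycrowds.com/v1/full-scrape/batch" \
  -H "Authorization: Bearer $BUYCROWDS_API_KEY" -H "Content-Type: application/json" \
  -d '{"usernames": ["neymarjr", "lewis.hamilton"], "max_spend_usd": 5.00, "cost_center": "client-acme"}'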
Tier-aware performance:
Free: 6 concurrent / 30s timeout · Starter: 8 / 30s · Pro: 12 / 25s · Business: 16 / 20s · Enterprise: 20 / 15s
Pro+ tiers get higher concurrency and tighter timeouts — the scrape finishes faster under load.