Cheap AI Tokens Are a Scam Where Your Prompts Are the Product

2026-06-26 11 minutes

Cheap AI API resellers undercut official prices by 70 to 97 percent because the discount is not the product: your prompts are. They log every request to resell as training data, route you to weaker models, and run on stolen-card accounts. A CISPA Helmholtz audit caught silent model swapping, but the harvested logs are the real margin.

Key Takeaways

A 90 percent discount on frontier AI is funded by reselling your prompts.
Proxies can send an “Opus” request to a cheaper model and relabel it.
Many reseller accounts come from stolen cards and faked identity checks.
Pointing a coding agent at an unknown API host hands a stranger your machine.
Official APIs and zero-retention gateways are cheap enough to skip the scam.

Why is a Claude or GPT API 90% cheaper from a reseller?

A frontier model has a hard cost floor. GPU time per token is a real expense, and the official provider already prices it close to the bone. So a reseller charging one tenth of that loses money on every call, unless something else pays the bill. The discount cannot come from being smarter about compute.

The math is easy to check. Anthropic lists Claude Opus 4.8 at $5 per million input tokens and $25 per million output, with Sonnet at $3/$15. A reseller selling the same access at 5 to 10 percent of retail charges less than the raw compute costs the provider. No one runs a frontier model at a 90 percent loss out of kindness.
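
The arithmetic can be sketched in a few lines. The prices are the Opus list figures quoted above; the token counts for the sample request and the 10 percent reseller rate are illustrative assumptions, not measured values.

```python
# Opus list prices quoted above, in USD per million tokens.
OPUS_INPUT_PER_MTOK = 5.00
OPUS_OUTPUT_PER_MTOK = 25.00


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Retail cost in USD of one request at the list prices above."""
    return (
        input_tokens / 1_000_000 * OPUS_INPUT_PER_MTOK
        + output_tokens / 1_000_000 * OPUS_OUTPUT_PER_MTOK
    )


def reseller_price(retail: float, fraction_of_retail: float) -> float:
    """What a reseller collects when it sells at a fraction of retail."""
    return retail * fraction_of_retail


# Hypothetical coding-agent turn: 200k tokens of context in, 4k tokens out.
retail = request_cost(200_000, 4_000)
cheap = reseller_price(retail, 0.10)  # assumed 10 percent of retail
print(f"retail ${retail:.2f}, reseller ${cheap:.2f}, gap ${retail - cheap:.2f}")
```

On those assumptions, the turn costs about $1.10 at retail and the reseller collects about $0.11. Something else has to cover the remaining $0.99.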

The Chinese “transfer station” (中转站) market is the clearest example. These shops price Claude at roughly 1 RMB per $1 of tokens. That is 70 to 90 percent below official rates, per ChinaTalk’s reporting by Zilan Qian. The markup on access is just customer acquisition. The numbers only add up once you count the data you hand over.

This is not a China quirk, either. Demand is global, and buyers already suspect the catch. A poster in r/AI_India asked where to buy the same dirt-cheap access, then answered their own question:

the obvious downside is privacy. Your prompts/chats are probably getting logged somewhere on those servers. So if you’re using it for anything sensitive, that sounds risky.

r/AI_India OP (111 points)

The rule is simple: someone always pays the token bill. If it is not you in cash, it is you in data.

The three ways a cheap proxy actually makes money

ChinaTalk frames the transfer-station model as “one fish, three meals” (一鱼三吃). Each meal is a separate revenue stream, and they stack. The third one is why prices can hit 5 to 10 percent of retail.

Figure: The three reseller revenue streams, stacked: markup from farmed and stolen-card accounts, model swapping with broken prompt caching, and harvested prompt logs sold as training data.

Meal one is the markup. Operators bulk-register accounts to farm free credits, resell unused quota, and game corporate and education discounts. One trick is “APImaxxing,” where a single $200 Max plan gets split among many users. The darker input is accounts bought with stolen credit cards. Those enter the pool at near-zero cost.

Meal two is model swapping. Because every request flows through the proxy, you cannot check which model answered. The operator can route an “Opus” request to Sonnet, Haiku, or a cheaper open model like GLM or Qwen, then relabel the output. The only tell is that hard tasks feel “dumbed-down” (降智). Rotating accounts also break prompt caching. So you burn full-price tokens on context that should be nearly free.
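
A rough sketch of what broken caching costs over one session. The 10 percent cached-read rate is an assumption made for illustration, so check the provider's current pricing before relying on it. The model also ignores any cache-write surcharge and cache expiry.

```python
INPUT_PER_MTOK = 5.00         # Opus input list price quoted earlier, USD
CACHE_READ_MULTIPLIER = 0.10  # ASSUMPTION: cached reads billed at 10% of input price


def context_cost(context_tokens: int, turns: int, cache_works: bool) -> float:
    """USD spent re-sending the same context across a multi-turn session.

    Simplified model: no cache-write surcharge, no cache expiry.
    """
    full = context_tokens / 1_000_000 * INPUT_PER_MTOK
    if not cache_works:
        # Rotating accounts: every turn is a cache miss and pays full price.
        return full * turns
    # First turn pays full price; later turns pay the cached-read rate.
    return full + full * CACHE_READ_MULTIPLIER * (turns - 1)


with_cache = context_cost(100_000, turns=20, cache_works=True)
without_cache = context_cost(100_000, turns=20, cache_works=False)
print(f"cached ${with_cache:.2f} vs uncached ${without_cache:.2f}")
```

Under those assumptions, 20 turns over a 100k-token context cost about $1.45 with working caching and $10.00 without it.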

Developers work out the same playbook on their own. A commenter in the r/AI_India thread described the trick well. The proxy uses heavy caching plus a router that sends coding requests to Claude and pushes everything else to cheaper models like Qwen or DeepSeek. Yet it bills all of it as Claude. That is the swap, spotted by someone watching from the outside.

The most-upvoted write-up on the subject sums up the audit bluntly. A week-long investigation posted to r/LocalLLM drew over a thousand votes:

a CISPA Helmholtz audit of 17 of these relays found up to 47.21% performance drops vs. the official API - relays silently route “Opus” requests to Haiku, GLM, or Qwen and relabel the response. 45.83% of audited endpoints failed model-fingerprint verification.

r/LocalLLM OP (1,093 points)

Still, be careful with that headline number. The much-shared “up to 47.21%” figure is a single data point: one relay’s Gemini route scored 37 percent versus 83.82 percent on the official API. A skeptic in the same thread noted that the other nine results sat within margin of error, and some shadow routes even beat the official API. So treat the 47 percent as proof the swap happens, not as the typical case.

The solid backbone is academic, not viral. The UC Berkeley paper Are You Getting What You Pay For? by Cai, Shi, Zhao, and Song shows that model swapping in opaque APIs is real. It is hard to spot from text alone, and you cannot prove it without trusted hardware. The authors are blunt:

Users pay for specific models but have no guarantee that providers deliver them faithfully.

Cai, Shi, Zhao, Song (UC Berkeley) (arXiv 2504.04715)

Figure: Statistical power curves from the UC Berkeley paper. Detection of model substitution collapses as the swap rate drops, so partial swapping evades text-based checks: detection power falls off sharply when a proxy swaps models on only some requests.

Image: Are You Getting What You Pay For? (arXiv 2504.04715)
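
The intuition behind that collapse can be sketched with a textbook power calculation. This is not the paper's method. It is a one-sided, one-sample proportion test under a normal approximation, and the accuracies, sample size, and function name are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def detection_power(p_official: float, p_cheap: float, swap_rate: float,
                    n: int, alpha: float = 0.05) -> float:
    """Approximate chance of flagging a proxy after n benchmark prompts.

    The proxy answers a fraction `swap_rate` of requests with the cheaper
    model, so its observed accuracy is a mixture of the two models.
    """
    p_mix = (1 - swap_rate) * p_official + swap_rate * p_cheap
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Flag the proxy if its measured accuracy falls below this threshold.
    threshold = p_official - z_alpha * sqrt(p_official * (1 - p_official) / n)
    sd_mix = sqrt(p_mix * (1 - p_mix) / n)
    return NormalDist().cdf((threshold - p_mix) / sd_mix)


# Assumed accuracies loosely echo the 83.82 vs 37 percent gap quoted earlier.
for rate in (1.0, 0.5, 0.2, 0.05):
    power = detection_power(0.84, 0.37, swap_rate=rate, n=200)
    print(f"swap rate {rate:.0%}: power {power:.2f}")
```

With no swapping at all, the power equals the false-alarm rate alpha. The smaller the swapped share, the closer the test gets to that floor.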

That turns the China story into a basic fact about any proxy you do not control. Meal three, the logs, is the whole thesis, so it gets its own section next.

You are the training set, so what happens to your prompts?

Every request through a proxy sits on the operator’s server. For a coding agent, that payload is worth a lot. The operator grabs the full prompt, the full reply, the tool calls, and the whole session history. That means real engineering decisions, your repo context, secrets pasted mid-session, and outputs a human already checked as correct.

That bundle is close to an ideal dataset for fine-tuning a model. The people running these shops say so plainly. Per ChinaTalk, developers admit the markup is just customer acquisition, and the log harvest is the real margin. That makes users “simultaneously paying customers and unpaid data producers.”

Reddit puts it in plainer English. A commenter in the r/LocalLLM thread explained that everything you type gets stored in a database, for the operator to train on or sell as they please. The buyer who half-knows this is still handing over the data anyway.

The harvested data reaches the open market, too. Datasets of Claude Opus reasoning traces with no stated source already float around HuggingFace, such as one Opus-4.6 reasoning dump. ChinaTalk flags the same pattern. Once your reasoning chains leave your hands, you cannot pull them back.

The harm goes past training. Leaked prompt logs enable targeted fraud and even blackmail, because the proxy now holds your code, your business logic, and whatever personal context slipped into a chat. A stranger with your full session history can do far more than train a model on it.

My own rule is simple. Anything proprietary, anything with a credential, and anything I would not post publicly never goes through an endpoint whose operator I cannot name and hold accountable. That one rule blocks every anonymous reseller by default.

This is not only a China story

It is tempting to file this under “foreign problem.” That is a mistake. The same tricks run on Western marketplaces and Discord, and the failures get worse than a weak model.

The clearest case ships malware, not just a downgraded model. A PSA in r/ClaudeAI flagged a reseller, awstore.cloud, that sold cheap “Claude access” on the mainstream Plati Market. It then turned Claude Code itself into the attack. You set ANTHROPIC_BASE_URL to their domain and send any prompt. The server returns a fake “configuration message.” Claude Code reads it as a tool-use response and runs a PowerShell dropper without asking. The post’s rule of thumb is worth repeating:

only use official API providers. The real Claude API is api.anthropic.com. If a “reseller” needs you to change the base URL to a domain you’ve never heard of, they control what your AI agent executes on your machine. Full stop.

r/ClaudeAI awstore.cloud PSA (security PSA)

The supply chain underneath carries a real human cost. Identity checks get faked by recruiting people in poorer countries to pass them, or by buying biometric data outright. The Worldcoin black market sold harvested iris scans for under $30 each. When you buy access farmed this way, you fund that chain.

You also take on legal risk. Paying for access farmed on stolen cards makes you a link in card fraud. Anthropic’s own distillation-attacks report found a single proxy network running more than 20,000 fake accounts. Build anything real on that base and you risk chargebacks and a permanent ban.

The common reaction is a cynical shrug: everyone grabs data anyway. One r/technology commenter caught the mood with heavy sarcasm about how only Chinese firms supposedly misuse data. Still, a known provider you can sue is not the same risk as an anonymous middleman. That gap is the whole point of the fix below.

How to get cheap AI access without becoming the product

There are genuinely affordable, honest paths. Here they are ranked from most private to least, with the tradeoffs stated openly.

Option	Price vs official	Who sees your prompts	Model guarantee
Local model (Ollama, llama.cpp)	Hardware only	Nobody	You run the weights
Official API (api.anthropic.com)	Full retail	The named provider	Contractual
OpenRouter with zero data retention (ZDR)	Near retail	Opt-out of retention	Disclosed routing
Official DeepSeek / GLM / Qwen	Genuinely cheap	Named provider, China-hosted	Real, named model
Self-hosted relay (your own key)	Your retail cost	Only you	Your own endpoint
Anonymous reseller / transfer station	70-97% off	The operator, forever	None

Official first-party APIs come first on trust. Endpoints like api.anthropic.com and api.openai.com are the only ones with a model and data promise you can actually hold someone to.

A transparent gateway is the next step. OpenRouter offers many models behind one key, with a zero-data-retention option and an account-wide training opt-out. A transfer station cannot give you that disclosure. Read the policy before you send a real prompt, since free models on any gateway often trade access for training rights.

The honest budget path is buying Chinese models direct. Official DeepSeek, Zhipu GLM, and Alibaba Qwen APIs are truly cheap and come from named providers. Your data is processed in China, so own that tradeoff up front. The price is real, though, and no one relabels the model.

The most private path keeps the prompt on your hardware. Run Qwen, GLM, or Llama locally through Ollama or llama.cpp, and the request never leaves your machine. For convenience without the trust problem, self-host your own relay against your own official key, using a project like claude-relay-service. You get pooling and routing without handing logs to a stranger.

How to vet an AI API provider before you point your tools at it

Check the base URL

Confirm the endpoint is an official host (api.anthropic.com, api.openai.com) or a named, reputable gateway. Refuse any domain you cannot identify.
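
A minimal sketch of that check, assuming an exact-match allowlist. `TRUSTED_HOSTS` and `is_trusted_base_url` are illustrative names, and the list holds only the two hosts named in this article.

```python
from urllib.parse import urlsplit

# Hosts this article names as official; add a gateway only after vetting it.
TRUSTED_HOSTS = {"api.anthropic.com", "api.openai.com"}


def is_trusted_base_url(base_url: str) -> bool:
    """True only for an https URL whose host exactly matches the allowlist."""
    parts = urlsplit(base_url)
    host = (parts.hostname or "").lower()
    return parts.scheme == "https" and host in TRUSTED_HOSTS


print(is_trusted_base_url("https://api.anthropic.com"))          # official host
print(is_trusted_base_url("https://api.anthropic.com.evil.example/v1"))  # lookalike
```

Exact matching is deliberate. A suffix or substring test would accept lookalike domains that merely contain the official name.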

Read the data policy

Require zero-retention or an explicit, written training opt-out before sending a single real prompt. No policy means you should assume everything is logged.

Sanity-check the price

Treat any discount steeper than roughly 50 percent off frontier token rates as a red flag, not a deal. The gap gets paid in data or fraud.

Probe for model swapping

Send a known-hard prompt through the proxy and through the official model, then compare. A clear quality gap means you are not getting what you paid for.
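
One way to structure the comparison, as a sketch. The `Ask` callables stand in for whatever client code you use to reach each endpoint, and substring grading is a crude placeholder for a real evaluation. The stub clients and toy prompts exist only so the example runs offline.

```python
from typing import Callable, Sequence, Tuple

Ask = Callable[[str], str]  # hypothetical: takes a prompt, returns the reply text
Case = Tuple[str, str]      # (prompt, string a correct reply must contain)


def pass_rate(ask: Ask, cases: Sequence[Case]) -> float:
    """Fraction of cases whose reply contains the expected answer string."""
    hits = sum(1 for prompt, expected in cases if expected in ask(prompt))
    return hits / len(cases)


def quality_gap(ask_official: Ask, ask_proxy: Ask, cases: Sequence[Case]) -> float:
    """Official pass rate minus proxy pass rate; positive means the proxy is worse."""
    return pass_rate(ask_official, cases) - pass_rate(ask_proxy, cases)


# Stub clients with canned replies; replace them with real API calls.
CASES = [("What is 17 * 23?", "391"), ("What is 2 ** 10?", "1024")]


def official(prompt: str) -> str:
    return {"What is 17 * 23?": "391", "What is 2 ** 10?": "1024"}[prompt]


def proxy(prompt: str) -> str:
    return {"What is 17 * 23?": "391", "What is 2 ** 10?": "1000"}[prompt]


print(quality_gap(official, proxy, CASES))
```

A handful of prompts only catches a blatant swap. As the Berkeley paper shows, a proxy that swaps on only some requests can slip past text-based checks like this one.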

Keep agents off untrusted proxies

Control of the base URL means control of what your coding agent runs locally. Keep agents on official endpoints only.
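
A small pre-flight check along those lines, as a sketch. `ANTHROPIC_BASE_URL` is the variable described earlier in this article. Treating `OPENAI_BASE_URL` the same way is an assumption about your tooling, and `risky_overrides` is an illustrative name.

```python
import os
from typing import List, Mapping
from urllib.parse import urlsplit

# Environment variable -> the official host it is expected to point at.
# OPENAI_BASE_URL is an assumption; adjust to the tools you actually run.
OFFICIAL_HOSTS = {
    "ANTHROPIC_BASE_URL": "api.anthropic.com",
    "OPENAI_BASE_URL": "api.openai.com",
}


def risky_overrides(environ: Mapping[str, str] = os.environ) -> List[str]:
    """List base-URL overrides that point anywhere other than the official host."""
    findings = []
    for var, official_host in OFFICIAL_HOSTS.items():
        value = environ.get(var)
        if not value:
            continue  # unset: the tool falls back to its built-in default
        host = urlsplit(value).hostname
        if host != official_host:
            findings.append(f"{var} points at {host or value!r}, not {official_host}")
    return findings


for warning in risky_overrides():
    print("WARNING:", warning)
```

Running a check like this before launching an agent surfaces a redirected endpoint that a setup script or a copied shell profile slipped in.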

Verify how you pay

Anonymous-only payment, such as a WeChat or Alipay transfer to an individual, or crypto with no invoice, is a tell that no accountable business stands behind the service.

Frequently asked questions

Is buying tokens from a transfer station or reseller illegal?

It usually violates the model provider’s terms and can break local AI-service rules. If the underlying accounts use stolen cards, you are also touching card fraud. At minimum, you risk a permanent ban.

Will my Anthropic or OpenAI account get banned?

Providers actively suspend proxy-linked accounts and pools. Many resellers also vanish within months and take prepaid balances with them. Anything you prepaid is at risk.

Are the official DeepSeek, GLM, and Qwen APIs a safe cheap option?

They are genuinely cheap and come from named providers, which beats an anonymous proxy. The tradeoff is that your data is processed in China under different privacy rules. Decide that consciously and keep sensitive work off them.

Is a gateway like OpenRouter the same thing as a transfer station?

No. A legitimate gateway discloses its retention policy, lets you opt out of training, and bills you transparently. A transfer station is built for evasion and monetizes your logs.