How the data is built
Data provenance, VWAP methodology, and the AI enrichment pipeline.
portfolIQ data goes through four stages: raw ingestion, Data Vault 2.0 modelling, mart computation, and API exposure. At no point is a single raw source redistributed as-is — the output is always a derived, enriched layer.
1. Price data — VWAP consensus
Price data is a Volume-Weighted Average Price (VWAP) computed across 5 major exchanges:
- Binance
- Kraken
- Coinbase Exchange
- Bybit
- OKX
How the formula works:
- Raw OHLCV is collected from each exchange via public endpoints (no authenticated API keys).
- An exchange's price is excluded as an outlier if it deviates from the cross-exchange median by more than 3% (Tier 1), 5% (Tier 2), or 7% (Tier 3).
- Each exchange's weight is capped at 50% of the total volume — no single source dominates.
- If fewer than 3 exchanges provide data for an asset, no VWAP is published (the asset appears without a price).
The methodology is published and git-tagged: see methodology/vwap-consensus-v1.0.md (tag methodology-vwap-v1.1).
See Asset tiers for coverage and frequency per tier.
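The rules above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the function name `vwap_consensus`, the `quotes` input shape, the choice to apply the 3-exchange minimum after outlier removal, and the proportional redistribution of weight above the 50% cap are all assumptions. The methodology document remains the authoritative definition.

```python
from statistics import median

# Maximum relative deviation from the median, per tier (values from the text above).
OUTLIER_THRESHOLD = {1: 0.03, 2: 0.05, 3: 0.07}
MAX_WEIGHT = 0.50    # cap on a single exchange's share of the weighting
MIN_EXCHANGES = 3    # below this, no VWAP is published


def vwap_consensus(quotes, tier):
    """quotes maps exchange name -> (price, volume). Returns a price or None."""
    # Ignore exchanges that reported no usable data.
    quotes = {ex: (p, v) for ex, (p, v) in quotes.items() if p > 0 and v > 0}
    if len(quotes) < MIN_EXCHANGES:
        return None

    # Drop prices that sit too far from the cross-exchange median.
    mid = median(p for p, _ in quotes.values())
    limit = OUTLIER_THRESHOLD[tier]
    kept = {ex: (p, v) for ex, (p, v) in quotes.items()
            if abs(p - mid) / mid <= limit}
    # Assumption: the minimum-exchange rule is also checked after outlier removal.
    if len(kept) < MIN_EXCHANGES:
        return None

    total = sum(v for _, v in kept.values())
    weights = {ex: v / total for ex, (_, v) in kept.items()}

    # Assumption: weight above the cap is redistributed proportionally.
    # At most one exchange can exceed 50%, so a single pass is enough.
    top = max(weights, key=weights.get)
    if weights[top] > MAX_WEIGHT:
        scale = (1.0 - MAX_WEIGHT) / (1.0 - weights[top])
        weights = {ex: MAX_WEIGHT if ex == top else w * scale
                   for ex, w in weights.items()}

    return sum(weights[ex] * p for ex, (p, _) in kept.items())
```

For example, with Tier 1 quotes of (100, 900), (101, 50), (99, 50), and (120, 10), the last quote is dropped as an outlier, the dominant exchange is capped at 50%, and the consensus comes out close to 100.0. With only two quotes the function returns `None`, matching the "no VWAP is published" rule.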
2. Data pipeline
CoinGecko (metadata, market snapshots)
↓
Exchanges: Binance, Kraken, Coinbase, Bybit, OKX (OHLCV)
↓
raw.* (internal PostgreSQL, TimescaleDB)
↓
dbt — Data Vault 2.0
hub_asset, hub_exchange
link_asset_exchange
sat_asset_market, sat_asset_vwap_consensus
↓
marts.* (aggregated, queryable)
↓
API /v1/*
CoinGecko data is ingested internally only and is never redistributed as-is. Attribution: "Powered by CoinGecko". The redistributable price is the computed VWAP consensus.
3. AI enrichment
Each asset in the top 1000 is enriched with three types of AI-generated analysis:
| Analysis type | Description |
|---|---|
| token_classification | Protocol type (L1, DEX, lending, stablecoin, etc.) |
| fundamental_summary | Prose summary of on-chain fundamentals and market position |
| sentiment_score | Aggregated sentiment from the last 48h of news coverage |
Model routing: Claude Haiku 4.5 handles 85% of analyses (speed + cost). Claude Sonnet 4.6 handles 15% (complex analyses, first-run on a new asset).
All analyses are subject to AMF guardrails: 13 banned prescriptive patterns (including "buy", "target price", "invest now"). Any analysis that triggers a pattern is rejected and regenerated. See The AI layer.
Every analysis is stored with prompt_version, model, cost_usd_micros, and generated_at — fully auditable.
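The reject-and-regenerate loop and the audit record can be pictured roughly as follows. This is a sketch under stated assumptions: only the three banned patterns quoted above are listed (the full set of 13 is not reproduced here), the `generate` callable and the `max_attempts` limit are hypothetical, and `cost_usd_micros` is assumed to mean millionths of a US dollar.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# Only the three patterns quoted in the text; the real list has 13.
BANNED_PATTERNS = [r"\bbuy\b", r"\btarget price\b", r"\binvest now\b"]
BANNED_RE = re.compile("|".join(BANNED_PATTERNS), re.IGNORECASE)


@dataclass
class AnalysisRecord:
    text: str
    prompt_version: str
    model: str
    cost_usd_micros: int    # assumed unit: millionths of a US dollar
    generated_at: datetime


def violates_guardrails(text):
    """True if the text contains any banned prescriptive pattern."""
    return BANNED_RE.search(text) is not None


def generate_with_guardrails(generate, prompt_version, model, max_attempts=3):
    """generate is a hypothetical callable returning (text, cost_usd_micros)."""
    for _ in range(max_attempts):
        text, cost = generate()
        if not violates_guardrails(text):
            # Only the accepted attempt is recorded in this sketch.
            return AnalysisRecord(text, prompt_version, model, cost,
                                  datetime.now(timezone.utc))
    return None  # every attempt triggered a banned pattern
```

The word-boundary anchors keep a term like "buyback" from matching the "buy" pattern. Whether the production guardrails match on whole words is not stated, so treat that as a design choice of this sketch.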
4. Data freshness
| Tier | Coverage | Update frequency |
|---|---|---|
| Tier 1 | Ranks 1–50 | Hourly |
| Tier 2 | Ranks 51–200 | Every 6 hours |
| Tier 3 | Ranks 201–1000 | Twice daily |
The last_updated field in every API response indicates the freshness of the data for that asset.
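A client could use last_updated together with the tier table to flag stale data. The sketch below assumes that last_updated is an ISO 8601 timestamp with a UTC offset and that "twice daily" means roughly every 12 hours; neither detail is confirmed above, and the helper names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# (highest rank in tier, expected update interval), from the table above.
# Assumption: "twice daily" is treated as a 12-hour interval.
TIERS = [
    (50, timedelta(hours=1)),
    (200, timedelta(hours=6)),
    (1000, timedelta(hours=12)),
]


def expected_interval(rank):
    """Return the update interval for a market-cap rank, or None if uncovered."""
    if rank < 1:
        return None
    for max_rank, interval in TIERS:
        if rank <= max_rank:
            return interval
    return None


def is_stale(rank, last_updated, now=None):
    """True if the data is older than the tier's expected update interval."""
    interval = expected_interval(rank)
    if interval is None:
        raise ValueError(f"rank {rank} is outside the covered range 1-1000")
    now = now or datetime.now(timezone.utc)
    # Assumption: last_updated looks like "2025-01-01T10:30:00+00:00".
    updated = datetime.fromisoformat(last_updated)
    return now - updated > interval
```

A rank-10 asset last updated 90 minutes ago would be flagged as stale, while a rank-100 asset with the same timestamp would not, since Tier 2 refreshes every 6 hours.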
Not financial advice. Not a fatwa. Methodology disclosed. Data provided for informational purposes only.