How the data is built

Data provenance, VWAP methodology, and the AI enrichment pipeline.


portfolIQ data passes through four stages: raw ingestion, Data Vault 2.0 modelling, mart computation, and API exposure. No single raw source is ever redistributed as-is; the published output is always a derived, enriched layer.

1. Price data — VWAP consensus

Price data is a Volume-Weighted Average Price (VWAP) computed across 4 exchanges:

Binance
Kraken
Coinbase Exchange
Bybit

How the consensus is computed:

Raw OHLCV is collected from each exchange via public endpoints (no authenticated API keys).
An exchange's price is excluded as an outlier if it deviates from the cross-exchange median by more than 3% (Tier 1), 5% (Tier 2), or 7% (Tier 3).
Each exchange's weight is capped at 50% of total volume, so no single source can dominate the consensus.
If fewer than 3 exchanges provide data for an asset, no VWAP is published (the asset appears without a price).
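The steps above can be sketched in Python as follows. This is a minimal illustration, not the published methodology: it assumes one price and one volume per exchange per period, that the 3-exchange minimum is re-checked after outlier removal, and that weight above the 50% cap is redistributed pro rata to the other exchanges. `Quote`, `capped_weights`, and `vwap_consensus` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional

# Tier-specific outlier thresholds and limits, as stated in the methodology summary.
OUTLIER_THRESHOLD = {1: 0.03, 2: 0.05, 3: 0.07}
MAX_WEIGHT = 0.50   # no exchange may carry more than 50% of the weight
MIN_EXCHANGES = 3   # below this, no consensus price is published


@dataclass
class Quote:
    exchange: str
    price: float    # which OHLCV price is used (close, typical price...) is an assumption
    volume: float


def capped_weights(volumes: List[float], cap: float) -> List[float]:
    """Volume weights with each entry capped at `cap`; the excess is
    redistributed pro rata among the uncapped entries (assumed scheme)."""
    total = sum(volumes)
    weights = [v / total for v in volumes]
    fixed = set()
    while True:
        over = {i for i, w in enumerate(weights) if i not in fixed and w > cap}
        if not over:
            return weights
        fixed |= over
        free_volume = sum(v for i, v in enumerate(volumes) if i not in fixed)
        remaining = 1.0 - cap * len(fixed)
        weights = [
            cap if i in fixed else remaining * v / free_volume
            for i, v in enumerate(volumes)
        ]


def vwap_consensus(quotes: List[Quote], tier: int) -> Optional[float]:
    """Return the consensus VWAP, or None when too few exchanges remain."""
    quotes = [q for q in quotes if q.volume > 0 and q.price > 0]
    if len(quotes) < MIN_EXCHANGES:
        return None
    mid = median(q.price for q in quotes)
    threshold = OUTLIER_THRESHOLD[tier]
    kept = [q for q in quotes if abs(q.price - mid) / mid <= threshold]
    # Assumption: the 3-exchange minimum is checked again after outlier removal.
    if len(kept) < MIN_EXCHANGES:
        return None
    weights = capped_weights([q.volume for q in kept], MAX_WEIGHT)
    return sum(w * q.price for w, q in zip(weights, kept))


quotes = [
    Quote("binance", 100.0, 600.0),
    Quote("kraken", 102.0, 200.0),
    Quote("coinbase", 101.0, 200.0),
    Quote("bybit", 110.0, 100.0),  # ~8.4% above the median of 101.5: excluded
]
print(vwap_consensus(quotes, tier=1))  # 100.75
```

In this example Binance carries 60% of the remaining volume and is capped at 50%, so the result is 100.75 rather than the uncapped 100.6.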

The methodology is published and git-tagged: see methodology/vwap-consensus-v1.0.md (tag methodology-vwap-v1.1).

See Asset tiers for coverage and frequency per tier.

2. Data pipeline

CoinGecko (metadata, market snapshots)  +  Exchanges: Binance, Kraken, Coinbase, Bybit (OHLCV)
    ↓
raw.*  (internal PostgreSQL with TimescaleDB)
    ↓
dbt — Data Vault 2.0
    hub_asset, hub_exchange
    link_asset_exchange
    sat_asset_market, sat_asset_vwap_consensus
    ↓
marts.* (aggregated, queryable)
    ↓
API /v1/*
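Data Vault 2.0 models are conventionally keyed by hashes of normalised business keys: a hub key hashes one business key, a link key hashes the business keys of the hubs it joins, and a satellite stores a "hash diff" of its attributes so that a new row is written only when something changes. The sketch below illustrates that common convention; whether the dbt models here use MD5, this delimiter, or these business keys (`"bitcoin"`, `"binance"`) is an assumption.

```python
import hashlib
from typing import Sequence

DELIMITER = "||"  # separator between key parts (assumed convention)


def hash_key(*business_keys: str) -> str:
    """Hub/link hash key: MD5 over trimmed, upper-cased business keys."""
    normalised = DELIMITER.join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalised.encode("utf-8")).hexdigest()


def hash_diff(values: Sequence[object]) -> str:
    """Satellite change-detection hash over attribute values in a fixed column order."""
    payload = DELIMITER.join("" if v is None else str(v).strip() for v in values)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Hypothetical business keys: a CoinGecko-style asset id and an exchange name.
hub_asset_hk = hash_key("bitcoin")
hub_exchange_hk = hash_key("binance")
link_asset_exchange_hk = hash_key("bitcoin", "binance")

# A new satellite row would only be inserted when the hash diff changes.
previous = hash_diff([64000.5, 1.2e9, "2025-01-01T10:00:00+00:00"])
current = hash_diff([64010.0, 1.2e9, "2025-01-01T11:00:00+00:00"])
print(previous != current)  # True
```

Normalising keys before hashing means `" bitcoin "` and `"BITCOIN"` map to the same hub row, which keeps the hubs free of duplicates when sources format identifiers differently.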

CoinGecko data is used internally only and is never redistributed as-is (attribution: "Powered by CoinGecko"). The price exposed for redistribution is the computed VWAP consensus.

3. AI enrichment

Each asset in the top 1000 receives three types of AI-generated analysis:

Analysis type	Description
`token_classification`	Protocol type (L1, DEX, lending, stablecoin, etc.)
`fundamental_summary`	Prose summary of on-chain fundamentals and market position
`sentiment_score`	Aggregated sentiment from the last 48h of news coverage
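How `sentiment_score` is aggregated is not specified here. As one plausible reading, the sketch below averages per-article scores over the 48 hours before the computation time. The (published_at, score) input shape, the -1.0 to 1.0 scale, and the plain mean are all assumptions; `aggregate_sentiment` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple

WINDOW = timedelta(hours=48)


def aggregate_sentiment(
    articles: Iterable[Tuple[datetime, float]], now: datetime
) -> Optional[float]:
    """Mean score of articles published in the 48 h before `now`.

    Each article is a (published_at, score) pair; the score scale (here
    -1.0 to 1.0) and the plain mean are assumptions.
    """
    recent = [score for published_at, score in articles
              if now - WINDOW <= published_at <= now]
    if not recent:
        return None  # no coverage in the window: no score rather than a neutral 0
    return sum(recent) / len(recent)


now = datetime(2025, 1, 3, tzinfo=timezone.utc)
articles = [
    (now - timedelta(hours=1), 0.5),
    (now - timedelta(hours=47), -0.25),
    (now - timedelta(hours=49), 1.0),  # outside the window, ignored
]
print(aggregate_sentiment(articles, now))  # 0.125
```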

Model routing: Claude Haiku 4.5 handles 85% of analyses, for speed and cost. Claude Sonnet 4.6 handles the remaining 15%: complex analyses and the first run on a newly added asset.
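A minimal sketch of that routing rule, assuming it is a deterministic decision per request rather than a random 85/15 split (the percentages would then be the resulting share). The model strings are illustrative labels, not necessarily the exact API model identifiers, and the `is_complex` flag is hypothetical because the complexity criterion is not documented.

```python
from dataclasses import dataclass

# Illustrative labels; not necessarily the exact API model identifiers.
HAIKU = "claude-haiku-4.5"
SONNET = "claude-sonnet-4.6"


@dataclass
class AnalysisRequest:
    asset_id: str
    analysis_type: str
    is_first_run: bool   # no earlier analysis exists for this asset
    is_complex: bool     # hypothetical flag; the real complexity criterion is not documented


def route_model(req: AnalysisRequest) -> str:
    """Send first runs and complex analyses to Sonnet, everything else to Haiku."""
    if req.is_first_run or req.is_complex:
        return SONNET
    return HAIKU


print(route_model(AnalysisRequest("bitcoin", "sentiment_score", False, False)))
# claude-haiku-4.5
```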

All analyses are subject to AMF guardrails: 13 banned prescriptive patterns (including "buy", "target price", "invest now"). Any analysis that triggers a pattern is rejected and regenerated. See The AI layer.
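The reject-and-regenerate loop can be sketched with regular expressions. Only the three patterns named above are included; the other ten are not listed in this document. Matching on word boundaries and the retry cap are assumptions, and `violations` and `generate_compliant` are hypothetical names.

```python
import re
from typing import Callable, List, Optional

# Only the three patterns named above; the other ten are not listed in this document.
BANNED_PATTERNS = [
    re.compile(r"\bbuy\b", re.IGNORECASE),
    re.compile(r"\btarget\s+price\b", re.IGNORECASE),
    re.compile(r"\binvest\s+now\b", re.IGNORECASE),
]
MAX_ATTEMPTS = 3  # hypothetical retry cap; the source does not state one


def violations(text: str) -> List[str]:
    """Return the banned patterns that match `text`."""
    return [p.pattern for p in BANNED_PATTERNS if p.search(text)]


def generate_compliant(generate: Callable[[], str]) -> Optional[str]:
    """Call `generate` until an output passes the guardrails, up to MAX_ATTEMPTS."""
    for _ in range(MAX_ATTEMPTS):
        text = generate()
        if not violations(text):
            return text
    return None  # still non-compliant: nothing is published


outputs = iter(["Analysts set a target price of $120.", "Neutral summary of activity."])
print(generate_compliant(lambda: next(outputs)))  # Neutral summary of activity.
```

The word-boundary anchors keep "buy" from matching inside neutral terms such as "buyback".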

Every analysis is stored with prompt_version, model, cost_usd_micros, and generated_at, so each output can be traced and audited.
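A sketch of such an audit record. The four audited fields come from the text above; `asset_id`, the example `prompt_version` value, and the reading of `cost_usd_micros` as integer millionths of a US dollar (suggested by the field name) are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class AnalysisRecord:
    asset_id: str          # hypothetical
    analysis_type: str     # token_classification | fundamental_summary | sentiment_score
    prompt_version: str
    model: str
    cost_usd_micros: int   # assumed: integer millionths of a US dollar
    generated_at: datetime

    @property
    def cost_usd(self) -> float:
        return self.cost_usd_micros / 1_000_000


def total_cost_usd(records: Iterable[AnalysisRecord]) -> float:
    # Summing integer micros before converting avoids float rounding drift.
    return sum(r.cost_usd_micros for r in records) / 1_000_000


now = datetime.now(timezone.utc)
records = [
    AnalysisRecord("bitcoin", "fundamental_summary", "v3", "claude-sonnet-4.6", 1250, now),
    AnalysisRecord("bitcoin", "sentiment_score", "v3", "claude-haiku-4.5", 750, now),
]
print(total_cost_usd(records))  # 0.002
```

Storing cost as an integer keeps per-analysis amounts exact; conversion to dollars happens only at reporting time.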

4. Data freshness

Tier	Coverage	Update frequency
Tier 1	Ranks 1–50	Hourly
Tier 2	Ranks 51–200	Every 6 hours
Tier 3	Ranks 201–1000	Twice daily

The last_updated field in every API response indicates the freshness of the data for that asset.
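A client could combine the tier table with last_updated to flag stale data. This sketch assumes last_updated is an ISO 8601 timestamp with a UTC offset (note that `datetime.fromisoformat` accepts a trailing "Z" only from Python 3.11), reads "twice daily" as every 12 hours, and adds a hypothetical 15-minute grace period for pipeline lag.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (tier, highest rank in tier, update interval), from the table above.
TIERS = [
    (1, 50, timedelta(hours=1)),
    (2, 200, timedelta(hours=6)),
    (3, 1000, timedelta(hours=12)),  # "twice daily" read as every 12 h (assumption)
]
GRACE = timedelta(minutes=15)  # hypothetical tolerance for pipeline lag


def tier_for_rank(rank: int) -> Optional[int]:
    for tier, max_rank, _ in TIERS:
        if 1 <= rank <= max_rank:
            return tier
    return None  # outside the top 1000


def is_stale(rank: int, last_updated: datetime, now: datetime) -> bool:
    """True if `last_updated` is older than the tier's interval plus GRACE."""
    tier = tier_for_rank(rank)
    if tier is None:
        raise ValueError(f"rank {rank} is outside the covered range")
    interval = TIERS[tier - 1][2]
    return now - last_updated > interval + GRACE


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
last_updated = datetime.fromisoformat("2025-01-01T10:00:00+00:00")
print(is_stale(10, last_updated, now))   # True: Tier 1 expects hourly updates
print(is_stale(150, last_updated, now))  # False: Tier 2 allows 6 h
```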

Not financial advice. Methodology disclosed. Data provided for informational purposes only.