portfolIQ
Documentation

How the data is built

Data provenance, VWAP methodology, and the AI enrichment pipeline.


portfolIQ data goes through four stages: raw ingestion, Data Vault 2.0 modelling, mart computation, and API exposure. At no point is a single raw source redistributed as-is — the output is always a derived, enriched layer.

1. Price data — VWAP consensus

Price data is a Volume-Weighted Average Price (VWAP) computed across 5 major exchanges:

  • Binance
  • Kraken
  • Coinbase Exchange
  • Bybit
  • OKX

How the formula works:

  1. Raw OHLCV is collected from each exchange via public endpoints (no authenticated API keys).
  2. Outliers are excluded: an exchange's price is dropped if it deviates from the cross-exchange median by more than 3% (Tier 1), 5% (Tier 2), or 7% (Tier 3).
  3. Each exchange's weight is capped at 50% of the total volume — no single source dominates.
  4. If fewer than 3 exchanges provide data for an asset, no VWAP is published (the asset appears without a price).
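The four steps above can be sketched in Python for a single time interval. This is a minimal sketch under stated assumptions, not the published implementation. Each exchange is assumed to contribute one (price, volume) pair per interval. The 3-exchange minimum is assumed to apply after outlier removal. The 50% cap is interpreted as a cap on each exchange's normalised weight, with the excess redistributed proportionally to the remaining exchanges. The function names `cap_weights` and `vwap_consensus` are hypothetical.

```python
from statistics import median
from typing import Optional

# Max deviation from the median before a price is treated as an outlier, per tier.
TIER_THRESHOLDS = {1: 0.03, 2: 0.05, 3: 0.07}
MAX_WEIGHT = 0.50     # no exchange may carry more than 50% of the total weight
MIN_EXCHANGES = 3     # below this, no VWAP is published


def cap_weights(volumes: list[float], cap: float = MAX_WEIGHT) -> list[float]:
    """Normalise volumes to weights, cap any weight at `cap`, and redistribute
    the excess proportionally to the uncapped exchanges (assumed interpretation)."""
    total = sum(volumes)
    weights = [v / total for v in volumes]
    capped = [False] * len(volumes)
    while True:
        over = [i for i, w in enumerate(weights) if not capped[i] and w > cap]
        if not over:
            return weights
        for i in over:
            weights[i] = cap
            capped[i] = True
        # Share what is left of the unit weight among the uncapped exchanges.
        free = [i for i in range(len(volumes)) if not capped[i]]
        remaining = 1.0 - cap * sum(capped)
        free_volume = sum(volumes[i] for i in free)
        for i in free:
            weights[i] = remaining * volumes[i] / free_volume


def vwap_consensus(quotes: dict[str, tuple[float, float]], tier: int) -> Optional[float]:
    """quotes maps exchange name -> (price, volume) for one interval.
    Returns None when fewer than MIN_EXCHANGES usable prices remain."""
    # Assumption: quotes with a non-positive price or volume are unusable.
    valid = [(p, v) for p, v in quotes.values() if p > 0 and v > 0]
    if len(valid) < MIN_EXCHANGES:
        return None
    mid = median(p for p, _ in valid)
    threshold = TIER_THRESHOLDS[tier]
    kept = [(p, v) for p, v in valid if abs(p - mid) / mid <= threshold]
    if len(kept) < MIN_EXCHANGES:
        return None
    weights = cap_weights([v for _, v in kept])
    return sum(w * p for w, (p, _) in zip(weights, kept))
```

For example, take a Tier 1 asset quoted at 100.0 on Binance (volume 900), 101.0 on Kraken (50), 99.0 on Coinbase (50) and 130.0 on Bybit (100). Bybit is dropped, about 29% away from the 100.5 median. Binance's raw 90% share of the remaining volume is capped to 50%, and the consensus comes out at 100.0. Redistributing the excess matters: simply clipping raw volume at 50% of the total and renormalising would still let one exchange exceed half of the effective weight.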

The methodology is published and git-tagged: see methodology/vwap-consensus-v1.0.md (tag methodology-vwap-v1.1).

See Asset tiers for coverage and frequency per tier.

2. Data pipeline

CoinGecko (metadata, market snapshots)
Exchanges: Binance, Kraken, Coinbase, Bybit, OKX (OHLCV)
        ↓
raw.*  (internal PostgreSQL, TimescaleDB)
        ↓
dbt: Data Vault 2.0
    hub_asset, hub_exchange
    link_asset_exchange
    sat_asset_market, sat_asset_vwap_consensus
        ↓
marts.*  (aggregated, queryable)
        ↓
API /v1/*

CoinGecko data is ingested internally only and is never redistributed as-is. Attribution: "Powered by CoinGecko". The redistributable price is the computed VWAP consensus.
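In Data Vault 2.0, hubs and links are conventionally keyed by deterministic hashes of their business keys. Satellites carry a hash of their descriptive columns (a "hashdiff"), so a new satellite row is written only when something changes. The Python sketch below illustrates that convention for the hub, link, and satellite tables listed in the dbt layer. The choice of MD5, the `||` delimiter, the trim-and-upper-case normalisation, and the example business keys (`bitcoin`, `kraken`) are all assumptions, not portfolIQ's actual dbt macros.

```python
import hashlib


def dv_hash_key(*business_keys: str) -> str:
    """Data Vault 2.0-style hash key: normalise each business key
    (trim + upper-case), join with a delimiter, and hash with MD5."""
    normalised = [k.strip().upper() for k in business_keys]
    return hashlib.md5("||".join(normalised).encode("utf-8")).hexdigest()


def hash_diff(attributes: dict[str, object]) -> str:
    """Hash of a satellite's descriptive columns, in a fixed (sorted) order;
    a new satellite row is needed only when this value changes."""
    payload = "||".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Hypothetical business keys: a CoinGecko asset id and an exchange name.
asset_hk = dv_hash_key("bitcoin")               # hub_asset
exchange_hk = dv_hash_key("kraken")             # hub_exchange
link_hk = dv_hash_key("bitcoin", "kraken")      # link_asset_exchange
```

Because the keys are pure functions of the business keys, raw loads from different sources can be joined into the vault without a lookup against previously assigned surrogate IDs.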

3. AI enrichment

Each asset in the top 1000 is enriched with three types of AI-generated analysis:

Analysis type           Description
token_classification    Protocol type (L1, DEX, lending, stablecoin, etc.)
fundamental_summary     Prose summary of on-chain fundamentals and market position
sentiment_score         Aggregated sentiment from the last 48 hours of news coverage

Model routing: Claude Haiku 4.5 handles 85% of analyses, chosen for speed and cost. Claude Sonnet 4.6 handles the remaining 15%: complex analyses and the first run on a newly added asset.
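The routing rule can be expressed as a small function. This is a hypothetical sketch: the text does not define what makes an analysis "complex", so that is modelled as a boolean supplied by the caller. The enum values are display labels, not API model identifiers.

```python
from enum import Enum


class Model(str, Enum):
    # Display labels only; not API model identifiers.
    HAIKU = "Claude Haiku 4.5"
    SONNET = "Claude Sonnet 4.6"


def route_model(is_first_run: bool, is_complex: bool) -> Model:
    """Send first-run and complex analyses to Sonnet; everything else to Haiku."""
    if is_first_run or is_complex:
        return Model.SONNET
    return Model.HAIKU
```

Under a rule like this, the 85/15 split is an outcome of the workload mix rather than a quota enforced by the router.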

All analyses are checked against AMF guardrails: 13 banned prescriptive patterns, including "buy", "target price", and "invest now". Any analysis that matches one of these patterns is rejected and regenerated. See The AI layer.
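The reject-and-regenerate loop can be sketched with regular expressions. This is a minimal sketch: only three of the 13 patterns are named in this document, so the list is deliberately incomplete. The retry limit (`max_attempts`) and the `generate` callback are hypothetical additions, since the text does not say how many regenerations are allowed.

```python
import re
from typing import Callable, Optional

# Only three of the 13 banned patterns are named in this document.
BANNED_PATTERNS = [
    re.compile(r"\bbuy\b", re.IGNORECASE),
    re.compile(r"\btarget\s+price\b", re.IGNORECASE),
    re.compile(r"\binvest\s+now\b", re.IGNORECASE),
]


def violations(text: str) -> list[str]:
    """Return the source of every banned pattern found in the text."""
    return [p.pattern for p in BANNED_PATTERNS if p.search(text)]


def generate_with_guardrails(generate: Callable[[], str],
                             max_attempts: int = 3) -> Optional[str]:
    """Call `generate` until it returns text with no banned pattern.
    Returns None if every attempt is rejected (hypothetical retry cap)."""
    for _ in range(max_attempts):
        text = generate()
        if not violations(text):
            return text
    return None
```

Word boundaries (`\b`) keep the check from firing on words that merely contain a banned term, such as "buyers".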

Every analysis is stored with prompt_version, model, cost_usd_micros, and generated_at — fully auditable.
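The audit fields map naturally onto an immutable record. In this sketch, the four named fields come from the text above. `asset_id` and `content` are hypothetical, and the field types are assumptions. `cost_usd_micros` is assumed, from its name, to be an integer count of millionths of a US dollar. That representation avoids floating-point rounding when costs are summed across many analyses.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Literal

AnalysisType = Literal["token_classification", "fundamental_summary", "sentiment_score"]


@dataclass(frozen=True)
class AnalysisRecord:
    asset_id: str                 # hypothetical field
    analysis_type: AnalysisType
    content: str                  # hypothetical field
    prompt_version: str
    model: str
    cost_usd_micros: int          # assumed: integer millionths of a US dollar
    generated_at: datetime

    @property
    def cost_usd(self) -> float:
        """Convert the stored integer cost to dollars for display."""
        return self.cost_usd_micros / 1_000_000


record = AnalysisRecord(
    asset_id="bitcoin",
    analysis_type="sentiment_score",
    content="Coverage over the last 48 hours is broadly neutral.",
    prompt_version="v1",
    model="Claude Haiku 4.5",
    cost_usd_micros=1250,
    generated_at=datetime.now(timezone.utc),
)
```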

4. Data freshness

Tier     Coverage          Update frequency
Tier 1   Ranks 1–50        Hourly
Tier 2   Ranks 51–200      Every 6 hours
Tier 3   Ranks 201–1000    Twice daily

The last_updated field in every API response indicates the freshness of the data for that asset.
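A client can compare last_updated against the tier schedule to detect stale data. This sketch assumes last_updated is an ISO 8601 timestamp with an explicit UTC offset, such as `2025-01-01T12:00:00+00:00`, and reads "twice daily" as every 12 hours. The helper names `update_interval` and `is_stale` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# (first rank, last rank, update interval) per tier, from the table above.
TIERS = [
    (1, 50, timedelta(hours=1)),
    (51, 200, timedelta(hours=6)),
    (201, 1000, timedelta(hours=12)),   # assumed: "twice daily" = every 12 hours
]


def update_interval(rank: int) -> Optional[timedelta]:
    """Return the expected refresh interval for an asset rank, or None if uncovered."""
    for first, last, interval in TIERS:
        if first <= rank <= last:
            return interval
    return None


def is_stale(last_updated: str, rank: int, now: Optional[datetime] = None) -> bool:
    """True if last_updated is older than the asset's tier interval.
    Expects an offset-aware ISO 8601 string (e.g. ending in +00:00)."""
    interval = update_interval(rank)
    if interval is None:
        raise ValueError(f"rank {rank} is outside the covered top 1000")
    updated = datetime.fromisoformat(last_updated)
    now = now or datetime.now(timezone.utc)
    return now - updated > interval
```

On Python versions before 3.11, `datetime.fromisoformat` does not accept a trailing `Z`, so an offset of the form `+00:00` is assumed here.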


Not financial advice. Not a fatwa. Methodology disclosed. Data provided for informational purposes only.