V.Trivedy_

CryptoniumX — Architecture Teardown

A custodial cryptocurrency exchange built on Ruby on Rails around a matching engine written from scratch. The configurable trading screen was never the hard part. The hard part was building an engine fast enough for ~4,000 trades a second and disciplined enough that it could never sell the same coin twice.


Snapshot

What it is: Centralized spot cryptocurrency exchange (custodial), later extended with peer-to-peer trading
Timeline: Beta shipped Jul 2017; Rails production build followed; P2P added Aug 2018
Core stack: Ruby on Rails · MySQL · Redis · RabbitMQ · Sidekiq · ActionCable / WebSockets · Node.js (fullnode middleware) · SCSS/Bootstrap · CoffeeScript/jQuery
Throughput: 4,000 trades/sec (final matching engine)
Security: Audited by a third-party security firm
Author role: Tech lead and architect: owned the stack, schema, matching engine, auth, KYC, file/code structure, and DevOps; led ~6 developers and 2 designers
Status: Delivered in two phases (Node.js/Changelly beta → Rails production)

The problem

An exchange is two things at once: a custodian and a market. It holds people's money, and it decides who trades with whom. Each job fails in a different direction, and neither tolerates "close enough."

The market side has to be fair and deterministic. Fair, in an order book, has a precise meaning: price first, then time. The best price gets filled first; among orders at the same price, whoever arrived first wins. That rule only holds if every order for a market is processed in one definite sequence. Process two orders "at the same moment" and the rule means nothing, and traders stop trusting the venue.
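
Price-time priority reduces to a sort key. The sketch below is a minimal illustration in Python (the production engine was Ruby, and its internals are not shown here); the `Order` fields and the `priority_key` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Order:
    seq: int     # arrival sequence number, assumed to be assigned at intake
    side: str    # "buy" or "sell"
    price: int   # integer minor units, so no float rounding

def priority_key(order):
    """Sort key: best price first, then earliest arrival."""
    # The highest bid is the best buy; the lowest ask is the best sell.
    price_rank = -order.price if order.side == "buy" else order.price
    return (price_rank, order.seq)

bids = [Order(1, "buy", 100), Order(2, "buy", 101), Order(3, "buy", 101)]
ranked = sorted(bids, key=priority_key)
print([o.seq for o in ranked])  # [2, 3, 1]
```

Orders 2 and 3 share the best price, so arrival order breaks the tie. The key only means something if `seq` comes from one definite sequence, which is the point of the paragraph above.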

The custody side has to be exact. Balances cannot drift. A deposit cannot be credited twice. A withdrawal cannot leave before the funds behind it are reserved. There is no eventually-correct for someone's money.

And both have to be fast: thousands of orders a second, a book that redraws live on every screen, routine withdrawals that clear with no human in the loop.

The obvious build dies on the first two requirements. Treat order matching like any other Rails background task (a pool of workers, each pulling an order, reading the book from the database, matching, and writing it back) and it demos perfectly, then corrupts in production. Two workers read the same resting sell order, both match a buyer against it, and you have now sold the same Bitcoin twice. Concurrency that is harmless when you send email is a double-spend when you run a book. Adding locks does not save you: lock the book per order and you have serialized yourself anyway, with deadlocks thrown in.


The architecture

Cut to first principles: an exchange is two systems with opposite needs bolted together, and the whole design is the discipline of keeping them apart.

| | Matching core | Everything else |
| --- | --- | --- |
| Needs to be | Deterministic, sequential, low-latency | High-volume, I/O-bound; eventual consistency is fine |
| So it must | Run one consumer per market, in memory, no locks, no I/O in the loop | Use horizontal workers, durable queues, run async |
| Built as | Matching daemons fed by RabbitMQ | Sidekiq jobs, SQL, caches, notifications |

The decision everything hangs from: the matching loop touches no database and makes no network call. It consumes an ordered stream of validated orders and emits an ordered stream of fills. Persistence, balance settlement, and broadcasting all happen after it and around it. This one rule is why a Ruby stack can hold 4,000 matches a second. Nothing in the hot path waits on disk or network.

That rule is also why two message systems run side by side, which otherwise looks like duplication.

| Layer | Tool | Carries | Why this tool |
| --- | --- | --- | --- |
| Intake → engine | RabbitMQ (AMQP) | Validated orders, routed per market, in strict order, durably | Guaranteed delivery plus per-queue ordering. One queue per market means one consumer, which is the serialization fairness depends on. |
| Engine → world | Sidekiq (Redis) | Persist fills, move ledger entries, send notifications, refresh caches | Rails-native, fast, at-least-once. None of this needs global ordering, so here concurrency is an asset, not a hazard. |

Self-trade prevention lives in the engine too: a user's own buy and sell must not fill each other, or anyone can paint fake volume against themselves. The matched trade settles at the resting order's price (the maker's), not the incoming order's (the taker's), so the trader who posted liquidity gets the price they committed to.
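
A minimal sketch of that matching step, in Python rather than the production Ruby. It assumes the resting asks are already held in price-time order, and it uses one possible self-trade policy (step over the taker's own resting orders); the source does not say which policy the real engine used. `Order`, `Fill`, and `match_buy` are illustrative names.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Order:
    seq: int
    user: str
    side: str    # "buy" or "sell"
    price: int   # integer minor units
    qty: int

@dataclass(frozen=True)
class Fill:
    maker_seq: int
    taker_seq: int
    price: int
    qty: int

def match_buy(taker, asks):
    """Match an incoming buy against resting asks, using memory only."""
    fills = []
    skipped = []  # the taker's own resting orders, stepped over (assumed policy)
    while taker.qty > 0 and asks and asks[0].price <= taker.price:
        maker = asks.popleft()
        if maker.user == taker.user:
            skipped.append(maker)
            continue
        qty = min(taker.qty, maker.qty)
        # The trade prints at the maker's price, not the taker's.
        fills.append(Fill(maker.seq, taker.seq, maker.price, qty))
        taker.qty -= qty
        maker.qty -= qty
        if maker.qty > 0:
            asks.appendleft(maker)  # a partial fill keeps its place in line
    for maker in reversed(skipped):
        asks.appendleft(maker)      # restore skipped orders in original order
    return fills

asks = deque([
    Order(1, "alice", "sell", 100, 5),
    Order(2, "bob", "sell", 101, 5),
])
fills = match_buy(Order(3, "alice", "buy", 102, 3), asks)
print(fills)  # one fill against bob's order, at bob's price of 101
```

Alice bids 102 but pays 101, the price Bob committed to, and her own cheaper ask is never touched. A real engine also has to decide what happens to a taker remainder that crosses the user's own resting order; cancelling one side is a common choice, but that is outside this sketch.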

Real time. The order book and trade feed are pushed to browsers over WebSockets via ActionCable, fanned out through Redis pub/sub (any application node can publish an update to every connected client). Polling a book over HTTP would have every client hammering the server for data that changes many times a second; a socket pushes the delta once and the screen reacts. The engine never blocks on a broadcast: it drops the fill onto a channel and moves on, so one slow client cannot back up the book.
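
The "never blocks on a broadcast" property can be sketched in-process. The example below is a stand-in for Redis pub/sub and ActionCable, not their API: each subscriber gets a small bounded outbox, and a full outbox flags the client for a snapshot re-read instead of stalling the publisher. `Subscriber`, `publish`, and `max_pending` are hypothetical names.

```python
import queue

class Subscriber:
    """One connected client with a small bounded outbox (hypothetical model)."""

    def __init__(self, name, max_pending=2):
        self.name = name
        self.outbox = queue.Queue(maxsize=max_pending)
        self.needs_snapshot = False

def publish(update, subscribers):
    """Offer an update to every subscriber without ever waiting on one."""
    for sub in subscribers:
        try:
            sub.outbox.put_nowait(update)
        except queue.Full:
            # A slow client loses this delta and must re-read a snapshot;
            # the publisher carries on to the next subscriber.
            sub.needs_snapshot = True

fast, slow = Subscriber("fast"), Subscriber("slow")
for trade_id in range(3):
    publish({"trade_id": trade_id}, [fast, slow])
    fast.outbox.get_nowait()  # the fast client drains as updates arrive

print(fast.needs_snapshot, slow.needs_snapshot)  # False True
```

The slow client falls behind and is told to resync; the fast client and the publisher never notice.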

The fullnode boundary. The Rails app never speaks to a blockchain node directly. Each coin runs its own fullnode (a bitcoind, a geth node, and so on, each a program that holds a full copy of that chain and exposes an RPC command interface). In front of every node sits a small Node.js service that authenticates and brokers every call. A fullnode's RPC is powerful and trusting: anyone who can reach it and holds the wallet passphrase can move coins. Putting an authenticating middleware between the app and the node means a compromised application server cannot issue a raw "send coins" command. It can only make the scoped, signed requests the middleware permits. It is a guard posted at the most dangerous door in the building.
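
A sketch of what "scoped, signed requests" can mean in practice. The real middleware was a Node.js service whose code is not shown here; this Python version assumes an HMAC-SHA256 signature over the request body and a method allowlist. `ALLOWED_METHODS`, `SHARED_SECRET`, `sign`, and `authorize` are all illustrative, and the allowlist contents are an assumption.

```python
import hashlib
import hmac
import json

# Hypothetical allowlist: the app may ask for addresses and look up
# transactions, but may not issue a raw send.
ALLOWED_METHODS = {"getnewaddress", "gettransaction"}
SHARED_SECRET = b"example-secret"  # placeholder, never a real key

def sign(body, secret=SHARED_SECRET):
    """Hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()

def authorize(body, signature):
    """Return the parsed request only if it is signed and in scope."""
    if not hmac.compare_digest(sign(body), signature):
        raise PermissionError("bad signature")
    request = json.loads(body)
    method = request.get("method")
    if method not in ALLOWED_METHODS:
        raise PermissionError(f"method not permitted: {method}")
    return request

body = json.dumps({"method": "getnewaddress", "params": []}).encode()
print(authorize(body, sign(body))["method"])  # getnewaddress

raw_send = json.dumps({"method": "sendtoaddress", "params": ["addr", 1]}).encode()
try:
    authorize(raw_send, sign(raw_send))
except PermissionError as err:
    print(err)  # method not permitted: sendtoaddress
```

Even a correctly signed request is refused when the method is out of scope, which is the property that matters if the application server is compromised.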

Figure: How an order travels from a browser to a matched trade and back to every screen, and how wallet operations cross the fullnode trust boundary. Notice that the matching daemon writes to a log and a cache, never to the database.

One order, start to finish

The sequence below is where the custody guarantee actually lives. Funds are locked before the order reaches the engine; the database is written after the engine, never inside its loop.

Figure: A single buy order, from click to settled trade.


Data model

The schema is where an exchange is won or lost, and two decisions carry most of the weight.

Money is a ledger, not a column. A balance is never a single number you increment and decrement in place. Every movement is a row in an append-only, double-entry ledger: each entry has a matching counter-entry, and the whole book nets to zero. The live balance is a derived sum, cached in Redis for speed. A mutable balance column has no history, races under load, and cannot be rebuilt after a bug. A ledger reconstructs to the cent, which is exactly what a security auditor and a regulator ask to see.
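
A minimal in-memory sketch of the double-entry idea, in Python. The account names (`external:deposits`, `user:1`) and the `Ledger` class are hypothetical; the production schema lived in MySQL and is not reproduced here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Entry:
    account: str
    asset: str
    amount: int  # signed, in integer minor units

class Ledger:
    """Append-only ledger: every transfer writes an entry and its counter-entry."""

    def __init__(self):
        self._entries = []

    def transfer(self, debit, credit, asset, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        # Two rows per movement, equal and opposite. Sufficient-funds checks
        # are deliberately left out of this sketch.
        self._entries.append(Entry(debit, asset, -amount))
        self._entries.append(Entry(credit, asset, amount))

    def balance(self, account, asset):
        """A balance is derived by summing entries, never stored and edited."""
        return sum(e.amount for e in self._entries
                   if e.account == account and e.asset == asset)

    def net(self, asset):
        """Sum over every account; a consistent ledger always returns 0."""
        return sum(e.amount for e in self._entries if e.asset == asset)

ledger = Ledger()
ledger.transfer("external:deposits", "user:1", "BTC", 150_000)
ledger.transfer("user:1", "user:2", "BTC", 40_000)
print(ledger.balance("user:1", "BTC"))  # 110000
print(ledger.net("BTC"))                # 0
```

Summing every entry on each read is fine for a sketch; the cached sum described above is what makes it fast enough for production.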

Available is not the same as total. Each balance splits into available and locked. Placing an order moves funds from available to locked before the order is allowed near the queue; it does not yet debit the user. Settle the trade and the lock converts to a real transfer; cancel the order and the lock returns to available. Check-then-debit without this reservation lets one balance back ten orders at once and overdraw the account. The lock is the gate the order has to pass through to exist.
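
The reservation can be sketched as movements between two buckets per user, kept append-only so it stays consistent with the ledger idea. This is an illustrative Python model, not the production schema; `Funds` and its method names are assumptions.

```python
class Funds:
    """Append-only movements between (user, bucket) pairs; balances are sums."""

    def __init__(self):
        self.entries = []  # (user, bucket, signed_amount)

    def balance(self, user, bucket):
        return sum(a for u, b, a in self.entries if u == user and b == bucket)

    def _move(self, src, dst, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.balance(*src) < amount:
            raise ValueError("insufficient funds")
        self.entries.append((*src, -amount))
        self.entries.append((*dst, amount))

    def deposit(self, user, amount):
        # The external side is allowed to go negative in this sketch.
        self.entries.append(("external", "available", -amount))
        self.entries.append((user, "available", amount))

    def lock(self, user, amount):      # placing an order
        self._move((user, "available"), (user, "locked"), amount)

    def release(self, user, amount):   # cancelling an order
        self._move((user, "locked"), (user, "available"), amount)

    def settle(self, user, counterparty, amount):  # trade executed
        self._move((user, "locked"), (counterparty, "available"), amount)

funds = Funds()
funds.deposit("alice", 100)
funds.lock("alice", 60)          # the first order reserves 60
try:
    funds.lock("alice", 60)      # a second order cannot reuse the same funds
except ValueError as err:
    print(err)                   # insufficient funds
funds.settle("alice", "bob", 60)
print(funds.balance("alice", "available"),
      funds.balance("alice", "locked"),
      funds.balance("bob", "available"))  # 40 0 60
```

The second `lock` fails before any order exists, which is the gate described above. In a concurrent system the check and the two appends would have to run in one database transaction.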

Idempotency is built into the schema, not left to hopeful code: external references (deposit txid, withdrawal client id) carry unique constraints, so a retried message or a re-seen blockchain confirmation can never credit twice.
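
A sketch of the constraint doing the work, using SQLite in memory as a stand-in for the production MySQL. The `deposits` table and `credit_deposit` helper are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE deposits (
        id      INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        txid    TEXT    NOT NULL UNIQUE,  -- the idempotency guarantee
        amount  INTEGER NOT NULL
    )
""")

def credit_deposit(user_id, txid, amount):
    """Insert the deposit once; return False if this txid was already seen."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO deposits (user_id, txid, amount) VALUES (?, ?, ?)",
                (user_id, txid, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False

print(credit_deposit(1, "tx-abc", 50_000))  # True
print(credit_deposit(1, "tx-abc", 50_000))  # False: a retry credits nothing
```

The database refuses the duplicate no matter how many workers retry. On chains where one transaction can pay several addresses, the unique key would likely need to include the output index as well; that detail is an assumption, not something the source specifies.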

Figure: Core entities. Orders and trades are immutable once written; balances are computed from LEDGER_ENTRY, never edited directly.

The choices that are easy to get wrong, and what each one buys:

| Choice | Instead of | What it prevents |
| --- | --- | --- |
| Append-only double-entry ledger; balance = derived sum | A mutable balance column | Lost updates under concurrency; missing audit trail; books that cannot be rebuilt |
| available + locked split per balance | A single balance number | Overdrawing across many open orders; messy partial cancels |
| Unique idempotency_key / txid on ledger and transactions | Trusting code never to retry | Double-credited deposits and double-sent withdrawals |
| Immutable orders / trades with state transitions | Editing rows in place | A trade history you cannot trust; broken replay and reconciliation |
| kyc_tier on user, country on the KYC record | One global "verified" flag | Per-country rules and tiered limits would otherwise need schema churn |

Infrastructure and operations

Redis does four jobs, and treating it as one cache misses the point:

| Redis role | Serves | Why it matters |
| --- | --- | --- |
| Book + market-data cache | Live depth, ticker, recent trades, written by the engine | Reads outnumber writes by orders of magnitude. If every order-book poll hit the SQL database it would melt. The engine is the single writer of truth; Redis is the read copy of the book. |
| Session store | Auth sessions | App servers stay stateless, so they scale sideways |
| Rate limiting | Per-IP and per-key counters | Blunts credential stuffing and API abuse |
| Pub/sub backplane | ActionCable fan-out | Any node can broadcast to any connected client |
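
The rate-limiting role is the easiest of the four to show. Below is a fixed-window counter in plain Python as an in-memory stand-in for a Redis counter with an expiry; the source does not say which limiting algorithm was used, so the fixed window and the `FixedWindowLimiter` name are assumptions.

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` hits per key in each window of `window_seconds`."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self._clock = clock
        self._counts = {}  # key -> (window_index, hits_in_window)

    def allow(self, key):
        window = int(self._clock() // self.window_seconds)
        seen_window, hits = self._counts.get(key, (window, 0))
        if seen_window != window:
            hits = 0  # a new window starts a fresh count
        if hits >= self.limit:
            self._counts[key] = (window, hits)
            return False
        self._counts[key] = (window, hits + 1)
        return True

now = [0.0]  # injectable clock so the example is deterministic
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: now[0])
print([limiter.allow("ip:203.0.113.7") for _ in range(3)])  # [True, True, False]
now[0] = 61.0
print(limiter.allow("ip:203.0.113.7"))  # True
```

Keeping the counter in Redis rather than in process memory is what lets every stateless app server enforce the same limit.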

Where load and failure bite, and how the design absorbs them:

| Pressure point | Risk | How it is absorbed |
| --- | --- | --- |
| Matching daemon per market | One consumer is a single point of failure for that market | Append-only event log: restart and replay from the last checkpoint to rebuild the in-memory book. |
| ActionCable connections | Thread-based, leans on Redis pub/sub; that fan-out bottlenecks in the low thousands of sockets on Rails 5 (2017) | Run cable on its own nodes, separate from the API. Because the book sits in Redis, a dropped socket just re-reads a snapshot. |
| Fullnode sync and liveness | A lagging node shows stale balances or stalls withdrawals | The Node.js middleware isolates and health-checks each node; deposits are credited only after a confirmation count. |
| Withdrawal automation | Auto-approving a withdrawal that should not go out | The approval daemon scores every request; anything over threshold, off-whitelist, or high-velocity parks in the admin panel for a person. |

Security posture. Custody is tiered: a hot wallet sized to cover routine withdrawals stays online and automated, while the bulk of funds sit offline in a vault / cold tier, and a hardware-wallet service is offered to users for self-custody. The hot/cold split is 80/20 (configurable), under a multi-signature scheme. The withdrawal approval daemon is the line between automatic and manual: it judges each request on amount, destination whitelist, account KYC tier, and velocity, clears the safe ones, and hands the rest to a reviewer. The fullnode auth middleware is defense in depth at the RPC boundary. The admin panel is role-scoped: roles bound what CRUD each operator can do, and reporting reads the data but never reaches into the engine. For an exchange, a "highly secure authentication system" means second-factor codes (TOTP), withdrawal-address whitelisting with email confirmation, session and device management, login rate limiting, and anti-enumeration on sign-in and reset. The platform was reviewed by an outside security firm while raising funds.
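
The approval decision can be sketched as a handful of rules over the four signals named above. This Python version uses explicit rules rather than a numeric score, and every threshold in it (`TIER_LIMITS`, `MAX_RECENT`) is a made-up placeholder, not a value from the real system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Withdrawal:
    amount: int        # integer minor units
    address: str
    kyc_tier: int
    recent_count: int  # withdrawals by this account in a trailing window

TIER_LIMITS = {1: 1_000, 2: 10_000}  # hypothetical per-tier auto-approve caps
MAX_RECENT = 3                       # hypothetical velocity threshold

def review_reasons(w, whitelist):
    """Return every reason this request needs a human; empty means safe."""
    reasons = []
    if w.amount > TIER_LIMITS.get(w.kyc_tier, 0):
        reasons.append("amount over tier limit")
    if w.address not in whitelist:
        reasons.append("address not whitelisted")
    if w.recent_count >= MAX_RECENT:
        reasons.append("high velocity")
    return reasons

def decide(w, whitelist):
    return "manual_review" if review_reasons(w, whitelist) else "auto_approve"

whitelist = {"addr-known"}
print(decide(Withdrawal(500, "addr-known", 1, 0), whitelist))  # auto_approve
print(review_reasons(Withdrawal(5_000, "addr-new", 1, 4), whitelist))
# ['amount over tier limit', 'address not whitelisted', 'high velocity']
```

Returning the reasons, not just the verdict, gives the reviewer in the admin panel something to act on. An unknown KYC tier defaults to a cap of zero, so it always goes to a person.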

Two phases, and the buy-versus-build line

The beta (Jul 2017) ran on Node.js and integrated Changelly. Changelly is a non-custodial swap aggregator: it never holds the trade, it routes a swap to partner liquidity and forwards the coins to the user's own address. So the beta held no order book and took custody of nothing. It brokered. That let an investor-facing demo ship in weeks without building anything hard. The production exchange then built exactly the parts you cannot outsource: the matching engine, custody, and compliance.

The principle that generalizes: buy what proves the idea, build what is the moat. The common inversion, hand-rolling a CRUD admin panel while outsourcing the one capability that defines the product, is how teams pour months into the wrong half.

The configurable trading screen

Users could do more than switch between preset layouts. Every component on the trading page (chart, order book, depth, order forms, action buttons) could be resized and dropped anywhere on a grid, and the arrangement saved per user. Technically that is a serialized layout, a stored map of component to position and size, rendered client-side and keyed to the account. What kept it usable rather than chaotic was a hard rule: anything a user needs is reachable in three steps from anywhere on the platform. That rule outlived the project. It has been my UX baseline on everything since.
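
A serialized layout of that kind might look like the sketch below. The real rendering was client-side; this Python fragment only illustrates the stored shape and a sanity check before saving. The component names, the `x`/`y`/`w`/`h` fields, and the 12-column grid are assumptions.

```python
import json

GRID_COLUMNS = 12  # hypothetical grid width

def validate_layout(layout):
    """Reject boxes that fall off the grid or have no area."""
    for name, box in layout.items():
        x, y, w, h = (box[key] for key in ("x", "y", "w", "h"))
        if x < 0 or y < 0 or w < 1 or h < 1 or x + w > GRID_COLUMNS:
            raise ValueError(f"invalid box for {name}: {box}")
    return layout

layout = {
    "chart":      {"x": 0, "y": 0, "w": 8,  "h": 6},
    "order_book": {"x": 8, "y": 0, "w": 4,  "h": 6},
    "order_form": {"x": 0, "y": 6, "w": 12, "h": 3},
}
stored = json.dumps(validate_layout(layout), sort_keys=True)  # saved per user
restored = json.loads(stored)
print(restored == layout)  # True
```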


Outcome

| Result | From the brief |
| --- | --- |
| Matching throughput | 4,000 trades/sec on the final engine |
| Security | Passed a third-party security audit |
| Scope delivered | Spot exchange, wallet and portfolio management, dynamic KYC/AML and per-country compliance, order book and trade history, admin panel with role-based CRUD and reporting, vault and hardware-wallet services |
| Delivery | Two phases: Node.js + Changelly beta (Jul 2017) → Rails production; P2P added Aug 2018 |
| Team | ~6 developers and 2 designers under one tech lead |

The 4,000 figure is aggregate across markets, and it comes from one place: keeping all I/O out of the matching path. Peer-to-peer trading (Aug 2018) is a different trust model from the central book. Instead of an engine matching anonymous orders against shared liquidity, peers deal directly while the platform escrows the asset and arbitrates disputes.


What I would watch

A few things this kind of system teaches, usually the hard way.

The matching engine is not what ends exchanges. Custody is. The scariest code is the withdrawal path and the key handling, not the order book. Put your paranoia there.

An in-memory engine is only as trustworthy as its recovery. If you cannot replay the log and land on the exact same book, you do not have a matching engine, you have a fast guess. Build the event log before you optimize the match.
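
The test worth writing first is that a full replay and a checkpoint-plus-tail replay land on the same book. A minimal Python sketch, with a hypothetical event shape (`add`, `fill`, `cancel`) that is not taken from the real log format:

```python
import copy

def apply_event(book, event):
    """Apply one logged event in place. No clocks, no randomness, no I/O."""
    kind = event["type"]
    if kind == "add":
        book[event["seq"]] = {
            "side": event["side"], "price": event["price"], "qty": event["qty"],
        }
    elif kind == "fill":
        order = book[event["seq"]]
        order["qty"] -= event["qty"]
        if order["qty"] == 0:
            del book[event["seq"]]
    elif kind == "cancel":
        book.pop(event["seq"], None)
    else:
        raise ValueError(f"unknown event type: {kind}")

def replay(log, checkpoint=None, start=0):
    """Rebuild the book from a checkpoint plus the events logged after it."""
    book = copy.deepcopy(checkpoint) if checkpoint else {}
    for event in log[start:]:
        apply_event(book, event)
    return book

log = [
    {"type": "add", "seq": 1, "side": "sell", "price": 100, "qty": 5},
    {"type": "add", "seq": 2, "side": "sell", "price": 101, "qty": 5},
    {"type": "fill", "seq": 1, "qty": 5},
    {"type": "fill", "seq": 2, "qty": 2},
]
checkpoint = replay(log[:2])                       # snapshot after two events
full = replay(log)                                 # cold start
resumed = replay(log, checkpoint=checkpoint, start=2)
print(full == resumed)  # True
print(full)             # {2: {'side': 'sell', 'price': 101, 'qty': 3}}
```

If those two ever disagree, something non-deterministic has crept into the apply step, and that is the bug to fix before any performance work.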

ActionCable bought real-time cheaply in 2017, and it has a ceiling. Watch the concurrent socket count and the Redis fan-out, and have the move ready (dedicated cable nodes, or a Go-based socket layer) before the book freezes during a volatile hour, not after.

The split that saves you is boring: an ordered, durable queue in, everything async out. The urge to do "just one quick database write" inside the match loop is exactly how latency and races get back in. Keep the loop clean.