Clixs.ai: live photos, searchable in seconds
A guest uploads a selfie and gets back every photo they're in. Not next week. While the party is still on. The face matching is the easy part. AWS Rekognition does that. The hard part is the camera. Pull frames off a Canon, Nikon, or Sony body, across the venue's WiFi, into the cloud, and make them searchable in seconds. Do that without leaking one event's photos into another's, on a backend that charges for every call. That's the system.
Snapshot
| Field | Detail |
|---|---|
| Product | Clixs.ai. Live photo sharing for events: weddings, conferences, corporate, sports, graduations. |
| Positioning | Marketed as India's first live Cam2Cloud platform. Photos delivered as they're clicked (the company's words). |
| The edge | Photos go straight from the camera to the cloud, so guests find them during the event, not days later. |
| Stage | In production. Mid-migration from an album model to an event/function/camera model. |
| Backend | FastAPI (Python, async) with in-process background workers. |
| Frontend | Next.js 14 App Router, TypeScript, Tailwind, Framer Motion. |
| Datastores | MongoDB for metadata and billing. SQLite for the edge upload queue and credentials. |
| Cloud | AWS S3 for storage, Rekognition for face vectors, SQS for event fan-in. |
| Edge ingestion | A custom FTP/FTPS service, 3 instances, handling Canon, Nikon, and Sony cameras. |
| Tenancy | Event-scoped. One Rekognition collection per event. |
| Payments | Razorpay (INR). Per-event, per-photo pricing. Coupons, webhooks, fraud checks. |
| Surfaces | Marketing: clixs.photo. App: app.clixs.ai. Find page: app.clixs.ai/find. |
| Production scale | 10,000+ customers, 2.5M+ photos matched (company figures). Photo discovery averages 25s, 2 min at the tail. Ingest runs at 10,000 photos a minute and scales out. |
The problem
The promise is speed. A guest on the dance floor scans a QR code, uploads a selfie, and finds the shot the photographer took two minutes ago. "Delivered as they're clicked" is not a slogan. It is a deadline. The clock starts at the shutter and runs across the venue's WiFi, into the cloud, through face indexing, and back to a phone, in seconds. Four things make that deadline hard to hit.
| Constraint | Why it bites |
|---|---|
| Live edge ingestion | The frames start on Canon, Nikon, and Sony bodies. These are sealed appliances. They speak their own dialect of FTP over congested venue WiFi that drops mid-transfer. You don't control the client and you can't fix the network. |
| Per-event isolation | A guest's selfie must match only that event's photos. A cross-event match is a privacy breach: your face turning up in a stranger's wedding gallery. |
| Instant search, heavy indexing | The guest is waiting. You can't index a continuous stream of full-res frames inside that wait. But the stream has to become searchable almost at once. |
| Metered AI | Rekognition charges per IndexFaces and per SearchFacesByImage. Every call also burns a paid quota. Index one frame twice and you pay twice, and you bill the customer for work you didn't do. |
The obvious build fails on every count. One shared face index leaks faces between events. A synchronous cloud upload stalls the camera and backs up the next frame. With no idempotency, every venue-WiFi retry turns into a second paid call.
Two rules run through the design. On a pay-per-call AI, the unit of correctness is not the row. It's the line on the customer's invoice. So duplicate-safety comes before speed. And "live" is a deadline, not a feature. The system is a latency pipeline. Its slowest hop is the product.
The architecture
Two hard problems, one on each side of the pipeline.
Start with search. The search primitive is a face-vector lookup, and Rekognition's unit of search is the collection. So each event's 8-character event_code is its Rekognition collection ID. Isolation stops being app logic you can get wrong. It becomes a property of which collection you query. You cannot match a guest against another event, because you are searching one collection and only one.
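A minimal sketch of what "searching one collection and only one" looks like in code. The call shape follows boto3's Rekognition `search_faces_by_image`. The function name, the threshold, the result limit, and the idea that `ExternalImageId` carries the photo ID are assumptions for illustration, not taken from the codebase.

```python
from typing import Any, Protocol


class RekognitionLike(Protocol):
    """The one boto3 Rekognition client method this sketch calls."""

    def search_faces_by_image(self, **kwargs: Any) -> dict: ...


def search_event(client: RekognitionLike, event_code: str, selfie: bytes,
                 threshold: float = 90.0, max_faces: int = 100) -> list[str]:
    """Return the photo IDs that match `selfie` inside one event.

    The collection ID is the event_code, so faces from any other event
    are not in the set being searched at all.
    """
    response = client.search_faces_by_image(
        CollectionId=event_code,        # the tenant boundary
        Image={"Bytes": selfie},
        FaceMatchThreshold=threshold,   # assumed value
        MaxFaces=max_faces,             # assumed value
    )
    # Assumption: ExternalImageId was set to the photo ID at index time.
    return [match["Face"]["ExternalImageId"] for match in response["FaceMatches"]]
```

There is no `event_id` filter to forget here. The only way to search the wrong event is to pass the wrong `event_code`.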
Now ingest. The rule is simple: never let the camera wait on the cloud. Indexing is asynchronous and queue-backed. Search is synchronous, at request time. At the edge, the FTP handler does one thing. It writes the frame to local disk, fast. A durable queue and a worker pool carry it to S3 behind that. The camera sees a quick local transfer and fires the next shot. The upload, the SQS event, and the indexing all happen in the seconds after, off the camera's clock.
| Decision | Chose | Over | Why | What it costs |
|---|---|---|---|---|
| Tenant boundary | One Rekognition collection per event (event_code = collection ID) | One shared collection filtered by event_id | Isolation is structural. Per-event delete is one DeleteCollection. No documented cap on collections per account, so it scales across many tenants. | event_code is now load-bearing identity tied to an external resource. Renaming or merging events becomes a migration. Rekognition TPS is account-global, shared by every collection. |
| Edge ingestion | A custom FTP/FTPS service: land-to-staging, durable SQLite queue, retrying workers | FTP inside the FastAPI app, or a synchronous upload inside the FTP session | A stalled camera never starves the API. The camera never waits on a slow cloud leg. Frames survive a restart. | A second service, a second datastore, a credential-sync contract, and a whole class of edge failures to own. |
| Ingest vs search | Async indexing (S3 → SQS → consumer). Synchronous search at request time. | Sync on both, or async on both | A continuous stream indexes in the background while guests still get sub-second results. | Two code paths. An eventual-consistency window. Both fight for the same Rekognition TPS. |
| Upload fan-in | S3 ObjectCreated → SQS → one consumer for every source | Each uploader indexing its own faces | One indexing path, whatever the source: Cam2Cloud, web, or legacy. S3 is the only trigger. | SQS standard delivery is at-least-once. The consumer has to be idempotent or it double-indexes and double-bills. |
| Auth surface | JWT and RBAC enforced at the API. Client-side route guards for UX only. | Next.js middleware as the security gate | The real boundary is the API. Client guards are for show. | Next.js auth middleware is off. Every protected resource has to be checked server-side, or it's exposed. |
The full system. Cameras flow through the custom Cam2Cloud service. Web and guest traffic go through the API. Both land objects in S3. S3 fires one SQS stream into one consumer, which indexes faces into the right per-event Rekognition collection. Guest search skips the queue and hits Rekognition directly.
Cam2Cloud: the hardest part
This is the part the team found hardest, and it's worth saying why. The difficulty isn't the happy path. It's everything that goes wrong between the shutter and the guest's screen, at a live event where there's no second take.
Here's the inversion. In normal client/server work, you control the client. Here the clients are sealed cameras. Canon, Nikon, and Sony bodies and their wireless FTP transmitters, each speaking FTP its own way, each with fixed retry and timeout behavior you can't change, and no way to push a firmware fix. The server has to bend to the device, because the device will never bend to the server. Now add the network. Venue WiFi is built for guests, not uploads. It's congested, NAT'd, sometimes behind a captive portal, and it drops connections mid-transfer. Every assumption a clean FTP server makes is wrong here.
| What's true at the edge | Why it breaks the naive build | What the service does |
|---|---|---|
| Cameras are sealed appliances with fixed, quirky FTP/FTPS behavior you can't patch | A server that expects well-behaved clients rejects them or mishandles them | The handler and authorizer are built around the real devices, not the spec. Plain FTP and FTPS are both supported, because some firmware can't do TLS reliably. |
| Venue WiFi drops connections mid-transfer | A synchronous upload stalls, and frames are lost | Land-to-staging. The FTP leg writes locally and fast. A durable queue and retrying workers move the frame to S3 after. The camera never waits on the cloud. |
| A dropped transfer leaves a truncated file | Indexing half a JPEG wastes a paid call and corrupts results | A frame joins the queue only after the transfer is confirmed complete and the file is whole. A FileJanitor sweeps up the debris from aborted transfers. |
| The edge box restarts. The backend goes briefly unreachable. | In-memory work vanishes. A remote queue needs connectivity the venue can't promise. | A persistent SQLite queue on local disk. It's ACID, embedded, needs no network to enqueue, and survives a reboot. |
| Checking every login against the backend ties the edge to backend uptime | One backend hiccup locks every camera out, mid-event | Credentials are pushed once to the service's local SQLite, stored as a SHA-256 hash. The SQLiteAuthorizer validates logins offline. To revoke, push a delete. |
| One event's camera must not see another's frames | Shared staging leaks frames between tenants | Each upload key is chroot-jailed to its own directory. The worker maps that staging path to the right tenant's S3 prefix, and so to the right Rekognition collection. |
| At a live event, a silent stall is invisible until guests complain | Missing photos that nobody sees coming | Prometheus metrics: connections, queue depth, upload success and retry, latency. An operator can watch frames flow during the event. |
Shutter to searchable. Splitting the FTP leg (fast, local, on the camera's clock) from the S3 upload (slow, retryable) is the move that makes "delivered as they're clicked" real. The camera's clock never depends on the cloud's. A frame is still searchable seconds after it lands.
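The land-to-staging split can be sketched with nothing but the standard library. This is an illustration under stated assumptions, not the service's code: the table layout, the function names, and the backoff schedule are hypothetical, and the whole-file check is a cheap JPEG marker test (some bodies may append bytes after the end marker, so a real check would need to be tuned per device).

```python
import sqlite3
import time
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS upload_queue (
    path     TEXT PRIMARY KEY,            -- one row per staged frame
    attempts INTEGER NOT NULL DEFAULT 0,
    next_try REAL    NOT NULL DEFAULT 0,  -- epoch seconds
    done     INTEGER NOT NULL DEFAULT 0
)
"""


def open_queue(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def looks_whole_jpeg(path: Path) -> bool:
    """Cheap truncation check: a JPEG starts with FF D8 and ends with FF D9."""
    size = path.stat().st_size
    if size < 4:
        return False
    with path.open("rb") as f:
        head = f.read(2)
        f.seek(size - 2)
        tail = f.read(2)
    return head == b"\xff\xd8" and tail == b"\xff\xd9"


def enqueue(conn: sqlite3.Connection, path: Path) -> bool:
    """Queue a frame after its transfer finishes. Touches local disk only."""
    if not looks_whole_jpeg(path):
        return False  # truncated: leave it for the janitor
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT OR IGNORE INTO upload_queue (path) VALUES (?)",
                     (str(path),))
    return True


def drain_once(conn: sqlite3.Connection, upload, now=None, max_attempts: int = 5) -> int:
    """Try every due frame once. Back off on failure. Return uploads completed."""
    now = time.time() if now is None else now
    due = conn.execute(
        "SELECT path, attempts FROM upload_queue "
        "WHERE done = 0 AND next_try <= ? AND attempts < ?",
        (now, max_attempts),
    ).fetchall()
    sent = 0
    for path, attempts in due:
        try:
            upload(path)  # the slow, retryable cloud leg
        except Exception:
            delay = 2 ** attempts  # assumed schedule: 1s, 2s, 4s, ...
            with conn:
                conn.execute(
                    "UPDATE upload_queue SET attempts = attempts + 1, next_try = ? "
                    "WHERE path = ?", (now + delay, path))
        else:
            with conn:
                conn.execute("UPDATE upload_queue SET done = 1 WHERE path = ?", (path,))
            sent += 1
    return sent
```

`enqueue` is all the FTP handler would need to call, and it never touches the network. `drain_once` is what a worker would loop over, with `upload` standing in for the S3 put.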
The backend's only job in this loop is paperwork. When a host makes an upload key in the dashboard, UploadKeyService hashes the secret into MongoDB and FTPNotificationService posts the credential to the FTP service's notification API, which is protected by a bearer token. After that, the edge service runs on its own. That independence is the whole reason to draw the microservice boundary here.
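A sketch of the offline credential check this paragraph describes. The table name, column names, and function names are hypothetical. The only parts taken from the text are the SHA-256 hash, the local SQLite store, push-to-add, and push-to-revoke. It assumes upload secrets are machine-generated and high-entropy, which is what makes a plain hash reasonable here.

```python
import hashlib
import hmac
import sqlite3
from typing import Optional


def hash_secret(secret: str) -> str:
    """SHA-256 hex digest. The plaintext secret is never stored."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def open_store(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS credentials ("
        "upload_key TEXT PRIMARY KEY, secret_hash TEXT NOT NULL, home_dir TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def push_credential(conn: sqlite3.Connection, upload_key: str,
                    secret_hash: str, home_dir: str) -> None:
    """What the notification API would do when the backend posts a new key."""
    with conn:
        conn.execute("INSERT OR REPLACE INTO credentials VALUES (?, ?, ?)",
                     (upload_key, secret_hash, home_dir))


def revoke_credential(conn: sqlite3.Connection, upload_key: str) -> None:
    """Revocation is a pushed delete."""
    with conn:
        conn.execute("DELETE FROM credentials WHERE upload_key = ?", (upload_key,))


def validate_login(conn: sqlite3.Connection, upload_key: str,
                   secret: str) -> Optional[str]:
    """Return the key's jailed home directory, or None. No backend call is made."""
    row = conn.execute(
        "SELECT secret_hash, home_dir FROM credentials WHERE upload_key = ?",
        (upload_key,),
    ).fetchone()
    if row is None:
        return None
    stored_hash, home_dir = row
    # Constant-time comparison of the two hex digests.
    if hmac.compare_digest(stored_hash.encode(), hash_secret(secret).encode()):
        return home_dir
    return None
```

The returned home directory is where the chroot jail would start, so a successful login already decides which tenant's staging path the camera can write to.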
And it holds up at volume. The service takes in around 10,000 photos a minute, and you add throughput by adding instances.
The other clock: synchronous guest search
Ingestion is slow and forgiving. Search is fast and unforgiving. The guest is waiting, so the request goes straight to Rekognition and back. The same path enforces the quota.
The synchronous clock. The quota check is the same gate that returns 402 Payment Required when an event has used up its plan. Billing enforcement, on the hot path.
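One way to put the quota gate on the hot path without a read-modify-write race is a single conditional increment. The sketch below assumes a pymongo-style collection and uses its `find_one_and_update(filter, update)` call. The field names (`event_code`, `searches_used`), the exception class, and passing the plan limit as an argument are all hypothetical.

```python
from typing import Optional, Protocol


class UsageCollection(Protocol):
    """The one pymongo Collection method this sketch calls."""

    def find_one_and_update(self, filter: dict, update: dict) -> Optional[dict]: ...


class QuotaExceeded(Exception):
    """An API handler would translate this into a 402 response."""

    status_code = 402  # Payment Required


def consume_search(usage: UsageCollection, event_code: str, limit: int) -> None:
    """Atomically spend one guest search, or raise QuotaExceeded.

    The filter only matches while the counter is under the limit, so the
    check and the increment happen in one server-side operation.
    """
    doc = usage.find_one_and_update(
        {"event_code": event_code, "searches_used": {"$lt": limit}},
        {"$inc": {"searches_used": 1}},
    )
    if doc is None:
        # Either the plan is used up or the event has no usage record.
        raise QuotaExceeded(f"event {event_code} has no searches left")
```

Because the filter and the `$inc` travel together, two concurrent searches cannot both take the last unit of quota.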
Data model
Schema is the one layer you don't get to quietly refactor on a slow afternoon. The moment real data lands on it, it's load-bearing. Here it does something unusual. The primary key of an event is also the identity of an AWS resource.
The current model. The legacy albums/photos pair at the bottom still runs alongside the media/functions/cameras tree and shares face_index. A migration in flight, not dead code.
| Schema decision | What it does | What it buys, and costs |
|---|---|---|
| event_code = Rekognition collection ID | One identity for the DB key and the AI resource | Isolation is enforced in the data layer, not just app code. But event_code can never change casually. It's wired into an external system. |
| host_sharing_key separate from event_code | A 16-char unlisted gallery URL, distinct from the face-search code | A host can share the full gallery without exposing the face-search endpoint or the internal ID. Revoke one link and the other still works. |
| Hierarchy mirrors the S3 prefix | Event → Function → Camera → Media maps 1:1 to the storage key | One structure is the DB tree, the storage path, and the access scope at once. Where a frame lives and who can see it never drift apart. |
| face_ids on media, detail in face_index | A small array on the hot document. Rich match data in a side collection. | Hot-path reads stay cheap. The expensive per-face detail is paged only when you resolve matches. |
| SHA-256 hash of the upload-key secret, in backend and edge | The camera secret is never stored in plaintext anywhere | Leak either store and you get no working credential. The edge validates logins without calling the backend. |
| status enum on the event | One field for publish, share, and billing state | Public sharing, the guest waitlist, and the 402 gate all read one field. |
| Two media models, side by side | A strangler-fig migration. The consumer branches on the S3 key format. | New features land on the new model while old data keeps working. But every read path carries a branch, and that branch is where bugs hide. |
One small tell of the same shift: the S3 prefix is still mifotos/. A fossil of the product's old name. Harmless, but it shows how a rebrand reaches the marketing site long before it reaches the storage keys. Internal names are the hardest to change, for the same reason event_code is. They hold the system up.
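To make "one structure is the DB tree, the storage path, and the access scope" concrete, here is a sketch of a key builder and parser. The exact key layout is an assumption: the text only says the prefix is `mifotos/` and that keys are scoped by user, event, function, and camera. The class and function names are hypothetical too.

```python
from dataclasses import dataclass
from typing import Optional

ROOT = "mifotos"  # the legacy prefix mentioned above


@dataclass(frozen=True)
class MediaLocation:
    user_id: str
    event_code: str   # also the Rekognition collection ID
    function_id: str
    camera_id: str
    filename: str

    def to_key(self) -> str:
        """Assumed layout: mifotos/<user>/<event>/<function>/<camera>/<file>."""
        return "/".join([ROOT, self.user_id, self.event_code,
                         self.function_id, self.camera_id, self.filename])


def parse_key(key: str) -> Optional[MediaLocation]:
    """Parse a new-model key. Return None for anything else, such as a legacy album key."""
    parts = key.split("/")
    if len(parts) != 6 or parts[0] != ROOT or not all(parts):
        return None
    return MediaLocation(*parts[1:])
```

A consumer that gets `None` back would fall through to the legacy album path. That fall-through is the branch every read path carries while the migration is in flight.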
Infrastructure and operations
Runtime. The async FastAPI API and the Next.js frontend ship to Ubuntu through shell scripts (deploy.sh). The Cam2Cloud service ships on its own (deploy_ftp.sh, with Docker Compose available). The SQS consumer and the hourly ZIP-cleanup worker start inside the API process (main.py), not as separate daemons. Simple to run, with one scaling catch noted below.
Tenant isolation is layered
Isolation isn't one wall. It's five.
| Layer | Mechanism | What it isolates |
|---|---|---|
| Identity and compute | JWT (python-jose, 24h) plus RBAC. User roles (event_host, photographer, event_manager, wedding_planner, admin) and per-event member roles (creator, admin, member, viewer) with permissions. | Who can touch which event |
| Face data | One Rekognition collection per event | Cross-event matching. Impossible by construction. Not a filter. |
| Object storage | S3 key prefix per user, event, function, camera. Presigned URLs are per-object and time-boxed. | Media access and download links |
| Metadata | MongoDB documents scoped by user_id and event membership | Query-level tenant separation |
| Camera credentials | SHA-256 hash of key and secret in SQLite, plus a chroot-jailed FTP home per key | One camera can't reuse another's credentials or read its frames |
The rest of the security model, from the codebase. Razorpay webhooks are checked with HMAC-SHA256 over the raw request body against X-Razorpay-Signature, and duplicate webhooks are de-duped on the event-id header. The Cam2Cloud notification API needs a bearer token. Fraud scoring runs on event creation and on payments. Uploads are checked for type and size; Rekognition itself takes PNG or JPEG and caps images at 5 MB when passed as raw bytes (15 MB when read from S3). HTTPS is forced by middleware in production. Host galleries are shared only when status is published.
Billing and usage gating
The flow is order-first. POST /payments/create-order makes a Razorpay order. A signature-checked webhook confirms it and provisions an event_subscriptions record. From there, UsageTrackingService checks every upload and every guest search against the plan and returns 402 on overage.
Two things to watch. The webhook handler has to verify against the raw body and dedupe on the event-id, because Razorpay delivers at-least-once. And the live pricing has moved past the code. The product now sells per-event, per-photo: ₹0.89 a photo on Starter, ₹0.69 on Pro, down to ₹0.29 in bulk, first event free, 500 GB up to unlimited storage. The codebase overview still describes monthly tiers. The enforcement is the same either way. The pricing gap is one more sign of a system that outgrew its own description.
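The two webhook rules, verify against the raw body and dedupe on the event ID, fit in a few lines. This is a sketch, not the project's handler: the function and class names are hypothetical, the in-memory set stands in for whatever durable store the backend uses, and the status codes are one reasonable choice.

```python
import hashlib
import hmac


def verify_signature(raw_body: bytes, signature: str, webhook_secret: str) -> bool:
    """HMAC-SHA256 over the raw bytes, compared in constant time."""
    expected = hmac.new(webhook_secret.encode(), raw_body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected.encode(), signature.encode())


class WebhookDeduper:
    """In-memory stand-in. A real one would be a unique index in the database."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def first_time(self, event_id: str) -> bool:
        if event_id in self._seen:
            return False
        self._seen.add(event_id)
        return True


def handle_webhook(raw_body: bytes, signature: str, event_id: str,
                   webhook_secret: str, deduper: WebhookDeduper, provision) -> int:
    """Return the HTTP status the endpoint should send back."""
    if not verify_signature(raw_body, signature, webhook_secret):
        return 400  # reject before parsing anything
    if not deduper.first_time(event_id):
        return 200  # duplicate delivery: acknowledge and do nothing
    provision(raw_body)  # e.g. create the event_subscriptions record
    return 200
```

The signature has to be computed over the bytes as received. Parsing the JSON and re-serializing it first can change whitespace or key order, and the digest will no longer match.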
Where load and failure bite
| Pressure point | What happens under load | What absorbs it, and what to watch |
|---|---|---|
| The "live" deadline | Shutter to searchable runs through staging, the SQLite queue, S3, SQS, the consumer, Rekognition, then Mongo. It lands at about 25s on average, 2 min at the tail. Any one stall pushes the tail out. | The async design makes "live" possible: the camera never waits on the cloud. It also makes it fragile: every hop can stall. Instrument the full chain end-to-end and alert when the tail drifts past 2 min. The edge Prometheus metrics cover only the first leg. [watch] |
| Rekognition account TPS (global) | Every event shares one TPS budget. Two big events indexing at once can throttle indexing and live search for a third, unrelated event. | SQS buffers indexing off the request path, and RekognitionService bounds concurrency with a thread pool. But synchronous search competes for the same budget. Per-event collections isolate data, not throughput. Levers: raise the TPS quota, give search a priority lane over batch indexing, or rate-limit ingest per account. [watch] |
| SQS at-least-once delivery | A duplicate ObjectCreated would become a duplicate IndexFaces. Corrupted matches, and inflated usage that over-bills or burns quota. | The consumer dedupes on the object key, so a redelivered event is a no-op. Keep that, run a visibility timeout around 6x the processing time, and send poison messages to a DLQ. Idempotency here is a billing control. |
| Usage counters in MongoDB | Per-upload and per-search increments are a write hotspot. Read-modify-write loses updates under load, and the quota drifts. | Use atomic $inc. The 402 gate has to read a consistent counter. [watch] |
| In-process workers vs API scale-out | N API replicas spawn N SQS consumers (fine, they compete for messages) but also N hourly cleanup crons (racing S3 list and delete). | Consumers scale cleanly. The cleanup job should be leader-elected, or moved to a single scheduler, the moment the API runs more than one replica. [watch] |
| Bulk ZIP download | Zipping hundreds of full-res photos in-request would blow memory and block. | download_links plus presigned URLs keep big downloads off the hot path. The cleanup worker expires stale ZIPs after 6h and old links after 90 days. |
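The first row of the table asks for end-to-end instrumentation of the "live" deadline. A minimal sketch of the check itself, assuming each frame's latency is recorded as searchable time minus shutter time in seconds. The choice of p99 as "the tail" and the 120-second budget are assumptions layered on the figures quoted in this document.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, with pct in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def check_live_deadline(latencies_s: Sequence[float],
                        tail_pct: float = 99.0,
                        tail_budget_s: float = 120.0) -> dict:
    """Summarize shutter-to-searchable latency and flag a drifting tail."""
    tail = percentile(latencies_s, tail_pct)
    return {
        "p50_s": percentile(latencies_s, 50),
        "tail_s": tail,
        "alert": tail > tail_budget_s,
    }
```

For example, `check_live_deadline([10, 20, 25, 30, 130])` reports a median of 25 seconds and raises the alert, because the slowest frame took longer than two minutes.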
Testing and deploy, as described. Unit and API tests in backend/app/tests/ cover photos, albums, guest flows, and security. The edge service has lab and integration scripts. Deployment is script-based on Ubuntu.
Outcome
The codebase shows what was built. The live site shows positioning and traction. Kept apart, both are honest.
| Dimension | Source | Status |
|---|---|---|
| Live Cam2Cloud as the edge | Marketed as India's first live Cam2Cloud platform. Canon, Nikon, Sony, and AWS listed as technology partners (clixs.photo). | Company's claim |
| Per-event face capacity | Up to 20 million face vectors per Rekognition collection. For a single event, effectively no ceiling. | Designed limit |
| Ingestion paths shipped | Cam2Cloud (the custom edge service, 3 instances), web multipart, and legacy album. All funnel into one S3 → SQS → index pipeline. | Built |
| How it evolved | From a conference-only album gallery to a full event/function/camera SaaS with live Cam2Cloud, payments, branding, and team roles. The album-to-media migration is still in flight. | Done, and ongoing |
| Commercial traction | 10,000+ customers, 2.5 million+ photos matched, 99.8% match accuracy (company figures). | Reported |
| Engineering scale | Photo discovery averages 25s, 2 min at the tail. Ingest runs at 10,000 photos a minute, scaled by adding FTP instances. | Measured |
The system proved out a clean tenant boundary (one collection per event), a search path that stays fast, and the hard one: an edge service that turns sealed cameras on bad venue WiFi into a reliable live feed to the cloud. The proof is in the clock. Photos surface in about 25 seconds, 2 minutes at the tail, while the service takes in 10,000 a minute. It did all of this while a live data-model migration ran underneath it.
What I'd watch
Five things I'd keep an eye on.
"Live" is a deadline, and the slowest hop owns it. Today it runs about 25 seconds, 2 minutes at the tail. That headroom is thin, and the chain is long: staging, the SQLite queue, S3, SQS, Rekognition, Mongo. The edge metrics only see the first leg. Measure the whole chain end-to-end and alert when the tail drifts past 2 minutes. Skip that, and "delivered as they're clicked" quietly becomes "delivered later," and you'll hear it from guests before you see it on a graph.
Per-event collections isolate data, not throughput. Rekognition TPS is account-global. The day two big events ingest at once is the day a third event's guests wait. Pick the priority rule before a flagship wedding, not during it. Let synchronous search jump the queue ahead of batch indexing, or throttle ingest per account. Raising the quota only moves the ceiling.
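One possible shape for that priority rule is a shared token bucket that holds back a reserve only search may spend. This is a sketch of the idea, not something the codebase is known to contain. The class name and every number are hypothetical. It is single-process and not thread-safe, so a multi-replica deployment would need the budget in a shared store.

```python
import time


class SharedTpsBudget:
    """Token bucket shared by search and indexing, with a reserve kept for search."""

    def __init__(self, rate_per_s: float, capacity: int, search_reserve: int,
                 clock=time.monotonic) -> None:
        self.rate = rate_per_s
        self.capacity = capacity
        self.reserve = search_reserve
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def try_acquire(self, kind: str) -> bool:
        """kind is 'search' or 'index'. Indexing may not dip into the reserve."""
        self._refill()
        floor = 0.0 if kind == "search" else float(self.reserve)
        if self._tokens - 1 >= floor:
            self._tokens -= 1
            return True
        return False
```

When an indexing call is refused, the SQS message simply stays in the queue and is retried later. A refused search is the case to avoid, which is why the reserve exists.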
Idempotency is a billing control, not housekeeping. The S3 to SQS path is at-least-once, and venue WiFi guarantees retries. The consumer already dedupes on the object key, so a redelivered event does nothing. Keep it that way. The day someone "optimizes" that check out, one retry double-indexes a face and double-charges a paying customer's quota.
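A sketch of the dedupe-on-object-key check, to show how little code is protecting the invoice. The record layout follows the S3 event notification format, where object keys arrive URL-encoded. The SQLite ledger is a stand-in for whatever store the consumer really uses, and the function names and the `index_faces` callback are hypothetical.

```python
import json
import sqlite3
from urllib.parse import unquote_plus


def object_keys(sqs_body: str) -> list[tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(sqs_body)
    pairs = []
    for record in event.get("Records", []):  # S3 test events carry no Records
        s3 = record["s3"]
        # S3 URL-encodes object keys in event notifications.
        pairs.append((s3["bucket"]["name"], unquote_plus(s3["object"]["key"])))
    return pairs


def open_ledger(db_path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS indexed "
                 "(bucket TEXT, key TEXT, PRIMARY KEY (bucket, key))")
    conn.commit()
    return conn


def process_message(ledger: sqlite3.Connection, sqs_body: str, index_faces) -> int:
    """Index each object at most once. Return the number of paid calls made."""
    paid = 0
    for bucket, key in object_keys(sqs_body):
        seen = ledger.execute("SELECT 1 FROM indexed WHERE bucket = ? AND key = ?",
                              (bucket, key)).fetchone()
        if seen:
            continue  # redelivered event: a no-op
        index_faces(bucket, key)  # the metered call
        with ledger:
            ledger.execute("INSERT OR IGNORE INTO indexed VALUES (?, ?)", (bucket, key))
        paid += 1
    return paid
```

This version records the key after the paid call, so a crash between the two lines can still cause one repeat. Closing that window, or handling two consumers racing on the same key, needs a claim written before the call plus a way to release stale claims.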