Canva, the popular graphic design platform, has built a product analytics pipeline that processes 25 billion events per day, roughly 750 billion events per month. This system plays a crucial role in driving data-informed decisions and powering various user-facing features. Let’s delve into Canva’s analytics infrastructure and explore how they manage this massive data stream while maintaining 99.999% uptime.
The Foundation: Structured Event Schema
At the core of Canva’s analytics success lies a fundamental decision: every collected event must adhere to a machine-readable, well-documented schema. This ensures data consistency and makes processing and analysis easier.
Protobuf-based Schema Definition
Canva uses the Protobuf language to define analytics event schemas, building on the team’s existing familiarity with the format from microservice contracts. They’ve also imposed an additional rule: event schemas must maintain full transitive compatibility, meaning any schema version can deserialize any version of an event. This guarantees both forward and backward compatibility, as the sketch below illustrates.
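To make the compatibility rule concrete, here is a minimal TypeScript sketch of what full transitive compatibility implies for readers of an event; the event name and fields are hypothetical, not Canva’s actual schemas:

```typescript
// Hypothetical event: fields may only be added, and only as optional;
// nothing is ever removed or renamed. Any reader version can then
// handle any event version.

// v1 of a design_opened event.
interface DesignOpenedV1 {
  designId: string;
}

// v2 adds an optional field. Old readers ignore it (forward compatible);
// new readers tolerate its absence in old events (backward compatible).
interface DesignOpenedV2 {
  designId: string;
  templateId?: string; // added later, so it must be optional
}

function readDesignOpened(raw: unknown): DesignOpenedV2 {
  const event = raw as Partial<DesignOpenedV2>;
  if (typeof event.designId !== "string") {
    throw new Error("designId is required in every schema version");
  }
  // Unknown extra fields are ignored; missing optional fields stay undefined.
  return { designId: event.designId, templateId: event.templateId };
}
```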
Datumgen: Custom Code Generator
To enforce schema compatibility rules and streamline development, Canva created Datumgen, a custom code generator built on top of protoc. Beyond verifying compatibility, it generates several artifacts from each event definition (an illustrative sample of the TypeScript output follows this list):
- TypeScript definitions for frontend type checking
- Java definitions for backend event handling
- SQL definitions for Snowflake table schemas
- An Event Catalog for easy exploration of collected events
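The article doesn’t show Datumgen’s output, but a generated TypeScript definition could plausibly look like the following sketch; the event shape, function, and queue are illustrative:

```typescript
// Illustrative only: generated from a hypothetical design_opened.proto.
export interface DesignOpenedEvent {
  designId: string;
  templateId?: string;
}

// Hypothetical in-memory queue standing in for the real batching logic.
const analyticsQueue: Array<{ type: string; payload: unknown }> = [];

// Type-checked entry point: the compiler rejects events that don't match
// the generated schema, catching mistakes before they reach the pipeline.
export function trackDesignOpened(event: DesignOpenedEvent): void {
  analyticsQueue.push({ type: "design_opened", payload: event });
}
```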
The Collection Process
Canva’s event collection pipeline is designed for efficiency and reliability:
- Client-side Collection: Events are initially captured by Canva’s application, using a shared core library written in TypeScript for consistency across platforms.
- Server-side Validation: Events are sent to a server endpoint, which validates each event against the predefined schemas before forwarding it to an ingest-worker via a Kinesis Data Stream (KDS), as sketched after this list.
- Event Enrichment: The ingest-worker processes events, adding details like geolocation data and device information.
- Data Streaming: Processed events are sent to another KDS for routing to various consumers.
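As a rough illustration of the validation step, here is a TypeScript sketch of a validate-and-forward endpoint using the AWS SDK v3 Kinesis client; the stream name, event shape, and validateAgainstSchema helper are assumptions, not Canva’s actual code:

```typescript
import { KinesisClient, PutRecordsCommand } from "@aws-sdk/client-kinesis";

const kinesis = new KinesisClient({ region: "us-east-1" });

interface IncomingEvent {
  type: string;
  payload: Record<string, unknown>;
}

// Placeholder for validation against the generated schema definitions.
function validateAgainstSchema(event: IncomingEvent): boolean {
  return typeof event.type === "string" && event.payload !== undefined;
}

export async function forwardValidEvents(events: IncomingEvent[]): Promise<void> {
  const valid = events.filter(validateAgainstSchema);
  if (valid.length === 0) return;

  await kinesis.send(
    new PutRecordsCommand({
      StreamName: "analytics-ingest", // hypothetical stream name
      Records: valid.map((event) => ({
        Data: new TextEncoder().encode(JSON.stringify(event)),
        // Partitioning by event type spreads load across shards.
        PartitionKey: event.type,
      })),
    }),
  );
}
```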
Optimizing for Performance and Cost
Canva has implemented several optimizations to enhance their analytics pipeline:
Kinesis Data Stream (KDS) Adoption
After evaluating options, Canva migrated from AWS SQS and SNS to KDS, resulting in an 85% cost reduction while maintaining acceptable performance.
Data Compression
By implementing zstd compression on batches of events, Canva achieved a 10x compression ratio, saving an estimated $600K per year in AWS costs.
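A minimal sketch of batch compression, assuming the @mongodb-js/zstd binding (the batch shape and compression level are illustrative):

```typescript
import { compress } from "@mongodb-js/zstd";

async function compressBatch(events: object[]): Promise<Buffer> {
  // Compressing whole batches rather than single events is what makes a
  // ~10x ratio plausible: field names and common values repeat across
  // events and compress away.
  const payload = Buffer.from(JSON.stringify(events), "utf8");
  return compress(payload, 3); // level 3: a common speed/ratio trade-off
}
```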
SQS Fallback Mechanism
To handle KDS throttling and high tail latency, Canva implemented an SQS fallback system. This ensures consistent response times and provides a failover option during KDS outages.
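A minimal sketch of that KDS-first, SQS-fallback write path, assuming AWS SDK v3 clients; the stream name, queue URL, and error-handling policy are illustrative:

```typescript
import { KinesisClient, PutRecordCommand } from "@aws-sdk/client-kinesis";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const kinesis = new KinesisClient({});
const sqs = new SQSClient({});

export async function publishWithFallback(data: string, key: string): Promise<void> {
  try {
    await kinesis.send(
      new PutRecordCommand({
        StreamName: "analytics-ingest", // hypothetical
        Data: new TextEncoder().encode(data),
        PartitionKey: key,
      }),
    );
  } catch {
    // On throttling or a slow/failed write, divert the event to SQS so the
    // caller still gets a fast, successful acknowledgement.
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/analytics-fallback", // hypothetical
        MessageBody: data,
      }),
    );
  }
}
```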
Distribution and Consumption
Canva’s decoupled router allows for flexible event distribution to various consumers:
- Snowflake: All event types are delivered to Snowflake using Snowpipe Streaming for analytics and dashboarding.
- Real-time Consumers: Backend services can subscribe to events via KDS or SQS, depending on their needs; a consumer sketch follows this list.
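Here is a sketch of what an SQS-based consumer might look like, assuming AWS SDK v3; the queue URL and handler are illustrative:

```typescript
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const QUEUE_URL =
  "https://sqs.us-east-1.amazonaws.com/123456789012/my-consumer"; // hypothetical

export async function pollOnce(handle: (body: string) => Promise<void>): Promise<void> {
  const { Messages } = await sqs.send(
    new ReceiveMessageCommand({
      QueueUrl: QUEUE_URL,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 20, // long polling cuts down on empty receives
    }),
  );
  for (const message of Messages ?? []) {
    await handle(message.Body ?? "");
    // Deleting only after successful processing preserves at-least-once
    // semantics: a crash before this line means the message is redelivered.
    await sqs.send(
      new DeleteMessageCommand({
        QueueUrl: QUEUE_URL,
        ReceiptHandle: message.ReceiptHandle!,
      }),
    );
  }
}
```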
Deduplication and Guarantees
Canva’s analytics service provides an at-least-once delivery guarantee, with the responsibility for deduplication falling on individual consumers when necessary.
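A minimal sketch of consumer-side deduplication under at-least-once delivery, assuming each event carries a unique eventId; the in-memory set is illustrative (a production consumer would more likely use Redis or a database uniqueness constraint):

```typescript
const seenEventIds = new Set<string>();

function processOnce(
  event: { eventId: string; payload: unknown },
  handle: (payload: unknown) => void,
): void {
  if (seenEventIds.has(event.eventId)) {
    return; // duplicate redelivery: already handled, safely ignore
  }
  seenEventIds.add(event.eventId);
  handle(event.payload);
}
```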
With this robust and scalable analytics pipeline, Canva has built a powerful foundation for data-driven decision-making and feature development, processing 25 billion events daily while keeping the system reliable and cost-effective.