
Canva’s Robust Product Analytics Pipeline: Collecting 25 Billion Events Daily

Canva, the popular graphic design platform, has built a product analytics pipeline that processes a staggering 25 billion events per day, roughly 800 billion events per month. This system plays a crucial role in driving data-informed decisions and powering various user-facing features. Let’s delve into Canva’s analytics infrastructure and explore how they manage this massive data stream with 99.999% uptime.

The Foundation: Structured Event Schema

At the core of Canva’s analytics success lies a fundamental decision: every collected event must adhere to a machine-readable, well-documented schema. This approach ensures data consistency and facilitates easier processing and analysis.

Protobuf-based Schema Definition

Canva uses the Protobuf language to define analytics event schemas, leveraging the team’s familiarity with the format from its microservice contracts. They’ve also imposed an additional rule: event schemas must maintain full transitive compatibility, meaning any schema version can deserialize any version of an event, so events are both forward and backward compatible. In practice, this limits schema evolution to safe, additive changes: new fields are optional, and existing field numbers and types are never changed or reused.

Datumgen: Custom Code Generator

To enforce these compatibility rules and streamline development, Canva created Datumgen, a custom code generator built on top of protoc. Alongside its compatibility checks, Datumgen generates code for multiple targets, including:

  • TypeScript definitions for frontend type checking
  • Java definitions for backend event handling
  • SQL definitions for Snowflake table schemas
  • An Event Catalog for easy exploration of collected events
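
Datumgen itself isn’t public, so the sketch below is only a guess at the shape of its generated TypeScript output; the `design_open` event and its fields are invented for illustration:

```typescript
// Hypothetical Datumgen-style output; the event and fields are invented
// for illustration and are not Canva's actual generated code.

/** Generated from a hypothetical design_open.proto schema. */
export interface DesignOpenEvent {
  eventName: "design_open";
  designId: string;
  /** Added in a later schema version, so it must remain optional. */
  templateId?: string;
}

/** Union of all generated event types, giving the frontend type checking. */
export type AnalyticsEvent = DesignOpenEvent; // | NextEvent | ...
```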

The Collection Process

Canva’s event collection pipeline is designed for efficiency and reliability:

  1. Client-side Collection: Events are captured in Canva’s applications by a shared core library written in TypeScript, keeping collection consistent across platforms (see the sketch after this list).
  2. Server-side Validation: Events are sent to a server endpoint that validates each one against the predefined schemas before forwarding it to an ingest-worker via a Kinesis Data Stream (KDS).
  3. Event Enrichment: The ingest-worker enriches each event with details such as geolocation data and device information.
  4. Data Streaming: Enriched events are published to another KDS for routing to various consumers.
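
As a rough sketch of step 1, a shared collection library might batch events and post them to the collection endpoint. The `/v1/track` path, batch size, and flush interval below are assumptions for illustration, not Canva’s published design:

```typescript
// Minimal sketch of a shared client-side collection library.
// The endpoint path, batch size, and flush interval are assumptions.
type AnalyticsEvent = { eventName: string; [field: string]: unknown };

class AnalyticsClient {
  private buffer: AnalyticsEvent[] = [];

  constructor(
    private endpoint = "/v1/track", // hypothetical collection endpoint
    private maxBatch = 50,
    flushIntervalMs = 5_000,
  ) {
    setInterval(() => void this.flush(), flushIntervalMs);
  }

  track(event: AnalyticsEvent): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatch) void this.flush();
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const batch = this.buffer.splice(0, this.buffer.length);
    try {
      // The server validates every event against its schema before
      // forwarding the batch to the ingest-worker.
      await fetch(this.endpoint, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(batch),
        keepalive: true, // lets a request outlive page unloads in browsers
      });
    } catch {
      this.buffer.unshift(...batch); // retry these events on the next flush
    }
  }
}
```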

Optimizing for Performance and Cost

Canva has implemented several optimizations to enhance their analytics pipeline:

Kinesis Data Stream (KDS) Adoption

After evaluating options, Canva migrated from AWS SQS and SNS to KDS, resulting in an 85% cost reduction while maintaining acceptable performance.

Data Compression

By implementing zstd compression on batches of events, Canva achieved a 10x compression ratio, saving an estimated $600K per year in AWS costs.
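
As a rough illustration of the technique, the sketch below compresses a serialized batch with the `@mongodb-js/zstd` binding for Node; the binding choice, JSON batch format, and compression level are all assumptions rather than Canva’s implementation:

```typescript
// Sketch: compress a batch of events with zstd before publishing.
// Assumes the @mongodb-js/zstd binding; Canva's pipeline details aren't public.
import { compress } from "@mongodb-js/zstd";

async function compressBatch(events: object[]): Promise<Buffer> {
  // Compressing events in batches is what makes the ratio high: analytics
  // events repeat field names and values, so batches deflate very well.
  const payload = Buffer.from(JSON.stringify(events), "utf8");
  return compress(payload, 3); // level 3 is a common speed/ratio trade-off
}
```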

SQS Fallback Mechanism

To handle KDS throttling and high tail latency, Canva implemented an SQS fallback system. This ensures consistent response times and provides a failover option during KDS outages.
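
One way such a fallback can be implemented, sketched with the AWS SDK v3 for JavaScript; the stream name, queue URL, and the 500 ms deadline are placeholder assumptions:

```typescript
// Sketch: publish to Kinesis, falling back to SQS on throttling or timeout.
// Stream/queue names and the 500 ms deadline are illustrative assumptions.
import { KinesisClient, PutRecordCommand } from "@aws-sdk/client-kinesis";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const kinesis = new KinesisClient({});
const sqs = new SQSClient({});

async function publish(data: Uint8Array, partitionKey: string): Promise<void> {
  try {
    await kinesis.send(
      new PutRecordCommand({
        StreamName: "analytics-events", // placeholder stream name
        PartitionKey: partitionKey,
        Data: data,
      }),
      { abortSignal: AbortSignal.timeout(500) } // cap tail latency
    );
  } catch {
    // Throttled, timed out, or KDS is down: fall back to the SQS queue,
    // trading strict ordering for consistent response times.
    await sqs.send(
      new SendMessageCommand({
        QueueUrl: "https://sqs.us-east-1.amazonaws.com/123456789012/analytics-fallback",
        MessageBody: Buffer.from(data).toString("base64"), // SQS bodies are text
      })
    );
  }
}
```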

Distribution and Consumption

Canva’s decoupled router allows for flexible event distribution to various consumers:

  • Snowflake: All event types are delivered to Snowflake using Snowpipe Streaming for analytics and dashboarding.
  • Real-time Consumers: Backend services can subscribe to events via KDS or SQS, depending on their needs (see the polling sketch below).
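
On the SQS path, subscribing boils down to long polling the queue, as in the sketch below; the queue URL is a placeholder and batch decoding is elided:

```typescript
// Sketch: a backend consumer long-polling its SQS subscription.
// The queue URL is a placeholder; message decoding is elided.
import {
  SQSClient,
  ReceiveMessageCommand,
  DeleteMessageCommand,
} from "@aws-sdk/client-sqs";

const sqs = new SQSClient({});
const queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/my-consumer";

async function poll(): Promise<void> {
  const { Messages = [] } = await sqs.send(
    new ReceiveMessageCommand({
      QueueUrl: queueUrl,
      MaxNumberOfMessages: 10,
      WaitTimeSeconds: 20, // long polling keeps request volume low
    })
  );
  for (const message of Messages) {
    handleEvent(message.Body!); // application-specific processing
    // Deleting only after successful handling is what makes delivery
    // at-least-once rather than at-most-once.
    await sqs.send(
      new DeleteMessageCommand({
        QueueUrl: queueUrl,
        ReceiptHandle: message.ReceiptHandle!,
      })
    );
  }
}

function handleEvent(body: string): void {
  console.log("received event batch:", body.length, "bytes");
}
```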

Deduplication and Guarantees

Canva’s analytics service provides an at-least-once delivery guarantee, with the responsibility for deduplication falling on individual consumers when necessary.
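
A consumer that cannot tolerate duplicates can key on a stable event ID. Below is a minimal in-memory sketch; the `eventId` field name and the cache bound are assumptions, and a production consumer would typically track seen IDs in durable storage:

```typescript
// Sketch: consumer-side deduplication keyed on a stable event ID.
// The eventId field and the in-memory bound are assumptions for illustration.
const seen = new Set<string>();
const MAX_TRACKED_IDS = 1_000_000;

function processOnce(
  event: { eventId: string },
  handle: (e: object) => void
): void {
  if (seen.has(event.eventId)) return; // duplicate from a redelivery; drop it
  if (seen.size >= MAX_TRACKED_IDS) seen.clear(); // crude bound, fine for a sketch
  seen.add(event.eventId);
  handle(event);
}
```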

By implementing this robust and scalable analytics pipeline, Canva has created a powerful foundation for data-driven decision-making and feature development, processing an impressive 25 billion events daily while maintaining high reliability and cost-effectiveness.
