What is Web3?
● A quick recap: Decentralized, user-owned, built on blockchains (e.g.,
Ethereum).
● Core components: Smart Contracts, On-Chain Events, Transactions.
The Problem with On-Chain Data
● Blockchains are a slow "database." Direct querying for complex dApp
front-ends is inefficient and often impossible.
● dApps need fast, reliable, and indexed access to both real-time and historical
on-chain data.
Why Google Cloud for Decentralized Apps?
● We're not putting the blockchain on GCP. We're building a highly performant
off-chain infrastructure layer to support our on-chain logic.
● Gain scalability, reliability, advanced data processing, and security that are
difficult to achieve otherwise.
The Web3 Challenge & The Cloud Opportunity
1. The Core Architecture: An overview of our data pipeline.
2. Listening to the Chain: Connecting to the blockchain with Infura.
3. Ingesting Events: Decoupling services with Pub/Sub.
4. Processing & Caching: The role of workers and Memorystore.
5. Choosing Your Database: Storing data in Cloud SQL, Firestore, &
BigQuery.
6. Serving Your dApp: Building a scalable API on GKE.
7. Putting It All Together: A complete, scalable Kubernetes architecture.
Workshop Agenda
What is Infura?
● A Blockchain Node Provider. It gives us a reliable API (HTTP & WebSocket) endpoint to
communicate with a blockchain network without running our own node.
Listening for Events
● We use a WebSocket connection to an Infura gateway.
● This allows us to subscribe to specific smart contract events in real-time.
● Example: Subscribing to a Transfer event on an ERC-20 token contract.
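As a rough sketch of what that subscription looks like on the wire, the snippet below builds the JSON-RPC `eth_subscribe` request a listener would send over the WebSocket. The contract address is a placeholder and the helper name is made up for illustration; the topic value is the Keccak-256 hash of the standard ERC-20 signature `Transfer(address,address,uint256)`.

```python
import json

# Keccak-256 of "Transfer(address,address,uint256)", the standard ERC-20 event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def build_transfer_subscription(contract_address: str, request_id: int = 1) -> str:
    """Return the JSON-RPC message that subscribes to Transfer logs of one contract."""
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_subscribe",
        "params": [
            "logs",
            {"address": contract_address, "topics": [TRANSFER_TOPIC]},
        ],
    }
    return json.dumps(request)


# Placeholder address, for illustration only.
EXAMPLE_TOKEN = "0x" + "11" * 20
message = build_transfer_subscription(EXAMPLE_TOKEN)
```

The listener would send `message` once after the WebSocket connects. The node replies with a subscription id and then pushes each matching log as an `eth_subscription` notification.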
The "Listener" Service
● A simple, lightweight microservice whose only job is to maintain this WebSocket
connection and receive events.
● Once an event is received, it immediately passes it on for processing.
Step 1 - Listening to the Chain with Infura
The Problem:
What if your listener service crashes? What if event processing is slow? You'll lose events and create a
bottleneck.
Solution: Google Cloud Pub/Sub
● A fully-managed, real-time messaging service.
● Our Listener becomes a Publisher: It receives an event from Infura and immediately publishes it to a
Pub/Sub "topic". Its job is done.
● This decouples event ingestion from event processing: once published, an event is retained until a worker acknowledges it.
● It acts as a buffer, smoothing out traffic spikes.
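The publish step can be sketched as follows. This is an illustration under stated assumptions, not the workshop's actual listener: the `Publisher` protocol mirrors the `publish(topic, data, **attrs)` call shape of the `google-cloud-pubsub` `PublisherClient`, `InMemoryPublisher` is a hypothetical stand-in so the code runs without GCP credentials, and the project and topic names are invented.

```python
import json
from typing import Protocol


class Publisher(Protocol):
    """Call shape the sketch needs; PublisherClient.publish is assumed to fit it."""

    def publish(self, topic: str, data: bytes, **attrs: str) -> object: ...


class InMemoryPublisher:
    """Hypothetical stand-in that records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, bytes, dict[str, str]]] = []

    def publish(self, topic: str, data: bytes, **attrs: str) -> object:
        self.sent.append((topic, data, attrs))
        return len(self.sent)  # the real client returns a future instead


def forward_event(publisher: Publisher, topic: str, log: dict) -> None:
    """Publish one raw log unchanged; decoding is left to the workers."""
    data = json.dumps(log, sort_keys=True).encode("utf-8")
    # Attributes let subscribers route or deduplicate without parsing the payload.
    publisher.publish(
        topic,
        data,
        tx_hash=log["transactionHash"],
        log_index=log["logIndex"],
    )


TOPIC = "projects/my-project/topics/chain-events"  # assumed project and topic names
fake = InMemoryPublisher()
forward_event(fake, TOPIC, {"transactionHash": "0xabc", "logIndex": "0x0", "data": "0x"})
```

Keeping the listener this thin is the point of the design: it does no decoding and no database work, so a slow downstream step cannot stall the WebSocket connection.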
Step 2 - Decoupling with Event Handlers (Pub/Sub)
The "Worker" Service
● A separate microservice that subscribes to the
Pub/Sub topic.
● Its job is to pull events from the queue and
perform the heavy lifting:
○ Decode event data.
○ Enrich data by calling other APIs.
○ Format data for storage in a database.
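To make the "decode" step concrete, here is a minimal sketch for an ERC-20 `Transfer` log. In that event the sender and recipient are indexed, so they arrive in `topics[1]` and `topics[2]`, while the amount is the only value in `data`. The `Transfer` dataclass and function names are illustrative, and the sample log is synthetic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    sender: str
    recipient: str
    value: int


def topic_to_address(topic: str) -> str:
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log: dict) -> Transfer:
    """Decode a raw ERC-20 Transfer log (topics plus hex-encoded data)."""
    _signature, from_topic, to_topic = log["topics"]
    return Transfer(
        sender=topic_to_address(from_topic),
        recipient=topic_to_address(to_topic),
        value=int(log["data"], 16),  # a single uint256, hex-encoded
    )


# Synthetic log for illustration only.
sample_log = {
    "topics": [
        "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef",
        "0x" + "00" * 12 + "aa" * 20,
        "0x" + "00" * 12 + "bb" * 20,
    ],
    "data": "0x" + format(1000, "064x"),
}
transfer = decode_transfer(sample_log)
```

The decoded `value` is in the token's smallest unit; converting it to a display amount needs the token's `decimals`, which is a typical enrichment lookup.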
Step 3 - Processing Events with Workers & Caching
Caching with Memorystore for Redis
● Before hitting a database, we can use an
in-memory cache for ultra-fast operations.
● Why Cache?
○ Deduplication: Block reorgs can cause
duplicate events. Use Redis to check if an
event has already been processed.
○ Session Data: Store temporary user data.
○ Rate Limiting: Control access to
resources.
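The deduplication check can be sketched with a single atomic Redis `SET` using the `NX` (only if absent) and `EX` (expiry) options. The key scheme and TTL below are assumptions, and `FakeRedis` is a hypothetical in-memory stand-in so the example runs without a server; with redis-py, a `redis.Redis` client could be passed instead, since its `set` accepts the same `nx` and `ex` keyword arguments.

```python
from typing import Optional, Protocol


class KeyValueStore(Protocol):
    """The one call the sketch needs from a Redis client."""

    def set(
        self, name: str, value: str, *, nx: bool = False, ex: Optional[int] = None
    ) -> Optional[bool]: ...


class FakeRedis:
    """Hypothetical in-memory stand-in; it ignores expiry."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def set(self, name, value, *, nx=False, ex=None):
        if nx and name in self._data:
            return None  # like Redis, signal that nothing was written
        self._data[name] = value
        return True


def first_time_seen(
    store: KeyValueStore, tx_hash: str, log_index: int, ttl_seconds: int = 86_400
) -> bool:
    """Atomically claim an event; only the first caller gets True."""
    key = f"seen:{tx_hash}:{log_index}"  # assumed key scheme
    return store.set(key, "1", nx=True, ex=ttl_seconds) is True
```

A worker would call `first_time_seen(...)` before processing and skip the event when it returns `False`. Because the check and the write are one command, two workers receiving the same event cannot both claim it.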
There's no single best database; use the right tool for the job.
Cloud SQL (Managed PostgreSQL/MySQL)
● Use for: Structured, relational data.
● Web3 Example: Storing user profiles, transaction histories with clear relationships,
financial ledgers.
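As one possible shape for the relational case, the sketch below stores decoded transfers in a table keyed by transaction hash and log index, so a repeated write is a no-op. It uses Python's built-in `sqlite3` as a local stand-in for Cloud SQL; the table and column names are assumptions. The `ON CONFLICT ... DO NOTHING` clause also exists in PostgreSQL, though PostgreSQL drivers use a different placeholder syntax than `?`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE transfers (
    tx_hash    TEXT    NOT NULL,
    log_index  INTEGER NOT NULL,
    sender     TEXT    NOT NULL,
    recipient  TEXT    NOT NULL,
    value      TEXT    NOT NULL,  -- a uint256 does not fit a 64-bit integer column
    PRIMARY KEY (tx_hash, log_index)
);
"""

INSERT = """
INSERT INTO transfers (tx_hash, log_index, sender, recipient, value)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT (tx_hash, log_index) DO NOTHING;
"""


def save_transfer(conn, tx_hash, log_index, sender, recipient, value: int) -> None:
    """Insert one transfer; writing the same event twice leaves a single row."""
    conn.execute(INSERT, (tx_hash, log_index, sender, recipient, str(value)))
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a Cloud SQL connection
conn.execute(SCHEMA)
save_transfer(conn, "0xabc", 0, "0xaaa", "0xbbb", 1000)
save_transfer(conn, "0xabc", 0, "0xaaa", "0xbbb", 1000)  # duplicate, ignored
```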
Firestore (NoSQL Document DB)
● Use for: Flexible, semi-structured data; real-time front-end updates.
● Web3 Example: Storing NFT metadata (attributes can vary), user-specific settings, activity
feeds.
BigQuery (Serverless Data Warehouse)
● Use for: Large-scale analytics on all your blockchain data.
● Web3 Example: Analyzing token velocity, finding top NFT holders, tracking DeFi protocol
health. Workers can stream all processed events directly into BigQuery.
Step 4 - Choosing the Right Google Cloud Database
Why Kubernetes?
● We have multiple microservices (listeners, workers, API). Kubernetes is perfect for
orchestrating them.
● Google Kubernetes Engine (GKE) is Google's managed Kubernetes service.
Key Benefits for Web3:
● Auto-scaling: Automatically add or remove service replicas (pods) based on load. If your
Pub/Sub queue gets long, a Horizontal Pod Autoscaler driven by the backlog metric can spin up more worker pods.
● Resilience: If a pod crashes, GKE automatically restarts it.
● Resource Management: Efficiently pack your services onto Compute Engine VMs (your
GKE nodes). You can choose different VM types (e.g., e2-standard for APIs, n2-highcpu
for compute-heavy workers).
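Scaling workers on queue length is not automatic: it relies on a Horizontal Pod Autoscaler configured with an external metric, such as the subscription's count of undelivered messages. The arithmetic behind an average-value target can be sketched as below. The function name, bounds, and numbers are illustrative, and the real autoscaler additionally applies a tolerance and stabilization windows.

```python
import math


def desired_worker_replicas(
    backlog: int, target_per_pod: int, min_replicas: int = 1, max_replicas: int = 20
) -> int:
    """Replica count that keeps backlog per pod at or below the target, within bounds."""
    if target_per_pod <= 0:
        raise ValueError("target_per_pod must be positive")
    wanted = math.ceil(backlog / target_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


# 950 undelivered messages with a target of 100 per pod calls for 10 pods.
replicas = desired_worker_replicas(backlog=950, target_per_pod=100)
```

The minimum keeps at least one worker running when the queue is empty, and the maximum caps cost during a burst of on-chain activity.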
A Scalable Kubernetes-Based Architecture (GKE)
This is how we organize our services inside a GKE cluster for maximum scalability and
security.
1. The Event Listener Deployment
● A set of pods running the listener code.
● Only needs outbound internet access to connect to Infura.
2. The Worker Deployment
● A set of pods running the processing code.
● Doesn't need any public internet access. It just talks to Pub/Sub and your internal
databases. This is more secure.
3. The API Layer Deployment
● A set of pods running your API server (e.g., Express.js, FastAPI).
● Exposed to the internet via a Google Cloud Load Balancer to serve requests from your
dApp's front-end.
Splitting Your Backend on GKE
● Cloud Run: A serverless alternative for your API or simple workers. Pay-per-use
and scales to zero.
● API Gateway / Apigee: Secure your public API with authentication, rate limiting,
and monitoring.
● Cloud Armor: Protect your API layer from DDoS and other web-based attacks.
● Cloud Monitoring & Logging: Get full observability into your entire system. Create
dashboards and alerts to monitor the health of your pipeline.
● Identity Platform: Manage user identity with both traditional (email/social) and
Web3 (Connect Wallet) sign-in methods; wallet sign-in typically goes through custom authentication tokens.
Other Powerful GCP Products
Recap: We combine the decentralized world of blockchain events with the scalable,
reliable infrastructure of Google Cloud.
Key Architecture Pattern:
1. Ingest events reliably using an external gateway (Infura).
2. Decouple services with a message queue (Pub/Sub).
3. Process data with scalable workers (GKE / Cloud Run).
4. Store data in the right database for the job (Cloud SQL, Firestore, BigQuery).
5. Serve data to your users via a managed API layer (GKE / Cloud Run).
Questions?
Summary & Q&A