Module 13: Production Patterns & Security
This module covers essential patterns for running Solana DApps in production, including security best practices, CI/CD pipelines, deployment strategies, and operational excellence.
Learning Objectives
By the end of this module, you will be able to:
- Implement security best practices for smart contracts
- Set up comprehensive CI/CD pipelines with GitHub Actions
- Deploy using blue-green and canary strategies
- Monitor and respond to production incidents
- Optimize compute units and reduce costs
Prerequisites
- Completed all previous modules
- Understanding of Kubernetes basics
- Familiarity with GitHub Actions
- Basic security awareness
Part A: Smart Contract Security
Security Checklist
Before deploying any Solana program to mainnet, verify these security requirements:
| Category | Check | Priority |
|---|---|---|
| Authorization | All instructions verify required signers | Critical |
| Ownership | Account ownership verified before mutation | Critical |
| Math | All arithmetic uses checked_* operations | Critical |
| Seeds | PDA seeds include unique identifiers | High |
| Reentrancy | State updated before external CPIs | High |
| Validation | All inputs validated and bounded | High |
| Accounts | Account discriminators checked | Medium |
| Events | Critical actions emit events | Medium |
Signer Verification
Every instruction that modifies state must verify the appropriate signer:
// INCORRECT - No signer verification
pub fn unsafe_transfer(ctx: Context<Transfer>, amount: u64) -> Result<()> {
// Anyone can call this!
ctx.accounts.vault.amount -= amount;
Ok(())
}
// CORRECT - Proper signer verification
pub fn safe_transfer(ctx: Context<SafeTransfer>, amount: u64) -> Result<()> {
// Signer<'info> proves that a signature exists; this check proves the signer
// is the authority recorded on the vault
require!(
ctx.accounts.authority.key() == ctx.accounts.vault.authority,
ErrorCode::Unauthorized
);
ctx.accounts.vault.amount = ctx.accounts.vault.amount
.checked_sub(amount)
.ok_or(ErrorCode::InsufficientFunds)?;
Ok(())
}
#[derive(Accounts)]
pub struct SafeTransfer<'info> {
#[account(mut)]
pub vault: Account<'info, Vault>,
// The Signer<'info> type already enforces the signature check, so no extra constraint is needed
pub authority: Signer<'info>,
}
Account Ownership Verification
Always verify that accounts are owned by expected programs:
#[derive(Accounts)]
pub struct ProcessPayment<'info> {
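// Account<'info, TokenAccount> already checks that the SPL Token program owns this account.
// Note: `token_account.owner` is the token account's authority field, not the owning program,
// so comparing it with `token::ID` would reject every valid account.
#[account(mut)]
pub token_account: Account<'info, TokenAccount>,
// Account<'info, T> also verifies ownership by this program, so the explicit `owner`
// constraint below is redundant here; it matters for UncheckedAccount or AccountInfo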
#[account(
owner = crate::ID @ ErrorCode::InvalidProgramOwner
)]
pub escrow: Account<'info, Escrow>,
}
Checked Arithmetic
Always use checked arithmetic to prevent overflow/underflow:
// INCORRECT - Can overflow
pub fn unsafe_math(amount1: u64, amount2: u64) -> u64 {
amount1 + amount2 // Panics or wraps on overflow
}
// CORRECT - Checked operations
pub fn safe_math(amount1: u64, amount2: u64) -> Result<u64> {
amount1
.checked_add(amount2)
.ok_or(ErrorCode::MathOverflow.into())
}
// For complex calculations, use a dedicated module
pub mod math {
use super::*;
pub fn calculate_swap_output(
input_amount: u64,
input_reserve: u64,
output_reserve: u64,
fee_rate: u16,
) -> Result<u64> {
// Calculate fee
let fee = input_amount
.checked_mul(fee_rate as u64)
.ok_or(ErrorCode::MathOverflow)?
.checked_div(10000)
.ok_or(ErrorCode::MathOverflow)?;
// Input after fee
let input_after_fee = input_amount
.checked_sub(fee)
.ok_or(ErrorCode::MathOverflow)?;
// Constant product formula
let numerator = output_reserve
.checked_mul(input_after_fee)
.ok_or(ErrorCode::MathOverflow)?;
let denominator = input_reserve
.checked_add(input_after_fee)
.ok_or(ErrorCode::MathOverflow)?;
numerator
.checked_div(denominator)
.ok_or(ErrorCode::MathOverflow.into())
}
}
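The swap formula above multiplies two u64 values, and that product can exceed u64::MAX for large reserves even when the final quotient would fit. The following is a minimal standalone sketch (plain Rust without Anchor; the names `swap_output` and `swap_output_u64` are illustrative, and `Option` stands in for the program's error type) that widens the intermediates to u128 and compares the result with a u64-only version:

```rust
/// Constant-product swap output using u128 intermediates.
/// Returns None on overflow, on a zero denominator, or if the result does not fit in u64.
pub fn swap_output(
    input_amount: u64,
    input_reserve: u64,
    output_reserve: u64,
    fee_bps: u16,
) -> Option<u64> {
    let input = input_amount as u128;
    // Fee in basis points (1 bps = 1/10_000)
    let fee = input.checked_mul(fee_bps as u128)?.checked_div(10_000)?;
    let input_after_fee = input.checked_sub(fee)?;
    let numerator = (output_reserve as u128).checked_mul(input_after_fee)?;
    let denominator = (input_reserve as u128).checked_add(input_after_fee)?;
    let out = numerator.checked_div(denominator)?;
    if out > u64::MAX as u128 {
        None
    } else {
        Some(out as u64)
    }
}

/// The same formula kept entirely in u64, for comparison.
pub fn swap_output_u64(
    input_amount: u64,
    input_reserve: u64,
    output_reserve: u64,
    fee_bps: u16,
) -> Option<u64> {
    let fee = input_amount.checked_mul(fee_bps as u64)?.checked_div(10_000)?;
    let input_after_fee = input_amount.checked_sub(fee)?;
    // This multiplication is where large reserves overflow u64
    let numerator = output_reserve.checked_mul(input_after_fee)?;
    let denominator = input_reserve.checked_add(input_after_fee)?;
    numerator.checked_div(denominator)
}
```

With reserves of 100,000 each, an input of 1,000 and a 30 bps fee, both versions return 987. With reserves near `u64::MAX / 2`, the u64 version reports an overflow while the u128 version still produces a result, which is one reason to consider wider intermediates in AMM math.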
PDA Seed Security
Use unique, collision-resistant seeds for PDAs:
// INCORRECT - Seeds not unique enough
#[account(
seeds = [b"escrow"], // Only one escrow possible!
bump
)]
pub escrow: Account<'info, Escrow>,
// CORRECT - Unique seeds with user and identifier
#[account(
seeds = [
b"escrow",
maker.key().as_ref(),
&escrow_id.to_le_bytes()
],
bump = escrow.bump
)]
pub escrow: Account<'info, Escrow>,
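To see why fixed-width seed components matter, here is a small standalone sketch in plain Rust. It assumes that the address derivation hashes the seeds one after another, which is equivalent to hashing their concatenation; the helpers `escrow_seeds` and `concat` are hypothetical and a `[u8; 32]` array stands in for the maker's public key:

```rust
/// Builds the seed list in the same order as the `seeds = [...]` constraint above.
/// `maker` stands in for a 32-byte public key.
pub fn escrow_seeds(maker: &[u8; 32], escrow_id: u64) -> Vec<Vec<u8>> {
    vec![
        b"escrow".to_vec(),
        maker.to_vec(),
        escrow_id.to_le_bytes().to_vec(), // always 8 bytes
    ]
}

/// Concatenates seeds, modelling how they are fed into the address hash.
pub fn concat(seeds: &[Vec<u8>]) -> Vec<u8> {
    seeds.iter().flatten().copied().collect()
}
```

Because the prefix, the key, and the little-endian id all have fixed lengths, two different `(maker, escrow_id)` pairs always produce different byte strings. Variable-length seeds do not have that property: under the concatenation assumption, `["ab", "c"]` and `["a", "bc"]` feed identical bytes into the hash, so user-controlled strings are better hashed or length-bounded before being used as seeds.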
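Reentrancy Protection
Solana's runtime rejects most reentrant CPIs (a program may only re-enter itself directly), so classic reentrancy is rarer than on EVM chains. Ordering the work as checks, then state updates, then CPIs still guards against acting on stale state: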
pub fn withdraw(ctx: Context<Withdraw>, amount: u64) -> Result<()> {
let escrow = &mut ctx.accounts.escrow;
// FIRST: Check conditions
require!(escrow.balance >= amount, ErrorCode::InsufficientFunds);
require!(escrow.authority == ctx.accounts.authority.key(), ErrorCode::Unauthorized);
// SECOND: Update state BEFORE external call
escrow.balance = escrow.balance
.checked_sub(amount)
.ok_or(ErrorCode::InsufficientFunds)?; // return an error instead of panicking
// THIRD: Make external call (CPI to Token Program)
let cpi_accounts = Transfer {
from: ctx.accounts.vault.to_account_info(),
to: ctx.accounts.recipient.to_account_info(),
authority: ctx.accounts.escrow.to_account_info(),
};
let cpi_program = ctx.accounts.token_program.to_account_info();
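// Seeds must match the ones the escrow PDA was derived with (include maker and escrow_id if used).
// as_ref() makes both elements &[u8] so the array has a single element type.
let seeds = &[b"escrow".as_ref(), &[ctx.accounts.escrow.bump]];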
let signer = &[&seeds[..]];
let cpi_ctx = CpiContext::new_with_signer(cpi_program, cpi_accounts, signer);
token::transfer(cpi_ctx, amount)?;
Ok(())
}
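The check, update, call ordering used in `withdraw` can be modelled without any Solana dependencies. The sketch below is a plain-Rust illustration, not Anchor code: `Escrow`, `WithdrawError`, and the `transfer` callback (standing in for the token CPI) are hypothetical, and a `u32` stands in for a public key.

```rust
#[derive(Debug, PartialEq)]
pub enum WithdrawError {
    Unauthorized,
    InsufficientFunds,
}

pub struct Escrow {
    pub authority: u32, // stand-in for a Pubkey
    pub balance: u64,
}

/// Checks, then effects, then the external interaction.
pub fn withdraw<F>(
    escrow: &mut Escrow,
    caller: u32,
    amount: u64,
    transfer: F,
) -> Result<(), WithdrawError>
where
    F: FnOnce(u64) -> Result<(), WithdrawError>,
{
    // 1. Checks: nothing is modified if either one fails
    if caller != escrow.authority {
        return Err(WithdrawError::Unauthorized);
    }
    let remaining = escrow
        .balance
        .checked_sub(amount)
        .ok_or(WithdrawError::InsufficientFunds)?;

    // 2. Effects: record the new balance before any external call
    escrow.balance = remaining;

    // 3. Interaction: the external call runs against already-updated state
    transfer(amount)
}
```

In this model a failed `transfer` leaves the balance reduced; on-chain, an error from the CPI fails the whole transaction, so the state change is rolled back with it.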
Part B: CI/CD with GitHub Actions
Comprehensive Test Workflow
# .github/workflows/test.yml
name: Test
on:
push:
branches: [main, develop]
pull_request:
branches: [main]
env:
SOLANA_VERSION: 1.18.0
ANCHOR_VERSION: 0.30.0
NODE_VERSION: 20
RUST_VERSION: 1.75.0
jobs:
# ==========================================
# Anchor Program Tests
# ==========================================
anchor-test:
name: Anchor Tests
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_VERSION }}
components: clippy, rustfmt
- name: Cache Cargo
uses: actions/cache@v4
with:
path: |
~/.cargo/bin/
~/.cargo/registry/index/
~/.cargo/registry/cache/
~/.cargo/git/db/
target/
key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
- name: Install Solana
run: |
sh -c "$(curl -sSfL https://release.solana.com/v${{ env.SOLANA_VERSION }}/install)"
echo "$HOME/.local/share/solana/install/active_release/bin" >> $GITHUB_PATH
- name: Install Anchor
run: |
cargo install --git https://github.com/coral-xyz/anchor --tag v${{ env.ANCHOR_VERSION }} anchor-cli --locked
# pnpm must be on PATH before setup-node can resolve the pnpm cache
- name: Install pnpm
run: npm install -g pnpm
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'pnpm'
- name: Install Dependencies
run: pnpm install
- name: Build Programs
run: anchor build
- name: Run Tests
run: anchor test
- name: Upload Coverage
uses: codecov/codecov-action@v4
with:
files: ./coverage/lcov.info
fail_ci_if_error: false
# ==========================================
# Rust Linting and Security
# ==========================================
rust-quality:
name: Rust Quality
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_VERSION }}
components: clippy, rustfmt
- name: Check Formatting
run: cargo fmt --all -- --check
- name: Clippy Lints
run: cargo clippy --all-targets --all-features -- -D warnings
- name: Security Audit
run: |
cargo install cargo-audit --locked
cargo audit
# ==========================================
# Frontend Tests
# ==========================================
frontend-test:
name: Frontend Tests
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./app
steps:
- uses: actions/checkout@v4
# pnpm must be on PATH before setup-node can resolve the pnpm cache
- name: Install pnpm
run: npm install -g pnpm
- name: Setup Node
uses: actions/setup-node@v4
with:
node-version: ${{ env.NODE_VERSION }}
cache: 'pnpm'
cache-dependency-path: app/pnpm-lock.yaml
- name: Install Dependencies
run: pnpm install
- name: Type Check
run: pnpm type-check
- name: Lint
run: pnpm lint
- name: Test
run: pnpm test:ci
- name: Build
run: pnpm build
# ==========================================
# API Tests
# ==========================================
api-test:
name: API Tests
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./api
services:
postgres:
image: postgres:15
env:
POSTGRES_PASSWORD: test
POSTGRES_DB: test_db
ports:
- 5432:5432
options: >-
--health-cmd pg_isready
--health-interval 10s
--health-timeout 5s
--health-retries 5
steps:
- uses: actions/checkout@v4
- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.11'
- name: Install Poetry
run: pip install poetry
- name: Install Dependencies
run: poetry install
- name: Run Tests
env:
DATABASE_URL: postgresql://postgres:test@localhost:5432/test_db
run: poetry run pytest --cov=app --cov-report=xml
- name: Upload Coverage
uses: codecov/codecov-action@v4
with:
files: ./coverage.xml
# ==========================================
# Rust Services Tests
# ==========================================
services-test:
name: Services Tests
runs-on: ubuntu-latest
defaults:
run:
working-directory: ./services
steps:
- uses: actions/checkout@v4
- name: Install Rust
uses: dtolnay/rust-toolchain@master
with:
toolchain: ${{ env.RUST_VERSION }}
- name: Build Services
run: cargo build --workspace
- name: Test Services
run: cargo test --workspace
Deploy to Devnet Workflow
# .github/workflows/deploy-devnet.yml
name: Deploy to Devnet
on:
push:
branches: [develop]
workflow_dispatch:
env:
SOLANA_VERSION: 1.18.0
ANCHOR_VERSION: 0.30.0
jobs:
deploy-programs:
name: Deploy Programs
runs-on: ubuntu-latest
environment: devnet
steps:
- uses: actions/checkout@v4
- name: Install Solana
run: |
sh -c "$(curl -sSfL https://release.solana.com/v${{ env.SOLANA_VERSION }}/install)"
echo "$HOME/.local/share/solana/install/active_release/bin" >> $GITHUB_PATH
- name: Install Anchor
run: cargo install --git https://github.com/coral-xyz/anchor --tag v${{ env.ANCHOR_VERSION }} anchor-cli --locked
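- name: Setup Keypair
env:
DEPLOYER_KEYPAIR: ${{ secrets.DEVNET_DEPLOYER_KEYPAIR }}
run: |
# Pass the secret through an environment variable instead of interpolating it into the script
mkdir -p ~/.config/solana
echo "$DEPLOYER_KEYPAIR" > ~/.config/solana/id.json
solana config set --url devnet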
- name: Build Programs
run: anchor build
- name: Deploy Programs
run: |
anchor deploy --provider.cluster devnet
- name: Verify Deployment
run: |
solana program show $(solana address -k target/deploy/token_escrow-keypair.json)
solana program show $(solana address -k target/deploy/nft_marketplace-keypair.json)
solana program show $(solana address -k target/deploy/defi_amm-keypair.json)
solana program show $(solana address -k target/deploy/dao_governance-keypair.json)
deploy-services:
name: Deploy Services
runs-on: ubuntu-latest
needs: deploy-programs
environment: devnet
steps:
- uses: actions/checkout@v4
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v4
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: us-east-1
- name: Login to ECR
uses: aws-actions/amazon-ecr-login@v2
- name: Build and Push Images
run: |
# Build API
docker build -t ${{ secrets.ECR_REGISTRY }}/solana-dapps-api:${{ github.sha }} ./api
docker push ${{ secrets.ECR_REGISTRY }}/solana-dapps-api:${{ github.sha }}
# Build App
docker build -t ${{ secrets.ECR_REGISTRY }}/solana-dapps-app:${{ github.sha }} ./app
docker push ${{ secrets.ECR_REGISTRY }}/solana-dapps-app:${{ github.sha }}
# Build Indexer
docker build -t ${{ secrets.ECR_REGISTRY }}/solana-dapps-indexer:${{ github.sha }} ./services/indexer
docker push ${{ secrets.ECR_REGISTRY }}/solana-dapps-indexer:${{ github.sha }}
# Build Relay
docker build -t ${{ secrets.ECR_REGISTRY }}/solana-dapps-relay:${{ github.sha }} ./services/relay
docker push ${{ secrets.ECR_REGISTRY }}/solana-dapps-relay:${{ github.sha }}
- name: Deploy to Kubernetes
run: |
aws eks update-kubeconfig --name solana-dapps-devnet
helm upgrade --install solana-dapps ./k8s/helm/solana-dapps \
--namespace solana-dapps-devnet \
--values ./k8s/helm/solana-dapps/values-devnet.yaml \
--set api.image.tag=${{ github.sha }} \
--set app.image.tag=${{ github.sha }} \
--set indexer.image.tag=${{ github.sha }} \
--set relay.image.tag=${{ github.sha }} \
--wait --timeout 10m
Mainnet Release Workflow
# .github/workflows/release.yml
name: Release to Mainnet
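on:
release:
types: [published]
# The install steps below read these values; without this block they expand to empty strings
env:
SOLANA_VERSION: 1.18.0
ANCHOR_VERSION: 0.30.0
jobs:
verify:
name: Verify Release
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Verify Tests Passed
run: |
# Check that the "Test" workflow succeeded on this commit
gh run list --commit ${{ github.sha }} --status success --json name | jq -e '.[] | select(.name == "Test")'
env:
GH_TOKEN: ${{ github.token }}
- name: Install Solana
run: |
sh -c "$(curl -sSfL https://release.solana.com/v${{ env.SOLANA_VERSION }}/install)"
echo "$HOME/.local/share/solana/install/active_release/bin" >> $GITHUB_PATH
- name: Install Anchor and cargo-audit
run: |
cargo install --git https://github.com/coral-xyz/anchor --tag v${{ env.ANCHOR_VERSION }} anchor-cli --locked
cargo install cargo-audit --locked
- name: Verify Audit
run: |
# Fail the release if a dependency has a known RustSec advisory.
# This complements, but does not replace, a manual security audit.
cargo audit
- name: Verify Build
run: anchor build --verifiable
deploy-mainnet:
name: Deploy to Mainnet
runs-on: ubuntu-latest
needs: verify
environment: mainnet
steps:
- uses: actions/checkout@v4
- name: Install Solana
run: |
sh -c "$(curl -sSfL https://release.solana.com/v${{ env.SOLANA_VERSION }}/install)"
echo "$HOME/.local/share/solana/install/active_release/bin" >> $GITHUB_PATH
- name: Install Anchor
run: cargo install --git https://github.com/coral-xyz/anchor --tag v${{ env.ANCHOR_VERSION }} anchor-cli --locked
- name: Setup Keypair
env:
DEPLOYER_KEYPAIR: ${{ secrets.MAINNET_DEPLOYER_KEYPAIR }}
run: |
mkdir -p ~/.config/solana
echo "$DEPLOYER_KEYPAIR" > ~/.config/solana/id.json
solana config set --url mainnet-beta
- name: Build Verifiable
run: anchor build --verifiable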
- name: Deploy with Confirmation
run: |
# Deploy each program with confirmation
anchor deploy --provider.cluster mainnet-beta
- name: Verify On-Chain
run: |
# Verify deployed bytecode matches
anchor verify $(solana address -k target/deploy/token_escrow-keypair.json)
notify:
name: Notify
runs-on: ubuntu-latest
needs: deploy-mainnet
steps:
- name: Notify Slack
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "Mainnet deployment complete for ${{ github.event.release.tag_name }}",
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": "*Solana DApps Mainnet Release*\n*Version:* ${{ github.event.release.tag_name }}\n*Commit:* ${{ github.sha }}"
}
}
]
}
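env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK }}
# Needed in v1 of the action when the URL is an incoming webhook and the payload uses blocks
SLACK_WEBHOOK_TYPE: INCOMING_WEBHOOK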
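Part C: Deployment Strategies
Blue-Green Deployment
Blue-green deployments run two identical production environments, with only one receiving live traffic. Releasing means switching the Service selector to the other color, and rolling back means switching it again: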
# k8s/helm/solana-dapps/templates/deployment-blue-green.yaml
{{- if .Values.blueGreen.enabled }}
# Blue Deployment (Current Production)
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "api.fullname" . }}-blue
labels:
{{- include "api.labels" . | nindent 4 }}
version: blue
spec:
replicas: {{ .Values.blueGreen.blue.replicas }}
selector:
matchLabels:
{{- include "api.selectorLabels" . | nindent 6 }}
version: blue
template:
metadata:
labels:
{{- include "api.selectorLabels" . | nindent 8 }}
version: blue
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.blueGreen.blue.tag }}"
# ... rest of container spec
---
# Green Deployment (New Version)
apiVersion: apps/v1
kind: Deployment
metadata:
name: {{ include "api.fullname" . }}-green
labels:
{{- include "api.labels" . | nindent 4 }}
version: green
spec:
replicas: {{ .Values.blueGreen.green.replicas }}
selector:
matchLabels:
{{- include "api.selectorLabels" . | nindent 6 }}
version: green
template:
metadata:
labels:
{{- include "api.selectorLabels" . | nindent 8 }}
version: green
spec:
containers:
- name: {{ .Chart.Name }}
image: "{{ .Values.image.repository }}:{{ .Values.blueGreen.green.tag }}"
---
# Service points to active color
apiVersion: v1
kind: Service
metadata:
name: {{ include "api.fullname" . }}
spec:
selector:
{{- include "api.selectorLabels" . | nindent 4 }}
version: {{ .Values.blueGreen.active }} # "blue" or "green"
ports:
- port: 8000
targetPort: http
{{- end }}
Blue-green switch script:
#!/bin/bash
# scripts/switch-blue-green.sh
# Abort on any error so traffic is never switched to an unhealthy deployment
set -euo pipefail
NAMESPACE="solana-dapps"
CURRENT=$(kubectl get svc solana-dapps-api -n "$NAMESPACE" -o jsonpath='{.spec.selector.version}')
if [ "$CURRENT" = "blue" ]; then
  NEW="green"
else
  NEW="blue"
fi
echo "Switching from $CURRENT to $NEW..."
# Verify the new deployment is healthy; a failed or timed-out rollout stops the script here
kubectl rollout status "deployment/solana-dapps-api-$NEW" -n "$NAMESPACE" --timeout=5m
# Switch traffic
kubectl patch svc solana-dapps-api -n "$NAMESPACE" -p "{\"spec\":{\"selector\":{\"version\":\"$NEW\"}}}"
echo "Traffic now pointing to $NEW"
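Canary Deployment
A canary release shifts traffic to the new version in small steps and rolls back automatically if the monitored metrics degrade. The example below uses Flagger: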
# k8s/helm/solana-dapps/templates/canary.yaml
{{- if .Values.canary.enabled }}
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
name: {{ include "api.fullname" . }}
namespace: {{ .Release.Namespace }}
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ include "api.fullname" . }}
progressDeadlineSeconds: 60
service:
port: 8000
targetPort: http
gateways:
- public-gateway
hosts:
- {{ .Values.global.ingress.host }}
analysis:
interval: 1m
threshold: 5
maxWeight: 50
stepWeight: 10
metrics:
- name: request-success-rate
thresholdRange:
min: 99
interval: 1m
- name: request-duration
thresholdRange:
max: 500
interval: 1m
webhooks:
- name: load-test
url: http://flagger-loadtester.test/
timeout: 5s
metadata:
cmd: "hey -z 1m -q 10 -c 2 http://{{ include "api.fullname" . }}-canary:8000/"
{{- end }}
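The `stepWeight`, `maxWeight`, and `interval` values determine how long a healthy rollout takes. The sketch below works through that arithmetic in plain Rust, assuming Flagger raises the canary weight by `stepWeight` once per `interval` until it reaches `maxWeight`; the function names are illustrative and not part of any Flagger API.

```rust
/// Traffic weights (in percent) the canary passes through before promotion.
pub fn canary_weights(step_weight: u32, max_weight: u32) -> Vec<u32> {
    let mut weights = Vec::new();
    if step_weight == 0 {
        return weights; // a zero step would never make progress
    }
    let mut weight = step_weight;
    while weight <= max_weight {
        weights.push(weight);
        weight += step_weight;
    }
    weights
}

/// Minimum minutes of analysis before promotion, one interval per step.
pub fn minutes_to_max_weight(step_weight: u32, max_weight: u32, interval_minutes: u32) -> u32 {
    canary_weights(step_weight, max_weight).len() as u32 * interval_minutes
}
```

For the manifest above (`stepWeight: 10`, `maxWeight: 50`, `interval: 1m`) the canary steps through 10, 20, 30, 40, and 50 percent, so a healthy release needs at least five minutes of analysis, while `threshold: 5` means five failed checks trigger a rollback.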
Part D: Monitoring and Alerting
Prometheus Alerting Rules
# k8s/monitoring/alerting-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: solana-dapps-alerts
namespace: monitoring
spec:
groups:
- name: solana-dapps.rules
rules:
# High Error Rate
- alert: HighErrorRate
expr: |
sum(rate(http_requests_total{status=~"5..", service="api"}[5m])) /
sum(rate(http_requests_total{service="api"}[5m])) > 0.05
for: 5m
labels:
severity: critical
annotations:
summary: "High error rate detected"
description: "Error rate is above 5% for the last 5 minutes"
# High Latency
- alert: HighLatency
expr: |
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{service="api"}[5m]))) > 1
for: 5m
labels:
severity: warning
annotations:
summary: "High latency detected"
description: "P95 latency is above 1 second"
# Pod Restarts
- alert: PodCrashLooping
expr: |
rate(kube_pod_container_status_restarts_total{namespace="solana-dapps"}[15m]) > 0
for: 5m
labels:
severity: warning
annotations:
summary: "Pod is crash looping"
description: "Pod {{ $labels.pod }} is restarting frequently"
# Transaction Failures
- alert: HighTransactionFailureRate
expr: |
sum(rate(solana_tx_failed_total[5m])) /
sum(rate(solana_tx_total[5m])) > 0.1
for: 5m
labels:
severity: critical
annotations:
summary: "High Solana transaction failure rate"
description: "More than 10% of transactions are failing"
# RPC Endpoint Issues
- alert: RPCEndpointDown
expr: solana_rpc_up == 0
for: 2m
labels:
severity: critical
annotations:
summary: "Solana RPC endpoint is unreachable"
description: "Cannot connect to Solana RPC endpoint"
# Database Connection Issues
- alert: DatabaseConnectionPoolExhausted
expr: db_pool_available_connections < 2
for: 5m
labels:
severity: warning
annotations:
summary: "Database connection pool nearly exhausted"
description: "Only {{ $value }} connections available"
PagerDuty Integration
# k8s/monitoring/alertmanager-config.yaml
apiVersion: v1
kind: Secret
metadata:
name: alertmanager-config
namespace: monitoring
type: Opaque
stringData:
alertmanager.yaml: |
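global:
resolve_timeout: 5m
pagerduty_url: 'https://events.pagerduty.com/v2/enqueue'
# The Slack receivers below need a webhook URL; this file path is an assumed mount location
slack_api_url_file: '/etc/alertmanager/secrets/slack-webhook-url'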
route:
group_by: ['alertname', 'severity']
group_wait: 30s
group_interval: 5m
repeat_interval: 4h
receiver: 'default'
routes:
- match:
severity: critical
receiver: 'pagerduty-critical'
- match:
severity: warning
receiver: 'slack-warnings'
receivers:
- name: 'default'
slack_configs:
- channel: '#solana-dapps-alerts'
send_resolved: true
- name: 'pagerduty-critical'
pagerduty_configs:
- service_key_file: '/etc/alertmanager/secrets/pagerduty-key'
severity: critical
- name: 'slack-warnings'
slack_configs:
- channel: '#solana-dapps-warnings'
send_resolved: true
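Part E: Cost Optimization
Compute Unit Optimization
Every transaction runs under a compute budget: 200,000 compute units per instruction by default, up to 1.4 million per transaction. Smaller accounts lower rent, and tighter compute usage lowers priority fees: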
// Optimize account data size
#[account]
pub struct OptimizedEscrow {
pub maker: Pubkey, // 32 bytes
pub taker: Pubkey, // 32 bytes
pub mint_a: Pubkey, // 32 bytes
pub mint_b: Pubkey, // 32 bytes
pub amount_a: u64, // 8 bytes
pub amount_b: u64, // 8 bytes
pub vault_bump: u8, // 1 byte
pub state: u8, // 1 byte (a fieldless Borsh enum also serializes to 1 byte)
// Total: 146 bytes of data, plus Anchor's 8-byte discriminator = 154 bytes of account space
}
// Use smaller types where possible
pub fn calculate_fee(amount: u64, fee_bps: u16) -> u64 {
// u16 is sufficient for basis points (0-10000)
(amount as u128 * fee_bps as u128 / 10000) as u64
}
// Avoid redundant deserialization
pub fn process_batch(ctx: Context<ProcessBatch>) -> Result<()> {
// Process accounts in batch instead of individual transactions
for account_info in ctx.remaining_accounts.iter() {
// Process each account
}
Ok(())
}
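Request a larger compute budget only when a transaction needs it, because the priority fee is charged on the requested limit rather than on the units actually consumed: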
// Frontend: Set compute budget for complex transactions
import { ComputeBudgetProgram, Transaction } from '@solana/web3.js';
const modifyComputeUnits = ComputeBudgetProgram.setComputeUnitLimit({
units: 300000 // Increase for complex operations
});
const addPriorityFee = ComputeBudgetProgram.setComputeUnitPrice({
microLamports: 1000 // Priority fee for faster inclusion
});
const transaction = new Transaction()
.add(modifyComputeUnits)
.add(addPriorityFee)
.add(yourInstruction);
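The cost of those two compute budget instructions can be estimated up front. The following plain-Rust sketch assumes the current base fee of 5,000 lamports per signature and that the priority fee (limit times price, in micro-lamports) is rounded up to a whole lamport; the constant and function names are illustrative.

```rust
/// Assumed base fee per signature, in lamports.
pub const BASE_FEE_LAMPORTS_PER_SIGNATURE: u64 = 5_000;
/// 1 lamport = 1_000_000 micro-lamports.
pub const MICRO_LAMPORTS_PER_LAMPORT: u128 = 1_000_000;

/// Priority fee in lamports for a requested compute unit limit and price.
pub fn priority_fee_lamports(cu_limit: u32, micro_lamports_per_cu: u64) -> u64 {
    let micro = cu_limit as u128 * micro_lamports_per_cu as u128;
    // Round up to the next whole lamport
    let lamports = (micro + MICRO_LAMPORTS_PER_LAMPORT - 1) / MICRO_LAMPORTS_PER_LAMPORT;
    if lamports > u64::MAX as u128 {
        u64::MAX
    } else {
        lamports as u64
    }
}

/// Estimated total fee: base fee for every signature plus the priority fee.
pub fn total_fee_lamports(signatures: u64, cu_limit: u32, micro_lamports_per_cu: u64) -> u64 {
    signatures
        .saturating_mul(BASE_FEE_LAMPORTS_PER_SIGNATURE)
        .saturating_add(priority_fee_lamports(cu_limit, micro_lamports_per_cu))
}
```

With the values from the snippet above (300,000 units at 1,000 micro-lamports each) the priority fee is 300 lamports, so a single-signature transaction costs about 5,300 lamports under these assumptions. Lowering the requested limit to what the instruction really uses reduces that fee proportionally.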
RPC Cost Management
# api/app/services/rpc_manager.py
import asyncio
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List
import httpx
@dataclass
class RPCEndpoint:
    url: str
    priority: int
    rate_limit: int  # requests per second
    current_count: int = 0
    # default_factory gives each endpoint its own timestamp at creation time;
    # a plain datetime.now() default would be evaluated once, at import
    last_reset: datetime = field(default_factory=datetime.now)
class RPCManager:
    def __init__(self, endpoints: List[RPCEndpoint]):
        # Lower priority values are tried first
        self.endpoints = sorted(endpoints, key=lambda e: e.priority)
    async def get_endpoint(self) -> str:
        """Get next available endpoint with rate limiting."""
        while True:
            if not self.endpoints:
                raise RuntimeError("No RPC endpoints available")
            for endpoint in self.endpoints:
                # Start a new one-second window if the previous one has elapsed
                if datetime.now() - endpoint.last_reset > timedelta(seconds=1):
                    endpoint.current_count = 0
                    endpoint.last_reset = datetime.now()
                # Check rate limit
                if endpoint.current_count < endpoint.rate_limit:
                    endpoint.current_count += 1
                    return endpoint.url
            # All endpoints are rate limited: wait briefly, then check again
            await asyncio.sleep(0.1)
    async def make_request(self, method: str, params: list) -> dict:
        """Make RPC request with automatic failover."""
        url = await self.get_endpoint()
        async with httpx.AsyncClient() as client:
            try:
                response = await client.post(
                    url,
                    json={"jsonrpc": "2.0", "id": 1, "method": method, "params": params},
                    timeout=30.0,
                )
                # Treat HTTP 4xx/5xx responses as failures too
                response.raise_for_status()
                return response.json()
            except (httpx.HTTPError, ValueError):
                # Drop the failed endpoint and retry on the next one.
                # It stays removed until the manager is recreated.
                self.endpoints = [ep for ep in self.endpoints if ep.url != url]
                if self.endpoints:
                    return await self.make_request(method, params)
                raise
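For the Rust services in this course's workspace, the same fixed-window selection logic could look roughly like the sketch below. It is an illustration under stated assumptions rather than the course's implementation: `Endpoint`, `RpcSelector`, and `pick` are hypothetical names, the window is fixed at one second, and the current time is passed in so the behavior is easy to test.

```rust
use std::time::{Duration, Instant};

/// One RPC endpoint with a fixed one-second rate-limit window.
pub struct Endpoint {
    pub url: String,
    pub priority: u32,
    pub rate_limit: u32, // requests allowed per window
    count: u32,
    window_start: Instant,
}

pub struct RpcSelector {
    endpoints: Vec<Endpoint>,
}

impl RpcSelector {
    /// `specs` holds (url, priority, rate_limit); lower priority values are tried first.
    pub fn new(specs: Vec<(&str, u32, u32)>, now: Instant) -> Self {
        let mut endpoints: Vec<Endpoint> = specs
            .into_iter()
            .map(|(url, priority, rate_limit)| Endpoint {
                url: url.to_string(),
                priority,
                rate_limit,
                count: 0,
                window_start: now,
            })
            .collect();
        endpoints.sort_by_key(|e| e.priority);
        RpcSelector { endpoints }
    }

    /// Returns the highest-priority endpoint still under its limit, or None
    /// if every endpoint is exhausted for the current window.
    pub fn pick(&mut self, now: Instant) -> Option<&str> {
        for ep in self.endpoints.iter_mut() {
            // Start a new window once a full second has elapsed
            if now.saturating_duration_since(ep.window_start) >= Duration::from_secs(1) {
                ep.count = 0;
                ep.window_start = now;
            }
            if ep.count < ep.rate_limit {
                ep.count += 1;
                return Some(ep.url.as_str());
            }
        }
        None
    }
}
```

Returning `None` instead of sleeping inside the selector leaves the retry policy to the caller, which can back off, queue the request, or fail fast depending on the code path.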
Part F: Incident Response
Runbook Template
# Incident Runbook: High Error Rate
## Detection
- Alert: HighErrorRate
- Threshold: Error rate > 5% for 5 minutes
- Dashboard: [Grafana Link]
## Initial Assessment (5 minutes)
1. Check error rate trend in Grafana
2. Identify affected endpoints via logs
3. Check recent deployments
4. Verify RPC endpoint health
## Common Causes & Remediation
### RPC Endpoint Issues
**Symptoms:** All transactions failing, RPC timeout errors
**Check:** `solana cluster-version --url <rpc-url>`
**Fix:**
1. Switch to backup RPC: `kubectl set env deployment/api SOLANA_RPC_URL=<backup>`
2. Notify RPC provider
### Database Connection Issues
**Symptoms:** API timeouts, connection pool exhausted
**Check:** `kubectl logs -l app=api | grep "connection"`
**Fix:**
1. Restart API pods: `kubectl rollout restart deployment/api`
2. Scale up if persistent: `kubectl scale deployment/api --replicas=5`
### Program Bug
**Symptoms:** Specific instruction failing, error logs show program error
**Check:** Review recent program deployments
**Fix:**
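1. Rollback program if critical: `solana program deploy <previous-binary> --program-id <program-id>` (requires the upgrade authority, so on mainnet this goes through the multisig)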
2. Disable affected feature via feature flag
## Escalation
- L1 (On-Call): Initial triage, known issues
- L2 (Senior Engineer): Complex debugging, code changes
- L3 (Architect): System-wide issues, critical decisions
## Post-Incident
1. Document timeline in incident report
2. Identify root cause
3. Create follow-up tickets for prevention
4. Update runbook if needed
Feature Flags
# api/app/services/feature_flags.py
import time
from enum import Enum
import redis.asyncio as redis
class FeatureFlag(Enum):
    ENABLE_ESCROW = "enable_escrow"
    ENABLE_MARKETPLACE = "enable_marketplace"
    ENABLE_AMM = "enable_amm"
    ENABLE_GOVERNANCE = "enable_governance"
    MAINTENANCE_MODE = "maintenance_mode"
class FeatureFlagService:
    def __init__(self, redis_url: str):
        self.redis = redis.from_url(redis_url)
        # flag name -> (enabled, time the value was cached)
        self.cache: dict[str, tuple[bool, float]] = {}
        self.cache_ttl = 60  # seconds
    async def is_enabled(self, flag: FeatureFlag) -> bool:
        """Check if feature flag is enabled."""
        # Serve from the local cache only while the entry is younger than cache_ttl,
        # so changes made by other instances are picked up after at most cache_ttl seconds
        cached = self.cache.get(flag.value)
        if cached is not None and time.monotonic() - cached[1] < self.cache_ttl:
            return cached[0]
        # Fetch from Redis; a missing key means the feature is enabled by default
        value = await self.redis.get(f"feature:{flag.value}")
        enabled = value == b"true" if value is not None else True
        self.cache[flag.value] = (enabled, time.monotonic())
        return enabled
    async def set_flag(self, flag: FeatureFlag, enabled: bool) -> None:
        """Set feature flag value."""
        await self.redis.set(f"feature:{flag.value}", str(enabled).lower())
        self.cache[flag.value] = (enabled, time.monotonic())
    async def emergency_disable(self, flag: FeatureFlag) -> None:
        """Emergency disable a feature."""
        await self.set_flag(flag, False)
        # Clear this instance's cache so every flag is re-read from Redis
        self.cache.clear()
# Usage in FastAPI (get_feature_flags and router are defined elsewhere in the app)
from fastapi import HTTPException, Depends
async def check_escrow_enabled(
    flags: FeatureFlagService = Depends(get_feature_flags)
):
    if not await flags.is_enabled(FeatureFlag.ENABLE_ESCROW):
        raise HTTPException(503, "Escrow feature is currently disabled")
    return True
@router.post("/escrows", dependencies=[Depends(check_escrow_enabled)])
async def create_escrow(...):
    ...
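The `cache_ttl` setting implies that cached flag values should expire. For the Rust services in this course, a time-bounded flag cache could be sketched as follows; `FlagCache` and its methods are hypothetical names, and the current time is passed in explicitly so expiry can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// In-process cache for feature flag values with a time-to-live.
pub struct FlagCache {
    ttl: Duration,
    entries: HashMap<String, (bool, Instant)>,
}

impl FlagCache {
    pub fn new(ttl: Duration) -> Self {
        FlagCache {
            ttl,
            entries: HashMap::new(),
        }
    }

    /// Returns the cached value only while it is younger than the TTL.
    /// None tells the caller to re-read the flag from the shared store.
    pub fn get(&self, flag: &str, now: Instant) -> Option<bool> {
        match self.entries.get(flag) {
            Some(&(value, stored_at))
                if now.saturating_duration_since(stored_at) < self.ttl =>
            {
                Some(value)
            }
            _ => None,
        }
    }

    /// Stores a value together with the time it was fetched.
    pub fn put(&mut self, flag: &str, value: bool, now: Instant) {
        self.entries.insert(flag.to_string(), (value, now));
    }
}
```

The TTL bounds how long an instance can keep serving a stale value after another instance disables a feature, so it is worth choosing with incident response time in mind.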
Part G: Program Upgrades
Upgrade Authority Management
// Always use multisig for mainnet upgrade authority
use squads_multisig::state::Ms;
pub fn upgrade_program_with_multisig(
ctx: Context<UpgradeWithMultisig>,
buffer: Pubkey,
) -> Result<()> {
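// Sanity check on the multisig state. Note: comparing these indices is not proof of approval.
// The guarantee that the member threshold approved the upgrade must come from the multisig
// program itself executing this instruction as the upgrade authority.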
require!(
ctx.accounts.multisig.transaction_index > ctx.accounts.multisig.ms_change_index,
ErrorCode::MultisigNotApproved
);
// Proceed with upgrade via BPF Upgradeable Loader
let upgrade_ix = bpf_loader_upgradeable::upgrade(
&ctx.accounts.program.key(),
&buffer,
&ctx.accounts.multisig.key(),
&ctx.accounts.spill.key(),
);
invoke_signed(
&upgrade_ix,
&[
ctx.accounts.program.to_account_info(),
ctx.accounts.program_data.to_account_info(),
ctx.accounts.buffer.to_account_info(),
ctx.accounts.multisig.to_account_info(),
ctx.accounts.spill.to_account_info(),
ctx.accounts.rent.to_account_info(),
ctx.accounts.clock.to_account_info(),
ctx.accounts.bpf_loader.to_account_info(),
],
&[/* multisig seeds */],
)?;
Ok(())
}
Version Migration
// Support multiple account versions for migration
#[account]
pub struct EscrowV1 {
pub version: u8, // 1
pub maker: Pubkey,
pub amount: u64,
}
#[account]
pub struct EscrowV2 {
pub version: u8, // 2
pub maker: Pubkey,
pub taker: Option<Pubkey>, // New field
pub amount: u64,
pub created_at: i64, // New field
}
pub fn migrate_escrow_v1_to_v2(ctx: Context<MigrateEscrow>) -> Result<()> {
let old_escrow = &ctx.accounts.old_escrow;
let new_escrow = &mut ctx.accounts.new_escrow;
// Copy existing fields
new_escrow.version = 2;
new_escrow.maker = old_escrow.maker;
new_escrow.amount = old_escrow.amount;
// Initialize new fields
new_escrow.taker = None;
new_escrow.created_at = Clock::get()?.unix_timestamp;
// Close old account, return rent to payer
ctx.accounts.old_escrow.close(ctx.accounts.payer.to_account_info())?;
emit!(EscrowMigrated {
old_pubkey: ctx.accounts.old_escrow.key(),
new_pubkey: ctx.accounts.new_escrow.key(),
version: 2,
});
Ok(())
}
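The field-copying part of the migration can be exercised off-chain. The sketch below is a plain-Rust model with no Anchor dependency: the structs mirror `EscrowV1` and `EscrowV2` with a `[u8; 32]` array standing in for `Pubkey`, the timestamp is passed in instead of read from the clock, and the version guard is an added assumption about how a migration might reject accounts that were already migrated.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct EscrowV1 {
    pub version: u8,
    pub maker: [u8; 32],
    pub amount: u64,
}

#[derive(Debug, Clone, PartialEq)]
pub struct EscrowV2 {
    pub version: u8,
    pub maker: [u8; 32],
    pub taker: Option<[u8; 32]>,
    pub amount: u64,
    pub created_at: i64,
}

/// Builds the V2 layout from a V1 account, refusing anything that is not V1.
pub fn migrate_v1_to_v2(old: &EscrowV1, now_unix: i64) -> Result<EscrowV2, String> {
    if old.version != 1 {
        return Err(format!("expected version 1, found {}", old.version));
    }
    Ok(EscrowV2 {
        version: 2,
        // Existing fields are copied unchanged
        maker: old.maker,
        amount: old.amount,
        // New fields get explicit defaults
        taker: None,
        created_at: now_unix,
    })
}
```

Keeping this mapping in a pure function makes it possible to unit test every migration path before the on-chain instruction closes any old accounts.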
Summary
In this module, you learned:
- Security: Implementing signer verification, checked math, and reentrancy protection
- CI/CD: Setting up comprehensive GitHub Actions workflows for testing and deployment
- Deployment Strategies: Blue-green and canary deployments for safe releases
- Monitoring: Prometheus alerting rules and PagerDuty integration
- Cost Optimization: Compute unit optimization and RPC cost management
- Incident Response: Runbooks, feature flags, and emergency procedures
- Upgrades: Safe program upgrade patterns with multisig and version migration
Key Takeaways
- Never deploy without comprehensive testing and security review
- Implement feature flags for quick incident response
- Monitor everything and alert on meaningful thresholds
- Have runbooks ready for common incident types
- Use multisig for all mainnet upgrade authorities
- Plan for version migration before deploying
Course Completion
Congratulations! You have completed the Solana DApps course. You now have the knowledge to:
- Build production-ready Solana programs with Anchor
- Create modern frontends with React and wallet integration
- Deploy scalable backends with FastAPI and Poem
- Operate applications in Kubernetes with observability
- Implement security best practices and incident response
Next Steps
- Build: Create your own DApp using these patterns
- Test: Deploy to devnet and iterate
- Audit: Get a security review before mainnet
- Launch: Deploy with monitoring and alerting
- Iterate: Continuously improve based on user feedback