Discord Bot Operations & Scaling
Use this playbook to keep production bots healthy after the initial deploy. It covers monitoring, scheduling, cost controls, sharding, data management, compliance, and incident response.
Monitoring & Alerting
- Panel Metrics: Every container exposes real-time CPU, RAM, and network graphs. Watch for RAM above 75% or CPU spikes near 90%, and scale up before watchdog restarts kick in.
- Log Streaming: Download console logs or stream them live for aggregation into Logtail, Datadog, or Elastic. Include `requestId`, `guildId`, or `shardId` in logs to pinpoint issues quickly.
- Status Webhooks: Subscribe to https://status.mambahost.com or request a dedicated webhook so incident notifications land in your staff Discord.
- Heartbeat Commands: Run scheduled `ping` jobs that report uptime and latency to a monitoring channel.
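As an illustration of the log-context advice above, here is a minimal sketch that emits one JSON object per log line using only the Python standard library. The `JsonFormatter` class and `build_logger` helper are hypothetical names, and the field names simply mirror the ones suggested above; adapt them to whatever your aggregator expects.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON object so aggregators can index it."""

    # Context fields suggested above; attach them with `extra=` when logging.
    CONTEXT_FIELDS = ("requestId", "guildId", "shardId")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.CONTEXT_FIELDS:
            value = getattr(record, field, None)
            if value is not None:  # keep shardId 0, skip fields that were not set
                payload[field] = value
        return json.dumps(payload)


def build_logger(name: str = "bot") -> logging.Logger:
    """Hypothetical helper: a console logger that writes JSON lines."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger().info("command handled", extra={"guildId": "1234", "shardId": 0})` then produces a line that can be filtered by guild or shard after it is streamed out of the console.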
Scheduling & Automation
- Restarts: Schedule daily or weekly restarts during low-traffic windows to clear memory leaks.
- Dependency Updates: Add a scheduler task that runs your update script (e.g., `npm run update` or `pip install --upgrade -r requirements.txt`) followed by a restart.
- Cron Jobs: Use the built-in scheduler for tasks such as data exports, leaderboard resets, or timed announcements.
- Backups: Automate `mysqldump` or SQLite copies to `/backups` with retention policies (default 7 days).
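For the SQLite case, a scheduled backup task could look roughly like the sketch below. It relies on the standard library's `sqlite3.Connection.backup` for a consistent copy and prunes files older than the 7-day default mentioned above. The function names and the timestamped file-naming scheme are assumptions for illustration, not part of the hosting panel.

```python
import sqlite3
import time
from pathlib import Path
from typing import List, Optional

RETENTION_DAYS = 7  # matches the default retention mentioned above


def backup_sqlite(db_path: Path, backup_dir: Path) -> Path:
    """Copy a SQLite database with the online backup API and return the new file."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{db_path.stem}-{stamp}.db"  # hypothetical naming scheme
    source = sqlite3.connect(db_path)
    dest = sqlite3.connect(target)
    try:
        source.backup(dest)  # consistent snapshot, unlike a raw file copy mid-write
    finally:
        dest.close()
        source.close()
    return target


def prune_backups(
    backup_dir: Path,
    retention_days: int = RETENTION_DAYS,
    now: Optional[float] = None,
) -> List[Path]:
    """Delete *.db backups whose modification time is past the retention window."""
    cutoff = (time.time() if now is None else now) - retention_days * 86400
    removed = []
    for path in backup_dir.glob("*.db"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed
```

Running `backup_sqlite` followed by `prune_backups` from a nightly scheduler task keeps the backup directory bounded. A restore test is still needed to confirm the copies are usable.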
Scaling Patterns
Vertical Scaling
- Move from Starter → Pro → Premium when RAM or CPU utilization consistently exceeds 75%.
- Premium containers support PM2/multiprocess setups, multiple bots, and heavier AI inference workloads.
Horizontal Scaling & Sharding
- Divide workloads by shards or features. Example: one bot handles moderation (sharded), another handles music streaming.
- Use Redis, PostgreSQL, or MySQL to store state shared across shards/containers.
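- Register slash commands once per application rather than once per shard: registration goes through the HTTP API and does not depend on shard count. Discord expects an initial response to each interaction within 3 seconds regardless of which shard receives it, so defer replies that need longer.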
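When deciding which container owns which guild, it helps to know how Discord routes guilds to shards. Its gateway documentation gives the formula `shard_id = (guild_id >> 22) % num_shards`, sketched below. The `state_key` helper is a hypothetical naming scheme for entries in a shared store such as Redis, not something the platform prescribes.

```python
def shard_for_guild(guild_id: int, num_shards: int) -> int:
    """Return the shard that receives gateway events for a guild.

    Uses Discord's documented formula: (guild_id >> 22) % num_shards.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    return (guild_id >> 22) % num_shards


def state_key(guild_id: int, name: str) -> str:
    """Hypothetical key scheme for state shared across shards or containers."""
    return f"guild:{guild_id}:{name}"
```

Because the key depends only on the guild ID, any shard or container can read the same entry, which keeps shared state valid when the shard count changes.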
Multi-Environment Strategy
- Dev/Staging: Mirror production env vars except for tokens/guild IDs; register slash commands in a staging guild to avoid polluting production.
- Production: Keep env vars minimal and rotate tokens quarterly.
- Automate promotions by syncing assets from staging to production once smoke tests pass.
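One way to keep staging commands out of production is to choose the registration endpoint from the environment. The sketch below builds the guild-scoped or global application-commands URL of Discord's HTTP API. The variable names `BOT_ENV`, `DISCORD_APP_ID`, and `STAGING_GUILD_ID` are assumptions chosen for this example.

```python
from typing import Mapping

API_BASE = "https://discord.com/api/v10"


def command_registration_url(env: Mapping[str, str]) -> str:
    """Pick the guild-scoped endpoint in staging and the global one in production."""
    app_id = env["DISCORD_APP_ID"]          # hypothetical variable name
    guild_id = env.get("STAGING_GUILD_ID")  # hypothetical; left unset in production
    if env.get("BOT_ENV", "production") != "production":
        if not guild_id:
            raise RuntimeError("STAGING_GUILD_ID is required outside production")
        return f"{API_BASE}/applications/{app_id}/guilds/{guild_id}/commands"
    return f"{API_BASE}/applications/{app_id}/commands"
```

Passing `os.environ` as `env` lets the same registration script run unchanged in both containers; only the env vars differ.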
Data Management
- MySQL: Ideal for inventories, audit logs, or economy data. Request credentials via support; add them as env vars.
- SQLite: Suitable for small bots; back up the `.db` file nightly.
- Object Storage: Host large assets (images, audio, templates) on S3-compatible storage to keep the container lean.
- Caching: External Redis/Upstash caches accelerate frequently accessed data and reduce database load.
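The caching point above usually follows a cache-aside pattern: check the cache first and query the database only on a miss. The sketch below shows that pattern with an in-memory dictionary standing in for Redis or Upstash so it runs without extra dependencies. The `TTLCache` class is a hypothetical illustration, not a client for either service.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for an external cache, using cache-aside reads."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expiry, value)

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        """Return an unexpired cached value, or call loader() and cache its result."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader()  # cache miss: this is where the database query would run
        self._entries[key] = (now + self._ttl, value)
        return value
```

With a real Redis client the structure stays the same; the dictionary lookups become network calls and the expiry is delegated to the server.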
Cost Controls
- Monitor resource graphs and downgrade tiers if utilization stays below 30% for extended periods.
- Consolidate lightweight bots into one Premium container using PM2 to reduce per-bot costs.
- Offload CPU-heavy tasks (e.g., video transcoding) to serverless workers or queued jobs that run only when needed.
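The 30% downgrade rule here and the 75% scale-up rule under Vertical Scaling can be combined into one review helper. The sketch below is a simplification: it averages a window of samples to approximate "consistently" and "for extended periods". The function name and the choice of a plain mean are assumptions.

```python
from statistics import mean
from typing import Sequence

SCALE_UP_PCT = 75.0    # scale-up threshold from the Vertical Scaling section
SCALE_DOWN_PCT = 30.0  # downgrade threshold from the Cost Controls section


def recommend_tier_change(cpu_pct: Sequence[float], ram_pct: Sequence[float]) -> str:
    """Return "upgrade", "downgrade", or "hold" for a window of utilization samples."""
    if not cpu_pct or not ram_pct:
        raise ValueError("need at least one sample per metric")
    cpu_avg, ram_avg = mean(cpu_pct), mean(ram_pct)
    if cpu_avg > SCALE_UP_PCT or ram_avg > SCALE_UP_PCT:
        return "upgrade"    # either resource running hot is enough to scale up
    if cpu_avg < SCALE_DOWN_PCT and ram_avg < SCALE_DOWN_PCT:
        return "downgrade"  # both must be idle before a smaller tier is safe
    return "hold"
```

Upgrading on either metric but downgrading only when both are low is a deliberately conservative choice; a bot that is light on CPU but heavy on RAM should not be moved to a smaller tier.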
Security & Compliance
- Rotate tokens when staff changes occur. Remove old tokens immediately in the Discord Dev Portal.
- Use least-privilege OAuth scopes and enable privileged intents only when your bot actually needs them.
- Enable MFA on all panel accounts; create sub-users with scoped permissions rather than sharing root credentials.
- Log sensitive actions (bans, payouts, command escalations) and store them in immutable storage for auditability.
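Where truly immutable storage is not available, a hash-chained log is one way to make tampering at least detectable. The sketch below is a swapped-in technique for illustration, not a feature of the panel: each entry's SHA-256 hash covers the previous entry's hash, so editing or reordering earlier entries breaks verification. All function and field names are hypothetical.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, action: str, actor: str, target: str) -> str:
    body = json.dumps([prev_hash, action, actor, target])
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, str]], action: str, actor: str, target: str) -> Dict[str, str]:
    """Append an entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    entry = {
        "action": action,
        "actor": actor,
        "target": target,
        "prev": prev_hash,
        "hash": _digest(prev_hash, action, actor, target),
    }
    log.append(entry)
    return entry


def verify_chain(log: List[Dict[str, str]]) -> bool:
    """Return False if an entry was edited or reordered, or an earlier entry was dropped."""
    prev_hash = GENESIS
    for entry in log:
        expected = _digest(prev_hash, entry["action"], entry["actor"], entry["target"])
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

This gives tamper evidence only. It does not detect truncation of the newest entries and it is not a replacement for write-once storage, so the latest hash should also be recorded somewhere the bot cannot modify.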
Incident Response
- Detect: Watchdog restarts, 5xx errors, or alerting webhooks indicate an incident.
- Stabilize: Scale up, disable problematic features via feature flags, or roll back to the previous deployment.
- Communicate: Post in your community status channel and update status.mambahost.com if needed.
- Postmortem: Capture timeline, root cause, follow-up tasks, and automation improvements.
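The feature-flag option in the Stabilize step can be as simple as a kill-switch list read from an env var. The sketch below assumes a hypothetical `DISABLED_FEATURES` variable holding comma-separated feature names. If the variable is set through the panel, the bot will likely need a restart before it sees the new value.

```python
from typing import Mapping, Set


def disabled_features(env: Mapping[str, str]) -> Set[str]:
    """Parse a comma-separated kill-switch list such as "music,ai-chat"."""
    raw = env.get("DISABLED_FEATURES", "")  # hypothetical variable name
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def is_enabled(feature: str, env: Mapping[str, str]) -> bool:
    """Return True unless the feature appears in the kill-switch list."""
    return feature.lower() not in disabled_features(env)
```

A command handler can then guard itself with `if not is_enabled("music", os.environ): return` and reply with a short maintenance message instead of running the failing code path.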
Integrations with Creator Labs & Game Servers
- Sync downtime alerts across Creator Labs websites, Discord bots, and managed game servers.
- Pipe telemetry from game servers into Discord via webhooks hosted on the same infrastructure for consistent latency.
- Use n8n automations (see `docs/N8N_MARKETING_AUTOMATION.md`) to route leads or support tickets into bots.
Checklist
- Monitor metrics weekly and adjust plan tiers accordingly.
- Review env vars quarterly; remove unused secrets.
- Test backups/restore monthly.
- Keep CI pipelines green before every deploy.
- Maintain a staging container for smoke tests.
- Document incident procedures and share with staff.