When production breaks, your job is to slow things down, gather facts, and bring in the right help. This simple plan keeps you calm, limits damage, and speeds up recovery. Share it with your team.
Step 1: Pause and set status
Tell your team you see the issue. If users are affected, post a short status note. Keep it simple:
We are investigating a production error. Some users may see failures. Next update in 30 minutes.
Promise a time for the next update and stick to it.
Step 2: Capture what users see
Ask support to collect four details:
- The exact time the problem started
- The page or action that fails
- Any error message or code shown
- The user email or account if it helps
These details let engineers match reports to logs.
Step 3: Check simple health signals
Open your host dashboard or uptime tool. Confirm:
- Are there 500 errors or timeouts?
- Are servers or containers restarting?
- Is the database reachable?
Do not change anything yet. You are just confirming scope.
Step 4: Review recent changes
Ask two questions:
- What code shipped in the last 24 hours?
- What infra changes happened in the last week?
If a change lines up with the start of the outage, consider rolling back to the last good release or image.
Step 5: Check Rails and app logs
You do not need to read Ruby code to help. Search for repeating errors near the start time. Common signals:
Exception,Timeout,NoMethodErrorActiveRecord,PG::(Postgres),Redis,Sidekiq
Save a sample error with timestamp. This is the “receipt” an expert will need.
Step 6: Verify key services
Most Rails apps depend on a few services. Check each one is healthy:
- Database: Postgres or MySQL is up and connections are not maxed
- Cache/Queue: Redis and Sidekiq are running and queues are not stuck
- Storage: S3 or similar is reachable
- Vendors: Payments, email, and other APIs show green on their status pages
If a vendor is down, note their incident link in your internal thread.
Step 7: Contact your host
Open a support ticket with a short summary:
- When the issue started
- The error sample from logs
- Any recent deploy or infra change
Ask them to confirm there is no platform incident, disk full issue, network block, or SSL problem. Keep the ticket updated.
Step 8: Reduce blast radius
If one feature is causing crashes, turn it off with a feature flag if you have one. Consider read-only mode to protect data while you investigate. If needed, put up a short maintenance page while you roll back.
Step 9: Call for rails emergency help
If you do not have in-house Rails help, bring in a specialist. Rails Fever handles production emergencies. We work with your host, read your logs, stabilize first, then find root cause. Have your error sample, deploy history, and vendor list ready. This saves time and cost.
Step 10: Do a fast postmortem
Keep it to one page:
- What happened in plain words
- Start and end time
- User and revenue impact
- Trigger and root cause
- What will change so it does not happen again
Share it with the team and close the loop.
Prevention: build your safety net
Crashes will happen. Your goal is to spot them early and make them small.
1) Monitoring and alerts
Use an uptime tool and a metrics dashboard that watches CPU, memory, queue size, error rate, and DB connections. Set alerts that wake a human in minutes.
2) Error tracking
Install an error tracker like Sentry, Honeybadger, or Rollbar. These tools group errors, tag releases, and show which users are hit. They turn “it is broken” into “this line, after this deploy.”
3) Centralized logs
Send Rails, Sidekiq, Nginx, and host logs to one place. Fast search during a fire saves hours.
4) Backups and restore drills
Backups matter only if you can restore them. Test a DB restore on staging every month.
5) Safer deploys
Ship small changes often. Add health checks and automatic rollback. Small changes are easier to unwind than big ones.
6) Capacity checks
Watch database size, disk usage, and connection limits. Many “crashes” are really resource exhaustion.
7) SSL and domain hygiene
Track certificate and domain renewals. Set calendar reminders 30 and 7 days before expiration.
8) Security and patching
Keep Rails and gems current. Do a monthly patch window. Outdated libraries cause errors and risk.
Need help with Rails maintenance? We offer comprehensive Rails Care Plans for ongoing support, technical audits to assess your current state, and Rails upgrades to keep you current. View our pricing plans to find the right fit for your needs.
Schedule a consultation or email hello@railsfever.com to discuss your Rails needs.