Incident #004

Generated by MRX Admin on 29 May 2025 14:02. All timestamps are local to Europe/Budapest.

Key Information

  • Incident Type: Default

  • Severity: Minor

  • No custom fields have been set for this incident.

Timestamps

  • Declared at: 29 May 2025 13:34

  • Identified at: 29 May 2025 13:57

  • Resolved at: 29 May 2025 14:02

  • Incident duration: 28 minutes

Team

  • Incident Lead: MRX Admin

  • Reporter: MRX Admin

  • Active participants: MRX Admin

  • No related incidents have been set for this incident.

Summary

Our website will be unavailable for the next 15 minutes. The NAS server is being relocated shortly.

Incident Timeline

2025-05-29

13:34:13  Incident reported by MRX Admin

  MRX Admin reported the incident. Severity: Major. Status: Investigating.

13:57:38  Status changed to Monitoring and severity downgraded to Minor

  MRX Admin shared an update. Severity: Major → Minor. Status: Investigating → Monitoring. "The system is booting up, relocation complete."

14:02:39  Incident resolved and entered the post-incident flow

  MRX Admin shared an update. Status: Monitoring → Documenting. "Everything went back to normal. System rebooted."

Contributors

Outline any factors that played a role in this incident happening, or it being as bad as it was.

This could be technical (e.g., “the server’s disk filled up”), human (e.g., “Sara missed her first page”), or external (e.g., “this coincided with a marketing email being sent to our customers”).

Cover as many of the factors as you can without over-focusing on one “root” cause.

e.g. A recent deployment led to the authentication service failing to start. This led to users being unable to log in.

Mitigators

Outline any factors that reduced the incident's impact or prevented it from being worse than it was.

This might include external factors (e.g., “it was lucky it happened during work hours”), effective technical controls (e.g., “our alerting caught this quickly” or “our auto-rollback worked as expected”), or having the right person on call.

Highlighting these elements helps identify what's working well and what's worth reinforcing or scaling.

e.g. We recently deployed some alerting changes that meant we were paged within 30 seconds of a failed deployment landing in production.

Learnings and risks

Capture anything we learned as a result of responding to or investigating this incident, and any risks that were revealed or highlighted.

Think about how to improve response next time, and consider any patterns pointing to broader issues, like “key person risk.”

e.g. We don’t have a reliable way to find and surface the right runbooks for incidents like this. There’s a risk the wrong actions are taken or we miss things that would otherwise help to resolve them more quickly.

Follow-ups

View follow-ups in the dashboard.
