On May 15, 2023, between 02:36 and 04:08 UTC, Atlassian customers using Bitbucket, Confluence, Jira Align, Jira Service Management, Jira Software, Jira Work Management, Jira Product Discovery, and Atlas products with services hosted in the us-west-1 region were impacted by an incident affecting the storage and retrieval of data assets, including media, attachments, and build artifacts.
The event was triggered by a network migration of an internal service as part of an initiative to increase security by hardening partitions between network segments. The incident was detected within three minutes by automated monitoring and mitigated by a rollback of the change, which returned Atlassian systems to a known good state. The total time to resolution was about one hour and 32 minutes.
The impact across products was:
Service was disrupted for one hour and 32 minutes, from 02:36 to 04:08 UTC on May 15, 2023, for customers with services hosted in the us-west-1 region.
The issue was caused by an attempted migration of a service to a new network segment. As part of this migration, a DNS record pointing to the old network segment was not updated, which resulted in failures when the old network stack was removed. While we have a number of testing and preventative processes in place, this specific issue wasn’t identified because moving services across network segments is not a regular activity and is difficult to replicate accurately in a test environment. To guard against these types of issues, we made this change using blue/green deployment practices, but we failed to run adequate verification steps before decommissioning the old stack.
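As an illustration of the kind of verification step described above, the following is a minimal sketch of a pre-decommission check that flags DNS records still resolving into the old network segment. It is not Atlassian's actual tooling: the function names, the hostnames, and the 10.1.0.0/16 range are hypothetical and chosen only for the example.

```python
import ipaddress
import socket
from typing import Callable, Dict, Iterable, List


def resolve_ipv4(hostname: str) -> List[str]:
    """Resolve a hostname to its IPv4 addresses using the system resolver."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def records_in_old_segment(
    hostnames: Iterable[str],
    old_cidr: str,
    resolver: Callable[[str], List[str]] = resolve_ipv4,
) -> Dict[str, List[str]]:
    """Return the hostnames (hypothetical inputs) that still resolve into old_cidr.

    An empty result suggests no checked record points at the old segment;
    it does not prove that every dependency was checked.
    """
    old_net = ipaddress.ip_network(old_cidr)
    stale: Dict[str, List[str]] = {}
    for name in hostnames:
        hits = [ip for ip in resolver(name) if ipaddress.ip_address(ip) in old_net]
        if hits:
            stale[name] = hits
    return stale
```

A deployment pipeline could run such a check as a gate and refuse to remove the old stack while the returned mapping is non-empty. The resolver is passed in as a parameter so the check can be exercised against a fixed table of records without real DNS lookups.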
We are prioritizing the following improvement actions to avoid repeating this type of incident:
We apologize to customers whose services were impacted during this incident; we are taking immediate steps to improve the platform’s performance and availability.
Thanks,
Atlassian Customer Support