Microsoft has confirmed that the recent outage that struck a number of its cloud-based services was caused by a DNS DDoS attack. The outage, which lasted for roughly two hours, was triggered by an “anomalous surge” in DNS queries that came from all over the world and targeted a set of Azure-hosted domains.
Late last week, Microsoft’s users were unable to access a whole slew of cloud-based services, such as Xbox Live, Microsoft Office, SharePoint Online, Microsoft Intune, Dynamics 365, Microsoft Teams, Skype, Exchange Online, OneDrive, Yammer, Power BI, Power Apps, OneNote, Microsoft Managed Desktop, and Microsoft Stream.
The company isn’t pointing any fingers. The media, however, are saying that the outage uncovered major flaws in Microsoft’s modus operandi. As per MSPoweruser, “even a concerted DDoS attack” shouldn’t be able to take Azure down, but the company erred when implementing DNS Edge caches.
“Azure DNS servers experienced an anomalous surge in DNS queries from across the globe targeting a set of domains hosted on Azure. Normally, Azure’s layers of caches and traffic shaping would mitigate this surge. In this incident, one specific sequence of events exposed a code defect in our DNS service that reduced the efficiency of our DNS Edge caches,” Microsoft explained.
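To illustrate why a less efficient cache matters, here is a minimal sketch of a time-to-live (TTL) cache sitting in front of a resolver. It is not Azure's design: the `EdgeCache` class, the `resolve` callback, and the 30-second TTL are all hypothetical names and values chosen for the example.

```python
import time


class EdgeCache:
    """Toy TTL cache standing in for a DNS edge cache (hypothetical, not Azure's code)."""

    def __init__(self, resolve, ttl_seconds=30.0, clock=time.monotonic):
        self._resolve = resolve      # function that asks the backend for an answer
        self._ttl = ttl_seconds      # how long an answer stays valid in the cache
        self._clock = clock          # injectable clock, handy for testing
        self._entries = {}           # name -> (answer, expires_at)
        self.backend_calls = 0       # how many queries reached the backend

    def lookup(self, name):
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit: the backend is not touched
        # Cache miss or expired entry: go to the backend and store the answer.
        self.backend_calls += 1
        answer = self._resolve(name)
        self._entries[name] = (answer, now + self._ttl)
        return answer
```

With a working cache, a burst of repeated queries for the same name reaches the backend only once per TTL window. If a defect causes entries to be missed or evicted too often, far more of the surge lands on the backend servers.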
With DNS services overloaded, clients started retrying their requests frequently, which only exacerbated the problem, the company said. Because these retries were legitimate traffic, the volumetric spike mitigation system did not drop them. “This increase in traffic led to decreased availability of our DNS service.”
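A common way for clients to avoid piling onto an overloaded service is to space out their retries with capped exponential backoff and random jitter. The sketch below shows the general technique only. Microsoft has not said how its clients retry, and the `backoff_delays` function and its `base` and `cap` values are assumptions made for illustration.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=random.random):
    """Return wait times in seconds using capped exponential backoff with full jitter."""
    delays = []
    for attempt in range(attempts):
        # The ceiling doubles with each attempt but never exceeds the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random wait between 0 and the ceiling so that
        # many clients do not all retry at the same instant.
        delays.append(rng() * ceiling)
    return delays
```

A client would sleep for each delay in turn before trying again, so retries thin out over time instead of arriving in synchronized waves.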
Fixing the issues
After the mandatory apology for the inconvenience caused, the company said it had fixed the code defect, adding that its DNS caches should no longer struggle to handle traffic spikes.
It also said it will improve how it monitors and mitigates anomalies in traffic, without detailing what it plans on doing at this time.