Cloudflare: the incident that caused DNS resolution issues
Implications of the DNS resolution incident and Cloudflare's response
DNS service provider Cloudflare experienced a DNS resolution failure that disrupted internet access for many users. The failure stemmed from an internal software bug at Cloudflare, not from an external attack. Cloudflare has apologized for the incident and is working to prevent a recurrence.
On the morning of October 4, 2023, DNS service provider Cloudflare experienced a DNS resolution failure that disrupted internet access for many users. The failure lasted from 07:00 to 11:00 UTC and produced error responses (SERVFAIL) for some valid DNS queries sent to the 1.1.1.1 resolver directly or through products such as WARP, Zero Trust, or third-party DNS resolvers that use 1.1.1.1. The incident was caused by an internal Cloudflare software bug, not by an external attack.
Background on the DNS system
To understand the extent of the incident, it is important to understand how the Domain Name System (DNS) works. Each domain exists within a DNS zone, which is a collection of jointly controlled domain and host names. For example, Cloudflare is responsible for the domain cloudflare.com, which is considered part of the "cloudflare.com" zone. Above the various domain zones, there is the root zone, which contains information on how to reach the individual domain zones. The root zone is critical for resolving all other domain names. To ensure the integrity and authenticity of the information contained in the root zone, it is signed with DNSSEC, a digital signature system for DNS.
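The hierarchy described above can be pictured as a chain that starts at the root zone and narrows toward the queried name. The following is a minimal sketch, assuming for simplicity that every label boundary could be a zone boundary (real zones are not necessarily cut at every label); the function name `delegation_chain` is hypothetical.

```python
def delegation_chain(name: str) -> list[str]:
    """Return candidate zone names from the root down to `name`.

    Assumption: a zone cut may exist at every label. In practice,
    several labels can live inside a single zone.
    """
    labels = [label for label in name.rstrip(".").split(".") if label]
    chain = ["."]  # the root zone always comes first
    # Walk from the rightmost label (the TLD) toward the full name.
    for i in range(len(labels) - 1, -1, -1):
        chain.append(".".join(labels[i:]) + ".")
    return chain


if __name__ == "__main__":
    for zone in delegation_chain("www.cloudflare.com"):
        print(zone)
```

Because every chain starts at the root, a resolver that cannot trust its copy of the root zone has trouble validating anything below it.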
The cause of the Cloudflare error
The incident traces back to a planned change in root zone management that introduced a new record type called ZONEMD. A bug in how Cloudflare's DNS resolvers parsed the ZONEMD record caused them to stop using new versions of the root zone. When the DNSSEC signatures in the September 21 root zone release expired on October 4, resolvers still holding that version could no longer validate DNSSEC signatures and began returning error (SERVFAIL) responses to users. The impact was not evenly distributed but was concentrated in some of Cloudflare's largest data centers.
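The failure mode comes down to a time-window check: a DNSSEC signature is only valid between its inception and expiration times, so a resolver stuck on an old zone copy eventually holds only expired signatures. The sketch below illustrates that logic and is not Cloudflare's implementation; the `Signature` type, the `response_code` function, and the exact timestamps are assumptions chosen to match the dates given in the article.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Signature:
    """Simplified stand-in for a DNSSEC signature's validity window."""
    inception: datetime
    expiration: datetime


def response_code(sig: Signature, now: datetime) -> str:
    """Return NOERROR while the signature is valid, SERVFAIL otherwise."""
    if sig.inception <= now <= sig.expiration:
        return "NOERROR"
    return "SERVFAIL"


# Assumed timestamps: the article gives only the dates (September 21 and
# October 4) and the 07:00 UTC start of the incident.
stale_root_sig = Signature(
    inception=datetime(2023, 9, 21, 0, 0, tzinfo=timezone.utc),
    expiration=datetime(2023, 10, 4, 7, 0, tzinfo=timezone.utc),
)

if __name__ == "__main__":
    before = datetime(2023, 10, 3, 12, 0, tzinfo=timezone.utc)
    during = datetime(2023, 10, 4, 8, 0, tzinfo=timezone.utc)
    print(response_code(stale_root_sig, before))
    print(response_code(stale_root_sig, during))
```

A resolver that had kept loading fresh root zone versions would have received newer signatures with later expiration times and never reached the failing branch.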
Prevention and improvement measures
Cloudflare says it has taken the incident very seriously and is already working to prevent a recurrence. The measures include better visibility into the state of the root zone, safer internal distribution of the root zone, improved testing, and a more resilient architecture that falls back to stale copies of the root zone only for a limited period of time. The company's stated goal is maximum availability of its services, so that users are not affected by similar errors again. Finally, the company apologized for the incident and stressed that it takes the trust of its customers and end users very seriously.
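The idea of using a stale root zone copy only for a limited time can be sketched as a bounded fallback. This is an illustration under stated assumptions, not Cloudflare's architecture: the function `choose_zone_copy`, the string stand-ins for zone data, and the one-day `MAX_STALE` limit are all hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical limit; the article does not say how long a stale copy may be used.
MAX_STALE = timedelta(days=1)


def choose_zone_copy(
    fresh: Optional[str],
    stale: Optional[str],
    stale_age: timedelta,
    max_stale: timedelta = MAX_STALE,
) -> Optional[str]:
    """Prefer the fresh copy; use the stale one only while it is young enough.

    Returns None when no usable copy exists, so the caller can raise an
    alert or resolve by another path instead of silently serving old data.
    """
    if fresh is not None:
        return fresh
    if stale is not None and stale_age <= max_stale:
        return stale
    return None


if __name__ == "__main__":
    print(choose_zone_copy(None, "root-zone-sep-21", timedelta(hours=6)))
    print(choose_zone_copy(None, "root-zone-sep-21", timedelta(days=13)))
```

Returning nothing once the limit is exceeded makes the staleness visible early, rather than letting it surface only when signatures expire.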