Cloudflare bug caused massive memory leak, gets named ‘Cloudbleed’
The internet was taken by storm following the discovery of the Cloudbleed bug this February. A large amount of data from HTTP requests to websites using Cloudflare’s content delivery network was found to have leaked. This data included passwords and form information entered by users on Cloudflare-enabled websites.
The Cloudbleed bug is reminiscent of the Heartbleed bug, which allowed passwords and other information transferred over SSL to be stolen because of a serious flaw in the OpenSSL library. Heartbleed compromised the secret keys used to identify service providers and encrypt traffic, the names and passwords of users, and the actual content.
Cloudbleed was first noticed by Tavis Ormandy of Google’s Project Zero, a team that investigates bugs in programs, drivers and internet services to find zero-day exploits. He reported that HTTP requests were returning corrupted web pages, and the matter was brought to the attention of the Cloudflare team.
It turned out that a bug in the software on Cloudflare’s edge servers allowed it to miss the end of a buffer, the point that normally signals where the data stops, and run past it. As a result, responses were coming back with private information from memory, such as POST data, cookies and HTTP headers containing sensitive information.
The issue got worse when it became clear that these responses had been cached by search engines such as Google and Bing. Let’s go through the issue now.
How Cloudflare works:
Cloudflare is basically a CDN (content delivery network) that caches a website and sends it to servers (called edge servers) located in different parts of the world. This speeds up loading because visitors can fetch the website from their nearest edge server, which shortens the distance each HTTP request has to travel and reduces load time.
To serve a web page this way, the elements on the page have to be parsed, because Cloudflare modifies the HTML as it passes through the edge servers. These modifications include rewriting HTTP links to HTTPS, inserting Google Analytics tags, and adding code that functions similarly to a robots.txt configuration.
Cloudflare’s HTML parser was originally written using Ragel. Last year, because the developers at Cloudflare found it too complicated to maintain, they began replacing it with a new parser named CF_HTML. CF_HTML is reported to handle HTML5 web pages better and to be faster than the old one.
The new parser was first tried and tested with the Automatic HTTPS Rewriting feature, which fixes mixed content errors on HTTPS sites. (We suspect the Cloudbleed bug was behind some random mixed content errors on our own website, which at the time prompted us to disable HTTPS.)
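To give a feel for what an HTTPS rewriting step does, here is a rough C sketch. It is not Cloudflare's implementation: the function name rewrite_http_to_https is hypothetical, and blindly replacing every "http://" is a simplifying assumption (a real rewriter would parse the HTML and only touch URLs it knows can be served over HTTPS). Note that the output bounds are checked with an inequality before every write.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy `in` to `out`, replacing every "http://"
 * with "https://". Returns 0 on success, or -1 if `out` (capacity `cap`
 * bytes, including the terminating NUL) is too small. */
int rewrite_http_to_https(const char *in, char *out, size_t cap)
{
    static const char from[] = "http://";
    static const char to[] = "https://";
    const size_t from_len = sizeof from - 1;
    const size_t to_len = sizeof to - 1;
    size_t o = 0; /* next write position in out */

    assert(in != NULL && out != NULL);
    if (cap == 0)
        return -1;

    while (*in != '\0') {
        if (strncmp(in, from, from_len) == 0) {
            /* >= rather than ==, so room for the NUL is always kept */
            if (o + to_len >= cap)
                return -1;
            memcpy(out + o, to, to_len);
            o += to_len;
            in += from_len;
        } else {
            if (o + 1 >= cap)
                return -1;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';
    return 0;
}
```

Calling it on a tag such as <img src="http://example.com/a.png"> with a large enough output buffer yields the same tag with an https:// URL. With a buffer that is too small, it returns -1 instead of writing past the end.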
Cloudflare’s other features then began slowly migrating from the Ragel parser to the new one. However, running the CF_HTML parser alongside the Ragel parser is what opened up the vulnerability. Surprisingly, the underlying bug was present even before CF_HTML was introduced, but no memory was leaked back then, possibly because of how the internal NGINX buffers were being used.
According to Cloudflare’s blog post, the C code generated from the Ragel source contained the following logic error:
/* generated code */
if ( ++p == pe )
    goto _test_eof;
“The root cause of the bug was that reaching the end of a buffer was checked using the equality operator and a pointer was able to step past the end of the buffer.
This is known as a buffer overrun. Had the check been done using >= instead of == jumping over the buffer end would have been caught.”
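To make the quoted explanation concrete, here is a minimal C sketch. It is not Cloudflare's code: the function name overread_bytes, the arena array standing in for neighbouring memory, and the step parameter (modelling any parser action that advances the cursor by more than one position) are all assumptions for illustration. Indices are used instead of raw pointers so that the demonstration itself never leaves its own array.

```c
#include <assert.h>
#include <stddef.h>

/* arena[0..len) plays the role of "our" buffer; arena[len..arena_size)
 * stands in for unrelated memory that happens to sit after it.
 * The cursor p advances by `step` each iteration. If use_ge is 0 the end
 * check is `p == len` (as in the generated code); otherwise it is
 * `p >= len`. Returns how many bytes outside the buffer were read. */
size_t overread_bytes(const char *arena, size_t arena_size,
                      size_t len, size_t step, int use_ge)
{
    size_t p = 0;      /* cursor, like `p` in the generated code */
    size_t leaked = 0; /* reads that landed beyond the buffer end */

    assert(arena != NULL && step > 0 && len <= arena_size);
    for (;;) {
        p += step;
        int at_end = use_ge ? (p >= len) : (p == len);
        /* The arena_size test exists only to keep this demo in bounds. */
        if (at_end || p >= arena_size)
            break;
        if (p >= len)
            leaked++;  /* this read is outside "our" buffer */
        (void)arena[p];
    }
    return leaked;
}
```

With a step of 1 the cursor always lands exactly on the end, so both checks stop in time. As soon as a step jumps over the end (for example a buffer of 8 bytes walked 3 at a time), the == version keeps reading bytes that belong to something else, while the >= version stops without reading any of them.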
What happened following this incident:
A private key used for internal communication between Cloudflare edge servers, which kept server-to-server traffic encrypted, was also leaked. Three Cloudflare features (email obfuscation, Server-Side Excludes and HTTPS rewriting) were disabled globally to prevent any further memory leakage.
The leaked information included HTTP headers, POST data (which may have contained passwords and form information), JSON API calls, URI parameters, cookies and OAuth tokens used for social login services.
Worse, the leaked memory had been cached by search engines while they crawled web pages. With the help of Google and Bing, the cached copies of the affected pages were purged. Around 770 URIs containing sensitive data were found cached in search results and were promptly removed.
After investigating the flow of traffic through edge servers after the fix was implemented, the three disabled features were re-enabled.