Issue Summary


Root Cause



  • Add safeguards to disable features not yet in service.
  • Harden the GFE testing stack to reduce the risk of a latent bug in production binaries causing a task to restart.
  • Pursue additional isolation between different shards of GFE pools to reduce the scope of failures.
  • Create a consolidated dashboard of all configuration changes for GFE pools, so that engineers can more quickly observe, correlate, and identify problematic changes to the system.
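
As an illustration of the last item, the core of a consolidated view is merging per-pool change logs into one timeline and narrowing it to the period just before an incident. The sketch below is a minimal example under stated assumptions, not the actual tooling: the ConfigChange record, the changes_before function, and the pool names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ConfigChange:
    """Hypothetical record of one configuration change to a pool."""
    pool: str
    timestamp: datetime
    description: str


def changes_before(changes_by_pool, incident_start, window):
    """Merge change logs from all pools and return those that landed
    within `window` before `incident_start`, newest first."""
    merged = [c for changes in changes_by_pool.values() for c in changes]
    earliest = incident_start - window
    recent = [c for c in merged if earliest <= c.timestamp <= incident_start]
    return sorted(recent, key=lambda c: c.timestamp, reverse=True)


if __name__ == "__main__":
    # Made-up data for illustration only.
    incident = datetime(2020, 1, 1, 12, 0)
    logs = {
        "pool-a": [ConfigChange("pool-a", datetime(2020, 1, 1, 11, 50), "enable feature X")],
        "pool-b": [ConfigChange("pool-b", datetime(2020, 1, 1, 9, 0), "rotate certificates")],
    }
    for change in changes_before(logs, incident, timedelta(hours=1)):
        print(change.timestamp.isoformat(), change.pool, change.description)
```

Sorting newest first puts the changes closest to the incident at the top, which is usually where correlation starts.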


