tech:

taffy

Google Explains Outage, Outlines Steps To Prevent Similar Fails

gmail_logo

Google services including Gmail, Google+, Calendar, and Documents were down for about 25 minutes on Friday. For some users, the outage lasted for over half an hour.

Google says the company is now correcting the bug that caused the outage, as well as putting more checks and monitors in place to ensure that this kind of problem doesn’t happen again.

Ben Treynor, vice president of Engineering with Google, explains what happened in a blog post.

At 10:55 a.m., an internal system that generates configurations—essentially, information that tells other systems how to behave—encountered a software bug and generated an incorrect configuration. The incorrect configuration was sent to live services over the next 15 minutes, caused users’ requests for their data to be ignored, and those services, in turn, generated errors.

Users began seeing these errors on affected services at 11:02 a.m., and at that time our internal monitoring alerted Google’s Site Reliability Team. Engineers were still debugging 12 minutes later when the same system, having automatically cleared the original error, generated a new correct configuration at 11:14 a.m. and began sending it; errors subsided rapidly starting at this time. By 11:30 a.m. the correct configuration was live everywhere and almost all users’ service was restored.

Steps Google is taking to prevent similar outages:

1. Correcting the bug in the configuration generator to prevent recurrence, and auditing all other critical configuration generation systems to ensure they do not contain a similar bug.

2. Adding additional input validation checks for configurations, so that a bad configuration generated in the future will not result in service disruption.

3. Adding additional targeted monitoring to more quickly detect and diagnose the cause of service failure.

[Image courtesy: Google]

Just in

Apple sued in a landmark iPhone monopoly lawsuit — CNN

The US Justice Department and more than a dozen states filed a blockbuster antitrust lawsuit against Apple on Thursday, accusing the giant company of illegally monopolizing the smartphone market, writes Brian Fung, Hannah Rabinowitz and Evan Perez.

Google is bringing satellite messaging to Android 15 — The Verge

Google’s second developer preview for Android 15 has arrived, bringing long-awaited support for satellite connectivity alongside several improvements to contactless payments, multi-language recognition, volume consistency, and interaction with PDFs via apps, writes Jess Weatherbed. 

Reddit CEO Steve Huffman is paid more than the heads of Meta, Pinterest, and Snap — combined — QZ

Reddit co-founder and CEO Steve Huffman has been blasted by Redditors and in media reports over his recently-revealed, super-sized pay package of $193 million in 2023, writes Laura Bratton. 

British AI pioneer Mustafa Suleyman joins Microsoft — BBC

Microsoft has announced British Artificial Intelligence pioneer Mustafa Suleyman will lead its newly-formed division, Microsoft AI, according to the BBC report. 

UnitedHealth Group has paid more than $2 billion to providers following cyberattack — CNBC

UnitedHealth Group said Monday that it’s paid out more than $2 billion to help health-care providers who have been affected by the cyberattack on subsidiary Change Healthcare, writes Ashley Capoot.