Over the last few days, Microsoft's Office 365 (O365) cloud services have suffered major outages, sending partners and email admins scrambling for information. How can an organization tell when its most critical services are down? Does it have to wait for the help desk calls to come in?
Communication services are critical to the functioning of an organization. As we like to say around here, an organization is a set of people and their conversations. O365 and Lync services are part of what gives an organization 21st-century efficiency. Losing O365 service is like sending the organization back to the Stone Age. Many white-collar organizations, such as government agencies, simply stop functioning when email is down, and when these outages drag on, the impact is significant.
In tough times like the ones we witnessed this week, what we really care about is precise and rapid information. Microsoft is not up to par on that front. It took hours for public information to appear on Twitter. By that point, IT departments and partners were already struggling to figure things out, and help desk phone lights were blinking. The problem is that Microsoft requires email admins to log in to a portal to get information about outages. This runs contrary to public expectations: Google and Amazon both publish that information on public status pages. When I had a power outage a few weeks ago, I checked my power company's website on my mobile phone. It clearly stated that 49 houses were without power, that a transformer had blown, and that I should expect power back within 2 hours. No login to my account was required. If my power company can do this while dealing with hardware, truck rolls and all the complications of physical repairs, IT companies should be able to do the same if they expect to deliver dial-tone-level service.
Having the information up front allows for better communication and planning. Notifying the help desk and providing estimated restoration times reduces help desk calls. Word of an outage travels very fast by mouth anyway, since the whole organization has nothing better to do than stand around the coffee machine. Early information also lets IT staff be proactive and put workarounds in place where necessary. They look like heroes working around the inevitable instead of being caught with their pants down.
Netmail Monitor provides this information so that email admins can be proactive. The mail flow monitor gives you direct information on whether email is being sent and received. The direct connection into Exchange and O365 gives you uptime monitoring, so you learn immediately whether Office 365 is up. Once the problem has been resolved, you can also see how long the systems were down. This way you can hold Microsoft to its uptime commitments. A complete solution like Netmail, which covers not only monitoring but also compliance, security and long-term storage, makes O365 and Exchange 2013 shine.
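The following minimal Python sketch shows how downtime figures can be derived from monitoring data after an outage. It is not how Netmail Monitor works internally. It assumes a hypothetical list of timestamped probe results, for example a test message sent through O365 every few minutes and marked as delivered or not. The function names `outage_windows` and `uptime_percent` are our own. Each outage is measured from the first failed probe to the next successful one, so accuracy is limited by the probe interval.

```python
from datetime import datetime, timedelta


def outage_windows(samples):
    """Group consecutive failed probes into (start, end) outage windows.

    `samples` is a hypothetical list of (timestamp, is_up) tuples sorted by
    timestamp. An outage starts at the first failed probe and ends at the
    next successful probe.
    """
    windows = []
    start = None
    for ts, is_up in samples:
        if not is_up and start is None:
            start = ts  # first failure after a healthy period
        elif is_up and start is not None:
            windows.append((start, ts))  # service came back
            start = None
    if start is not None:
        # Still down at the last sample: count downtime up to that sample.
        windows.append((start, samples[-1][0]))
    return windows


def uptime_percent(samples):
    """Percentage of the observed period during which the service was up."""
    total = samples[-1][0] - samples[0][0]
    down = sum((end - start for start, end in outage_windows(samples)), timedelta())
    return 100.0 * (1 - down / total)


if __name__ == "__main__":
    t0 = datetime(2014, 6, 24, 9, 0)
    # One probe every 5 minutes for an hour; probes at 9:05 and 9:10 failed.
    probes = [(t0 + timedelta(minutes=m), m not in (5, 10)) for m in range(0, 65, 5)]
    for start, end in outage_windows(probes):
        print(f"down from {start:%H:%M} to {end:%H:%M} ({end - start})")
    print(f"uptime: {uptime_percent(probes):.2f}%")
```

In this example, the two failed probes produce a single 10-minute outage window in a 60-minute observation period. Comparing a figure like this against a provider's published uptime commitment is the kind of check the paragraph above describes.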
This recent outage certainly raises questions about the reliability of very large systems and about how catastrophic events propagate through them. More importantly, it raises questions about our own processes for detecting and reacting to downtime when it strikes. Monitoring systems that deliver this information before the help desk phones start ringing let us react better to such events.