The other day I met with a customer moving to O365. They really want to get rid of the email problem. I mean really get rid of the problem. Once and for all. No emails on premises anymore. Put everything in the cloud. Really everything, and when I say everything, I mean 10TB of everything. Yep, 10TB. They need to keep everything for legal retention. Their current email system (Exchange?) holds 5TB of email from the last 20 years, and they have another 5TB of personal archives that reside on computers across the network.
They want to move all of the data to the cloud. None of the Microsoft service plans actually lets them do that, but they figured out a trick to do it... I’ll tell you about the trick at the end of the post, but before that, I wanted to give you my thoughts about their idea. There are a few problems with it.
First, 10TB of single-instanced data is a lot of data, even with serious bandwidth like 100Mbit/s to the internet. If it were possible to use that bandwidth at 100% efficiency, it would still take a little over 9 days of continuous data transfer (10TB is 80 million megabits, or 800,000 seconds at 100Mbit/s), and that is the best case. What about the other applications that need to run over these pipes and should be given priority? I also wonder whether there are limitations on the path between the customer and O365. What can the routers and the firewalls sustain in terms of bandwidth? What is the overhead of the web access APIs?
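The link arithmetic is easy to check with a few lines of Python. This is only a sketch: it assumes decimal units (1TB = 10^12 bytes, 1Mbit = 10^6 bits), and the `efficiency` values are hypothetical illustrations of sharing the pipe with other traffic, not measurements.

```python
TB = 10**12    # bytes, decimal units (assumption)
MBIT = 10**6   # bits per second, decimal units (assumption)

def transfer_days(data_bytes: float, link_bits_per_s: float, efficiency: float = 1.0) -> float:
    """Days of continuous transfer at the given fraction of the link."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_s * efficiency)
    return seconds / 86_400  # seconds per day

# Hypothetical shares of a 100Mbit/s link left over for the migration.
for eff in (1.0, 0.5, 0.25):
    days = transfer_days(10 * TB, 100 * MBIT, eff)
    print(f"{eff:.0%} of the link: {days:.1f} days")
```

Halving the share of the link doubles the duration, so any priority given to other applications translates directly into calendar time.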
But my biggest concern is the Microsoft O365 throttling. Since Wave 15, Microsoft has really tightened the screws. Rightfully so: O365 is a shared infrastructure and they don’t want one customer’s application dragging down the whole system. Here are some examples that may impact migrations:
- You are limited to a certain number of concurrent connections per user (EWSMaxConcurrency). This limits the number of threads you can run pushing data.
- Searches time out when they run too long, and the system only returns a maximum number of results (EWSFastSearchTimeoutInSeconds and EWSFindCountLimit). We care about this when we validate injections: we do not want missing or duplicate items.
- How much data you can push into a mailbox per admin account is throttled as you approach a 3GB limit.
In the past, you could find out what your limits were through PowerShell using Get-ThrottlingPolicy, but this is no longer possible.
We will probably be able to get Microsoft to relax some of these policies for this migration. And we have our own tricks in the software, so you can scale out the number of servers. In our experience, we are very happy when we get about 1GB per hour per server. For 10TB, that works out to roughly 400 server-days of work. We will find out in the next few weeks what we can really get in a high-performance environment. I’ll let you know. In the meantime, the question is: what else could we do if we really want to get rid of email and put it in the cloud? Could we not leave a low-cost system behind to store some of that data?
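To put the 1GB per hour per server figure in perspective, here is a rough sizing sketch. It assumes decimal units and, more importantly, that throughput scales linearly with the number of servers, which throttling may well prevent. The function names and the deadlines are hypothetical.

```python
import math

GB = 10**9   # bytes, decimal units (assumption)
TB = 10**12

def server_days(data_bytes: float, gb_per_hour_per_server: float = 1.0) -> float:
    """Total work, in days, if a single server did everything."""
    hours = data_bytes / (gb_per_hour_per_server * GB)
    return hours / 24

def servers_needed(data_bytes: float, deadline_days: float,
                   gb_per_hour_per_server: float = 1.0) -> int:
    """Servers required to finish by the deadline, assuming linear scaling."""
    return math.ceil(server_days(data_bytes, gb_per_hour_per_server) / deadline_days)

print(f"{server_days(10 * TB):.0f} server-days for 10TB")
for deadline in (30, 90):  # hypothetical deadlines, in days
    print(f"{deadline}-day deadline: {servers_needed(10 * TB, deadline)} servers")
```

The same function shows what the alternatives buy: feed it 1TB instead of 10TB and the work drops to about 42 server-days.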
We could set up a sort of museum for old email. A simple email archiving system would do the job. We would want it to store the data in a neutral format that does not require maintenance (i.e. not a database); XML would do the trick. And we would want simple access over a web application (i.e. no desktop software to deploy). Finally, we would want low-cost storage that is self-healing and low maintenance, like an object store (i.e. not a RAID file system you need to back up). Then we would pick an arbitrary cutoff date that is far enough in the past, maybe 1 or 2 years. We would move the most recent one or two years of email to O365 and keep the rest on-site in the archiving solution. Voilà! The risk of the migration is greatly reduced, the man-hour costs are also reduced, and the data is kept at a low cost. The key here is low cost. The local archiving system has to be simple enough not to be a burden.
There is another potential solution to this problem. The 10TB is mostly email attachments, at least 80% if not 95% of it. Maybe we could keep the attachments local and only push the email itself into O365. This would take the 10TB down to a manageable 1TB or so. The attachments would be accessible through an HTTP link and stored in object storage as described above. This would be easy for the end users and would allow us to do a low-cost, high-velocity migration (what we call a weekend cut-over), which is the least disruptive kind for the organization. (That will be worth a separate blog entry.) The advantage of removing attachments is clearly simplicity. At the end of the process, the only tool left behind is the object store. That is as simple as it gets.
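To make the attachment idea concrete, here is a minimal sketch using Python's standard `email` package. It is an illustration under stated assumptions, not how any particular migration product does it: a plain `dict` stands in for the object store, `BASE_URL` is a hypothetical address, keying attachments by SHA-256 is my own choice, and only the plain-text body is carried over.

```python
import hashlib
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser

BASE_URL = "https://archive.example.invalid/attachments"  # hypothetical on-site store

def strip_attachments(raw: bytes, store: dict) -> EmailMessage:
    """Return a slim copy of the message whose attachments are replaced by links."""
    original = BytesParser(policy=policy.default).parsebytes(raw)

    links = []
    for part in original.iter_attachments():
        data = part.get_payload(decode=True) or b""
        key = hashlib.sha256(data).hexdigest()  # same content, same key
        store[key] = data                       # stand-in for an object-store PUT
        links.append(f"{part.get_filename() or 'unnamed'}: {BASE_URL}/{key}")

    # Simplification: keep only the plain-text body.
    body = original.get_body(preferencelist=("plain",))
    text = body.get_content() if body is not None else ""
    if links:
        text = text.rstrip("\n") + "\n\nAttachments kept on-site:\n" + "\n".join(links) + "\n"

    slim = EmailMessage()
    for header in ("From", "To", "Subject", "Date", "Message-ID"):
        if original[header] is not None:
            slim[header] = original[header]
    slim.set_content(text)
    return slim
```

Only the slim message would be pushed to O365. Because the key is derived from the content, an attachment sent to many people is stored once in this sketch.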
The general idea is that there is little value in working very hard to move all that data to the cloud. Users barely access any emails that are older than 1 year. People still need access to these emails… but only once in a blue moon. By providing local storage for these emails at very low cost, and allowing end users to access them, we can significantly reduce the cost, duration and risk of the migration.
The customer mentioned in this example has a hard deadline for their migration. I have a feeling that somewhere along the way, we will get to use one of the above techniques to meet the timelines. There is no question about whether the migration will be successful; it will. But time will tell which choice is made… a longer migration or keeping some data on-site.
Now that you have read to the end, here is how you can keep all that data in O365 and get around the mailbox limitations: use legal hold on E3 licenses. Microsoft does not set limits on the size of mailboxes that are put on legal hold… for now.