I considered for a moment whether to ascribe gender roles to the enterprise tech involved in the spat. I decided it might make this short story more entertaining. And let’s face it, people just love anthropomorphizing their technology. Forgive me for the clichéd gender stereotypes.
Meet the couple
The Barracuda stands guard and provides security so we’ll make him the dude (Arthur). The Exchange server is far more complex and does the bulk of the work. Definitely the chick (Anna). Anna and Arthur have been together for a little over a year. The relationship has been going well. But they work together, so that can be trouble.
The fight
It’s hard to know how this particular fight started. Anna and Arthur would tell completely different stories. Like most tiffs between young lovers, it probably started over something silly. Anna forgot to lock the door in the morning or Arthur drunk-posted something stupid to Facebook. I was made aware of the trouble by my boss. He told me people were reporting that they were not receiving emails they’d been expecting. As the tech relationship counselor of the office, I sprang into action.
My preconceptions were incorrect
I immediately suspected Arthur was to blame. He’d been acting erratically of late. Translation: We have a 2-node Barracuda cluster and the first node has been flaky. It’s getting old and needs to be replaced. I have the replacement sitting in my office, but I need to plan a trip to the datacenter to install it. Given this knowledge, I accessed the management interface of Barracuda 1 (B1) and right away saw something disturbing on the Status page. The Energize Updates and Instant Replacement subscriptions each showed a red error code with a message to contact Barracuda support. Uh oh! B1 also had ~500 messages queued up for delivery to our Exchange server. I rebooted B1. While it was rebooting I accessed B2. It had no errors but it DID have ~500 messages queued up just like B1. So the errors on B1 were a red herring! Yes, B1 is screwed up in general. But Arthur may not be to blame after all.
Back pressure
My boss had mentioned something about “insufficient resources” messages in the Barracuda logs. Indeed, messages were being rejected by our Exchange CAS array with an SMTP code indicating this. I checked our two Exchange CAS / Hub Transport servers, starting with Event Viewer on CAS1. All clean. CAS2 was a different story. I found two events which indicated Exchange was refusing to accept incoming SMTP messages. This was triggered by a feature called Back Pressure. Exchange 2010 tracks a bunch of system metrics to determine whether or not it is at risk of serious impairment. When it detects a dangerous state it backs off on its processing load. In this case CAS2 decided it was getting too low on disk space on the drive containing its log files. Never mind there were a few GB free. Exchange feels that’s not enough. So it stopped accepting incoming SMTP messages. The services keep running. It just writes two events to the Application log and sits there silently. Anna decided to give Arthur the cold shoulder. Mail was still being delivered by CAS1, but any mail which happened to hit CAS2 was rejected, and the Barracudas queued it up for a later redelivery attempt. Since these are virtual machines I simply expanded the disk and restarted the Exchange Transport service. CAS2 resumed service.
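For anyone who prefers code to couples therapy, here is a toy Python model of what the Barracudas were seeing. It is only a sketch: the `TransportServer` class, the round-robin delivery, and the 4 GB threshold are all made up for illustration and are not Exchange's actual logic or numbers.

```python
from dataclasses import dataclass


@dataclass
class TransportServer:
    name: str
    free_mb: int
    min_free_mb: int = 4096  # hypothetical threshold, NOT Exchange's real value

    def accepts_mail(self) -> bool:
        # Under back pressure the service stays up but refuses new SMTP mail.
        return self.free_mb >= self.min_free_mb


def deliver(messages, servers):
    """Spread messages round-robin; return (delivered, queued_for_retry)."""
    delivered, queued = [], []
    for i, message in enumerate(messages):
        server = servers[i % len(servers)]
        if server.accepts_mail():
            delivered.append(message)
        else:
            queued.append(message)  # the sender keeps it and retries later
    return delivered, queued


cas1 = TransportServer("CAS1", free_mb=40_000)
cas2 = TransportServer("CAS2", free_mb=3_000)  # "a few GB free"
messages = [f"msg-{n}" for n in range(10)]

delivered, queued = deliver(messages, [cas1, cas2])
print(len(delivered), "delivered,", len(queued), "queued")

cas2.free_mb = 60_000  # expand the virtual disk
delivered, queued = deliver(messages, [cas1, cas2])
print(len(delivered), "delivered,", len(queued), "queued")
```

With these invented numbers, every message that lands on CAS2 queues up until the disk is expanded, which matches the pattern of both Barracudas holding mail while CAS1 kept working.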
Not ready to make up
I checked the queues on the Barracudas. They were starting to go down. Then I watched them bump back up. Wth? Back to Event Viewer on the CAS boxes. Lo and behold, both were dropping connection attempts from both Barracudas. The reason: the Barracudas were trying to establish more simultaneous connections than the Receive Connectors would allow. Argh! I didn’t find any obvious way to limit the SMTP sessions on the Barracudas. So I increased the maximum number of sessions allowed on the CAS servers. The default is 20. I changed it to 50, which seemed like a reasonable number to me. This got the couple communicating again.
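The connection limit is easy to picture as a counter per sending host. The Python below is a rough sketch under that assumption; `ReceiveConnector` here is a hypothetical class of my own, not an Exchange API, and the 30 simultaneous attempts are an invented figure. On the real servers the change was a connector setting, not code.

```python
from collections import Counter


class ReceiveConnector:
    """Toy model of a listener that caps simultaneous sessions per source."""

    def __init__(self, max_per_source: int):
        self.max_per_source = max_per_source
        self.open_sessions = Counter()

    def connect(self, source: str) -> bool:
        # Refuse the session once this source is already at its cap.
        if self.open_sessions[source] >= self.max_per_source:
            return False
        self.open_sessions[source] += 1
        return True

    def disconnect(self, source: str) -> None:
        if self.open_sessions[source] > 0:
            self.open_sessions[source] -= 1


def count_accepted(connector: ReceiveConnector, source: str, attempts: int) -> int:
    """Open `attempts` sessions at once and count how many get in."""
    return sum(connector.connect(source) for _ in range(attempts))


# A Barracuda flushing its backlog with 30 sessions at once (invented figure).
print(count_accepted(ReceiveConnector(20), "B1", 30))
print(count_accepted(ReceiveConnector(50), "B1", 30))
```

A sender that opens more sessions than the cap sees the extras dropped and re-queues that mail, which would explain queues that shrink and then bump back up.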
Lessons learned
Keep plenty of free space on Exchange drives containing DBs or log files. Or tweak the back pressure disk space thresholds as described at the bottom of the page here. It involves some simple edits to the EdgeTransport.exe.config file. Microsoft doesn’t recommend it. But I don’t trust them anyway (see my previous post involving Network Load Balancing).
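Since Exchange goes quiet instead of complaining, a scheduled free-space check could catch this before the mail stops. Here is a minimal Python sketch using only the standard library. The path and the 10 GB floor are placeholders I picked for illustration; they are not the thresholds Exchange uses, so choose a floor comfortably above whatever your servers actually need.

```python
import shutil


def free_space_report(path: str, min_free_gb: float) -> dict:
    """Report free space for the drive holding `path` against a chosen floor."""
    usage = shutil.disk_usage(path)
    free_gb = usage.free / 1024 ** 3
    return {
        "path": path,
        "free_gb": round(free_gb, 1),
        "total_gb": round(usage.total / 1024 ** 3, 1),
        "ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    # "." is a placeholder; point this at the drive holding the transport logs.
    report = free_space_report(".", min_free_gb=10)
    status = "OK" if report["ok"] else "LOW DISK SPACE"
    print(f'{status}: {report["free_gb"]} GB free of {report["total_gb"]} GB on {report["path"]}')
```

Run from a scheduled task, the same function could send an alert instead of printing.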
I’ll very soon be replacing Barracuda 1, which means Anna will be getting a new boyfriend. I really hope he’s not a jerk. But at least in this relationship there’s always Instant Replacement!