KEN'S KORNER - Data Center Environmental Monitoring
Selecting a Monitoring System
Monitoring can range from the electrical power equipment that supports the critical load, such as UPS systems, diesel generators, and PDUs/RPPs, down to individual branch circuits within power panels.
When evaluating a monitoring system, consider what types of graphics you want and which are available. Pictorial diagrams that display individual monitored components are helpful, especially ones that change color or light up in the event of an alarm or system failure.
Will your monitoring system need to alert staff only within the building, or will it need to send messages to remote workstations or smartphones?
Be sure you purchase a monitoring system that can accept all the alarms from all your critical systems. A scalable monitoring system is especially helpful as a data center grows and new equipment is brought online. Your choice of monitoring system also determines your replacement parts needs: are parts easily obtainable, or will you need to keep spares onsite for quick replacement in the event of a component failure?
A monitoring system that time stamps readings such as temperatures, cabinet power, and humidity can help you troubleshoot problems. Time-stamped readings show what happened and when, which eliminates a lot of finger pointing and helps fix problems much faster.
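To illustrate why time stamps help, here is a minimal Python sketch. It assumes each reading is stored with a timestamp and a site-defined acceptable range; the `Reading` record, point names, and limits are all hypothetical. It picks out the earliest out-of-range reading, so you find the first system that drifted rather than the one whose alarm happened to be noticed first.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass
class Reading:
    """One time-stamped reading (hypothetical record layout)."""
    timestamp: datetime
    point: str   # e.g. "CRAC-2 supply temp (F)"; names are illustrative
    value: float
    low: float   # acceptable range for this point; site-specific
    high: float

    def out_of_range(self) -> bool:
        return not (self.low <= self.value <= self.high)


def first_excursion(readings: Iterable[Reading]) -> Optional[Reading]:
    """Return the earliest out-of-range reading, or None if all are in range."""
    excursions = [r for r in readings if r.out_of_range()]
    return min(excursions, key=lambda r: r.timestamp, default=None)


readings = [
    Reading(datetime(2024, 1, 1, 10, 5), "Cabinet A3 power (kW)", 9.5, 0.0, 8.0),
    Reading(datetime(2024, 1, 1, 10, 0), "CRAC-2 supply temp (F)", 72.0, 55.0, 70.0),
    Reading(datetime(2024, 1, 1, 10, 10), "Room RH (%)", 45.0, 40.0, 60.0),
]
print(first_excursion(readings).point)  # CRAC-2 supply temp (F)
```

In this made-up example the cooling unit went out of range five minutes before the cabinet power reading did, which points the investigation at cooling first.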
Select a reputable company that has a long track record of providing successful monitoring solutions. It is important that they be around for future service, parts and support. Select a vendor that can provide quick response in the event of a system problem, and is willing to provide onsite training on the system.
Water Detection
Water leaking from chillers and cooling systems can damage electronic equipment. The data center should have drains under the floor in the event of water leaks, and these drains should have liquid detection sensors that report to a central monitoring station.
Place water sensors inside and under cooling equipment, including the condensation trays, and near pipes or other potential sources of leaks. Sensors under the floor belong at the lowest points, near any drains, and air conditioning condensation trays should be equipped with sensors that detect overflow. Based on which water detector under the A/C units cooling the raised floor triggered the alarm, the monitoring system should tell you the exact location of the leak.
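Reporting the exact leak location comes down to keeping a table that maps each detector to where it is installed. A minimal sketch, assuming hypothetical detector IDs and locations (a real site would build the table from its own floor plan):

```python
from typing import Dict, List

# Hypothetical detector IDs and locations; a real site would build this
# table from its own floor plan and sensor installation records.
DETECTOR_LOCATIONS: Dict[str, str] = {
    "WD-01": "Raised floor under CRAC-1, low point near floor drain",
    "WD-02": "CRAC-1 condensation tray (overflow)",
    "WD-03": "Raised floor near chilled-water piping, row C",
}


def locate_leaks(triggered_ids: List[str]) -> List[str]:
    """Translate triggered detector IDs into human-readable leak locations."""
    # Unknown IDs are reported rather than dropped, so a new or
    # unregistered sensor still produces a visible alarm.
    return [DETECTOR_LOCATIONS.get(d, f"Unknown detector {d}") for d in triggered_ids]


for location in locate_leaks(["WD-02", "WD-07"]):
    print("LEAK:", location)
```

Reporting unknown IDs instead of silently ignoring them is a deliberate choice: a sensor added during an expansion but never entered in the table still shows up at the monitoring station.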
Temperature & Humidity
Monitor ambient temperature and humidity in the data center and the inlet temperatures of the servers, as well as individual A/C units, pumps, and cooling towers. Set an acceptable range for each, and if temperature or humidity is measured outside this window, an alarm should trigger at the monitoring station.
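The range-and-alarm idea can be sketched in a few lines of Python. The point names and limits are placeholders, not recommendations; the inlet window shown is simply 18 to 27 °C expressed in °F, and the humidity window is arbitrary. Substitute your site's own limits.

```python
from typing import Dict, Optional, Tuple

# Placeholder alarm windows (low, high); substitute your site's own limits.
LIMITS: Dict[str, Tuple[float, float]] = {
    "server_inlet_temp_f": (64.4, 80.6),
    "room_rh_pct": (20.0, 80.0),
}


def check(point: str, value: float) -> Optional[str]:
    """Return an alarm message if value falls outside the point's window, else None."""
    low, high = LIMITS[point]
    if value < low:
        return f"ALARM {point} LOW: {value} < {low}"
    if value > high:
        return f"ALARM {point} HIGH: {value} > {high}"
    return None


print(check("server_inlet_temp_f", 72.0))   # None
print(check("room_rh_pct", 15.0))           # ALARM room_rh_pct LOW: 15.0 < 20.0
```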
As temperature increases, computer equipment begins to experience performance problems; left unaddressed, the heat will eventually lead to shutdown, failure, and equipment damage.
Know the average or typical temperatures throughout the data center. Know where the hot spots are and monitor them; if not addressed, these will be the first areas to experience equipment shutdown. You can install monitoring sensors in problem areas, and even simple thermometers placed there can be checked during regular walk-throughs.
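Once you know the typical temperature, flagging hot spots is a simple comparison against it. A minimal sketch, assuming cabinet inlet readings keyed by a hypothetical location name and an arbitrary 5 °F margin:

```python
from statistics import mean
from typing import Dict, List


def find_hot_spots(temps_f: Dict[str, float], margin_f: float = 5.0) -> List[str]:
    """Return locations running more than margin_f degrees above the room average."""
    average = mean(temps_f.values())
    return sorted(loc for loc, t in temps_f.items() if t - average > margin_f)


# Illustrative cabinet inlet temperatures (F), not real measurements.
temps = {"A1": 70.0, "A2": 71.0, "A3": 72.0, "B7": 84.0}
print(find_hot_spots(temps))  # ['B7']
```

One hot cabinet pulls the average up, so with only a handful of sensors you may prefer `statistics.median` as the baseline.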
Too high a humidity risks condensation and water collection, which can corrode electrical components and create shorts in sensitive electrical equipment, resulting in damage and failure. Too low a humidity increases the risk of electrostatic discharge, which can affect equipment operation.
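Condensation forms when a surface is at or below the dew point of the surrounding air. The sketch below estimates dew point with the Magnus approximation, using one commonly published constant set (b = 17.62, c = 243.12 °C). It is an approximation, and the function names are illustrative.

```python
import math

# Magnus approximation constants (one commonly published set, over water).
B = 17.62
C = 243.12  # degrees Celsius


def dew_point_c(air_temp_c: float, rh_pct: float) -> float:
    """Approximate dew point (C) from air temperature (C) and relative humidity (%)."""
    if not 0 < rh_pct <= 100:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rh_pct / 100.0) + (B * air_temp_c) / (C + air_temp_c)
    return C * gamma / (B - gamma)


def condensation_risk(surface_temp_c: float, air_temp_c: float, rh_pct: float) -> bool:
    """True if a surface is at or below the dew point of the surrounding air."""
    return surface_temp_c <= dew_point_c(air_temp_c, rh_pct)


print(round(dew_point_c(25.0, 50.0), 1))       # 13.9
print(condensation_risk(10.0, 25.0, 50.0))     # True
```

At 25 °C and 50% relative humidity, a chilled-water pipe at 10 °C is below the roughly 13.9 °C dew point and will sweat.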
If you are experiencing a problem with an individual server cabinet, you can purchase a remote wireless sensor to monitor and track that specific area. You can then bring the device back to a workstation, download its data, and graph the temperature over time to show the changes.
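Many loggers can export their readings as a CSV file. Assuming a hypothetical export with `timestamp` (ISO 8601) and `temp_f` columns, this sketch reduces the readings to an hourly maximum that is easy to graph in a spreadsheet:

```python
import csv
import io
from datetime import datetime
from typing import Dict


def hourly_max(csv_text: str) -> Dict[datetime, float]:
    """Return the maximum temperature recorded in each clock hour."""
    maxima: Dict[datetime, float] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        ts = datetime.fromisoformat(row["timestamp"])
        hour = ts.replace(minute=0, second=0, microsecond=0)
        temp = float(row["temp_f"])
        maxima[hour] = max(temp, maxima.get(hour, temp))
    return dict(sorted(maxima.items()))


# Illustrative export with an assumed two-column layout.
sample = """timestamp,temp_f
2024-05-01T13:05:00,78.1
2024-05-01T13:40:00,81.4
2024-05-01T14:10:00,79.0
"""
for hour, temp in hourly_max(sample).items():
    print(hour.strftime("%H:00"), temp)
```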
Security
Install video surveillance, and equip the doors into the server room with some form of card-reader access to record who enters different areas of the data center and when. If you experience a problem in an area, you can check these records to see whether any activity preceded it.
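Checking whether any activity preceded a problem amounts to filtering the access records by area and time window. A minimal sketch, assuming a hypothetical record layout of (entry time, badge ID, area) and a default two-hour look-back:

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Each record: (time of entry, badge ID, area). Layout and IDs are hypothetical.
AccessRecord = Tuple[datetime, str, str]


def entries_before(log: List[AccessRecord], incident: datetime, area: str,
                   window: timedelta = timedelta(hours=2)) -> List[Tuple[datetime, str]]:
    """Return (time, badge) for entries into `area` during `window` before the incident."""
    start = incident - window
    return [(t, badge) for t, badge, a in sorted(log)
            if a == area and start <= t <= incident]


log = [
    (datetime(2024, 3, 4, 9, 15), "B1042", "Server Room 2"),
    (datetime(2024, 3, 4, 6, 0), "B2210", "Server Room 2"),
    (datetime(2024, 3, 4, 9, 30), "B3301", "UPS Room"),
]
print(entries_before(log, datetime(2024, 3, 4, 10, 0), "Server Room 2"))
```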
Smoke Detection
Detecting smoke early allows fire suppression to act before a fire can spread and directly damage equipment and the facility. Some byproducts of smoke from electronic equipment are corrosive gases that can damage IT equipment.
Early warning smoke detection systems allow a data center to investigate and understand possible fire threats, respond to nuisance alarms before authorities are notified, and transfer data and processes to redundant systems.
Install smoke detectors both above and below the raised floor integrated with a fire suppression system. The smoke detectors should send an alarm to an area which is occupied 24x7 like a guard station or master control room.
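Early warning detection is usually configured as escalating stages rather than a single alarm. The sketch below maps a detector reading to a stage and a follow-up action. The thresholds (in % obscuration per metre) are hypothetical placeholders, and the action text paraphrases the practices described in this section. The fire suppression system itself should still be driven by its own integrated detectors, not by a script like this.

```python
from typing import List, Tuple

# Hypothetical staged thresholds, in % obscuration per metre, highest first.
# Real set points come from the detector manufacturer and the site's fire
# protection engineer.
STAGES: List[Tuple[float, str, str]] = [
    (0.50, "FIRE", "fire alarm and suppression sequence; notify authorities"),
    (0.20, "ACTION", "notify guard station; move workloads to redundant systems"),
    (0.05, "ALERT", "notify 24x7 control room; investigate the area"),
]


def classify_smoke(obscuration_pct_per_m: float) -> Tuple[str, str]:
    """Map a detector reading to the highest stage whose threshold it meets."""
    for threshold, stage, action in STAGES:
        if obscuration_pct_per_m >= threshold:
            return stage, action
    return "NORMAL", "no action"


print(classify_smoke(0.08))  # ('ALERT', 'notify 24x7 control room; investigate the area')
```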
Airflow and Cooling Systems
Monitoring airflow and cooling systems should be viewed as an early warning system for high temperatures in a data center. Rising temperatures or decreased airflow indicate cooling problems that, if not addressed, can lead to equipment failure.
Early temperature monitoring also helps prevent server failures caused by inadequate cooling and extends the life of expensive equipment.
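One way to get an earlier warning than a fixed high limit gives is to also watch how fast the temperature is rising. This sketch fits a least-squares slope to recent samples and alarms on either condition. The 0.5 °F-per-minute rate limit is a hypothetical placeholder. On Python 3.10 or later, `statistics.linear_regression` could replace the hand-written slope.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (minutes since start, temperature in F)


def slope_per_min(samples: List[Sample]) -> float:
    """Least-squares slope of temperature over time, in degrees per minute."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    num = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples need distinct timestamps")
    return num / den


def early_warning(samples: List[Sample], high_limit: float,
                  max_rate: float = 0.5) -> Optional[str]:
    """Alarm on an absolute high limit, or on a fast rise while still below it."""
    if samples[-1][1] >= high_limit:
        return "HIGH TEMPERATURE"
    if len(samples) >= 2 and slope_per_min(samples) > max_rate:
        return "RISING FAST"
    return None


rising = [(0, 70.0), (1, 71.0), (2, 72.0), (3, 73.0)]
print(early_warning(rising, high_limit=80.0))  # RISING FAST
```

Here the room is still seven degrees under its limit, but a steady one-degree-per-minute climb is flagged while there is still time to react.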
Power
Consider monitoring the incoming utility power to the entire site; this is especially helpful when calculating power usage effectiveness (PUE). All critical components, such as UPS systems, diesel generators, and the fuel levels in diesel storage tanks, should be monitored.
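PUE is total facility power divided by the power delivered to IT equipment, which is why the site-level utility reading matters. A minimal sketch, assuming the utility feed meter stands in for total facility power and UPS output stands in for IT load (both readings below are hypothetical):

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


# Hypothetical readings: utility feed meter and UPS output, both in kW.
print(pue(1500.0, 1000.0))  # 1.5
```

A single reading gives a snapshot that moves with weather and load. For a figure you can track over time, divide energy totals (kWh) accumulated over the same period instead.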
Monitoring the utility power feeds tells you whether you are experiencing power anomalies such as sags and spikes, which can foretell a future power event. Monitoring, preparing, and reacting accordingly can be the difference between maintaining uptime and experiencing an outage.
Monitor the power entering the data center, and schedule a cutover to backup power sources if anomalies exceed predetermined thresholds. Be a weather watcher: if storms are heading your way, move to backup generators and get off the grid. That way you already know the generators will start and run, and you won’t be caught flat-footed if the utility provider has an outage.
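A "predetermined threshold" rule might look like the sketch below. It classifies RMS voltage readings against ±10% bands around a hypothetical 480 V nominal, which loosely follows common power-quality conventions. Those conventions also consider event duration, which this sketch ignores. It then recommends a transfer on a storm warning or after a hypothetical three anomalies. Very short spikes (transients) usually need a power-quality meter that captures waveforms, because they can disappear in RMS averages.

```python
from typing import Iterable

NOMINAL_V = 480.0  # hypothetical nominal service voltage


def classify(v_rms: float, nominal: float = NOMINAL_V) -> str:
    """Classify an RMS voltage reading against +/-10% bands around nominal."""
    pu = v_rms / nominal
    if pu < 0.1:
        return "INTERRUPTION"
    if pu < 0.9:
        return "SAG"
    if pu > 1.1:
        return "SWELL"
    return "NORMAL"


def recommend_transfer(recent_v: Iterable[float], max_events: int = 3,
                       storm_warning: bool = False) -> bool:
    """Recommend moving to generators on a storm warning or repeated anomalies."""
    events = sum(1 for v in recent_v if classify(v) != "NORMAL")
    return storm_warning or events >= max_events


print(recommend_transfer([480, 400, 470, 540, 420]))  # True (three anomalies)
```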
UPS Battery Rooms
Monitor the hydrogen levels and temperature in the battery rooms. The eyewash stations and deluge showers should also have flow monitors that alert the monitoring station when a discharge has occurred. The battery room’s exhaust fan should have an airflow monitor that alerts data center staff when the fan isn’t running, since a stopped fan can allow hydrogen gas to build up.
The battery room’s hydrogen gas monitor should interface with the UPS battery breakers so that if high levels of hydrogen gas are sensed, the breakers open. You don’t want the UPS system charging batteries during a hydrogen gas buildup.
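The battery room logic can be summarized as a small decision function. Hydrogen's lower flammable limit in air is about 4% by volume. The alarm and trip set points below are hypothetical placeholders, and the status fields are illustrative. In a real installation the breaker trip would be carried out by the interlock hardware or controller; this only shows the decision.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hydrogen's lower flammable limit in air is about 4% by volume.
# These set points (% by volume) are hypothetical placeholders.
H2_ALARM_PCT = 1.0
H2_TRIP_PCT = 2.0


@dataclass
class BatteryRoomStatus:
    h2_pct: float             # hydrogen concentration, % by volume
    exhaust_airflow_ok: bool  # airflow switch on the exhaust fan
    eyewash_flow: bool        # flow detected at eyewash/deluge shower


def evaluate(status: BatteryRoomStatus) -> Tuple[List[str], bool]:
    """Return (alarm messages, whether to open the UPS battery breakers)."""
    alarms: List[str] = []
    open_breakers = False
    if not status.exhaust_airflow_ok:
        alarms.append("Battery room exhaust fan airflow lost")
    if status.eyewash_flow:
        alarms.append("Eyewash/deluge shower discharge")
    if status.h2_pct >= H2_TRIP_PCT:
        alarms.append(f"Hydrogen {status.h2_pct}% at trip level")
        open_breakers = True  # stop charging so no more hydrogen is generated
    elif status.h2_pct >= H2_ALARM_PCT:
        alarms.append(f"Hydrogen {status.h2_pct}% elevated")
    return alarms, open_breakers


print(evaluate(BatteryRoomStatus(h2_pct=2.5, exhaust_airflow_ok=False, eyewash_flow=False)))
```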
UPS batteries should have a battery monitoring system that not only monitors each string of batteries as a whole but also each individual battery.
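Monitoring at both levels can be illustrated with float voltages alone. This sketch totals the string and flags individual blocks that stray from the string median by more than a hypothetical 0.3 V tolerance. Commercial battery monitoring systems often also track internal resistance and temperature, which are not shown here.

```python
from statistics import median
from typing import List


def string_voltage(block_volts: List[float]) -> float:
    """Total float voltage of a series string (string-level monitoring)."""
    return sum(block_volts)


def weak_blocks(block_volts: List[float], tolerance_v: float = 0.3) -> List[int]:
    """Indices of blocks deviating from the string median by more than tolerance_v."""
    mid = median(block_volts)
    return [i for i, v in enumerate(block_volts) if abs(v - mid) > tolerance_v]


# Illustrative float voltages for five 12 V blocks in one string.
volts = [13.5, 13.6, 13.5, 12.9, 13.55]
print(round(string_voltage(volts), 2), weak_blocks(volts))  # 67.05 [3]
```

The median is used as the reference so that one failing block does not drag the baseline toward itself. Note that the string total alone would not reveal the weak fourth block, which is the point of monitoring each battery individually.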
Utilizing a Monitoring System
Create a designated monitoring station or control room with a monitoring system to which all monitoring devices can report. For safety and redundancy, duplicate the monitoring system in a second area, such as a guard station staffed 24x7, as a backup.
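Duplicating alarms to a second station works best when a failed link to one station never blocks delivery to the other. A minimal sketch, assuming each station is reached through a hypothetical send function; lists stand in for the stations here:

```python
from typing import Callable, List

Sender = Callable[[str], None]  # hypothetical: delivers one alarm to one station


def broadcast(alarm: str, stations: List[Sender]) -> int:
    """Send an alarm to every station; return how many deliveries succeeded."""
    delivered = 0
    for send in stations:
        try:
            send(alarm)
            delivered += 1
        except OSError:
            # Keep going so a failed link to one station does not block the other.
            continue
    return delivered


control_room: List[str] = []
guard_station: List[str] = []
print(broadcast("CRAC-3 high return temp", [control_room.append, guard_station.append]))  # 2
```

A delivery count lower than the number of stations is itself worth alarming on, since it means the backup path is down.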
Each team member should be thoroughly trained on all critical equipment and understand what each alarm means and how to react accordingly.
Assign a team member to walk through all critical areas each shift with a list of systems to be visually checked and verified. Before and after each walk-through, verify the readings of the monitoring systems in the monitoring station or control room.
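Comparing walk-through readings with what the monitoring station shows catches both failed sensors and points that were never connected. A minimal sketch, with hypothetical point names and a single 2.0-unit tolerance (in practice each point would likely get its own tolerance, since the units differ):

```python
from typing import Dict, List


def discrepancies(walkthrough: Dict[str, float], system: Dict[str, float],
                  tolerance: float = 2.0) -> List[str]:
    """List points where walk-through readings and monitoring-system readings disagree."""
    issues = []
    for point, observed in walkthrough.items():
        if point not in system:
            issues.append(f"{point}: not reported by the monitoring system")
        elif abs(observed - system[point]) > tolerance:
            issues.append(f"{point}: walk-through {observed}, system {system[point]}")
    return issues


walk = {"CRAC-1 return temp (F)": 75.0, "UPS-A load (%)": 42.0, "Fuel tank 1 (%)": 88.0}
screen = {"CRAC-1 return temp (F)": 80.0, "UPS-A load (%)": 41.0}
for issue in discrepancies(walk, screen):
    print(issue)
```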
Before each shift, hold a meeting to inform team members of any changes made in the data center, such as added loads, equipment failures, monitoring alerts, or maintenance vendors who have visited or are scheduled to visit the site.
When making changes, such as raising temperatures, make sure facility team members monitor the situation closely and are prepared to react if necessary. It is also always good practice to inform IT when making changes.