White Paper 1

“Increase Customer Satisfaction and Save Money with a Smart Maintenance System for IT Infrastructure.”

You can read a full version of this document (in PDF format) by registering below.


Background
The Causes of “Frozen” Devices
Why Does Your Organization Need a Smart Maintenance System for Your Critical IT Infrastructure?
What is the Real Cost of Downtime?
How Does a Smart Maintenance System Work?
Is a Smart Maintenance System worth the Investment?
The Solution
About Meikyo Electric

1. Background

Commonly, whenever an IP-enabled device fails or is “frozen”, the user has to manually power cycle or reboot the device to resume normal operations.
If you could proactively monitor the health of your IT network infrastructure, that is, perform “Life & Death Monitoring”, you could minimize the issues caused by downtime, which we will discuss later. Here, IT infrastructure includes the equipment that supports IT, such as routers and switches, WiFi access points, surveillance cameras, and digital signage. As IT infrastructure expands, more effective and efficient Life & Death Monitoring will be required.

Life & Death Monitoring can be either active or passive. In active monitoring, the monitoring side initiates the check. PING monitoring is the best-known type of active monitoring: the monitor periodically sends ICMP packets to the monitored device and checks for a response. If there is no response for a set period of time, the monitor determines that an abnormality has occurred. Passive monitoring can be done with a WATCHDOG function: the monitored device regularly sends monitoring packets, and if the packets do not arrive within a set period of time, the user is notified.
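
The two monitoring styles above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not how any particular product implements it: the class names, the threshold of three missed probes, and the 60-second timeout are all assumptions, and the actual ICMP probe or heartbeat transport is left out so that the logic can be read on its own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ActiveMonitor:
    """Active monitoring: the monitor probes the device (for example with a ping)."""

    failure_threshold: int = 3     # assumed: consecutive misses before raising an alarm
    consecutive_failures: int = 0

    def record_probe(self, responded: bool) -> bool:
        """Record one probe result; return True if the device is judged to be down."""
        if responded:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failure_threshold


@dataclass
class Watchdog:
    """Passive monitoring: the device sends packets; prolonged silence raises an alarm."""

    timeout_s: float = 60.0        # assumed: allowed silence, in seconds
    last_heartbeat: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        """Call whenever a monitoring packet arrives from the device."""
        self.last_heartbeat = now

    def is_expired(self, now: float) -> bool:
        """Return True if no packet has arrived within the timeout."""
        if self.last_heartbeat is None:
            return False  # no packet seen yet, so there is no baseline to compare with
        return now - self.last_heartbeat > self.timeout_s
```

Requiring several consecutive misses, rather than reacting to a single lost packet, is one common way to avoid false alarms on a busy network.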

Now back to power cycling of “frozen” devices. Manually power cycling or rebooting “frozen” devices can be difficult, especially when the devices are in remote locations. “Remote” can mean hard-to-reach locations within a facility, or other offices across the country or around the world that an IT technician can’t easily access. Organizations should consider devices that can do both Life & Death Monitoring and remote power management, ideally automatically.
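
As a rough sketch of what combining monitoring with automatic power management might look like, the loop below probes a device once per interval and power cycles it after several consecutive missed responses. The function name `supervise`, the `probe` and `power_cycle` callbacks, the threshold, and the boot grace period are all hypothetical; a real deployment would wait between intervals and would drive an actual power outlet.

```python
from typing import Callable, List


def supervise(
    probe: Callable[[], bool],         # hypothetical: returns True if the device responds
    power_cycle: Callable[[], None],   # hypothetical: switches the device off and on again
    rounds: int,                       # number of monitoring intervals to run
    failure_threshold: int = 3,        # assumed: misses tolerated before a reboot
    grace_rounds: int = 2,             # assumed: intervals to wait while the device boots
) -> List[str]:
    """Run the monitoring loop and return a log with one entry per interval."""
    log: List[str] = []
    failures = 0
    grace = 0
    for _ in range(rounds):
        if grace > 0:
            # The device was just power cycled; do not probe until it has booted.
            grace -= 1
            log.append("booting")
            continue
        if probe():
            failures = 0
            log.append("ok")
            continue
        failures += 1
        if failures >= failure_threshold:
            power_cycle()
            failures = 0
            grace = grace_rounds
            log.append("power-cycled")
        else:
            log.append("no-response")
    return log
```

The grace period matters: without it, a device that is still booting would be counted as unresponsive and could be power cycled again before it ever comes back up.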

Automatic power cycling could also be described as a Smart Maintenance System. The importance of such a system is often underestimated. If your IT infrastructure is small, the need is smaller. However, as that IT infrastructure grows, your need for an automatic solution increases sharply.

There are significant impacts to your business when network devices go down. These could include a damaged business reputation, lost revenue from downtime, increased IT maintenance costs, and decreased productivity due to already overstretched IT support resources.
Users who implement a strategy to deploy a Smart Maintenance System for IT Infrastructure can realize significant benefits. These benefits can include:

Increased customer satisfaction
Increased profits from a reduction of downtime
Decreased expenses from a reduction of IT maintenance costs
More efficient utilization of limited IT support resources

2. The Causes of “Frozen” Devices

When a device freezes, it is usually due to an issue with firmware, software or hardware. These issues can be caused by the following:

Lightning-induced surges: Lightning strikes generate strong electrical noise, causing devices to freeze. Surges may also occur from indirect lightning strikes.
Network overload: New products and services that heavily tax network devices are introduced every day.
Unforeseen network issues

Regardless of what causes device freezes, it is critical to get devices back online quickly, preferably automatically. It is vital to have a plan to prepare for these events in advance. Investment in a Smart Maintenance System for IT infrastructure will provide significant benefits to any organization.

Business Reputation

If your remote IT devices are unmanned and freeze, you risk impacting your reputation in the eyes of your customers and/or “users”. With social media and the internet, complaints are easy to find and rarely ever go away. Discover and resolve issues before your “users” do by investing in solutions with automatic device recovery capabilities.

Lost Revenue from Downtime

If your remote IT devices are unmanned and freeze, you risk losing revenue during that downtime. Every minute your IT infrastructure (e.g. digital signage or a kiosk) isn’t operational could significantly impact revenue from advertisers and/or “users”. Automatic device recovery can greatly reduce downtime. While there may still be cases of downtime, their frequency and duration will be significantly reduced.

High IT Maintenance Costs

If your remote IT devices are unmanned and freeze, you will most likely have to “roll a truck”, that is, send a maintenance crew to troubleshoot and reset any frozen devices. Getting a crew on-site and troubleshooting can be expensive. If you design a reliable system with smart devices, you can avoid most of these unexpected and unplanned costs.

Limited IT Support Resources

In recent years, most companies and organizations have had insufficient IT staff to meet their growing internal and external demands. In many cases, these resources, essential to your business or organization, are already overworked. Deploying IT staff to remote sites may require significant additional time. With the right strategy and investment in smart monitoring and management equipment, you can avoid having to send overworked IT staff to sites after hours or on holidays.

3. Why Does Your Organization Need a Smart Maintenance System for Your Critical IT Infrastructure?

Imagine the following scenario: A managed service provider (MSP) technician named Will is playing catch with his son in the park on a Saturday afternoon. He gets a call from a client, Dave, whose digital signage system has gone offline. The system is critical to Dave’s business on Saturday evening. Unfortunately, both Will’s colleague and his boss are on vacation. It is a two-hour drive to Dave’s site. As Will’s wife is busy, he needs to take his son to a friend’s house, because he can’t bring him to the customer’s site.

Will is in a difficult situation. Now imagine this happens every month for Will’s clients, and his boss always has to send someone to client sites to troubleshoot and resolve issues. If these emergency situations are covered by the service level agreement (SLA), then these expenses have to be absorbed by the MSP. If they are not part of the SLA, and the MSP is invoicing clients like Dave for these costs, the MSP is going to have one or more upset clients.

What is the true cost of downtime for an organization? Many organizations oversimplify the cost of downtime; however, it should include the following:

Employee productivity cost
Lost Revenue (or SLA penalties)
Recovery costs
Long-term impact

We will expand on this later; however, it should be noted that the larger the organization, the more dramatically these downtime costs increase.
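
To show how the four components listed above might be combined, here is a deliberately simple model. It is an illustrative assumption, not the calculation used in the full white paper: the names `DowntimeCost` and `estimate`, and the choice to scale productivity and revenue losses linearly with hours of downtime, are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DowntimeCost:
    productivity: float      # employee productivity cost
    lost_revenue: float      # lost revenue (or SLA penalties)
    recovery: float          # recovery costs, such as a site visit
    long_term: float = 0.0   # long-term impact; hard to quantify, so it defaults to zero

    def total(self) -> float:
        """Sum of all four components."""
        return self.productivity + self.lost_revenue + self.recovery + self.long_term


def estimate(
    hours_down: float,
    staff_affected: int,
    hourly_staff_cost: float,
    hourly_revenue: float,
    recovery_cost: float,
) -> DowntimeCost:
    """Assumed linear model: losses grow in proportion to the hours of downtime."""
    return DowntimeCost(
        productivity=hours_down * staff_affected * hourly_staff_cost,
        lost_revenue=hours_down * hourly_revenue,
        recovery=recovery_cost,
    )
```

Calling `estimate(...)` with an organization's own figures and then `.total()` gives a first-order number. The long-term impact, such as reputation, is left for the reader to supply because it cannot be derived from the other inputs.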

Wouldn’t it be great if Will’s company had deployed a smart device that did Life & Death Monitoring and could also automatically recover the device, without the customer even knowing an event had happened? The Watchboot Rebooter is such a device.

It is a known and accepted fact in the industry that IT devices freeze. The factors causing frozen devices will always exist and cannot be eliminated.

Thus, one of the key goals for an organization is to reduce its MTTR. Depending on the organization, there are different definitions of MTTR. Mean-Time-To-Repair is the time required to fix a piece of machinery or a device. Mean-Time-To-Restore is the digital equivalent: the time required to get an application back into production after a performance issue or downtime. Finally, there is Mean-Time-To-Resolution, which focuses on the broader issue. It addresses not only the time to fix a problem, but also the additional proactive steps designed to keep the problem from recurring.

Only once the underlying root causes are addressed and proactive steps are put into place can the issue be considered resolved. The ultimate goal is to reduce these metrics to as low a value as possible.
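
For readers who want to track such a metric, Mean-Time-To-Restore can be computed as the average outage duration over a set of incidents. The sketch below assumes each incident is recorded as a pair of timestamps (when the device went down and when it was back in service). The `Incident` type and the function name are illustrative, not taken from any specific tool.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumed record format: (time the device went down, time it was back in service)
Incident = Tuple[datetime, datetime]


def mean_time_to_restore(incidents: List[Incident]) -> timedelta:
    """Average outage duration across all recorded incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((up - down for down, up in incidents), timedelta())
    return total / len(incidents)
```

Comparing this value before and after introducing automatic recovery is one way to check whether the investment is reducing the time devices spend offline.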

4. What is the Real Cost of Downtime?

Assuming we are measuring Mean-Time-To-Restore, let us expand on the key elements of the downtime cost calculation, using the previous scenario as an example.

…… You can read a full version of this document (in PDF format) by registering above. When you submit the form, an email with a link, ID and password will be sent to you so you can access the document.
