How to Maximize Server Uptime?

Keeping servers up and running, or at least ready to run the moment they are needed, may be the goal every data center manager is most eager to achieve.

However, few data center managers can honestly say that everything they do actually maximizes system uptime. In fact, experts say, many managers waste considerable time and money on techniques and practices that have little or no positive impact on uptime.

Walter Beddoe, vice president of IT operations and logistics support at SIX Telekurs, a US financial data service provider, believes that maximizing uptime is both a science and a management art. "You need to bring together a lot of different things: people who can do the job, fault-tolerant hardware, dynamic security, and good maintenance and change management practices. Most importantly, you must commit to doing all of it to the best of your ability."

Alan Howard, director of IT at Princeton Radiology, a diagnostic medical imaging company in Princeton, urges his staff not to waste time and resources on practices and tools that don't directly contribute to improving uptime. Effort spent on clustering, for example, is in his view "quite wasted" and no more complete than a well-tooled redundant configuration.

Clusters that cannot be automated, where synchronization has to be done manually, may cause more problems than they solve, Howard said. "If the master node fails, it can be catastrophic. It is better to have a standby node fail than the master node."

For example, a Windows Server cluster his team had built for failover once caused an application to crash, because a change to the application's configuration file could not be copied to the standby server in time. "Fixing an application crash often takes more effort than fixing a failed cluster node."

Since then, his team has stopped running traditional cluster servers. Instead, they configured a "single standby server cluster" and mapped it to a dual-controller Compellent Storage Center SAN, "so we were able to migrate virtual machines on demand almost seamlessly."

Careful Planning

Most data center managers agree that carefully planning all server-related work, from procurement through management to replacement, is a critical step in ensuring system reliability.

Raoul Gabiam, IT operations and engineering manager at the University of Washington, says lifecycle management is an integral part of planning for server uptime. "Knowing when and how to replace hardware and upgrade software is very important, because it affects system performance, continuity, and overall uptime."

For example, if you must perform a software upgrade, understanding what hardware it requires and the state of your existing hardware is critical. You may have to buy hardware to meet the upgrade's requirements in order to avoid further downtime, Gabiam explained.

Gabiam also strongly advocates standardization and coordination as a way to ensure reliable server operations. "Before anyone installs anything or makes a change, it must go through a change management process."

Change management means understanding how everything is configured and assessing the impact before implementing a change, Gabiam said. "That way, you always know what is not allowed and what may affect what."

By observing the discipline of change management, he said, you can foresee what will happen when you reconfigure a server or place it in a new environment.
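That discipline can be reduced to a small gate: assess a change's blast radius first, refuse anything unapproved, and keep an audit trail of what was applied. The Python sketch below is only an illustration; the `ChangeManager` class, server names, and dependency map are hypothetical, not part of any product or process mentioned in this article.

```python
# Minimal sketch of a change-management gate, with a hypothetical
# in-memory map of which services depend on which server.
from dataclasses import dataclass

@dataclass
class ChangeRequest:
    server: str
    description: str
    approved: bool = False

class ChangeManager:
    def __init__(self):
        # server name -> services that depend on it
        self.dependencies: dict[str, list[str]] = {}
        self.log: list[ChangeRequest] = []

    def assess(self, req: ChangeRequest) -> list[str]:
        """List the services a change could affect, before it is made."""
        return self.dependencies.get(req.server, [])

    def apply(self, req: ChangeRequest) -> bool:
        """Refuse any change that has not been explicitly approved."""
        if not req.approved:
            return False
        self.log.append(req)  # audit trail of every applied change
        return True

cm = ChangeManager()
cm.dependencies["db01"] = ["billing", "reporting"]
req = ChangeRequest("db01", "apply kernel patch")
print(cm.assess(req))  # services at risk: ['billing', 'reporting']
print(cm.apply(req))   # False: not yet approved
req.approved = True
print(cm.apply(req))   # True: approved and logged
```

The point is the order of operations: assessment and approval happen before anything touches the server, which is exactly what Gabiam's process enforces.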

Paul Franko is CTO of Online Resources, a company that provides transaction services to financial institutions. He believes work attitude can also have a huge impact, and says he has made an extra effort to ensure that routine but critical server-related work is taken seriously and handled promptly.

"We have put in place a system of checks and balances to ensure that our rules are followed," he said. Managers must routinely review their subordinates' work, Franko says, and combined with other forms of double-checking, that minimizes human error. "It's people who make mistakes. If you don't set up multiple checkpoints, things will slip through."

Promoting Preventive Maintenance

Regular preventive maintenance may be the easiest and least painful way to keep servers reliable. "The uptime of the system can only be as long as the uptime of the weakest component in the entire system," Beddoe said. In the long run, basic tasks such as upgrading system software, supplying conditioned power, and maintaining a proper cooling environment will keep data center servers running without failure and without breaking the budget, and when a fault does occur, staff can be dispatched to fix it.

Franko said that to ensure all necessary work is carried out when needed, server maintenance tasks should be identified and organized into a clear schedule. "Some things must be performed immediately, such as security upgrades, while other tasks are performed in batches or at regular intervals." This second category includes software upgrades for non-critical functional improvements.

Franko added one more rule for server maintenance: it should not eat into server uptime. "We can't let the system run slower for the sake of maintenance work. We make sure of that."

If a server does have to be taken offline for maintenance, Franko's team schedules the work for midnight or the weekend, when user demand is low. The only reason to pull a production server during normal business hours is a mandatory, critical software installation or upgrade, such as a zero-day security patch.
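Franko's triage rule can be sketched as a small scheduling function: critical security work runs immediately, everything else waits for an off-hours window. This Python sketch is illustrative only; the category names and the use of the next midnight as the low-demand window are assumptions, not Online Resources' actual system.

```python
# Sketch of the triage rule: security work runs now; non-critical
# work is deferred to an off-hours window when user demand is low.
from datetime import datetime, timedelta

IMMEDIATE = {"security"}  # categories allowed to interrupt business hours

def next_off_hours(now: datetime) -> datetime:
    """Next midnight, a stand-in for the real low-demand window."""
    return (now + timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0)

def schedule(task: str, category: str, now: datetime) -> datetime:
    # Zero-day patches and similar critical work run right away;
    # everything else is batched into the off-hours window.
    return now if category in IMMEDIATE else next_off_hours(now)

now = datetime(2024, 5, 6, 14, 30)  # a weekday afternoon
print(schedule("zero-day patch", "security", now))   # runs immediately
print(schedule("feature upgrade", "software", now))  # deferred to midnight
```

A real scheduler would also distinguish weekends and batch compatible tasks together, but the split between "now" and "later" is the heart of the policy.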

Automating Basic Server Management Tasks

Over the past few years, server management has become more complex, largely because of the emergence of virtualization and related technologies; to raise server efficiency and utilization, a range of best practices has had to be worked out.

Virtualization itself helps protect data centers from server downtime. By consolidating servers and interconnecting them in a shared environment, virtualization lets multiple virtual machines run across different hosts; if any one host fails, its workload is redistributed among the remaining hosts. "A server may fail, but that doesn't mean it will affect the delivery of the entire service," Gabiam said.
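The redistribution Gabiam describes can be illustrated with a toy rebalancer, here a least-loaded placement in Python. The host and VM names are invented, and real hypervisor platforms make this decision with far more inputs (memory, affinity rules, shared storage); this only shows the basic failover idea.

```python
# Toy sketch of failover in a virtualized cluster: when a host dies,
# its VMs are spread across the surviving hosts, least-loaded first.
def redistribute(hosts: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
    """Move the failed host's VMs onto the remaining hosts."""
    orphans = hosts.pop(failed, [])          # VMs stranded by the failure
    for vm in orphans:
        # pick the surviving host currently running the fewest VMs
        target = min(hosts, key=lambda h: len(hosts[h]))
        hosts[target].append(vm)
    return hosts

cluster = {"host1": ["vm-a", "vm-b"], "host2": ["vm-c"], "host3": ["vm-d"]}
print(redistribute(cluster, "host2"))
# vm-c lands on host3, the least-loaded survivor, so service continues
```

Shared storage (such as the SAN Howard's team uses) is what makes this cheap in practice: only the VM's ownership moves, not its disk.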

To manage ever-expanding virtualized environments more effectively, companies such as Xenos Software, Uptime Software, Nimsoft, and Nagios Enterprises have launched tools designed to help data center staff keep watch on server performance, locate problems, and take full advantage of opportunities for improvement.

Beddoe considers such tools essential. "You have to have something reassuring you that all of your servers can do what they are supposed to do at any time."
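At their simplest, such monitors just poll each server and report whether it responds. The Python sketch below checks TCP reachability only, a stand-in for the far richer checks commercial tools perform; the host/port list is a placeholder, not a recommendation.

```python
# Minimal reachability check: try to open a TCP connection to each
# server's service port and report which respond within a timeout.
import socket

def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False

servers = [("localhost", 22), ("localhost", 65530)]  # placeholder targets
for host, port in servers:
    status = "up" if is_up(host, port) else "DOWN"
    print(f"{host}:{port} {status}")
```

A reachable port does not prove the application behind it is healthy, which is why production monitors layer application-level checks on top of this kind of probe.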
