High availability configurations
This topic provides information on high availability implementations, including disaster recovery scenarios.
The preferred approach for high availability (HA) in enterprise environments is to connect agents to a single server endpoint that is part of a cluster behind a load balancer.
Note: In certain circumstances, for example, when high availability is required but enterprise-level support is not, you may take the alternative approach. For details, see Configure agents for high availability (alternate).
To configure agents to connect to a single server endpoint:
Install the Deployment Automation server application on n servers for the active-active feature. See Install for a high availability implementation.
These n servers form a cluster behind a load balancer, so make sure you have set the first server's External Agent URL and External User URL system settings to point to the IP address or DNS name and port of the load balancer.
For agents to be able to communicate with each of the server nodes in the cluster, the agent must point to the load balancer during agent installation. In this scenario, the load balancer chooses which of the server nodes communicates with the agent.
To ensure each agent communicates properly through the load balancer server, do the following when installing the agents:
- In the agent installer, under Server Details, in the Hostname or address field, enter the IP address or DNS name of the load balancer.
- In the agent installer, under Server Details, in the Agent Communication Port field, enter the port number of the load balancer.
Another approach for maintaining high availability is to connect agents to a series of endpoints.
However, this approach does not support enterprise-level scalability, and maintaining a full list of all servers is difficult. Use this approach with caution. We recommend that you take the preferred approach for high availability in enterprise environments. For details, see Configure agents for high availability (recommended).
The following steps describe how to connect agents to a series of server endpoints; if a server fails, the agents attempt to communicate with the next endpoint in the series.
For the purpose of this example, the following procedure assumes a two-node cluster whose nodes have the IP addresses ip_node_1 and ip_node_2 (you can also use DNS names).
To configure agents to connect to a series of endpoints:
Assuming that during installation an agent was set to communicate with the server at ip_node_1, to enable the agent to also communicate with the server on cluster node ip_node_2, configure the two servers with a network relay as follows:
- Install the Deployment Automation server application on two servers for the active-active feature. See Install the first server. These two servers are not used as a cluster with a load balancer.
In the Deployment Automation user interface, navigate to Management > Resources.
In the selection box, select Relays.
Click the Relay Actions button and then select Create Network Relay.
In the Create Network Relay dialog, enter the following details:
- Name: Enter a name for the second server.
- Host: Enter ip_node_2. This is the remote hostname or IP address the server will use to connect to the network relay.
- JMS Port: Enter the JMS port number that the server will use to communicate with the network relay over the JMS protocol.
- Active: Select this option to tell the server to start the connection to the network relay.
Repeat the process for each cluster node.
Configure agent relay failover
To configure agent relay failover, specify two or more target servers for the agent relay to connect to. If a server fails, the agent relay switches to another server from the list.
When the agent relay starts, it connects to the source server defined in the agentrelay.jms_proxy.server_host parameter. If that server fails, the agent relay selects a failover server from the list. The agent relay continues to use that server until it fails, even if the previous server or the source server becomes available again.
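The sticky selection policy described above can be sketched as follows. This is a minimal illustration only; the class, its names, and the host values are hypothetical and are not part of the actual agent relay implementation:

```python
class FailoverSelector:
    """Sketch of the sticky failover policy described above: the relay
    starts on the source server and, on failure, advances to the next
    host in the failover list. It never falls back to an earlier server
    automatically, even if that server recovers."""

    def __init__(self, source_host, failover_hosts):
        # Source server first, then the configured failover servers.
        self._hosts = [source_host] + list(failover_hosts)
        self._index = 0

    @property
    def current(self):
        return self._hosts[self._index]

    def mark_failed(self):
        # The current server failed: advance to the next one in the
        # list, wrapping around to the start if necessary.
        self._index = (self._index + 1) % len(self._hosts)
        return self.current


# Illustrative usage with the example node names from this topic:
relay = FailoverSelector("ip_node_1:7918", ["ip_node_2:7918"])
print(relay.current)        # ip_node_1:7918
relay.mark_failed()
print(relay.current)        # ip_node_2:7918 (stays here even if
                            # ip_node_1 becomes available again)
```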
For example, you can specify two failover servers, separating each server definition with a comma.
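A sketch of such an entry in agentrelay.properties, reusing the example node names from this topic (the host names and the port number are illustrative; substitute your servers' addresses and JMS port):

```properties
# Illustrative values only: two failover servers, comma-separated
agentrelay.jms_proxy.failover_hosts_with_ports=ip_node_1:7918,ip_node_2:7918
```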
Note: If a large number of agents are configured to use failover, reconnecting can take some time.
To configure agent relay failover:
Open the agent relay properties file: \conf\agentrelay.properties
In the agentrelay.jms_proxy.failover_hosts_with_ports parameter, enter a list of failover server locations in this format:
<IP address or hostname>:JMS_port
Disaster recovery with hot standby
For disaster recovery where high availability is required, you can use a hot standby strategy to ensure server availability in case of system failure.
In this strategy, you maintain two servers and backup the file store and database as part of your regularly scheduled system backup procedures. This enables you to immediately switch to the hot standby system should the primary system fail.
A hot standby system configuration is displayed in the following figure.
Disaster recovery with a hot standby system
Disaster recovery with cold standby
For disaster recovery where high availability is not required, you can use a cold standby strategy to ensure server availability in case of system failure.
When the primary system fails, the cold standby is brought online and promoted to the primary server. Once online, the standby reestablishes connections with all agents, performs recovery, and proceeds with any queued processes.
Because the most processing-intensive work is handed off to agents, in a high-performance configuration you should not install an agent on the same hardware as the main server.
When using the cold standby data center configuration, you typically configure the data tier with network storage and a clustered database. The service tier performs best when it's on a dedicated, stable, multi-core machine with a fast connection to the data tier. Maintain a standby machine and keep it ready in case the primary server goes down.
A typical cold standby data center configuration is displayed in the following figure.
Disaster recovery with a cold standby system