package net.floodlightcontroller.core.internal;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

import net.floodlightcontroller.core.IFloodlightProviderService.Role;
import net.floodlightcontroller.core.IOFSwitch;
import net.floodlightcontroller.core.annotations.LogMessageDoc;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * This class handles sending of RoleRequest messages to all connected switches.
 * <p/>
 * Handling role requests is tricky. Roles are hard state on the switch and
 * we can't query them, so we need to make sure that we have consistent state
 * on the switches. Whenever we send a role request to the set of connected
 * switches we need to make sure that we've sent the request to all of them
 * before we process the next change request. If a new switch connects, we
 * need to send it the current role and need to make sure that the current
 * role doesn't change while we are doing so. We achieve this by synchronizing
 * all these actions on Controller.roleChanger.
 * On the receive side: we need to make sure that we receive a reply for each
 * request we send and that the reply is consistent with the request we sent.
 * We'd also like to send the role request to the switch asynchronously in a
 * separate thread so we don't block the REST API or other callers.
 * <p/>
 * There are potential ways to relax these synchronization requirements:
 * - "Generation ID" for each role request. However, this would be most useful
 *   if it were global for the whole cluster.
 * - Regularly resend the controller's current role. We don't know whether this
 *   might have adverse effects on the switch.
 * <p/>
 * Caveats:
 * - There is no way to know if another controller (not in our controller
 *   cluster) sends MASTER requests to connected switches. We would then drop
 *   to slave role without knowing it. We could regularly resend the current
 *   role. Ideally the switch would notify us if it demoted us. What happens if
 *   the other controller also regularly resends the same role request?
 *   Or if the health check determines that a controller is dead but the
 *   controller is still talking to switches (maybe just its health check
 *   failed) and resending the master role request?
 *   We could try to detect if a switch demoted us to slave even if we think
 *   we are master (error messages on packet outs, e.g., when sending LLDPs).
 * <p/>
 * The general model of role request handling is as follows:
 * <p/>
 * - All role request messages are handled by this class. Class Controller
 *   submits a role change request and the request gets queued. submitRequest
 *   takes a Collection of switches to which to send the request. We make a copy
 *   of this list.
 * - A thread takes these change requests from the queue and sends them to
 *   all the switches (using our copy of the switch list).
 * - The OFSwitchImpl sends the request over the wire and puts the request
 *   into a queue of pending requests (storing xid and role). We start a timeout
 *   to make sure we eventually receive a reply from the switch. We use a single
 *   timeout for each request submitted using submitRequest().
 * - After the timeout triggers we go over the list of switches again and
 *   check that a response has been received (by checking the head of the
 *   OFSwitchImpl's queue of pending requests).
 * - We handle requests and timeouts in the same thread. We use a priority queue
 *   to schedule them so that they are processed in the same order as they
 *   are submitted. If a request times out we drop the connection to this
 *   switch.
 * - Since we decouple submission of role change requests and actually sending
 *   them, we cannot check a received role reply against the controller's current
 *   role because the controller's current role could have changed again.
 * - Receiving Role Reply messages is handled by OFChannelHandler and
 *   OFSwitchImpl directly. The OFSwitchImpl checks if the received reply
 *   is as expected (xid and role match the head of the pending queue in
 *   OFSwitchImpl). If so, the switch updates its role. Otherwise the
 *   connection is dropped. If this is the first reply, the
 *   SWITCH_SUPPORTS_NX_ROLE attribute is set.
 *   Next, we call addSwitch(), removeSwitch() to update the list of active
 *   switches if appropriate.
 * - If we receive an Error indicating that roles are not supported by the
 *   switch, we set the SWITCH_SUPPORTS_NX_ROLE attribute to false. We keep the
 *   switch connection alive while in MASTER and EQUAL role.
 *   (TODO: is this the right behavior for EQUAL??). If the role changes to
 *   SLAVE the switch connection is dropped (remember: only if the switch
 *   doesn't support role requests).
 *   The expected behavior is that the switch will probably try to reconnect
 *   repeatedly (with some sort of exponential backoff), but after a while
 *   will give up and move on to the next controller IP configured on the
 *   switch. This is the serial failover mechanism from OpenFlow spec v1.0.
 * <p/>
 * New switch connection:
 * - Switch handshake is done without sending any role request messages.
 * - After handshake completes, the switch is added to the list of connected
 *   switches and we send the first role request message if role
 *   requests are enabled. If roles are disabled, we automatically promote the
 *   switch to the active switch list and clear its flow table.
 * - When we receive the first reply we proceed as above. In addition, if
 *   the role request is for MASTER we wipe the flow table. We do not wipe
 *   the flow table if the switch connected while role support was disabled
 *   on the controller.
 */
public class RoleChanger {
    // FIXME: Upon closer inspection DelayQueue seems to be somewhat broken.
    // We are required to implement a compareTo based on getDelay() and
    // getDelay() must return the remaining delay, thus it needs to use the
    // current time. So x1.compareTo(x1) is not guaranteed to return 0, as some
    // time will have passed between evaluating both getDelay() calls. This is
    // even worse if the thread happens to be preempted between the two calls.
    // For the time being we enforce a small delay between subsequent
    // role request messages and hope that's long enough to not screw up
    // ordering. In the long run we might want to use two threads and two queues
    // (one for requests, one for timeouts).
    // Sigh.
    protected DelayQueue<RoleChangeTask> pendingTasks;
    protected long lastSubmitTime;
    protected Thread workerThread;
    protected long timeout;
    protected static long DEFAULT_TIMEOUT = 15L * 1000 * 1000 * 1000L; // 15s
    protected final static Logger log = LoggerFactory.getLogger(RoleChanger.class);

    /**
     * A queued task to be handled by the Role changer thread.
     */
    protected static class RoleChangeTask implements Delayed {
        protected enum Type {
            /**
             * This is a request. Dispatch the role update to switches
             */
            REQUEST,
            /**
             * This is a timeout task. Check if all switches have
             * correctly replied to the previously dispatched role request
             */
            TIMEOUT
        }

        // The set of switches to work on
        public Collection<OFSwitchImpl> switches;
        public Role role;
        public Type type;
        // the time when the task should run as nanoTime()
        public long deadline;

        public RoleChangeTask(Collection<OFSwitchImpl> switches, Role role, long deadline) {
            this.switches = switches;
            this.role = role;
            this.type = Type.REQUEST;
            this.deadline = deadline;
        }

        @Override
        public int compareTo(Delayed o) {
            Long timeRemaining = getDelay(TimeUnit.NANOSECONDS);
            return timeRemaining.compareTo(o.getDelay(TimeUnit.NANOSECONDS));
        }

        @Override
        public long getDelay(TimeUnit tu) {
            long timeRemaining = deadline - System.nanoTime();
            return tu.convert(timeRemaining, TimeUnit.NANOSECONDS);
        }
    }

    @LogMessageDoc(level = "ERROR",
            message = "RoleRequestWorker task had an uncaught exception.",
            explanation = "An unknown error occurred while processing an HA " +
                    "role change event.",
            recommendation = LogMessageDoc.GENERIC_ACTION)
    protected class RoleRequestWorker extends Thread {
        @Override
        public void run() {
            RoleChangeTask t;
            boolean interrupted = false;
            log.trace("RoleRequestWorker thread started");
            try {
                while (true) {
                    try {
                        t = pendingTasks.take();
                    } catch (InterruptedException e) {
                        // see http://www.ibm.com/developerworks/java/library/j-jtp05236/index.html
                        interrupted = true;
                        continue;
                    }
                    if (t.type == RoleChangeTask.Type.REQUEST) {
                        // The task's deadline doubles as the request cookie
                        sendRoleRequest(t.switches, t.role, t.deadline);
                        // Queue the timeout
                        t.type = RoleChangeTask.Type.TIMEOUT;
                        t.deadline += timeout;
                        pendingTasks.put(t);
                    } else {
                        verifyRoleReplyReceived(t.switches, t.deadline);
                    }
                }
            } catch (Exception e) {
                // Should never get here
                log.error("RoleRequestWorker task had an uncaught exception. ",
                        e);
            } finally {
                // Be nice in case we earlier caught InterruptedException
                if (interrupted)
                    Thread.currentThread().interrupt();
            }
        }
    }

    public RoleChanger() {
        this.pendingTasks = new DelayQueue<RoleChangeTask>();
        this.workerThread = new Thread(new RoleRequestWorker());
        this.timeout = DEFAULT_TIMEOUT;
        this.workerThread.start();
    }


    public synchronized void submitRequest(Collection<OFSwitchImpl> switches, Role role) {
        long deadline = System.nanoTime();
        // Grrr. stupid DelayQueue. Make sure we have at least 10ms between
        // role request messages.
        if (deadline - lastSubmitTime < 10 * 1000 * 1000)
            deadline = lastSubmitTime + 10 * 1000 * 1000;
        // make a copy of the list
        ArrayList<OFSwitchImpl> switches_copy = new ArrayList<OFSwitchImpl>(switches);
        RoleChangeTask req = new RoleChangeTask(switches_copy, role, deadline);
        pendingTasks.put(req);
        lastSubmitTime = deadline;
    }

    /**
     * Send a role request message to switches. This checks the capabilities
     * of the switch for understanding role request messaging. Currently we only
     * support the OVS-style role request message, but once the controller
     * supports OF 1.2, this function will also handle sending out the
     * OF 1.2-style role request message.
     *
     * @param switches the collection of switches to send the request to
     * @param role     the role to request
     * @param cookie   the cookie of the request
     */
    @LogMessageDoc(level = "WARN",
            message = "Failed to send role request message " +
                    "to switch {switch}: {message}. Disconnecting",
            explanation = "An I/O error occurred while attempting to change " +
                    "the switch HA role.",
            recommendation = LogMessageDoc.CHECK_SWITCH)
    protected void sendRoleRequest(Collection<OFSwitchImpl> switches,
                                   Role role, long cookie) {
        // There are three cases to consider:
        //
        // 1) If the controller role at the point the switch connected was
        //    null/disabled, then we never sent the role request probe to the
        //    switch and therefore never set the SWITCH_SUPPORTS_NX_ROLE
        //    attribute for the switch, so supportsNxRole is null. In that
        //    case since we're now enabling role support for the controller
        //    we should send out the role request probe/update to the switch.
        //
        // 2) If supportsNxRole == Boolean.TRUE then that means we've already
        //    sent the role request probe to the switch and it replied with
        //    a role reply message, so we know it supports role request
        //    messages. Now we're changing the role and we want to send
        //    it another role request message to inform it of the new role
        //    for the controller.
        //
        // 3) If supportsNxRole == Boolean.FALSE, then that means we sent the
        //    role request probe to the switch but it responded with an error
        //    indicating that it didn't understand the role request message.
        //    In that case we don't want to send it another role request that
        //    it (still) doesn't understand. But if the new role of the
        //    controller is SLAVE, then we don't want the switch to remain
        //    connected to this controller. It might support the older serial
        //    failover model for HA support, so we want to terminate the
        //    connection and get it to initiate a connection with another
        //    controller in its list of controllers. Eventually (hopefully, if
        //    things are configured correctly) it will walk down its list of
        //    controllers and connect to the current master controller.
        Iterator<OFSwitchImpl> iter = switches.iterator();
        while (iter.hasNext()) {
            OFSwitchImpl sw = iter.next();
            try {
                Boolean supportsNxRole = (Boolean)
                        sw.getAttribute(IOFSwitch.SWITCH_SUPPORTS_NX_ROLE);
                if ((supportsNxRole == null) || supportsNxRole) {
                    // Handle cases #1 and #2
                    log.debug("Sending NxRoleRequest to {}", sw);
                    sw.sendNxRoleRequest(role, cookie);
                } else {
                    if (role == Role.MASTER) {
                        // ONOS extension:
                        log.debug("Switch {} doesn't support NxRoleRequests, but sending " +
                                "{} request anyway", sw, role);
                        // Send the role request anyway, even though we know the switch
                        // doesn't support it. The switch will give an error and in our
                        // error handling code we will add the switch.
                        // NOTE we *could* just add the switch right away rather than
                        // going through the overhead of sending a role request - however
                        // we then have to deal with concurrency issues resulting from
                        // calling addSwitch outside of a netty handler.
                        sw.sendNxRoleRequest(role, cookie);
                    }
                    // Handle case #3
                    else if (role == Role.SLAVE) {
                        log.debug("Disconnecting switch {} that doesn't support " +
                                "role request messages from a controller that went to SLAVE mode");
                        // Closing the channel should result in a call to
                        // channelDisconnect which updates all state
                        sw.getChannel().close();
                        iter.remove();
                    }
                }
            } catch (IOException e) {
                log.warn("Failed to send role request message " +
                        "to switch {}: {}. Disconnecting",
                        sw, e);
                sw.getChannel().close();
                iter.remove();
            }
        }
    }

    /**
     * Verify that we have received a role reply from each switch for the
     * role request we sent earlier. Any switch whose first pending role
     * request still carries the given cookie is disconnected.
     *
     * @param switches the collection of switches to check
     * @param cookie   the cookie of the request
     */
    @LogMessageDoc(level = "WARN",
            message = "Timeout while waiting for role reply from switch {switch}."
                    + " Disconnecting",
            explanation = "Timed out waiting for the switch to respond to " +
                    "a request to change the HA role.",
            recommendation = LogMessageDoc.CHECK_SWITCH)
    protected void verifyRoleReplyReceived(Collection<OFSwitchImpl> switches,
                                           long cookie) {
        for (OFSwitchImpl sw : switches) {
            if (sw.checkFirstPendingRoleRequestCookie(cookie)) {
                sw.getChannel().close();
                log.warn("Timeout while waiting for role reply from switch {}."
                        + " Disconnecting", sw);
            }
        }
    }
}
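
The FIXME at the top of RoleChanger worries that a compareTo() built on getDelay() reads the clock twice and so cannot give a stable ordering. One possible way around this, sketched below as a standalone example and not as part of this class, is to compare the stored deadlines directly and break ties with a submission counter. The names DeadlineOrderingSketch, Task, seq and drainOrder are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a Delayed task ordered by its fixed deadline.
class DeadlineOrderingSketch {
    static class Task implements Delayed {
        private static final AtomicLong NEXT_SEQ = new AtomicLong();

        final String name;
        final long deadline; // absolute time, in System.nanoTime() units
        final long seq = NEXT_SEQ.getAndIncrement(); // submission order

        Task(String name, long deadline) {
            this.name = name;
            this.deadline = deadline;
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(deadline - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed o) {
            if (o == this) {
                return 0;
            }
            if (o instanceof Task) {
                Task other = (Task) o;
                // Compare the stored deadlines; no clock read is involved.
                // Use the sign of the difference, as nanoTime values may wrap.
                long diff = deadline - other.deadline;
                if (diff != 0) {
                    return diff < 0 ? -1 : 1;
                }
                // Equal deadlines: fall back to submission order.
                return Long.compare(seq, other.seq);
            }
            // Unknown Delayed type: fall back to comparing remaining delays.
            return Long.compare(getDelay(TimeUnit.NANOSECONDS),
                    o.getDelay(TimeUnit.NANOSECONDS));
        }
    }

    // Queues three tasks with the same (already expired) deadline and
    // returns their names in the order the queue hands them back.
    static List<String> drainOrder() {
        DelayQueue<Task> queue = new DelayQueue<Task>();
        long now = System.nanoTime();
        queue.put(new Task("first", now));
        queue.put(new Task("second", now));
        queue.put(new Task("third", now));

        List<String> order = new ArrayList<String>();
        Task t;
        while ((t = queue.poll()) != null) {
            order.add(t.name);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drainOrder()); // prints [first, second, third]
    }
}
```

With an ordering like this, tasks that share a deadline still come out in submission order, so the 10ms spacing in submitRequest() might no longer be needed. Whether that holds for RoleChanger itself is untested here, since the worker mutates a task's deadline before re-queueing it as a timeout.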