[SDFAB-1015] Use gNOI to probe reachability and increase short timeouts
The P4Runtime reachability probe is based on the GetPipelineConfig gRPC, which
can time out if the pipeline is being set in parallel: the two requests
contend for the same lock. For our purposes it is enough to check whether the
device is still there; for this reason the Stratum handshaker now relies
on the gNOI reachability probe, which is based on the getTime RPC.
Additionally, increase the short timeout from 10s to 15s: we have consistently
measured a time of 14s to push the pipeline on the new QS devices.
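The contention described above can be illustrated with a small standalone sketch. It is not ONOS or Stratum code: the class ProbeContentionSketch, its methods, and the single ReentrantLock standing in for the device-side pipeline lock are all hypothetical, and the durations are scaled down to milliseconds. It only shows why a probe that needs the same lock as an in-progress pipeline push can time out, while a probe that does not touch the lock (as a getTime-style call is assumed to behave) returns immediately.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class ProbeContentionSketch {
    // Hypothetical stand-in for the lock shared by pipeline set and get requests.
    private final ReentrantLock pipelineLock = new ReentrantLock();

    /** Simulates a pipeline push that holds the lock for holdMillis. */
    public CompletableFuture<Void> pushPipeline(long holdMillis) throws InterruptedException {
        CountDownLatch lockHeld = new CountDownLatch(1);
        CompletableFuture<Void> done = CompletableFuture.runAsync(() -> {
            pipelineLock.lock();
            try {
                lockHeld.countDown();
                Thread.sleep(holdMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                pipelineLock.unlock();
            }
        });
        // Return only once the push really holds the lock.
        lockHeld.await();
        return done;
    }

    /** Probe that needs the pipeline lock; reports false if it cannot get it in time. */
    public boolean probeViaPipelineRead(long timeoutMillis) throws InterruptedException {
        if (!pipelineLock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return false;
        }
        pipelineLock.unlock();
        return true;
    }

    /** Probe that never touches the pipeline lock (assumed getTime-like behaviour). */
    public boolean probeViaTime() {
        return System.currentTimeMillis() > 0;
    }

    public static void main(String[] args) throws Exception {
        ProbeContentionSketch device = new ProbeContentionSketch();
        CompletableFuture<Void> push = device.pushPipeline(500);
        System.out.println("lock-based probe during push: " + device.probeViaPipelineRead(100));
        System.out.println("lock-free probe during push: " + device.probeViaTime());
        push.join();
        System.out.println("lock-based probe after push: " + device.probeViaPipelineRead(100));
    }
}
```

In this sketch the lock-based probe fails only while the simulated push is in progress, which mirrors the false "unreachable" result the handshaker could otherwise see during a pipeline push.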
Change-Id: I8837540241d8a68f648e47ae165ea53a2d0a865c
diff --git a/drivers/stratum/src/main/java/org/onosproject/drivers/stratum/StratumHandshaker.java b/drivers/stratum/src/main/java/org/onosproject/drivers/stratum/StratumHandshaker.java
index ac9a534..7b41f85 100644
--- a/drivers/stratum/src/main/java/org/onosproject/drivers/stratum/StratumHandshaker.java
+++ b/drivers/stratum/src/main/java/org/onosproject/drivers/stratum/StratumHandshaker.java
@@ -64,7 +64,12 @@
@Override
public CompletableFuture<Boolean> probeReachability() {
- return p4runtime.probeReachability();
+ // The P4Runtime reachability probe is based on the GetPipelineConfig gRPC,
+ // which can time out if the pipeline is being set in parallel: the two
+ // requests can contend for the same lock. For our purposes it is enough to
+ // check whether the device is still there, so the Stratum handshaker relies
+ // on the gNOI reachability probe, which is based on the getTime RPC.
+ return gnoi.probeReachability();
}
@Override
diff --git a/protocols/p4runtime/ctl/src/main/java/org/onosproject/p4runtime/ctl/client/P4RuntimeClientImpl.java b/protocols/p4runtime/ctl/src/main/java/org/onosproject/p4runtime/ctl/client/P4RuntimeClientImpl.java
index fdf5a94..5e29d62 100644
--- a/protocols/p4runtime/ctl/src/main/java/org/onosproject/p4runtime/ctl/client/P4RuntimeClientImpl.java
+++ b/protocols/p4runtime/ctl/src/main/java/org/onosproject/p4runtime/ctl/client/P4RuntimeClientImpl.java
@@ -55,10 +55,12 @@
private static final long DEFAULT_P4_DEVICE_ID = 1;
// TODO: consider making timeouts configurable per-device via netcfg
+ // We have measured that some devices can take up to 15s to push a pipeline,
+ // which can potentially block other READs issued against the target.
/**
* Timeout in seconds for short/fast RPCs.
*/
- static final int SHORT_TIMEOUT_SECONDS = 10;
+ static final int SHORT_TIMEOUT_SECONDS = 15;
/**
* Timeout in seconds for RPCs that involve transfer of potentially large
* amount of data. This shoulld be long enough to allow for network delay