Quality of Service¶
Ocelot currently supports a single Quality of Service (QoS) capability. It allows you to configure, on a per-route basis, the application of a circuit breaker when making requests to downstream services. This feature leverages a well-regarded .NET library known as Polly. For more details, visit the Polly library’s official repository.
Note
Polly v7 syntax is no longer supported as of version 23.2, when the Ocelot team upgraded Polly from v7 to v8.
Installation¶
To use Quality of Service via the Polly library, begin by installing the Ocelot.Provider.Polly extension package:
Install-Package Ocelot.Provider.Polly
Next, in your Program, incorporate Polly services by invoking the AddPolly() extension on the OcelotBuilder, as shown below [1]:
using Ocelot.Provider.Polly;
builder.Services
.AddOcelot(builder.Configuration)
.AddPolly();
QoSOptions Schema¶
Class: FileQoSOptions
Here is the complete Quality of Service configuration, also known as the “QoS options schema”. Depending on your needs and chosen strategies, you do not have to define every property. If you skip a property, a default value is substituted as per the Ocelot/Polly specification.
"QoSOptions": {
// Circuit Breaker strategy
"BreakDuration": 0, // integer
"MinimumThroughput": 0, // integer
"FailureRatio": 0.0, // floating number
"SamplingDuration": 0, // integer
// Timeout strategy
"Timeout": 0, // integer
// Deprecated options
"DurationOfBreak": 0, // deprecated! -> use BreakDuration
"ExceptionsAllowedBeforeBreaking": 0, // deprecated! -> use MinimumThroughput
"TimeoutValue": 0, // deprecated! -> use Timeout
}
| Ocelot Option and Polly equivalent | Description |
|---|---|
| BreakDuration | The duration the circuit stays open before resetting. The unit is milliseconds. |
| MinimumThroughput | This number of actions or more must pass through the circuit within the time slice for the statistics to be considered significant and for the circuit breaker to engage. |
| FailureRatio | The failure-to-success ratio at which the circuit will break. |
| SamplingDuration | The duration of the sampling window over which failure ratios are assessed. The unit is milliseconds. |
| Timeout | The default timeout. The unit is milliseconds. |
Warning
The following options are deprecated in version 24.1: DurationOfBreak, ExceptionsAllowedBeforeBreaking, and TimeoutValue!
Use the appropriate new options as shown in the table above.
These deprecated options will be removed in version 25.0.
For backward compatibility in version 24.1, a deprecated option takes precedence over its replacement.
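As an illustration of the renaming described in this warning, the following fragment sketches a migration from the deprecated option names to their replacements. The numeric values are arbitrary examples rather than recommendations, and the mapping simply follows the "use ..." hints in the schema above.

```json
// Before: deprecated names (still honored in 24.1, to be removed in 25.0)
"QoSOptions": {
  "DurationOfBreak": 1000,              // ms
  "ExceptionsAllowedBeforeBreaking": 3,
  "TimeoutValue": 5000                  // ms
}

// After: replacement names
"QoSOptions": {
  "BreakDuration": 1000,     // replaces DurationOfBreak
  "MinimumThroughput": 3,    // replaces ExceptionsAllowedBeforeBreaking
  "Timeout": 5000            // replaces TimeoutValue
}
```

Because a deprecated option takes precedence over its replacement in version 24.1, it is probably safest not to leave both names in the same QoSOptions section during migration.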
Note [2]: Ocelot checks that the values of options are valid during execution. If not, it logs errors or warnings (refer to the Value constraints section in Notes). For a complete explanation about strategies and mechanisms, consult Polly’s Resilience strategies documentation.
Global Configuration [3]¶
According to the Global Configuration Schema, global Quality of Service options for static routes were introduced in version 24.1.
These global options can also be overridden in the Routes configuration section, a capability that has been supported for a long time.
{
  "Routes": [
    {
      "Key": "R0", // optional
      "QoSOptions": {
        "Timeout": 15000 // 15s
      },
      // ...
    },
    {
      "Key": "R1", // this route is part of a group
      "QoSOptions": {}, // optional due to grouping
      // ...
    }
  ],
  "GlobalConfiguration": {
    "BaseUrl": "https://ocelot.net",
    "QoSOptions": {
      "RouteKeys": ["R1"], // if undefined or empty array, opts will apply to all routes
      "BreakDuration": 1000, // 1s
      "MinimumThroughput": 3
    },
    // ...
  }
}
Dynamic routes were not supported in versions prior to 24.1.
However, global Quality of Service options have been available in Dynamic Routing mode for a long time.
Starting with version 24.1, global QoS options can also be overridden in the DynamicRoutes configuration section, as defined by the Dynamic Route Schema.
{
  "DynamicRoutes": [
    {
      "Key": "", // optional
      "ServiceName": "my-service",
      "QoSOptions": {
        "Timeout": 15000 // 15s
      }
    }
  ],
  "GlobalConfiguration": {
    "BaseUrl": "https://ocelot.net",
    "DownstreamScheme": "http",
    "ServiceDiscoveryProvider": {
      // required section for dynamic routing
    },
    "QoSOptions": {
      "RouteKeys": [], // or null, no grouping, thus opts apply to all dynamic routes
      "BreakDuration": 1000, // 1s
      "MinimumThroughput": 3,
      "FailureRatio": 0.1, // 10%
      "SamplingDuration": 30000 // 30s
    }
  }
}
In this dynamic routing configuration, the Timeout strategy is applied to the my-service service in addition to the Circuit Breaker strategy, resulting in Polly timing out after 15 seconds.
However, for all implicit dynamic routes, the Timeout strategy is not globally configured, in favor of the standard Timeout option managed by the Ocelot Core requester middleware.
Lastly, the Circuit Breaker strategy has been globally configured for all routes due to the absence of route grouping, with the following options:
allow 3 errors before breaking the circuit for 1 second, and allow up to 10% errors during the default 30-second sampling period.
Note
1. Please note that route-level options take precedence over global options.
2. If the RouteKeys option is not defined or the array is empty in the global QoSOptions, the global options will apply to all routes.
If the array contains route keys, it defines a single group of routes to which the global options apply.
Routes excluded from this group must specify their own route-level QoSOptions.
3. Since Ocelot’s Polly provider utilizes the Resilience pipeline registry, each route has a dedicated pipeline cached in Polly’s registry using the route’s load-balancing key. For a static route, the load-balancing key uniquely identifies the route by its upstream options, whereas for dynamic routes the load-balancing key is typically the service name from the discovery provider. Thus, Polly’s registry maintains dedicated pipelines for each discovered service, and those pipelines behave independently. Finally, it is important to understand that global QoS options do not create a single shared resilience pipeline in the registry.
4. Dynamic routes were not supported in versions prior to 24.1.
Beginning with version 24.1, global QoS options for Dynamic Routing may be overridden in the DynamicRoutes configuration section, as defined by the Dynamic Route Schema.
Additionally, global configuration for static routes (also known as Routes) has been supported since version 24.1.
Circuit Breaker strategy¶
Documentation: Circuit breaker resilience strategy
Primary option: MinimumThroughput, formerly ExceptionsAllowedBeforeBreaking
The options MinimumThroughput and BreakDuration can be configured independently from Timeout:
"QoSOptions": {
"MinimumThroughput": 3,
"BreakDuration": 1000 // ms
}
Alternatively, you can omit BreakDuration, which will default to the implicit 5-second setting as specified in Polly’s BreakDuration documentation:
"QoSOptions": {
"MinimumThroughput": 3
}
This setup activates only the Circuit breaker resilience strategy.
Additionally, there is a failure handling strategy based on FailureRatio, which serves as a counterpart to, or supplement for, the number of failures, also known as MinimumThroughput.
"QoSOptions": {
"MinimumThroughput": 10,
"FailureRatio": 0.5, // 50%
"SamplingDuration": 10000 // ms, 10 seconds
}
Thus, a failure ratio of 0.5 indicates that the circuit will break if 50% or more of actions result in handled failures, once at least 10 actions (the MinimumThroughput option) have passed through the circuit.
Additionally, the 10-second sampling duration defines the time window over which both the throughput and the 50% failure ratio are evaluated.
Note: The MinimumThroughput option (also known as Polly’s MinimumThroughput) is the primary option that enables the Circuit Breaker strategy. Its value must be valid (set to 2 or greater, refer to the Value constraints section in Notes) and may be supplemented with additional Circuit Breaker options.
Timeout strategy¶
Documentation: Timeout resilience strategy
Primary option: Timeout, formerly TimeoutValue
The Timeout can be configured independently from the options of the Circuit Breaker strategy:
"QoSOptions": {
"Timeout": 5000 // ms
}
This setup activates only the Timeout resilience strategy.
To configure a global QoS timeout using the Timeout strategy for all routes (both static and dynamic), set the Timeout option as defined in the Global Configuration Schema:
"GlobalConfiguration": {
// other global props
"QoSOptions": {
"Timeout": 10000 // ms, 10 seconds
}
}
Please note that the route-level timeout takes precedence over the global timeout. For example, a route timeout may be shorter, while the global timeout can be longer and apply to all routes.
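The precedence rule can be sketched as the following configuration, where one route sets a shorter QoS timeout than the global one. The route key "R0" and both values are illustrative assumptions; routes without their own QoS Timeout would fall back to the global 10 seconds.

```json
{
  "Routes": [
    {
      "Key": "R0", // hypothetical route key
      "QoSOptions": {
        "Timeout": 3000 // ms, route-level value wins for this route
      }
      // ...
    }
  ],
  "GlobalConfiguration": {
    "QoSOptions": {
      "Timeout": 10000 // ms, applies to routes without their own QoS Timeout
    }
  }
}
```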
Note: There are Value constraints for Timeout: it must be a positive number starting from 1 millisecond to enable the Timeout strategy. If Timeout is undefined, zero, or a negative number, the Timeout strategy will not be added to the resilience pipeline. Also keep in mind Polly’s Timeout constraint: Ocelot validates the Timeout value, and if it violates Polly’s requirements, it is rolled back to the default of 30 seconds.
Notes¶
Absolute timeout [4]¶
If a QoS section is not included, QoS will not be applied, and Ocelot will enforce an absolute timeout of 90 seconds (defined by the DownstreamRoute DefTimeout constant) for all downstream requests.
This absolute timeout is configurable via the DownstreamRoute DefaultTimeoutSeconds static C# property.
For more information, refer to the Default timeout section of the Configuration chapter.
Value constraints¶
Starting with Polly v8, the Resilience strategies documentation outlines the following constraints on values:
The BreakDuration value must exceed 500 milliseconds and be less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 5000 milliseconds (5 seconds); refer to the BreakDuration documentation.
The MinimumThroughput value must be 2 or greater. If unspecified or invalid, it defaults to 100 actions; refer to the MinimumThroughput documentation.
The FailureRatio must be greater than 0.0 and no more than 1.0. If unspecified or invalid, it defaults to 0.1 (10%); refer to the FailureRatio documentation.
The SamplingDuration value must exceed 500 milliseconds and be less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 30000 milliseconds (30 seconds); refer to the SamplingDuration documentation.
The Timeout must be greater than 10 milliseconds and less than 24 hours (1 day = 86 400 000 milliseconds). If unspecified or invalid, it defaults to 30000 milliseconds (30 seconds); refer to the Timeout documentation. Note that when both route-level and global QoS timeouts have positive values but are invalid, a default value is automatically substituted from the DefaultTimeout static C# property of the TimeoutStrategy class, which can also be configured in your Program.
Ocelot logs warnings containing failed validation messages for all options, but it does not block Ocelot startup, even when QoS options are invalid. Inspect your logs for these messages and adjust your configuration if necessary.
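For orientation, here is a sketch of a QoSOptions section in which every value falls inside the ranges listed above. The specific numbers are arbitrary examples chosen only to satisfy the constraints, not recommended settings.

```json
"QoSOptions": {
  "BreakDuration": 5000,      // ms, more than 500 and less than 86 400 000
  "MinimumThroughput": 10,    // 2 or greater
  "FailureRatio": 0.25,       // greater than 0.0 and no more than 1.0
  "SamplingDuration": 30000,  // ms, more than 500 and less than 86 400 000
  "Timeout": 8000             // ms, more than 10 and less than 86 400 000
}
```

A value outside these ranges, such as a BreakDuration of 100, should produce a validation warning in the logs rather than a startup failure, as described above.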
QoS and route (global) timeouts¶
The Timeout option in QoS always takes precedence over the route Timeout property, so the route Timeout will be ignored in favor of the QoS Timeout.
In Ocelot Core, the QoS Timeout and the route (configuration) Timeout are not intended to be used together.
Moreover, there is an Ocelot Core design constraint: if the route or global Timeout duration is shorter than the QoS Timeout, you may encounter warning messages in the logs that begin with the following sentence:
Route '/xxx' has Quality of Service settings (QoSOptions) enabled, but either the route Timeout or the QoS Timeout is misconfigured: ...
This warning means that the route or global timeout will occur before the QoS Timeout strategy has a chance to handle its own timeout event, which is configured with a longer duration.
Technically, this situation effectively disables Polly’s Timeout resilience strategy.
Ocelot handles this misconfiguration by logging a warning and automatically applying a longer timeout to the TimeoutDelegatingHandler in order to effectively unblock the QoS Timeout strategy.
To avoid this warning, ensure that your QoS timeouts are shorter than the route or global timeouts, or remove the Timeout property from routes where QoS is enabled with the Timeout option.
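A configuration that should avoid the warning might look like the sketch below. It assumes the route-level Timeout is expressed in seconds (an assumption suggested by the DefaultTimeoutSeconds property name; confirm the unit in the Configuration chapter), while the QoS Timeout is in milliseconds as documented above. Both values are illustrative.

```json
{
  "Routes": [
    {
      "Timeout": 30, // route timeout, assumed to be in seconds
      "QoSOptions": {
        "Timeout": 5000 // QoS timeout in ms (5s), shorter than the route timeout
      }
      // ...
    }
  ]
}
```

With these values, the QoS Timeout strategy would fire well before the route timeout, so the Polly strategy keeps control of the timeout event.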
Global and default QoS timeouts¶
If a route-level QoS timeout is undefined, the global Timeout takes precedence over the default timeout (30 seconds, see the Timeout docs).
This means the global QoS timeout can override Polly’s default of 30 seconds via the Global Configuration Schema.
Extensibility [5]¶
To use your own ResiliencePipeline<T> provider, you can apply the following syntax:
builder.Services
.AddOcelot(builder.Configuration)
.AddPolly<MyProvider>();
// MyProvider should implement IPollyQoSResiliencePipelineProvider<HttpResponseMessage>
// Note: you can use standard provider PollyQoSResiliencePipelineProvider
Additionally, if you want to utilize your own DelegatingHandler, the following syntax can be applied:
builder.Services
.AddOcelot(builder.Configuration)
.AddPolly<MyProvider>(MyQosDelegatingHandlerDelegate);
// MyQosDelegatingHandlerDelegate is a delegate used to get a DelegatingHandler. Refer to Ocelot's PollyResiliencePipelineDelegatingHandler
Finally, to define your own set of exceptions for mapping, you can apply the following syntax:
static Error CreateError(Exception e) => new RequestTimedOutError(e);
Dictionary<Type, Func<Exception, Error>> MyErrorMapping = new()
{
{typeof(TaskCanceledException), CreateError},
{typeof(TimeoutRejectedException), CreateError},
{typeof(BrokenCircuitException), CreateError},
{typeof(BrokenCircuitException<HttpResponseMessage>), CreateError},
};
builder.Services
.AddOcelot(builder.Configuration)
.AddPolly<MyProvider>(MyErrorMapping);
// Note: Default error mapping is defined in the DefaultErrorMapping field of the Ocelot.Provider.Polly.OcelotBuilderExtensions class