SocketTimeoutException connect timed out when accessing S3 buckets using S3 blobstores

Problem

Nexus Repository 3 using AWS S3 blobstores may report the following stack trace as an error in the nexus.log:

Caused by: java.net.SocketTimeoutException: connect timed out
 at java.net.PlainSocketImpl.socketConnect(Native Method)
 at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
 at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
 at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
 at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
 at java.net.Socket.connect(Socket.java:589)
 at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
 at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:142)
 at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
 ... 167 common frames omitted


These socket connect timeouts make access to blobs in the S3 blobstore unreliable and can cause build failures when builds request content from repositories that use these blobstores.

Explanation

Nexus Repository 3 presently uses AWS SDK v1 to communicate with any configured S3 bucket.

If a connection is cached in the S3 connection pool for reuse, the IP address that the S3 bucket name resolved to is effectively cached with it. The IP address mapped to the bucket name may have changed since the connection was last used (this is expected in an AWS environment). When Nexus Repository 3 then tries to reuse the connection, a socket connect timeout can occur while it attempts to connect to the now-defunct IP address associated with that connection.

AWS recommends that the Java system property networkaddress.cache.ttl be set on client JVMs, possibly to a value lower than the default, so that clients pick up changes in the IP addresses that S3 bucket names resolve to.

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html

The Java virtual machine (JVM) caches DNS name lookups. When the JVM resolves a hostname to an IP address, it caches the IP address for a specified period of time, known as the time-to-live (TTL).

Because AWS resources use DNS name entries that occasionally change, we recommend that you configure your JVM with a TTL value of no more than 60 seconds. This ensures that when a resource’s IP address changes, your application will be able to receive and use the resource’s new IP address by requerying the DNS.
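For a standalone Java client, the AWS page linked above shows this TTL being set programmatically through java.security.Security.setProperty. The following is a minimal sketch of that approach, not a description of how Nexus Repository 3 applies the setting (see the Solution section for that). The class and method names are hypothetical.

```java
import java.security.Security;

public class DnsCacheTtlExample {
    // Name of the JVM DNS cache TTL property discussed in this article.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    /**
     * Sets the DNS cache TTL (in seconds) and returns the value now in effect
     * as reported by the Security API. This should run early, before the JVM
     * performs its first hostname lookup.
     */
    public static String applyDnsCacheTtl(int seconds) {
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
        return Security.getProperty(TTL_PROPERTY);
    }

    public static void main(String[] args) {
        // 60 seconds is the upper bound suggested in the AWS guidance quoted above.
        System.out.println(TTL_PROPERTY + "=" + applyDnsCacheTtl(60));
    }
}
```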

Nexus Repository 3 has shipped with this property set to 3600 seconds in <app-dir>/etc/karaf/system.properties, well above the 60 seconds AWS recommends.

However, the DNS cache is not the only factor. The SDK keeps HTTP connections to S3 in a connection pool and reuses them, and these connections are configurable through the SDK's Java APIs. One of the connection options is called "Connection Time to Live (TTL)", described here:

https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/section-client-configuration.html#http-transport-configuration

Connection Time to Live (TTL)

By default, the SDK will attempt to reuse HTTP connections as long as possible. In failure situations where a connection is established to a server that has been brought out of service, having a finite TTL can help with application recovery. For example, setting a 15 minute TTL will ensure that even if you have a connection established to a server that is experiencing issues, you’ll reestablish a connection to a new server within 15 minutes.

To set the HTTP connection TTL, use the ClientConfiguration.setConnectionTTL method.

The JavaDoc states:

Returns the expiration time (in milliseconds) for a connection in the connection pool. When retrieving a connection from the pool to make a request, the total time that the connection has been open is compared against this value. Connections which have been open for longer are discarded, and if needed a new connection is created.

Tuning this setting down (together with an appropriately-low setting for Java's DNS cache TTL) ensures that your application will quickly rotate over to new IP addresses when the service begins announcing them through DNS, at the cost of having to re-establish new connections more frequently.

When a connection is retrieved from the connection pool, this parameter is checked to see if the connection can be reused.
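The check described in the JavaDoc can be illustrated with a small, simplified model. This sketch is not the AWS SDK's implementation; the class, its fields, and the example IP addresses are hypothetical, and time is passed in explicitly so the behavior is easy to follow. It shows only the rule that a pooled connection older than the TTL is discarded and replaced by a new connection to whatever address the name resolves to at that moment.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ConnectionTtlSketch {
    /** Hypothetical stand-in for a pooled HTTP connection. */
    public static final class PooledConnection {
        public final String ipAddress;     // address the connection was opened to
        public final long openedAtMillis;  // when the connection was opened

        public PooledConnection(String ipAddress, long openedAtMillis) {
            this.ipAddress = ipAddress;
            this.openedAtMillis = openedAtMillis;
        }
    }

    private final Deque<PooledConnection> idle = new ArrayDeque<>();
    private final long ttlMillis;

    public ConnectionTtlSketch(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns a connection to the pool for later reuse. */
    public void release(PooledConnection connection) {
        idle.addLast(connection);
    }

    /**
     * Hands out an idle connection if it has been open no longer than the TTL.
     * Older connections are discarded; if none qualifies, a new connection is
     * opened to the currently resolved IP address.
     */
    public PooledConnection acquire(long nowMillis, String currentIp) {
        while (!idle.isEmpty()) {
            PooledConnection candidate = idle.pollFirst();
            if (nowMillis - candidate.openedAtMillis <= ttlMillis) {
                return candidate; // still within its TTL, reuse it
            }
            // Open for longer than the TTL: drop it and keep looking.
        }
        return new PooledConnection(currentIp, nowMillis);
    }

    public static void main(String[] args) {
        ConnectionTtlSketch pool = new ConnectionTtlSketch(60_000L);
        pool.release(new PooledConnection("192.0.2.1", 0L));

        PooledConnection early = pool.acquire(30_000L, "192.0.2.2");
        System.out.println("after 30s: " + early.ipAddress);
        pool.release(early);

        PooledConnection late = pool.acquire(90_000L, "192.0.2.2");
        System.out.println("after 90s: " + late.ipAddress);
    }
}
```

With no TTL, the pooled connection in this model would keep being handed out with the old address, which corresponds to the stale-connection scenario described in the Explanation above.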

Solution

This symptom and its complete diagnosis are documented in this issue:
 
https://issues.sonatype.org/browse/NEXUS-28266
 
To mitigate this, add the following lines to the <app-dir>/bin/nexus.vmoptions file. The value is always in seconds.

# vvv added to address https://issues.sonatype.org/browse/NEXUS-28266
-Dnetworkaddress.cache.ttl=60
# ^^^ added to address https://issues.sonatype.org/browse/NEXUS-28266

Then edit <data-dir>/etc/nexus.properties and add the following lines to set the equivalent of 60 seconds. The "s" suffix is required to indicate seconds:

# vvv added to address https://issues.sonatype.org/browse/NEXUS-28266
nexus.s3.connection.ttl=60s
# ^^^ added to address https://issues.sonatype.org/browse/NEXUS-28266

Restart Nexus Repository 3 for these changes to take effect.
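A small check of the nexus.properties entry can catch a missing "s" suffix. The following is a minimal sketch with hypothetical class and method names. It accepts only the form shown in this article (digits followed by "s"); whether Nexus Repository 3 accepts other formats is not covered here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class S3TtlConfigCheck {
    // Property name taken from the nexus.properties snippet in this article.
    static final String TTL_KEY = "nexus.s3.connection.ttl";

    /**
     * Returns true if the given properties-file content sets the S3 connection
     * TTL as a whole number followed by "s", for example "60s".
     */
    public static boolean hasSecondsTtl(String propertiesContent) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(TTL_KEY);
        return value != null && value.trim().matches("\\d+s");
    }

    public static void main(String[] args) {
        System.out.println(hasSecondsTtl("nexus.s3.connection.ttl=60s\n")); // suffix present
        System.out.println(hasSecondsTtl("nexus.s3.connection.ttl=60\n"));  // suffix missing
    }
}
```

In practice the content would be read from <data-dir>/etc/nexus.properties, for example with java.nio.file.Files.readString.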
