Problem
Nexus Repository 3 instances using AWS S3 blob stores may log the following stack trace as an error in nexus.log:
Caused by: java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:589)
    at org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:368)
    at com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:142)
    at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
    ... 167 common frames omitted
These socket connect timeout errors make access to blobs stored in the S3 blob store unstable and can cause build failures when builds request content from repositories that use these blob stores.
Explanation
Nexus Repository 3 presently uses AWS SDK v1 to communicate with any configured S3 bucket.
If a connection is cached inside the S3 connection pool for re-use, then the IP address mapped to the S3 bucket name is cached along with it. Since the connection was last used, the IP address mapped to the S3 bucket name may have changed (this is expected in an AWS environment). When Nexus Repository 3 then attempts to re-use the connection, a socket connect timeout error may occur while it tries to connect to the now-defunct IP address associated with that connection.
AWS recommends setting the Java system property networkaddress.cache.ttl on client JVMs, possibly to a value lower than the default, so that clients pick up changes in the IP addresses that S3 bucket names resolve to.
https://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/java-dg-jvm-ttl.html
The Java virtual machine (JVM) caches DNS name lookups. When the JVM resolves a hostname to an IP address, it caches the IP address for a specified period of time, known as the time-to-live (TTL).
Because AWS resources use DNS name entries that occasionally change, we recommend that you configure your JVM with a TTL value of no more than 60 seconds. This ensures that when a resource’s IP address changes, your application will be able to receive and use the resource’s new IP address by requerying the DNS.
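For a standalone Java client (not Nexus Repository itself, which is configured through files rather than code), the same TTL can also be applied programmatically through the java.security.Security API. The following is a minimal sketch under stated assumptions: the class name DnsCacheTtl is hypothetical, and the call is assumed to run at startup, before the JVM performs its first hostname lookup, because the cache policy is typically read only once.

```java
import java.security.Security;

final class DnsCacheTtl {
    // Security property that controls how long successful DNS lookups are cached.
    static final String TTL_PROPERTY = "networkaddress.cache.ttl";

    /** Sets the JVM-wide DNS cache TTL in seconds and returns the value now stored. */
    static String apply(int seconds) {
        if (seconds < 0) {
            // A negative value means "cache forever", which is what we want to avoid.
            throw new IllegalArgumentException("TTL must not be negative");
        }
        Security.setProperty(TTL_PROPERTY, Integer.toString(seconds));
        return Security.getProperty(TTL_PROPERTY);
    }

    public static void main(String[] args) {
        // 60 seconds is the upper bound recommended in the AWS guidance quoted above.
        System.out.println(TTL_PROPERTY + "=" + apply(60));
    }
}
```

The helper rejects negative values because they request indefinite caching, which would defeat the purpose of the setting.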
Nexus Repository 3 has shipped with this property set to 3600 seconds in <app-dir>/etc/karaf/system.properties, well above the 60 seconds that AWS recommends.
The DNS cache is not the only factor, however. The SDK keeps HTTP connections to S3 in a connection pool for re-use, and these connections are configurable through the SDK's Java APIs. One of the connection options is called "Connection Time to Live (TTL)", described here:
Connection Time to Live (TTL)
By default, the SDK will attempt to reuse HTTP connections as long as possible. In failure situations where a connection is established to a server that has been brought out of service, having a finite TTL can help with application recovery. For example, setting a 15 minute TTL will ensure that even if you have a connection established to a server that is experiencing issues, you’ll reestablish a connection to a new server within 15 minutes.
To set the HTTP connection TTL, use the ClientConfiguration.setConnectionTTL method.
The JavaDoc states:
Returns the expiration time (in milliseconds) for a connection in the connection pool. When retrieving a connection from the pool to make a request, the total time that the connection has been open is compared against this value. Connections which have been open for longer are discarded, and if needed a new connection is created.
Tuning this setting down (together with an appropriately-low setting for Java's DNS cache TTL) ensures that your application will quickly rotate over to new IP addresses when the service begins announcing them through DNS, at the cost of having to re-establish new connections more frequently. When a connection is retrieved from the connection pool, this parameter is checked to see if the connection can be reused.
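The behaviour the JavaDoc describes can be modelled in a few lines. The sketch below is not the SDK's implementation; TtlPool, Conn, acquire and release are hypothetical names, and timestamps are passed in explicitly so the expiry logic is easy to follow. Treating a negative TTL as "never expire" is also an assumption of this sketch.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TtlPool {
    /** A pooled connection that remembers its target IP and when it was opened. */
    static final class Conn {
        final String ip;
        final long openedAtMillis;

        Conn(String ip, long openedAtMillis) {
            this.ip = ip;
            this.openedAtMillis = openedAtMillis;
        }
    }

    private final Deque<Conn> idle = new ArrayDeque<>();
    private final long ttlMillis;

    TtlPool(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns a connection to the pool so that it can be re-used later. */
    void release(Conn conn) {
        idle.push(conn);
    }

    /**
     * Hands out an idle connection that has been open for less than the TTL.
     * Older connections are discarded, and if none is left a new connection
     * is opened to the IP address that DNS currently reports.
     */
    Conn acquire(long nowMillis, String currentIp) {
        while (!idle.isEmpty()) {
            Conn candidate = idle.pop();
            boolean neverExpires = ttlMillis < 0;
            if (neverExpires || nowMillis - candidate.openedAtMillis < ttlMillis) {
                return candidate;
            }
            // Expired: drop it and look at the next idle connection.
        }
        return new Conn(currentIp, nowMillis);
    }
}
```

With a 60 second TTL, a connection opened at time 0 is still re-used at 30 seconds, but a request at 61 seconds discards it and opens a new connection to whatever address the bucket name resolves to at that point. Without a TTL, the pool would keep handing out the connection bound to the old address.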
Solution
To mitigate this, we suggest adding the following lines to the <app-dir>/bin/nexus.vmoptions file. The value is always in seconds.
# vvv added to address NEXUS-28266
-Dnetworkaddress.cache.ttl=60
# ^^^ added to address NEXUS-28266
Then edit <data-dir>/etc/nexus.properties and add the following lines, which set the connection TTL to 60 seconds. The "s" suffix is required to indicate seconds:
# vvv added to address NEXUS-28266
nexus.s3.connection.ttl=60s
# ^^^ added to address NEXUS-28266
Nexus Repository 3 must be restarted for these changes to take effect.
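Before restarting, the edited nexus.properties content can be sanity-checked for a missing "s" suffix. The sketch below is only an illustration: S3TtlCheck and ttlSeconds are hypothetical names, and the accepted pattern (digits followed by "s") is an assumption based solely on the suffix rule stated above, not on how Nexus Repository parses the value internally.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class S3TtlCheck {
    static final String KEY = "nexus.s3.connection.ttl";

    /**
     * Returns the configured TTL in seconds when the property is present and
     * written as digits followed by "s" (for example "60s"), otherwise -1.
     */
    static int ttlSeconds(String propertiesText) {
        Properties props = new Properties();
        try {
            // Lines starting with '#' are treated as comments by Properties.load.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY);
        if (value == null) {
            return -1;
        }
        String trimmed = value.trim();
        if (!trimmed.matches("\\d{1,9}s")) {
            return -1; // missing suffix or not a plain number of seconds
        }
        return Integer.parseInt(trimmed.substring(0, trimmed.length() - 1));
    }
}
```

Passing the three lines shown above for nexus.properties yields 60, while a value written as 60 without the suffix yields -1.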