Symptom
Repository 3 will not start. The nexus.log contains Elasticsearch messages reporting "Too many open files" while processing "translog" files, even though the host and the process user have been allocated the Sonatype recommended open file limits. Example messages from the log file:
2021-11-18 20:33:17,471+0000 WARN [elasticsearch[9ABF7C4C-519835A7-11A56AA3-61A039CE-A8A3617F][generic][T#2]] *SYSTEM org.elasticsearch.cluster.action.shard - [9ABF7C4C-519835A7-11A56AA3-61A039CE-A8A3617F] [5ee1789646d644f752c824e160d3d84882ee3c6b][0] received shard failed for target shard [[5ee1789646d644f752c824e160d3d84882ee3c6b][0], node[Jhd9TnXsRUu-m7BDnl7Gmw], [P], v[3251], s[INITIALIZING], a[id=hLc90y0jQXOZunsOzFkNLQ], unassigned_info[[reason=ALLOCATION_FAILED], at[2021-11-18T20:33:17.174Z], details[failed recovery, failure IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/nexus-data/elasticsearch/nexus/nodes/0/indices/5ee1789646d644f752c824e160d3d84882ee3c6b/0/translog/translog-1348.ckp: Too many open files]; ]]], indexUUID [i3-vCF1KSfqoB1jAxNiPxQ], message [failed recovery], failure [IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: FileSystemException[/nexus-data/elasticsearch/nexus/nodes/0/indices/5ee1789646d644f752c824e160d3d84882ee3c6b/0/translog/translog-104.ckp: Too many open files]; ]
org.elasticsearch.index.shard.IndexShardRecoveryException: failed to recovery from gateway
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:250)
at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.index.engine.EngineCreationFailureException: failed to create engine
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:152)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1513)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1497)
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:970)
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:942)
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)
... 5 common frames omitted
Caused by: java.nio.file.FileSystemException: /nexus-data/elasticsearch/nexus/nodes/0/indices/5ee1789646d644f752c824e160d3d84882ee3c6b/0/translog/translog-104.ckp: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at org.elasticsearch.index.translog.Checkpoint.read(Checkpoint.java:82)
at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:330)
at org.elasticsearch.index.translog.Translog.<init>(Translog.java:179)
at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:205)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:148)
... 11 common frames omitted
2021-11-18 20:33:13,467+0000 WARN [elasticsearch[9ABF7C4C-519835A7-11A56AA3-61A039CE-A8A3617F][generic][T#3]] *SYSTEM org.elasticsearch.index.translog - [9ABF7C4C-519835A7-11A56AA3-61A039CE-A8A3617F] [150b810c0368acf6af2eb4523c84cf1cf740cd8d][0] failed to delete unreferenced translog files
java.nio.file.FileSystemException: /nexus-data/elasticsearch/nexus/nodes/0/indices/150b810c0368acf6af2eb4523c84cf1cf740cd8d/0/translog: Too many open files
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:91)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newDirectoryStream(UnixFileSystemProvider.java:427)
at java.nio.file.Files.newDirectoryStream(Files.java:457)
at org.elasticsearch.index.translog.Translog$OnCloseRunnable.handle(Translog.java:726)
at org.elasticsearch.index.translog.Translog$OnCloseRunnable.handle(Translog.java:714)
at org.elasticsearch.index.translog.ChannelReference.closeInternal(ChannelReference.java:67)
at org.elasticsearch.common.util.concurrent.AbstractRefCounted.decRef(AbstractRefCounted.java:64)
at org.elasticsearch.index.translog.TranslogReader.close(TranslogReader.java:143)
at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:129)
at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:354)
at org.elasticsearch.index.translog.Translog.<init>(Translog.java:179)
at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:205)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:148)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1513)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1497)
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:970)
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:942)
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)
at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Diagnosis
Elasticsearch translog files may be leaking because of previous incomplete shutdowns or known Elasticsearch component bugs. Once this happens, the Elasticsearch indexes should be considered corrupt and not recoverable.
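One way to check whether translog files have accumulated is to count the files in each "translog" directory under the Elasticsearch data directory. The following Python sketch is an illustration only: the function name count_translog_files is hypothetical, and the directory layout is assumed to match the paths shown in the example log messages.

```python
import os
from collections import Counter


def count_translog_files(es_dir):
    """Return a Counter mapping each 'translog' directory to its file count.

    es_dir is assumed to be the "elasticsearch" sub-directory of the data
    directory, e.g. /nexus-data/elasticsearch in the example messages.
    """
    counts = Counter()
    for root, _dirs, files in os.walk(es_dir):
        # Each shard keeps its transaction log in a directory named "translog".
        if os.path.basename(root) == "translog":
            counts[root] = len(files)
    return counts


def print_largest(es_dir, limit=10):
    """Print the translog directories holding the most files."""
    for path, n in count_translog_files(es_dir).most_common(limit):
        print(f"{n:8d}  {path}")
```

Calling print_largest("/nexus-data/elasticsearch") lists the directories with the most files first. A directory holding thousands of translog files is consistent with the leak described above, although this article does not define a specific threshold.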
To minimize the chance of such corruption, ensure your shutdown process lets Repository shut down gracefully in response to a SIGTERM signal. Where it can be avoided, do not abruptly kill the process with an aggressive liveness check, a system service monitor, a container stop, or a manual kill -9.
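As a minimal sketch of a graceful stop, a wrapper script can send SIGTERM and then wait for the process to exit instead of escalating to kill -9. The function name stop_gracefully and the 300 second default timeout are assumptions made for illustration and are not Sonatype recommendations. Choose a timeout that suits your instance.

```python
import os
import signal
import time


def stop_gracefully(pid, timeout_seconds=300.0, poll_interval=1.0):
    """Send SIGTERM to pid and wait for it to exit.

    Returns True if the process is gone before the timeout, False otherwise.
    The function never sends SIGKILL; what to do after a timeout is left
    to the operator.
    """
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        return True  # The process had already exited.

    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # Signal 0 only checks that the process exists.
        except ProcessLookupError:
            return True
        time.sleep(poll_interval)
    return False
```

If the function returns False, the process is still shutting down. Waiting longer is generally safer for the indexes than forcing termination.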
Solution
Elasticsearch indexes support the component search features in the UI and REST API. As such, they can be recreated from the database contents without losing critical information.
While Repository is stopped, rename or delete the "elasticsearch" sub-directory of the data directory (/nexus-data/elasticsearch in the example messages in this article), then start Repository. On startup, the Elasticsearch indexes are rebuilt automatically, which should allow Repository to start fully.
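The rename step can be sketched in Python as follows. The function name set_aside_elasticsearch_dir and the timestamped ".bak-" suffix are hypothetical choices for illustration. The script does not check whether Repository is running, so stop Repository before using it.

```python
import os
import time


def set_aside_elasticsearch_dir(data_dir):
    """Rename <data_dir>/elasticsearch to a timestamped backup name.

    Returns the new path, or None if there is no such directory.
    Run this only while Repository is stopped.
    """
    src = os.path.join(data_dir, "elasticsearch")
    if not os.path.isdir(src):
        return None
    # A timestamp suffix keeps earlier backups from being overwritten.
    dst = f"{src}.bak-{time.strftime('%Y%m%d-%H%M%S')}"
    os.rename(src, dst)
    return dst
```

For the paths in the example messages, the call would be set_aside_elasticsearch_dir("/nexus-data"). Renaming rather than deleting keeps the old index files available for inspection. The backup directory can be deleted once Repository has started and search works again.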