Starting NxRM as a service may result in a Karaf NullPointerException on start-up

Problem

Rapidly stopping and starting Nexus 3 when it is configured as a service [1] (for example, during DR testing) may result in Nexus failing to start, with the following exception in the log:

2021-05-07 15:18:13,640+0000 INFO [FelixStartLevel] *SYSTEM org.sonatype.nexus.pax.logging.NexusLogActivator - start
2021-05-07 15:18:14,285+0000 INFO [FelixStartLevel] *SYSTEM org.sonatype.nexus.features.internal.FeaturesWrapper - Fast FeaturesService starting
2021-05-07 15:18:14,364+0000 ERROR [FelixStartLevel] *SYSTEM org.apache.karaf.deployer.features.FeatureDeploymentListener - Unable to update deployed features for bundle: org.apache.felix.framework - 5.6.12
java.lang.NullPointerException: null
    at org.apache.karaf.deployer.features.FeatureDeploymentListener.bundleChanged(FeatureDeploymentListener.java:247)
    at org.apache.karaf.deployer.features.FeatureDeploymentListener.init(FeatureDeploymentListener.java:95)
    at org.apache.karaf.deployer.features.osgi.Activator$DeploymentFinishedListener.deploymentEvent(Activator.java:86)
    at org.apache.karaf.features.internal.service.FeaturesServiceImpl.registerListener(FeaturesServiceImpl.java:295)
    at org.apache.karaf.deployer.features.osgi.Activator.doStart(Activator.java:53)
    at org.apache.karaf.util.tracker.BaseActivator.start(BaseActivator.java:92)
    at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
    at org.apache.felix.framework.Felix.activateBundle(Felix.java:2240)
    at org.apache.felix.framework.Felix.startBundle(Felix.java:2146)
    at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1373)
    at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:308)
    at java.lang.Thread.run(Thread.java:748)

Observations

  • This issue has been observed infrequently after rapid system restarts (where NxRM starts automatically via systemd).
  • The error occurs right at the beginning of the boot sequence.
  • Logs and thread dumps captured at the time of the issue give no hint as to where the problem lies.
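
To confirm after a failed start that this particular error is the one in the log, a minimal sketch such as the following may help. The function name has_karaf_npe and the log path in the usage comment are assumptions made for illustration; the search string is taken from the ERROR line shown above.

```shell
#!/bin/sh
# Return 0 if the given log file contains the Karaf error shown above,
# non-zero otherwise.
# Usage: has_karaf_npe <path-to-nexus.log>
has_karaf_npe() {
    grep -q 'FeatureDeploymentListener - Unable to update deployed features for bundle' "$1"
}

# Example (the log path is an assumption; adjust it to your data directory):
#   has_karaf_npe /opt/sonatype-work/nexus3/log/nexus.log \
#       && echo "Karaf start-up error found"
```

The check matches only the ERROR line, not the stack trace, so it also works if the trace is truncated or wrapped differently in your log.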

Possible cause and fix

This could be a race condition in which disks used by Nexus Repository Manager are not mounted quickly enough, and/or the network is not fully initialised and up, before systemd starts the Nexus service.

If you rely on network mounts, ensuring that they are fully available before the Nexus service starts should overcome this issue:

  • Establish which mounts Nexus relies on (for blob stores or for the Nexus data directory).
  • Run
    systemctl list-units --type=mount
    to get the names of the corresponding mount units.
  • Add both the network targets and the mount units to the 'Requires' and 'After' options in the [Unit] section of the service file [2]:
    [Unit]
    ...
    Requires=network.target network-online.target <required mount 1>.mount <required mount 2>.mount multi-user.target
    After=network.target network-online.target <required mount 1>.mount <required mount 2>.mount multi-user.target
    ...
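
As an alternative to editing the service file directly, the same dependencies can be supplied through a systemd drop-in file. The following is a minimal sketch, not a definitive implementation: the function name render_dropin, the service name nexus.service, the drop-in file name and the mount unit mnt-blobs.mount are all assumptions made for illustration. It adds only the network targets and the mount units you pass in.

```shell
#!/bin/sh
# Print a systemd drop-in that makes a service require, and start after,
# the network targets and the given mount units.
# Usage: render_dropin <unit>.mount [<unit>.mount ...]
render_dropin() {
    # "$*" joins all arguments with single spaces (default IFS).
    deps="network.target network-online.target $*"
    printf '[Unit]\n'
    printf 'Requires=%s\n' "$deps"
    printf 'After=%s\n' "$deps"
}

# Example (hypothetical service and unit names; run as root):
#   mkdir -p /etc/systemd/system/nexus.service.d
#   render_dropin mnt-blobs.mount \
#       > /etc/systemd/system/nexus.service.d/10-mounts.conf
#   systemctl daemon-reload
#   systemctl show nexus.service -p Requires -p After
#
# To derive the unit name for a mount point, systemd ships a helper:
#   systemd-escape -p --suffix=mount /path/to/mount/point
```

Because 'Requires' and 'After' are list-type options, entries in a drop-in are added to those already in the service file rather than replacing them. The final systemctl show command lets you check that the mount units appear in both lists after the reload.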

 

Refs

 

 

 

 

 
