Starting Nexus Repository 3 as a service may result in a Karaf NullPointerException on start-up

Problem

Rapidly stopping and starting Nexus Repository 3 when it is configured as a service [1] (for example, during DR testing) may prevent Nexus Repository from starting at all, with the following exception in the log:

2021-05-07 15:18:13,640+0000 INFO [FelixStartLevel] *SYSTEM org.sonatype.nexus.pax.logging.NexusLogActivator - start
2021-05-07 15:18:14,285+0000 INFO [FelixStartLevel] *SYSTEM org.sonatype.nexus.features.internal.FeaturesWrapper - Fast FeaturesService starting
2021-05-07 15:18:14,364+0000 ERROR [FelixStartLevel] *SYSTEM org.apache.karaf.deployer.features.FeatureDeploymentListener - Unable to update deployed features for bundle: org.apache.felix.framework - 5.6.12
java.lang.NullPointerException: null
at org.apache.karaf.deployer.features.FeatureDeploymentListener.bundleChanged(FeatureDeploymentListener.java:247)
at org.apache.karaf.deployer.features.FeatureDeploymentListener.init(FeatureDeploymentListener.java:95)
at org.apache.karaf.deployer.features.osgi.Activator$DeploymentFinishedListener.deploymentEvent(Activator.java:86)
at org.apache.karaf.features.internal.service.FeaturesServiceImpl.registerListener(FeaturesServiceImpl.java:295)
at org.apache.karaf.deployer.features.osgi.Activator.doStart(Activator.java:53)
at org.apache.karaf.util.tracker.BaseActivator.start(BaseActivator.java:92)
at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
at org.apache.felix.framework.Felix.activateBundle(Felix.java:2240)
at org.apache.felix.framework.Felix.startBundle(Felix.java:2146)
at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1373)
at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:308)
at java.lang.Thread.run(Thread.java:748)
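
To confirm that a failed start matches this article, the log can be searched for the error line shown above. The following is a minimal sketch: the helper name has_karaf_npe is hypothetical, and the log path in the usage comment is an assumption that depends on where your data directory lives.

```shell
# Hypothetical helper: succeeds (exit 0) if the given log file contains the
# FeatureDeploymentListener error line shown in the trace above.
#   $1 = path to a nexus.log file
has_karaf_npe() {
  grep -F -q 'FeatureDeploymentListener - Unable to update deployed features' "$1"
}

# Example (the log location is an assumption, adjust it for your installation):
#   if has_karaf_npe /opt/sonatype/sonatype-work/nexus3/log/nexus.log; then
#     echo "Karaf start-up error found"
#   fi
```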

Observations

  • This issue has been observed infrequently after rapid system restarts (where Nexus Repository starts automatically via systemd).
  • The error occurs right at the beginning of the boot sequence.
  • Any logging or thread dumps captured at the time of the issue do not give any hints as to where the problem is.

Related Jiras

https://issues.apache.org/jira/browse/KARAF-6074

Check and workaround

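Example commands to find corrupted files under etc/karaf by comparing the installation with the original distribution archive:

# Change the two variables below for your environment. _INSTALL_DIR should end with the version string
_VER="3.40.1-01";
_INSTALL_DIR="/opt/sonatype/nexus-${_VER}";

# Move to the parent of the install directory so the archive paths line up
cd "$(dirname "${_INSTALL_DIR%/}")";
# Download the original distribution for the same version
curl -o "/tmp/nexus-unix.tar.gz" -L "https://download.sonatype.com/nexus/3/nexus-${_VER}-unix.tar.gz";
# Report files that differ from the archive, ignoring ownership and timestamp differences
tar --diff -f "/tmp/nexus-unix.tar.gz" "nexus-${_VER}/etc/karaf" | grep -vE '(Uid|Gid|Mod time) differs'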

Example commands for restoring all files under the "karaf" directory from the downloaded archive:

cd "$(dirname "${_INSTALL_DIR%/}")";
tar -xv -f "/tmp/nexus-unix.tar.gz" "nexus-${_VER}/etc/karaf"
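
Because the restore overwrites whatever is currently under etc/karaf, it can be worth archiving that directory first. The following is a minimal sketch, not an official Sonatype procedure: the function name backup_karaf_etc and the backup location are hypothetical, and it assumes the install directory layout used in the commands above.

```shell
# Hypothetical helper (not shipped with Nexus Repository): archive etc/karaf
# from the install directory into a timestamped tarball and print its path.
#   $1 = install directory, e.g. "${_INSTALL_DIR}"
#   $2 = directory to store the backup in (assumed writable)
backup_karaf_etc() (
  install_dir="${1%/}"
  backup_dir="$2"
  if [ ! -d "${install_dir}/etc/karaf" ]; then
    echo "etc/karaf not found under ${install_dir}" >&2
    exit 1
  fi
  mkdir -p "${backup_dir}" || exit 1
  archive="${backup_dir}/karaf-etc-$(date +%Y%m%d%H%M%S).tar.gz"
  tar -czf "${archive}" -C "${install_dir}" etc/karaf || exit 1
  echo "${archive}"
)

# Example (run before the restore commands):
#   backup_karaf_etc "${_INSTALL_DIR}" /tmp/karaf-backup
```

Keeping the archive also preserves the damaged files in case they are needed for later analysis.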

 

Possible cause

This could be a race condition where disks used by Nexus Repository are not mounted quickly enough and/or the network is not fully initialized and up before systemd starts the Nexus Repository service.

If you rely on network mounts, ensuring that they are fully available before the Nexus Repository service starts should overcome this issue:

  • establish which mounts are relied upon (either for blobstores or Nexus Repository data)
  • run
    systemctl list-units | grep '\.mount'
    to get the names of these mount units
  • add both the network targets and the mount units to the 'Requires' and 'After' options in the [Unit] section of the service file [2]:
[Unit]
...
Requires=network.target network-online.target <required mount 1>.mount <required mount 2>.mount multi-user.target
After=network.target network-online.target <required mount 1>.mount <required mount 2>.mount multi-user.target
...
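
The steps above can be sketched as two small shell helpers. This is an illustration under stated assumptions, not a definitive procedure: the function names, the mount unit name mnt-blobs.mount, and the service name nexus.service are all hypothetical.

```shell
# Hypothetical helper: read `systemctl list-units` style output on stdin and
# print only the unit names (first column) that end in ".mount".
filter_mount_units() {
  awk '$1 ~ /\.mount$/ { print $1 }'
}

# Hypothetical helper: print a [Unit] fragment whose Requires= and After=
# lines both list every unit passed as an argument.
print_unit_deps() {
  if [ "$#" -eq 0 ]; then
    echo "usage: print_unit_deps UNIT..." >&2
    return 1
  fi
  printf '[Unit]\nRequires=%s\nAfter=%s\n' "$*" "$*"
}

# Example (mnt-blobs.mount is a made-up unit name):
#   systemctl list-units --type=mount --no-legend --plain | filter_mount_units
#   print_unit_deps network.target network-online.target mnt-blobs.mount
```

One way to apply the generated fragment, assuming the service is named nexus.service, is to paste it into a drop-in created with "systemctl edit nexus.service" and then run "systemctl daemon-reload".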

 

Refs

 

 

 

 

 
