Problem
Rapidly stopping and starting Nexus Repository 3 when it is configured as a service [1] (for example, during DR testing) can leave Nexus Repository unable to start at all, with the following exception in the log:
2021-05-07 15:18:13,640+0000 INFO [FelixStartLevel] *SYSTEM org.sonatype.nexus.pax.logging.NexusLogActivator - start
2021-05-07 15:18:14,285+0000 INFO [FelixStartLevel] *SYSTEM org.sonatype.nexus.features.internal.FeaturesWrapper - Fast FeaturesService starting
2021-05-07 15:18:14,364+0000 ERROR [FelixStartLevel] *SYSTEM org.apache.karaf.deployer.features.FeatureDeploymentListener - Unable to update deployed features for bundle: org.apache.felix.framework - 5.6.12
java.lang.NullPointerException: null
at org.apache.karaf.deployer.features.FeatureDeploymentListener.bundleChanged(FeatureDeploymentListener.java:247)
at org.apache.karaf.deployer.features.FeatureDeploymentListener.init(FeatureDeploymentListener.java:95)
at org.apache.karaf.deployer.features.osgi.Activator$DeploymentFinishedListener.deploymentEvent(Activator.java:86)
at org.apache.karaf.features.internal.service.FeaturesServiceImpl.registerListener(FeaturesServiceImpl.java:295)
at org.apache.karaf.deployer.features.osgi.Activator.doStart(Activator.java:53)
at org.apache.karaf.util.tracker.BaseActivator.start(BaseActivator.java:92)
at org.apache.felix.framework.util.SecureAction.startActivator(SecureAction.java:697)
at org.apache.felix.framework.Felix.activateBundle(Felix.java:2240)
at org.apache.felix.framework.Felix.startBundle(Felix.java:2146)
at org.apache.felix.framework.Felix.setActiveStartLevel(Felix.java:1373)
at org.apache.felix.framework.FrameworkStartLevelImpl.run(FrameworkStartLevelImpl.java:308)
at java.lang.Thread.run(Thread.java:748)
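To confirm that a failed startup shows this particular symptom, the log can be scanned for the FeatureDeploymentListener error line immediately followed by the NullPointerException line. The following is a minimal sketch; has_karaf_npe is a hypothetical helper, and the log path in the usage comment assumes the default sonatype-work data directory layout.

```shell
# Hypothetical helper: exit status 0 if the log contains the
# "Unable to update deployed features" error from FeatureDeploymentListener
# with java.lang.NullPointerException on the very next line; 1 otherwise.
has_karaf_npe() {
  local log="$1"
  awk '
    /FeatureDeploymentListener - Unable to update deployed features/ { hit = NR; next }
    hit && NR == hit + 1 && /java\.lang\.NullPointerException/ { found = 1; exit }
    END { exit (found ? 0 : 1) }
  ' "$log"
}

# Example (path assumes the default data directory layout):
# has_karaf_npe /opt/sonatype/sonatype-work/nexus3/log/nexus.log && echo "symptom found"
```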
Observations
- This issue has been observed infrequently after rapid system restarts (where Nexus Repository starts automatically via systemd).
- The error occurs at the very beginning of the startup sequence.
- Logs and thread dumps captured at the time of the issue give no indication of where the problem lies.
Related Jiras
https://issues.apache.org/jira/browse/KARAF-6074
Check and workaround
Example commands to find corrupted files under etc/karaf by comparing the installation against a freshly downloaded archive of the same version:
# Change the two variables below for your environment. _INSTALL_DIR should end with the version string.
_VER="3.40.1-01"
_INSTALL_DIR="/opt/sonatype/nexus-${_VER}"
cd "$(dirname "${_INSTALL_DIR%/}")"
curl -fL -o "/tmp/nexus-unix.tar.gz" "https://download.sonatype.com/nexus/3/nexus-${_VER}-unix.tar.gz"
# GNU tar --diff compares archive members with files on disk; ownership and timestamp differences are filtered out
tar --diff -f "/tmp/nexus-unix.tar.gz" "nexus-${_VER}/etc/karaf" | grep -vE '(Uid|Gid|Mod time) differs'
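Where GNU tar's --diff option is not available, a similar check can be sketched by extracting a pristine copy of etc/karaf into a temporary directory and comparing it with diff -rq. This is an alternative technique rather than the commands above; check_karaf_dir is a hypothetical helper, and the sketch assumes a gzip-compressed archive whose top-level directory is nexus-<version>.

```shell
# Hypothetical helper: compare the installed etc/karaf against a pristine copy.
# Usage: check_karaf_dir <tarball> <install-parent-dir> <version>
# Prints one line per differing or extra/missing file.
# Returns 0 if identical, 1 if different, 2 on error (same codes as diff).
check_karaf_dir() {
  local tarball="$1" parent="$2" ver="$3"
  local tmp rc
  tmp="$(mktemp -d)" || return 2
  # Extract only the karaf configuration directory from the archive.
  if ! tar -xzf "$tarball" -C "$tmp" "nexus-${ver}/etc/karaf"; then
    rm -rf "$tmp"
    return 2
  fi
  # -r recurses into subdirectories; -q reports only which files differ.
  diff -rq "$tmp/nexus-${ver}/etc/karaf" "$parent/nexus-${ver}/etc/karaf"
  rc=$?
  rm -rf "$tmp"
  return "$rc"
}

# Example: check_karaf_dir /tmp/nexus-unix.tar.gz /opt/sonatype 3.40.1-01
```

Unlike tar --diff, diff -rq also reports files that exist only in the installed directory ("Only in ...").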
Example commands to restore all files under etc/karaf from the downloaded archive:
cd "$(dirname "${_INSTALL_DIR%/}")"
tar -xv -f "/tmp/nexus-unix.tar.gz" "nexus-${_VER}/etc/karaf"
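The restore above overwrites the existing files in place. A minimal sketch that first keeps a copy of the current (possibly corrupted) directory for later analysis, and then restores from the archive, might look like this; restore_karaf_dir and the .bak.<timestamp> naming are hypothetical, and a gzip-compressed archive is assumed.

```shell
# Hypothetical helper: back up etc/karaf, then restore it from the archive.
# Usage: restore_karaf_dir <tarball> <install-parent-dir> <version>
restore_karaf_dir() {
  local tarball="$1" parent="$2" ver="$3"
  local target="$parent/nexus-${ver}/etc/karaf"
  local backup
  backup="${target}.bak.$(date +%Y%m%d%H%M%S)"
  # Preserve the current files (with permissions and timestamps) before overwriting.
  cp -a "$target" "$backup" || return 1
  echo "Backed up $target to $backup"
  # Extract the pristine karaf files over the installed ones.
  tar -xzf "$tarball" -C "$parent" "nexus-${ver}/etc/karaf" || return 1
}

# Example: restore_karaf_dir /tmp/nexus-unix.tar.gz /opt/sonatype 3.40.1-01
```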
Possible cause
This may be a race condition in which disks that Nexus Repository relies on are not mounted in time and/or the network is not fully initialized before systemd starts the Nexus Repository service.
If Nexus Repository relies on network mounts, ensuring they are fully available before the Nexus Repository service starts should help avoid this issue:
- Establish which mounts Nexus Repository relies on (for blob stores or Nexus Repository data).
- Run the following command to get the names of these mount units:
systemctl list-units --type=mount
- Add both the network targets and the mount units to the Requires= and After= options in the service's [Unit] section [2]:
[Unit]
...
Requires=network.target network-online.target <required-mount-1>.mount <required-mount-2>.mount
After=network.target network-online.target <required-mount-1>.mount <required-mount-2>.mount
...
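Instead of editing the unit file itself, the same dependencies can be added through a systemd drop-in file, since Requires= and After= entries in a drop-in are added to those already in the unit. The sketch below assumes the service is named nexus.service; mount_units_for, write_nexus_dropin, and the drop-in file name 10-mounts.conf are hypothetical. It uses findmnt to find the mount point containing each data path and systemd-escape to convert that mount point into a .mount unit name.

```shell
# Hypothetical helper: print the .mount unit backing each given path.
# findmnt -T resolves the mount point that contains the path;
# systemd-escape -p --suffix=mount turns it into a unit name (e.g. mnt-blobs.mount).
mount_units_for() {
  local p mp
  for p in "$@"; do
    if mp="$(findmnt -n -o TARGET -T "$p")"; then
      systemd-escape -p --suffix=mount "$mp"
    else
      echo "no mount found for $p" >&2
    fi
  done | sort -u
}

# Hypothetical helper: write a drop-in adding the network targets and the
# given units to Requires= and After=.
# Usage: write_nexus_dropin <dropin-dir> <unit>...
write_nexus_dropin() {
  local dir="$1"; shift
  local deps="network.target network-online.target${*:+ $*}"
  mkdir -p "$dir" || return 1
  printf '[Unit]\nRequires=%s\nAfter=%s\n' "$deps" "$deps" > "$dir/10-mounts.conf"
}

# Example (as root; paths are placeholders for your data and blob store locations):
# units=$(mount_units_for /opt/sonatype/sonatype-work /mnt/blobstores)
# write_nexus_dropin /etc/systemd/system/nexus.service.d $units
# systemctl daemon-reload
```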
Refs
[1] https://help.sonatype.com/en/run-as-a-service.html
[2] https://www.freedesktop.org/software/systemd/man/systemd.unit.html#%5BUnit%5D%20Section%20Options