1. Option 3 - HA Deployment in AWS
1.1 ProvisioningFailed
kubectl describe pvc/nexus-data-nxrm-ha-aws-61-0-0-nxrm-statefulset-0 -n nexusrepo
Warning ProvisioningFailed 4m13s ebs.csi.aws.com_ebs-csi-controller-86cc6f578b-69b26_7ea9f9e2-1780-4bf0-ab64-8648c43ffa68 failed to provision volume with StorageClass "nxrm-ha-aws-61.0.0-nexus3-61-ebs-storage": rpc error: code = Internal desc = Could not create volume "pvc-9b8b6d15-e680-4340-92a8-63005bfde58c": could not create volume in EC2: WebIdentityErr: failed to retrieve credentials
How do you know to describe the PVC in the first place? Work through the status of the Nexus Repository 3 resources:
1.1.1 Nexus Repository 3 resources status
1.1.1.1 Pods
The Nexus Repository 3 HA AWS Helm chart creates 3 pods by default, but here there is only 1 pod, and it is in Pending status.
Describing the pod shows no events while the pod's AGE is under 10m.
Once the AGE exceeds 10m, the following event appears:
running PreBind plugin "VolumeBinding": binding volumes: timed out waiting for the condition
1.1.1.2 StatefulSet
1.1.1.3 PVC
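The three status checks above can be gathered into one helper. This is a minimal sketch: the function name nxrm_status is hypothetical, and the default namespace nexusrepo is taken from the commands used elsewhere in this section.

```shell
# Hypothetical helper: show the Nexus Repository pods, StatefulSet, and PVCs
# in one namespace (defaults to nexusrepo, the namespace used in this guide).
nxrm_status() {
  ns="${1:-nexusrepo}"
  kubectl get pods -n "$ns" -o wide
  kubectl get statefulset -n "$ns"
  kubectl get pvc -n "$ns"
  # Recent events usually name the resource that is failing.
  kubectl get events -n "$ns" --sort-by=.lastTimestamp
}
```

Run it as nxrm_status (or nxrm_status <namespace>). A pod stuck in Pending next to a PVC stuck in Pending is a hint that the PVC is the resource to describe next.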
1.1.2 Understand the error
The error means that ebs.csi.aws.com could not retrieve credentials, so it has no permission to create the required PV. Recall this prerequisite:
- If you are using EKS version 1.23+, you must first install the AWS EBS CSI driver before running the current AWS Helm chart. We recommend using the EKS add-on option as described in AWS’s installation instructions.
1.1.3 How to solve the issue
Follow the AWS Management Console tab in https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html
1.) Log in to the AWS Console -> EKS -> check that the EKS cluster has the required OIDC provider
2.) Go to the Add-ons tab -> search for EBS
2.1) Check if the role's name is correct; if not, don't modify it. Instead, delete the Add-on and re-add the EBS CSI Driver Add-on with the correct role.
2.2) Click the role and check the role's permissions
3.) Check that the role's permission policy is AmazonEBSCSIDriverPolicy and is AWS Managed. This role shouldn't be modified unless there is a good reason and you know what you are doing.
4.) Check that the role's Trust relationships are correct. Any typo can cause an issue.
An example:
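A trust relationship for the EBS CSI driver role typically has the shape below. This is a sketch, not the policy from the original environment: <account-id>, <region>, and <oidc-id> are placeholders, and kube-system:ebs-csi-controller-sa is assumed to be the service account used by the add-on (the default in the AWS instructions linked above).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::<account-id>:oidc-provider/oidc.eks.<region>.amazonaws.com/id/<oidc-id>"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:aud": "sts.amazonaws.com",
          "oidc.eks.<region>.amazonaws.com/id/<oidc-id>:sub": "system:serviceaccount:kube-system:ebs-csi-controller-sa"
        }
      }
    }
  ]
}
```

The OIDC ID must match the cluster's OIDC provider in all three places; a mismatch in any of them is enough to produce the WebIdentityErr shown in 1.1.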
1.2 FailedMount
1.2.1 Error 1
Error from describing the pod:
Warning FailedMount 4s (x7 over 35s) kubelet MountVolume.SetUp failed for volume "nxrm-secrets" : kubernetes.io/csi: mounter.SetUpAt failed to get CSI client: driver name secrets-store.csi.k8s.io not found in the list of registered CSI drivers
or
Warning FailedMount 29s kubelet Unable to attach or mount volumes: unmounted volumes=[nxrm-secrets], unattached volumes=[aws-iam-token nxrm-secrets logback-tasklogfile-override nexus-data kube-api-access-hvd6v]: timed out waiting for the condition
Warning FailedMount 26s (x8 over 90s) kubelet MountVolume.NewMounter initialization failed for volume "nxrm-secrets" : volume mode "Ephemeral" not supported by driver secrets-store.csi.k8s.io (no CSIDriver object)
The error is clear: the Secrets Store CSI Driver is not installed.
Follow the AWS documentation for Secrets Store CSI Drivers to mount the license secret, which is stored in AWS Secrets Manager, as a volume in the pod running Nexus Repository.
Error from describing the pod:
Warning FailedMount 50s (x2 over 2m52s) kubelet MountVolume.SetUp failed for volume "nxrm-secrets" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod nexusrepo/nxrm-ha-aws-61-0-0-nxrm-statefulset-0, err: error connecting to provider "aws": provider not found: provider "aws"
Warning FailedMount 27s kubelet Unable to attach or mount volumes: unmounted volumes=[nxrm-secrets], unattached volumes=[nxrm-secrets logback-tasklogfile-override nexus-data kube-api-access-hvd6v aws-iam-token]: timed out waiting for the condition
Besides the Secrets Store CSI Driver, the Secrets Manager and Config Provider is also required:
To install the Secrets Manager and Config Provider use the YAML file in the deployment directory:
kubectl apply -f https://raw.githubusercontent.com/aws/secrets-store-csi-driver-provider-aws/main/deployment/aws-provider-installer.yaml
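Both pieces can be verified before retrying the deployment. The helper below is a sketch: the function name check_secrets_store_csi is hypothetical, the CSIDriver name comes from the error messages above, and the app=csi-secrets-store-provider-aws label is the one used for the provider logs later in this section.

```shell
# Hypothetical helper: confirm both components needed to mount nxrm-secrets.
check_secrets_store_csi() {
  # The driver registers a cluster-scoped CSIDriver object under this name.
  kubectl get csidriver secrets-store.csi.k8s.io \
    || echo "Secrets Store CSI Driver not installed"
  # The AWS provider runs one pod per node in kube-system.
  kubectl get pods -n kube-system -l app=csi-secrets-store-provider-aws
}
```

If the first command reports NotFound, install the driver; if the second lists no pods, apply the provider installer YAML shown above.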
1.2.2 Error 2
Warning FailedMount 7s kubelet MountVolume.SetUp failed for volume "nxrm-secrets" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod nexusrepo/nxrm-ha-aws-61-0-0-nxrm-statefulset-0, err: rpc error: code = Unknown desc = ca-central-1: Failed fetching secret arn:aws:secretsmanager:ca-central-1:499792945187:secret:nxrm-license-p9wVVT: AccessDeniedException: User: arn:aws:sts::499792945187:assumed-role/secrets-store-csi-service-account-role/secrets-store-csi-driver-provider-aws is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:ca-central-1:499792945187:secret:nxrm-license-p9wVVT because no identity-based policy allows the secretsmanager:GetSecretValue action status code: 400, request id: 886eb4d3-6e9d-42e7-bc9e-39c9af9f0035
How to solve it:
The error message names both the role and the missing action. Check that the role secrets-store-csi-service-account-role has a policy that allows secretsmanager:GetSecretValue on the secret.
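An identity-based policy that would satisfy this error could look like the sketch below. The resource ARN is copied from the error message; including secretsmanager:DescribeSecret alongside GetSecretValue is an assumption based on the AWS provider's documented permissions, not something the error itself requires.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
      ],
      "Resource": "arn:aws:secretsmanager:ca-central-1:499792945187:secret:nxrm-license-p9wVVT"
    }
  ]
}
```

Every other secret the chart mounts needs to be covered by the same or a similar statement, otherwise the same AccessDeniedException appears for the next secret.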
1.2.3 Error 3
Error from describing the pod:
Warning FailedMount 14s kubelet (combined from similar events): MountVolume.SetUp failed for volume "nxrm-secrets" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod nexusrepo/nxrm-ha-aws-61-0-0-nxrm-statefulset-0, err: rpc error: code = Unknown desc = ca-central-1: Failed fetching secret arn:aws:secretsmanager:ca-central-1:499792945187:secret:nxrm-license-p9wVVT: WebIdentityErr: failed to retrieve credentials caused by: AccessDenied: Not authorized to perform sts:AssumeRoleWithWebIdentity status code: 403, request id: e6074a18-774e-429d-b54e-3c55cb47f39d
How to solve it:
The issue is similar to 1.1.
Check the role defined at the link below and its Trust relationships:
https://github.com/sonatype/nxrm3-ha-repository/blob/main/nxrm-ha-helm/values.yaml#L25
If the Trust relationships contain the following condition:
"oidc.eks.ca-central-1.amazonaws.com/id/2F203DF611EB4531E51B812FDAEC12FD:sub": "system:serviceaccount:nexusrepo:nexus-repository-deployment-sa"
Then the service account must be the one defined in:
https://github.com/sonatype/nxrm3-ha-repository/blob/main/nxrm-ha-helm/values.yaml#L24
The value is in the format: system:serviceaccount:<serviceaccount-namespace>:<service-account>
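Because a single typo in the sub value is enough to cause the 403, it can help to build the expected string from the namespace and service account name and compare it with what the Trust relationships tab shows. A small sketch, with hypothetical function names:

```shell
# Hypothetical helper: build the "sub" value the trust policy must contain
# for a given namespace ($1) and service account ($2).
expected_sub() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Succeeds only if the value copied from the trust policy ($3) matches exactly.
sub_matches() {
  [ "$(expected_sub "$1" "$2")" = "$3" ]
}
```

For example, sub_matches nexusrepo nexus-repository-deployment-sa "<value from the trust policy>" returns success only on an exact match. The service accounts that actually exist can be listed with kubectl get serviceaccount -n nexusrepo.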
1.2.4 Error 4
Error from describing the pod:
Warning FailedMount 67s (x10 over 5m19s) kubelet MountVolume.SetUp failed for volume "nxrm-secrets" : rpc error: code = Unknown desc = failed to mount secrets store objects for pod nexusrepo/nxrm-ha-aws-61-0-0-nxrm-statefulset-0, err: rpc error: code = Unknown desc = Failed to fetch secret from all regions: arn:aws:secretsmanager:ca-central-1:499792945187:secret:nxrm-admin-init-cred-MkELHq
Warning FailedMount 58s kubelet Unable to attach or mount volumes: unmounted volumes=[nxrm-secrets], unattached volumes=[logback-tasklogfile-override nexus-data kube-api-access-zkrtn aws-iam-token nxrm-secrets]: timed out waiting for the condition
Check the logs of the Secrets Store provider:
kubectl logs -l app=csi-secrets-store-provider-aws --prefix -n kube-system
If the log mentions JMES Path:
[pod/csi-secrets-store-provider-aws-fq7z6/provider-aws-installer] W1031 23:44:23.318892 1 secrets_manager_provider.go:84] JMES Path - admin_nxrm_password for object alias - nxrm-admin-password does not point to a valid object.
[pod/csi-secrets-store-provider-aws-fq7z6/provider-aws-installer] E1031 23:44:23.318924 1 server.go:151] Failure getting secret values from provider type secretsmanager: Failed to fetch secret from all regions: arn:aws:secretsmanager:ca-central-1:499792945187:secret:nxrm-admin-init-cred-MkELHq
Then check that the key stored in the secret matches the JMESPath. Using branch 3.61.0-02 as an example:
For the admin password secret (nxrm-admin-init-cred in the error above):
It should have a key-value pair whose key matches the configured JMESPath,
which by default is admin_nxrm_password.
The same applies to the secret referenced in https://github.com/sonatype/nxrm3-ha-repository/blob/61.0.2/nxrm-aws-ha-helm/templates/secret.yaml#L27
That secret should have 3 key-value pairs whose keys are username, password, and host.
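As an illustration, the JSON stored in Secrets Manager for the two secrets would have the shapes below. The key names come from the text and the log above; the values are placeholders.

```json
{ "admin_nxrm_password": "<admin-password>" }
```

```json
{ "username": "<username>", "password": "<password>", "host": "<host>" }
```

A key that differs from the JMESPath in spelling or case (for example adminNxrmPassword) would produce the "does not point to a valid object" warning shown in the provider log.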
1.3 Can't find secret
Error from describing the pods:
Warning Failed 17m kubelet Error: secret "nxrm-db-secret" not found
The secret should be synced by the Secrets Store CSI Driver. To see more information about the error, check the driver's logs. A DaemonSet manages the CSI Driver and creates 1 pod (csi-secrets-store-secrets-store-csi-driver-xxxxx) on each node. With only 3 nodes, checking the logs of each pod is manageable, but with many nodes it becomes tedious. Another way is to query all the CSI driver pods by label (the limitation is that, by default, it shows only the 10 latest log entries for each pod):
kubectl logs -l app=secrets-store-csi-driver --prefix -n kube-system
[pod/csi-secrets-store-secrets-store-csi-driver-g85j5/secrets-store] I1031 19:12:32.321315 1 secretproviderclasspodstatus_controller.go:340] "The secret operation failed with forbidden error. If you installed the CSI driver using helm, ensure syncSecret.enabled=true is set.\n"
[pod/csi-secrets-store-secrets-store-csi-driver-g85j5/secrets-store] E1031 19:12:32.328470 1 secretproviderclasspodstatus_controller.go:336] "failed to create Kubernetes secret" err="secrets is forbidden: User \"system:serviceaccount:kube-system:secrets-store-csi-driver\" cannot create resource \"secrets\" in API group \"\" in the namespace \"nexusrepo\"" spc="nexusrepo/nxrm-ha-aws-61.0.0.nexus3-61-secret" pod="nexusrepo/nxrm-ha-aws-61-0-0-nxrm-s
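When the relevant entry has already scrolled past the last 10 lines, the --tail flag of kubectl logs raises the per-pod limit. A minimal sketch; the function name csi_driver_logs and the default of 200 lines are arbitrary choices for illustration:

```shell
# Hypothetical helper: fetch more than the default 10 lines per driver pod.
# $1 is the number of lines per pod (defaults to 200).
csi_driver_logs() {
  lines="${1:-200}"
  kubectl logs -l app=secrets-store-csi-driver --prefix --tail="$lines" -n kube-system
}
```

Piping the result through grep narrows it to the failures of interest, for example csi_driver_logs 500 | grep -i forbidden.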
1.3.1 How to solve the issue
Check the value of syncSecret.enabled:
helm get values csi-secrets-store -n kube-system
Output:
USER-SUPPLIED VALUES:
syncSecret:
  enabled: false
If enabled is false, set it to true:
helm upgrade csi-secrets-store secrets-store-csi-driver/secrets-store-csi-driver --set syncSecret.enabled=true -n kube-system