It’s bug time again…
If you are setting up NTA on your shiny new NSX-T 3.2 deployment, you may have hit a situation where everything goes “Pete Tong”, as us Brits like to say, when you enable the NTA Detectors. It’s down to two specific detectors using the wrong docker registries:
- Horizontal Port Scanner
- Non Standard Port Usage
There is a fix, but it requires you to edit the deployment configuration for the NTA-Server. My advice is to update the deployment before you enable these detectors. If you have already enabled them, simply make the change below and then delete your pods; they should get redeployed.
First, SSH to your Kubernetes control plane node, then complete the following:
Edit the NTA Server Deployment
kubectl -n nsxi-platform edit deployment nta-server
Look for the env variable called NTAFLOW_IMAGE:
---
containers:
- env:
  - name: SPRING_CONFIG_LOCATION
    value: /opt/vmware/pace/config/application.yaml
  - name: POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.namespace
  - name: SPARK_HOME
    value: /opt/spark
  - name: SERVICE_ACCOUNT_NAME
    value: nta-server-sa
  - name: NTAFLOW_IMAGE
    value: harbor-repo.vmware.com/nsx_intelligence/clustering/nta-flow:19238496
---
Replace harbor-repo.vmware.com with your docker registry info. For example, if my registry was livefire-labs.dev, this line would look like:
  - name: NTAFLOW_IMAGE
    value: livefire-labs.dev/nsx_intelligence/clustering/nta-flow:19238496
Finally, save and exit.
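If you would rather not hand-edit the YAML, here is a rough sketch of the same change done from the shell. The helper names (rewrite_registry, patch_ntaflow_image) are my own inventions for illustration. I am also assuming that `kubectl set env` is acceptable for this deployment; by default it touches every container in the deployment, so add `-c <container-name>` if yours has more than one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: swap only the registry host (everything before the
# first "/") in an image reference, keeping the repository path and tag.
rewrite_registry() {
  local image="$1" registry="$2"
  printf '%s/%s\n' "$registry" "${image#*/}"
}

# Hypothetical helper: set NTAFLOW_IMAGE on the nta-server deployment without
# opening an editor. Defined here but NOT called automatically.
patch_ntaflow_image() {
  local new_image="$1"
  kubectl -n nsxi-platform set env deployment/nta-server "NTAFLOW_IMAGE=${new_image}"
}

# Example: print the value you would end up with for the livefire-labs.dev registry.
rewrite_registry "harbor-repo.vmware.com/nsx_intelligence/clustering/nta-flow:19238496" "livefire-labs.dev"
```

Once you are happy with the printed value, pass it to patch_ntaflow_image on the control plane node. The manual edit described above remains the approach I actually used.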
At this point you should be able to enable the detectors and everything should work. Like I said, if you’ve already enabled everything, just delete the pods that were already deployed and they should be redeployed automatically.
This WILL be fixed in an upcoming release, but I know a few contacts are eager to play with the NTA + NDR features of NSX-T 3.2, so hopefully this will reduce some head scratching in the meantime.
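For the "already enabled" case, a small sketch for spotting which pods to delete. The unhealthy_pods function is a hypothetical helper, and I am assuming the affected pods show up with a status other than Running or Completed (a bad registry would typically surface as an image pull error, but check your own output).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods whose STATUS column (3rd field) is neither
# Running nor Completed.
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 }'
}

# Usage on the control plane node (review the list before deleting anything):
#   kubectl -n nsxi-platform get pods --no-headers | unhealthy_pods
#   kubectl -n nsxi-platform delete pod <pod-name>
```

Deleting by name, one pod at a time, keeps you from removing something unrelated that just happens to be starting up.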
FINAL HINT:
If you are running an environment with NSX Intelligence in evaluation mode, you might need to increase the vCPU count of the worker node(s) in your cluster from 16 vCPU to 24 vCPU. This was required in our environment as we were running multiple NAPP services, and the anomaly detection pod was failing to run due to a CPU constraint.
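A quick way to see whether you are in the same boat is to list each node's CPU capacity and flag anything below the 24 vCPU mentioned above. This is only a sketch: nodes_below is a hypothetical helper, it assumes the capacity is reported as a whole number of CPUs, and it will list control plane nodes too, which you can ignore.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "NAME CPU" pairs on stdin and print the nodes
# whose CPU count is below the given minimum.
nodes_below() {
  local min="$1"
  awk -v min="$min" '$2 + 0 < min { print $1 " has " $2 " vCPU" }'
}

# Usage on the control plane node:
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu | nodes_below 24
#
# To confirm that a pod is actually stuck on CPU, check the scheduling events:
#   kubectl -n nsxi-platform get events --field-selector reason=FailedScheduling
```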
Hope that helps you all.