Use ML where it earns its keep — and defend the models themselves.
Module overview
ML helps with problems of scale and subtlety: anomaly detection, clustering, and triage. But security is an adversarial, low-base-rate domain where naive ML drowns analysts in false positives. This module builds practical anomaly detection with honest evaluation, then covers adversarial ML and LLM security (prompt injection, poisoning) so you can defend ML systems too.
Lesson 1
Practical Anomaly Detection (and Its Pitfalls)
Deep explanation
Frame the problem first. Supervised learning needs labels you rarely have for novel attacks; unsupervised anomaly detection (Isolation Forest, clustering) finds outliers without labels but needs careful features and thresholds. The base-rate problem dominates: at millions of events and few attacks, even 99% accuracy yields overwhelming false positives. Evaluate with precision/recall and alert volume, never accuracy. Good features come from domain knowledge: bytes-out per session, rare process-parent pairs, login velocity, DNS entropy.
Examples
Isolation Forest over per-host network features flags a beaconing host as an outlier.
A model with 99% accuracy still produces thousands of false alerts at enterprise scale — precision/recall tell the real story.
logs -> features (bytes-out, parent/child rarity, login velocity, DNS entropy)
-> IsolationForest -> anomaly scores -> THRESHOLD -> triage queue
evaluate with PRECISION/RECALL + alert volume (accuracy is misleading)
Hands-on lab
Engineer features from a provided network/auth log dataset (e.g., bytes-out, rare parent-child, login velocity).
Train an Isolation Forest and score a held-out set.
Report precision and recall (not accuracy) and tune the threshold/contamination to a sane alert volume.
Write a note: which features drove detections and where the model would create analyst fatigue.
Expected output: An anomaly detector that flags the known outliers, reported with precision/recall and a defensible threshold — plus an honest note on false-positive cost.
What to observe: In security, the base rate makes accuracy meaningless; precision/recall and alert volume are the real metrics.
How attackers exploit · how defenders respond
Exploit: Adversaries operate "low and slow" to stay inside normal variance and below anomaly thresholds.
Detect & respond: Combine ML anomaly scores with rule-based context and analyst feedback; never ship a raw model as an alert source without tuning.
Red teamBlend with baseline behavior; poison or drift the model over time.
Blue teamDomain-driven features, honest evaluation, human-in-the-loop triage, and drift monitoring.
Real-world scenario
Teams that judged ML detectors by accuracy shipped fatigue machines; switching to precision/recall and tuning thresholds made the same models useful.
Lesson 2
Adversarial ML & LLM Security
Deep explanation
If you deploy ML/LLMs, they become targets. Threats include evasion (inputs crafted to be misclassified), data poisoning (corrupting training data to implant blind spots), model/prompt-injection against LLM apps (untrusted content overriding instructions), and sensitive-data leakage. The NeoShield platform's own Claude integration models the right posture: treat model output as untrusted data, validate against a strict schema, and dispatch only to vetted handlers — never execute model-authored code.
Defenses: input/output validation, isolation of tool execution, provenance and integrity for training data, rate/abuse limits, and monitoring of model decisions.
Examples
A prompt-injection payload hidden in a fetched web page tries to make an LLM agent exfiltrate data — blocked by treating fetched content as untrusted and constraining tools.
Poisoned training samples implant a backdoor trigger that flips a classifier — caught by data provenance and validation.
Commands & tools
# Defensive pattern (pseudocode) — never trust model output as code
resp = call_model(prompt)
data = validate_against_schema(resp) # reject if invalid
if data.action not in ALLOWED_HANDLERS: reject()
dispatch(ALLOWED_HANDLERS[data.action], data.args) # vetted handler only
# LLM app hardening checklist
# - separate system/instruction from untrusted content
# - constrain & sandbox tools the model can call
# - rate-limit + budget-gate + log every call
Diagram
untrusted input/content -> [ LLM ] -> output
|
validate schema + allow-list action
|
vetted handler ONLY (no eval of model code)
threats: evasion | poisoning | prompt-injection | data leakage
Hands-on lab
Review the NeoShield Claude pipeline (policy-check -> model -> schema-validate -> vetted handler) and map each step to a threat it mitigates.
Construct a benign prompt-injection test against a lab LLM endpoint and confirm the schema/allow-list rejects the unsafe instruction.
Add one defense: stricter output schema, tool sandboxing, or a rate/budget gate.
Document residual risk and a monitoring plan for model decisions.
Expected output: A threat-to-control mapping for an LLM pipeline, a blocked injection test, and one concrete hardening added with a monitoring note.
What to observe: The durable defense is architectural: treat model output as untrusted data, validate it, and constrain what it can trigger.
How attackers exploit · how defenders respond
Exploit: Evasion, poisoning, and prompt injection target the model and its surrounding app rather than classic code bugs.
Detect & respond: Validate inputs/outputs, monitor decision drift and refusal/abuse rates, maintain training-data provenance, and log all tool calls.
Red teamCraft evasive inputs, poison data, inject instructions via untrusted content.
Blue teamSchema + allow-list + sandboxing + provenance + monitoring; never execute model-authored code.
Real-world scenario
Real LLM-agent incidents stem from trusting fetched/user content as instructions; the validate-then-dispatch-to-vetted-handler pattern is the structural fix.
End-of-module assessment
Tap an answer to check it.
1. Why is accuracy a poor metric for security ML?
With rare positives, accuracy hides poor precision/recall and alert fatigue.
2. The structural defense for LLM apps is to:
Treat model output as untrusted data; validate and constrain what it can trigger.
3. Data poisoning attacks target:
Poisoning corrupts training data to manipulate the resulting model.
Key takeaways
Use ML for scale/subtlety, but evaluate with precision/recall and alert volume, never accuracy.
Good features come from domain knowledge; keep humans in the triage loop.
Defend ML/LLM systems: validate output, allow-list actions, sandbox tools, and never execute model-authored code.