MODULE 08 · Advanced

Machine Learning for Cybersecurity

Use ML where it earns its keep — and defend the models themselves.

Module overview

ML helps with problems of scale and subtlety: anomaly detection, clustering, and triage. But security is an adversarial, low-base-rate domain where naive ML drowns analysts in false positives. This module builds practical anomaly detection with honest evaluation, then covers adversarial ML and LLM security (prompt injection, poisoning) so you can defend ML systems too.

Lesson 1

Practical Anomaly Detection (and Its Pitfalls)

Deep explanation

Frame the problem first. Supervised learning needs labels you rarely have for novel attacks; unsupervised anomaly detection (Isolation Forest, clustering) finds outliers without labels but needs careful features and thresholds. The base-rate problem dominates: with millions of events and only a few attacks, even a detector that is 99% accurate generates an overwhelming number of false positives, because 1% of a very large benign population still dwarfs the handful of true attacks. Evaluate with precision/recall and alert volume, never accuracy. Good features come from domain knowledge: bytes-out per session, rare process-parent pairs, login velocity, DNS entropy.
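
The base-rate effect is easiest to see with numbers. The sketch below is a worked calculation only; the event count, attack count, and detection rates are illustrative assumptions, not figures from any real dataset, and `alert_stats` is a hypothetical helper name.

```python
def alert_stats(total_events, attack_events, tpr, fpr):
    """Return (true_alerts, false_alerts, precision, accuracy) for a detector.

    tpr: fraction of real attacks the detector flags (recall)
    fpr: fraction of benign events the detector wrongly flags
    """
    benign = total_events - attack_events
    tp = attack_events * tpr          # attacks correctly flagged
    fp = benign * fpr                 # benign events wrongly flagged
    tn = benign - fp                  # benign events correctly ignored
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / total_events
    return tp, fp, precision, accuracy


# Assumed scenario: 1,000,000 events, 100 real attacks, and a detector
# that catches 99% of attacks while mislabeling 1% of benign events.
tp, fp, precision, accuracy = alert_stats(1_000_000, 100, tpr=0.99, fpr=0.01)
print(f"true alerts: {tp:.0f}, false alerts: {fp:.0f}")
print(f"precision: {precision:.2%}, accuracy: {accuracy:.2%}")
```

Under these assumptions the detector is 99% accurate, yet only about 1 alert in 100 is a real attack: roughly ten thousand false alerts surround 99 true ones. That gap is why precision and alert volume carry the evaluation.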

Examples

  • Isolation Forest over per-host network features flags a beaconing host as an outlier.
  • A model with 99% accuracy still produces thousands of false alerts at enterprise scale — precision/recall tell the real story.

Commands & tools

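# Isolation Forest for outlier detection (scikit-learn, lab dataset)
# Assumes X_train / X_test are numeric feature matrices and y_test uses 1 = attack, 0 = benign.
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

# contamination = expected fraction of outliers; it sets the decision threshold
model = IsolationForest(contamination=0.01, random_state=42).fit(X_train)
scores = model.decision_function(X_test)   # lower = more anomalous
preds  = model.predict(X_test)             # -1 = anomaly, 1 = normal

# Evaluate honestly (NOT accuracy): map -1 to the positive class first
flagged = (preds == -1).astype(int)
print(precision_score(y_test, flagged), recall_score(y_test, flagged))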

Diagram

  logs -> features (bytes-out, parent/child rarity, login velocity, DNS entropy)
            -> IsolationForest -> anomaly scores -> THRESHOLD -> triage queue
  evaluate with PRECISION/RECALL + alert volume   (accuracy is misleading)
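
Two of the features named in the pipeline above can be sketched with the standard library alone. This is a minimal illustration, not a prescribed feature set: the function names, sample strings, and process names are hypothetical, and character-level Shannon entropy is only one common way to approximate "DNS entropy".

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string in bits per character (0.0 if empty)."""
    if not text:
        return 0.0
    n = len(text)
    counts = Counter(text)
    # p * log2(1/p) summed over the distinct characters
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def pair_rarity(events):
    """Map each (parent, child) process pair to its share of all observed pairs."""
    counts = Counter(events)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}


# Hypothetical DNS labels: a repetitive label vs. a random-looking one.
print(shannon_entropy("www"))
print(shannon_entropy("x7f3kq9zt2"))

# Hypothetical process telemetry: one pair is common, one is rare.
process_events = (
    [("explorer.exe", "chrome.exe")] * 98
    + [("winword.exe", "powershell.exe")] * 2
)
rarity = pair_rarity(process_events)
print(rarity[("winword.exe", "powershell.exe")])
```

Random-looking labels score higher on entropy than repetitive ones, and a low pair share marks a parent-child combination as unusual for this environment. Both values would be columns in the feature matrix fed to the model, not detections on their own.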

Hands-on lab

  1. Engineer features from a provided network/auth log dataset (e.g., bytes-out, rare parent-child, login velocity).
  2. Train an Isolation Forest and score a held-out set.
  3. Report precision and recall (not accuracy) and tune the threshold/contamination to a sane alert volume.
  4. Write a note: which features drove detections and where the model would create analyst fatigue.
Expected output: An anomaly detector that flags the known outliers, reported with precision/recall and a defensible threshold — plus an honest note on false-positive cost.
What to observe: In security, the base rate makes accuracy meaningless; precision/recall and alert volume are the real metrics.
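
For step 3 of the lab, one way to tune toward a sane alert volume is to choose the score cutoff from an alert budget and then report precision and recall at that cutoff. The sketch below assumes the scikit-learn convention that lower `decision_function` scores are more anomalous; the helper names, scores, and labels are made up for illustration.

```python
def threshold_for_budget(scores, max_alerts):
    """Return a cutoff such that the max_alerts lowest scores are flagged.

    Lower score = more anomalous. Ties at the cutoff can push the
    number of flagged events slightly above the budget.
    """
    if max_alerts <= 0 or not scores:
        return float("-inf")
    ranked = sorted(scores)
    k = min(max_alerts, len(ranked))
    return ranked[k - 1]


def precision_recall(labels, flagged):
    """labels: 1 = attack, 0 = benign; flagged: booleans of the same length."""
    tp = sum(1 for y, f in zip(labels, flagged) if y == 1 and f)
    fp = sum(1 for y, f in zip(labels, flagged) if y == 0 and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y == 1 and not f)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up held-out scores and ground-truth labels.
scores = [0.12, 0.10, -0.20, 0.08, -0.05, 0.11, -0.15, 0.09]
labels = [0, 0, 1, 0, 0, 0, 1, 0]

for budget in (2, 3):
    cutoff = threshold_for_budget(scores, budget)
    flagged = [s <= cutoff for s in scores]
    p, r = precision_recall(labels, flagged)
    print(f"budget={budget} cutoff={cutoff} alerts={sum(flagged)} "
          f"precision={p:.2f} recall={r:.2f}")
```

Sweeping the budget this way shows the trade-off directly: each extra alert the queue accepts either catches another attack or costs an analyst a false positive.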

How attackers exploit · how defenders respond

Exploit: Adversaries operate "low and slow" to stay inside normal variance and below anomaly thresholds.

Detect & respond: Combine ML anomaly scores with rule-based context and analyst feedback; never ship a raw model as an alert source without tuning.
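
Combining an anomaly score with rule-based context can be as simple as a small triage function. The sketch below is one possible shape, not a recommended policy: the `Event` fields, the cutoff value, and the priority labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Event:
    host: str
    anomaly_score: float               # lower = more anomalous
    is_known_scanner: bool = False     # rule context: approved scanner host
    touches_critical_asset: bool = False  # rule context: asset criticality


def triage_priority(event, score_cutoff=-0.1):
    """Return 'none', 'suppressed', 'low', or 'high' for an event."""
    if event.anomaly_score > score_cutoff:
        return "none"          # model does not consider it anomalous
    if event.is_known_scanner:
        return "suppressed"    # anomalous, but explained by a known rule
    if event.touches_critical_asset:
        return "high"          # anomalous and high impact
    return "low"               # anomalous, queued for routine review


events = [
    Event("ws-014", 0.05),
    Event("scan-01", -0.30, is_known_scanner=True),
    Event("db-prod-2", -0.25, touches_critical_asset=True),
    Event("ws-201", -0.12),
]
for e in events:
    print(e.host, triage_priority(e))
```

Analyst feedback would then adjust the rules and the cutoff over time, which is the tuning step a raw model lacks.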

Red team: Blend with baseline behavior; poison or drift the model over time.
Blue team: Domain-driven features, honest evaluation, human-in-the-loop triage, and drift monitoring.

Real-world scenario

Teams that judged ML detectors by accuracy shipped fatigue machines; switching to precision/recall and tuning thresholds made the same models useful.

Lesson 2

Adversarial ML & LLM Security

Deep explanation

If you deploy ML models or LLMs, they become targets. Threats include evasion (inputs crafted to be misclassified), data poisoning (corrupting training data to implant blind spots or backdoors), prompt injection against LLM apps (untrusted content overriding the application's instructions), and sensitive-data leakage. The NeoShield platform's own Claude integration models the right posture: treat model output as untrusted data, validate it against a strict schema, and dispatch only to vetted handlers. Never execute model-authored code.

Defenses: input/output validation, isolation of tool execution, provenance and integrity for training data, rate/abuse limits, and monitoring of model decisions.

Examples

  • A prompt-injection payload hidden in a fetched web page tries to make an LLM agent exfiltrate data — blocked by treating fetched content as untrusted and constraining tools.
  • Poisoned training samples implant a backdoor trigger that flips a classifier — caught by data provenance and validation.
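
One simple form of the data provenance mentioned above is a hash manifest recorded when a dataset is reviewed and re-checked before every training run. The sketch below is an assumption about how that could look, using in-memory bytes and hypothetical file names; it detects changes made after the manifest was built, not data that was already poisoned at the source.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_manifest(samples):
    """Record a SHA-256 digest per sample at review time."""
    return {name: sha256_hex(content) for name, content in samples.items()}


def verify_manifest(samples, manifest):
    """Return a sorted list of (name, problem) for every discrepancy."""
    problems = []
    for name in sorted(set(samples) | set(manifest)):
        if name not in manifest:
            problems.append((name, "not in manifest"))
        elif name not in samples:
            problems.append((name, "missing from dataset"))
        elif sha256_hex(samples[name]) != manifest[name]:
            problems.append((name, "hash mismatch"))
    return problems


# Hypothetical reviewed dataset.
trusted = {
    "sample_001.csv": b"bytes_out,label\n120,0\n",
    "sample_002.csv": b"bytes_out,label\n98000,1\n",
}
manifest = build_manifest(trusted)

# Later: one label was flipped and one unreviewed file was added.
tampered = dict(trusted)
tampered["sample_002.csv"] = b"bytes_out,label\n98000,0\n"
tampered["sample_003.csv"] = b"bytes_out,label\n5,1\n"

print(verify_manifest(trusted, manifest))
print(verify_manifest(tampered, manifest))
```

A non-empty result would block the training run until someone accounts for each discrepancy.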

Commands & tools

# Defensive pattern (pseudocode) — never trust model output as code
resp = call_model(prompt)
data = validate_against_schema(resp)      # reject if invalid
if data.action not in ALLOWED_HANDLERS: reject()
dispatch(ALLOWED_HANDLERS[data.action], data.args)   # vetted handler only
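
The pseudocode above can be made runnable in a few lines. This is a minimal sketch under stated assumptions: the model is assumed to return a JSON object with `action` and `args` keys, and the handler names, argument schema, and `RejectedOutput` exception are hypothetical. It does not reflect the actual NeoShield pipeline.

```python
import json


class RejectedOutput(Exception):
    """Raised when model output fails validation."""


def lookup_ip(args):
    # Hypothetical vetted handler: a read-only lookup.
    return f"lookup requested for {args['ip']}"


def open_ticket(args):
    # Hypothetical vetted handler: creates a ticket.
    return f"ticket opened: {args['title']}"


# Allow-list: action name -> (handler, exact set of argument names)
ALLOWED_HANDLERS = {
    "lookup_ip": (lookup_ip, {"ip"}),
    "open_ticket": (open_ticket, {"title"}),
}


def validate_and_dispatch(raw_model_output: str):
    """Parse, validate, and dispatch model output; never evaluate it as code."""
    try:
        data = json.loads(raw_model_output)
    except json.JSONDecodeError as exc:
        raise RejectedOutput("not valid JSON") from exc
    if not isinstance(data, dict) or set(data) != {"action", "args"}:
        raise RejectedOutput("unexpected top-level shape")
    action, args = data["action"], data["args"]
    if not isinstance(action, str) or action not in ALLOWED_HANDLERS:
        raise RejectedOutput(f"action not allow-listed: {action!r}")
    handler, required = ALLOWED_HANDLERS[action]
    if not isinstance(args, dict) or set(args) != required:
        raise RejectedOutput("argument names do not match schema")
    if not all(isinstance(v, str) for v in args.values()):
        raise RejectedOutput("argument values must be strings")
    return handler(args)


print(validate_and_dispatch('{"action": "lookup_ip", "args": {"ip": "203.0.113.7"}}'))
try:
    validate_and_dispatch('{"action": "run_shell", "args": {"cmd": "id"}}')
except RejectedOutput as err:
    print("rejected:", err)
```

The model can only select among handlers that a human wrote and reviewed; anything outside the schema or the allow-list is dropped before it reaches any code path with side effects.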

# LLM app hardening checklist
# - separate system/instruction from untrusted content
# - constrain & sandbox tools the model can call
# - rate-limit + budget-gate + log every call

Diagram

  untrusted input/content -> [ LLM ] -> output
                                       |
                          validate schema + allow-list action
                                       |
                          vetted handler ONLY (no eval of model code)
  threats: evasion | poisoning | prompt-injection | data leakage

Hands-on lab

  1. Review the NeoShield Claude pipeline (policy-check -> model -> schema-validate -> vetted handler) and map each step to a threat it mitigates.
  2. Construct a benign prompt-injection test against a lab LLM endpoint and confirm the schema/allow-list rejects the unsafe instruction.
  3. Add one defense: stricter output schema, tool sandboxing, or a rate/budget gate.
  4. Document residual risk and a monitoring plan for model decisions.
Expected output: A threat-to-control mapping for an LLM pipeline, a blocked injection test, and one concrete hardening added with a monitoring note.
What to observe: The durable defense is architectural: treat model output as untrusted data, validate it, and constrain what it can trigger.

How attackers exploit · how defenders respond

Exploit: Evasion, poisoning, and prompt injection target the model and its surrounding app rather than classic code bugs.

Detect & respond: Validate inputs/outputs, monitor decision drift and refusal/abuse rates, maintain training-data provenance, and log all tool calls.

Red team: Craft evasive inputs, poison data, inject instructions via untrusted content.
Blue team: Schema + allow-list + sandboxing + provenance + monitoring; never execute model-authored code.

Real-world scenario

Many real LLM-agent incidents stem from treating fetched or user-supplied content as instructions; the validate-then-dispatch-to-vetted-handler pattern is the structural fix.

End-of-module assessment

1. Why is accuracy a poor metric for security ML?

With rare positives, accuracy hides poor precision/recall and alert fatigue.

2. The structural defense for LLM apps is to:

Treat model output as untrusted data; validate and constrain what it can trigger.

3. Data poisoning attacks target:

Poisoning corrupts training data to manipulate the resulting model.

Key takeaways

  • Use ML for scale/subtlety, but evaluate with precision/recall and alert volume, never accuracy.
  • Good features come from domain knowledge; keep humans in the triage loop.
  • Defend ML/LLM systems: validate output, allow-list actions, sandbox tools, and never execute model-authored code.

Practice safely in an isolated lab only.