Security & DevSecOps interview questions & answers

100 Security & DevSecOps interview questions, each answered three ways: a concise spoken answer, a technical explanation, and a hands-on example.

All questions

  1. What is DevSecOps, and how does it differ from traditional security gating at the end? [Basic]
  2. What does shift-left security mean, and why does it matter? [Basic]
  3. What is the difference between SAST, DAST, IAST, and SCA? [Basic]
  4. When in the pipeline does each of SAST, DAST, and SCA run? [Basic]
  5. What is the difference between SAST and DAST, and what does each catch and miss? [Basic]
  6. What is software composition analysis (SCA), and why does it matter for dependencies? [Basic]
  7. What is SonarQube, and what does it analyse? [Basic]
  8. Is SonarQube primarily SAST, code quality, or both? [Basic]
  9. What is a SonarQube quality gate, and how do you use it to fail a build? [Basic]
  10. What is the difference between a quality gate and a quality profile in SonarQube? [Basic]
  11. What does 'clean as you code' mean in SonarQube, and why focus on new code? [Basic]
  12. What metrics does SonarQube track (coverage, duplication, code smells, bugs, vulnerabilities)? [Basic]
  13. What is the difference between a bug, a vulnerability, and a code smell in SonarQube? [Basic]
  14. What is technical debt in SonarQube, and how is it estimated? [Basic]
  15. How does SonarQube measure code coverage, and does it run your tests? [Basic]
  16. How do you integrate SonarQube into a Jenkins or GitHub pipeline? [Basic]
  17. What is a SonarQube hotspot, and how is it different from a vulnerability? [Basic]
  18. How do you handle false positives in SonarQube? [Basic]
  19. How do you enforce that pull requests pass the quality gate before merge? [Basic]
  20. What is the difference between SonarQube and SonarCloud? [Basic]
  21. How would you roll out SonarQube to many teams without blocking them on day one? [Basic]
  22. What is Wiz, and what category of tool is it (CSPM/CNAPP)? [Basic]
  23. What is CSPM, and what does it protect against? [Basic]
  24. What is CNAPP, and what capabilities does it combine? [Basic]
  25. What is the difference between agent-based and agentless cloud security scanning? [Basic]
  26. How does Wiz scan a cloud environment without agents? [Basic]
  27. What is the Wiz Security Graph, and why is context important for prioritisation? [Basic]
  28. What is a toxic combination (attack path) in Wiz, and why prioritise it? [Basic]
  29. How does Wiz help prioritise which vulnerabilities to fix first? [Basic]
  30. Why is context (internet exposure, sensitive data, privileges) key to vulnerability prioritisation? [Basic]
  31. How would you use Wiz findings to drive remediation across teams? [Basic]
  32. How does your AI-assisted remediation tool relate to scanners like Wiz? [Basic]
  33. What is the difference between a CVE, a CVSS score, and exploitability (EPSS/KEV)? [Basic]
  34. What is CVSS, and what are its limitations for prioritisation? [Intermediate]
  35. What is EPSS, and how does it improve on raw CVSS? [Intermediate]
  36. What is the CISA KEV catalog, and how would you use it? [Intermediate]
  37. How do you reduce the noise of thousands of vulnerability findings? [Intermediate]
  38. How do you decide whether to patch, mitigate, or accept the risk of a vulnerability? [Intermediate]
  39. What is vulnerability management as a lifecycle (discover, prioritise, remediate, verify)? [Intermediate]
  40. How would you remediate a critical CVE in a third-party Java dependency (as you did at Intuit)? [Intermediate]
  41. What is a transitive dependency, and why does it complicate patching? [Intermediate]
  42. What is the difference between patching the base image and patching the application dependency? [Intermediate]
  43. What is container image scanning, and what tools do it (Trivy, Grype, Wiz)? [Intermediate]
  44. How do you prevent vulnerable images from being deployed (admission control, registry policy)? [Intermediate]
  45. What is an SBOM, and how do you generate and use one? [Intermediate]
  46. What is the difference between SPDX and CycloneDX SBOM formats? [Intermediate]
  47. What is supply-chain security, and what is SLSA? [Intermediate]
  48. What is artifact signing, and what is Sigstore/cosign? [Intermediate]
  49. How do you verify the provenance of a build artifact? [Intermediate]
  50. What is secrets scanning, and how do you stop secrets being committed to Git? [Intermediate]
  51. What do you do if a secret has already been committed and pushed? [Intermediate]
  52. What is the principle of least privilege, and how do you apply it in CI/CD? [Intermediate]
  53. How do you secure pipeline credentials and avoid long-lived secrets? [Intermediate]
  54. What is OIDC-based authentication from CI to a cloud provider, and why is it safer than keys? [Intermediate]
  55. What is workload identity, and how does it remove static credentials? [Intermediate]
  56. What is policy-as-code, and what tools implement it (OPA, Sentinel, Kyverno)? [Intermediate]
  57. What is OPA, and what is the Rego language used for? [Intermediate]
  58. How would you enforce a policy like 'no public S3 buckets' as code? [Intermediate]
  59. What is admission control in Kubernetes, and how do OPA Gatekeeper or Kyverno use it? [Intermediate]
  60. What is the difference between a validating and a mutating admission webhook? [Intermediate]
  61. What is 'secure by default', and give an example of a secure-by-default pattern? [Intermediate]
  62. What is Zero Trust, and how does it differ from perimeter-based security? [Intermediate]
  63. How do you apply least privilege to service-to-service communication (e.g., with Istio mTLS)? [Intermediate]
  64. What is secrets management, and how does HashiCorp Vault work at a high level? [Intermediate]
  65. What is dynamic secrets generation in Vault, and why is it powerful? [Intermediate]
  66. What is the difference between static and dynamic secrets? [Intermediate]
  67. How does secret rotation work, and why does it matter? [Advanced]
  68. What is encryption at rest versus in transit, and how do you ensure both? [Advanced]
  69. What is the difference between symmetric and asymmetric encryption? [Advanced]
  70. What is TLS, and what happens during a TLS handshake at a high level? [Advanced]
  71. What is mutual TLS, and where would you use it? [Advanced]
  72. What is certificate management, and how do tools like cert-manager help? [Advanced]
  73. What is the difference between authentication and authorization? [Advanced]
  74. What is RBAC versus ABAC? [Advanced]
  75. How do you secure container runtime (seccomp, AppArmor, capabilities, read-only root)? [Advanced]
  76. What are Linux capabilities, and why drop them in containers? [Advanced]
  77. Why should containers not run as root, and what does running rootless achieve? [Advanced]
  78. What are Kubernetes Pod Security Standards (privileged, baseline, restricted)? [Advanced]
  79. How do you prevent privilege escalation in a Kubernetes cluster? [Advanced]
  80. What is network segmentation, and how do NetworkPolicies enforce it? [Advanced]
  81. What is the difference between a vulnerability, a threat, and a risk? [Advanced]
  82. What is threat modeling, and when would you do it? [Advanced]
  83. What is the OWASP Top 10, and name a few categories? [Advanced]
  84. What is an injection attack, and how do you prevent SQL injection? [Advanced]
  85. What is the difference between encoding, encryption, and hashing? [Advanced]
  86. Why do you hash and salt passwords rather than encrypt them? [Advanced]
  87. What is a security incident response process, and what are its phases? [Advanced]
  88. How would you respond to a suspected credential compromise in production? [Advanced]
  89. How do you audit who did what in your cloud and clusters (CloudTrail, audit logs)? [Advanced]
  90. What is compliance-as-code, and how do you continuously prove compliance? [Advanced]
  91. How do you balance security controls with developer velocity? [Advanced]
  92. How do you get developers to adopt secure practices without friction? [Advanced]
  93. How would you embed security scanning into a pipeline without making it slow? [Advanced]
  94. What is a break-glass procedure, and why have one? [Advanced]
  95. How do you manage and rotate SSH and API keys at scale? [Advanced]
  96. What is the difference between a WAF, an IDS, and an IPS? [Advanced]
  97. How do you measure the effectiveness of your security program (MTTR for vulns, coverage)? [Advanced]
  98. What recent security tool or practice have you adopted, and what risk did it reduce? [Advanced]
  99. How would you design a secure-by-default CI/CD pipeline from scratch? [Advanced]
  100. How do you prove to an auditor that security controls are enforced continuously, not just documented? [Advanced]

What is DevSecOps, and how does it differ from traditional security gating at the end? [Basic]

Answer

DevSecOps means security is engineered into the full software delivery lifecycle instead of being treated as a final approval gate. The difference is ownership and timing: developers, platform, security, and operations all share controls that run continuously in code, CI/CD, cloud, and production.

Technical explanation

Traditional security gating often creates late rework because vulnerabilities are found after implementation or right before release.

DevSecOps moves guardrails into developer workflows: secure templates, SAST, SCA, image scanning, IaC scanning, secrets checks, admission policy, and runtime detection.

The goal is not to bypass security; it is to make secure behavior the default while keeping high-risk exceptions visible and approved.

Hands-on example

Hands-on: in a Jenkins or GitHub Actions pipeline, run unit tests, SAST, dependency scanning, container image scanning, IaC policy checks, and artifact signing before deployment. Low-risk findings create tickets, critical exploitable findings fail the build, and policy-as-code blocks noncompliant Kubernetes manifests at admission time.
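
The triage rule above (tickets for low-risk findings, a failed build for critical exploitable ones) can be sketched as a small CI helper. This is a minimal illustration, not the output format of any particular scanner: the `Finding` record and its field names are hypothetical.

```python
from dataclasses import dataclass


# Hypothetical finding record; field names are illustrative, not from any scanner.
@dataclass(frozen=True)
class Finding:
    id: str
    severity: str      # "low", "medium", "high", or "critical"
    exploitable: bool  # e.g. reachable code path or known exploit


def triage(findings):
    """Split findings into build blockers and ticket-only items."""
    blockers = [f for f in findings if f.severity == "critical" and f.exploitable]
    tickets = [f for f in findings if f not in blockers]
    return blockers, tickets


def exit_code(findings):
    """Return a nonzero exit code when any blocker exists, as a CI step would."""
    blockers, _ = triage(findings)
    return 1 if blockers else 0
```

A pipeline step could call `exit_code` on the merged scanner results and open tickets for everything returned in the second list.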

What does shift-left security mean, and why does it matter? [Basic]

Answer

Shift-left security means finding and fixing security issues as early as possible, ideally while code is being written or reviewed. It matters because early fixes are cheaper, faster, and less disruptive than remediation in production.

Technical explanation

Shift-left includes IDE feedback, pre-commit secret scanning, pull-request SAST, SCA during build, and IaC policy checks before infrastructure is applied.

It reduces late-stage release blockers and turns security into normal engineering feedback.

It should be paired with shift-right controls such as runtime detection, audit logs, and incident response because not all risks are visible before deployment.

Hands-on example

Example workflow: a developer opens a pull request. The PR runs SonarQube analysis, dependency review, Trivy image scanning, and Checkov/Terraform policy checks. The developer gets line-level comments and fixes the issue before merge instead of waiting for a quarterly penetration test finding.
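
Pre-commit secret scanning, one of the shift-left checks listed above, can be approximated with a few regular expressions. The sketch below is deliberately small and its two patterns are illustrative assumptions; dedicated scanners ship much larger, tuned rule sets.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
}


def scan_text(text):
    """Return (line_number, rule_name) pairs for every suspicious line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

A pre-commit hook could run `scan_text` over staged file contents and reject the commit when the returned list is non-empty.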

What is the difference between SAST, DAST, IAST, and SCA? [Basic]

Answer

SAST analyzes source or bytecode without running the application, DAST tests a running application from the outside, IAST observes the application while tests execute, and SCA identifies vulnerable third-party dependencies and licenses.

Technical explanation

SAST is strong for coding issues such as injection patterns, insecure crypto calls, and unsafe deserialization paths.

DAST is strong for runtime and configuration issues such as exposed endpoints, missing headers, authentication bypass patterns, and reflected injection behavior.

IAST combines runtime context with code-level insight, while SCA focuses on open-source packages, transitive dependencies, CVEs, licenses, and upgrade paths.

Hands-on example

Hands-on: run SonarQube during PR for SAST/code quality, run OWASP ZAP against a deployed test URL for DAST, use an IAST agent in integration tests if available, and run Snyk/Dependabot/Trivy/Grype for SCA against package manifests and container images.

When in the pipeline does each of SAST, DAST, and SCA run? [Basic]

Answer

SCA and SAST usually run early in pull request and build stages. DAST runs later against a deployed test or staging environment because it needs a live application. Image and IaC scanning run before deployment, and runtime controls continue after release.

Technical explanation

SAST is useful before merge because developers can fix findings in the same code review cycle.

SCA should run on every dependency change and image build because new vulnerabilities can affect old packages.

DAST should run after deployment to an ephemeral or staging environment with representative routes, authentication, and test data.

Hands-on example

Pipeline example: PR stage = lint, unit tests, SAST, SCA, secret scan. Build stage = image scan and SBOM generation. Test deploy stage = DAST/ZAP scan. Release stage = signature verification and admission policies. Production = runtime monitoring and vulnerability drift detection.

What is the difference between SAST and DAST, and what does each catch and miss? [Basic]

Answer

SAST looks inside the code without executing it, while DAST attacks or probes the running application from the outside. SAST catches code-level flaws early; DAST catches runtime behavior and deployment/configuration issues, but neither is complete alone.

Technical explanation

SAST can find insecure API usage and tainted data flows but may produce false positives and may miss runtime-only misconfiguration.

DAST can validate externally observable issues but usually has limited visibility into exact source lines and may miss code paths not reached during scanning.

A strong program uses both, plus SCA, secrets scanning, threat modeling, and runtime controls.

Hands-on example

Example: SAST flags a concatenated SQL query in a DAO class. DAST confirms a SQL injection on /search when the staging app is running. The fix is to use parameterized queries, add regression tests, and rerun both scanners to verify the issue is gone.
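
To make the fix concrete, here is a minimal sketch contrasting a concatenated query with a parameterized one, using Python's standard `sqlite3` module. The `products` table and the function names are hypothetical; the original example is a Java DAO, so treat this as the same idea in a different language.

```python
import sqlite3


def search_unsafe(conn, term):
    # Vulnerable: user input is concatenated into the SQL text.
    sql = "SELECT name FROM products WHERE name = '" + term + "'"
    return [row[0] for row in conn.execute(sql)]


def search_safe(conn, term):
    # Fixed: the driver binds the value, so it is never parsed as SQL.
    rows = conn.execute("SELECT name FROM products WHERE name = ?", (term,))
    return [row[0] for row in rows]


def demo_connection():
    """Build an in-memory table (hypothetical schema) for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT)")
    conn.executemany("INSERT INTO products VALUES (?)", [("apple",), ("pear",)])
    return conn
```

Passing a payload such as `x' OR '1'='1` to `search_unsafe` returns every row, while `search_safe` treats it as a literal string and returns nothing. A regression test built on that difference is the kind of check worth keeping after the fix.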

What is software composition analysis (SCA), and why does it matter for dependencies? [Basic]

Answer

Software composition analysis identifies the third-party libraries, frameworks, transitive dependencies, and licenses in an application. It matters because most modern applications depend heavily on open-source packages, and a vulnerability can enter through a library the team did not write.

Technical explanation

SCA reads package manifests, lock files, build outputs, SBOMs, and sometimes container images.

It maps components to CVEs, license policies, end-of-life versions, and known malicious packages.

Good SCA prioritizes reachable, exploitable, internet-exposed, and business-critical findings rather than treating every CVE equally.

Hands-on example

Hands-on: for a Java service, scan pom.xml and the built image. If log4j-core is found as a transitive dependency, identify the parent dependency with mvn dependency:tree, upgrade or override the version, rebuild, regenerate the SBOM, and rescan before deployment.
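
Finding which direct dependency pulls in a vulnerable transitive one is a tree search, which is what `mvn dependency:tree` lets you do by eye. The sketch below shows the idea on a hand-written tree; every package name except log4j-core is made up for illustration.

```python
def find_paths(tree, target, path=()):
    """Yield every path from the root to `target` in a nested dependency tree.

    `tree` maps a package name to a dict of its own dependencies.
    """
    for name, children in tree.items():
        current = path + (name,)
        if name == target:
            yield current
        yield from find_paths(children, target, current)


# Hypothetical tree; package names other than log4j-core are made up.
TREE = {
    "checkout-service": {
        "reporting-lib": {"log4j-core": {}},
        "http-client": {},
    }
}
```

Here `list(find_paths(TREE, "log4j-core"))` yields a single path through `reporting-lib`, which identifies the parent to upgrade or the place to apply a version override.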

What is SonarQube, and what does it analyse? [Basic]

Answer

SonarQube is a static analysis and code quality platform. It analyzes code for bugs, vulnerabilities, security hotspots, code smells, duplication, coverage data, and maintainability/security ratings across many languages.

Technical explanation

SonarQube does not normally run the application; it analyzes source code and imports external reports such as coverage from test tools.

It supports quality gates so teams can enforce standards before merge or release.

It is often used as a developer feedback system, not just a security scanner, because it also tracks maintainability and technical debt.

Hands-on example

Hands-on: configure sonar-project.properties with sonar.projectKey, sonar.sources, sonar.tests, and coverage report paths. Run sonar-scanner in CI, publish the analysis to SonarQube, and block the merge if the quality gate fails on new-code bugs, vulnerabilities, coverage, or duplication.

Is SonarQube primarily SAST, code quality, or both? [Basic]

Answer

SonarQube is both a code quality platform and a SAST-style static analysis tool. It detects security vulnerabilities and hotspots, but it also evaluates maintainability, reliability, duplication, test coverage, and technical debt.

Technical explanation

Calling it only SAST understates its quality-management role.

Calling it only code quality understates its security rules, vulnerability detection, taint analysis capabilities in supported editions, and security review workflows.

In interviews, I describe it as static code analysis for quality and security with quality gates for governance.

Hands-on example

Example: a PR analysis finds a SQL injection vulnerability, 12 code smells, duplicate blocks, and coverage below the new-code threshold. The pipeline fails because the quality gate checks all configured quality and security conditions together.

What is a SonarQube quality gate, and how do you use it to fail a build? [Basic]

Answer

A SonarQube quality gate is a set of pass/fail conditions applied to an analysis result. In CI, I use it to fail the build or block a merge when new code violates defined thresholds such as new vulnerabilities, coverage, bugs, or duplication.

Technical explanation

Quality gates are most effective when focused on new code, because legacy projects can adopt them without being blocked by old debt on day one.

The CI job submits analysis, waits for the quality gate result, and returns a nonzero status if the gate fails.

Typical conditions include zero new critical vulnerabilities, minimum new-code coverage, acceptable duplication, and no blocker bugs.

Hands-on example

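Jenkins example:

// Run the analysis with the server settings registered in Jenkins as 'sonarqube'.
withSonarQubeEnv('sonarqube') { sh 'mvn clean verify sonar:sonar' }
// waitForQualityGate needs a SonarQube webhook pointing back at Jenkins.
timeout(time: 5, unit: 'MINUTES') {
    def qg = waitForQualityGate()
    if (qg.status != 'OK') { error "Quality gate failed: ${qg.status}" }
}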

What is the difference between a quality gate and a quality profile in SonarQube? [Basic]

Answer

A quality gate defines whether a project passes or fails based on metrics. A quality profile defines which analysis rules are enabled for a language. The profile decides what issues can be raised; the gate decides whether the result is acceptable.

Technical explanation

Quality profiles are rule sets, such as Java rules for security, reliability, and maintainability.

Quality gates are governance thresholds, such as no new blocker issues or new-code coverage above 80 percent.

A team can tune profiles per language while keeping a common enterprise gate for consistent release standards.

Hands-on example

Example: the Java quality profile enables rules for SQL injection and insecure random usage. The quality gate then fails the PR if any new vulnerability is found or if new-code coverage drops below the agreed threshold.

What does 'clean as you code' mean in SonarQube, and why focus on new code? [Basic]

Answer

Clean as you code means holding new or changed code to a high standard, even if the legacy codebase still has debt. The idea is to stop adding new problems first, then gradually remediate old issues based on risk and capacity.

Technical explanation

It makes adoption realistic because old code does not block every delivery pipeline immediately.

It creates personal ownership: developers fix the issues introduced in their current change.

It improves the trend of the system over time because every release is expected to leave the changed code clean.

Hands-on example

Hands-on rollout: set the new-code definition to a reference branch such as main, or to the previous version (the last release). Configure the quality gate to require zero new critical vulnerabilities, no new blocker bugs, and minimum new-code coverage. Track legacy debt separately in a remediation backlog.

What metrics does SonarQube track (coverage, duplication, code smells, bugs, vulnerabilities)? [Basic]

Answer

SonarQube tracks metrics such as coverage, duplicated lines, bugs, vulnerabilities, security hotspots, code smells, maintainability rating, reliability rating, security rating, complexity, and technical debt. These metrics drive dashboards and quality gates.

Technical explanation

Coverage is imported from test coverage tools; SonarQube analyzes and displays it but does not usually execute tests itself.

Bugs and vulnerabilities represent higher-confidence correctness or security issues, while code smells represent maintainability risks.

Duplication and complexity help identify areas that are harder to maintain and test.

Hands-on example

Example gate: new coverage >= 80 percent, duplicated lines on new code < 3 percent, zero new blocker bugs, zero new vulnerabilities, and all new security hotspots reviewed. A PR failing any condition must be fixed or explicitly reviewed before merge.
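
The example gate can be read as a list of boolean conditions over new-code metrics. The sketch below mirrors those thresholds. The metric key names are illustrative assumptions and are not SonarQube's own metric keys.

```python
# Thresholds mirror the example gate in the text; metric names are illustrative.
def evaluate_gate(m):
    """Return the list of failed conditions; an empty list means the gate passes."""
    conditions = [
        ("new coverage >= 80", m["new_coverage"] >= 80.0),
        ("new duplicated lines < 3", m["new_duplicated_lines_pct"] < 3.0),
        ("zero new blocker bugs", m["new_blocker_bugs"] == 0),
        ("zero new vulnerabilities", m["new_vulnerabilities"] == 0),
        ("all new hotspots reviewed", m["new_hotspots_reviewed_pct"] == 100.0),
    ]
    return [name for name, ok in conditions if not ok]
```

Returning the failed condition names, not just a boolean, gives the developer something actionable in the PR comment.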

What is the difference between a bug, a vulnerability, and a code smell in SonarQube? [Basic]

Answer

In SonarQube, a bug is a likely correctness problem, a vulnerability is a security flaw with a concrete exploit risk, and a code smell is a maintainability issue that may make future changes riskier or slower.

Technical explanation

Bugs affect reliability, such as null dereference or incorrect logic.

Vulnerabilities affect security, such as injection, weak cryptography, or unsafe deserialization.

Code smells affect maintainability, such as duplicated code, overly complex methods, or dead code; they are usually not immediate incidents but still create debt.

Hands-on example

Hands-on triage: treat new critical vulnerabilities as merge blockers, high-confidence bugs as blockers or required fixes, and code smells based on severity and team standards. Convert accepted technical debt into tracked backlog items rather than ignoring it.

What is technical debt in SonarQube, and how is it estimated? [Basic]

Answer

Technical debt in SonarQube is an estimate of the remediation effort required to fix maintainability issues, usually expressed as time. It helps quantify how much engineering effort is needed to make the code easier and safer to change.

Technical explanation

Each rule has a remediation function or estimated effort, and SonarQube aggregates that across code smells and maintainability findings.

Debt is an estimate, not an exact project plan; teams should validate high-impact items manually.

Debt ratio and maintainability rating help compare codebases, but operational risk and business criticality should also influence prioritization.

Hands-on example

Example: a service has 20 days of estimated debt, but most new code is clean. The team agrees to spend 10 percent of each sprint on the highest-risk debt: complex payment flows, duplicated auth logic, and untested error handling.
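
The debt ratio mentioned above is remediation cost divided by estimated development cost. The sketch below works through the 20-day example under stated assumptions: 30 minutes to develop one line of code, 8-hour days, and rating bands of 5, 10, 20, and 50 percent. These are believed to match SonarQube's defaults but should be checked against your instance's settings. The 4,000-line service size is hypothetical.

```python
MINUTES_PER_DAY = 8 * 60   # assumption: 8-hour working day
MINUTES_PER_LINE = 30      # assumption: default cost to develop one line


def debt_ratio(debt_days, lines_of_code):
    """Remediation cost divided by estimated development cost."""
    remediation = debt_days * MINUTES_PER_DAY
    development = lines_of_code * MINUTES_PER_LINE
    return remediation / development


def maintainability_rating(ratio):
    """Map a debt ratio to a letter using the assumed default bands."""
    for letter, upper in (("A", 0.05), ("B", 0.10), ("C", 0.20), ("D", 0.50)):
        if ratio <= upper:
            return letter
    return "E"
```

With 20 days of debt on a hypothetical 4,000-line service, the ratio is 9,600 / 120,000 = 8 percent, which falls in band B under these assumptions.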

How does SonarQube measure code coverage, and does it run your tests? [Basic]

Answer

SonarQube measures code coverage by importing coverage reports generated by test tools. It does not normally run the tests itself; the CI pipeline runs tests and passes reports such as JaCoCo, LCOV, coverage.py, or Cobertura into SonarQube.

Technical explanation

Coverage reflects which lines or branches were exercised by tests, depending on the language and report format.

A missing or misconfigured report can show zero coverage even if tests ran successfully.

Coverage should be used with judgment because high coverage does not prove strong assertions or complete security testing.

Hands-on example

Java example: with JaCoCo's prepare-agent goal bound in the POM, run mvn test jacoco:report, then run sonar-scanner or mvn sonar:sonar with sonar.coverage.jacoco.xmlReportPaths=target/site/jacoco/jacoco.xml. The PR gate checks new-code coverage based on that imported report.
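
To see what an imported report contains, the sketch below reads the report-level LINE counter from a JaCoCo XML file and computes overall line coverage. It is only an illustration of the report format. SonarQube's own import and its new-code calculation are more involved, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def line_coverage(jacoco_xml):
    """Return report-level line coverage (0-100) from JaCoCo XML text."""
    root = ET.fromstring(jacoco_xml)
    # Report-level totals are <counter> elements directly under <report>.
    for counter in root.findall("counter"):
        if counter.get("type") == "LINE":
            missed = int(counter.get("missed"))
            covered = int(counter.get("covered"))
            total = missed + covered
            return 100.0 * covered / total if total else 0.0
    raise ValueError("no LINE counter found; is this a JaCoCo XML report?")
```

A quick check like this in CI can catch the "zero coverage because the report path is wrong" failure mode before the analysis is uploaded.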

How do you integrate SonarQube into a Jenkins or GitHub pipeline? [Basic]

Answer

I integrate SonarQube by adding a scanner step to CI, passing project metadata and coverage reports, then waiting for the quality gate result. In Jenkins this commonly uses withSonarQubeEnv and waitForQualityGate; in GitHub Actions it uses a scanner action or Maven/Gradle task plus branch protection.

Technical explanation

The scanner should run after compilation/tests when coverage and test reports are available.

Authentication should use a scoped token stored in the CI secret manager, not hardcoded credentials.

The pull request should be protected so failing analysis prevents merge.

Hands-on example

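GitHub Actions sketch:

- uses: actions/checkout@v4
  with:
    fetch-depth: 0   # full history improves new-code and blame detection
- run: mvn -B verify sonar:sonar -Dsonar.projectKey=checkout
  env:
    SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
    SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}   # assumed secret name

Then require the SonarQube quality-gate status check before merging to main.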
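
Where a CI system has no built-in wait step, a script can read the gate result from SonarQube's Web API (the api/qualitygates/project_status endpoint) and fail the job itself. The sketch below only parses a response body. The payload shape is reproduced from memory and should be verified against your server's Web API documentation; the function name is hypothetical.

```python
import json


def gate_failures(payload_text):
    """Return (status, failed_metric_keys) from a project_status response body."""
    status = json.loads(payload_text)["projectStatus"]
    failed = [
        c["metricKey"]
        for c in status.get("conditions", [])
        if c.get("status") == "ERROR"
    ]
    return status["status"], failed
```

A wrapper script would fetch the payload with a scoped token, call `gate_failures`, print the failed metrics, and exit nonzero when the status is not OK.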

What is a SonarQube hotspot, and how is it different from a vulnerability? [Basic]

Answer

A SonarQube security hotspot is security-sensitive code that requires human review to decide whether it is safe. A vulnerability is a finding where SonarQube has stronger evidence of an actual exploitable security flaw.

Technical explanation

Hotspots are review workflows, not automatically confirmed vulnerabilities.

Examples include use of cryptography, CORS settings, file handling, or authentication-related code that may be safe or unsafe depending on context.

A hotspot should be marked reviewed only after the reviewer verifies the implementation and documents the reasoning.

Hands-on example

Example: SonarQube flags a Security Hotspot for a permissive CORS configuration. The reviewer checks whether it is limited to non-sensitive public endpoints. If unsafe, they restrict origins; if safe, they mark it reviewed with justification.

How do you handle false positives in SonarQube? [Basic]

Answer

I handle false positives by validating the finding, documenting why it is not exploitable, marking it appropriately in SonarQube, and tuning rules only when the pattern is repeatedly noisy. I avoid blanket suppression because that hides real issues.

Technical explanation

False positives should be reviewed by someone with enough context to understand the code path and threat model.

Use issue workflow states, comments, or targeted suppression with justification rather than disabling rules globally.

If a rule creates excessive noise across many teams, adjust the quality profile centrally and communicate the reason.

Hands-on example

Hands-on: for a flagged SQL injection that uses a safe query builder, attach evidence in the issue, mark it false positive or accepted as configured, and add a unit test showing parameter binding. Keep the rule enabled for raw SQL concatenation cases.

How do you enforce that pull requests pass the quality gate before merge? [Basic]

Answer

I enforce pull-request quality gates by making the SonarQube quality gate a required status check in the source control system. The pipeline must run analysis for each PR and publish pass/fail status before merge is allowed.

Technical explanation

Branch protection in GitHub, GitLab, Bitbucket, or Azure DevOps should require the gate and prevent admin bypass except through controlled break-glass.

The gate should focus on new-code conditions so teams can adopt it without being blocked by historic debt.

Failures should include actionable links to the SonarQube issue view so developers can fix quickly.

Hands-on example

Example: create a main branch rule requiring 'SonarQube Code Analysis' and 'unit-tests'. In Jenkins, call waitForQualityGate and fail the build on non-OK status. In GitHub, require that status check before the Merge button is enabled.

What is the difference between SonarQube and SonarCloud? [Basic]

Answer

SonarQube is typically a self-managed or enterprise-managed server that an organization operates. SonarCloud (since rebranded as SonarQube Cloud) is SonarSource's hosted SaaS offering, commonly used for cloud-hosted repositories and teams that do not want to operate the platform themselves.

Technical explanation

SonarQube gives more control over hosting, network placement, plugins, data residency, and enterprise integration.

SonarCloud reduces operational overhead and is convenient for GitHub, Bitbucket, Azure DevOps, and open-source workflows.

The choice depends on compliance, data residency, administration model, repository hosting, and cost structure.

Hands-on example

Example decision: for regulated internal code with strict data residency, use self-hosted SonarQube behind SSO. For a small SaaS team already on GitHub Cloud, use SonarCloud and enforce its PR checks through branch protection.

How would you roll out SonarQube to many teams without blocking them on day one? [Basic]

Answer

I would roll out SonarQube gradually: start with visibility, focus gates on new code, onboard pilot teams, create language-specific templates, then progressively enforce stricter standards. Blocking everyone on day one usually creates resistance and false urgency.

Technical explanation

Baseline existing projects so legacy debt is measured but not immediately gate-blocking.

Define common enterprise gates for new code and allow controlled exceptions for edge cases.

Provide CI templates, documentation, dashboards, and office hours so teams do not need to reinvent integration.

Hands-on example

Rollout plan: Month 1: inventory and pilot. Month 2: enable analysis for all repos with nonblocking reports. Month 3: require PR quality gates on new code. Month 4: add vulnerability SLAs and executive dashboards. Month 5: review exceptions and tune profiles.

What is Wiz, and what category of tool is it (CSPM/CNAPP)? [Basic]

Answer

Wiz is a cloud security platform in the CNAPP category, with strong CSPM, vulnerability, cloud workload, identity, data, Kubernetes, and exposure-risk capabilities. It helps teams understand cloud risk by connecting findings across assets, identities, network exposure, and sensitive data.

Technical explanation

CSPM focuses on cloud posture and misconfiguration risk, while CNAPP combines posture, workload, identity, vulnerability, and runtime/contextual risk views.

Wiz is known for agentless cloud scanning and contextual prioritization through graph-based relationships.

In a DevSecOps process, findings should feed ownership, remediation tickets, pipeline policy, and exception workflows.

Hands-on example

Example: Wiz detects an internet-exposed VM with a critical CVE, access to a sensitive S3 bucket, and an overprivileged IAM role. That combined context is treated as a high-priority attack path, not just another vulnerability ticket.
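
The idea of an attack path can be illustrated as a reachability search over a graph of assets, identities, and data stores. The sketch below is a generic breadth-first search on a hand-written graph. It is not how Wiz implements its graph, and all node names are made up.

```python
from collections import deque

# Hypothetical edges for the example in the text; node names are made up.
EDGES = {
    "internet": ["vm-web-1"],
    "vm-web-1": ["role-app-admin"],          # VM assumes an overprivileged role
    "role-app-admin": ["s3-customer-data"],  # role can read the sensitive bucket
    "vm-batch-2": ["s3-customer-data"],      # not internet-exposed
}


def attack_path(edges, source, target):
    """Breadth-first search; returns the shortest path or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

In this toy graph the internet reaches the sensitive bucket through `vm-web-1` and its role, while `vm-batch-2` has the same data access but no exposure. That difference is why the first finding gets prioritized.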

What is CSPM, and what does it protect against? [Basic]

Answer

CSPM stands for Cloud Security Posture Management. It continuously assesses cloud environments for risky configurations, policy violations, exposure, identity issues, missing logging, encryption gaps, and compliance drift.

Technical explanation

CSPM protects against preventable cloud mistakes such as public storage buckets, overly permissive security groups, disabled audit logging, and unencrypted resources.

It compares deployed cloud resources against benchmarks, organization policies, and compliance requirements.

CSPM is most effective when integrated with remediation workflows and IaC feedback, not only periodic reporting.

Hands-on example

Hands-on: configure CSPM across AWS accounts. Create policies for no public S3 buckets, CloudTrail enabled, EBS encryption enabled, and no 0.0.0.0/0 SSH exposure. Route violations to owning teams with severity based on exposure and data sensitivity.
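
Three of the policies above can be expressed as simple checks over a resource inventory. The sketch below uses simplified, hypothetical resource records; a real CSPM tool evaluates data pulled from cloud provider APIs, and its rule language will differ.

```python
# Hypothetical, simplified resource records; real inventories come from cloud APIs.
def check_resource(r):
    """Return the names of violated policies for one resource."""
    violations = []
    if r["type"] == "s3_bucket" and r.get("public", False):
        violations.append("no-public-s3")
    if r["type"] == "ebs_volume" and not r.get("encrypted", False):
        violations.append("ebs-encryption")
    if r["type"] == "security_group":
        for rule in r.get("ingress", []):
            open_to_world = rule["cidr"] == "0.0.0.0/0"
            covers_ssh = rule["from_port"] <= 22 <= rule["to_port"]
            if open_to_world and covers_ssh:
                violations.append("no-open-ssh")
                break
    return violations
```

Each violation can then be routed to the owning team, with severity raised when the resource is also exposed or holds sensitive data.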

What is CNAPP, and what capabilities does it combine? [Basic]

Answer

CNAPP stands for Cloud-Native Application Protection Platform. It combines multiple cloud security capabilities such as CSPM, CWPP/workload protection, CIEM/identity risk, vulnerability management, container/Kubernetes security, data security posture, and sometimes runtime detection.

Technical explanation

The value of CNAPP is correlation across layers: code, image, workload, identity, network, data, and cloud configuration.

Instead of separate scanner backlogs, CNAPP prioritizes risks that are exploitable and business-relevant.

It supports cloud-native environments where infrastructure, identity, containers, and applications change rapidly.

Hands-on example

Example: a CNAPP correlates a vulnerable container image, a public load balancer, a service account with admin privileges, and a database containing regulated data. The combination becomes a critical remediation path, even if each single issue looked moderate alone.

What is the difference between agent-based and agentless cloud security scanning? [Basic]

Answer

Agent-based scanning installs software on workloads to collect deep host or runtime telemetry. Agentless scanning connects through cloud APIs and snapshots/metadata to assess resources without installing agents. Agentless is easier to deploy broadly; agent-based is deeper for runtime behavior.

Technical explanation

Agentless scanning has strong coverage for cloud inventory, misconfiguration, image/package vulnerabilities, identities, and exposure without operational friction.

Agent-based tools can observe process activity, network connections, file changes, and runtime attacks in real time.

Most mature programs use agentless for broad posture and agents or eBPF/runtime sensors for high-risk workloads.

Hands-on example

Example: use agentless scanning to inventory every AWS account in two days and prioritize exposed vulnerable assets. Add runtime agents to payment and identity workloads where process-level detection and response are required.

How does Wiz scan a cloud environment without agents? [Basic]

Answer

At a high level, Wiz scans cloud environments agentlessly by connecting to cloud provider APIs with read-only permissions, inventorying resources, analyzing configurations and identities, and inspecting workload snapshots or metadata without installing software on each host.

Technical explanation

Cloud APIs provide metadata about compute, storage, networking, IAM, Kubernetes, and security services.

Snapshot-based analysis can inspect packages and files in workloads while avoiding agent deployment overhead.

The key requirement is carefully scoped read permissions and secure handling of scan data.

Hands-on example

Hands-on pattern: create a read-only cross-account role for the scanner, onboard the AWS organization, verify that every account is covered, then review findings grouped by account, resource owner, internet exposure, vulnerability severity, and sensitive-data context.

What is the Wiz Security Graph, and why is context important for prioritisation? [Basic]

Answer

The Wiz Security Graph is a contextual relationship model that connects cloud assets, identities, vulnerabilities, network exposure, secrets, Kubernetes objects, and data. Context matters because security teams need to know which finding creates a real attack path, not just which finding has the highest standalone score.

Technical explanation

A vulnerability on an isolated build host is not the same risk as the same vulnerability on an internet-exposed workload with access to sensitive data.

Graph relationships reveal combinations such as public exposure plus privilege plus data access.

This helps reduce noise and focus remediation on paths attackers can actually use.

Hands-on example

Example: graph analysis connects an exposed Kubernetes service to a pod running a vulnerable image, to a service account with secrets access, to a database with PII. That chain becomes a priority remediation even before lower-context critical CVEs.
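
The chain in this example is a reachability question on a directed graph. The sketch below is not Wiz's implementation; it is a plain breadth-first search over a hypothetical edge list (all node names are made up) to show why a connected path matters more than an isolated finding.

```python
from collections import deque

# Hypothetical directed edges meaning "A can reach, use, or access B".
EDGES = {
    "internet": ["k8s-service"],
    "k8s-service": ["pod-vulnerable-image"],
    "pod-vulnerable-image": ["service-account"],
    "service-account": ["secrets"],
    "secrets": ["pii-database"],
    "build-host": ["artifact-store"],  # isolated: nothing exposed leads here
}

def find_path(edges, start, goal):
    """Breadth-first search; returns the shortest path as a list, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

attack_path = find_path(EDGES, "internet", "pii-database")
isolated = find_path(EDGES, "internet", "artifact-store")
print(" -> ".join(attack_path) if attack_path else "no path")
```

Removing any single edge on the returned path (for example, closing the public service) makes the search return `None`, which is the graph view of "fixing one link in the chain".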

What is a toxic combination (attack path) in Wiz, and why prioritise it? [Basic]

Answer

A toxic combination, or attack path, is a set of related weaknesses that may each look minor on their own but together create a high-risk route to compromise. It should be prioritized because attackers chain weaknesses; they rarely rely on a single isolated finding.

Technical explanation

Examples include internet exposure plus critical CVE plus privileged identity, or leaked secret plus broad cloud permissions plus sensitive data access.

Attack-path prioritization is more useful than flat severity lists because it includes exploitability, reachability, blast radius, and business impact.

Fixing one link in the chain can materially reduce risk even before all findings are remediated.

Hands-on example

Hands-on: if a public EC2 instance has a critical RCE and an IAM role that can read production secrets, immediate actions are to restrict ingress, rotate any exposed credentials, patch or replace the instance, and reduce the role policy to least privilege.

How does Wiz help prioritise which vulnerabilities to fix first? [Basic]

Answer

Wiz helps prioritize vulnerabilities by adding cloud context such as internet exposure, workload criticality, exploitability signals, sensitive data access, privilege level, lateral movement paths, and whether the asset is actually running. This turns vulnerability management from score sorting into risk-based remediation.

Technical explanation

CVSS is useful but insufficient because it does not fully represent an organization's environment.

Context such as public reachability and high privileges can elevate a finding, while isolation and nonproduction status can lower urgency.

Prioritization should produce clear owner, SLA, fix path, and verification evidence.

Hands-on example

Example triage. Fix first: a KEV-listed RCE on an internet-facing workload with production data access. Fix second: a high-CVSS finding on an internal critical service. Defer with SLA: a low-exploitability package in a non-running image. Accept temporarily: a finding covered by a compensating WAF rule plus an approved exception.

Why is context (internet exposure, sensitive data, privileges) key to vulnerability prioritisation? [Basic]

Answer

Context is key because vulnerability severity alone does not show whether an attacker can reach the asset or whether compromise would matter. Internet exposure, sensitive data access, and privileges determine likelihood and impact in the real environment.

Technical explanation

Internet exposure increases likelihood because attackers can reach the vulnerable surface directly.

Sensitive data increases business impact because compromise may lead to disclosure or regulatory consequences.

Privileges increase blast radius because the attacker may move laterally, access secrets, or modify cloud resources.

Hands-on example

Example: CVE-A has CVSS 9.8 but exists only in a stopped dev VM. CVE-B has CVSS 7.5 but is on an internet-facing API with admin IAM permissions and access to customer data. CVE-B may be the higher operational priority.

How would you use Wiz findings to drive remediation across teams? [Basic]

Answer

I would use Wiz findings to drive remediation by assigning clear ownership, prioritizing with context, creating tickets with actionable fix guidance, tracking SLAs, and verifying closure through rescans. The process should be integrated with engineering workflows, not handled only in a security dashboard.

Technical explanation

Every finding needs owner mapping from tags, accounts, repositories, clusters, or CMDB metadata.

High-risk attack paths should trigger urgent incidents or expedited tickets; lower-risk findings should enter normal backlog with deadlines.

Remediation evidence should include fix commit, deployment version, rescan status, and exception approval if not fixed.

Hands-on example

Operating model: Wiz -> Jira ticket with asset, owner, severity, attack path, recommended fix, SLA. Team patches IaC/image/app dependency. CI rebuilds and deploys. Wiz rescan confirms closure. Weekly review tracks overdue criticals and exception aging.
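
The weekly review of overdue findings can be sketched as a small SLA calculation. The SLA windows, ticket fields, and ticket IDs below are illustrative assumptions, not values from Wiz or Jira.

```python
from datetime import date, timedelta

# Illustrative SLA windows in days; real values are an organizational decision.
SLA_DAYS = {"critical": 7, "high": 30, "medium": 90}

def overdue(tickets, today):
    """Return open tickets past their SLA, most overdue first."""
    late = []
    for t in tickets:
        due = t["opened"] + timedelta(days=SLA_DAYS[t["severity"]])
        if t["status"] != "closed" and today > due:
            late.append({**t, "days_overdue": (today - due).days})
    return sorted(late, key=lambda t: t["days_overdue"], reverse=True)

TICKETS = [
    {"id": "SEC-1", "severity": "critical", "opened": date(2024, 1, 1), "status": "open"},
    {"id": "SEC-2", "severity": "high", "opened": date(2024, 1, 1), "status": "open"},
    {"id": "SEC-3", "severity": "critical", "opened": date(2023, 12, 1), "status": "closed"},
]

late = overdue(TICKETS, today=date(2024, 1, 15))
for t in late:
    print(f"{t['id']} ({t['severity']}): {t['days_overdue']} days overdue")
```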

How does your AI-assisted remediation tool relate to scanners like Wiz? [Basic]

Answer

An AI-assisted remediation tool should complement scanners like Wiz, not replace them. Wiz identifies and prioritizes risk; the remediation assistant can explain root cause, propose code/IaC changes, generate pull requests, summarize blast radius, and guide owners through safe fixes.

Technical explanation

Scanners produce findings and context; remediation tools reduce mean time to remediate by translating findings into concrete changes.

The AI output must be reviewed, tested, and validated through CI/security rescans before production rollout.

The safest pattern is human-in-the-loop automation with guardrails, not autonomous security changes to production.

Hands-on example

Example: Wiz reports a public S3 bucket and permissive bucket policy. The remediation assistant maps the bucket to a Terraform module, proposes a PR adding block_public_access and restricted policy, includes risk explanation, and waits for the owner and CI checks before merge.

What is the difference between a CVE, a CVSS score, and exploitability (EPSS/KEV)? [Basic]

Answer

A CVE is an identifier for a publicly known vulnerability. CVSS is a severity scoring system that estimates technical impact. EPSS estimates likelihood of exploitation in the wild, and CISA KEV lists vulnerabilities known to be actively exploited. Together they help prioritize, but none should be used alone.

Technical explanation

CVE answers 'what vulnerability is this?'.

CVSS answers 'how severe could it be technically?'.

EPSS/KEV answer 'how likely or confirmed is exploitation?', which is often more useful for remediation urgency.

Hands-on example

Example: triage a CVE by checking CVSS, EPSS percentile/probability, KEV status, internet exposure, asset criticality, compensating controls, and available patch. A KEV-listed vulnerability on an exposed production system should get emergency priority.

What is CVSS, and what are its limitations for prioritisation? [Intermediate]

Answer

CVSS is the Common Vulnerability Scoring System. It provides a standardized severity score based on exploitability and impact characteristics, but its limitation is that it does not fully account for local context such as asset exposure, business value, compensating controls, or current attacker activity.

Technical explanation

CVSS is useful for common language and initial severity buckets.

It can over-prioritize theoretically severe issues that are unreachable in your environment.

It can under-prioritize lower-score issues that are exposed, actively exploited, or chained with privileges and sensitive data.

Hands-on example

Hands-on: do not sort a backlog only by CVSS. Build a risk score that also includes KEV, EPSS, public exposure, running status, workload criticality, data sensitivity, exploit maturity, and fix availability.
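
One way to picture such a score is a weighted sum. The sketch below is an assumption-heavy illustration: the `Finding` fields and every weight are invented for the example and are not a standard or any vendor's formula.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    cvss: float             # 0.0-10.0 base score
    epss: float             # 0.0-1.0 probability
    kev: bool
    internet_exposed: bool
    running: bool
    sensitive_data: bool
    fix_available: bool

def risk_score(f: Finding) -> float:
    """Illustrative 0-100 score; the weights are assumptions, not a standard."""
    score = f.cvss * 3.0                        # up to 30 points from severity
    score += f.epss * 20.0                      # up to 20 points from exploit likelihood
    score += 20.0 if f.kev else 0.0
    score += 15.0 if f.internet_exposed else 0.0
    score += 10.0 if f.sensitive_data else 0.0
    score += 5.0 if f.fix_available else 0.0    # a ready fix makes acting now cheap
    if not f.running:
        score *= 0.25                           # stopped workloads are far less urgent
    return round(score, 1)

# A stopped dev VM with a 9.8 versus an exposed API with a 7.5.
cve_a = Finding("CVE-A", 9.8, 0.02, False, False, False, False, True)
cve_b = Finding("CVE-B", 7.5, 0.60, False, True, True, True, True)

backlog = sorted([cve_a, cve_b], key=risk_score, reverse=True)
for f in backlog:
    print(f.cve, risk_score(f))
```

With these weights the exposed, running CVE-B sorts ahead of CVE-A despite its lower CVSS, which is the behavior a CVSS-only sort cannot produce.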

What is EPSS, and how does it improve on raw CVSS? [Intermediate]

Answer

EPSS is the Exploit Prediction Scoring System. It estimates the probability (0 to 1) that a CVE will be exploited in the wild within the next 30 days. It improves prioritization because it adds likelihood-of-exploitation signals instead of relying only on theoretical severity.

Technical explanation

CVSS describes severity if exploited; EPSS helps estimate how likely exploitation is.

EPSS is useful for prioritizing large vulnerability backlogs where teams cannot patch everything immediately.

It should be combined with asset exposure and business impact because likely exploitation on a low-value isolated asset may still be lower priority than a sensitive exposed system.

Hands-on example

Example: create a vulnerability rule: emergency if KEV=true, or EPSS above 0.7 and internet_exposed=true, or CVSS critical plus sensitive_data=true. Otherwise assign standard SLAs based on contextual risk.
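
The rule above translates almost directly into a boolean expression. This sketch reads it as "KEV, or (EPSS above 0.7 and exposed), or (CVSS critical and sensitive data)" and treats a CVSS base score of 9.0 or higher as critical; the function and parameter names are illustrative.

```python
def is_emergency(kev: bool, epss: float, internet_exposed: bool,
                 cvss: float, sensitive_data: bool) -> bool:
    """Direct translation of the rule in the text; CVSS >= 9.0 counts as critical."""
    return (
        kev
        or (epss > 0.7 and internet_exposed)
        or (cvss >= 9.0 and sensitive_data)
    )

# KEV alone is enough, regardless of the other signals.
print(is_emergency(kev=True, epss=0.01, internet_exposed=False, cvss=5.0, sensitive_data=False))
# High EPSS without exposure falls through to standard SLAs.
print(is_emergency(kev=False, epss=0.9, internet_exposed=False, cvss=7.0, sensitive_data=False))
```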

What is the CISA KEV catalog, and how would you use it? [Intermediate]

Answer

The CISA Known Exploited Vulnerabilities catalog is a list of vulnerabilities that CISA identifies as known to be exploited in the wild. I use it as a high-confidence signal for urgent remediation, especially for internet-facing or business-critical assets.

Technical explanation

KEV status means exploitation is not theoretical; defenders should treat it as a strong prioritization signal.

Under CISA Binding Operational Directive 22-01, U.S. federal civilian agencies must remediate KEV entries by the due date listed in the catalog, and private organizations often use it as a best-practice input.

KEV should feed asset inventory, vulnerability scanners, ticketing, patch SLAs, and executive risk reporting.

Hands-on example

Hands-on: ingest the KEV JSON feed daily. Join it with vulnerability scan results by CVE. If a KEV appears on an exposed production asset, create a P1 ticket, notify the service owner, apply patch or mitigation, and verify by rescan.
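
The join step can be sketched as a set lookup. To stay self-contained, the example uses an inline sample shaped like the KEV JSON feed (a top-level `vulnerabilities` list with a `cveID` per entry) rather than downloading it. The scan-result records, the placeholder ID `CVE-2099-0001`, and the P1/P2 split are assumptions for illustration.

```python
import json

# Inline sample shaped like the KEV feed; in practice this is fetched daily.
KEV_FEED = json.loads("""
{"vulnerabilities": [
  {"cveID": "CVE-2021-44228", "vendorProject": "Apache", "product": "Log4j2"}
]}
""")

# Hypothetical scanner output; CVE-2099-0001 is a made-up placeholder ID.
SCAN_RESULTS = [
    {"asset": "checkout-api", "cve": "CVE-2021-44228", "exposed": True, "env": "prod"},
    {"asset": "batch-worker", "cve": "CVE-2021-44228", "exposed": False, "env": "prod"},
    {"asset": "checkout-api", "cve": "CVE-2099-0001", "exposed": True, "env": "prod"},
]

def kev_tickets(feed: dict, findings: list[dict]) -> list[dict]:
    """Keep only findings whose CVE is in KEV; exposed prod assets become P1."""
    kev_ids = {v["cveID"] for v in feed["vulnerabilities"]}
    tickets = []
    for f in findings:
        if f["cve"] in kev_ids:
            priority = "P1" if f["exposed"] and f["env"] == "prod" else "P2"
            tickets.append({"asset": f["asset"], "cve": f["cve"], "priority": priority})
    return tickets

tickets = kev_tickets(KEV_FEED, SCAN_RESULTS)
for t in tickets:
    print(t["priority"], t["asset"], t["cve"])
```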

How do you reduce the noise of thousands of vulnerability findings? [Intermediate]

Answer

I reduce vulnerability noise by deduplicating findings, grouping by fix, adding ownership and environment context, prioritizing exploitability and exposure, suppressing nonrunning or unreachable packages where justified, and tracking only actionable remediation units.

Technical explanation

A thousand CVEs across identical base images may require one base-image rebuild, not a thousand independent tickets.

Context such as KEV, EPSS, internet exposure, runtime status, and sensitive data helps separate urgent risk from informational noise.

Exceptions should have owners, expiration dates, compensating controls, and evidence, otherwise they become hidden risk.

Hands-on example

Example: group all Alpine base-image vulnerabilities by image digest and owning platform team. Rebuild the golden base image, trigger downstream image rebuilds, rescan, then close all duplicate findings tied to the old digest.
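
Grouping by fix can be sketched as a dictionary keyed on image digest and owner, so each key becomes one remediation unit instead of one ticket per CVE. The finding records, CVE labels, and digests below are hypothetical.

```python
from collections import defaultdict

# Hypothetical scanner findings; CVE labels and digests are placeholders.
FINDINGS = [
    {"cve": "CVE-X1", "image_digest": "sha256:old", "owner": "platform", "service": "checkout"},
    {"cve": "CVE-X2", "image_digest": "sha256:old", "owner": "platform", "service": "checkout"},
    {"cve": "CVE-X1", "image_digest": "sha256:old", "owner": "platform", "service": "search"},
    {"cve": "CVE-X3", "image_digest": "sha256:other", "owner": "data", "service": "etl"},
]

def remediation_units(findings):
    """Collapse findings into one unit per (image digest, owner)."""
    units = defaultdict(lambda: {"cves": set(), "services": set()})
    for f in findings:
        unit = units[(f["image_digest"], f["owner"])]
        unit["cves"].add(f["cve"])
        unit["services"].add(f["service"])
    return dict(units)

units = remediation_units(FINDINGS)
for (digest, owner), unit in units.items():
    print(f"{owner}: rebuild {digest} "
          f"({len(unit['cves'])} CVEs, {len(unit['services'])} services)")
```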

How do you decide whether to patch, mitigate, or accept the risk of a vulnerability? [Intermediate]

Answer

I decide whether to patch, mitigate, or accept risk based on exploitability, exposure, business impact, patch availability, operational risk of the fix, compensating controls, and compliance requirements. Patching is preferred, mitigation is temporary, and acceptance requires documented approval.

Technical explanation

Patch when a safe fix is available and the vulnerability is reachable or high-impact.

Mitigate when immediate patching is risky or unavailable; examples include disabling a feature, blocking ingress, WAF rule, network segmentation, or reducing privileges.

Accept only when the residual risk is understood, approved by the right authority, time-bounded, and reviewed regularly.

Hands-on example

Example: a critical RCE affects a library used by an internet-facing API. Patch immediately if tests pass. If no patch exists, block the vulnerable endpoint at the gateway, restrict network access, increase detection, and set an exception expiry for the vendor fix.

What is vulnerability management as a lifecycle (discover, prioritise, remediate, verify)? [Intermediate]

Answer

Vulnerability management is a lifecycle: discover assets and findings, normalize/deduplicate them, prioritize by risk, assign ownership, remediate or mitigate, verify closure, and report trends. It is continuous because new vulnerabilities appear after software is already deployed.

Technical explanation

Discovery includes code dependencies, images, hosts, cloud resources, Kubernetes clusters, and SaaS assets.

Prioritization should combine severity, exploitability, exposure, asset criticality, and compliance obligations.

Verification is essential; closing a ticket is not enough unless a rescan or evidence proves the vulnerability is gone or controlled.

Hands-on example

Lifecycle example: scanner finds CVE -> platform deduplicates by image digest -> risk engine enriches with KEV/EPSS/exposure -> Jira ticket assigned -> owner patches dependency -> CI rebuilds -> deployment rolls out -> scanner verifies -> dashboard updates MTTR.

How would you remediate a critical CVE in a third-party Java dependency (as you did at Intuit)? [Intermediate]

Answer

For a critical CVE in a third-party Java dependency, I would identify whether it is direct or transitive, confirm exploitability in our service, upgrade the dependency or parent BOM, run regression tests, rebuild the artifact/image, deploy safely, and verify with SCA and runtime scans.

Technical explanation

The first step is dependency mapping with mvn dependency:tree or Gradle dependencies to find the exact path that introduced the vulnerable jar.

If the vulnerable library is a direct dependency and the upgrade is safe, pin the fixed version. If it is transitive, upgrade the parent library or BOM, or override the version through dependency management with care.

Verification should include build success, unit/integration tests, SBOM regeneration, vulnerability rescan, and production rollout monitoring.

Hands-on example

Hands-on:

mvn dependency:tree -Dincludes=org.example:vulnerable-lib

# Update pom.xml dependencyManagement to a fixed version

mvn -B clean verify

syft dir:. -o cyclonedx-json > sbom.json

trivy fs --severity CRITICAL,HIGH .

Then deploy canary, monitor errors/latency, and close the ticket only after the scanner no longer reports the CVE.

What is a transitive dependency, and why does it complicate patching? [Intermediate]

Answer

A transitive dependency is a dependency pulled in by another dependency rather than declared directly by the application. It complicates patching because the vulnerable package may be several levels deep, and overriding it can break the library that expects a specific version.

Technical explanation

Direct dependencies are controlled in the application's manifest; transitive dependencies are resolved by the package manager.

Fixing transitives may require upgrading the parent dependency, changing a BOM, excluding a module, or adding a safe override.

Every override needs compatibility testing because dependency graphs can introduce runtime conflicts.

Hands-on example

Example: spring-boot-starter-web pulls in a vulnerable Jackson version transitively. The safer fix is often upgrading Spring Boot itself, so that its managed BOM brings in the fixed Jackson release. If that is not possible, override jackson-databind to the fixed version, run integration tests, and inspect the resolved tree before release.

What is the difference between patching the base image and patching the application dependency? [Intermediate]

Answer

Patching the base image fixes vulnerabilities in the operating-system layer and common runtime packages. Patching the application dependency fixes libraries packaged by the application, such as Java jars, npm modules, Python wheels, or Go modules. Both are required because containers include multiple layers of software.

Technical explanation

Base-image vulnerabilities come from packages installed in the image, for example OpenSSL, glibc, curl, or OS libraries.

Application dependency vulnerabilities come from the app build output, such as Log4j or lodash.

Image scanning should show which layer introduced the vulnerable component so the correct owner can fix it.

Hands-on example

Example: CVE in openssl -> update the base image tag and rebuild. CVE in log4j-core -> update pom.xml or dependencyManagement and rebuild. After both fixes, push a new image digest and deploy that digest, not just a mutable tag.

What is container image scanning, and what tools do it (Trivy, Grype, Wiz)? [Intermediate]

Answer

Container image scanning analyzes image layers for vulnerable OS packages, application dependencies, secrets, malware indicators, and misconfigurations. Tools include Trivy, Grype, Snyk, Anchore, Clair, registry scanners, and CNAPP tools such as Wiz.

Technical explanation

Scanning should happen at build time, registry time, and periodically after deployment because new CVEs appear for existing images.

Findings should map to image digest, layer, package, fixed version, and owner.

Runtime context matters: a vulnerability in a deployed internet-facing image is higher priority than the same issue in an unused image.

Hands-on example

Hands-on:

docker build -t registry.example.com/checkout:${GIT_SHA} .

trivy image --severity HIGH,CRITICAL --exit-code 1 registry.example.com/checkout:${GIT_SHA}

grype registry.example.com/checkout:${GIT_SHA}

If clean, sign and push the immutable digest.

How do you prevent vulnerable images from being deployed (admission control, registry policy)? [Intermediate]

Answer

I prevent vulnerable images from being deployed by combining CI scan gates, registry policies, image signing, and Kubernetes admission control. The cluster should admit only trusted images that meet vulnerability and policy thresholds.

Technical explanation

CI should fail high-risk images before they reach the registry.

The registry should prevent promotion of images that fail policy or lack required metadata/signatures.

Admission controllers such as Kyverno, OPA Gatekeeper, or Sigstore policy-controller can enforce allowed registries, signatures, and vulnerability attestations at deploy time.

Hands-on example

Example Kyverno-style policy outcome: deny Pods if image registry is not approved, image tag is latest, signature verification fails, or the attached scan attestation contains critical KEV findings. Allow emergency override only through audited break-glass.
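
The decision logic behind such a policy can be sketched outside the cluster. This is not Kyverno syntax; it is a plain Python function with hypothetical inputs (a pre-computed signature result and a count of critical KEV findings from the scan attestation) that mirrors the four deny conditions above.

```python
APPROVED_REGISTRIES = ("registry.example.com/",)  # hypothetical allow-list

def admission_decision(image: str, signature_valid: bool,
                       kev_critical_count: int) -> tuple[bool, list[str]]:
    """Return (allowed, reasons), mirroring the four deny conditions in the text."""
    reasons = []
    if not image.startswith(APPROVED_REGISTRIES):
        reasons.append("registry not approved")
    if "@sha256:" not in image:
        # No digest pin: inspect the tag; a missing tag implies "latest".
        last_segment = image.rsplit("/", 1)[-1]
        tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
        if tag == "latest":
            reasons.append("mutable 'latest' tag")
    if not signature_valid:
        reasons.append("signature verification failed")
    if kev_critical_count > 0:
        reasons.append("scan attestation contains critical KEV findings")
    return (not reasons, reasons)

ok, why = admission_decision("registry.example.com/checkout@sha256:abc", True, 0)
denied, denied_why = admission_decision("docker.io/library/nginx:latest", False, 2)
print(ok, why)
print(denied, denied_why)
```

Returning every reason rather than the first one gives developers a complete failure message in a single deploy attempt.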

What is an SBOM, and how do you generate and use one? [Intermediate]

Answer

An SBOM, or Software Bill of Materials, is an inventory of the components inside software: packages, versions, suppliers, hashes, and relationships. I generate SBOMs during build and use them for vulnerability matching, license review, incident response, and supply-chain evidence.

Technical explanation

SBOMs make it faster to answer whether a product contains a vulnerable component after a new CVE is disclosed.

They should be tied to an immutable artifact digest so the inventory matches the exact build deployed.

SBOMs are most useful when stored, searchable, and linked with vulnerability intelligence and ownership data.

Hands-on example

Hands-on:

syft registry.example.com/checkout@sha256:abc -o cyclonedx-json > sbom.json

grype sbom:sbom.json

cosign attest --predicate sbom.json --type cyclonedx registry.example.com/checkout@sha256:abc

Then store the SBOM with the release record.
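
Once SBOMs are stored, answering "do we ship this component?" becomes a lookup. The sketch below searches a minimal CycloneDX-shaped JSON document by component name; the inline SBOM is a trimmed illustration, and real documents carry many more fields.

```python
import json

# Minimal CycloneDX-shaped document for illustration only.
SBOM_JSON = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "log4j-core", "version": "2.14.1",
     "purl": "pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"},
    {"type": "library", "name": "jackson-databind", "version": "2.15.2",
     "purl": "pkg:maven/com.fasterxml.jackson.core/jackson-databind@2.15.2"}
  ]
}
"""

def find_component(sbom: dict, name: str) -> list[dict]:
    """Return every component entry whose name matches exactly."""
    return [c for c in sbom.get("components", []) if c.get("name") == name]

sbom = json.loads(SBOM_JSON)
hits = find_component(sbom, "log4j-core")
for c in hits:
    print(c["name"], c["version"], c["purl"])
```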

What is the difference between SPDX and CycloneDX SBOM formats? [Intermediate]

Answer

SPDX and CycloneDX are two common SBOM formats. SPDX is widely used for license and package metadata, while CycloneDX is security-focused and commonly used for vulnerability management, dependency relationships, services, and security metadata. Both can be valid depending on ecosystem and tooling.

Technical explanation

SPDX, a Linux Foundation project, grew out of package metadata and license compliance use cases.

CycloneDX has strong adoption in application security and supports rich security use cases such as vulnerabilities, services, and compositions.

The best format is the one your scanners, artifact stores, customers, and compliance workflows can consume reliably.

Hands-on example

Example: generate CycloneDX for CI vulnerability workflows and customer security portals. Generate SPDX when legal/license compliance tooling requires it. Store both against the same image digest if the organization has both security and license-audit consumers.

What is supply-chain security, and what is SLSA? [Intermediate]

Answer

Supply-chain security protects the integrity of software from source code through build, dependencies, artifacts, deployment, and runtime. SLSA (Supply-chain Levels for Software Artifacts) is a framework that defines increasing levels of build integrity, provenance, and tamper resistance for software artifacts.

Technical explanation

Supply-chain controls include branch protection, dependency review, reproducible or isolated builds, artifact signing, provenance, SBOMs, and deployment verification.

SLSA helps teams mature from basic provenance generation to stronger build isolation and tamper-resistant provenance.

The goal is to prove what was built, from which source, by which builder, and whether it was modified after the build.

Hands-on example

Hands-on design: protected main branch -> CI builds in an isolated runner -> generate SBOM and SLSA provenance -> sign image digest -> store attestations -> Kubernetes admission verifies signature/provenance before allowing deploy.

What is artifact signing, and what is Sigstore/cosign? [Intermediate]

Answer

Artifact signing cryptographically signs a build artifact or image digest so consumers can verify integrity and publisher identity. Sigstore and cosign make this practical for containers by supporting keyless signing, transparency logs, and signature verification in CI/CD and Kubernetes admission.

Technical explanation

Signing mutable tags is weaker; sign immutable digests so the signature binds to exact content.

Keyless signing can bind an artifact to an OIDC identity from CI, reducing long-lived signing-key risk.

Verification should happen before promotion and again before deployment.

Hands-on example

Hands-on:

# With cosign 2.x, signing is keyless by default when no --key is given
cosign sign registry.example.com/checkout@sha256:abc

cosign verify --certificate-identity-regexp 'https://github.com/org/repo/.github/workflows/.*' \
  --certificate-oidc-issuer https://token.actions.githubusercontent.com \
  registry.example.com/checkout@sha256:abc

How do you verify the provenance of a build artifact? [Intermediate]

Answer

To verify artifact provenance, I check the artifact digest, the signed provenance attestation, the builder identity, source repository, commit SHA, workflow identity, build parameters, and whether the attestation was issued by a trusted CI system. Then I enforce those checks in promotion or admission policy.

Technical explanation

Provenance should answer where, when, how, and from which source the artifact was built.

Verification should use immutable digests and trusted identities, not tags or untrusted metadata.

Policies should reject artifacts built from unprotected branches, unknown builders, missing provenance, or mismatched source repositories.

Hands-on example

Example: verify that image digest sha256:abc was built by GitHub Actions workflow release.yml from org/checkout at commit 1234 on a protected tag. If the provenance source or builder identity does not match policy, block promotion to production.
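
The claim comparison in this example can be sketched as a policy check. The provenance record here is a flattened, hypothetical dictionary rather than the real SLSA attestation schema, and the sketch assumes the attestation's signature has already been verified by a tool such as cosign.

```python
from typing import Optional

# Hypothetical promotion policy; field names are illustrative.
POLICY = {
    "source_repo": "org/checkout",
    "workflow": "release.yml",
    "trusted_builder": "github-actions",
}

def provenance_errors(expected_digest: str, prov: Optional[dict], policy: dict) -> list[str]:
    """Compare already-verified provenance claims against policy; empty list means pass."""
    if prov is None:
        return ["missing provenance"]
    errors = []
    if prov.get("digest") != expected_digest:
        errors.append("digest mismatch")
    if prov.get("builder") != policy["trusted_builder"]:
        errors.append("unknown builder")
    if prov.get("source_repo") != policy["source_repo"]:
        errors.append("source repository mismatch")
    if prov.get("workflow") != policy["workflow"]:
        errors.append("workflow mismatch")
    if not prov.get("protected_ref", False):
        errors.append("built from unprotected ref")
    return errors

good = {"digest": "sha256:abc", "builder": "github-actions", "source_repo": "org/checkout",
        "workflow": "release.yml", "commit": "1234", "protected_ref": True}
forked = {**good, "source_repo": "fork/checkout"}

print(provenance_errors("sha256:abc", good, POLICY))
print(provenance_errors("sha256:abc", forked, POLICY))
```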

What is secrets scanning, and how do you stop secrets being committed to Git? [Intermediate]

Answer

Secrets scanning detects credentials accidentally committed to source code, images, logs, or configuration files. To stop secrets from reaching Git, I use pre-commit hooks, server-side scanning, PR checks, developer education, and patterns that avoid static credentials in the first place.

Technical explanation

Tools include gitleaks, trufflehog, GitHub secret scanning, GitLab secret detection, and enterprise DLP controls.

Pre-commit checks give fastest feedback, but server-side scanning is still needed because local hooks can be bypassed.

The best prevention is workload identity/OIDC and secret managers so developers do not handle long-lived keys.

Hands-on example

Hands-on:

gitleaks detect --source . --redact --exit-code 1

Add it to pre-commit and CI. Configure branch protection so the PR cannot merge if a secret is detected. For cloud access, use GitHub OIDC to assume a role instead of storing AWS keys.
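
To show what a scanner rule does under the hood, here is a deliberately tiny detector for one credential shape. It is not how gitleaks is implemented and covers only long-lived AWS access key IDs (the `AKIA` prefix followed by 16 upper-case letters or digits); real scanners combine many rules with entropy checks. The sample uses AWS's documented example key.

```python
import re

# Long-lived AWS access key IDs: "AKIA" followed by 16 upper-case alphanumerics.
AWS_KEY_ID = re.compile(r"\b(AKIA[0-9A-Z]{16})\b")

def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line number, redacted match) for each suspected key ID."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in AWS_KEY_ID.finditer(line):
            key = match.group(1)
            redacted = key[:4] + "*" * 12 + key[-4:]  # never echo the full value
            findings.append((lineno, redacted))
    return findings

SAMPLE = "region = us-east-1\naws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
findings = scan_text(SAMPLE)
for lineno, redacted in findings:
    print(f"line {lineno}: possible AWS access key ID {redacted}")
```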

What do you do if a secret has already been committed and pushed? [Intermediate]

Answer

If a secret has been committed and pushed, treat it as compromised. Immediately revoke or rotate it, assess access logs for misuse, remove it from active code/config, purge history only as cleanup, and notify affected owners. Deleting the commit is not enough.

Technical explanation

Assume the secret may have been copied by clones, forks, CI logs, caches, or dependency mirrors.

Rotation should happen before or at least alongside history cleanup so the exposed value is no longer valid.

After remediation, add detection rules and preventive controls to avoid recurrence.

Hands-on example

Incident steps: 1) Identify secret type and scope. 2) Revoke/rotate in the source system. 3) Search logs for use after exposure. 4) Remove secret from repo and CI variables. 5) Use git-filter-repo/BFG if needed. 6) Add gitleaks and secret-manager integration.

What is the principle of least privilege, and how do you apply it in CI/CD? [Intermediate]

Answer

Least privilege means every user, workload, and pipeline gets only the permissions required for its job, for only the time needed. In CI/CD, that means scoped tokens, environment-specific roles, approval gates for production, and no broad admin credentials in build systems.

Technical explanation

Pipeline identities should be separated by repository, environment, and action such as build, deploy-dev, deploy-prod, or rollback.

Secrets should be scoped and short-lived, ideally issued through OIDC or workload identity.

Permissions should be reviewed through IAM analysis, audit logs, and automated policy checks.

Hands-on example

Example: a PR build role can read dependencies and push test artifacts but cannot deploy. A staging deploy role can update only the staging namespace. A production deploy role requires protected branch, signed artifact, approval, and a narrowly scoped cloud/IAM role.

How do you secure pipeline credentials and avoid long-lived secrets? [Intermediate]

Answer

I secure pipeline credentials by avoiding long-lived secrets, using OIDC or workload identity, scoping credentials to repository and environment, storing unavoidable secrets in a managed secret store, masking logs, rotating regularly, and auditing access.

Technical explanation

Static cloud keys in CI are high risk because they can be leaked from logs, forks, compromised runners, or repository settings.

OIDC allows the CI system to exchange a short-lived identity token for cloud credentials with policy conditions.

Self-hosted runners need additional hardening: isolation, patching, ephemeral runners for untrusted code, and restricted network access.

Hands-on example

Hands-on: configure GitHub Actions OIDC to assume an AWS IAM role only when repo=org/app, branch=main, workflow=deploy.yml, and environment=prod. Remove stored AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY from repository secrets.

What is OIDC-based authentication from CI to a cloud provider, and why is it safer than keys? [Intermediate]

Answer

OIDC-based authentication from CI to a cloud provider lets a CI job exchange a short-lived identity token for temporary cloud credentials. It is safer than static keys because there is no long-lived secret stored in the CI platform, and the cloud role can validate repository, branch, workflow, and environment claims.

Technical explanation

The cloud provider trusts the CI OIDC issuer and maps token claims to an IAM role or workload identity.

Credentials are issued just-in-time and expire automatically.

Policy conditions can restrict which repos, branches, tags, or environments can assume production roles.

Hands-on example

AWS example: create an IAM OIDC provider for token.actions.githubusercontent.com, create a role whose trust policy allows sts:AssumeRoleWithWebIdentity, restrict aud to sts.amazonaws.com and sub to repo:org/app:ref:refs/heads/main, then use aws-actions/configure-aws-credentials in the deploy workflow.
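
The role's trust policy for this setup can be sketched as a small generator. The account ID below is a placeholder and the helper name is hypothetical; the structure follows the standard IAM trust policy shape for a web identity provider, with conditions on the token's `aud` and `sub` claims.

```python
import json

def github_oidc_trust_policy(account_id: str, repo: str, branch: str) -> dict:
    """Build a trust policy that lets one repo/branch assume the role via OIDC."""
    provider = "token.actions.githubusercontent.com"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{provider}:aud": "sts.amazonaws.com",
                    f"{provider}:sub": f"repo:{repo}:ref:refs/heads/{branch}",
                }
            },
        }],
    }

# 123456789012 is a placeholder account ID.
policy = github_oidc_trust_policy("123456789012", "org/app", "main")
print(json.dumps(policy, indent=2))
```

Using `StringEquals` on `sub` rather than a wildcard match is the design choice that keeps other branches and forks from assuming the role.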

What is workload identity, and how does it remove static credentials? [Intermediate]

Answer

Workload identity lets an application or workload authenticate to cloud services using its runtime identity rather than a stored static credential. It removes static credentials by binding Kubernetes service accounts, VM identities, or CI identities to cloud IAM roles.

Technical explanation

In Kubernetes, workload identity often maps a service account to a cloud IAM role through projected tokens or metadata integration.

Credentials are short-lived and issued based on trusted identity claims.

It reduces secret sprawl and makes access easier to audit and revoke centrally.

Hands-on example

Example: bind Kubernetes service account payments-api to an AWS IAM role through IRSA/EKS Pod Identity or to a GCP service account through Workload Identity. The pod calls the cloud API without an embedded access key, and the role policy grants only required actions.

What is policy-as-code, and what tools implement it (OPA, Sentinel, Kyverno)? [Intermediate]

Answer

Policy-as-code means security, compliance, and operational rules are written as versioned, testable code and enforced automatically. Tools include OPA/Rego, HashiCorp Sentinel, Kyverno, Conftest, Checkov, tfsec, and cloud-native policy engines.

Technical explanation

Policy-as-code gives repeatable enforcement across CI, IaC review, admission control, and audits.

Policies should be version-controlled, peer-reviewed, tested, and released like application code.

Good policies include clear failure messages and remediation guidance to reduce developer friction.

Hands-on example

Example policy lifecycle: write a rule that denies public S3 buckets, add unit tests for allowed/denied Terraform examples, run it in PR with conftest or Checkov, enforce it in Terraform Cloud/Sentinel, and verify deployed cloud resources with CSPM.

What is OPA, and what is the Rego language used for? [Intermediate]

Answer

OPA, the Open Policy Agent, is a general-purpose policy engine. Rego is OPA's policy language, used to express rules over structured input such as Kubernetes admission requests, Terraform plans, API authorization decisions, or CI metadata.

Technical explanation

OPA separates policy decision logic from application or platform code.

Rego evaluates input JSON and data documents to return decisions such as allow, deny, or violations.

OPA can run in CI, as a sidecar, as an admission controller, or as part of an authorization service.

Hands-on example

Simple Rego pattern:

package kubernetes.admission

deny[msg] {
    input.request.kind.kind == "Pod"
    container := input.request.object.spec.containers[_]
    container.securityContext.privileged == true
    msg := "Privileged containers are not allowed"
}
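For readers new to Rego, the same decision can be read as ordinary code. The Python below is only an illustrative mirror of the rule above, not how OPA evaluates it; the function name and sample input are hypothetical.

```python
def deny_privileged(admission_review: dict) -> set:
    """Return the set of denial messages for a Kubernetes admission request."""
    request = admission_review.get("request", {})
    if request.get("kind", {}).get("kind") != "Pod":
        return set()
    containers = request.get("object", {}).get("spec", {}).get("containers", [])
    # Like a Rego partial set rule, identical messages collapse into one entry.
    return {
        "Privileged containers are not allowed"
        for c in containers
        if (c.get("securityContext") or {}).get("privileged") is True
    }


sample = {
    "request": {
        "kind": {"kind": "Pod"},
        "object": {
            "spec": {
                "containers": [
                    {"name": "app", "securityContext": {"privileged": True}},
                    {"name": "sidecar"},
                ]
            }
        },
    }
}
print(deny_privileged(sample))
```

An empty set means the request is allowed; any message means the admission controller should reject it.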

How would you enforce a policy like 'no public S3 buckets' as code? [Intermediate]

Answer

To enforce 'no public S3 buckets' as code, I would check Terraform or CloudFormation before deployment and also monitor deployed AWS resources continuously. The policy should deny public ACLs, public bucket policies, and missing block-public-access settings.

Technical explanation

CI/IaC enforcement prevents new violations before they reach production.

Runtime CSPM detects drift or changes made outside IaC.

Exceptions should be explicit, time-limited, approved, and limited to reviewed public assets.

Hands-on example

Conftest/Rego sketch:

package main

deny[msg] {
    resource := input.resource_changes[_]
    resource.type == "aws_s3_bucket_public_access_block"
    resource.change.after.block_public_acls == false
    msg := sprintf("S3 bucket %s must block public ACLs", [resource.name])
}

Export the plan with terraform show -json plan.out > plan.json, then run conftest test plan.json before apply.
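The same check can be sketched without a policy engine, for example in a CI helper script. The Python below assumes the JSON plan layout used above (resource_changes with change.after) and checks all four public-access-block flags; the function name and sample plan are hypothetical.

```python
REQUIRED_FLAGS = (
    "block_public_acls",
    "block_public_policy",
    "ignore_public_acls",
    "restrict_public_buckets",
)


def public_access_violations(plan: dict) -> list:
    """Return one message per public-access-block flag that is not set to true."""
    messages = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") != "aws_s3_bucket_public_access_block":
            continue
        after = (rc.get("change") or {}).get("after")
        if after is None:
            # The resource is being destroyed; nothing to evaluate here.
            continue
        for flag in REQUIRED_FLAGS:
            if after.get(flag) is not True:
                messages.append(f"{rc.get('address')}: {flag} must be true")
    return messages


sample_plan = {
    "resource_changes": [
        {
            "address": "aws_s3_bucket_public_access_block.logs",
            "type": "aws_s3_bucket_public_access_block",
            "change": {
                "after": {
                    "block_public_acls": False,
                    "block_public_policy": True,
                    "ignore_public_acls": True,
                    "restrict_public_buckets": True,
                }
            },
        }
    ]
}
for message in public_access_violations(sample_plan):
    print(message)
```

Note the limitation of this sketch: a bucket with no public access block resource at all is not flagged, so a complete policy also needs a rule that every aws_s3_bucket has a matching block.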

What is admission control in Kubernetes, and how do OPA Gatekeeper or Kyverno use it? [Intermediate]

Answer

Kubernetes admission control intercepts API requests before objects are persisted. OPA Gatekeeper and Kyverno use admission webhooks to validate or mutate resources so policies such as no privileged containers, required labels, allowed registries, and resource limits are enforced centrally.

Technical explanation

Admission control is powerful because it blocks bad configuration at the cluster API boundary.

Gatekeeper uses OPA/Rego with ConstraintTemplates and Constraints; Kyverno uses Kubernetes-native YAML policies.

Policies should be tested in audit mode before enforce mode to avoid breaking teams unexpectedly.

Hands-on example

Hands-on rollout: install Gatekeeper, deploy constraints in dry-run/audit mode for one week, review violations, fix common templates, then enforce restricted policies for production namespaces while allowing controlled exemptions for platform namespaces.

What is the difference between a validating and a mutating admission webhook? [Intermediate]

Answer

A validating admission webhook approves or denies an API request. A mutating admission webhook modifies the object before validation and persistence. Mutating webhooks set defaults or inject config; validating webhooks enforce rules.

Technical explanation

Mutating examples include injecting a sidecar, adding labels, setting default resource requests, or adding securityContext defaults.

Validating examples include denying privileged pods, public LoadBalancers, missing owners, or untrusted images.

Mutation should be predictable and validation should produce clear messages so developers understand how to fix violations.

Hands-on example

Example: a mutating webhook adds runAsNonRoot: true and seccompProfile: RuntimeDefault when missing. A validating webhook denies the pod if it still requests privileged=true or hostPath volumes in a restricted namespace.
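The split between the two webhook types can be sketched as two plain functions. This is a simplified model: a real webhook receives an AdmissionReview and a mutating webhook returns a JSON patch, whereas here mutate simply returns a patched copy. The function names and the sample pod are hypothetical.

```python
import copy


def mutate(pod: dict) -> dict:
    """Mutating step: add safe defaults only where the field is missing."""
    patched = copy.deepcopy(pod)
    pod_sc = patched.setdefault("spec", {}).setdefault("securityContext", {})
    pod_sc.setdefault("runAsNonRoot", True)
    pod_sc.setdefault("seccompProfile", {"type": "RuntimeDefault"})
    return patched


def validate(pod: dict) -> list:
    """Validating step: return reasons to deny; an empty list means allow."""
    errors = []
    spec = pod.get("spec", {})
    for container in spec.get("containers", []):
        if (container.get("securityContext") or {}).get("privileged") is True:
            errors.append(f"container {container.get('name')} requests privileged mode")
    for volume in spec.get("volumes", []):
        if "hostPath" in volume:
            errors.append(f"volume {volume.get('name')} uses hostPath")
    return errors


pod = {
    "spec": {
        "containers": [{"name": "app", "securityContext": {"privileged": True}}],
        "volumes": [{"name": "host-root", "hostPath": {"path": "/"}}],
    }
}
patched = mutate(pod)          # defaults are filled in first
print(validate(patched))       # validation then sees the final object
```

The ordering matters: Kubernetes runs mutating webhooks before validating ones, so validation always judges the object as it will be persisted.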

What is 'secure by default', and give an example of a secure-by-default pattern? [Intermediate]

Answer

Secure by default means the default path is safe without requiring every developer to be a security expert. In practice, platform templates, CI/CD modules, base images, IAM roles, and Kubernetes namespaces should start with least privilege, encryption, logging, and restrictive network access.

Technical explanation

Defaults matter because engineers usually follow the fastest path provided by the platform.

Secure defaults reduce the number of policy violations and exceptions that security teams must chase.

Developers can still request exceptions, but exceptions should be explicit and reviewed.

Hands-on example

Example: a golden Kubernetes deployment template sets runAsNonRoot, readOnlyRootFilesystem, dropped capabilities, resource limits, liveness/readiness probes, no host networking, restricted NetworkPolicy, and mandatory owner labels. Teams inherit safety by using the template.

What is Zero Trust, and how does it differ from perimeter-based security? [Intermediate]

Answer

Zero Trust assumes no network location is automatically trusted. Every request should be authenticated, authorized, encrypted, and continuously evaluated based on identity, device/workload posture, context, and least privilege. This differs from perimeter security, which trusts traffic once it is inside the network.

Technical explanation

Perimeter models fail when attackers compromise internal credentials, VPNs, workloads, or lateral movement paths.

Zero Trust emphasizes identity-aware access, mTLS, strong authorization, segmentation, continuous monitoring, and explicit policy.

It is a security architecture direction, not a single product.

Hands-on example

Hands-on: for services, enable mTLS through a service mesh, authorize service-to-service calls with identities, restrict Kubernetes NetworkPolicies, use short-lived workload credentials, and log every privileged action for audit and anomaly detection.

How do you apply least privilege to service-to-service communication (e.g., with Istio mTLS)? [Intermediate]

Answer

For service-to-service least privilege, I authenticate workloads with strong identities, encrypt traffic with mTLS, authorize only required service pairs and methods, and restrict network paths. With Istio, that means PeerAuthentication, AuthorizationPolicy, and optionally NetworkPolicy together.

Technical explanation

mTLS proves workload identity and encrypts traffic, but authorization policies decide who can call what.

Policies should be namespace/service/method scoped rather than allowing all mesh traffic.

NetworkPolicy still matters because it limits traffic at the network layer even if application policy is misconfigured.

Hands-on example

Istio example: enable STRICT PeerAuthentication in the namespace. Create an AuthorizationPolicy allowing checkout service account to call payment POST /authorize only. Deny all other callers by default, then verify with curl from allowed and disallowed pods.
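A rough sketch of the AuthorizationPolicy from this example, built as a Python dict and printed as JSON (kubectl accepts JSON manifests). The namespace, the app: payment label, the checkout service account location, and the cluster.local trust domain are assumptions for illustration.

```python
import json


def allow_checkout_to_authorize(namespace: str = "payments") -> dict:
    """Build an ALLOW policy: only the checkout service account may POST /authorize."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": "payment-allow-checkout", "namespace": namespace},
        "spec": {
            # Assumed workload label for the payment service.
            "selector": {"matchLabels": {"app": "payment"}},
            "action": "ALLOW",
            "rules": [
                {
                    # Caller identity, taken from the mTLS client certificate.
                    "from": [
                        {"source": {"principals": ["cluster.local/ns/checkout/sa/checkout"]}}
                    ],
                    # Only this method and path are permitted.
                    "to": [{"operation": {"methods": ["POST"], "paths": ["/authorize"]}}],
                }
            ],
        },
    }


manifest = allow_checkout_to_authorize()
print(json.dumps(manifest, indent=2))
```

Once any ALLOW policy selects a workload, requests that match no ALLOW rule are denied, which gives the default-deny behavior for all other callers.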

What is secrets management, and how does HashiCorp Vault work at a high level? [Intermediate]

Answer

Secrets management is the controlled storage, access, rotation, and audit of sensitive values such as passwords, API keys, certificates, and tokens. HashiCorp Vault works by authenticating clients, authorizing them through policies, and serving secrets from engines such as KV, database, PKI, transit, or cloud engines.

Technical explanation

Vault centralizes access control and audit logging for secrets.

Clients authenticate using methods such as Kubernetes auth, AppRole, OIDC, or cloud IAM.

Policies define which paths a client can read, write, or generate, and leases control secret lifetime.

Hands-on example

Hands-on: enable Kubernetes auth, bind service account payments-api to Vault role payments, allow it to read only secret/data/payments/config, and deliver the secret through Vault Agent Injector or the Vault Secrets Operator rather than storing it in Git.

What is dynamic secrets generation in Vault, and why is it powerful? [Intermediate]

Answer

Dynamic secrets in Vault are generated on demand and have leases, expiration, and automatic revocation. They are powerful because each workload can receive unique short-lived credentials instead of sharing a static password across applications and environments.

Technical explanation

Database dynamic secrets create temporary database users with scoped permissions.

If a credential leaks, its lifetime and permission scope are limited.

Dynamic secrets improve auditability because each issued credential can be tied back to a workload and lease.

Hands-on example

Example: configure Vault's database secrets engine for PostgreSQL. The payments service requests database/creds/payments-readonly, receives a unique username/password valid for one hour, uses it for connections, and Vault revokes it automatically when the lease expires.
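To make the lease idea concrete, here is a toy in-memory model of a dynamic secrets engine. It is not Vault's API or client library; the class and field names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    lease_id: str
    username: str
    password: str
    expires_at: float


class ToyDynamicSecrets:
    """In-memory stand-in for a dynamic secrets engine (illustration only)."""

    def __init__(self) -> None:
        self._leases = {}

    def issue(self, role: str, ttl_seconds: float, now: float) -> Lease:
        # Every request gets its own unique credential and expiry.
        suffix = secrets.token_hex(8)
        lease = Lease(
            lease_id=f"{role}/{suffix}",
            username=f"v-{role}-{suffix}",
            password=secrets.token_urlsafe(24),
            expires_at=now + ttl_seconds,
        )
        self._leases[lease.lease_id] = lease
        return lease

    def is_valid(self, lease_id: str, now: float) -> bool:
        lease = self._leases.get(lease_id)
        return lease is not None and now < lease.expires_at

    def revoke(self, lease_id: str) -> None:
        # Early revocation, for example during incident response.
        self._leases.pop(lease_id, None)


engine = ToyDynamicSecrets()
first = engine.issue("payments-readonly", ttl_seconds=3600, now=0)
second = engine.issue("payments-readonly", ttl_seconds=3600, now=0)
print(first.username != second.username)
print(engine.is_valid(first.lease_id, now=3599), engine.is_valid(first.lease_id, now=3600))
```

Because each workload holds a different credential, revoking one lease does not disturb any other consumer.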

What is the difference between static and dynamic secrets? [Intermediate]

Answer

A static secret is created in advance and remains valid until manually changed or rotated. A dynamic secret is generated when requested, has a lease, and can be revoked automatically. Dynamic secrets are safer for systems that support just-in-time credential creation.

Technical explanation

Static secrets are simpler but create long-lived blast radius and rotation burden.

Dynamic secrets reduce credential reuse and support automated expiry.

Not every integration supports dynamic credentials, so teams often combine static secrets with rotation and dynamic secrets where possible.

Hands-on example

Example: a third-party API key may be static and stored in Vault KV with rotation reminders. A PostgreSQL credential can be dynamic through Vault's database engine, issued per workload for 1 hour and revoked automatically when the lease expires.

How does secret rotation work, and why does it matter? [Advanced]

Answer

Secret rotation replaces an old credential with a new one and ensures all consumers switch safely. It matters because leaked or overused credentials become less valuable if they expire quickly and are rotated regularly.

Technical explanation

Rotation must account for consumers, deployment timing, connection pools, caches, and rollback paths.

Safe rotation often uses overlapping validity windows: create new, deploy consumers, verify, then revoke old.

Automated rotation should include audit logging, failure alerts, and emergency rotation procedures.

Hands-on example

Hands-on plan: for a database password, create a new credential, update Vault, restart or reload apps through rolling deployment, monitor auth failures, then revoke the old credential. For dynamic Vault database secrets, rely on leases and configure maximum TTLs so credentials must be re-issued rather than renewed indefinitely.
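The overlapping validity window can be sketched as a small state model: during rotation both the old and the new secret are accepted, and the old one stops working only after consumers have switched. The class and method names are hypothetical.

```python
import hmac
from typing import Optional


class RotatingCredential:
    """Toy model of rotation with an overlap window (illustration only)."""

    def __init__(self, initial: str) -> None:
        self.current = initial
        self.previous: Optional[str] = None

    def start_rotation(self, new_secret: str) -> None:
        # Step 1: introduce the new secret while the old one still works.
        self.previous = self.current
        self.current = new_secret

    def finish_rotation(self) -> None:
        # Final step: revoke the old secret once consumers are verified.
        self.previous = None

    def accepts(self, presented: str) -> bool:
        candidates = [self.current]
        if self.previous is not None:
            candidates.append(self.previous)
        # Constant-time comparison avoids leaking match position via timing.
        return any(
            hmac.compare_digest(presented.encode(), c.encode()) for c in candidates
        )


cred = RotatingCredential("old-secret")
cred.start_rotation("new-secret")
print(cred.accepts("old-secret"), cred.accepts("new-secret"))  # overlap window
cred.finish_rotation()
print(cred.accepts("old-secret"), cred.accepts("new-secret"))  # old one revoked
```

Monitoring how often the previous secret is still presented tells you when it is safe to call finish_rotation.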

What is encryption at rest versus in transit, and how do you ensure both? [Advanced]

Answer

Encryption at rest protects stored data such as disks, databases, backups, and object storage. Encryption in transit protects data moving across networks using protocols such as TLS or mTLS. A secure system needs both because data is exposed in different states.

Technical explanation

At-rest encryption usually uses KMS-managed keys, database TDE, disk encryption, backup encryption, or object-store encryption.

In-transit encryption uses TLS for client-server traffic and mTLS when both sides authenticate each other.

Key management, certificate lifecycle, and access control are as important as the encryption algorithm.

Hands-on example

Checklist: enable S3 SSE-KMS, EBS encryption by default, RDS encryption, encrypted backups, TLS-only listeners, HSTS for web apps, internal mTLS for service mesh traffic, and automated certificate rotation through cert-manager or the cloud provider.

What is the difference between symmetric and asymmetric encryption? [Advanced]

Answer

Symmetric encryption uses the same key to encrypt and decrypt data. Asymmetric encryption uses a public/private key pair: the public key encrypts or verifies signatures, and the private key decrypts or signs. Symmetric encryption is faster; asymmetric encryption helps with identity, key exchange, and signatures.

Technical explanation

AES is a common symmetric algorithm for bulk data encryption.

RSA and elliptic-curve algorithms are common asymmetric approaches for key exchange, certificates, and digital signatures.

TLS uses asymmetric cryptography to authenticate and negotiate keys, then symmetric cryptography for efficient data transfer.

Hands-on example

Example (envelope encryption): an application encrypts a file with a fast symmetric data key. The data key is itself encrypted (wrapped) with a KMS-managed key and stored next to the ciphertext. To read the file, the app asks KMS to decrypt the data key, then uses the symmetric key locally.

What is TLS, and what happens during a TLS handshake at a high level? [Advanced]

Answer

TLS secures network communication by authenticating the server, negotiating encryption parameters, deriving shared session keys, and encrypting application data. At a high level, the client validates the certificate chain, both sides agree on cryptographic parameters, and then use symmetric keys for the session.

Technical explanation

The server certificate proves the service identity if it chains to a trusted CA and matches the hostname.

Modern TLS uses ephemeral key exchange to provide forward secrecy.

After the handshake, application data is encrypted and integrity-protected against eavesdropping and tampering.

Hands-on example

Hands-on: run openssl s_client -connect api.example.com:443 -servername api.example.com to inspect the certificate chain, protocol, cipher, expiry, and hostname. In production, alert before certificate expiry and disable old protocols/ciphers.
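A similar inspection can be scripted with Python's standard ssl module, for example as an expiry check in monitoring. This is a sketch; the helper names are hypothetical, and the TLS 1.2 floor is an assumption you may want to raise.

```python
import socket
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Client context that verifies the chain and hostname and refuses old protocols."""
    ctx = ssl.create_default_context()  # verification and hostname checks are on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def fetch_peer_certificate(host: str, port: int = 443, timeout: float = 5.0) -> dict:
    """Connect, complete the handshake, and report what was negotiated."""
    ctx = strict_client_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # server_hostname sends SNI and is the name checked against the certificate.
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return {
                "protocol": tls.version(),
                "cipher": tls.cipher()[0],
                "not_after": tls.getpeercert()["notAfter"],
            }
```

Calling fetch_peer_certificate("api.example.com") raises ssl.SSLCertVerificationError if the chain or hostname does not validate; otherwise the not_after value can feed an expiry alert.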

What is mutual TLS, and where would you use it? [Advanced]

Answer

Mutual TLS means both client and server present certificates and authenticate each other. I use it for service-to-service communication, internal APIs, high-trust admin endpoints, and zero-trust environments where server-only TLS is not enough.

Technical explanation

Normal TLS authenticates the server to the client; mTLS authenticates both directions.

mTLS gives strong workload identity when certificates are issued and rotated safely.

A service mesh such as Istio can automate certificate issuance, rotation, and policy enforcement for mTLS between pods.

Hands-on example

Example: enable Istio STRICT mTLS for the payments namespace. Calls from checkout to payments carry a client certificate representing the checkout service account. Payments accepts only authorized identities and rejects plaintext or unknown workloads.

What is certificate management, and how do tools like cert-manager help? [Advanced]

Answer

Certificate management covers issuing, renewing, rotating, revoking, and monitoring certificates. Tools like cert-manager automate certificate requests and renewal in Kubernetes by integrating with issuers such as Let's Encrypt, internal CAs, Vault, or cloud certificate services.

Technical explanation

Manual certificate management often causes outages from expired certificates.

cert-manager uses Kubernetes custom resources such as Certificate and Issuer/ClusterIssuer to manage lifecycle.

Good certificate management includes expiry alerts, short-lived certificates where feasible, and clear ownership of internal CAs.

Hands-on example

Hands-on: install cert-manager, create a ClusterIssuer for ACME or Vault PKI, create a Certificate resource for api.example.com, reference the resulting TLS secret from the Ingress tls section, and monitor cert-manager events plus certificate expiration metrics.

What is the difference between authentication and authorization? [Advanced]

Answer

Authentication verifies who or what the caller is. Authorization decides what that authenticated identity is allowed to do. Authentication answers 'are you really this identity?'; authorization answers 'can this identity perform this action on this resource?'.

Technical explanation

Examples of authentication include passwords, MFA, client certificates, OIDC tokens, or workload identity tokens.

Examples of authorization include RBAC permissions, IAM policies, ABAC conditions, and application-level access checks.

Strong systems need both; a valid identity with excessive permissions is still a security risk.

Hands-on example

Example: a developer authenticates to Kubernetes with SSO. Kubernetes RBAC then authorizes whether they can get pods in dev, deploy to staging, or delete secrets in production. The user may authenticate successfully but still be denied a production action.

What is RBAC versus ABAC? [Advanced]

Answer

RBAC grants permissions based on roles assigned to users or service accounts. ABAC grants permissions based on attributes such as user department, resource classification, environment, request context, time, or device posture. RBAC is simpler; ABAC is more dynamic and context-aware.

Technical explanation

RBAC works well for stable job functions such as developer, viewer, deployer, or admin.

ABAC is useful when access depends on attributes such as data sensitivity, project ownership, or request source.

Many real systems combine both: role-based entitlements plus attribute conditions.

Hands-on example

Example: RBAC gives the payments-team-deployer role permission to deploy to the payments namespace. ABAC adds a condition that production deploy is allowed only from a protected branch, through approved CI, during a change window, and by the owning team.
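The combination can be sketched as two checks that must both pass. The role table and the attribute names below are hypothetical; a real system would take them from the identity provider and the CI context.

```python
# RBAC: static mapping from role to (action, namespace) permissions.
ROLE_PERMISSIONS = {
    "payments-team-deployer": {("deploy", "payments")},
}


def rbac_allows(roles: list, action: str, namespace: str) -> bool:
    return any((action, namespace) in ROLE_PERMISSIONS.get(r, set()) for r in roles)


def abac_allows(ctx: dict) -> bool:
    """ABAC: conditions evaluated on request attributes at decision time."""
    return (
        ctx.get("branch_protected") is True
        and ctx.get("via_approved_ci") is True
        and ctx.get("in_change_window") is True
        and ctx.get("team") is not None
        and ctx.get("team") == ctx.get("resource_owner")
    )


def can_deploy_to_prod(roles: list, namespace: str, ctx: dict) -> bool:
    # The role grants the entitlement; the attributes decide if it applies now.
    return rbac_allows(roles, "deploy", namespace) and abac_allows(ctx)


request_ctx = {
    "branch_protected": True,
    "via_approved_ci": True,
    "in_change_window": False,  # outside the change window
    "team": "payments",
    "resource_owner": "payments",
}
print(can_deploy_to_prod(["payments-team-deployer"], "payments", request_ctx))
```

Here the role alone would permit the deploy, but the attribute check rejects it because the request falls outside the change window.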

How do you secure container runtime (seccomp, AppArmor, capabilities, read-only root)? [Advanced]

Answer

I secure container runtime by reducing privileges and attack surface: run as non-root, drop Linux capabilities, use seccomp/AppArmor/SELinux profiles, set read-only root filesystems, avoid privileged mode, restrict host namespaces and hostPath mounts, and keep images minimal and patched.

Technical explanation

Runtime security starts in the Dockerfile and Kubernetes securityContext.

seccomp limits syscalls, AppArmor/SELinux enforce mandatory access controls, and capabilities control fine-grained root privileges.

Read-only filesystems and no-root execution make post-exploitation harder.

Hands-on example

Kubernetes securityContext:

securityContext:
  runAsNonRoot: true
  runAsUser: 10001
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: true
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault

What are Linux capabilities, and why drop them in containers? [Advanced]

Answer

Linux capabilities split root privileges into smaller permission units such as NET_ADMIN, SYS_ADMIN, CHOWN, and NET_BIND_SERVICE. Containers should drop unnecessary capabilities because a process running with extra capabilities can escape intended restrictions or increase impact after compromise.

Technical explanation

By default, containers may receive capabilities they do not need.

Dropping ALL and adding back only what is required follows least privilege.

Some capabilities, especially SYS_ADMIN, are broad and should be avoided unless there is a strong reason.

Hands-on example

Example: a web service does not need NET_ADMIN or SYS_PTRACE. Configure capabilities.drop: ['ALL']. If it must bind to port 80, prefer a higher container port mapped by the Service, or add only NET_BIND_SERVICE if absolutely required.

Why should containers not run as root, and what does running rootless achieve? [Advanced]

Answer

Containers should not run as root because a container breakout, writable host mount, or runtime bug can give an attacker more leverage. Running rootless or as a non-root UID reduces privilege inside the container and reduces blast radius if the application is compromised.

Technical explanation

Root inside a container is not the same as root on the host, but it is still more dangerous than a non-root process.

Rootless containers reduce reliance on privileged daemon behavior and host-level root permissions.

Non-root images require correct file ownership, writable paths for temporary data, and compatible application behavior.

Hands-on example

Dockerfile pattern (Alpine/BusyBox addgroup and adduser syntax):

RUN addgroup -g 10001 app && adduser -D -u 10001 -G app app
RUN chown -R app:app /app
USER 10001

In Kubernetes, enforce runAsNonRoot and reject images that require root in restricted namespaces.

What are Kubernetes Pod Security Standards (privileged, baseline, restricted)? [Advanced]

Answer

Kubernetes Pod Security Standards define three policy levels: privileged, baseline, and restricted. Privileged is highly permissive, baseline prevents known privilege escalation patterns while allowing common workloads, and restricted applies stronger hardening suitable for security-sensitive workloads.

Technical explanation

Pod Security Admission can enforce these standards at namespace level using labels for enforce, audit, and warn modes.

Baseline is often a practical minimum for general workloads.

Restricted requires controls such as non-root execution, seccomp, dropped capabilities, and no privilege escalation.

Hands-on example

Hands-on:

kubectl label ns prod pod-security.kubernetes.io/enforce=restricted
kubectl label ns prod pod-security.kubernetes.io/audit=restricted
kubectl label ns prod pod-security.kubernetes.io/warn=restricted

Then test a privileged pod and confirm the API server rejects it.

How do you prevent privilege escalation in a Kubernetes cluster? [Advanced]

Answer

I prevent privilege escalation in Kubernetes with least-privilege RBAC, Pod Security Admission restricted mode, no privileged containers, no hostPath/hostNetwork unless approved, dropped capabilities, allowPrivilegeEscalation=false, image/admission policies, secret access controls, and network segmentation.

Technical explanation

Cluster-admin should be tightly controlled and regularly audited.

Service accounts should not rely on broad default permissions, and automountServiceAccountToken should be disabled for pods that do not call the Kubernetes API.

Admission policies should prevent dangerous pod specs from reaching the cluster.

Hands-on example

Checklist: enforce restricted Pod Security for app namespaces, use Role not ClusterRole where possible, block privileged=true, block hostPath, require runAsNonRoot, restrict exec access in production, rotate service-account tokens, and monitor audit logs for escalation attempts.

What is network segmentation, and how do NetworkPolicies enforce it? [Advanced]

Answer

Network segmentation separates systems so compromise in one area does not automatically expose everything else. Kubernetes NetworkPolicies enforce segmentation by allowing only specified ingress and egress between pods, namespaces, and IP blocks, assuming the CNI plugin supports enforcement.

Technical explanation

By default, Kubernetes pods accept traffic from any other pod in the cluster until a NetworkPolicy selects and isolates them.

A good pattern is default deny for ingress and egress, then explicit allow rules for required traffic.

NetworkPolicy is layer 3/4; service mesh authorization can add layer 7 identity and method controls.

Hands-on example

Example: apply a default deny policy in the payments namespace. Add an ingress policy allowing only checkout pods to reach payments on TCP 8080, and an egress policy allowing payments to reach the database service on TCP 5432 and DNS.
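The three policies in this example could look roughly like the manifests below, built as Python dicts and printed as a JSON List that kubectl can apply. The app labels are assumptions, and the ingress rule assumes checkout pods run in the same namespace; add a namespaceSelector if they do not.

```python
import json

NAMESPACE = "payments"


def policy(name: str, spec: dict) -> dict:
    """Wrap a spec in the common NetworkPolicy envelope."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": spec,
    }


# Empty podSelector selects every pod; no rules means nothing is allowed.
default_deny = policy(
    "default-deny-all",
    {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]},
)

allow_checkout_ingress = policy(
    "allow-checkout-to-payments",
    {
        "podSelector": {"matchLabels": {"app": "payments"}},
        "policyTypes": ["Ingress"],
        "ingress": [
            {
                "from": [{"podSelector": {"matchLabels": {"app": "checkout"}}}],
                "ports": [{"protocol": "TCP", "port": 8080}],
            }
        ],
    },
)

allow_db_and_dns_egress = policy(
    "allow-payments-egress",
    {
        "podSelector": {"matchLabels": {"app": "payments"}},
        "policyTypes": ["Egress"],
        "egress": [
            {
                "to": [{"podSelector": {"matchLabels": {"app": "postgres"}}}],
                "ports": [{"protocol": "TCP", "port": 5432}],
            },
            # DNS: no "to" clause, so port 53 is allowed to any destination.
            {"ports": [{"protocol": "UDP", "port": 53}, {"protocol": "TCP", "port": 53}]},
        ],
    },
)

manifests = [default_deny, allow_checkout_ingress, allow_db_and_dns_egress]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

NetworkPolicies are additive, so the two allow policies open exactly the listed paths on top of the default deny.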

What is the difference between a vulnerability, a threat, and a risk? [Advanced]

Answer

A vulnerability is a weakness, a threat is a potential actor or event that could exploit a weakness, and risk is the combination of likelihood and impact if that exploitation happens. Risk requires business context; vulnerability alone is not the full picture.

Technical explanation

Example vulnerability: outdated library with RCE.

Example threat: internet attackers scanning for that RCE.

Example risk: production customer-data service is compromised, leading to data exposure and downtime.

Hands-on example

Hands-on: document a risk register entry with asset, vulnerability, threat actor, exposure, impact, current controls, likelihood, residual risk, owner, due date, and decision: remediate, mitigate, transfer, or accept.

What is threat modeling, and when would you do it? [Advanced]

Answer

Threat modeling is a structured way to identify what can go wrong in a system, who might attack it, what assets need protection, and which controls reduce the risk. I do it during design for new systems, major architecture changes, sensitive data flows, and before exposing new attack surfaces.

Technical explanation

Common methods include STRIDE, attack trees, data-flow diagrams, and abuse cases.

Threat modeling should involve engineering, security, product, and operations because each group sees different risks.

The output should be actionable controls and tracked work, not just a diagram.

Hands-on example

Example: for a new payment API, draw data flows between client, API gateway, checkout, payment provider, database, and secrets manager. Identify threats such as spoofing, tampering, replay, injection, and data leakage. Add controls: mTLS, request signing, input validation, rate limiting, audit logs, and least-privilege tokens.

What is the OWASP Top 10, and name a few categories? [Advanced]

Answer

The OWASP Top 10 is a widely used awareness list of critical web application security risks. The 2021 edition lists broken access control, cryptographic failures, injection, insecure design, security misconfiguration, vulnerable/outdated components, identification and authentication failures, software/data integrity failures, logging/monitoring failures, and SSRF.

Technical explanation

It is not a compliance checklist by itself, but it is useful for developer education and secure design reviews.

Many categories map directly to controls in CI/CD, code review, testing, and runtime monitoring.

The most effective use is to convert categories into concrete engineering standards and test cases.

Hands-on example

Hands-on: create secure coding checklists from OWASP categories. For injection, require parameterized queries. For broken access control, require authorization tests. For vulnerable components, require SCA. For logging failures, require security event logging and alerting.

What is an injection attack, and how do you prevent SQL injection? [Advanced]

Answer

An injection attack occurs when untrusted input is interpreted as code or commands by an interpreter such as SQL, shell, LDAP, or NoSQL. SQL injection is prevented with parameterized queries, prepared statements, input validation, least-privilege database users, and avoiding dynamic SQL string concatenation.

Technical explanation

Escaping alone is error-prone and should not be the primary control when parameterization is available.

ORMs help but can still be unsafe if raw query strings are built from user input.

Least-privilege database roles limit damage if injection occurs.

Hands-on example

Unsafe: SELECT * FROM users WHERE name = '<user input>'

Safe Java pattern:

PreparedStatement ps = conn.prepareStatement("SELECT * FROM users WHERE name = ?");
ps.setString(1, userInput);
ResultSet rs = ps.executeQuery();
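The difference is easy to demonstrate end to end with an in-memory SQLite database. This Python sketch uses a hypothetical users table and a classic injection payload; it shows the same idea as the Java pattern above, not a different technique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "alice@example.com"), ("bob", "bob@example.com")],
)

malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause executes.
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safe: the driver binds the input as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))  # prints "2 0"
```

The concatenated query returns every user, while the parameterized query correctly finds no user with that literal name.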

What is the difference between encoding, encryption, and hashing? [Advanced]

Answer

Encoding changes data representation, encryption protects confidentiality and can be reversed with a key, and hashing produces a one-way digest for integrity or password verification. They solve different problems and should not be used interchangeably.

Technical explanation

Base64 is encoding, not encryption; anyone can decode it.

Encryption is reversible by authorized parties with the right key.

Cryptographic hashing is one-way; password hashing should use slow adaptive algorithms with salt, such as bcrypt, scrypt, or Argon2.

Hands-on example

Example: use URL encoding for safe URL parameters, AES-GCM/KMS for storing sensitive reversible data, SHA-256 for file integrity checks, and Argon2id or bcrypt with unique salt for password storage.
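A short standard-library sketch of the encoding and hashing cases; the sample strings are made up. Encryption is left out because Python's standard library has no AES implementation, so that part would need a vetted cryptography library or a KMS.

```python
import base64
import hashlib
from urllib.parse import quote, unquote

# Encoding: reversible by anyone, no key involved.
encoded = base64.b64encode(b"secret-token").decode()
decoded = base64.b64decode(encoded)

# URL encoding: makes a value safe to place in a query string.
url_safe = quote("a b&c=d", safe="")

# Hashing: fixed-size one-way digest, suitable for integrity checks.
digest = hashlib.sha256(b"example file contents").hexdigest()

print(encoded, decoded)
print(url_safe, unquote(url_safe))
print(len(digest))
```

Base64 and URL encoding round-trip without any secret, which is why neither provides confidentiality.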

Why do you hash and salt passwords rather than encrypt them? [Advanced]

Answer

Passwords are hashed and salted rather than encrypted because applications should not need to recover the original password. A salt makes identical passwords produce different hashes and defeats precomputed rainbow-table attacks; adaptive hashing slows brute-force attempts.

Technical explanation

If encrypted passwords are stolen along with the key, attackers can recover every password.

A unique salt per password prevents attackers from attacking many identical hashes at once.

Use password-hashing algorithms designed to be slow and tunable, not fast general-purpose hashes alone.

Hands-on example

Hands-on: when a user registers, generate a random salt and store the bcrypt/Argon2id hash plus its parameters. On login, hash the submitted password with the stored salt and parameters and compare using constant-time comparison. Never log passwords or store them in a recoverable form.
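A minimal sketch of that register/login flow using only the Python standard library. It swaps in scrypt because hashlib ships it, whereas bcrypt and Argon2id need third-party packages; the cost parameters are illustrative assumptions, not a tuned recommendation.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters; tune them for your hardware and store them
# alongside each hash so they can be raised later.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple:
    """Return (salt, digest) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the submitted password and compare in constant time."""
    candidate = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

Hashing the same password twice yields different digests because each call draws a fresh salt.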

What is a security incident response process, and what are its phases? [Advanced]

Answer

A security incident response process is a structured approach to handling suspected or confirmed security events. Common phases are preparation, identification, containment, eradication, recovery, post-incident review, and continuous improvement.

Technical explanation

Preparation includes runbooks, contacts, logging, access, tabletop exercises, and evidence-handling procedures.

Containment limits damage, eradication removes attacker access, and recovery restores trusted service.

Post-incident review should produce control improvements, not blame.

Hands-on example

Example: for suspected token theft, identify affected identity, preserve logs, disable the token, rotate related secrets, block suspicious sessions, verify no persistence, redeploy clean workloads if needed, restore service, and write a postmortem with detection and prevention actions.

How would you respond to a suspected credential compromise in production? [Advanced]

Answer

For suspected production credential compromise, I would treat the credential as exposed, contain access immediately, rotate or revoke it, investigate usage, assess blast radius, restore trusted credentials, and add controls to prevent recurrence.

Technical explanation

Containment should be fast: disable the credential or restrict its permissions while preserving forensic evidence.

Investigation should review audit logs, CloudTrail, Kubernetes audit logs, CI logs, IPs, actions performed, and time window.

Recovery should include rotation of dependent credentials, validation that attackers did not create persistence, and heightened monitoring.

Hands-on example

Runbook: 1) Open incident. 2) Revoke/disable credential. 3) Snapshot relevant logs. 4) Identify all actions by that principal. 5) Rotate downstream secrets. 6) Remove persistence such as new access keys or roles. 7) Verify service health. 8) Close with lessons learned.

How do you audit who did what in your cloud and clusters (CloudTrail, audit logs)? [Advanced]

Answer

I audit who did what using cloud audit logs such as AWS CloudTrail, Kubernetes audit logs, CI/CD logs, IAM Access Analyzer findings, Git history, and service-specific logs. Logs should be centralized, immutable or tamper-resistant, searchable, and retained according to compliance needs.

Technical explanation

CloudTrail shows AWS API calls, caller identity, source IP, time, request parameters, and result.

Kubernetes audit logs show API server actions such as create, update, delete, exec, and access to secrets.

Audit logs should feed alerts for high-risk actions such as disabling logging, creating admin keys, accessing secrets, or changing network exposure.

Hands-on example

Example detection: alert when an IAM user creates an access key outside the approved pipeline, when CloudTrail is stopped, when a Kubernetes Secret is read in production by an unusual identity, or when a cluster-admin binding is created.
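The detections above could be prototyped as simple rules over parsed log events. The sketch below is illustrative only: the "source" tag and the two allowlists are assumptions added for the example, while eventName, userIdentity.arn, verb, user.username, objectRef, and requestObject.roleRef follow the CloudTrail and Kubernetes audit event field names. In practice these rules would normally live in a SIEM or an alerting pipeline.

```python
def detect(event, approved_key_creators=frozenset(), expected_secret_readers=frozenset()):
    """Return a list of alert strings for one parsed audit event."""
    alerts = []
    source = event.get("source")  # hypothetical tag added at ingestion time

    if source == "cloudtrail":
        name = event.get("eventName")
        actor = event.get("userIdentity", {}).get("arn", "")
        if name == "CreateAccessKey" and actor not in approved_key_creators:
            alerts.append("access key created outside the approved pipeline")
        if name in ("StopLogging", "DeleteTrail"):
            alerts.append("CloudTrail logging stopped or trail deleted")

    elif source == "k8s-audit":
        verb = event.get("verb")
        ref = event.get("objectRef", {})
        user = event.get("user", {}).get("username", "")
        reads_secret = ref.get("resource") == "secrets" and verb in ("get", "list", "watch")
        if reads_secret and ref.get("namespace") == "production" and user not in expected_secret_readers:
            alerts.append("production Secret read by unusual identity")
        role = event.get("requestObject", {}).get("roleRef", {}).get("name")
        if ref.get("resource") == "clusterrolebindings" and verb == "create" and role == "cluster-admin":
            alerts.append("cluster-admin binding created")

    return alerts
```

Treating "unusual" as "not on an allowlist" is the simplest possible choice; a baseline of normal readers per Secret would produce fewer false positives.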

What is compliance-as-code, and how do you continuously prove compliance? [Advanced]

Answer

Compliance-as-code means expressing compliance controls as automated policies, tests, evidence collection, and continuous monitoring instead of manual screenshots and periodic checks. It helps prove controls are enforced over time, not just documented once.

Technical explanation

Examples include IaC policies, Kubernetes admission policies, CIS benchmark checks, encryption checks, logging checks, and access-review automation.

Evidence should be machine-generated, timestamped, tied to control IDs, and stored in an auditable system.

Manual review still exists for risk acceptance and control design, but routine evidence should be automated.

Hands-on example

Example: map SOC 2 criteria to automated checks. For CC6.1 (logical access): MFA enabled, admin roles reviewed, no public admin ports. For monitoring and change-management criteria such as CC7.2 and CC8.1: CloudTrail on and production deploy approvals required. Export daily evidence from policy engines and store it with control ID, timestamp, result, and owner.
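One possible shape for a stored evidence record is sketched below. The Evidence class, its field names, and the check name "mfa_enabled" are assumptions for illustration; the only requirements taken from the text are that each record carries a control ID, timestamp, result, and owner.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Evidence:
    control_id: str   # for example "CC6.1"
    check: str        # hypothetical check name such as "mfa_enabled"
    result: str       # "pass" or "fail"
    owner: str
    timestamp: str    # ISO 8601, UTC


def record(control_id, check, passed, owner, now=None):
    """Build one machine-generated evidence record."""
    now = now or datetime.now(timezone.utc)
    return Evidence(control_id, check, "pass" if passed else "fail", owner, now.isoformat())


def to_jsonl(records):
    """Serialize records as JSON Lines for storage in an auditable system."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)
```

Writing one JSON line per check per day keeps the evidence easy to append to write-once storage and easy to query by control ID later.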

How do you balance security controls with developer velocity? [Advanced]

Answer

I balance security controls with developer velocity by making secure paths easy, fast, and automated. The goal is guardrails, not roadblocks: reusable templates, fast feedback, risk-based gates, clear exceptions, and developer-friendly remediation guidance.

Technical explanation

High-confidence critical issues should block; low-risk or noisy findings should create backlog items or warnings.

Security platforms should provide paved roads such as approved base images, CI templates, secret patterns, and deployment modules.

Measure both risk reduction and friction: false positive rate, scan time, merge-block rate, and MTTR.

Hands-on example

Example: instead of asking every team to write secure Kubernetes YAML, provide a Helm chart with restricted securityContext, probes, NetworkPolicy, and standard labels. Teams move faster because security is embedded in the supported deployment path.

How do you get developers to adopt secure practices without friction? [Advanced]

Answer

Developers adopt secure practices when the secure option is the easiest option and feedback is actionable. I reduce friction with standard templates, IDE/PR feedback, clear examples, self-service documentation, secure defaults, and fast exception workflows.

Technical explanation

Line-level PR comments are more useful than monthly PDF reports.

Security champions and office hours help teams understand recurring patterns.

Metrics should recognize teams that reduce risk, not only name teams with findings.

Hands-on example

Adoption plan: provide starter repos with working SonarQube, SCA, gitleaks, image scanning, OIDC deploy, and secure Kubernetes defaults. Publish a one-page fix guide for top findings. Track adoption by repo coverage and time-to-fix, then improve the templates based on developer feedback.

How would you embed security scanning into a pipeline without making it slow? [Advanced]

Answer

I embed security scanning without slowing the pipeline by using staged scanning, caching, incremental analysis, parallel jobs, risk-based blocking, and asynchronous deep scans. Fast checks run on every PR; heavier scans run on nightly builds or release candidates.

Technical explanation

Pre-commit and PR checks should be fast and high-signal.

Dependency and container scanners should use caches and scan only changed artifacts where possible.

The policy should differentiate between block-now findings and notify/remediate findings.

Hands-on example

Pipeline pattern: PR runs secret scan, SAST incremental analysis, SCA on changed manifests, and IaC checks in parallel. Build runs image scan with cache. Staging runs DAST. Nightly runs full repo and deep dependency scans. Only critical reachable issues block immediately.
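The blocking rule in this pattern can be expressed as a small gate function. This is a sketch under stated assumptions: the Finding class and its severity and reachable fields are hypothetical, and real scanners report these attributes in their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str    # "critical", "high", "medium", or "low"
    reachable: bool  # whether the vulnerable code path is reachable


def gate(findings):
    """Split findings into those that block the merge and those that only notify."""
    blocking, notify = [], []
    for finding in findings:
        if finding.severity == "critical" and finding.reachable:
            blocking.append(finding)
        else:
            notify.append(finding)
    return {"block": bool(blocking), "blocking": blocking, "notify": notify}
```

Everything in the notify list would become a backlog item or PR warning rather than a failed build, which keeps the PR check fast and high-signal.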

What is a break-glass procedure, and why have one? [Advanced]

Answer

A break-glass procedure is a controlled emergency path for bypassing normal access or deployment restrictions during urgent incidents. It exists so teams can restore service quickly while preserving accountability through approvals, time limits, logging, and post-use review.

Technical explanation

Break-glass should not be a shared permanent admin account with no audit trail.

Access should be time-bound, MFA-protected, approved, logged, and reviewed after use.

The procedure should be tested before a real emergency so responders know how to use it.

Hands-on example

Example: production cluster-admin is normally denied. During a P1 outage, an on-call engineer requests break-glass access for one hour through an access system. The grant requires manager/security approval, records every API request made under it in the cluster audit log, and automatically expires.
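The core properties of such a grant (approved by someone else, justified, time-bound) can be modeled briefly. The Grant class and function names below are hypothetical and stand in for whatever access system is actually used; the one-hour default mirrors the example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    user: str
    role: str
    approved_by: str
    reason: str
    issued_at: datetime
    expires_at: datetime


def issue_grant(user, role, approved_by, reason, now, ttl=timedelta(hours=1)):
    """Issue a time-bound grant; refuse self-approval and missing justification."""
    if not approved_by or approved_by == user:
        raise ValueError("break-glass requires approval by someone other than the requester")
    if not reason:
        raise ValueError("break-glass requires a reason for post-use review")
    return Grant(user, role, approved_by, reason, now, now + ttl)


def is_active(grant, now):
    """A grant is valid from issuance until, but not including, its expiry."""
    return grant.issued_at <= now < grant.expires_at
```

Checking is_active on every request, rather than relying on a cleanup job, means an expired grant stops working even if revocation is delayed.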

How do you manage and rotate SSH and API keys at scale? [Advanced]

Answer

At scale, I manage SSH and API keys by reducing static keys, centralizing issuance, enforcing short lifetimes, rotating automatically, inventorying ownership, detecting unused keys, and using alternatives such as SSO, certificates, OIDC, and workload identity.

Technical explanation

For SSH, prefer short-lived SSH certificates or SSO-backed access over permanent authorized_keys sprawl.

For API keys, store them in a secret manager, assign owners, rotate on schedule, and alert on unusual use.

Keys should have least privilege, expiration, environment scope, and automated revocation when employees or services offboard.

Hands-on example

Hands-on: replace long-lived bastion SSH keys with Teleport or OpenSSH certificates valid for 8 hours. For cloud APIs, remove static IAM user keys and use OIDC/workload identity. Run a weekly job that flags keys older than 90 days or unused for 30 days.
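The weekly job's core check might look like the following. It is a sketch that assumes a key inventory has already been collected as dictionaries with hypothetical fields id, created_at, and last_used_at; a key that has never been used is treated as idle since its creation date.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=90)
MAX_IDLE = timedelta(days=30)


def flag_keys(keys, now):
    """Return (key id, reasons) for keys that are too old or have gone unused."""
    flagged = []
    for key in keys:
        reasons = []
        if now - key["created_at"] > MAX_AGE:
            reasons.append("older than 90 days")
        # Never-used keys count as idle since the day they were created.
        idle_since = key.get("last_used_at") or key["created_at"]
        if now - idle_since > MAX_IDLE:
            reasons.append("unused for 30 days")
        if reasons:
            flagged.append((key["id"], reasons))
    return flagged
```

The flagged list can feed tickets assigned to each key's owner, with automatic revocation as a follow-up for keys that stay flagged.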

What is the difference between a WAF, an IDS, and an IPS? [Advanced]

Answer

A WAF filters and protects web application traffic at the HTTP layer (layer 7). An IDS detects suspicious activity and alerts. An IPS detects and actively blocks or prevents suspicious traffic. WAF is application-focused; IDS/IPS are broader network or host detection/prevention controls.

Technical explanation

A WAF can block common web attacks such as SQL injection, XSS patterns, bad bots, or protocol anomalies.

An IDS is usually passive and helps with detection and investigation.

An IPS is inline and can block, so tuning is critical to avoid false-positive outages.

Hands-on example

Example: place AWS WAF or Cloudflare WAF in front of a public API to block known malicious request patterns. Use IDS sensors for alerting on lateral movement. Use IPS cautiously on high-confidence signatures where blocking risk is acceptable.

How do you measure the effectiveness of your security program (MTTR for vulns, coverage)? [Advanced]

Answer

I measure security program effectiveness with outcome and coverage metrics: vulnerability MTTR, SLA compliance, critical exposure count, KEV exposure time, scanning coverage, policy violation rate, secrets incidents, mean time to detect/respond, exception aging, and control pass rates.

Technical explanation

Raw finding count alone is misleading because better scanning can initially increase findings.

Metrics should show risk reduction, speed of remediation, and whether controls are actually deployed across the estate.

Use service/team dashboards so ownership is visible and improvements are measurable.

Hands-on example

Security scorecard: 98 percent of repos have SAST/SCA, 95 percent of images scanned, critical vulnerability MTTR 3 days, KEV exposure less than 24 hours, 0 public S3 buckets, 100 percent of prod deploys signed, 12 aged exceptions needing review.
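Two of the scorecard numbers, vulnerability MTTR and scanning coverage, are simple to compute once the underlying data is exported. The sketch below assumes hypothetical record shapes: each vulnerability has opened_at and an optional fixed_at, and repositories are identified by name.

```python
from datetime import datetime


def mttr_days(vulns):
    """Mean time to remediate, in days, over vulnerabilities that have been fixed."""
    durations = [
        (v["fixed_at"] - v["opened_at"]).total_seconds() / 86400
        for v in vulns
        if v.get("fixed_at")
    ]
    return sum(durations) / len(durations) if durations else None


def coverage_percent(all_repos, scanned_repos):
    """Percentage of known repos that have scanning enabled."""
    all_repos = set(all_repos)
    if not all_repos:
        return None
    return 100 * len(all_repos & set(scanned_repos)) / len(all_repos)
```

Because MTTR here only counts fixed items, it should be read alongside the count and age of still-open criticals, otherwise long-lived open findings are invisible.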

What recent security tool or practice have you adopted, and what risk did it reduce? [Advanced]

Answer

A recent practice I would highlight is keyless artifact signing with OIDC-based CI identity and admission verification. It reduces the risk of deploying tampered or untrusted images and removes long-lived signing keys from CI.

Technical explanation

The CI workflow signs the immutable image digest using its OIDC identity.

Provenance and SBOM attestations are attached to the artifact.

Kubernetes admission verifies the signature and trusted workflow identity before allowing production deployment.

Hands-on example

Example: implement cosign keyless signing in GitHub Actions, generate SLSA provenance, store SBOM attestations, and configure Kyverno or Sigstore policy-controller to allow only images signed by org/repo release workflows from protected branches.

How would you design a secure-by-default CI/CD pipeline from scratch? [Advanced]

Answer

A secure-by-default CI/CD pipeline should start with protected source control, strong identity, fast security feedback, trusted builds, signed artifacts, policy-gated deployments, least-privilege credentials, and continuous evidence. Security should be automated into each stage rather than added as a manual release checklist.

Technical explanation

Source: branch protection, reviews, CODEOWNERS, secret scanning, signed commits if required, and dependency review.

Build: ephemeral runners, SAST, SCA, tests, image scan, SBOM, provenance, and artifact signing.

Deploy: OIDC credentials, environment approvals, IaC policies, admission control, signature verification, progressive rollout, observability, and audit logging.

Hands-on example

Design: PR checks run tests/SAST/SCA/secrets. Main build produces an image referenced by its immutable digest, generates SBOM and provenance, and signs with cosign. Promotion requires quality gate and approval. Kubernetes admission verifies registry, signature, provenance, nonroot securityContext, and allowed namespace policies before rollout.
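The admission step in this design can be illustrated with a simplified check over a Pod manifest. This is not how Kyverno, OPA, or the Sigstore policy-controller are configured; it is a Python sketch of the decision logic only. The verified_digests set is a hypothetical stand-in for real signature verification, and provenance and namespace checks are omitted.

```python
def admission_violations(pod, allowed_registries, verified_digests):
    """Return reasons to reject a Pod manifest; an empty list means allow."""
    violations = []
    spec = pod["spec"]
    pod_sc = spec.get("securityContext", {})

    for container in spec["containers"]:
        name, image = container["name"], container["image"]

        if not any(image.startswith(reg.rstrip("/") + "/") for reg in allowed_registries):
            violations.append(f"{name}: image is not from an allowed registry")

        if "@sha256:" not in image:
            violations.append(f"{name}: image is not pinned by digest")
        elif image.split("@", 1)[1] not in verified_digests:
            # Stand-in for verifying the signature on this digest.
            violations.append(f"{name}: no verified signature for digest")

        # The container-level setting overrides the pod-level one.
        non_root = container.get("securityContext", {}).get(
            "runAsNonRoot", pod_sc.get("runAsNonRoot", False)
        )
        if non_root is not True:
            violations.append(f"{name}: runAsNonRoot is not set to true")

    return violations
```

Returning every violation instead of stopping at the first gives developers a complete list to fix in one pass.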

How do you prove to an auditor that security controls are enforced continuously, not just documented? [Advanced]

Answer

To prove continuous enforcement to an auditor, I provide automated evidence from the systems that enforce controls: CI logs, policy-as-code results, admission-controller decisions, cloud configuration checks, IAM reviews, vulnerability SLA dashboards, audit logs, and exception records. The evidence should be timestamped, complete, and tied to control objectives.

Technical explanation

Auditors need more than policy documents; they need proof that controls operated during the audit period.

Evidence should show both preventive controls, such as blocked deployments, and detective controls, such as alerts and reviews.

Exceptions should be documented with approval, scope, expiration, compensating controls, and review history.

Hands-on example

Evidence pack: export monthly reports showing 100 percent of production deployments passed signature verification, all prod namespaces enforce restricted Pod Security, CloudTrail remained enabled, critical CVEs met SLA, access reviews completed, and all policy exceptions had valid owners and expiry dates.
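The exception check in the evidence pack lends itself to automation. The sketch below assumes exception records are dictionaries with hypothetical field names matching the attributes listed in the technical explanation (approval, scope, expiration, compensating controls) plus an owner.

```python
from datetime import date

REQUIRED_FIELDS = ("owner", "approved_by", "scope", "expires_on", "compensating_controls")


def invalid_exceptions(exceptions, today):
    """Map exception id to a list of problems; valid exceptions are omitted."""
    problems = {}
    for exc in exceptions:
        issues = [f"missing {field}" for field in REQUIRED_FIELDS if not exc.get(field)]
        expires_on = exc.get("expires_on")
        if expires_on and expires_on < today:
            issues.append("expired")
        if issues:
            problems[exc["id"]] = issues
    return problems
```

An empty result for each day of the audit period is itself evidence that the exception process operated as documented.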

Source Notes

SonarQube quality gates: https://docs.sonarsource.com/sonarqube-server/quality-standards-administration/managing-quality-gates/introduction-to-quality-gates

SonarQube Security Hotspots: https://docs.sonarsource.com/sonarqube-server/user-guide/security-hotspots

SonarQube Clean as You Code: https://docs.sonarsource.com/sonarqube-server/user-guide/clean-as-you-code

CISA Known Exploited Vulnerabilities Catalog: https://www.cisa.gov/known-exploited-vulnerabilities-catalog

FIRST EPSS: https://www.first.org/epss/

FIRST CVSS: https://www.first.org/cvss/

SLSA build requirements: https://slsa.dev/spec/v1.2/build-requirements

Sigstore cosign documentation: https://docs.sigstore.dev/cosign/overview/

Open Policy Agent Kubernetes admission control: https://www.openpolicyagent.org/docs/kubernetes

Kubernetes Pod Security Standards: https://kubernetes.io/docs/concepts/security/pod-security-standards/

Kubernetes Pod Security Admission: https://kubernetes.io/docs/concepts/security/pod-security-admission/

Kubernetes Network Policies: https://kubernetes.io/docs/concepts/services-networking/network-policies/

HashiCorp Vault database secrets engine: https://developer.hashicorp.com/vault/docs/secrets/databases

HashiCorp Vault static and dynamic secrets tutorial: https://developer.hashicorp.com/vault/tutorials/get-started/understand-static-dynamic-secrets

cert-manager documentation: https://cert-manager.io/docs/

OWASP Top 10: https://owasp.org/www-project-top-ten/
