
Responsible AI for Healthy and Thriving Learners — Principles, Practice and Policy


Header image: an isometric illustration of a collaborative harm-assessment workshop, with students, teachers, counselors and product staff gathered around a board labeled Feature brief / Harms / Mitigations / Owners / Timeline.

Quick overview

  • Goal: Develop a concise harm assessment and mitigation plan for one specific AI feature that affects young people’s health, sexuality education, social-emotional learning (SEL) or well-being. Get peer feedback and produce practical next steps you can actually implement.
  • Time: ~90–120 minutes (individual) or ~2 hours (small group + feedback).
  • Materials: feature brief (1–2 paragraphs), harm-assessment template (below), sticky notes or digital collaboration board, rubric for peer feedback.

Why this matters
When AI touches learners’ well-being, small design choices can have big consequences. This activity turns abstract “responsible AI” principles into concrete decisions you can test, explain, and defend.


Instructions — step by step

  1. Pick one concrete AI feature

    • Keep it narrow. Example features: a student-facing sexual-health chatbot, an automated SEL sentiment analyzer that flags at-risk students, a recommendation engine for educational videos on puberty, or an automated progress nudger that sends privacy-sensitive messages.
    • Write a 1–2 sentence feature brief: what it does, who uses it, where, and what data it uses.
  2. Identify stakeholders & user groups (5–10 minutes)

    • List primary users (age bands), caregivers, teachers, counselors, product team, legal/compliance, and regulators.
    • Note differences by age/developmental stage and special populations (neurodiverse learners, LGBTQ+ youth, minors in care, etc.).
  3. Rapid harm brainstorm (10–15 minutes)

    • For each stakeholder/group, list potential harms. Use categories below to jog ideas (privacy, safety, mental health, bias, autonomy, sexual content, legal/compliance, usability harms).
    • Rate each harm for Severity (Low / Medium / High) and Likelihood (Low / Medium / High). A short scoring sketch after this list shows one way to tally and rank these ratings.
  4. Root cause analysis (10 minutes)

    • For the highest-risk items (highest combined Severity × Likelihood), ask “why?” two or three times to find the root cause (data source, labeling, UI, lack of escalation, ambiguous policy).
  5. Draft mitigations (15–25 minutes)

    • Propose layered mitigations: design, technical, policy/process, training, and ongoing monitoring. Aim for at least one preventive, one detective, and one corrective control per major harm.
    • Estimate residual risk after mitigation (Acceptable / Monitor / Unacceptable).
  6. Create an implementation plan (10–15 minutes)

    • Who owns each mitigation? What’s the timeline and minimum viable step (MVP) to reduce risk quickly? What tests or pilots are needed? What documentation or approvals must be completed?
  7. Peer review & feedback (20–30 minutes)

    • Exchange plans with a peer or small group. Use the peer feedback rubric (below).
    • Ask clarifying questions, challenge assumptions, and suggest missing mitigations or tests.
  8. Finalize next steps (10 minutes)

    • Consolidate feedback, produce a short “action list” (3–6 items) with owners and deadlines.
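
Optional scoring sketch (steps 3–4)

If it helps to keep the brainstorm organized, the Severity and Likelihood ratings from step 3 can be tallied in a few lines of code so the highest-risk items surface first for root-cause analysis in step 4. This is a minimal sketch only, assuming a 1–3 numeric scale and an illustrative priority threshold; the harms listed are placeholders, not recommendations.

```python
# Minimal sketch: rank brainstormed harms by Severity x Likelihood.
# The 1-3 numeric scale and the threshold of 6 are assumptions; adjust to your context.

LEVELS = {"Low": 1, "Medium": 2, "High": 3}

harms = [
    {"harm": "Misinformation leads to harmful decisions", "severity": "High", "likelihood": "Medium"},
    {"harm": "Privacy breach exposing student queries", "severity": "High", "likelihood": "Low"},
    {"harm": "Confusing explanations mislead students", "severity": "Medium", "likelihood": "Medium"},
]

for h in harms:
    h["priority"] = LEVELS[h["severity"]] * LEVELS[h["likelihood"]]

# The highest-priority harms go to root-cause analysis (step 4) first.
for h in sorted(harms, key=lambda x: x["priority"], reverse=True):
    flag = "root-cause first" if h["priority"] >= 6 else ""
    print(f'{h["priority"]:>2}  {h["severity"]:<6} x {h["likelihood"]:<6}  {h["harm"]}  {flag}')
```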

Harm categories (use these to structure brainstorming)

  • Privacy & confidentiality (data collection, storage, sharing)
  • Safety & abuse (exposure to harmful or sexual content, grooming, self-harm)
  • Psychological harms (anxiety, shame, stigma, triggering content)
  • Developmental/educational impact (misinformation, inappropriate pacing)
  • Bias & discrimination (intersectional harms, misidentifying identities)
  • Autonomy & consent (informed consent, youth agency, opt-out)
  • Legal & compliance (COPPA, GDPR, local laws about sexual health)
  • Usability & interpretability (confusing explanations, misleading scores)
  • Operational (failure modes, escalation gaps, lack of human oversight)

Harm assessment & mitigation template

Use this structure to write your plan concisely (aim for 1–2 pages). A structured, machine-readable sketch of the same outline appears after the list below.

  1. Feature brief (1–2 sentences)
  2. Stakeholders / user groups
  3. Top 3 harms (each with Severity + Likelihood)
    • Harm A: description
      • Root cause(s)
      • Proposed mitigations (preventive / detective / corrective)
      • Residual risk (Acceptable / Monitor / Unacceptable)
      • Owner & timeline for mitigations
    • Harm B: …
    • Harm C: …
  4. Monitoring & metrics
    • What you’ll measure to know mitigations are working (e.g., escalation rate, false positives/negatives, privacy incident count, satisfaction among youth)
  5. Escalation & incident response
    • Who to notify, how to triage, how to involve safeguarding/child protection
  6. Documentation & compliance
    • Consent forms, data retention policy, privacy notice, audit logs
  7. Next practical steps (3–6 items with owners & deadlines)
  8. Notes / stakeholder sign-offs required
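
Optional: the template as a structured record

If your team keeps plans in version control or a shared tool, the same outline can be captured as a structured record and filled in during the activity. This is a sketch only; the field names are illustrative, not a required schema.

```python
# Minimal sketch of the plan outline as a structured record (field names are illustrative).
import json

plan = {
    "feature_brief": "",
    "stakeholders": [],
    "top_harms": [
        {
            "description": "",
            "severity": "Low | Medium | High",
            "likelihood": "Low | Medium | High",
            "root_causes": [],
            "mitigations": {"preventive": [], "detective": [], "corrective": []},
            "residual_risk": "Acceptable | Monitor | Unacceptable",
            "owner": "",
            "timeline": "",
        }
    ],
    "monitoring_metrics": [],
    "escalation_and_incident_response": "",
    "documentation_and_compliance": [],
    "next_steps": [],  # 3-6 items with owners and deadlines
    "sign_offs_required": [],
}

print(json.dumps(plan, indent=2))
```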

Example (short) — AI sexual-health chatbot for ages 13–17 in a school wellness app

  1. Feature brief

    • A chatbot answers anonymous sexual-health questions typed by students (13–17), trained on public health content and previous anonymized Q&A. Responses are immediate; if a question signals risk (self-harm, abuse), the conversation is flagged to a counselor.
  2. Stakeholders

    • Students (13–17), parents/guardians, school counselors, school admin, product team, legal/compliance.
  3. Top harms

    • Harm 1 — Misinformation that leads to harmful decisions

      • Severity: High; Likelihood: Medium
      • Root causes: training data includes outdated/incorrect sources; model hallucinates; ambiguous prompts.
      • Mitigations:
        • Preventive: curated, version-controlled content sources; a small, domain-specific model or retrieval-augmented responses that cite sources.
        • Detective: automated fact-checks and human review of sampled conversations.
        • Corrective: clearly visible disclaimer, “ask a counselor” CTA, and easy correction workflow.
      • Residual risk: Monitor
      • Owner/timeline: Content lead + product — prototype with curated sources in 6 weeks.
    • Harm 2 — Exposure to or normalization of sexual content inappropriate for age

      • Severity: High; Likelihood: Low–Medium
      • Root causes: user prompts requesting explicit content; model responds with explicit detail.
      • Mitigations:
        • Preventive: strict content filter and age-appropriate response templates; refuse explicit requests.
        • Detective: logging and an automated alert if high-risk keywords appear (a minimal sketch of this check follows the example).
        • Corrective: trigger a soft block, provide a safer educational alternative, and notify a counselor if escalation criteria are met.
      • Residual risk: Monitor/Unacceptable until the filters are proven in the pilot.
      • Owner/timeline: Safety engineer + school safeguarding lead — filter test in 2 weeks, pilot 4 weeks.
    • Harm 3 — Privacy breach exposing student queries

      • Severity: High; Likelihood: Low
      • Root causes: retaining identifiable logs; weak access controls.
      • Mitigations:
        • Preventive: anonymize queries at ingest, minimal retention (e.g., 30 days), encryption at rest/in transit.
        • Detective: access logs and regular audits.
        • Corrective: data breach response plan with parental notification templates and legal counsel.
      • Residual risk: Acceptable with strong controls.
      • Owner/timeline: Security lead — deploy retention & encryption immediately.
  4. Monitoring & metrics

    • % of responses citing a trusted source; # flagged conversations per 1,000 queries; false refusal rate (legitimate education requests blocked); incident count. A short sketch for computing these from routine logs follows the example.
  5. Escalation & incident response

    • If chatbot flags self-harm/abuse: immediate human review within 1 hour; emergency protocol if imminent danger. Clear handoffs and documentation.
  6. Documentation & compliance

    • Privacy notice for students/parents, data processing agreement with vendor, age-appropriate consent flows, school sign-off.
  7. Next steps (3 items)

    • Run a 2-week closed pilot with volunteer classes and counselor on-call (Owner: Product; Due: 4 weeks).
    • Conduct a bias and safety audit of the training data (Owner: Ethics lead; Due: 3 weeks).
    • Draft consent + privacy language for parent communication (Owner: Legal; Due: 2 weeks).
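
The detective control for Harm 2 (logging plus an automated alert on high-risk keywords) can be prototyped in a few lines. This is a minimal sketch under stated assumptions: the keyword list, logger name and alert routing are placeholders, and a real deployment would use the school's safeguarding tooling and a reviewed, age-appropriate keyword policy rather than a hard-coded list.

```python
# Minimal sketch: log each exchange and alert when high-risk keywords appear.
# Keyword list, logger name and alert routing are hypothetical placeholders.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("chatbot.safety")

HIGH_RISK_KEYWORDS = {"hurt myself", "self-harm", "unsafe at home"}  # placeholder list

def check_message(message: str) -> bool:
    """Return True and emit an alert if the message contains a high-risk keyword."""
    text = message.lower()
    hits = [kw for kw in HIGH_RISK_KEYWORDS if kw in text]
    log.info("exchange logged (length=%d, flags=%d)", len(message), len(hits))
    if hits:
        # In production this would notify the on-call counselor or safeguarding lead.
        log.warning("high-risk keywords detected: %s", ", ".join(sorted(hits)))
        return True
    return False

print(check_message("How do I talk to a doctor about contraception?"))  # False
print(check_message("Sometimes I think about self-harm"))               # True
```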

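The monitoring metrics in the example above are simple ratios that can be computed from routine logs. A minimal sketch follows; all counter names and numbers are hypothetical placeholders and depend on how your chatbot actually logs conversations.

```python
# Minimal sketch: compute the example monitoring metrics from weekly log counts.
# All counter names and numbers are hypothetical placeholders.

weekly = {
    "total_queries": 1800,
    "responses_citing_trusted_source": 1620,
    "flagged_conversations": 9,        # self-harm / abuse / explicit-content flags
    "refusals": 40,
    "refusals_judged_legitimate": 7,   # legitimate education requests that were blocked
    "privacy_incidents": 0,
}

pct_cited = 100 * weekly["responses_citing_trusted_source"] / weekly["total_queries"]
flags_per_1000 = 1000 * weekly["flagged_conversations"] / weekly["total_queries"]
false_refusal_rate = 100 * weekly["refusals_judged_legitimate"] / weekly["refusals"]

print(f"Responses citing a trusted source: {pct_cited:.1f}%")
print(f"Flagged conversations per 1,000 queries: {flags_per_1000:.1f}")
print(f"False refusal rate: {false_refusal_rate:.1f}%")
print(f"Privacy incidents: {weekly['privacy_incidents']}")
```
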
Peer feedback rubric — what to look for

Use these prompts when reviewing a peer’s plan.

  • Clarity & scope
    • Is the feature brief specific and bounded?
    • Are the user groups and contexts of use well-defined?
  • Completeness of harms
    • Are obvious harms missing (privacy, safety, bias, autonomy)?
    • Did they consider different age groups and vulnerable students?
  • Root causes & mitigations
    • Do mitigations directly address root causes?
    • Are there preventive, detective, and corrective measures?
    • Are legal and safeguarding obligations considered?
  • Feasibility & ownership
    • Are owners realistic and timelines plausible?
    • Is the plan actionable (not just aspirational)?
  • Monitoring & escalation
    • Are there measurable indicators?
    • Is there a clear escalation path for urgent harms?
  • Equity & inclusion
    • Were diverse perspectives (e.g., marginalized youth) included in testing/feedback plans?

How to give feedback

  • Start with one thing that’s working well.
  • Ask 2 clarifying questions.
  • Suggest 2 concrete improvements (who and when).
  • Keep tone curious and collaborative.

Example feedback comment

  • “Nice brief — the scope is clear. Question: how will the app identify age reliably without creating privacy risk? Suggestion: add a privacy-preserving age verification option and a fallback for ambiguous ages, owner: engineering, timeframe: prototype in 3 weeks.”

Facilitator tips

  • Encourage participants to think like a young person and like a school safeguarding lead — both perspectives matter.
  • Remind teams: the goal is “good enough to pilot safely,” not perfection.
  • Use real-world constraints (budget, timeline, legal) to make mitigations realistic.

Practical next steps after the activity

  • Approve the MVP mitigations for a time-limited pilot with explicit monitoring and human oversight.
  • Run usability testing with diverse youth panels and school counselors (with consent and safeguarding).
  • Build monitoring dashboards and weekly review cycles for the first 3 months of deployment.
  • Draft/update policies: privacy notices, data retention, consent, and incident response aligned to local laws.
  • Train staff (teachers, counselors, moderators) on how the AI works, when to step in, and how to report incidents.
  • Schedule quarterly audits for bias and safety and an annual public summary of outcomes and improvements.

Before the next peer review, use the template above to draft a one-page harm assessment and mitigation plan for your own feature brief.