The Responsibility of Human Programmers in the Development of Ethical AI

The Responsibility of Human Programmers in the Development of Ethical AI

Why “ethical AI” is a software engineering responsibility

AI systems don’t simply “become” ethical (or unethical). Their behavior is a direct consequence of thousands of human decisions: how we define the problem, which data we include or exclude, the loss functions we optimize, the constraints we enforce, the tests we run (and the ones we don’t), the documentation we publish, and the guardrails we ship. In other words, ethics is a lifecycle discipline, not a last-minute compliance checkbox.

Multiple global frameworks now make that lifecycle explicit. The EU’s AI Act, which entered into force on August 1, 2024, phases in obligations through 2025–2027, including transparency duties for general-purpose AI and extensive requirements for “high-risk” systems. Even if you build in the U.S., these rules follow your product into the European market.

In the U.S., while the Biden administration’s 2023 AI Executive Order (EO 14110) was rescinded in January 2025, the NIST AI Risk Management Framework (AI RMF 1.0) remains the government’s primary voluntary playbook for trustworthy AI — and a widely adopted reference in industry. Practically, that means teams can (and should) still use NIST’s functions and categories to structure risk identification, measurement, and mitigation throughout development.

Alongside regulation and policy, professional ethics still matters. The ACM Code of Ethics and the ACM/IEEE Software Engineering Code place the public interest first and lay out concrete duties when our code could cause harm. These aren’t airy ideals; they’re actionable responsibilities that map cleanly to everyday engineering work.

What follows is a builder-centric blueprint: the responsibilities programmers should own at each step of the AI lifecycle, with concrete practices, tests, and artifacts that reduce harm — and reduce your own future product risk.

1) Problem framing: begin with accountable scope and stakeholders

Your responsibility: Ensure the problem statement, objectives, and constraints align with human rights, applicable law, and domain realities.

Why it matters: Many “AI failures” are social or organizational failures in disguise: automating a decision that shouldn’t be automated; optimizing for a proxy metric that encodes structural bias; omitting affected stakeholders from design.

Do this:

Map stakeholders and impacts before modeling. The OECD AI Principles emphasize human-centric values (fairness, respect for rights) and accountability across the lifecycle. Use them to sanity-check goals and non-goals.
Define unacceptable behaviors up front. The EU AI Act bans uses like social scoring and certain manipulative systems; treat these as hard constraints even outside the EU.
Document intended use and limitations. The Model Cards framework recommends describing use cases, performance ranges, and out-of-scope contexts — a habit that prevents misuse downstream.

2) Data sourcing: respect rights, labor, and provenance

Your responsibility: Source and curate data lawfully, ethically, and transparently — including how you acquire labels and enrichment.

Why it matters: Non-consensual scraping, mislabeled ground truth, and exploitative data workforces create both ethical and legal risk. Data is where most downstream harms begin.

Do this:

Document datasets with “Datasheets.” Track collection methods, consent, demographics, known hazards, and recommended uses/limits. This reduces “unknown unknowns” in evals and deployment.
Audit enrichment supply chains. The Partnership on AI’s Responsible Sourcing guidance highlights worker conditions, pay, and agency for annotators and moderators — all of which affect data quality and ethics.
Respect privacy by design. If processing personal data, GDPR Article 25 requires data protection “by design and default.” Engineers should minimize collection, pseudonymize where possible, and configure defaults to least privilege. ISO/IEC 27701 can formalize your privacy management program.

3) Modeling: make fairness, robustness, and safety first-class requirements

Your responsibility: Choose architectures, objectives, and constraints that reflect fairness, safety, and security — not just raw accuracy.

Why it matters: AI systems often inherit and amplify existing inequities. Without explicit constraints and evaluations, your model may perform “on average” while harming specific groups or failing adversarially in the wild.

Do this:

Pick appropriate fairness tests. Depending on context, metrics like Equalized Odds / Equality of Opportunity may be relevant; know the technical trade-offs and legal expectations in your domain.
Avoid the “fairness by abstraction” trap. Technical fixes alone can miss sociotechnical realities. Include domain experts and affected communities in the loop as you iterate.
Engineer for adversaries. Use OWASP Top 10 for LLM Applications and MITRE ATLAS to anticipate prompt injection, data poisoning, model exfiltration, and other AI-specific threats. Build mitigations into your threat model and pipelines.

4) Evaluation: measure what matters — and publish it

Your responsibility: Design evaluations that reflect real-world use and abuse, then publish enough to enable scrutiny and informed use.

Why it matters: Many public incidents were surprises because teams didn’t evaluate on relevant subpopulations, contexts, or adversarial behaviors. A robust eval suite is your best early-warning system.

Do this:

Adopt structured transparency artifacts. Model Cards (models) and Datasheets (datasets) clarify performance ranges, intended use, and ethical considerations. They also speed audits and regulatory submissions.
Test fairness and subgroup performance. Validate across demographics and domain shifts. If you can’t legally collect protected attributes, design proxy tests and collaborate with internal compliance to do bias testing responsibly. (The Apple Card investigation is a reminder that teams must be able to explain outcomes at scale.)
Red-team for safety and security. NIST’s AI RMF and its Generative AI Profile emphasize structured red-teaming and incident response protocols; integrate these before launch and refresh after major model or data updates.

5) Shipping: bake ethics into your SDLC — not a late-stage gate

Your responsibility: Treat safety, security, and compliance as release criteria; tie them to code reviews, CI checks, and sign-off workflows.

Why it matters: If ethical risk isn’t a blocker, it’s a suggestion. High-performing teams operationalize ethics with the same rigor as uptime and P0 bugs.

Do this:

Adopt an AI management system. ISO/IEC 42001 (the first AI management system standard) defines governance, risk, lifecycle controls, and supplier oversight. It helps you turn policy into durable process.
Map to NIST AI RMF functions. Organize controls around Govern, Map, Measure, Manage; track risks, document trade-offs, and assign ownership in tickets and design docs.
Secure the stack. Use OWASP LLM Top 10 to add tests for prompt injection, insecure output handling, and supply-chain vulnerabilities; treat these just like standard application security tests. Complement with MITRE ATLAS scenarios.

6) Monitoring & incident response: ethics doesn’t end at launch

Your responsibility: Watch the model in the wild, capture telemetry ethically, and respond to harms quickly and transparently.

Why it matters: Distribution shifts, prompt-engineering attacks, and unanticipated uses are inevitable. Without monitoring and a clear response plan, small issues become front-page incidents.

Do this:

Log ethically and minimally. You need enough telemetry to detect drift, bias spikes, and abuse — without over-collecting personal data. Again, GDPR Article 25 pushes “least necessary” by default.
Use an incident taxonomy. The AI Incident Database (AIID) catalogs real-world failures; mine it to build playbooks and tabletop drills for your specific domain.
Stand up an IR plan. NIST’s guidance emphasizes incident response teams and escalation paths for AI harms; define thresholds for rollback, feature gates, and notifying users or regulators when required.

7) Case studies: what goes wrong — and what to learn

Amazon’s recruiting model (2018): An experimental résumé screener was scrapped after engineers found it downgraded women for technical roles — apparently learning historical bias from past hiring data. Lesson: No amount of “de-gendering” features fixes a poisoned objective. Start with data provenance, label audits, and fairness constraints aligned to hiring law.

COMPAS risk scores (2016+): ProPublica argued the tool was racially biased; later analyses disputed parts of that claim, pointing to competing fairness definitions and measurement choices. Lesson: Fairness is multi-criteria and context-dependent; teams must specify which fairness constraints apply and why, and disclose trade-offs.

Apple Card (2019–2021): A viral allegation of gender bias triggered a regulator review; NYDFS ultimately found no disparate impact violation — but emphasized the need for explainability and consumer trust. Lesson: Even if a model clears legal hurdles, opacity erodes confidence. Publish plain-language explanations, appeal paths, and model cards.

8) The evolving policy context: what builders need to know

EU AI Act timelines matter. Bans on “unacceptable risk” uses applied from February 2, 2025; obligations for general-purpose AI (transparency and governance) apply from August 2, 2025; high-risk system rules land by August 2, 2026 (with some embedded products extended to 2027). Keep an eye on Commission guidance and codes of practice.
U.S. federal shifts ≠ no standards. EO 14110 was rescinded in January 2025, and a new EO prioritized deregulatory approaches — but NIST’s AI RMF persists as a de facto benchmark, and many states or sectors (finance, health) have their own expectations. If you deploy across borders, aligning to NIST + ISO/IEC 42001 + OECD Principles gives you a robust baseline.
Security is catching up fast. The community is crystallizing best practices for LLM security (OWASP 2024–2025, MITRE ATLAS). Treat these as non-optional in your secure development lifecycle.

9) A practical responsibility checklist for programmers

Think of this as a minimal, reusable Definition of Done for ethical AI features. Tailor it to your domain.

A. Discovery & design

Stakeholder & harm mapping completed; unacceptable uses listed and blocked.
Intended users, contexts, and out-of-scope use documented (draft Model Card started).

B. Data

Dataset Datasheet created for every major training/eval dataset.
Data licensing/consents verified; enrichment vendors vetted for worker conditions and quality.
Privacy by design: minimization, pseudonymization, retention limits implemented.

C. Modeling

Fairness objective(s) selected (e.g., equal opportunity) with legal/compliance input; trade-offs documented.
Security threat model includes prompt injection, data poisoning, and supply-chain scenarios. Mitigations landed in code.

D. Evaluation

Subgroup performance analysis with pre-registered thresholds and continuous monitoring plan.
Red-team exercises executed; issues triaged and fixes verified; publish safety notes.
Model Card completed; known limitations and unsafe contexts listed.

E. Release & operations

Controls mapped to NIST AI RMF and ISO/IEC 42001; owners assigned; sign-offs captured in CI.
Incident response runbook for AI harms (with rollback plan and user/regulator comms). AI Incident Database scenarios used for drills.
Post-launch drift/bias/abuse monitoring live; privacy-respecting telemetry only.

10) Culture: ethical engineering is a team sport

Ethical AI is not just a policy deck or a single “responsible AI” hire. It’s a culture of engineering professionalism with guardrails and incentives that make the right thing the easy thing.

Professional duty: The ACM/IEEE codes explicitly require us to avoid harm, respect privacy, and be honest about limitations. Translate those into your team’s code review checklists and promotion criteria.
Standardization over heroics: Adopt ISO/IEC 42001 or an equivalent AI management system so ethics survives team churn and product pivots.
Learning from incidents: Regularly review the AI Incident Database as a team, just like SREs review postmortems. Turn lessons into tests.
Security partnership: Treat prompt injection, data leakage, and model abuse as security bugs, not “weird model quirks.” Sync with AppSec on OWASP LLM Top 10 and MITRE ATLAS coverage.

11) Frequently asked “but how?” — answers for busy builders

Q: We can’t legally collect demographics. How do we test fairness?

A: Use appropriate proxy measurements (geography, product lines, time-based clusters), synthetic tests, and post-hoc audits with privacy-preserving methods. Pair with careful qualitative UX research to surface harm patterns. When possible, work with compliance on limited, consented collection for testing fairness, as regulators increasingly expect evidence of non-discrimination — even when a protected attribute isn’t directly used. The Apple Card investigation shows how explainability and evidence can make or break trust.

Q: We’re shipping an LLM feature. What are the must-have controls?

Prompt input/output sanitization and injection defenses; function-calling allowlists; tool-use sandboxes.
Guardrails for sensitive capabilities (e.g., finance, health) with hard blocks and human-in-the-loop escalation.
Eval suites for jailbreaks, data leakage, toxic content, and factuality within your domain.
Abuse monitoring with rate limiting and anomaly detection; clear user reporting channels. Reference OWASP LLM Top 10 (2024–2025) for specific test cases.

Q: We already follow NIST and ISO security. Do we really need AI-specific stuff?

A: Yes. Traditional security doesn’t cover data poisoning, prompt injection, model inversion, or reward hacking. That’s why MITRE ATLAS and OWASP LLM exist — and why the NIST AI RMF emphasizes AI-unique risks.

12) What to do this quarter: a 90-day plan

Inventory AI systems and features. For each, name a responsible owner; record intended use, users, and high-level harms. Start a risk register (NIST AI RMF).
Add documentation by default. Create a minimal Model Card and Datasheet template. Make “doc updated” a release gate.
Ship a baseline eval pack. Include fairness metrics appropriate to your domain and an adversarial test set (prompt injection, insecure output handling). Fail the build on regression.
Stand up monitoring & IR. Define telemetry, alerts for drift/bias spikes, and an AI incident runbook. Tabletop an AI incident using AIID examples.
Choose a governance spine. Begin aligning to ISO/IEC 42001 (lightweight to start), mapping controls to NIST AI RMF. Schedule a gap assessment.

13) The mindset shift: from “can we build it?” to “should we ship it, here, like this?”

Ethical AI isn’t about saying no to technology. It’s about doing the engineering to make “yes” responsible. The ACM and IEEE codes are clear: the public good is our primary consideration. The EU AI Act and NIST AI RMF operationalize that duty into concrete obligations and practices. Standards like ISO/IEC 42001 and 27701 give us a durable management system that survives reorgs and hype cycles. And the security community’s work (OWASP, MITRE) shows exactly how attackers will try to break what we build — so we can harden it before they do.

When we — the programmers, reviewers, data wranglers, MLOps engineers, and SREs — own these responsibilities, we reduce harm, reduce regulatory and reputational risk, and build the trust that keeps products alive. Ethics isn’t a speed bump; it’s the map that keeps you out of the ditch.

Ship accordingly.

References & resources (selected)

EU AI Act overview and timelines.
NIST AI Risk Management Framework (1.0) and program page.
OECD AI Principles (human-centric values, accountability).
ISO/IEC 42001 (AI management systems) and ISO/IEC 27701 (privacy).
ACM Code of Ethics and ACM/IEEE Software Engineering Code of Ethics.
Model Cards and Datasheets for Datasets frameworks.
OWASP Top 10 for LLM Applications (2024–2025); MITRE ATLAS threat matrix.
AI Incident Database (use for training & drills).

Note on U.S. policy: EO 14110’s rescission in January 2025 changed federal direction, but did not negate the value of NIST AI RMF or sectoral rules. If you operate in the EU (or build products used there), the AI Act’s phased duties still apply on the dates above.

The Responsibility of Human Programmers in the Development of Ethical AI

Why “ethical AI” is a software engineering responsibility

1) Problem framing: begin with accountable scope and stakeholders

2) Data sourcing: respect rights, labor, and provenance

3) Modeling: make fairness, robustness, and safety first-class requirements

4) Evaluation: measure what matters — and publish it

5) Shipping: bake ethics into your SDLC — not a late-stage gate

6) Monitoring & incident response: ethics doesn’t end at launch

7) Case studies: what goes wrong — and what to learn

8) The evolving policy context: what builders need to know

9) A practical responsibility checklist for programmers

A. Discovery & design

B. Data

C. Modeling

D. Evaluation

E. Release & operations

10) Culture: ethical engineering is a team sport

11) Frequently asked “but how?” — answers for busy builders

12) What to do this quarter: a 90-day plan

13) The mindset shift: from “can we build it?” to “should we ship it, here, like this?”

References & resources (selected)

Like this:

Related

Leave a ReplyCancel reply

The Responsibility of Human Programmers in the Development of Ethical AI

Why “ethical AI” is a software engineering responsibility

1) Problem framing: begin with accountable scope and stakeholders

2) Data sourcing: respect rights, labor, and provenance

3) Modeling: make fairness, robustness, and safety first-class requirements

4) Evaluation: measure what matters — and publish it

5) Shipping: bake ethics into your SDLC — not a late-stage gate

6) Monitoring & incident response: ethics doesn’t end at launch

7) Case studies: what goes wrong — and what to learn

8) The evolving policy context: what builders need to know

9) A practical responsibility checklist for programmers

A. Discovery & design

B. Data

C. Modeling

D. Evaluation

E. Release & operations

10) Culture: ethical engineering is a team sport

11) Frequently asked “but how?” — answers for busy builders

12) What to do this quarter: a 90-day plan

13) The mindset shift: from “can we build it?” to “should we ship it, here, like this?”

References & resources (selected)

Share this:

Like this:

Related

Leave a ReplyCancel reply

Discover more from Granite State Report