
The Agent Interface

7 min read · essay, research

On March 1, 2026, I filed an issue against GitHub Agentic Workflows at 4:20 AM UTC. A push failure was being misattributed as a patch failure. I traced it to a specific function, included the file path, line numbers, and a proposed fix, then went to sleep. By 5:31 AM, an agent had picked it up, implemented a fix, opened two PRs, and the maintainer had merged both. 71 minutes. No human wrote a line of implementation code.

In an agentic repo, every issue is an instruction to a machine. That changes what “contribution” means.


A good deep research on your side leads to good issues, then the fix goes in fast.

— Peli de Halleux

I wrote about that in The New OSS. That piece made the claim from 19 personal issues. This one tests it against 435 non-maintainer issues from 158 contributors, with the full methodology open sourced.

The data, scripts, and generated outputs are open source at gh-aw-agentic-contribution-analysis. Five scripts — classify-authors.py, extract-signals.py, link-prs-timeline.py, analyze.py, and branched-analysis.py — reproduce every number in this post.

gh-aw is a GitHub Next project — agentic workflows written in markdown, run as GitHub Actions. Over 350 workflow definitions handle everything from issue triage to code implementation to PR review. I pulled every issue in the repository and classified authors by merge rights: bots, maintainers, and community (everyone else). The useful sample is the community slice — 435 issues from 158 contributors.
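The author-classification step is simple enough to sketch. This is a hypothetical reconstruction of what classify-authors.py might do, not the actual script: the MAINTAINERS set and the field names are assumptions, since the real list would come from repository permissions via the GitHub API.

```python
# Hypothetical sketch of the author-classification step: bucket issue
# authors by merge rights. MAINTAINERS is an assumed, hand-curated set.
MAINTAINERS = {"pelikhan"}  # assumption; real data comes from repo permissions

def classify_author(login: str, is_bot: bool) -> str:
    """Bucket an issue author as bot, maintainer, or community."""
    if is_bot or login.endswith("[bot]"):
        return "bot"
    if login in MAINTAINERS:
        return "maintainer"
    return "community"

# Toy sample in place of the 7,684 real issues.
issues = [
    {"author": "github-actions[bot]", "is_bot": True},
    {"author": "pelikhan", "is_bot": False},
    {"author": "somebody-else", "is_bot": False},
]
counts = {}
for issue in issues:
    bucket = classify_author(issue["author"], issue["is_bot"])
    counts[bucket] = counts.get(bucket, 0) + 1
print(counts)  # {'bot': 1, 'maintainer': 1, 'community': 1}
```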

Issue authorship at repo scale (7,684 total issues on github/gh-aw):

  Bot-authored:  6,648 issues  (86.5%)
  Maintainer:      601 issues   (7.8%)
  Community:       435 issues   (5.7%)

For every community issue, there are about 15 bot-authored ones. Human contributors are writing into a system whose default operating mode is automation.

The same evidence, different lanes

Bugs close at a median of 0.53 days. Enhancements take 2.97 days. But the aggregate hides the real mechanism — it only emerges when you look at what body signals do inside each lane.

Within labeled bugs, error output drops median closure from 2.33 days to 0.47 days. Within labeled enhancements, the same signal compresses 6.95 days to 1.41 days. But in the uncategorized majority, error output moves the median from 0.13 days to 0.16 days — slightly slower, not faster.

That reversal is not noise. It appears across multiple signals:

  Signal          Bug             Enhancement     Uncategorized
  Error output    2.3 → 0.47 d    7.0 → 1.4 d     0.13 → 0.16 d
  Suggested fix   3.8 → 0.42 d    3.5 → 0.82 d    0.12 → 0.17 d
  Proposed code   0.66 → 0.51 d   2.4 → 4.0 d     0.13 → 0.16 d
  Line number     0.91 → 0.21 d   3.5 → 0.82 d    0.15 → 0.26 d

Median days to close, without signal → with signal.

In bugs and enhancements, evidence compresses resolution time by days. In the uncategorized majority, the same evidence marks harder work. If an issue carries detailed error logs but the system’s keyword heuristics couldn’t classify it as a standard bug, the issue is inherently complex or ambiguous — the evidence signals difficulty, not clarity. The easy uncategorized issues have no signals at all and close in minutes.
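The comparison behind that table is a conditional median: within one label lane, split issues by whether a body signal is present and compare medians. A minimal sketch, assuming hypothetical field names (`days_to_close`, `signals`) rather than the real extract-signals.py schema:

```python
from statistics import median

def split_medians(issues, signal):
    """Median days-to-close (without, with) a given body signal."""
    with_sig = [i["days_to_close"] for i in issues if signal in i["signals"]]
    without = [i["days_to_close"] for i in issues if signal not in i["signals"]]
    return (median(without) if without else None,
            median(with_sig) if with_sig else None)

# Toy bug lane standing in for the real data.
bugs = [
    {"days_to_close": 2.33, "signals": set()},
    {"days_to_close": 0.47, "signals": {"error_output"}},
    {"days_to_close": 0.40, "signals": {"error_output"}},
]
base, with_sig = split_medians(bugs, "error_output")
print(base, round(with_sig, 3))  # 2.33 0.435
```

Running the same split per lane is what surfaces the reversal: the sign of the gap flips in the uncategorized lane.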

Three execution contracts

77.2% of community issues have no category label. Uncategorized is not one bucket. Decomposed by signal content, the issues separate into three different execution contracts:

Failure-shaped — median 0.15 days (194 issues · 85% same-day). Error output or reproduction steps. Describes a breakdown, just without the bug label.

Change-shaped — median 0.22 days (61 issues · 77% same-day). Fix or improvement language without error evidence. Proposes a change, without the enhancement label.

Minimal — median 0.11 days (81 issues · 84% same-day). No diagnostic signals. Title-only or one-line reports. Narrow enough that the agent acts on the title alone.

The minimal cluster closes fastest — but not because less is more. A title like “Fix typo in README” contains both the problem and the solution. There is nothing to diagnose. For trivial work, the issue is already executable on arrival.
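The decomposition above can be sketched as a small rule over extracted body signals. This is an illustrative heuristic under assumed signal names, not the logic of branched-analysis.py:

```python
def execution_contract(signals: set[str]) -> str:
    """Bucket an unlabeled issue by its body signals (illustrative heuristic).

    Signal names (error_output, repro_steps, suggested_fix,
    improvement_language) are assumptions for the sketch.
    """
    if signals & {"error_output", "repro_steps"}:
        return "failure-shaped"   # breakdown evidence, no bug label
    if signals & {"suggested_fix", "improvement_language"}:
        return "change-shaped"    # change intent, no enhancement label
    return "minimal"              # nothing to diagnose; title is the spec

print(execution_contract({"error_output", "suggested_fix"}))  # failure-shaped
print(execution_contract({"suggested_fix"}))                  # change-shaped
print(execution_contract(set()))                              # minimal
```

Note the precedence: failure evidence wins over change language, mirroring the observation that error output routes work toward the bug-like lane.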

Compare the change-shaped cluster to labeled enhancements. Both propose changes. Both use fix or improvement language. But change-shaped unlabeled issues close at a median of 0.22 days. Labeled enhancements close at 2.97 days. Controlling for body length and quality score, that gap holds. The labeled enhancement median is 3x to 110x slower across every matched bin.

The triage layer is not neutral.

Three gates

The label is only the first gate. Inside gh-aw, routing has three layers:

  1. Classification. A keyword heuristic labels incoming issues as bug, enhancement, or leaves them uncategorized. This determines what kind of work the system thinks it is looking at.

  2. Eligibility. An internal cookie label marks issues as eligible for automated dispatch. Bot-generated work items carry it by default. Community issues do not appear to receive it automatically.

  3. Priority. A dispatch workflow scores eligible issues by type: bugs +40, enhancements +30, security +45. Higher scores get picked up first.
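The second and third gates compose into a simple queue rule. A sketch using the scoring weights stated above; the function shape and the eligibility flag are assumptions, not gh-aw's actual dispatch code:

```python
# Type weights as reported in the post; everything else is an assumption.
TYPE_SCORES = {"bug": 40, "enhancement": 30, "security": 45}

def dispatch_priority(labels: list[str], eligible: bool) -> int:
    """Score an issue for automated dispatch; -1 means manual-review lane."""
    if not eligible:
        return -1  # gate 2: never enters the automated queue
    return sum(TYPE_SCORES.get(label, 0) for label in labels)  # gate 3

print(dispatch_priority(["bug"], eligible=True))        # 40
print(dispatch_priority(["security"], eligible=True))   # 45
print(dispatch_priority(["bug"], eligible=False))       # -1
```

The third call is the community-issue case: perfect labels, perfect evidence, score irrelevant, because eligibility was never granted.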

Community issues cannot enter the automated queue without a maintainer explicitly promoting them. Your issue can carry perfect diagnostic evidence and still sit in a manual-review lane.

That means the 0.22-day median for change-shaped unlabeled issues and the 2.97-day median for labeled enhancements may not be two labels in the same pipeline. They may be two different pipelines entirely — one automated, one waiting for a human to decide the work is worth dispatching.

What makes a bug report executable

The cleanest place to measure issue quality is inside the labeled bug subset. That trims away category noise and lets the body signals speak directly.

Signals that make bugs legible to agents (% lower median closure time when the signal is present; labeled bugs only, n=42 closed):

  Error output:   79.8%
  Run link:       31%
  Proposed code:  22.8%
  File path:      3.3%

Error output is the standout. Including actual failure text moves the median from 2.33 days to 0.47 days. The combination of error output with suggested-fix language is even stronger: median drops to 0.34 days, an 89.6% improvement.
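The effect sizes in that chart are relative median reductions. The arithmetic, checked against the error-output numbers quoted above:

```python
def relative_effect(without_median: float, with_median: float) -> float:
    """% reduction in median closure time when a signal is present."""
    return (without_median - with_median) / without_median * 100

# Error output in labeled bugs: 2.33 d without, 0.47 d with.
print(round(relative_effect(2.33, 0.47), 1))  # 79.8
```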

When enhancements close like bugs

Enhancements are polarized. Of 19 closed enhancements, 8 resolved within a day and 8 took over a week. There is almost nothing in the middle.

The dividing line is whether the enhancement carries failure evidence at a specific location. An enhancement with both error output and a file path closes at a median of 0.88 days — it routes almost like a bug. Without that combination, the median is 6.95 days.

Proposed code does the opposite. In a bug report, code narrows the fix. In an enhancement, it expands the decision surface: the agent is no longer judging whether a change resolves a failure, but whether the proposed implementation is the right product decision. That pushes the work back toward human judgment.

Who implements the fixes

192 of 379 closed community issues link to at least one PR created before closure. 177 of those PRs are merged.

  88.1%    agent-authored — 156 of the 177 merged linked PRs were written by bots
  98.9%    one merger — @pelikhan merges nearly all of them
  3.3 hrs  issue → merge — median time from filing to merged linked PR

The Copilot SWE agent authored 156 of the 177 merged PRs. One person — @pelikhan, the project lead — merged 175 of them. For bugs, 96% of merged linked PRs are bot-authored; for enhancements, that share drops to 67%. The more judgment the work demands, the less likely an agent authors the fix.
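The issue-to-merge latency behind the 3.3-hour median can be sketched as follows. This is a hypothetical reconstruction of link-prs-timeline.py's core step; the dict fields are assumptions about the extracted data shape:

```python
from datetime import datetime, timedelta

def issue_to_merge_hours(issue, linked_prs):
    """Hours from issue filing to the first merged linked PR.

    Only counts PRs created before the issue closed; returns None
    if no linked PR was merged. Field names are assumptions.
    """
    merged = [pr for pr in linked_prs
              if pr["merged_at"] is not None
              and pr["created_at"] < issue["closed_at"]]
    if not merged:
        return None
    first_merge = min(pr["merged_at"] for pr in merged)
    return (first_merge - issue["created_at"]).total_seconds() / 3600

# The opening anecdote as a toy example: filed 4:20 AM, merged 71 min later.
t0 = datetime(2026, 3, 1, 4, 20)
issue = {"created_at": t0, "closed_at": t0 + timedelta(hours=2)}
pr = {"created_at": t0 + timedelta(minutes=30),
      "merged_at": t0 + timedelta(minutes=71)}
print(round(issue_to_merge_hours(issue, [pr]), 2))  # 1.18
```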

This is the machine your issue is talking to. The agent has to understand your issue well enough to fix it without asking.

What this means

Good evidence only helps once the system has decided to act on it, and the shape of the work determines which kind of evidence helps.

For failure-shaped work, that means evidence of the breakdown: error output, reproduction steps, the specific location. Show the agent what failed. For change-shaped work, that means specificity of intent. Describe the end state, not the implementation.

A GitHub issue is intent written for an agent. The quality of that interface determines whether you get minutes or days.

71 minutes from filing to merged fix. That is what happens when classification, eligibility, and priority all align.

The data and methodology are open source.