Hacker News — vinext + Cloudflare Workers

▲Self-Harness: Harnesses That Improve Themselves (arxiv.org)

80 points by jonnonz 3 days ago | 6 comments

modinfo 6 hours ago [-]

I like this idea. I asked Codex to build a Pi extension after reading this paper.

https://github.com/skorotkiewicz/nano-agent/blob/main/pi_ext...

drdeca 19 hours ago [-]

I was surprised and somewhat disappointed that the article doesn’t appear to evaluate how well the models work when running in the harnesses optimized for the other models. Do they still do better than with the baseline harness? Does each model do worse with a harness optimized (by this process) for another model than it does with the harness optimized for itself?
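
A minimal sketch of the cross-evaluation these questions describe, assuming some scoring routine exists for a (model, harness) pair. Everything here is hypothetical and not taken from the paper: the names `cross_evaluate`, `summarize`, and `BASELINE`, the convention of naming each optimized harness after the model it was tuned for, and the `toy_score` placeholder, whose numbers are made up purely so the code runs.

```python
from typing import Callable, Dict, List

BASELINE = "baseline"  # hypothetical label for the unoptimized harness

Matrix = Dict[str, Dict[str, float]]


def cross_evaluate(models: List[str], score: Callable[[str, str], float]) -> Matrix:
    """Score every model under the baseline and under each model's optimized harness.

    A harness is identified by BASELINE or by the name of the model it was optimized for.
    """
    harnesses = [BASELINE] + models
    return {m: {h: score(m, h) for h in harnesses} for m in models}


def summarize(matrix: Matrix) -> Dict[str, Dict[str, Dict[str, bool]]]:
    """For each model, answer the two questions per foreign harness."""
    summary = {}
    for model, row in matrix.items():
        # Harnesses optimized for some other model.
        foreign = {h: s for h, s in row.items() if h not in (BASELINE, model)}
        summary[model] = {
            # Does a harness tuned for another model still beat the baseline?
            "foreign_beats_baseline": {h: s > row[BASELINE] for h, s in foreign.items()},
            # Does the model's own harness beat the one tuned for another model?
            "own_beats_foreign": {h: row[model] > s for h, s in foreign.items()},
        }
    return summary


def toy_score(model: str, harness: str) -> float:
    """Placeholder scorer with invented numbers; replace with a real benchmark run."""
    if harness == BASELINE:
        return 0.5
    return 0.7 if harness == model else 0.6


if __name__ == "__main__":
    matrix = cross_evaluate(["model_a", "model_b"], toy_score)
    for model, answers in summarize(matrix).items():
        print(model, answers)
```

The off-diagonal cells of the matrix are the numbers being asked for: each one is a model running in a harness that was optimized for a different model.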

monkmartinez 18 hours ago [-]

Not really an article, but yeah, I was hoping they would go into the underlying mechanism a bit deeper. This paper could be confirmation of what localllamaians have been saying for months: keep your harness surface small, and allow the model to use the harness to build _your workflow_.

I have been doing a LOT of work around this with Qwen3.6 and it's been super fun. There are some neat benchmarks that help guide, but nothing beats reading the output... and there is a lot of output to read when trying different quants, etc. Which leads me to...

The other thing I have learned is the "harness" is only as good as the model tuning that goes into it. If your prompt(s) are buggered from the beginning, you are going to have a bad time. The prompt structure and special tokens can be a PITA or really help depending on how much you know.

I don't know how agentic harnesses can work without being optimized for the models running within them. This is the biggest insight into working with agents for me. The first things I have always looked at are the prompts and parameters... everything else is orchestration to me.

clickety_clack 3 hours ago [-]

Where would I find a good write up on where to start with this?

behnamoh 21 hours ago [-]

What else is new? Put it in emacs and let the model improve the harness over time.

7e 20 hours ago [-]

Pretty obvious stuff; see Terminator for the conclusion (SkyNet). Or the Matrix. We really need more work on model alignment, trustworthiness, and control.

