Recap on ML

Notes on AI agents, ML, and dev tooling — by qte77

Building a Trustworthy Agent Loop for a Physical Lab

June 12, 2026

We build autonomous, self-evaluating agents — and we're building the hardest version of that idea, an open sub-$1k self-driving lab. This is the thesis, the plan, and the one piece that already ships. The AI scientist's missing half is the agent, not the model.

A $150 Pipetting Robot from a Stock 3D Printer

June 1, 2026

Turn a used Anycubic i3 Mega + a DLAB dPette+ electronic pipette into a 96-well disposable-tip pipetting robot. Marlin runs unmodified. Python drives. Apache-2.0.

GraphJudge — Measuring How Agents Collaborate

January 15, 2026

GraphJudge is an A2A-compliant multi-agent evaluation framework for the AgentBeats competition. It captures interaction traces as directed graphs and scores coordination quality through structural graph metrics, LLM-as-judge qualitative assessment, and reproducibility checks based on text similarity.

AI Agents-eval Papers Meta Review

August 9, 2025

A meta-review of 50+ agentic AI evaluation papers identifying key dimensions — autonomy, multi-agent coordination, safety, and explainability — and critical gaps in standardization and long-term behavioral assessment.

AI Agents-eval Enhancement Recommendations

August 9, 2025

A prioritized roadmap of twelve proposed enhancements for the Agents-eval project, spanning multi-dimensional evaluation architecture, safety-first modules, predictive evaluation, and AgentOps integration.

AI Agents-eval Comprehensive Analysis

August 9, 2025

Per-paper summaries of 50+ agent evaluation research papers from 2023 to 2025, covering each work's evaluation approach, focus area, and relevance to the Agents-eval project, with a concrete integration example for each.