Scientific Deployment & Experiment Design

This document describes how to deploy the Agent Registry on a public testnet, run controlled experiments with autonomous agents, collect data, and analyze results for a scientific paper.


Requirements

Hardware & Accounts

| Requirement | Cost | Where to Get |
|---|---|---|
| Machine with Node.js 18+ and Python 3.10+ | Free (your laptop) | Already have |
| MetaMask or any Ethereum wallet (for deployer key) | Free | metamask.io |
| Base Sepolia testnet ETH (for deployment + relayer) | Free | Base Sepolia faucet |
| A server for relayer + API (Railway, Fly.io, or VPS) | Free tier available | railway.app / fly.io |
| GitHub repo for reproducibility | Free | github.com |

Software Dependencies

# Node.js (Hardhat, relayer, API)
node >= 18.0.0
npm >= 9.0.0

# Python (SDK, test harness, data analysis)
python >= 3.10
pip install web3 requests python-dotenv eth-account pandas matplotlib

# Hardhat toolchain
npm install --save-dev hardhat @nomicfoundation/hardhat-toolbox

Testnet Setup

Why Base Sepolia?

Base Sepolia is the recommended testnet because:

  • Free ETH from faucets (no cost)
  • Same EVM as mainnet (identical contract behavior)
  • Block times ~2s (fast enough for experiments)
  • Block explorer available (Sepolia Basescan) for public verification
  • Transactions are permanent and publicly verifiable -- reviewers can independently verify all experimental data

Getting testnet ETH:

  1. Go to the Coinbase Base Sepolia faucet, or use the QuickNode faucet
  2. Request 0.1 ETH (more than enough for thousands of test transactions)
  3. Fund two wallets:
    • Deployer wallet -- deploys contracts (~0.01 ETH)
    • Relayer wallet -- pays gas for gasless registrations (~0.05 ETH for hundreds of tests)
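Before deploying, it can help to confirm that both wallets actually hold enough testnet ETH. The following is a minimal sketch using only the Python standard library and the standard `eth_getBalance` JSON-RPC method. The RPC URL and the helper names are assumptions, not part of the project; the thresholds are the estimates listed above.

```python
import json
import urllib.request
from decimal import Decimal

# Assumption: a public Base Sepolia RPC endpoint; substitute your own provider URL.
RPC_URL = "https://sepolia.base.org"

# Minimum balances in ETH, taken from the estimates above.
REQUIRED_ETH = {"deployer": Decimal("0.01"), "relayer": Decimal("0.05")}

WEI_PER_ETH = Decimal(10) ** 18


def wei_hex_to_eth(balance_hex: str) -> Decimal:
    """Convert a hex-encoded wei balance (as returned by eth_getBalance) to ETH."""
    return Decimal(int(balance_hex, 16)) / WEI_PER_ETH


def get_balance_eth(address: str, rpc_url: str = RPC_URL) -> Decimal:
    """Query the latest balance of `address` via the eth_getBalance JSON-RPC call."""
    payload = json.dumps({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getBalance",
        "params": [address, "latest"],
    }).encode()
    request = urllib.request.Request(
        rpc_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return wei_hex_to_eth(json.load(response)["result"])


def is_funded(role: str, balance_eth: Decimal) -> bool:
    """True if the wallet for `role` meets the minimum balance listed above."""
    return balance_eth >= REQUIRED_ETH[role]
```

A typical use would be `is_funded("relayer", get_balance_eth(relayer_address))`, where `relayer_address` is the public address of your relayer wallet.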

Research Questions

The experiment addresses the following research questions:

| # | Research Question | Measured By |
|---|---|---|
| RQ1 | Can gasless registration achieve a 100% success rate? | Registration success rate, latency, gas costs |
| RQ2 | What are real-world gas costs for each operation? | Gas per operation, relayer costs, infrastructure costs |
| RQ3 | Can the 7-day attestation cycle maintain compliance across 100 agents? | Compliance rate, time-to-detection of lapsed attestation |
| RQ4 | Does multi-generation lineage (depth 3+) work reliably? | Lineage tree correctness, generation depth limits |
| RQ5 | What is the registration throughput limit? | Registration latency, throughput under load |
| RQ6 | Can the KYA model detect non-compliant agents accurately? | KYA query accuracy, false positive/negative rates |

Experiment Scenarios

Scenario A: Batch Registration

Register 100 agents in rapid succession with varied capabilities and operational scopes.

Design

  • Mix of root agents and multi-generation children (up to depth 5)
  • Measure: registration latency, gas per registration, throughput
  • Vary: gasless vs. direct mode, batch size, concurrent registrations
  • Agent categories: WebCrawler, ContentCreator, TradingBot, CustomerService, DevOps, ResearchAgent, DataProcessor, Orchestrator, GenerativeAI, SecurityAudit
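The design above could be turned into a reproducible registration plan along the following lines. This is a sketch under stated assumptions: the `PlannedAgent` structure, the 40% child ratio, the alternation between gasless and direct mode, and the p95 summary are illustrative choices, not the project's actual test harness.

```python
import math
import random
import statistics
from dataclasses import dataclass
from typing import Dict, List, Optional

CATEGORIES = [
    "WebCrawler", "ContentCreator", "TradingBot", "CustomerService", "DevOps",
    "ResearchAgent", "DataProcessor", "Orchestrator", "GenerativeAI", "SecurityAudit",
]


@dataclass
class PlannedAgent:
    agent_id: int
    category: str
    mode: str                  # "gasless" or "direct"
    parent_id: Optional[int]   # None for root agents
    generation: int            # 0 for root agents


def build_plan(n_agents: int = 100, max_depth: int = 5,
               child_ratio: float = 0.4, seed: int = 42) -> List[PlannedAgent]:
    """Build a seeded plan mixing root agents and children up to `max_depth`."""
    rng = random.Random(seed)  # fixed seed so the plan can be reproduced
    plan: List[PlannedAgent] = []
    for agent_id in range(n_agents):
        category = CATEGORIES[agent_id % len(CATEGORIES)]
        # Alternate the mode per block of ten so every category sees both modes.
        mode = "gasless" if (agent_id // len(CATEGORIES)) % 2 == 0 else "direct"
        eligible = [a for a in plan if a.generation < max_depth]
        if eligible and rng.random() < child_ratio:
            parent = rng.choice(eligible)
            plan.append(PlannedAgent(agent_id, category, mode,
                                     parent.agent_id, parent.generation + 1))
        else:
            plan.append(PlannedAgent(agent_id, category, mode, None, 0))
    return plan


def summarize_latencies(latencies_s: List[float], wall_clock_s: float) -> Dict[str, float]:
    """Median and nearest-rank p95 latency, plus throughput over the whole run."""
    ordered = sorted(latencies_s)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
        "throughput_per_s": len(ordered) / wall_clock_s,
    }
```

Fixing the seed means the same agent population can be regenerated when the experiment is replicated.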

Scenario B: Multi-Generation Lineage

Register a lineage tree of root agents, children, and grandchildren, with selected chains extending to great-grandchildren (generation 3).

Design

  • Root agents each spawn 1-3 children
  • Children spawn grandchildren (generation 2)
  • Select chains extend to great-grandchildren (generation 3)
  • Measure: lineage tree correctness, generation depth enforcement, parent-child relationship integrity
  • Verify: API returns recursive tree structure accurately
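The lineage checks can be expressed against a simple child-to-parent map, which gives an independent reference to compare the API's tree against. The sketch below is illustrative only: the map representation, the function names, and the `{"id", "generation", "children"}` tree shape are assumptions, since the API's actual response schema is not specified here.

```python
from typing import Dict, List, Optional

ParentMap = Dict[str, Optional[str]]  # agent id -> parent id (None for roots)


def generation_of(agent_id: str, parents: ParentMap) -> int:
    """Count hops from `agent_id` up to its root (roots are generation 0)."""
    depth = 0
    seen = {agent_id}
    current = parents[agent_id]
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected in lineage at {current!r}")
        seen.add(current)
        depth += 1
        current = parents[current]
    return depth


def build_tree(root: str, parents: ParentMap) -> dict:
    """Build the expected recursive tree for comparison with the API output."""
    children = sorted(c for c, p in parents.items() if p == root)
    return {
        "id": root,
        "generation": generation_of(root, parents),
        "children": [build_tree(c, parents) for c in children],
    }


def agents_beyond_limit(parents: ParentMap, max_generation: int) -> List[str]:
    """Agents whose generation exceeds the limit; expected to be empty."""
    return sorted(a for a in parents if generation_of(a, parents) > max_generation)
```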

Scenario C: Compliance Lifecycle

Test the attestation and compliance lifecycle across the agent population.

Design

  • Register agents, all attesting initially
  • Allow a subset to lapse their attestation
  • Query KYA for all agents before and after lapse
  • Measure: compliance detection accuracy, time-to-detection, compliance rate over time
  • Verify: lapsed agents correctly flagged as non-compliant
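The expected compliance status for each agent can be computed off-chain and compared with what KYA reports. The sketch below assumes the 7-day attestation cycle from RQ3 and treats the boundary as inclusive (an attestation exactly 7 days old still counts); the contract's exact comparison may differ, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

ATTESTATION_WINDOW = timedelta(days=7)  # 7-day cycle, as stated in RQ3


def is_attestation_current(last_attested: datetime, now: datetime) -> bool:
    """Assumption: an attestation is valid up to and including 7 days of age."""
    return now - last_attested <= ATTESTATION_WINDOW


def lapse_time(last_attested: datetime) -> datetime:
    """Moment at which the attestation stops being current."""
    return last_attested + ATTESTATION_WINDOW


def time_to_detection(last_attested: datetime, first_flagged: datetime) -> timedelta:
    """Delay between the lapse and the first KYA query that flagged the agent."""
    return first_flagged - lapse_time(last_attested)


def compliance_rate(last_attestations: Dict[str, datetime], now: datetime) -> float:
    """Fraction of agents whose attestation is current at `now`."""
    current = sum(
        1 for ts in last_attestations.values() if is_attestation_current(ts, now)
    )
    return current / len(last_attestations)
```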

Scenario D: Revenue Reporting

Test revenue reporting across varied currencies and categories.

Design

  • Agents report revenue in multiple currencies (USDC, EUR, ETH)
  • Revenue categories: compute_services, service_fees, data_sales, consulting
  • Revenue tiers: zero ($0), low ($1-$50), medium ($50-$500), high ($500-$5,000)
  • Measure: data integrity, query performance, aggregate accuracy
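The tier boundaries above overlap at $50 and $500, so any classifier has to pick a convention. The sketch below assumes inclusive upper bounds, treats any positive amount up to $50 as low, and reports amounts above $5,000 as out of range; these conventions and the report dictionary shape are assumptions, not the project's definitions. `Decimal` is used to avoid floating-point drift when checking aggregate accuracy.

```python
from collections import defaultdict
from decimal import Decimal
from typing import Dict, Iterable


def revenue_tier(amount_usd: Decimal) -> str:
    """Classify a USD amount into a tier (assumed inclusive upper bounds)."""
    if amount_usd < 0:
        raise ValueError("revenue cannot be negative")
    if amount_usd == 0:
        return "zero"
    if amount_usd <= 50:
        return "low"
    if amount_usd <= 500:
        return "medium"
    if amount_usd <= 5000:
        return "high"
    return "out_of_range"


def aggregate_by_currency(reports: Iterable[dict]) -> Dict[str, Decimal]:
    """Sum reported amounts per currency; amounts are passed as strings."""
    totals: Dict[str, Decimal] = defaultdict(Decimal)
    for report in reports:
        totals[report["currency"]] += Decimal(report["amount"])
    return dict(totals)
```

The per-currency totals computed this way can be compared against the aggregates returned by the API to check aggregate accuracy.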

Scenario E: Regulatory Actions

Test the full regulatory action lifecycle: suspend, revoke, reactivate.

Design

  • Register agents, add regulators
  • Suspend subset of agents -- verify status = Suspended, isCompliant = false
  • Reactivate subset -- verify status = Active, isCompliant = true
  • Revoke subset -- verify permanent state change
  • Remove regulators (cleanup)
  • Measure: state transition correctness, event completeness, wallet reuse after revocation
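State transition correctness can be checked against a small reference model. The transition table below is an assumption inferred from the design above (suspension is reversible, revocation is permanent); in particular, whether an Active agent can be revoked directly is not stated in this document and is assumed here. The action names are illustrative, not contract function names.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "Active"
    SUSPENDED = "Suspended"
    REVOKED = "Revoked"


# Assumed transition table: (current status, action) -> next status.
TRANSITIONS = {
    (Status.ACTIVE, "suspend"): Status.SUSPENDED,
    (Status.SUSPENDED, "reactivate"): Status.ACTIVE,
    (Status.ACTIVE, "revoke"): Status.REVOKED,
    (Status.SUSPENDED, "revoke"): Status.REVOKED,
}


def apply_action(status: Status, action: str) -> Status:
    """Return the next status, or raise if the transition is not allowed."""
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"{action!r} is not allowed from {status.value}") from None


def expected_is_compliant(status: Status, attestation_current: bool) -> bool:
    """Only Active agents with a current attestation are expected to be compliant."""
    return status is Status.ACTIVE and attestation_current
```

Because no entry in the table leaves `Status.REVOKED`, the model encodes revocation as a permanent state change.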

Scenario F: KYA Validation

Test KYA (Know Your Agent) accuracy with true positives and true negatives.

Design

  • Query registered, compliant agents (expected: true positive)
  • Query registered but lapsed agents (expected: registered but not compliant)
  • Query suspended/revoked agents (expected: not compliant)
  • Query unregistered wallets (expected: not registered)
  • Measure: accuracy, false positive rate, false negative rate
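The accuracy metrics follow from a standard confusion matrix once each query has an expected and an observed result. In the sketch below, "positive" is taken to mean "reported compliant", so a false positive is a non-compliant agent that KYA reports as compliant; that labeling and the function name are assumptions for illustration.

```python
from typing import Dict, Optional, Sequence


def kya_metrics(expected: Sequence[bool],
                observed: Sequence[bool]) -> Dict[str, Optional[float]]:
    """Accuracy, false positive rate, and false negative rate for KYA results.

    expected[i] is the ground-truth compliance of agent i;
    observed[i] is what the KYA query returned for that agent.
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")

    pairs = list(zip(expected, observed))
    tp = sum(1 for e, o in pairs if e and o)
    tn = sum(1 for e, o in pairs if not e and not o)
    fp = sum(1 for e, o in pairs if not e and o)
    fn = sum(1 for e, o in pairs if e and not o)

    def ratio(numerator: int, denominator: int) -> Optional[float]:
        # None signals "undefined" when a class is absent from the sample.
        return numerator / denominator if denominator else None

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }
```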

Data Collection Metrics

On-Chain Data (Primary Source)

All data is permanently stored on-chain and independently verifiable:

| Data Point | Contract Events | Paper Metric |
|---|---|---|
| Registration events | AgentRegistered event logs | Registration count, rate, gas cost |
| Compliance attestations | ComplianceAttested event logs | Attestation frequency, lapse rate |
| Revenue reports | RevenueReported event logs | Economic activity volume |
| Status changes | AgentSuspended/Revoked/Reactivated events | Regulatory action frequency |
| Capability updates | CapabilityUpdated events | Self-modification frequency |
| Child spawning | ChildSpawned events | Replication rate, lineage depth |

Off-Chain Metrics (Supplementary)

| Metric | Source | Purpose |
|---|---|---|
| Registration latency | Test harness timestamps | UX and throughput analysis |
| KYA query latency | API response times | Enforcement feasibility |
| Relayer gas consumption | Relayer status endpoint | Cost model validation |
| Cache hit rates | API server logs | Scalability assessment |
| Rate limit triggers | Relayer logs | Security analysis |
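Off-chain measurements such as latency can be captured as one JSON object per line (JSONL), which keeps the raw data easy to archive and re-analyze. The helpers below are a hypothetical sketch, not the project's test harness; the record fields (`operation`, `latency_s`, `timestamp`) are assumptions.

```python
import json
import time
from typing import Any, Callable, Dict, List, Tuple


def timed_call(operation: str, func: Callable[..., Any],
               *args: Any, **kwargs: Any) -> Tuple[Any, Dict[str, Any]]:
    """Run `func` and return its result plus a latency record for the call."""
    started = time.perf_counter()
    result = func(*args, **kwargs)
    latency_s = time.perf_counter() - started
    record = {
        "operation": operation,
        "latency_s": latency_s,
        "timestamp": time.time(),  # wall-clock time when the call finished
    }
    return result, record


def to_jsonl_line(record: Dict[str, Any]) -> str:
    """Serialize one record as a single newline-terminated JSON line."""
    return json.dumps(record, sort_keys=True) + "\n"


def append_record(path: str, record: Dict[str, Any]) -> None:
    """Append one record to a JSONL file, creating the file if needed."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(to_jsonl_line(record))


def parse_jsonl(text: str) -> List[Dict[str, Any]]:
    """Parse JSONL text back into a list of records, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]
```

Appending rather than rewriting the file means a crashed run still leaves every completed measurement on disk.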

Deployment Steps

Step 1: Clone and Install

git clone <repository-url>
cd agentenregister
npm install

Step 2: Configure Environment

cp .env.example .env
# Edit .env:
# DEPLOYER_KEY=0x<your-deployer-private-key>
# RELAYER_KEY=0x<your-relayer-private-key>
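A quick sanity check of the `.env` file can catch a missing or malformed key before any deployment is attempted. The sketch below parses the file by hand with the standard library so it stays self-contained (the `python-dotenv` package listed in the dependencies could be used instead). It assumes keys are written as `0x` followed by 64 hex characters, and the function names are hypothetical.

```python
import re
from typing import Dict, List

REQUIRED_KEYS = ("DEPLOYER_KEY", "RELAYER_KEY")
# Assumption: a private key is "0x" followed by exactly 64 hex characters.
PRIVATE_KEY_RE = re.compile(r"^0x[0-9a-fA-F]{64}$")


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    env: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip()
    return env


def validate_env(env: Dict[str, str]) -> List[str]:
    """Return a list of problems; key values are never included in messages."""
    problems: List[str] = []
    for name in REQUIRED_KEYS:
        if name not in env:
            problems.append(f"{name} is missing")
        elif not PRIVATE_KEY_RE.match(env[name]):
            problems.append(f"{name} is malformed")
    values = [env[name] for name in REQUIRED_KEYS if name in env]
    if len(values) == 2 and values[0] == values[1]:
        problems.append("DEPLOYER_KEY and RELAYER_KEY should be different wallets")
    return problems
```

For example, `validate_env(parse_env(open(".env").read()))` returns an empty list when both keys are present and well-formed.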

Step 3: Run Tests Locally

npx hardhat test

Tip

All tests should pass before deployment. This validates the contracts.

Step 4: Deploy to Base Sepolia

npx hardhat run scripts/deploy.js --network baseSepolia

Output will include:

REGISTRY_ADDRESS=0x...
FORWARDER_ADDRESS=0x...

Add these to your .env file.
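Copying addresses by hand is error-prone, so the two lines can also be extracted from the captured deploy output and validated. This sketch assumes the output contains the two `NAME=0x...` lines exactly as shown above; anything else the deploy script prints is unknown and ignored, and the function name is hypothetical.

```python
import re
from typing import Dict

# Matches the two address lines shown above: a name, "=", and a 20-byte hex address.
ADDRESS_LINE_RE = re.compile(
    r"^(REGISTRY_ADDRESS|FORWARDER_ADDRESS)=(0x[0-9a-fA-F]{40})\s*$",
    re.MULTILINE,
)


def extract_addresses(deploy_output: str) -> Dict[str, str]:
    """Pull both contract addresses out of the deploy script's output."""
    found = dict(ADDRESS_LINE_RE.findall(deploy_output))
    missing = {"REGISTRY_ADDRESS", "FORWARDER_ADDRESS"} - found.keys()
    if missing:
        raise ValueError(f"missing in deploy output: {', '.join(sorted(missing))}")
    return found
```

The returned dictionary can then be written to `.env` as `NAME=value` lines and recorded alongside the experiment data.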

Step 5: Start Services

# Terminal 1 (Relayer)
node relayer/relayer.js

# Terminal 2 (API)
node api/server.js

Step 6: Verify Deployment

curl https://api.theagentregistry.org/health
curl https://relay.theagentregistry.org/status
curl https://api.theagentregistry.org/api/v1/stats
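The same checks can be scripted so they run before every experiment. The sketch below only tests for an HTTP 200 response from the three URLs above, because the response bodies are not specified here; the function names and the dictionary keys are assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict

ENDPOINTS = {
    "api_health": "https://api.theagentregistry.org/health",
    "relayer_status": "https://relay.theagentregistry.org/status",
    "api_stats": "https://api.theagentregistry.org/api/v1/stats",
}


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code   # server answered, but with an error status
    except OSError:
        return 0            # DNS failure, refused connection, timeout, ...


def check_all(fetch_status: Callable[[str], int] = http_status) -> Dict[str, bool]:
    """Map each endpoint name to True if it answered with HTTP 200."""
    return {name: fetch_status(url) == 200 for name, url in ENDPOINTS.items()}
```

Passing a different `fetch_status` function makes it possible to exercise the check without network access.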

Reproducibility Checklist

For the paper to be scientifically credible, ensure:

  • All contract source code is open-source (MIT license)
  • Deployed contract addresses on Base Sepolia are documented
  • Contracts are verified on Basescan (source code publicly readable)
  • All experiment scripts are in the repository
  • Raw experiment data (JSONL files) is archived and linked
  • Analysis scripts reproduce all tables and figures
  • Environment setup instructions allow replication from scratch
  • Transaction hashes for all experiments are recorded
  • Gas costs are documented at the time of experiments
  • The testnet deployment remains live for reviewer verification

Cost Summary

| Item | Cost |
|---|---|
| Base Sepolia testnet ETH | Free (faucet) |
| Contract deployment gas | Free (testnet) |
| 100 agent registrations | Free (testnet) |
| Server for relayer + API (Railway free tier) | $0 |
| Domain for API endpoint (optional) | ~$10/year |
| Total cost for scientific experiment | $0 - $10 |

Zero-cost science

The entire scientific experiment can be run for effectively zero cost on testnet. Mainnet deployment (for production) would cost approximately $5-10 for contract deployment and ~$50/month for relayer operation at the 1,000-agent scale.
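To translate measured testnet gas into an estimated mainnet cost, the usual conversion is gas used × gas price × ETH price. The sketch below shows that arithmetic; all inputs are supplied by the caller, no real prices are assumed, and the function name is hypothetical. On an L2 such as Base, a transaction may also carry an L1 data fee, which is modeled here as an optional extra term.

```python
from decimal import Decimal

GWEI_PER_ETH = Decimal(10) ** 9


def cost_usd(gas_used: int, gas_price_gwei: Decimal, eth_usd: Decimal,
             l1_fee_eth: Decimal = Decimal(0)) -> Decimal:
    """Estimate the USD cost of a transaction.

    gas_used        -- gas consumed, taken from the transaction receipt
    gas_price_gwei  -- effective gas price in gwei at the time of the run
    eth_usd         -- ETH/USD price used for the conversion
    l1_fee_eth      -- optional L1 data fee in ETH (assumed 0 if unknown)
    """
    execution_fee_eth = Decimal(gas_used) * gas_price_gwei / GWEI_PER_ETH
    return (execution_fee_eth + l1_fee_eth) * eth_usd
```

For instance, with purely hypothetical inputs of 1,000,000 gas at 1 gwei and an ETH price of $2,000, the execution fee is 0.001 ETH, or $2.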


Timeline

| Week | Activity |
|---|---|
| 1 | Fix bugs, expand tests, deploy to testnet |
| 2 | Run Scenarios A, B, C (registration, lineage, compliance) |
| 3 | Run Scenarios D, E, F (revenue, regulatory, KYA) |
| 4 | Data analysis, chart generation |
| 5-6 | Paper writing (system design, evaluation, discussion) |
| 7 | Internal review, revisions |
| 8 | Submission |

Total: approximately 8 weeks from deployment to submission.