Scientific Deployment & Experiment Design¶
This document describes how to deploy the Agent Registry on a public testnet, run controlled experiments with autonomous agents, collect data, and analyze results for a scientific paper.
Requirements¶
Hardware & Accounts¶
| Requirement | Cost | Where to Get |
|---|---|---|
| Machine with Node.js 18+ and Python 3.10+ | Free (your laptop) | Already have |
| MetaMask or any Ethereum wallet (for deployer key) | Free | metamask.io |
| Base Sepolia testnet ETH (for deployment + relayer) | Free | Base Sepolia faucet |
| A server for relayer + API (Railway, Fly.io, or VPS) | Free tier available | railway.app / fly.io |
| GitHub repo for reproducibility | Free | github.com |
Software Dependencies¶
# Node.js (Hardhat, relayer, API)
node >= 18.0.0
npm >= 9.0.0
# Python (SDK, test harness, data analysis)
python >= 3.10
pip install web3 requests python-dotenv eth-account pandas matplotlib
# Hardhat toolchain
npm install --save-dev hardhat @nomicfoundation/hardhat-toolbox
Testnet Setup¶
Why Base Sepolia?
Base Sepolia is the recommended testnet because:
- Free ETH from faucets (no cost)
- Same EVM as mainnet (identical contract behavior)
- Block times ~2s (fast enough for experiments)
- Block explorer available (Sepolia Basescan) for public verification
- Transactions are publicly recorded and verifiable on-chain -- reviewers can independently check all experimental data for as long as the testnet remains live
Getting testnet ETH:
- Go to the Coinbase Base Sepolia faucet
- Or use QuickNode faucet
- Request 0.1 ETH (more than enough for thousands of test transactions)
- You need two wallets funded:
- Deployer wallet -- deploys contracts (~0.01 ETH)
- Relayer wallet -- pays gas for gasless registrations (~0.05 ETH for hundreds of tests)
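The funding figures above can be sanity-checked with back-of-envelope arithmetic. The gas-per-registration and gas-price figures below are assumptions for illustration, not measured values; actual costs depend on the deployed contracts and the current base fee:

```python
# Back-of-envelope relayer budget check (illustrative figures only).
GAS_PER_REGISTRATION = 150_000   # assumed upper bound for one meta-tx registration
GAS_PRICE_GWEI = 0.1             # assumed typical Base Sepolia gas price

def eth_cost(n_registrations: int) -> float:
    """ETH spent by the relayer for n gasless registrations."""
    return n_registrations * GAS_PER_REGISTRATION * GAS_PRICE_GWEI * 1e-9

# Under these assumptions, 100 registrations cost 0.0015 ETH,
# well inside the 0.05 ETH relayer budget.
budget_ok = eth_cost(100) < 0.05
```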
Research Questions¶
The experiment addresses the following research questions:
| # | Research Question | Measured By |
|---|---|---|
| RQ1 | Can gasless registration achieve a 100% success rate? | Registration success rate, latency, gas costs |
| RQ2 | What are real-world gas costs for each operation? | Gas per operation, relayer costs, infrastructure costs |
| RQ3 | Can the 7-day attestation cycle maintain compliance across 100 agents? | Compliance rate, time-to-detection of lapsed attestation |
| RQ4 | Does multi-generation lineage (depth 3+) work reliably? | Lineage tree correctness, generation depth limits |
| RQ5 | What is the registration throughput limit? | Registration latency, throughput under load |
| RQ6 | Can the KYA model detect non-compliant agents accurately? | KYA query accuracy, false positive/negative rates |
Experiment Scenarios¶
Scenario A: Batch Registration¶
Register 100 agents in rapid succession with varied capabilities and operational scopes.
Design
- Mix of root agents and multi-generation children (up to depth 5)
- Measure: registration latency, gas per registration, throughput
- Vary: gasless vs. direct mode, batch size, concurrent registrations
- Agent categories: WebCrawler, ContentCreator, TradingBot, CustomerService, DevOps, ResearchAgent, DataProcessor, Orchestrator, GenerativeAI, SecurityAudit
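The latency and throughput measurements above can be driven by a small harness. This is a minimal sketch: `register_agent` is a hypothetical stand-in for the real relayer call, and the sleep simulates a network round trip:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def register_agent(agent_id: int) -> float:
    """Stand-in for one registration via the relayer (hypothetical call).
    Returns the observed latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.001)  # placeholder for the network round trip
    return time.perf_counter() - start

def run_batch(n: int, concurrency: int) -> dict:
    """Register n agents with a bounded worker pool; report batch statistics."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(register_agent, range(n)))
    elapsed = time.perf_counter() - start
    return {
        "n": n,
        "mean_latency_s": sum(latencies) / n,
        "throughput_rps": n / elapsed,
    }

stats = run_batch(n=20, concurrency=5)
```

Varying `concurrency` and `n` covers the batch-size and concurrent-registration axes of the design.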
Scenario B: Multi-Generation Lineage¶
Register a lineage tree of root agents, children, grandchildren, and great-grandchildren, reaching generation depth 3.
Design
- Root agents each spawn 1-3 children
- Children spawn grandchildren (generation 2)
- Select chains extend to great-grandchildren (generation 3)
- Measure: lineage tree correctness, generation depth enforcement, parent-child relationship integrity
- Verify: API returns recursive tree structure accurately
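Lineage correctness can be checked off-chain by recomputing each agent's generation from recorded parent-child edges and comparing against the API's tree. A minimal sketch, using hypothetical agent names:

```python
def generation(agent: str, parent_of: dict) -> int:
    """Generation of an agent: 0 for roots, parent's generation + 1 otherwise."""
    depth = 0
    while agent in parent_of:
        agent = parent_of[agent]
        depth += 1
    return depth

# Hypothetical chain: root -> child -> grandchild -> great-grandchild
parent_of = {"child": "root", "grandchild": "child", "ggchild": "grandchild"}

assert generation("root", parent_of) == 0
assert generation("ggchild", parent_of) == 3  # generation depth 3, as in the scenario
```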
Scenario C: Compliance Lifecycle¶
Test the attestation and compliance lifecycle across the agent population.
Design
- Register agents, all attesting initially
- Allow a subset to lapse their attestation
- Query KYA for all agents before and after lapse
- Measure: compliance detection accuracy, time-to-detection, compliance rate over time
- Verify: lapsed agents correctly flagged as non-compliant
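The compliance check itself reduces to a timestamp comparison against the 7-day attestation window. A minimal sketch of the detection logic the harness can apply to attestation records:

```python
from datetime import datetime, timedelta, timezone

ATTESTATION_WINDOW = timedelta(days=7)

def is_compliant(last_attested: datetime, now: datetime) -> bool:
    """An agent is compliant iff its last attestation falls within the window."""
    return now - last_attested <= ATTESTATION_WINDOW

# Hypothetical agents: one attested 2 days ago, one lapsed at 9 days.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
fresh = now - timedelta(days=2)
lapsed = now - timedelta(days=9)
```

Time-to-detection is then the gap between the moment the window elapses and the first KYA query that reports the agent as non-compliant.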
Scenario D: Revenue Reporting¶
Test revenue reporting across varied currencies and categories.
Design
- Agents report revenue in multiple currencies (USDC, EUR, ETH)
- Revenue categories: compute_services, service_fees, data_sales, consulting
- Revenue tiers: zero ($0), low ($1-$50), medium ($51-$500), high ($501-$5,000)
- Measure: data integrity, query performance, aggregate accuracy
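Aggregate accuracy can be verified by recomputing per-currency, per-category totals from the raw reports and comparing them to the API's aggregates. A minimal sketch with hypothetical report data:

```python
from collections import defaultdict

# Hypothetical revenue reports as (currency, category, amount) tuples.
reports = [
    ("USDC", "compute_services", 120.0),
    ("USDC", "service_fees", 40.0),
    ("EUR", "consulting", 800.0),
    ("ETH", "data_sales", 0.0),
]

def aggregate(reports):
    """Sum reported amounts per (currency, category) pair."""
    totals = defaultdict(float)
    for currency, category, amount in reports:
        totals[(currency, category)] += amount
    return dict(totals)

totals = aggregate(reports)
```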
Scenario E: Regulatory Actions¶
Test the full regulatory action lifecycle: suspend, revoke, reactivate.
Design
- Register agents, add regulators
- Suspend subset of agents -- verify status = Suspended, isCompliant = false
- Reactivate subset -- verify status = Active, isCompliant = true
- Revoke subset -- verify permanent state change
- Remove regulators (cleanup)
- Measure: state transition correctness, event completeness, wallet reuse after revocation
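State transition correctness can be checked against an explicit model of the lifecycle. This sketch assumes the transitions implied by the design above (suspend and reactivate are reversible, revocation is terminal); the status names mirror the scenario, not a confirmed contract interface:

```python
# Allowed regulatory transitions under the assumed lifecycle.
TRANSITIONS = {
    ("Active", "suspend"): "Suspended",
    ("Suspended", "reactivate"): "Active",
    ("Active", "revoke"): "Revoked",
    ("Suspended", "revoke"): "Revoked",
}

def apply_action(status: str, action: str) -> str:
    """Return the next status, rejecting illegal transitions."""
    if status == "Revoked":
        raise ValueError("Revoked is permanent")
    try:
        return TRANSITIONS[(status, action)]
    except KeyError:
        raise ValueError(f"illegal transition: {status} -> {action}")
```

Replaying observed events through this model and diffing against on-chain status exposes any divergence.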
Scenario F: KYA Validation¶
Test KYA (Know Your Agent) accuracy with true positives and true negatives.
Design
- Query registered, compliant agents (expected: true positive)
- Query registered but lapsed agents (expected: registered but not compliant)
- Query suspended/revoked agents (expected: not compliant)
- Query unregistered wallets (expected: not registered)
- Measure: accuracy, false positive rate, false negative rate
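The accuracy metrics reduce to a standard confusion matrix over (expected, reported) compliance pairs. A minimal sketch with a hypothetical run:

```python
def kya_metrics(results):
    """results: list of (expected_compliant, reported_compliant) pairs."""
    tp = sum(1 for e, r in results if e and r)          # correctly compliant
    tn = sum(1 for e, r in results if not e and not r)  # correctly flagged
    fp = sum(1 for e, r in results if not e and r)      # lapse missed
    fn = sum(1 for e, r in results if e and not r)      # wrongly flagged
    return {
        "accuracy": (tp + tn) / len(results),
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }

# Hypothetical run: 8 compliant agents, 2 lapsed, one lapse missed by the query.
results = [(True, True)] * 8 + [(False, False)] + [(False, True)]
m = kya_metrics(results)
```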
Data Collection Metrics¶
On-Chain Data (Primary Source)¶
All primary data is recorded on-chain and independently verifiable:
| Data Point | Contract Event | Paper Metric |
|---|---|---|
| Registration events | AgentRegistered event logs | Registration count, rate, gas cost |
| Compliance attestations | ComplianceAttested event logs | Attestation frequency, lapse rate |
| Revenue reports | RevenueReported event logs | Economic activity volume |
| Status changes | AgentSuspended/Revoked/Reactivated events | Regulatory action frequency |
| Capability updates | CapabilityUpdated events | Self-modification frequency |
| Child spawning | ChildSpawned events | Replication rate, lineage depth |
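Archived event data can be summarized directly from the JSONL export. The record schema below (`event`, `gasUsed` fields) is an assumption for illustration; adapt the field names to the actual archive format:

```python
import json

# Archived event records, one JSON object per line (assumed schema).
jsonl = """\
{"event": "AgentRegistered", "gasUsed": 145000}
{"event": "AgentRegistered", "gasUsed": 150212}
{"event": "ComplianceAttested", "gasUsed": 48000}
"""

def summarize(lines: str) -> dict:
    """Per-event counts and average gas, for the paper's cost tables."""
    counts, gas = {}, {}
    for line in lines.splitlines():
        rec = json.loads(line)
        ev = rec["event"]
        counts[ev] = counts.get(ev, 0) + 1
        gas[ev] = gas.get(ev, 0) + rec["gasUsed"]
    return {ev: {"count": counts[ev], "avg_gas": gas[ev] / counts[ev]} for ev in counts}

summary = summarize(jsonl)
```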
Off-Chain Metrics (Supplementary)¶
| Metric | Source | Purpose |
|---|---|---|
| Registration latency | Test harness timestamps | UX and throughput analysis |
| KYA query latency | API response times | Enforcement feasibility |
| Relayer gas consumption | Relayer status endpoint | Cost model validation |
| Cache hit rates | API server logs | Scalability assessment |
| Rate limit triggers | Relayer logs | Security analysis |
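For the latency metrics, the paper should report percentiles rather than only means, since a few slow requests can dominate the average. A minimal sketch of the summary computation over harness-recorded samples (values here are hypothetical):

```python
import statistics

def latency_summary(samples_ms):
    """Median, approximate p95 (nearest-rank), and mean of latency samples."""
    s = sorted(samples_ms)
    return {
        "p50": statistics.median(s),
        "p95": s[max(0, round(0.95 * len(s)) - 1)],
        "mean": statistics.fmean(s),
    }

# One slow outlier (450 ms) pulls the mean well above the median.
samples = [120, 95, 110, 450, 100, 105, 98, 130, 115, 102]
summary = latency_summary(samples)
```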
Deployment Steps¶
Step 1: Clone and Install¶
Step 2: Configure Environment¶
cp .env.example .env
# Edit .env:
# DEPLOYER_KEY=0x<your-deployer-private-key>
# RELAYER_KEY=0x<your-relayer-private-key>
Step 3: Run Tests Locally¶
Tip
All tests should pass before deployment. This validates the contracts.
Step 4: Deploy to Base Sepolia¶
The output will include the deployed contract addresses. Add these to your .env file.
Step 5: Start Services¶
Step 6: Verify Deployment¶
curl https://api.theagentregistry.org/health
curl https://relay.theagentregistry.org/status
curl https://api.theagentregistry.org/api/v1/stats
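Beyond eyeballing the curl output, the harness can validate the response shape programmatically. The field names below are illustrative assumptions, not the actual API contract; 84532 is the Base Sepolia chain ID:

```python
def healthy(resp: dict) -> bool:
    """Check a parsed /health response against the assumed schema."""
    return (
        resp.get("status") == "ok"
        and isinstance(resp.get("chainId"), int)
        and isinstance(resp.get("registryAddress"), str)
        and resp["registryAddress"].startswith("0x")
    )

# Hypothetical healthy response for a Base Sepolia deployment.
example = {"status": "ok", "chainId": 84532, "registryAddress": "0x" + "0" * 40}
```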
Reproducibility Checklist¶
For the paper to be scientifically credible, ensure:
- All contract source code is open-source (MIT license)
- Deployed contract addresses on Base Sepolia are documented
- Contracts are verified on Basescan (source code publicly readable)
- All experiment scripts are in the repository
- Raw experiment data (JSONL files) is archived and linked
- Analysis scripts reproduce all tables and figures
- Environment setup instructions allow replication from scratch
- Transaction hashes for all experiments are recorded
- Gas costs are documented at the time of experiments
- The testnet deployment remains live for reviewer verification
Cost Summary¶
| Item | Cost |
|---|---|
| Base Sepolia testnet ETH | Free (faucet) |
| Contract deployment gas | Free (testnet) |
| 100 agent registrations | Free (testnet) |
| Server for relayer + API (Railway free tier) | $0 |
| Domain for API endpoint (optional) | ~$10/year |
| Total cost for scientific experiment | $0 - $10 |
Zero-cost science
The entire scientific experiment can be run for effectively zero cost on testnet. Mainnet deployment (for production) would cost approximately $5-10 for contract deployment and ~$50/month for relayer operation at the 1,000-agent scale.
Timeline¶
| Week | Activity |
|---|---|
| 1 | Fix bugs, expand tests, deploy to testnet |
| 2 | Run Scenarios A, B, C (registration, lineage, compliance) |
| 3 | Run Scenarios D, E, F (revenue, regulatory, KYA) |
| 4 | Data analysis, chart generation |
| 5-6 | Paper writing (system design, evaluation, discussion) |
| 7 | Internal review, revisions |
| 8 | Submission |
Total: approximately 8 weeks from deployment to submission.