Introduction & Quick Start

Agent-Safeguard is a lightweight, enterprise-grade runtime sandboxing and definition-time guardrail framework for Python applications, built to safely run code generated or modified by AI agents.

What makes it special?
Unlike sandboxes that fail with a generic traceback, Agent-Safeguard writes a structured JSON violation report for each blocked action. LLM agents can read these reports and correct the offending code themselves.

Core Concepts

The library operates on three levels of protection:

  • Definition-time analysis (AST Checks): Scans the function source code before it executes to block dangerous statements, check cyclomatic complexity, verify return types, or ensure the code hasn't been edited.
  • Runtime sandboxes (Monkeypatching): Intercepts core system-level actions (network connections, filesystem modifications, subprocesses, environment variables, database modifications) to enforce strict limits.
  • AI directives: Inserts developer guidelines directly into docstrings or uses LLM-powered semantic constraints to verify function output.

Quick Start Example

1. Create a central rule config named shield.yaml in your project root:

rules:
  - pattern: "sandbox_code.*"
    timeout: 0.5
    virtual_fs: true
    restrict_network: ["api.stripe.com"]

2. Run your functions normally. Agent-Safeguard applies the matching sandbox rules automatically:

# sandbox_code.py
import urllib.request

def process_data():
    # This host is not in restrict_network, so the call is blocked and a structured violation report is written
    response = urllib.request.urlopen("https://unauthorized-api.com")
    return response.read()

Installation & Import

Install the library directly from PyPI using pip:

pip install agent-safeguard

Importing in Python

The package is installed as agent-safeguard but imported as agent_shield. Python module names cannot contain hyphens, and the import name also differs from the distribution name:

from agent_shield import (
    shield,
    virtual_fs,
    guard_prompt,
    restrict_db,
    restrict_env
)

Import Order Warning
Agent-Safeguard uses global monkeypatching. Make sure import agent_shield runs as early as possible in your application entry point so that it can intercept third-party modules correctly.

Configuration & Reports

In enterprise configurations, you do not want AI agents deleting or editing Python decorators from source files to bypass constraints. Agent-Safeguard solves this via a central configuration file.

Central Configuration (shield.yaml)

By creating a shield.yaml file at your project's root, guardrails are automatically injected into module functions at import time:

rules:
  - pattern: "my_app.payments.*"
    timeout: 5.0
    restrict_network: ["api.stripe.com"]
    restrict_db: true
    virtual_fs: true

  - pattern: "my_app.analytics.*"
    allowed_imports: ["json", "math"]
    max_complexity: 10

Diagnostic JSON Reports

Whenever a boundary rule is violated, Agent-Safeguard writes a structured JSON report to shield_reports/violation_report.json. This allows LLM agents to read the report and rewrite code automatically to comply with policies:

{
  "violation_type": "network_violation",
  "function_name": "charge_customer",
  "file_path": "/path/to/my_app/payments.py",
  "details": {
    "attempted_host": "unauthorized-api.com",
    "allowed_hosts": ["api.stripe.com"]
  },
  "instruction": "AI Assistant Instruction: The function 'charge_customer' in file '/path/to/my_app/payments.py' attempted to establish an unauthorized network connection to 'unauthorized-api.com'. Connections are restricted to: api.stripe.com. Please remove this network call or connect to an allowed host."
}
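
On the consuming side, an agent harness only needs a JSON parser. The sketch below reads a report with the fields shown in the example and extracts what an agent would act on. The sample text shortens the instruction field, and load_violation is a hypothetical helper name.

```python
import json

# Same fields as the example report; the instruction text is shortened here.
SAMPLE_REPORT = """
{
  "violation_type": "network_violation",
  "function_name": "charge_customer",
  "file_path": "/path/to/my_app/payments.py",
  "details": {
    "attempted_host": "unauthorized-api.com",
    "allowed_hosts": ["api.stripe.com"]
  },
  "instruction": "AI Assistant Instruction: ..."
}
"""


def load_violation(report_text):
    """Parse a report and pull out the fields an agent would act on."""
    report = json.loads(report_text)
    return {
        "kind": report["violation_type"],
        "where": f"{report['function_name']} in {report['file_path']}",
        "details": report.get("details", {}),
        "instruction": report.get("instruction", ""),
    }
```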

Audit Mode (Passive Mode)

For testing or integration phases, you can run Agent-Safeguard in passive mode. Set the environment variable AGENT_SHIELD_PASSIVE=true. In this mode, the library generates violation reports and prints warnings, but does not raise errors or block execution.
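
The warn-instead-of-raise behaviour can be pictured with a small dispatcher. This is only a sketch: it assumes the exact value "true" enables passive mode (other spellings may or may not be accepted), and PermissionError stands in for the library's own exception types.

```python
import os
import warnings


def handle_violation(message, environ=os.environ):
    """Warn and continue in passive mode; raise otherwise.

    Hypothetical helper. Returns False when the violation was only reported.
    """
    passive = environ.get("AGENT_SHIELD_PASSIVE", "").strip().lower() == "true"
    if passive:
        warnings.warn(message, RuntimeWarning, stacklevel=2)
        return False
    raise PermissionError(message)
```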

@shield (AST Rules)

The @shield decorator enforces structural and architectural guidelines on a function at definition time by statically inspecting its AST (Abstract Syntax Tree).

Usage

@shield(
    allowed_imports=["math", "json"],
    allow_unsafe=False,
    allow_globals=False,
    max_complexity=12
)
def calculate_metrics(data):
    import json  # Allowed
    import os    # Will raise ShieldViolationError immediately on startup!
    return json.dumps({"result": data})

Parameters

| Parameter | Type | Description |
|---|---|---|
| allowed_imports | list[str] | Whitelist of modules permitted to be imported inside the function scope. |
| forbidden_imports | list[str] | Blacklist of modules prohibited from being imported inside the function scope. |
| allow_unsafe | bool | If False (default), blocks calls to eval() and exec(). |
| allow_globals | bool | If False (default), blocks usage of the global keyword. |
| max_complexity | int | Upper limit for cyclomatic complexity. Raises an error if the code contains too many branches (if/for/while). |

Hardcoded Secrets Scanner
@shield automatically scans local constants and variable assignments for hardcoded credentials. It blocks assignments containing AWS access keys, Google API keys, OpenAI keys, or variables named secret or api_key.
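
To make the definition-time checks concrete, here is a minimal sketch of an AST pass using only the standard ast module. It covers the import whitelist, the global keyword, and eval()/exec() calls. It is not the library's implementation: find_violations is a hypothetical name and the message format is invented for illustration.

```python
import ast


def find_violations(source, allowed_imports):
    """Return human-readable violations found in the given source text."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]  # relative imports have no module name
        else:
            modules = []
        for module in modules:
            # Compare the top-level package, so "os.path" counts as "os".
            if module.split(".")[0] not in allowed_imports:
                violations.append(f"line {node.lineno}: import of '{module}'")
        if isinstance(node, ast.Global):
            violations.append(f"line {node.lineno}: use of 'global'")
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in {"eval", "exec"}):
            violations.append(f"line {node.lineno}: call to {node.func.id}()")
    return violations
```

Because the check works on source text, it runs before any of the function's code executes, which is what allows a violation to surface at import time rather than at call time.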

@freeze (Integrity Check)

The @freeze decorator locks the implementation details of a function to prevent AI agents or external processes from altering the code block.

Usage

@freeze
def sensitive_calculation(x):
    return x * 1.25

How it works

On the first run, @freeze computes a SHA-256 hash of the function's implementation and records it in shield_reports/frozen_functions.json. On subsequent startups it recomputes the hash. If the implementation has changed, even by a single space or comment, it raises `ShieldViolationError` and blocks startup.
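
The record-then-compare step can be sketched as follows. The assumptions are stated loudly: the fingerprint here is taken over compiled bytecode plus the code object's constants, an in-memory dict stands in for the JSON registry file, and RuntimeError stands in for the library's exception. Both function names are hypothetical.

```python
import hashlib


def fingerprint(func):
    """SHA-256 over the compiled bytecode plus its constants (an assumption)."""
    code = func.__code__
    # Literals such as 1.25 live in co_consts, not co_code, so hash both.
    payload = code.co_code + repr(code.co_consts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def check_frozen(func, registry):
    """Record the hash on first sight; raise if it later differs."""
    key = f"{func.__module__}.{func.__qualname__}"
    current = fingerprint(func)
    stored = registry.setdefault(key, current)
    if stored != current:
        raise RuntimeError(f"{key} changed since it was frozen")
    return current
```

A fingerprint built from compiled code ignores comments and whitespace. Catching edits at that level would require hashing the source text instead.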

@lock_signature

The @lock_signature decorator locks the public interface (signature) of a function to prevent agents from breaking callers by changing parameter names, defaults, type annotations, or argument ordering.

Usage

@lock_signature
def send_email(to_address: str, subject: str, body: str = ""):
    pass

How it works

It serializes the function's parameters and registers them in `shield_reports/locked_signatures.json`. If an agent renames to_address to email_to or changes the default value of body, the decorator raises an error at startup.
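
One way to picture the serialization step is with the standard inspect module. This is a sketch under assumptions: the field names in the output and the describe_signature helper are invented for illustration, and the library's stored format may differ.

```python
import inspect


def describe_signature(func):
    """Serialize parameters into JSON-friendly dicts, in declaration order."""
    empty = inspect.Parameter.empty
    described = []
    for param in inspect.signature(func).parameters.values():
        annotation = param.annotation
        described.append({
            "name": param.name,
            "kind": param.kind.name,
            "default": None if param.default is empty else repr(param.default),
            "annotation": None if annotation is empty
            else getattr(annotation, "__name__", repr(annotation)),
        })
    return described
```

Comparing the stored list with a freshly computed one detects renames, reordering, changed defaults, and changed annotations in a single equality check.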

Network Restriction

Runtime network sandboxing allows developers to control socket connections made by functions in a thread-safe and asyncio-task-safe manner.

@restrict_network

Restricts outbound TCP socket connections to a designated list of allowed hosts.

@restrict_network(allowed_hosts=["api.stripe.com", "*.github.com"])
def charge_stripe(payload):
    # This socket connect will pass
    import socket
    s = socket.socket()
    s.connect(("api.stripe.com", 443))
    
    # This socket connect will raise NetworkViolationError!
    s2 = socket.socket()
    s2.connect(("malicious-site.com", 80))
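
For readers curious how a connect-level guard can work, here is a minimal sketch that temporarily patches socket.socket.connect and matches the target host against glob patterns. It is an illustration, not the library's code: only_hosts is a hypothetical name, PermissionError stands in for NetworkViolationError, and unlike the description above the patch is process-wide rather than scoped per thread or task.

```python
import socket
from contextlib import contextmanager
from fnmatch import fnmatchcase
from unittest import mock


@contextmanager
def only_hosts(allowed_hosts):
    """Temporarily reject socket.connect() calls to hosts outside the list."""
    real_connect = socket.socket.connect

    def guarded_connect(sock, address):
        host = address[0] if isinstance(address, tuple) else str(address)
        if not any(fnmatchcase(host, pattern) for pattern in allowed_hosts):
            raise PermissionError(f"connection to {host!r} is not allowed")
        return real_connect(sock, address)

    # patch.object restores the original attribute when the block exits.
    with mock.patch.object(socket.socket, "connect", guarded_connect):
        yield
```

Higher-level clients usually resolve the hostname before calling connect, so they pass an IP address rather than a name. A complete guard would also need to hook name resolution; this sketch covers only the direct socket.connect case.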

@limit_calls

Caps the total number of outgoing socket connections a function is permitted to establish, protecting against infinite request loops exhausting API budgets.

@limit_calls(max_calls=3, domains=["*"])
def crawl_pages():
    import urllib.request
    # The fourth urlopen() call exceeds max_calls=3 and raises CallLimitViolationError
    for url in ["https://a.com", "https://b.com", "https://c.com", "https://d.com"]:
        urllib.request.urlopen(url)
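
The budget idea itself is a counter around a callable. The sketch below shows it in isolation, without any socket interception. CallBudget is a hypothetical class and RuntimeError stands in for CallLimitViolationError.

```python
import functools


class CallBudget:
    """Counts calls through wrapped callables and raises once the budget is spent."""

    def __init__(self, max_calls):
        self.max_calls = max_calls
        self.used = 0

    def wrap(self, func):
        @functools.wraps(func)
        def counted(*args, **kwargs):
            if self.used >= self.max_calls:
                raise RuntimeError(f"call limit of {self.max_calls} exceeded")
            self.used += 1
            return func(*args, **kwargs)
        return counted
```

The counter is checked before the call is made, so the request that would exceed the budget is never sent.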

Filesystem Sandbox & Virtual FS

Filesystem sandboxes ensure AI agents cannot read outside the project folder or write malicious files onto the host machine.

@restrict_fs

Enforces read and write permission whitelists. Traversal bypasses (e.g. `../../etc`) are resolved to absolute paths and blocked. Interpreter and runtime directories are whitelisted automatically so that Python imports continue to work.

@restrict_fs(allow_read=["./templates"], allow_write=["./logs"])
def write_log(message):
    # Writing to allowed path passes
    with open("./logs/app.log", "a") as f:
        f.write(message)
        
    # Writing to restricted path raises FilesystemViolationError!
    with open("/etc/hosts", "w") as f:
        f.write("127.0.0.1 hack.com")
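
Resolving a path before checking it is what defeats traversal tricks. A minimal standard-library sketch of that check, assuming directory-level whitelists (is_inside is a hypothetical helper):

```python
import os


def is_inside(path, allowed_dirs):
    """True if path, once fully resolved, lies within one of allowed_dirs."""
    target = os.path.realpath(path)  # collapses ".." and follows symlinks
    for directory in allowed_dirs:
        root = os.path.realpath(directory)
        try:
            if os.path.commonpath([target, root]) == root:
                return True
        except ValueError:
            # Raised on Windows when the paths are on different drives.
            continue
    return False
```

Comparing with commonpath rather than a string prefix avoids treating ./logs2 as if it were inside ./logs.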

@virtual_fs (Dry-Run Mode)

Intercepts disk calls and redirects all write operations to RAM. The actual disk of the machine remains completely untouched, while the agent believes it has successfully written the files and can read them back.

@virtual_fs(in_memory_write=True, allow_real_read=["*"])
def generate_reports():
    # This write happens only in RAM!
    with open("report.csv", "w") as f:
        f.write("data,data,data")
        
    # Reading from RAM works seamlessly
    with open("report.csv", "r") as f:
        assert f.read() == "data,data,data"
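
The write-to-RAM behaviour can be approximated with a patched open() that keeps file contents in a dict. This is a deliberately small sketch: it handles only text modes "w" and "r", in_memory_writes is a hypothetical name, and the library's interception is presumably broader.

```python
import builtins
import io
from contextlib import contextmanager
from unittest import mock


@contextmanager
def in_memory_writes():
    """Redirect open(path, "w") to a dict; reads look in the dict first."""
    files = {}
    real_open = builtins.open

    class MemoryFile(io.StringIO):
        def __init__(self, name):
            super().__init__()
            self._name = name

        def close(self):
            if not self.closed:
                files[self._name] = self.getvalue()  # keep content after close
            super().close()

    def fake_open(path, mode="r", *args, **kwargs):
        name = str(path)
        if mode == "w":
            return MemoryFile(name)
        if mode == "r" and name in files:
            return io.StringIO(files[name])
        return real_open(path, mode, *args, **kwargs)  # real reads pass through

    with mock.patch.object(builtins, "open", fake_open):
        yield files
```

The yielded dict lets a test harness inspect what the sandboxed code tried to write without touching the disk.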

Database & Environment Sandboxing

Additional safety layers for environment variable mutations and SQLite database safety.

@restrict_db

Wraps standard SQLite database connections. When `read_only=True` is enabled, it parses SQL queries and blocks destructive statements (`INSERT`, `UPDATE`, `DELETE`, `DROP`, `ALTER`, `CREATE`, etc.), preventing database corruption or deletion by the agent.

@restrict_db(read_only=True)
def fetch_users():
    import sqlite3
    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    
    # This query will pass
    cursor.execute("SELECT name FROM users")
    
    # This query will fail with DatabaseViolationError!
    cursor.execute("DROP TABLE users")
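
A keyword check of the kind described can be sketched in a few lines. This version looks only at the leading keyword of each statement and uses the blocked list from the text, without whatever the trailing "etc." covers. execute_read_only is a hypothetical helper and PermissionError stands in for DatabaseViolationError.

```python
import sqlite3

BLOCKED_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "CREATE"}


def execute_read_only(conn, sql, params=()):
    """Run sql unless its leading keyword is on the blocked list."""
    words = sql.lstrip().split(None, 1)
    keyword = words[0].upper() if words else ""
    if keyword in BLOCKED_KEYWORDS:
        raise PermissionError(f"{keyword} statements are blocked in read-only mode")
    return conn.execute(sql, params)
```

A leading-keyword check is easy to bypass (for example with a WITH clause in front of a DELETE). For a stricter guarantee, sqlite3 itself can open a database read-only via a URI such as sqlite3.connect("file:users.db?mode=ro", uri=True).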

@restrict_env

Monkeypatches the `os.environ` object to prevent modifying or deleting system environment variables during function execution, protecting sensitive system configurations.

@restrict_env(allow_mutation=False)
def safe_action():
    # Attempting to mutate raises EnvironmentViolationError!
    import os
    os.environ["API_KEY"] = "hacked"
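
As an illustration of the monkeypatching idea, the sketch below swaps os.environ for a read-only snapshot inside a context manager. It is an assumption-laden stand-in: frozen_environ is a hypothetical name, and a mutation attempt raises TypeError here rather than EnvironmentViolationError.

```python
import os
from contextlib import contextmanager
from types import MappingProxyType
from unittest import mock


@contextmanager
def frozen_environ():
    """Swap os.environ for a read-only snapshot for the duration of the block."""
    snapshot = MappingProxyType(dict(os.environ))
    # patch.object puts the real os.environ back when the block exits.
    with mock.patch.object(os, "environ", snapshot):
        yield snapshot
```

Code that captured a reference earlier (from os import environ) or that calls os.putenv directly is not affected by this swap, so a production guard needs to cover those paths as well.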

Time, Memory & Purity Limits

Resource-limit decorators keep agent-written code from hanging the process or exhausting memory.

@timeout

Enforces strict execution limits on function running time. Raises `TimeoutViolationError` if exceeded.

@timeout(seconds=1.5)
def process_loop():
    # If the function runs for longer than 1.5 seconds, it will be terminated immediately with an error
    while True:
        pass
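
One common way to bound running time in pure Python is to run the function in a worker thread and stop waiting after the deadline. The sketch below shows that approach. It is not necessarily how the library does it, run_with_timeout and slow_task are hypothetical names, and the built-in TimeoutError stands in for TimeoutViolationError.

```python
import concurrent.futures
import time


def run_with_timeout(func, seconds, *args, **kwargs):
    """Run func in a worker thread and stop waiting after `seconds`."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(func, *args, **kwargs)
    try:
        return future.result(timeout=seconds)
    except concurrent.futures.TimeoutError:
        raise TimeoutError(f"{func.__name__} exceeded {seconds}s") from None
    finally:
        # Do not block on the worker: a Python thread cannot be force-killed.
        executor.shutdown(wait=False)


def slow_task():
    """Demo function that takes longer than a short deadline."""
    time.sleep(0.3)
    return "done"
```

With this approach the caller regains control at the deadline, but a busy loop such as while True: pass keeps running in the background thread. Actually terminating such code requires a separate process.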

@limit_memory

Tracks the growth in RSS (Resident Set Size) while the function runs. If the increase exceeds the limit, it raises `MemoryViolationError`.

@limit_memory(max_mb=10.0)
def allocate_huge_arrays():
    # Allocation of too much memory will be blocked
    a = [0] * 50000000
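
Measuring RSS portably needs platform-specific calls, so the sketch below substitutes the standard tracemalloc module, which tracks Python-level allocations instead. It also checks the peak only after the function returns rather than during execution. run_with_memory_limit is a hypothetical helper, the built-in MemoryError stands in for MemoryViolationError, and tracemalloc.reset_peak requires Python 3.9 or newer.

```python
import tracemalloc


def run_with_memory_limit(func, max_mb, *args, **kwargs):
    """Run func; raise MemoryError if peak traced allocations grew past max_mb."""
    already_tracing = tracemalloc.is_tracing()
    if not already_tracing:
        tracemalloc.start()
    tracemalloc.reset_peak()
    baseline, _ = tracemalloc.get_traced_memory()
    try:
        result = func(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        if not already_tracing:
            tracemalloc.stop()
    grown_mb = (peak - baseline) / (1024 * 1024)
    if grown_mb > max_mb:
        raise MemoryError(f"allocated {grown_mb:.1f} MB, limit is {max_mb} MB")
    return result
```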

@no_side_effects

Enforces functional purity: prevents stdout logging, writing to globals, or modifying passed arguments.

@no_side_effects(allow_args_mutation=False, allow_stdout=False)
def pure_calculation(data_list):
    # Attempting to mutate the list or print to console will raise SideEffectViolationError!
    data_list.append(100)
    print("logs")
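
Two of these checks, argument mutation and stdout output, can be sketched with a snapshot-and-compare wrapper. This is an illustration only: call_pure is a hypothetical helper, RuntimeError stands in for SideEffectViolationError, writes to globals are not covered, and violations are detected after the call rather than prevented.

```python
import contextlib
import copy
import io


def call_pure(func, *args, **kwargs):
    """Call func, then fail if it mutated its arguments or wrote to stdout."""
    before = copy.deepcopy((args, kwargs))
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        result = func(*args, **kwargs)
    if (args, kwargs) != before:
        raise RuntimeError("arguments were mutated")
    if captured.getvalue():
        raise RuntimeError("function wrote to stdout")
    return result
```

The comparison relies on the arguments supporting deep copies and equality, which holds for plain lists and dicts but not for every object.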

@prompt_inject

The @prompt_inject decorator appends guidelines and instruction blocks to a function's docstring. Since LLM agents read docstrings to understand what a function does, this places the constraint directly in the agent's context window.

Usage

@prompt_inject("You must only call process_data with validated parameters. Do not attempt to access local files.")
def process_data(param):
    """Calculates processed outputs."""
    pass

Generated Docstring

The decorator modifies the function's docstring dynamically at import time to look like this:

Calculates processed outputs.

=== AI ASSISTANT ARCHITECTURAL CONSTRAINT ===
You must only call process_data with validated parameters. Do not attempt to access local files.
=============================================
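
Mechanically, this kind of injection amounts to rewriting __doc__ inside a decorator. A minimal sketch that produces the layout shown above (add_constraint is a hypothetical name, not the library's decorator):

```python
def add_constraint(text):
    """Return a decorator that appends a constraint banner to the docstring."""
    banner = "=== AI ASSISTANT ARCHITECTURAL CONSTRAINT ==="

    def decorate(func):
        original = (func.__doc__ or "").strip()
        # Closing rule is as wide as the banner line.
        func.__doc__ = f"{original}\n\n{banner}\n{text}\n{'=' * len(banner)}"
        return func

    return decorate
```

Because the function object itself is returned unchanged apart from __doc__, tools such as help() and IDE tooltips pick up the constraint with no further wiring.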

@prompt_assert

The @prompt_assert decorator uses semantic checks to verify function implementation guidelines. It parses the function's source code at definition time and queries an LLM to evaluate if the constraint is satisfied.

Usage

@prompt_assert("The function must calculate a valid Fibonacci sequence and not use recursion.")
def fib(n):
    if n <= 0:
        return 0
    # Implementation...

Parameters

| Parameter | Type | Description |
|---|---|---|
| prompt | str | The natural language assertion/constraint that the function implementation must satisfy. |