Introduction & Quick Start
Agent-Safeguard is a lightweight, enterprise-grade runtime sandboxing and definition-time guardrail framework for Python applications, built to safely run code generated or modified by AI agents.
Core Concepts
The library operates on three levels of protection:
- Definition-time analysis (AST Checks): Scans the function source code before it executes to block dangerous statements, check cyclomatic complexity, verify return types, or ensure the code hasn't been edited.
- Runtime sandboxes (Monkeypatching): Intercepts core system-level actions (network connections, filesystem modifications, subprocesses, environment variables, database modifications) to enforce strict limits.
- AI directives: Inserts developer guidelines directly into docstrings, or uses LLM-powered semantic checks to verify that a function's implementation meets a stated constraint.
Quick Start Example
1. Create a central rule config named shield.yaml in your project root:
```yaml
rules:
  - pattern: "sandbox_code.*"
    timeout: 0.5
    virtual_fs: true
    restrict_network: ["api.stripe.com"]
```
2. Run your functions normally. Agent-Safeguard applies the matching sandbox rules automatically:
```python
# sandbox_code.py
import urllib.request

def process_data():
    # Fetching an unauthorized host is blocked, and a structured violation report is written
    response = urllib.request.urlopen("https://unauthorized-api.com")
    return response.read()
```
Installation & Import
Install the library directly from PyPI using pip:
```bash
pip install agent-safeguard
```
Importing in Python
The package is published on PyPI as agent-safeguard, but the importable module is named agent_shield. Python module names cannot contain hyphens:
```python
from agent_shield import (
    shield,
    virtual_fs,
    guard_prompt,
    restrict_db,
    restrict_env,
)
```
Run `import agent_shield` as early as possible in your application's entry point so that it can intercept third-party modules correctly.
Configuration & Reports
In enterprise settings, you do not want AI agents deleting or editing decorators in source files to bypass constraints. Agent-Safeguard addresses this with a central configuration file.
Central Configuration (shield.yaml)
When a shield.yaml file exists at your project's root, Agent-Safeguard automatically injects the matching guardrails into module functions at import time:
```yaml
rules:
  - pattern: "my_app.payments.*"
    timeout: 5.0
    restrict_network: ["api.stripe.com"]
    restrict_db: true
    virtual_fs: true
  - pattern: "my_app.analytics.*"
    allowed_imports: ["json", "math"]
    max_complexity: 10
```
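The text does not spell out how a rule's `pattern` is matched against a function. The sketch below assumes each pattern is a shell-style glob compared against the function's fully qualified name (`module.function`), and that later matching rules override earlier ones. The helper `rules_for` is hypothetical. The rules are inlined as dicts so that no YAML parser is needed.

```python
from fnmatch import fnmatchcase

# The shield.yaml rules above, inlined as plain dicts (no YAML dependency).
RULES = [
    {"pattern": "my_app.payments.*", "timeout": 5.0,
     "restrict_network": ["api.stripe.com"], "restrict_db": True, "virtual_fs": True},
    {"pattern": "my_app.analytics.*", "allowed_imports": ["json", "math"],
     "max_complexity": 10},
]


def rules_for(qualified_name, rules):
    """Hypothetical helper: merge the settings of every rule whose glob matches."""
    merged = {}
    for rule in rules:
        if fnmatchcase(qualified_name, rule["pattern"]):
            # Assumption: later matches override earlier ones, key by key.
            merged.update({k: v for k, v in rule.items() if k != "pattern"})
    return merged


print(rules_for("my_app.payments.charge_customer", RULES))
```

With `fnmatch`, `*` also matches dots, so `my_app.payments.*` covers nested modules such as `my_app.payments.refunds.issue` as well.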
Diagnostic JSON Reports
Whenever a boundary rule is violated, Agent-Safeguard writes a structured JSON report to shield_reports/violation_report.json. This allows LLM agents to read the report and rewrite code automatically to comply with policies:
```json
{
  "violation_type": "network_violation",
  "function_name": "charge_customer",
  "file_path": "/path/to/my_app/payments.py",
  "details": {
    "attempted_host": "unauthorized-api.com",
    "allowed_hosts": ["api.stripe.com"]
  },
  "instruction": "AI Assistant Instruction: The function 'charge_customer' in file '/path/to/my_app/payments.py' attempted to establish an unauthorized network connection to 'unauthorized-api.com'. Connections are restricted to: api.stripe.com. Please remove this network call or connect to an allowed host."
}
```
Audit Mode (Passive Mode)
For testing or integration phases, you can run Agent-Safeguard in passive mode. Set the environment variable AGENT_SHIELD_PASSIVE=true. In this mode, the library generates violation reports and prints warnings, but does not raise errors or block execution.
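Reporting and passive mode fit together roughly as follows. This is a sketch, not the library's code. It always writes the JSON report. It then raises unless `AGENT_SHIELD_PASSIVE` is `true`, in which case it only warns. The helper `handle_violation` and the `ShieldViolationError` class defined here are hypothetical stand-ins. Treating the variable's value case-insensitively is an assumption.

```python
import json
import os
import warnings
from pathlib import Path


class ShieldViolationError(Exception):
    """Hypothetical stand-in for the library's violation exception."""


def handle_violation(report, report_dir="shield_reports"):
    """Write the JSON report, then block (default) or only warn (passive mode)."""
    path = Path(report_dir) / "violation_report.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")

    message = report.get("instruction", report["violation_type"])
    if os.environ.get("AGENT_SHIELD_PASSIVE", "").lower() == "true":
        warnings.warn(message, RuntimeWarning, stacklevel=2)  # audit mode: keep running
        return path
    raise ShieldViolationError(message)  # enforcing mode: stop execution
```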
@shield (AST Rules)
The @shield decorator enforces structural and architectural guidelines on a function at definition time by statically inspecting its AST (Abstract Syntax Tree).
Usage
```python
@shield(
    allowed_imports=["math", "json"],
    allow_unsafe=False,
    allow_globals=False,
    max_complexity=12,
)
def calculate_metrics(data):
    import json  # Allowed
    import os    # Not whitelisted: raises ShieldViolationError at definition time, before the function ever runs!
    return json.dumps({"result": data})
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| allowed_imports | list[str] | Whitelist of modules permitted to be imported inside the function scope. |
| forbidden_imports | list[str] | Blacklist of modules prohibited from being imported inside the function scope. |
| allow_unsafe | bool | If False (default), blocks calls to eval() and exec(). |
| allow_globals | bool | If False (default), blocks usage of the global keyword. |
| max_complexity | int | Upper limit for cyclomatic complexity. Raises an error if the code contains too many branches (if/for/while). |
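Definition-time checks like these can be built on Python's `ast` module. The sketch below is a simplified illustration, not the library's implementation. `check_source` is a hypothetical helper, and it takes a source string so that it runs anywhere. A decorator would obtain that string with `inspect.getsource`. The complexity count is a simplified McCabe count: it starts at 1 and adds 1 per decision node.

```python
import ast
import textwrap


class ShieldViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


# Nodes that add a decision point (a simplified McCabe count).
_BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler, ast.BoolOp, ast.comprehension)


def _imported_modules(node):
    if isinstance(node, ast.Import):
        return [alias.name for alias in node.names]
    if isinstance(node, ast.ImportFrom):
        return [node.module or ""]  # relative "from . import x" has module None
    return []


def check_source(source, allowed_imports=None, forbidden_imports=(),
                 allow_unsafe=False, allow_globals=False, max_complexity=None):
    """Statically check function source; return its complexity or raise."""
    complexity = 1
    for node in ast.walk(ast.parse(textwrap.dedent(source))):
        for module in _imported_modules(node):
            top = module.split(".")[0]
            if top in forbidden_imports or (
                    allowed_imports is not None and top not in allowed_imports):
                raise ShieldViolationError(f"import of {module!r} is not allowed")
        # Only direct eval(...)/exec(...) calls are caught in this sketch.
        if (not allow_unsafe and isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name) and node.func.id in ("eval", "exec")):
            raise ShieldViolationError(f"call to {node.func.id}() is not allowed")
        if not allow_globals and isinstance(node, ast.Global):
            raise ShieldViolationError("the global statement is not allowed")
        if isinstance(node, _BRANCH_NODES):
            complexity += 1
    if max_complexity is not None and complexity > max_complexity:
        raise ShieldViolationError(f"complexity {complexity} exceeds {max_complexity}")
    return complexity
```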
@shield automatically scans local constants and variable assignments for hardcoded credentials. It blocks assignments containing AWS access keys, Google API keys, OpenAI keys, or variables named secret or api_key.
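A credential scan of this kind can be sketched by walking the AST for string literals. The regular expressions below are common heuristics for the public key formats named above. They are illustrative, not the library's actual rules. The helper `find_hardcoded_credentials` is hypothetical.

```python
import ast
import re

# Illustrative regexes for well-known key formats; real scanners use many more.
_KEY_PATTERNS = {
    "AWS access key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "Google API key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}\b"),
    "OpenAI key": re.compile(r"\bsk-[A-Za-z0-9_\-]{20,}"),
}
_SUSPICIOUS_NAMES = re.compile(r"secret|api_key", re.IGNORECASE)


def find_hardcoded_credentials(source):
    """Return (line, label) pairs for literals or names that look like secrets."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Any string literal that matches a known key format.
        if isinstance(node, ast.Constant) and isinstance(node.value, str):
            for label, pattern in _KEY_PATTERNS.items():
                if pattern.search(node.value):
                    findings.append((node.lineno, label))
        # Any secret-like variable name that is assigned a string literal.
        if (isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            for target in node.targets:
                if isinstance(target, ast.Name) and _SUSPICIOUS_NAMES.search(target.id):
                    findings.append((node.lineno, f"suspicious name {target.id!r}"))
    return findings
```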
@freeze (Integrity Check)
The @freeze decorator locks the implementation details of a function to prevent AI agents or external processes from altering the code block.
Usage
```python
@freeze
def sensitive_calculation(x):
    return x * 1.25
```
How it works
On the first execution, @freeze calculates a cryptographic SHA-256 hash of the function's bytecode and registers it in shield_reports/frozen_functions.json. On subsequent startups, it recomputes the hash. If the implementation has changed, it raises `ShieldViolationError` and blocks startup.
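The sketch below shows one way to hash a function's compiled code with `hashlib`. It is an illustration, not the library's code. The opcodes (`co_code`) alone are not enough, because `x * 1.25` and `x * 1.5` can compile to identical opcodes that differ only in their constants, so `co_consts` and `co_names` are hashed as well. With this approach, edits that touch only comments or whitespace leave the hash unchanged. `implementation_hash` and `check_frozen` are hypothetical names. A plain dict stands in for frozen_functions.json.

```python
import hashlib


class ShieldViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


def implementation_hash(func):
    """SHA-256 over opcodes, constants and referenced names (not line numbers)."""
    code = func.__code__
    digest = hashlib.sha256()
    digest.update(code.co_code)
    # Note: nested functions would put code objects (with addresses in their repr)
    # into co_consts; this sketch targets simple, flat functions.
    digest.update(repr(code.co_consts).encode())
    digest.update(repr(code.co_names).encode())
    return digest.hexdigest()


def check_frozen(name, func, registry):
    """Register the hash on first sight; raise if a later version differs."""
    current = implementation_hash(func)
    stored = registry.setdefault(name, current)
    if stored != current:
        raise ShieldViolationError(f"{name} changed since it was frozen")
```

A real registry could be persisted with `json.dump(registry, f)` between runs.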
@lock_signature
The @lock_signature decorator locks the public interface (signature) of a function to prevent agents from breaking callers by changing parameter names, defaults, type annotations, or argument ordering.
Usage
```python
@lock_signature
def send_email(to_address: str, subject: str, body: str = ""):
    pass
```
How it works
It serializes the function's parameters and registers them in `shield_reports/locked_signatures.json`. If an agent attempts to change to_address to email_to or changes the default value of body, it raises a startup error.
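Signature serialization can be sketched with `inspect.signature`. Which fields the library actually stores is not documented. This hypothetical `signature_fingerprint` records each parameter's name, kind, default and annotation. The resulting JSON string could be compared against a stored copy in locked_signatures.json.

```python
import inspect
import json


def signature_fingerprint(func):
    """Serialize the parts of a signature that callers depend on."""
    params = []
    for p in inspect.signature(func).parameters.values():
        params.append({
            "name": p.name,
            "kind": p.kind.name,  # e.g. POSITIONAL_OR_KEYWORD, KEYWORD_ONLY
            "default": None if p.default is p.empty else repr(p.default),
            "annotation": None if p.annotation is p.empty else repr(p.annotation),
        })
    return json.dumps(params)
```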
Network Restriction
Runtime network sandboxing allows developers to control socket connections made by functions in a thread-safe and asyncio-task-safe manner.
@restrict_network
Restricts outbound TCP socket connections to a designated list of allowed hosts.
```python
@restrict_network(allowed_hosts=["api.stripe.com", "*.github.com"])
def charge_stripe(payload):
    import socket

    # This socket connect will pass
    s = socket.socket()
    s.connect(("api.stripe.com", 443))

    # This socket connect will raise NetworkViolationError!
    s2 = socket.socket()
    s2.connect(("malicious-site.com", 80))
```
@limit_calls
Caps the total number of outgoing socket connections a function may establish, protecting against runaway request loops that exhaust API budgets.
```python
@limit_calls(max_calls=3, domains=["*"])
def crawl_pages():
    import urllib.request

    # The fourth urlopen() call exceeds max_calls=3 and raises CallLimitViolationError!
    for url in ["https://a.com", "https://b.com", "https://c.com", "https://d.com"]:
        urllib.request.urlopen(url)
```
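The counting logic itself is small. The sketch below is a hypothetical `CallBudget` that counts connections to hosts matching `domains` and refuses the one past the cap. In a real sandbox, `consume()` would be called from a patched socket layer like the network sketch above. Here it is called explicitly.

```python
import fnmatch
import threading


class CallLimitViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


class CallBudget:
    """Counts connections to matching domains and refuses the one past the cap."""

    def __init__(self, max_calls, domains=("*",)):
        self.max_calls = max_calls
        self.domains = tuple(domains)
        self.used = 0
        self._lock = threading.Lock()  # several threads may share one budget

    def consume(self, host):
        if not any(fnmatch.fnmatchcase(host, d) for d in self.domains):
            return  # hosts outside the budgeted domains are not counted
        with self._lock:
            if self.used >= self.max_calls:
                raise CallLimitViolationError(
                    f"call {self.used + 1} to {host!r} exceeds max_calls={self.max_calls}")
            self.used += 1
```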
Filesystem Sandbox & Virtual FS
Filesystem sandboxes ensure AI agents cannot read outside the project folder or write malicious files onto the host machine.
@restrict_fs
Enforces read and write permission whitelists. Traversal bypasses (e.g. `../../etc`) are resolved to absolute paths and blocked. Interpreter and runtime directories are whitelisted automatically so that Python imports continue to work.
```python
@restrict_fs(allow_read=["./templates"], allow_write=["./logs"])
def write_log(message):
    # Writing to an allowed path passes
    with open("./logs/app.log", "a") as f:
        f.write(message)

    # Writing to a restricted path raises FilesystemViolationError!
    with open("/etc/hosts", "w") as f:
        f.write("127.0.0.1 hack.com")
```
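The traversal protection described above comes down to resolving the path before comparing it. The sketch below is a hypothetical `is_path_allowed`, not the library's function. It resolves `..` and symlinks with `os.path.realpath`, then requires the result to sit inside an allowed directory. It uses `os.path.commonpath` rather than a string prefix test, so that `logs-evil/` is not mistaken for a child of `logs/`.

```python
import os


def is_path_allowed(path, allowed_dirs):
    """Resolve '..' and symlinks first, then require containment in an allowed dir."""
    target = os.path.realpath(path)
    for directory in allowed_dirs:
        root = os.path.realpath(directory)
        try:
            if os.path.commonpath([target, root]) == root:
                return True
        except ValueError:  # e.g. paths on different drives on Windows
            continue
    return False
```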
@virtual_fs (Dry-Run Mode)
Intercepts disk calls and redirects all write operations to RAM. The actual disk of the machine remains completely untouched, while the agent believes it has successfully written the files and can read them back.
```python
@virtual_fs(in_memory_write=True, allow_real_read=["*"])
def generate_reports():
    # This write happens only in RAM!
    with open("report.csv", "w") as f:
        f.write("data,data,data")

    # Reading the file back from RAM works seamlessly
    with open("report.csv", "r") as f:
        assert f.read() == "data,data,data"
```
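The dry-run idea can be sketched by temporarily replacing `builtins.open`. Text-mode writes go into an in-memory dict, and reads of those paths are served from it. This is a simplified illustration: `in_memory_writes` is a hypothetical name, binary writes and `+` modes are refused, and the `allow_real_read` globs are not modeled. Code that calls `io.open` or `os.open` directly bypasses the patch.

```python
import builtins
import contextlib
import io
import os


class _MemoryFile(io.StringIO):
    """A text buffer that saves its contents into the store when closed."""

    def __init__(self, store, key, initial=""):
        super().__init__(initial)
        self._store, self._key = store, key
        self.seek(0, io.SEEK_END)  # append mode continues after existing text

    def close(self):
        if not self.closed:
            self._store[self._key] = self.getvalue()
        super().close()


@contextlib.contextmanager
def in_memory_writes():
    store = {}  # absolute path -> text content
    real_open = builtins.open

    def fake_open(file, mode="r", *args, **kwargs):
        key = os.path.abspath(os.fspath(file))
        if "+" in mode or ("b" in mode and any(c in mode for c in "wax")):
            raise NotImplementedError("this sketch only virtualizes text-mode writes")
        if "w" in mode or "x" in mode:
            return _MemoryFile(store, key)
        if "a" in mode:
            return _MemoryFile(store, key, store.get(key, ""))
        if "b" not in mode and key in store:
            return io.StringIO(store[key])  # read back what was "written"
        return real_open(file, mode, *args, **kwargs)  # real reads pass through

    builtins.open = fake_open
    try:
        yield store
    finally:
        builtins.open = real_open
```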
Database & Environment Sandboxing
Additional safety layers that keep SQLite databases intact and prevent environment-variable mutations.
@restrict_db
Wraps standard SQLite database connections. When `read_only=True` is enabled, it parses SQL queries and blocks destructive statements (`INSERT`, `UPDATE`, `DELETE`, `DROP`, `ALTER`, `CREATE`, etc.), preventing database corruption or deletion by the agent.
```python
@restrict_db(read_only=True)
def fetch_users():
    import sqlite3

    conn = sqlite3.connect("users.db")
    cursor = conn.cursor()
    # This query will pass
    cursor.execute("SELECT name FROM users")
    # This query will fail with DatabaseViolationError!
    cursor.execute("DROP TABLE users")
```
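The section describes parsing SQL text. The sketch below deliberately swaps in a different technique: SQLite's own authorizer hook, exposed in Python as `Connection.set_authorizer`. SQLite calls the hook for every action while compiling a statement. Allowing only read actions makes SQLite reject `INSERT`, `UPDATE`, `DELETE`, `DROP` and similar statements with `sqlite3.DatabaseError`, without any SQL parsing on our side. `make_read_only` is a hypothetical helper.

```python
import sqlite3

# Authorizer action codes that only read data.
_READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}
if hasattr(sqlite3, "SQLITE_FUNCTION"):
    _READ_ONLY_ACTIONS.add(sqlite3.SQLITE_FUNCTION)  # e.g. count(), lower()


def _read_only_authorizer(action, arg1, arg2, db_name, trigger):
    return sqlite3.SQLITE_OK if action in _READ_ONLY_ACTIONS else sqlite3.SQLITE_DENY


def make_read_only(conn):
    """After this, SQLite refuses to compile any statement that writes."""
    conn.set_authorizer(_read_only_authorizer)
    return conn
```

For file databases, SQLite can also enforce this at open time with `sqlite3.connect("file:users.db?mode=ro", uri=True)`.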
@restrict_env
Monkeypatches the `os.environ` object to prevent modifying or deleting system environment variables during function execution, protecting sensitive system configurations.
```python
@restrict_env(allow_mutation=False)
def safe_action():
    import os

    # Attempting to mutate raises EnvironmentViolationError!
    os.environ["API_KEY"] = "hacked"
```
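Monkeypatching `os.environ` can be sketched as swapping it for a read-only view for the duration of a call. `os.getenv` reads through the module-level `environ`, so it keeps working. This is a hypothetical illustration with known gaps. The swap is process-wide, not scoped to one thread. Modules that ran `from os import environ` earlier still hold the real object. `os.putenv` is not covered.

```python
import collections.abc
import contextlib
import os


class EnvironmentViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


class ReadOnlyEnviron(collections.abc.Mapping):
    """Read-through view of the real environment that refuses writes and deletes."""

    def __init__(self, real):
        self._real = real

    def __getitem__(self, key):
        return self._real[key]

    def __iter__(self):
        return iter(self._real)

    def __len__(self):
        return len(self._real)

    def __setitem__(self, key, value):
        raise EnvironmentViolationError(f"setting {key!r} is not allowed")

    def __delitem__(self, key):
        raise EnvironmentViolationError(f"deleting {key!r} is not allowed")


@contextlib.contextmanager
def frozen_environ():
    real = os.environ
    os.environ = ReadOnlyEnviron(real)  # os.getenv() reads through this view
    try:
        yield
    finally:
        os.environ = real
```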
Time, Memory & Purity Limits
Resource-limit decorators protect against lockups and runaway memory growth caused by misbehaving agent code.
@timeout
Enforces strict execution limits on function running time. Raises `TimeoutViolationError` if exceeded.
```python
@timeout(seconds=1.5)
def process_loop():
    # If the function runs longer than 1.5 seconds, it is interrupted with TimeoutViolationError
    while True:
        pass
```
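Python cannot forcibly kill a running thread, so a timeout has to interrupt the code some other way. One standard approach on Unix is a `SIGALRM` interval timer whose handler raises an exception. The interpreter runs the handler between bytecodes, so even a bare `while True: pass` is interrupted. This is a sketch, not necessarily how the library does it. `time_limit` is a hypothetical name. It works only on Unix and in the main thread. A long-running C call delays the handler until it returns.

```python
import contextlib
import signal
import threading


class TimeoutViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


@contextlib.contextmanager
def time_limit(seconds):
    """Raise TimeoutViolationError if the body runs longer than `seconds`."""
    if not hasattr(signal, "setitimer") or threading.current_thread() is not threading.main_thread():
        raise RuntimeError("SIGALRM timers need Unix and the main thread")

    def _on_alarm(signum, frame):
        raise TimeoutViolationError(f"exceeded the {seconds} s time limit")

    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel any pending alarm
        signal.signal(signal.SIGALRM, previous)
```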
@limit_memory
Tracks how much RSS (Resident Set Size) grows while the function runs and raises `MemoryViolationError` if the growth exceeds the limit.
```python
@limit_memory(max_mb=10.0)
def allocate_huge_arrays():
    # About 400 MB of list storage on 64-bit CPython, far above max_mb=10.0: raises MemoryViolationError
    a = [0] * 50000000
```
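The Python standard library has no portable way to read the current RSS (`resource.getrusage` reports only the peak). This sketch therefore swaps in a different measurement: `tracemalloc`, which records the peak of Python-level allocations made during the call. The check runs after the function returns, so it detects an overrun rather than preventing it. Actually stopping an allocation would need an OS limit such as `resource.setrlimit`. `limit_python_memory` is a hypothetical name.

```python
import functools
import tracemalloc


class MemoryViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


def limit_python_memory(max_mb):
    """Raise MemoryViolationError if the call's peak traced allocation exceeds max_mb."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            started_here = not tracemalloc.is_tracing()
            if started_here:
                tracemalloc.start()
            baseline, _ = tracemalloc.get_traced_memory()
            tracemalloc.reset_peak()  # Python 3.9+
            try:
                result = func(*args, **kwargs)
                _, peak = tracemalloc.get_traced_memory()
            finally:
                if started_here:
                    tracemalloc.stop()
            grown_mb = (peak - baseline) / (1024 * 1024)
            if grown_mb > max_mb:
                raise MemoryViolationError(
                    f"{func.__name__} peaked at +{grown_mb:.1f} MB (limit {max_mb} MB)")
            return result
        return wrapper
    return decorator
```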
@no_side_effects
Enforces functional purity: prevents printing to stdout, writing to globals, and mutating passed arguments.
```python
@no_side_effects(allow_args_mutation=False, allow_stdout=False)
def pure_calculation(data_list):
    # Attempting to mutate the list or print to the console will raise SideEffectViolationError!
    data_list.append(100)
    print("logs")
```
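The argument and stdout checks can be approximated with a deep-copy snapshot and `contextlib.redirect_stdout`. The sketch below assumes arguments are deep-copyable and compare by value. It detects violations after the call, so a mutated argument stays mutated. It does not cover global writes. `no_side_effects_sketch` is a hypothetical name.

```python
import contextlib
import copy
import functools
import io


class SideEffectViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


def no_side_effects_sketch(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Snapshot the arguments so mutation can be detected afterwards.
        before = copy.deepcopy((args, kwargs))
        captured = io.StringIO()
        with contextlib.redirect_stdout(captured):
            result = func(*args, **kwargs)
        if captured.getvalue():
            raise SideEffectViolationError(f"{func.__name__} wrote to stdout")
        if (args, kwargs) != before:
            raise SideEffectViolationError(f"{func.__name__} mutated its arguments")
        return result
    return wrapper
```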
@prompt_inject
The @prompt_inject decorator injects guidelines and instruction blocks directly into a function's docstring. Because LLM agents read docstrings to understand what a function does, this places the constraints right inside the agent's prompt window.
Usage
```python
@prompt_inject("You must only call process_data with validated parameters. Do not attempt to access local files.")
def process_data(param):
    """Calculates processed outputs."""
    pass
```
Generated Docstring
The decorator modifies the function's docstring dynamically at import time to look like this:
```text
Calculates processed outputs.
=== AI ASSISTANT ARCHITECTURAL CONSTRAINT ===
You must only call process_data with validated parameters. Do not attempt to access local files.
=============================================
```
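A decorator that produces this layout only needs to rewrite `__doc__`. The sketch below, a hypothetical `prompt_inject_sketch`, reproduces the banner shown above. Separating the original docstring from the banner with a blank line is an assumption.

```python
BANNER = "=== AI ASSISTANT ARCHITECTURAL CONSTRAINT ==="


def prompt_inject_sketch(instruction):
    """Append a constraint block to the decorated function's docstring."""
    def decorator(func):
        original = (func.__doc__ or "").strip()
        block = "\n".join([BANNER, instruction, "=" * len(BANNER)])
        func.__doc__ = f"{original}\n\n{block}" if original else block
        return func  # the function itself is unchanged; only its docstring grows
    return decorator
```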
@prompt_assert
The @prompt_assert decorator uses an LLM-based semantic check to verify that a function's implementation follows a natural-language guideline. At definition time, it parses the function's source code and asks an LLM whether the constraint is satisfied.
Usage
```python
@prompt_assert("The function must calculate a valid Fibonacci sequence and not use recursion.")
def fib(n):
    if n <= 0:
        return 0
    # Implementation...
```
Parameters
| Parameter | Type | Description |
|---|---|---|
| prompt | str | The natural language assertion/constraint that the function implementation must satisfy. |
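The flow described for @prompt_assert can be sketched without committing to a particular LLM provider, since the library's model and client are not documented here. The sketch takes the model as a plain callable (`ask_llm`, prompt in, reply text out). It expects a reply starting with `PASS` or `FAIL`. `build_judge_prompt`, `check_constraint` and the PASS/FAIL protocol are all hypothetical. Inside a decorator, the source would come from `inspect.getsource(func)`.

```python
import textwrap
from typing import Callable


class ShieldViolationError(Exception):
    """Hypothetical stand-in for the library's exception."""


def build_judge_prompt(source, constraint):
    instructions = textwrap.dedent(f"""\
        You are reviewing a Python function.
        Constraint: {constraint}
        Reply with PASS if the code satisfies the constraint, otherwise FAIL and a reason.

        Code:
        """)
    return instructions + textwrap.dedent(source)


def check_constraint(source, constraint, ask_llm: Callable[[str], str]):
    """Ask the model to judge the code; raise unless the reply starts with PASS."""
    reply = ask_llm(build_judge_prompt(source, constraint)).strip()
    if not reply.upper().startswith("PASS"):
        raise ShieldViolationError(f"constraint not satisfied: {constraint!r}; model said: {reply}")
```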