An LLM without tools is just an expensive autocomplete.
Give it function calling, and suddenly it's writing code, running queries, and sending emails.
The power? Immense. The risks? Let's talk about those too.
Remember when we had to parse LLM outputs with regex to trigger actions? Dark times! If you never lived through that, I'm happy for you; arriving later isn't always worse. Now models can call functions directly, use tools, and interact with the real world, and the implications are massive.
1. From Text to Action: The Evolution
Let's cut through the fluff. Before function calling, we were stuck in the dark ages of "parse and pray." We'd beg the LLM to output valid JSON:
import json

# The old way: Begging the LLM to output valid JSON
prompt = """
Analyze this SQL query and return EXACTLY this format:
{
    "action": "optimize_query",
    "query": "...",
    "suggestions": [...]
}
IMPORTANT: Output ONLY valid JSON, nothing else PLEASE, I'm begging you, bruh!
"""

response = llm.complete(prompt)  # whatever completion client you were using back then

# Pray it's valid JSON
try:
    action = json.loads(response)  # 50% chance of failure
except json.JSONDecodeError:
    # Welcome to regex hell
    action = extract_json_with_regex(response)  # hand-rolled fallback parser
June 2023 changed everything. OpenAI introduced function calling, and suddenly we had structured, reliable tool use:
from litellm import completion
from dotenv import load_dotenv
from os import getenv
load_dotenv()
# Modern way: Define your function schema
tools = [{
    "type": "function",
    "function": {
        "name": "optimize_query",
        "description": "Optimize a SQL query for performance",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "SQL query to optimize"},
                "target_db": {"type": "string", "enum": ["postgres", "mysql", "snowflake"]}
            },
            "required": ["query"]
        }
    }
}]

response = completion(
    model="openrouter/openai/gpt-oss-20b:free",
    api_key=getenv("OPENROUTER_API_KEY"),
    messages=[{"role": "user", "content": "Optimize: SELECT * FROM users WHERE age > 25"}],
    tools=tools,
    tool_choice="auto"
)

# Clean, structured, guaranteed format
if response.choices[0].message.get("tool_calls"):
    tool_call = response.choices[0].message.tool_calls[0]
    print(f"Function: {tool_call.function.name}")
    print(f"Arguments: {tool_call.function.arguments}")
Run this example yourself 🔧
Script:
1_function_calling_evolution.py
Command:
uv run 1_function_calling_evolution.py
Expected Output:
Function Calling Evolution Demo
==================================================
=== Old Way: JSON Parsing ===
Raw response:
{
"action": "optimize_query",
"query": "SELECT * FROM users WHERE age > 25",
"suggestions": ["Add index on age column", "Use specific columns instead of *"]
}
✅ Successfully parsed JSON:
{
"action": "optimize_query",
"query": "SELECT * FROM users WHERE age > 25",
"suggestions": [
"Add index on age column",
"Use specific columns instead of *"
]
}
=== Modern Way: Function Calling ===
✅ Function called: optimize_query
Arguments received:
{
"query": "SELECT * FROM users WHERE age > 25",
"suggestions": [
"Consider selecting only the columns you need instead of using SELECT *",
"Create an index on the age column to speed up the WHERE clause",
"If you have a large dataset, consider pagination to limit the number of rows returned."
]
}
Query to optimize: SELECT * FROM users WHERE age > 25
Target DB: Not specified
Suggestions:
1. Consider selecting only the columns you need instead of using SELECT *
2. Create an index on the age column to speed up the WHERE clause
3. If you have a large dataset, consider pagination to limit the number of rows returned.
Here's what nobody explains clearly about the terminology: Function Calling (OpenAI's approach), Tool Use (Anthropic's terminology), and Actions (what everyone else calls it) — they're all the same thing. The model decides which function to call and with what parameters. No more regex, no more prayer. Just clean, structured execution.
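The snippet above stops at detecting the tool call; in a real loop you also execute the function and feed the result back so the model can produce a final answer. Here's a minimal sketch of that round trip, reusing the `tools` and `response` from above (the `optimize_query` implementation is a hypothetical stand-in):

import json

def optimize_query(query: str, target_db: str = "postgres") -> dict:
    # Hypothetical implementation: canned suggestions for the demo
    return {"query": query, "suggestions": ["Avoid SELECT *", f"Add an index for {target_db}"]}

message = response.choices[0].message
tool_call = message.tool_calls[0]
args = json.loads(tool_call.function.arguments)  # arguments arrive as a JSON string

# Run the real function with the model-chosen parameters
result = optimize_query(**args)

# Send the result back as a "tool" message so the model can finish its answer
followup = completion(
    model="openrouter/openai/gpt-oss-20b:free",
    api_key=getenv("OPENROUTER_API_KEY"),
    messages=[
        {"role": "user", "content": "Optimize: SELECT * FROM users WHERE age > 25"},
        message,  # the assistant message that contained the tool call
        {"role": "tool", "tool_call_id": tool_call.id, "content": json.dumps(result)},
    ],
    tools=tools,
)
print(followup.choices[0].message.content)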
2. Building Safe Tool Interfaces
Now that we understand how function calling evolved from regex hell to structured tool use, let's tackle the critical question: how do we build tools that won't accidentally destroy our production systems?
Want to give an LLM database access? Here's how to not destroy everything:
The Pydantic Approach: Type Safety First
from typing import Literal
from pydantic import BaseModel, Field, field_validator
import re
class DatabaseQuery(BaseModel):
    """Tool for read-only database queries with multiple safety layers"""

    query: str = Field(description="SQL query to execute")
    database: Literal["staging", "analytics"] = Field(
        default="staging",
        description="Target database (prod not available)"
    )
    timeout_seconds: int = Field(default=30, le=60, description="Query timeout")

    @field_validator("query")
    @classmethod
    def validate_query(cls, v):
        """Multi-layer query validation"""
        # Layer 1: No destructive operations
        dangerous_keywords = ["DELETE", "UPDATE", "DROP", "ALTER", "TRUNCATE", "INSERT"]
        query_upper = v.upper()
        for keyword in dangerous_keywords:
            if re.search(r'\b' + keyword + r'\b', query_upper):
                raise ValueError(f"Destructive operation '{keyword}' not allowed")

        # Layer 2: Must be a SELECT query
        if not query_upper.strip().startswith("SELECT"):
            raise ValueError("Only SELECT queries allowed")

        # Layer 3: Limit check
        if "LIMIT" not in query_upper:
            v = f"{v.rstrip(';')} LIMIT 1000"  # Force limit
        return v

    def execute(self):
        """Execute with additional runtime checks"""
        # Connection would use read-only credentials
        # Wrapped in timeout context
        # Full audit logging
        pass
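Before wiring this to an LLM, you can sanity-check the validator by instantiating the model directly and watching it accept, rewrite, or reject queries (this mirrors the demo output below):

# Safe query: validated and auto-limited
q = DatabaseQuery(query="SELECT * FROM users WHERE age > 25")
print(q.query)  # SELECT * FROM users WHERE age > 25 LIMIT 1000

# Destructive query: rejected at validation time (ValidationError is a ValueError)
try:
    DatabaseQuery(query="DELETE FROM users WHERE id = 1")
except ValueError as e:
    print(f"Blocked: {e}")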
Run this example yourself 🔧
Script:
2_safe_tool_interfaces.py
Command:
uv run 2_safe_tool_interfaces.py
Expected Output:
Safe Tool Interfaces Demo
==================================================
=== Database Query Safety Demo ===
Testing: SELECT * FROM users WHERE age > 25
⚠️ Auto-added LIMIT 1000 to prevent large result sets
✅ Query validated: SELECT * FROM users WHERE age > 25 LIMIT 1000
Testing: SELECT COUNT(*) FROM orders
⚠️ Auto-added LIMIT 1000 to prevent large result sets
✅ Query validated: SELECT COUNT(*) FROM orders LIMIT 1000
Testing: DELETE FROM users WHERE id = 1
✅ Correctly blocked: 1 validation error for DatabaseQuery
query
Value error, ❌ Destructive operation 'DELETE' not allowed [type=value_error, input_value='DELETE FROM users WHERE id = 1', input_type=str]
For further information visit https://errors.pydantic.dev/2.11/v/value_error
Testing: DROP TABLE users
✅ Correctly blocked: 1 validation error for DatabaseQuery
query
Value error, ❌ Destructive operation 'DROP' not allowed [type=value_error, input_value='DROP TABLE users', input_type=str]
For further information visit https://errors.pydantic.dev/2.11/v/value_error
Testing: UPDATE users SET admin = true
✅ Correctly blocked: 1 validation error for DatabaseQuery
query
Value error, ❌ Destructive operation 'UPDATE' not allowed [type=value_error, input_value='UPDATE users SET admin = true', input_type=str]
For further information visit https://errors.pydantic.dev/2.11/v/value_error
Testing: SELECT * FROM users; DELETE FROM orders
✅ Correctly blocked: 1 validation error for DatabaseQuery
query
Value error, ❌ Destructive operation 'DELETE' not allowed [type=value_error, input_value='SELECT * FROM users; DELETE FROM orders', input_type=str]
For further information visit https://errors.pydantic.dev/2.11/v/value_error
Testing: select * from products
⚠️ Auto-added LIMIT 1000 to prevent large result sets
✅ Query validated: select * from products LIMIT 1000
=== Defense-in-Depth Pattern ===
Executing safe query:
✅ Input validation passed
📝 Audit log: Executing DatabaseQuery
⏱️ Timeout protection active
🔒 Executing in sandbox environment
🔒 Executing query on staging database:
Query: SELECT name, email FROM users LIMIT 10
Timeout: 30s
Status: Would execute with read-only credentials
Executing file operation:
✅ Input validation passed
📝 Audit log: Executing FileOperation
⏱️ Timeout protection active
🔒 Executing in sandbox environment
The Defense-in-Depth Pattern
Never trust a single validation layer:
from functools import wraps
import time
from typing import Any, Callable
def rate_limit(calls_per_minute: int = 10):
    """Rate limiting decorator"""
    def decorator(func: Callable) -> Callable:
        call_times = []

        @wraps(func)
        def wrapper(*args, **kwargs) -> Any:
            now = time.time()
            # Clean old calls
            call_times[:] = [t for t in call_times if now - t < 60]
            if len(call_times) >= calls_per_minute:
                raise Exception(f"Rate limit exceeded: {calls_per_minute}/min")
            call_times.append(now)
            return func(*args, **kwargs)
        return wrapper
    return decorator

def audit_log(func: Callable) -> Callable:
    """Audit logging decorator"""
    @wraps(func)
    def wrapper(*args, **kwargs) -> Any:
        start_time = time.time()
        try:
            result = func(*args, **kwargs)
            # Log success
            print(f"✅ {func.__name__} succeeded in {time.time() - start_time:.2f}s")
            return result
        except Exception as e:
            # Log failure
            print(f"❌ {func.__name__} failed: {str(e)}")
            raise
    return wrapper

@audit_log
@rate_limit(calls_per_minute=5)
def execute_tool(tool_name: str, params: dict) -> Any:
    """Execute tool with all safety layers"""
    # Validation, execution, monitoring
    pass
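Stacked like this, the decorators give you two independent layers before any tool logic runs. A minimal usage sketch, assuming the DatabaseQuery model from above is in scope:

@audit_log
@rate_limit(calls_per_minute=5)
def run_database_query(params: dict) -> str:
    # Layer 1: schema validation (raises on destructive SQL)
    query = DatabaseQuery(**params)
    # Layer 2: execution would happen here with read-only credentials
    return f"Would execute on {query.database}: {query.query}"

print(run_database_query({"query": "SELECT name FROM users"}))
run_database_query({"query": "DROP TABLE users"})  # audit_log prints the failure, then re-raises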
3. Tool Categories That Matter
With our safety patterns in place — type validation, rate limiting, and defense-in-depth — we need to understand which tools we're actually building. Because let's be honest: giving an LLM the ability to send emails is very different from letting it read documentation.
Not all tools are created equal. Here's the hierarchy of danger:
🟢 Safe Tools (Start Here)
# Read-only operations
safe_tools = {
    "search_documentation": "Read API docs",
    "query_analytics": "Read-only database queries",
    "fetch_metrics": "Get performance data",
    "list_files": "Directory listings"
}
🟡 Moderate Risk Tools (Add Safeguards)
# State changes with limits
moderate_tools = {
    "send_slack_message": "Rate limited, specific channels only",
    "create_jira_ticket": "Template-based, no custom fields",
    "generate_report": "Resource limits, sandboxed execution",
    "cache_invalidation": "Specific keys only"
}
🔴 High Risk Tools (Human Approval Required)
# Never fully automated
dangerous_tools = {
    "execute_code": "Arbitrary code execution",
    "database_write": "Data modifications",
    "send_email": "External communications",
    "deploy_code": "Production changes"
}
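The point of the hierarchy is that your dispatch logic, not the model, decides what runs unattended. A minimal sketch of that gate (the tool names and risk labels mirror the demo below; the approval mechanism is a placeholder):

RISK_LEVELS = {"safe": 0, "moderate": 1, "high": 2, "critical": 3}

TOOL_RISK = {
    "search_documentation": "safe",
    "send_slack_message": "moderate",
    "database_write": "high",
    "deploy_code": "critical",
}

def dispatch_tool(tool_name: str, params: dict) -> str:
    risk = TOOL_RISK.get(tool_name, "critical")  # unknown tools are treated as critical
    if RISK_LEVELS[risk] >= RISK_LEVELS["high"]:
        # Placeholder: route to a human review queue instead of executing
        return f"🛑 {tool_name} ({risk}) queued for human approval"
    return f"✅ {tool_name} ({risk}) executed automatically"

print(dispatch_tool("search_documentation", {}))
print(dispatch_tool("deploy_code", {"service": "api"}))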
Run this example yourself 🔧
Script:
3_tool_categories.py
Command:
uv run 3_tool_categories.py
Expected Output:
Tool Categories and Risk Assessment
==================================================
=== Tool Risk Categories ===
🟢 Safe - 4 tools:
----------------------------------------
📦 search_documentation
Search and read API documentation
Safeguards:
• Read-only access
⏱️ Rate limit: 100/min
📦 query_analytics
Read-only database queries on analytics DB
Safeguards:
• Read-only credentials
• Automatic LIMIT clause
• Query timeout 30s
⏱️ Rate limit: 50/min
📦 fetch_metrics
Get performance metrics from monitoring
Safeguards:
• Cached responses
• Rate limiting
⏱️ Rate limit: 60/min
📦 list_files
List directory contents
Safeguards:
• Restricted to project directories
• No system paths
⏱️ Rate limit: 100/min
🟡 Moderate - 4 tools:
----------------------------------------
📦 send_slack_message
Send messages to Slack channels
Safeguards:
• Rate limited to 10/minute
• Restricted to specific channels
• Message length limit
⏱️ Rate limit: 10/min
📦 create_jira_ticket
Create tickets in Jira
Safeguards:
• Template-based creation only
• No custom field modifications
• Rate limited
⏱️ Rate limit: 5/min
📦 generate_report
Generate PDF/CSV reports
Safeguards:
• Resource limits (CPU/Memory)
• Sandboxed execution
• Output size limits
⏱️ Rate limit: 10/min
📦 cache_invalidation
Invalidate specific cache keys
Safeguards:
• Whitelist of allowed cache keys
• Rate limiting
• Rollback capability
⏱️ Rate limit: 5/min
🔴 High - 3 tools:
----------------------------------------
📦 execute_code
Execute arbitrary code in sandbox
Safeguards:
• Sandboxed environment
• Resource limits
• Timeout enforcement
⚠️ Requires human approval
⏱️ Rate limit: 1/min
📦 database_write
Modify database records
Safeguards:
• Transaction rollback capability
• Backup before modification
• Human approval required
⚠️ Requires human approval
⏱️ Rate limit: 1/min
📦 send_email
Send emails to external recipients
Safeguards:
• Template-based only
• Recipient whitelist
• Human approval for new recipients
⚠️ Requires human approval
⏱️ Rate limit: 5/min
⛔ Critical - 2 tools:
----------------------------------------
📦 deploy_code
Deploy code to production
Safeguards:
• Multi-stage approval
• Automated testing required
• Rollback plan mandatory
⚠️ Requires human approval
⏱️ Rate limit: 1/min
📦 modify_infrastructure
Change infrastructure configuration
Safeguards:
• Terraform plan review
• Cost estimation
• Multi-person approval
⚠️ Requires human approval
⏱️ Rate limit: 1/min
=== Execution Decision Logic ===
Tool: search_documentation
Risk: 🟢 Safe
✅ Can execute automatically
Rate limit: 100/min
Tool: send_slack_message
Risk: 🟡 Moderate
✅ Can execute automatically
Rate limit: 10/min
Tool: database_write
Risk: 🔴 High
🛑 Requires human approval
Safeguards: Transaction rollback capability, Backup before modification
Tool: deploy_code
Risk: ⛔ Critical
🛑 Requires human approval
Safeguards: Multi-stage approval, Automated testing required
4. The LangChain Toolkit Approach
Understanding tool risk categories is crucial, but managing individual tools manually gets overwhelming fast. This is where orchestration frameworks shine — and LangChain has become the de facto standard.
LangChain makes tool orchestration almost too easy. Here are some production-ready patterns:
Building a SQL Toolkit
from langchain_community.utilities import SQLDatabase
from langchain_community.agent_toolkits import create_sql_agent
from litellm import completion
from langchain_community.llms.base import LLM
from typing import Any, List, Optional
from os import getenv

class LiteLLMWrapper(LLM):
    """Wrapper to use LiteLLM with LangChain"""

    model: str = "openrouter/openai/gpt-oss-20b:free"
    api_key: str

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs) -> str:
        response = completion(
            model=self.model,
            api_key=self.api_key,
            messages=[{"role": "user", "content": prompt}],
            stop=stop
        )
        return response.choices[0].message.content

    @property
    def _llm_type(self) -> str:
        return "litellm"

# Safe database connection
db = SQLDatabase.from_uri(
    "sqlite:///example.db",  # Use read-only connection in production
    sample_rows_in_table_info=3
)

# Create agent with safety features
llm = LiteLLMWrapper(api_key=getenv("OPENROUTER_API_KEY"))
agent = create_sql_agent(
    llm=llm,
    db=db,
    agent_type="openai-tools",  # Use tool calling
    verbose=True,
    handle_parsing_errors=True,
    max_iterations=5,  # Prevent infinite loops
    max_execution_time=30  # Timeout protection
)

# Safe execution with error handling
try:
    result = agent.invoke({
        "input": "What are the top 5 customers by revenue?"
    })
except Exception as e:
    print(f"Execution failed safely: {e}")
Run this example yourself 🔧
Script:
4_langchain_toolkit.py
Command:
uv run 4_langchain_toolkit.py
Expected Output:
LangChain Toolkit Approach
==================================================
=== LangChain SQL Toolkit Demo ===
Available tools:
• execute_sql_query: Execute a read-only SQL query on the database
• get_database_schema: Get the schema of all tables in the database
==================================================
1. Getting database schema:
Database Schema:
==================================================
Table: customers
- id (INTEGER)
- name (TEXT)
- email (TEXT)
- country (TEXT)
- total_spent (REAL)
Table: orders
- id (INTEGER)
- customer_id (INTEGER)
- product (TEXT)
- amount (REAL)
- order_date (DATE)
==================================================
2. Executing safe queries:
Query: SELECT * FROM customers WHERE country = 'USA'
Columns: id, name, email, country, total_spent
--------------------------------------------------
1 | Alice Johnson | alice@example.com | USA | 1500.0
4 | Diana Prince | diana@example.com | USA | 3200.0
(Showing 2 of 2 results)
--------------------------------------------------
Query: SELECT COUNT(*) as total_orders FROM orders
Columns: total_orders
--------------------------------------------------
5
(Showing 1 of 1 results)
--------------------------------------------------
Query: SELECT c.name, SUM(o.amount) as total FROM customers c JOIN orders o ON c.id = o.customer_id GROUP BY c.id
Columns: name, total
--------------------------------------------------
Alice Johnson | 1250.0
Bob Smith | 800.0
Charlie Brown | 150.0
Diana Prince | 2500.0
(Showing 4 of 4 results)
--------------------------------------------------
=== LLM Integration Demo ===
Question: What are the top 3 customers by total spent?
Generated SQL: ```sql
SELECT name, total_spent
FROM customers
ORDER BY total_spent DESC
LIMIT 3;
```
Result:
Error executing query: near "```sql
SELECT name, total_spent
FROM customers
ORDER BY total_spent DESC
LIMIT 3;
```": syntax error
--------------------------------------------------
Question: How many orders do we have in total?
Generated SQL: ```sql
SELECT COUNT(*) AS total_orders FROM orders;
```
Result:
Error executing query: near "```sql
SELECT COUNT(*) AS total_orders FROM orders;
```": syntax error
--------------------------------------------------
Question: Show me all customers from the USA
Generated SQL: ```sql
SELECT * FROM customers WHERE country = 'USA';
```
Result:
Error executing query: near "```sql
SELECT * FROM customers WHERE country = 'USA';
```": syntax error
--------------------------------------------------
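Notice the errors above: the LLM wrapped its SQL in markdown fences, and SQLite choked on the backticks. A small sanitizer between generation and execution fixes this class of failure; here's a sketch, assuming the generated text is in a string like the ones shown:

import re

def strip_sql_fences(generated_sql: str) -> str:
    """Remove markdown code fences (``` or ```sql) around generated SQL."""
    cleaned = re.sub(r"^```(?:sql)?\s*", "", generated_sql.strip())
    cleaned = re.sub(r"\s*```$", "", cleaned)
    return cleaned.strip()

sql = strip_sql_fences("```sql\nSELECT COUNT(*) AS total_orders FROM orders;\n```")
print(sql)  # SELECT COUNT(*) AS total_orders FROM orders;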
Custom Tool Creation
from langchain.tools import Tool, StructuredTool
from pydantic import BaseModel
from typing import Literal

class CodeAnalysisInput(BaseModel):
    file_path: str
    analysis_type: Literal["security", "performance", "style"]

def analyze_code(file_path: str, analysis_type: str) -> str:
    """Analyze code with specific focus"""
    # Implementation here
    return f"Analysis of {file_path} for {analysis_type}"

# Structured tool with schema
code_analyzer = StructuredTool.from_function(
    func=analyze_code,
    name="code_analyzer",
    description="Analyze code for security, performance, or style issues",
    args_schema=CodeAnalysisInput,
    return_direct=False,  # Let agent process results
    handle_tool_error=True  # Graceful error handling
)
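Because the tool carries a Pydantic schema, arguments are checked before your function ever runs. A quick way to exercise it directly, outside an agent:

# Valid call: arguments are validated against CodeAnalysisInput first
print(code_analyzer.invoke({"file_path": "app/main.py", "analysis_type": "security"}))

# Invalid call: "compliance" is not in the Literal, so validation fails
try:
    code_analyzer.invoke({"file_path": "app/main.py", "analysis_type": "compliance"})
except Exception as e:
    print(f"Rejected by schema: {e}")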
5. All-in-One with MCP (Model Context Protocol)
We've seen how to build tools with LangChain, categorize them by risk, and implement safety patterns. But managing all these integrations across different LLM providers gets complex fast. Enter MCP — Anthropic's answer to the tool integration chaos.
Anthropic's MCP is the new kid on the block — and it's quickly becoming the standard everyone's adopting. Instead of rehashing the theory (plenty of that out there already), let's dive straight into a real implementation.
I'll use the official time server as a practical example — a real MCP server that demonstrates STDIO communication (for other servers, or to implement a custom one, see here for more details):
MCP Configuration
The beauty of MCP is its configuration-based approach. Just like Claude Desktop or VS Code, you define which servers to use:
uv add mcp-server-time # Install the MCP time server
uv add mcp # Install MCP Python SDK
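For reference, the same server in Claude Desktop's claude_desktop_config.json looks like this; the Python agent below takes an equivalent dictionary:

{
  "mcpServers": {
    "time": {
      "command": "uvx",
      "args": ["mcp-server-time"]
    }
  }
}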
Here's how to build an LLM agent using MCP - just like configuring Claude Desktop:
import asyncio
from os import getenv

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from litellm import completion

class MCPAgent:
    """LLM Agent with MCP server configuration"""

    def __init__(self, mcp_config):
        """Configure like Claude Desktop:
        mcp_config = {
            "time": {"command": "uvx", "args": ["mcp-server-time"]}
        }
        """
        self.mcp_config = mcp_config
        self.available_tools = []

    async def setup(self):
        """Discover tools from MCP servers"""
        for server_name, config in self.mcp_config.items():
            try:
                # Start MCP server and get tools
                command = [config["command"]] + config["args"]
                session, cleanup_ctx = await self.create_session(command)
                tools = await session.list_tools()
                self.available_tools.extend([
                    {"name": t.name, "description": t.description}
                    for t in tools.tools
                ])
                print(f"✅ {server_name}: {len(tools.tools)} tools available")
                await self.cleanup_session(cleanup_ctx)
            except Exception as e:
                print(f"❌ {server_name}: {e}")

    async def chat(self, message: str):
        """Chat with LLM using MCP tools"""
        response = completion(
            model="openrouter/openai/gpt-4o-mini",
            api_key=getenv("OPENROUTER_API_KEY"),
            messages=[{"role": "user", "content": message}],
            tools=self.format_tools_for_llm(),
            tool_choice="auto"
        )
        # Execute any tool calls through MCP
        if response.choices[0].message.tool_calls:
            await self.execute_tools(response.choices[0].message.tool_calls)
        return response.choices[0].message.content

    # create_session, cleanup_session, format_tools_for_llm, and execute_tools
    # are implemented in the full script (5_mcp_example.py)

# Simple usage
async def main():
    agent = MCPAgent({
        "time": {"command": "uvx", "args": ["mcp-server-time"]}
    })
    await agent.setup()
    result = await agent.chat("What time is it in Tokyo?")
    print(result)

asyncio.run(main())
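If you're wondering what create_session hides: the MCP Python SDK's canonical pattern is two nested async context managers — stdio_client gives you the read/write streams, and ClientSession speaks the protocol over them. A self-contained sketch of one tool call (the agent's helper methods wrap roughly this):

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def call_time_tool():
    params = StdioServerParameters(command="uvx", args=["mcp-server-time"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()           # MCP handshake
            tools = await session.list_tools()   # discover available tools
            print([t.name for t in tools.tools])
            result = await session.call_tool(
                "get_current_time", {"timezone": "Asia/Tokyo"}
            )
            print(result.content)

asyncio.run(call_time_tool())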
Run this example yourself 🔧
Script:
5_mcp_example.py
Command:
uv run 5_mcp_example.py
Expected Output:
🚀 MCP (Model Context Protocol) with Python SDK Demo
============================================================
=== LLM Agent with MCP (Configuration-based) ===
=== MCP Configuration (like Claude Desktop) ===
📋 MCP Server Configuration:
time: uvx mcp-server-time
🔧 Discovering tools from configured MCP servers...
🚀 Starting MCP server: uvx mcp-server-time
✅ MCP server started and initialized successfully
✅ time: 2 tools available
🔌 MCP connection closed
📦 Total tools available: 2
👤 User: What time is it in UTC?
----------------------------------------
🔧 Calling MCP tool: get_current_time
Arguments: {'timezone': 'UTC'}
🚀 Starting MCP server: uvx mcp-server-time
✅ MCP server started and initialized successfully
✅ Result: {
"timezone": "UTC",
"datetime": "2025-08-21T04:52:31+00:00",
"is_dst": false
}
🔌 MCP connection closed
🤖 Agent: Based on the tools:
📅 {
"timezone": "UTC",
"datetime": "2025-08-21T04:52:31+00:00",
"is_dst": false
}
============================================================
👤 User: What time is it in Paris?
----------------------------------------
🔧 Calling MCP tool: get_current_time
Arguments: {'timezone': 'Europe/Paris'}
🚀 Starting MCP server: uvx mcp-server-time
✅ MCP server started and initialized successfully
✅ Result: {
"timezone": "Europe/Paris",
"datetime": "2025-08-21T06:52:33+02:00",
"is_dst": true
}
🔌 MCP connection closed
🤖 Agent: Based on the tools:
📅 {
"timezone": "Europe/Paris",
"datetime": "2025-08-21T06:52:33+02:00",
"is_dst": true
}
============================================================
👤 User: Convert 3pm in New York to London time
----------------------------------------
🔧 Calling MCP tool: convert_time
Arguments: {'source_timezone': 'America/New_York', 'time': '15:00', 'target_timezone': 'Europe/London'}
🚀 Starting MCP server: uvx mcp-server-time
✅ MCP server started and initialized successfully
✅ Result: {
"source": {
"timezone": "America/New_York",
"datetime": "2025-08-21T15:00:00-04:00",
"is_dst": true
},
"target": {
"timezone": "Europe/London",
"datetime": "2025-08-21T20:00:00+01:00",
"is_dst": true
},
"time_difference": "+5.0h"
}
🔌 MCP connection closed
🤖 Agent: Based on the tools:
📅 {
"source": {
"timezone": "America/New_York",
"datetime": "2025-08-21T15:00:00-04:00",
"is_dst": true
},
"target": {
"timezone": "Europe/London",
"datetime": "2025-08-21T20:00:00+01:00",
"is_dst": true
},
"time_difference": "+5.0h"
}
============================================================
👤 User: What time is it in Tokyo?
----------------------------------------
🔧 Calling MCP tool: get_current_time
Arguments: {'timezone': 'Asia/Tokyo'}
🚀 Starting MCP server: uvx mcp-server-time
✅ MCP server started and initialized successfully
✅ Result: {
"timezone": "Asia/Tokyo",
"datetime": "2025-08-21T13:52:36+09:00",
"is_dst": false
}
🔌 MCP connection closed
🤖 Agent: Based on the tools:
📅 {
"timezone": "Asia/Tokyo",
"datetime": "2025-08-21T13:52:36+09:00",
"is_dst": false
}
============================================================
✅ Agent cleanup complete - using direct connections
Want to see MCP in action for data analysis with dbt? Check out the official server:
- dbt-mcp - Query dbt models and metrics
All available via uvx <server-name>
MCP standardizes how LLMs interact with external tools. Instead of building custom integrations for each LLM provider, you build one MCP server and it works everywhere. Think of it as the USB-C of AI tools.
The Risks Nobody Wants to Mention
We've covered the technical implementation — from function calling to MCP servers. But here's where theory meets reality, and where most teams learn expensive lessons.
Give an LLM database write access? Email sending capabilities? Code execution? Each tool is a potential footgun. Here's how to not shoot yourself:
Production Checklist
What actually works in production? Here are the non-negotiable rules:
- Least Privilege: Read-only by default, always
- Validate Everything: Never trust LLM-generated parameters
- Audit Everything: If it's not logged, it didn't happen
- Circuit Breakers: Automatic shutoff for suspicious patterns (see the sketch after this list)
- Human in the Loop: Critical operations need approval
- Sandbox Execution: Isolate tool execution environment
- Cost Controls: Set spending limits per tool
- Rollback Ready: Every action must be reversible
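Most items on this list map to patterns shown earlier; the circuit breaker deserves its own sketch, since it's the one that saves you at 3 a.m. A minimal version (thresholds are illustrative):

import time

class CircuitBreaker:
    """Trip after repeated failures, then refuse calls for a cooldown period."""

    def __init__(self, max_failures: int = 5, cooldown_seconds: int = 300):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at and time.time() - self.opened_at < self.cooldown_seconds:
            raise RuntimeError("Circuit open: tool disabled, pending investigation")
        try:
            result = func(*args, **kwargs)
            self.failures = 0       # success resets the counter
            self.opened_at = None
            return result
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # trip the breaker
            raise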
Every tool configuration needs these safety requirements:
- Rate limiting (reasonable limits, not 1000 calls per minute)
- Timeout controls
- Audit logging enabled
- Sandbox mode for execution
- Clear rollback strategy
And remember: tools that can modify data (write to databases, send emails, delete files) MUST have audit logging enabled. No exceptions.
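One way to make these requirements non-optional is to bake them into a config object that every tool must be registered with; anything missing a field simply never gets exposed to the model. A sketch (field names are illustrative):

from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSafetyConfig:
    name: str
    rate_limit_per_minute: int   # reasonable limits, not 1000 calls per minute
    timeout_seconds: int         # hard stop for runaway executions
    audit_logging: bool          # mandatory for anything that modifies data
    sandboxed: bool              # isolate execution from production systems
    rollback_plan: str           # how to undo the action if it goes wrong

    def __post_init__(self):
        if not self.audit_logging:
            raise ValueError(f"{self.name}: audit logging is not optional")

send_email = ToolSafetyConfig(
    name="send_email",
    rate_limit_per_minute=5,
    timeout_seconds=30,
    audit_logging=True,
    sandboxed=True,
    rollback_plan="Recall via provider API within 60s, otherwise send a correction email",
)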
Some all-too-plausible horror stories:
- The $72K OpenAI Bill: No rate limiting on a code generation tool. The LLM went into a loop.
- The Dropped Production Table: DROP TABLE wasn't in the blocklist. Guess what happened.
- The Email Storm: The LLM sent 10,000 emails before anyone noticed. No rate limiting.
- The Infinite Loop: A tool called itself recursively. No iteration limits.
Key Takeaways
- Start with read-only tools — You can always add write capabilities later, but you can't undo a dropped table
- Defense in depth is not optional — Input validation + rate limiting + audit logs + circuit breakers + human approval for critical ops
- MCP is becoming the standard — standardized tool interfaces mean write once, use everywhere (once the ecosystem matures)
What's Next?
Your agents can think (reasoning), access knowledge (RAG), and take actions (tools). But what about remembering what happened 5 minutes ago? Or last week? Time to dive into memory systems...
Technical deep dive series — Part 4 of 5
← Part 3: RAG Systems | Part 5: Memory Systems →
Related Articles in This Series
📚 Context Engineering Deep Dive Series:
- User Intent & Prompting: Making LLMs understand what you really want
- Agents & Reasoning: When LLMs Learn to Think Before They Speak
- RAG Systems: When Your LLM Needs to Phone a Friend
- Action Tools (You are here)
- Memory Systems: Teaching LLMs to Remember (Without Going Broke)
🎯 Start with the overview: Context Engineering: How RAG, agents, and memory make LLMs actually useful
Comments
Join the discussion! Your thoughts and feedback are valuable.
💡 Comments are powered by GitHub Discussions. You'll need a GitHub account to comment.