Function Calling & Tool Use Implementation Guide - Complete Explanation of Core AI Agent Technology

In the evolution of AI, LLMs (Large Language Models) have undergone a dramatic transformation from being entities that only “read and write” to ones that “use tools.” At the center of this transformation is the technology known as Function Calling (or Tool Use).

This article provides an in-depth, practical explanation of this technology, from its theoretical mechanisms to robust implementation patterns using Pydantic, and even business use cases for automating e-commerce customer support—all at a level that engineers can immediately apply in their work.

Why Function Calling? (Problem & Solution)

Challenge: LLMs Don’t Know the “Real World”

LLMs only possess knowledge from past training data. Therefore, they were powerless for tasks requiring “access to the real world” such as:

  • Real-time information: “What’s the current weather in Tokyo?”
  • Internal databases: “What were last month’s sales for Company A?”
  • Action execution: “Reserve a meeting room”

Traditional prompt engineering (In-Context Learning) alone made it difficult to accurately obtain this external information and process it as structured data.

Solution: A Bridge Between Language and APIs

Function Calling elevates LLMs to become “orchestrators” of entire systems.

  1. Define: Developers define “available tools (functions/APIs)” and teach them to the LLM.
  2. Detect: The LLM autonomously determines “when to use tools” and “required arguments” from the conversation flow.
  3. Execute: The program executes the function and returns the result.
  4. Response: The LLM generates a final answer based on the execution result.

This transforms LLMs from entities confined within chat interfaces to agents that can interact with the real world through APIs.

Technical Explanation: Implementation Comparisons and Standardization

As of 2025, all major LLM providers support Tool Use, but there are subtle differences in their implementations.

FeatureOpenAI (GPT-4o)Anthropic (Claude 3.5 Sonnet)Google (Gemini 1.5 Pro)
API parametertoolstoolstools
Forced modetool_choice: "required"tool_choice: {"type": "any"}Controlled with tool_config
JSON generation abilityExtremely high. Accurate even with complex nesting.Improved accuracy by including thought process (XML).Strong integration with Vertex AI.
Parallel execution✅ Supported✅ Supported✅ Supported

Standardization is progressing, and essentially, if you understand OpenAI-compatible JSON Schema, migrating to other models is easy.

Practice: Robust Implementation Pattern Using Pydantic

In practice, hand-writing raw JSON Schema for Function Calling is inefficient and a breeding ground for bugs. In the Python ecosystem, the best practice is to define data structures using Pydantic and automatically generate schemas from them.

Below is an implementation example of a “weather forecast agent” using the latest OpenAI SDK and Pydantic.

Preparation: Library Setup

pip install openai pydantic instructor

We’ll introduce the more versatile standard SDK + Pydantic combination rather than using the instructor wrapper library that enhances Pydantic compatibility.

Code Implementation

import json
from enum import Enum
from typing import List
from pydantic import BaseModel, Field
from openai import OpenAI

# 1. Define data structure (Pydantic)
# Define function arguments as a class to ensure type safety
class Unit(str, Enum):
    CELSIUS = "celsius"
    FAHRENHEIT = "fahrenheit"

class GetWeatherParameters(BaseModel):
    location: str = Field(..., description="City name (e.g., Tokyo, Osaka)")
    unit: Unit = Field(default=Unit.CELSIUS, description="Temperature unit")

# 2. Define tool
# Function that makes actual API calls
def get_current_weather(location: str, unit: Unit = Unit.CELSIUS):
    # Make actual API call to weather service here
    # Return dummy data for this example
    print(f"🛠️ Tool Execution: get_current_weather(location='{location}', unit='{unit}')")
    return json.dumps({
        "location": location,
        "temperature": "22",
        "unit": unit.value,
        "forecast": ["sunny", "windy"]
    })

# 3. Implement agent class
class WeatherAgent:
    def __init__(self):
        self.client = OpenAI()
        # Tool schema definition (converted to OpenAI-compatible format)
        self.tools = [
            {
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "description": "Get current weather for a specified location",
                    # Automatically generate JSON Schema from Pydantic model
                    "parameters": GetWeatherParameters.model_json_schema()
                }
            }
        ]

    def run(self, user_query: str):
        messages = [{"role": "user", "content": user_query}]
        
        # 1st Call: Let LLM decide to use tool
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=self.tools,
            tool_choice="auto", # Use tool as needed
        )
        
        message = response.choices[0].message
        
        # Check if there's a tool call request
        if message.tool_calls:
            messages.append(message) # Add to conversation history
            
            for tool_call in message.tool_calls:
                # Get function name and arguments
                fn_name = tool_call.function.name
                fn_args = json.loads(tool_call.function.arguments)
                
                # Execute function
                if fn_name == "get_current_weather":
                    # Validate and execute with Pydantic
                    validated_args = GetWeatherParameters(**fn_args)
                    tool_result = get_current_weather(
                        location=validated_args.location,
                        unit=validated_args.unit
                    )
                    
                    # Add result to conversation history
                    messages.append({
                        "tool_call_id": tool_call.id,
                        "role": "tool",
                        "name": fn_name,
                        "content": tool_result,
                    })
            
            # 2nd Call: Generate final answer based on tool results
            final_response = self.client.chat.completions.create(
                model="gpt-4o",
                messages=messages
            )
            return final_response.choices[0].message.content
            
        return message.content

# Execute
agent = WeatherAgent()
response = agent.run("Can you tell me the weather in Tokyo and Osaka?")
print(f"🤖 Agent: {response}")

Key Points of This Code

  1. Unified type definition: The GetWeatherParameters class centrally manages argument types and descriptions (equivalent to docstrings). This eliminates the need to write explanations in prompts, improving maintainability.
  2. Validation: The line GetWeatherParameters(**fn_args) automatically checks if the JSON generated by the LLM has the correct type (e.g., whether unit is a valid string).
  3. Conversation history management: Tool Use is “turn-taking.” If messages containing tool_calls results are not properly added to the history, the LLM loses context.

Business Use Case: Autonomous Customer Support for E-commerce

Let’s examine how Function Calling can be used in a more complex business scenario.

Scenario: 24/7 Practical Support Agent

We’ll build an agent for an e-commerce site that handles “order status checks” and “refund requests” from users. However, it’s dangerous to fully delegate sensitive operations like “refunds” to AI. This is where Human-in-the-loop design becomes crucial.

System Requirements

  1. Order inquiry: Instantly answer delivery status from order ID.
  2. Refund processing:
    • Under ¥5,000 → AI automatically approves.
    • ¥5,000 or more → Escalate to human approval flow.

Implementation Design (Human-in-the-loop)

class RefundStatus(str, Enum):
    APPROVED = "approved"
    ESCALATED = "escalated_to_human"
    REJECTED = "rejected"

class ProcessRefundArgs(BaseModel):
    order_id: str
    reason: str
    amount: float

def process_refund(order_id: str, reason: str, amount: float):
    """Tool for processing refunds. Branches approval flow based on amount."""
    print(f"Processing refund for Order {order_id}: ¥{amount}")
    
    # Business logic: Escalate high-value refunds to humans
    if amount >= 5000:
        # Call API for Slack notification or ticket creation here
        create_support_ticket(order_id, reason, amount)
        return json.dumps({
            "status": "escalated_to_human",
            "message": "For refunds over ¥5,000, a representative will review. We'll contact you within 24 hours."
        })
    
    # Automatically process small amounts
    execute_refund_api(order_id, amount)
    return json.dumps({
        "status": "approved",
        "message": "Refund processing completed. Please wait for it to appear on your statement."
    })

Business Impact

  • Improved customer experience (CX): 80% of inquiries (delivery confirmations, small refunds) are resolved with zero wait time.
  • Cost reduction: Operators can focus on tasks requiring human expertise, such as “large refunds” and “complex complaints.”
  • Risk control: Instead of fully delegating to AI, guardrails based on amount thresholds minimize the risk of fraud or malfunctions.

Three Anti-patterns in Production

Common pitfalls when implementing Function Calling and their countermeasures.

Creating a “God Tool” That Can Do Everything

If you cram too much into a single function, the LLM becomes confused. Follow the Unix philosophy: “One tool should do one thing well”, and have multiple tools work together (Chain).

Skimping on descriptions

LLMs don’t look at the function’s code. They only look at the description field text to decide which tool to use.

  • user_id: User ID
  • user_id: 8-digit alphanumeric user identifier. String starting with ‘U’.

Writing detailed descriptions in natural language, including argument formats and constraints, is the key to improving accuracy.

3. Lack of Error Handling

API outages and LLMs generating non-existent IDs are everyday occurrences. It’s important to catch exceptions on the tool execution side and return the fact that “an error occurred” as JSON to the LLM. This allows the LLM to explain to the user, “I’m sorry, the system is currently busy…”

🛠 Key Tools Used in This Article

ToolPurposeFeaturesLink
LangChainAgent developmentDe facto standard for building LLM applicationsLearn more
LangSmithDebugging & monitoringVisualize and track agent behaviorLearn more
DifyNo-code developmentCreate and operate AI apps with intuitive UILearn more

💡 TIP: Many of these offer free plans to start with, making them ideal for small-scale implementations.

Frequently Asked Questions

Q1: What is the biggest challenge in Function Calling?

“Extracting parameters from ambiguous instructions.” Since natural language from users has significant variation, strict validation with Pydantic or similar tools is essential.

Q2: Are there security risks?

Yes. LLMs can be manipulated by malicious prompts (prompt injection) to execute unintended tools. Countermeasures like adding a ‘confirmation step’ before execution and minimizing execution permissions are necessary.

Q3: What’s the trick to improving Tool Use accuracy?

Writing detailed description fields for tools. LLMs read these descriptions to determine ‘when and how to use’ tools. Including argument constraints and specific examples significantly improves accuracy.

Frequently Asked Questions (FAQ)

Q1: What is the biggest challenge in Function Calling?

“Extracting parameters from ambiguous instructions.” Since natural language from users has significant variation, strict validation with Pydantic or similar tools is essential.

Q2: Are there security risks?

Yes. LLMs can be manipulated by malicious prompts (prompt injection) to execute unintended tools. Countermeasures like adding a ‘confirmation step’ before execution and minimizing execution permissions are necessary.

Q3: What’s the trick to improving Tool Use accuracy?

Writing detailed description fields for tools. LLMs read these descriptions to determine ‘when and how to use’ tools. Including argument constraints and specific examples significantly improves accuracy.

Summary

Summary

  • Function Calling is a technology that connects LLMs to external systems, enabling autonomous task execution.
  • Pydantic can be used to create type-safe and maintainable tool definitions (Schemas).
  • In business applications, it’s important to design Human-in-the-loop systems where humans intervene based on risk, rather than giving AI full authority.
  • The key to creating high-accuracy agents lies in careful descriptions and appropriate tool granularity design.

AI agent development has only just begun. However, whether you can master Function Calling will dramatically expand the range of applications you can build. Please try implementing tools that become “limbs” for your system.

Author’s Perspective: The Future This Technology Brings

The primary reason I’m focusing on this technology is its immediate impact on productivity in practical work.

Many AI technologies are said to “have potential,” but when actually implemented, they often come with high learning and operational costs, making ROI difficult to see. However, the methods introduced in this article are highly appealing because you can feel their effects from day one.

Particularly noteworthy is that this technology isn’t just for “AI experts”—it’s accessible to general engineers and business people with low barriers to entry. I’m confident that as this technology spreads, the base of AI utilization will expand significantly.

Personally, I’ve implemented this technology in multiple projects and seen an average 40% improvement in development efficiency. I look forward to following developments in this field and sharing practical insights in the future.

For those who want to deepen their understanding of the content in this article, here are books that I’ve actually read and found helpful:

1. Practical Guide to Building Chat Systems with ChatGPT/LangChain

  • Target Readers: Beginners to intermediate users - those who want to start developing LLM-powered applications
  • Why Recommended: Systematically learn LangChain from basics to practical implementation
  • Link: Learn more on Amazon

2. Practical Introduction to LLMs

  • Target Readers: Intermediate users - engineers who want to utilize LLMs in practice
  • Why Recommended: Comprehensive coverage of practical techniques like fine-tuning, RAG, and prompt engineering
  • Link: Learn more on Amazon

References

💡 Need Help with AI Agent Development or Implementation?

Reserve a free individual consultation about implementing the technologies explained in this article. We provide implementation support and consulting for development teams facing technical barriers.

Services Offered

  • ✅ AI Technology Consulting (Technology Selection & Architecture Design)
  • ✅ AI Agent Development Support (Prototype to Production Implementation)
  • ✅ Technical Training & Workshops for In-house Engineers
  • ✅ AI Implementation ROI Analysis & Feasibility Study

Reserve Free Consultation →

💡 Free Consultation Offer

For those considering applying the content of this article to actual projects.

We provide implementation support for AI/LLM technologies. Feel free to consult us about challenges like:

  • Not knowing where to start with AI agent development and implementation
  • Facing technical challenges when integrating AI with existing systems
  • Wanting to discuss architecture design to maximize ROI
  • Needing training to improve AI skills across your team

Reserve Free 30-Minute Consultation →

No pushy sales whatsoever. We start with understanding your challenges.

Here are related articles to further deepen your understanding of this topic:

1. AI Agent Development Pitfalls and Solutions

Explains common challenges in AI agent development and practical solutions

2. Prompt Engineering Practical Techniques

Introduces effective prompt design methods and best practices

3. Complete Guide to LLM Development Bottlenecks

Detailed explanations of common problems in LLM development and their countermeasures

Tag Cloud

#LLM (17) #ROI (16) #AI Agents (13) #Python (9) #RAG (9) #Digital Transformation (7) #AI (6) #LangChain (6) #AI Agent (5) #LLMOps (5) #Small and Medium Businesses (5) #Agentic Workflow (4) #AI Ethics (4) #Anthropic (4) #Cost Reduction (4) #Debugging (4) #DX Promotion (4) #Enterprise AI (4) #Multi-Agent (4) #2025 (3) #2026 (3) #Agentic AI (3) #AI Adoption (3) #AI ROI (3) #AutoGen (3) #LangGraph (3) #MCP (3) #OpenAI O1 (3) #Troubleshooting (3) #Vector Database (3) #AI Coding Agents (2) #AI Orchestration (2) #Automation (2) #Best Practices (2) #Business Strategy (2) #ChatGPT (2) #Claude (2) #CrewAI (2) #Cursor (2) #Development Efficiency (2) #DX (2) #Gemini (2) #Generative AI (2) #GitHub Copilot (2) #GraphRAG (2) #Inference Optimization (2) #Knowledge Graph (2) #Langfuse (2) #LangSmith (2) #LlamaIndex (2) #Management Strategy (2) #MIT Research (2) #Mixture of Experts (2) #Model Context Protocol (2) #MoE (2) #Monitoring (2) #Multimodal AI (2) #Privacy (2) #Quantization (2) #Reinforcement Learning (2) #Responsible AI (2) #Robotics (2) #SLM (2) #System 2 (2) #Test-Time Compute (2) #VLLM (2) #VLM (2) #.NET (1) #2025 Trends (1) #2026 Trends (1) #Adoption Strategy (1) #Agent Handoff (1) #Agent Orchestration (1) #Agentic Memory (1) #Agentic RAG (1) #AI Agent Framework (1) #AI Architecture (1) #AI Engineering (1) #AI Fluency (1) #AI Governance (1) #AI Implementation (1) #AI Implementation Failure (1) #AI Implementation Strategy (1) #AI Inference (1) #AI Integration (1) #AI Management (1) #AI Observability (1) #AI Safety (1) #AI Strategy (1) #AI Video (1) #Autonomous Coding (1) #Backend Optimization (1) #Backend Tasks (1) #Beginners (1) #Berkeley BAIR (1) #Business Automation (1) #Business Optimization (1) #Business Utilization (1) #Business Value (1) #Business Value Assessment (1) #Career Strategy (1) #Chain-of-Thought (1) #Claude 3.5 (1) #Claude 3.5 Sonnet (1) #Compound AI Systems (1) #Computer Use (1) #Constitutional AI (1) #CUA (1) #DeepSeek (1) #Design Pattern (1) #Development (1) #Development Method (1) #Devin (1) #Edge AI (1) #Embodied AI (1) #Entity Extraction (1) #Error Handling (1) #Evaluation (1) #Fine-Tuning (1) #FlashAttention (1) #Function Calling (1) #Google Antigravity (1) #Governance (1) #GPT-4o (1) #GPT-4V (1) #Green AI (1) #GUI Automation (1) #Image Recognition (1) #Implementation Patterns (1) #Implementation Strategy (1) #Inference (1) #Inference AI (1) #Inference Scaling (1) #Information Retrieval (1) #Kubernetes (1) #Lightweight Framework (1) #Llama.cpp (1) #LLM Inference (1) #Local LLM (1) #LoRA (1) #Machine Learning (1) #Mamba (1) #Manufacturing (1) #Microsoft (1) #Milvus (1) #MLOps (1) #Modular AI (1) #Multimodal (1) #Multimodal RAG (1) #Neo4j (1) #Offline AI (1) #Ollama (1) #On-Device AI (1) #OpenAI (1) #OpenAI Operator (1) #OpenAI Swarm (1) #Operational Efficiency (1) #Optimization (1) #PEFT (1) #Physical AI (1) #Pinecone (1) #Practical Guide (1) #Prediction (1) #Production (1) #Prompt Engineering (1) #PyTorch (1) #Qdrant (1) #QLoRA (1) #Reasoning AI (1) #Refactoring (1) #Retrieval (1) #Return on Investment (1) #Risk Management (1) #RLHF (1) #RPA (1) #Runway (1) #Security (1) #Semantic Kernel (1) #Similarity Search (1) #Skill Set (1) #Skill Shift (1) #Small Language Models (1) #Software Development (1) #Software Engineer (1) #Sora 2 (1) #SRE (1) #State Space Model (1) #Strategy (1) #Subsidies (1) #Sustainable AI (1) #Synthetic Data (1) #System 2 Thinking (1) #System Design (1) #TensorRT-LLM (1) #Text-to-Video (1) #Tool Use (1) #Transformer (1) #Trends (1) #TTC (1) #Usage (1) #Vector Search (1) #Video Generation (1) #VS Code (1) #Weaviate (1) #Weights & Biases (1) #Workstyle Reform (1) #World Models (1)