Agents

AgentConfig is a Pydantic model that holds all runtime settings. Every field has a sensible default; you only need to set what you care about.

from cyclops import AgentConfig

config = AgentConfig(
    model="groq/llama-3.1-8b-instant",
    temperature=0.3,
    max_tokens=1024,
    system_prompt="You are a concise technical assistant.",
    tool_mode="auto",
    max_iterations=10,
)
| Field | Type | Default | Description |
|---|---|---|---|
| model | str | required | LiteLLM model string, e.g. "gpt-4o-mini", "groq/llama-3.1-8b-instant", "ollama/qwen3:4b". |
| temperature | float | 0.1 | Sampling temperature. Lower is more deterministic. |
| max_tokens | int or None | None | Maximum tokens in the response. None uses the model default. |
| system_prompt | str or None | None | System instruction prepended to every conversation turn. |
| tool_mode | str | "auto" | "auto" detects native function-calling support; "native" forces it; "naive" uses prompt-based tool parsing (works with any model). |
| router | Router or None | None | Optional LiteLLM Router for fallback and load balancing. |
| max_iterations | int | 10 | Maximum number of tool-call rounds before the agent stops. |
| hooks | AgentHooks or None | None | Lifecycle callbacks for observability and tool approval. See Hooks. |
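The "naive" tool mode relies on the model emitting a tool call as structured text that the agent then parses. Cyclops's actual parser is not shown here; the following is only a minimal sketch of the idea, assuming the model is prompted to emit a JSON object with "tool" and "arguments" keys:

```python
import json
import re

def parse_naive_tool_call(text: str):
    """Extract a JSON tool call of the form
    {"tool": ..., "arguments": {...}} from free-form model output.
    Returns (tool_name, arguments) or None if no call is present.
    Illustrative only -- not cyclops's real parser."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if not match:
        return None
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if "tool" in payload and "arguments" in payload:
        return payload["tool"], payload["arguments"]
    return None

reply = 'Sure, calling the tool: {"tool": "greet", "arguments": {"name": "Alice"}}'
print(parse_naive_tool_call(reply))  # → ('greet', {'name': 'Alice'})
```

Because this approach only needs text in and text out, it works with models that have no native function-calling support, at the cost of being more fragile than native tool calls.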

To construct an Agent, pass a config, an optional list of tools, and optional memory storage.

from cyclops import Agent, AgentConfig, InMemoryStorage
from cyclops.toolkit import tool

@tool
def greet(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"

memory = InMemoryStorage()
config = AgentConfig(model="groq/llama-3.1-8b-instant")
agent = Agent(config, tools=[greet], memory=memory)
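A decorator like @tool plausibly inspects the function's signature and docstring to build a schema the model can see; the exact mechanism in cyclops is not documented here, so the following is only a rough sketch of that idea using the standard library:

```python
import inspect

def sketch_tool_schema(fn):
    """Illustrative only: derive a minimal schema-like description from a
    function's type hints and docstring, as a @tool decorator plausibly
    does under the hood."""
    type_names = {str: "string", int: "integer", float: "number", bool: "boolean"}
    params = {}
    for name, param in inspect.signature(fn).parameters.items():
        # Fall back to "string" for unannotated or unmapped parameters.
        params[name] = {"type": type_names.get(param.annotation, "string")}
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": params,
    }

def greet(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"

print(sketch_tool_schema(greet))
```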

run() is the simplest entry point. It sends a message, handles any tool calls, and returns the final text response as a string.

response = agent.run("Who invented the telephone?")
print(response)

arun() is the async equivalent. Use it inside coroutines or when running multiple agents concurrently.

import asyncio

async def main():
    response = await agent.arun("Who invented the telephone?")
    print(response)

asyncio.run(main())
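Running multiple agents concurrently typically means driving independent Agent instances with asyncio.gather (a single agent's history would interleave otherwise). The sketch below uses a stand-in coroutine in place of real arun() calls so the pattern is visible without an LLM:

```python
import asyncio

async def fake_arun(agent_name: str, prompt: str) -> str:
    # Stand-in for a real agent.arun() call, which would await the LLM here.
    await asyncio.sleep(0.01)
    return f"{agent_name}: {prompt}"

async def main():
    # Each entry could be a separate Agent's arun() call.
    return await asyncio.gather(
        fake_arun("agent_a", "Who invented the telephone?"),
        fake_arun("agent_b", "Who invented the telegraph?"),
    )

results = asyncio.run(main())
print(results)
```

asyncio.gather preserves the order of its arguments, so results line up with the agents that produced them.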

Both methods accept an optional response_model parameter for structured output. See Structured Output for details.

The agent accumulates every user and assistant message in its internal history. Each call appends a user message and an assistant reply, so follow-up questions automatically have full context.

agent.run("My name is Alice.")
agent.run("What is my name?") # correctly answers "Alice"

Inspect the current history at any time with the messages property. It returns a shallow copy, so the internal state is not accidentally modified.

agent.run("My name is Alice.")
agent.run("What is my name?")
for msg in agent.messages:
    print(msg["role"], ":", msg["content"][:60])

Each message in messages is a plain dict with at least "role" and "content" keys. Tool messages also include "tool_call_id" and "name".
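The shallow-copy behavior can be pictured with plain lists of dicts: mutating the copy leaves the internal list untouched, while the message dicts themselves are shared, as in any shallow copy.

```python
# Internal history, as the agent might hold it.
internal = [
    {"role": "user", "content": "My name is Alice."},
    {"role": "assistant", "content": "Nice to meet you, Alice!"},
]

# What a `messages` property returning a shallow copy amounts to:
snapshot = list(internal)

snapshot.append({"role": "user", "content": "injected"})
print(len(internal))   # still 2: the internal list is unchanged
print(len(snapshot))   # 3
```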

Call reset() to clear the conversation history and start fresh without creating a new agent.

agent.run("Remember: the code word is banana.")
agent.reset()
# History is now empty.
response = agent.run("What was the code word?")

run_with_response() and arun_with_response()

When you need token counts or cost estimates alongside the text reply, use run_with_response(). It returns an AgentResponse object instead of a plain string.

result = agent.run_with_response("Summarize quantum computing in two sentences.")
print(result.content)
print(f"Tokens: {result.tokens_used} Cost: ${result.cost:.6f}")
print(f"Tool calls made: {len(result.tool_calls)}")

See Cost Tracking for the full field reference.

stream() yields text tokens as they arrive. For agents without tools this is true token-by-token streaming from the LLM. For agents with tools, the tool loop runs first, then the final answer streams.

for token in agent.stream("Explain black holes briefly."):
    print(token, end="", flush=True)
print()

See Streaming for detailed examples.
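Conceptually, stream() is just a generator over text chunks. This toy stand-in (not cyclops's implementation) shows the consumption pattern without a live model:

```python
def fake_stream(text: str):
    """Yield a canned response one whitespace-delimited token at a time,
    mimicking the shape of agent.stream()."""
    for i, token in enumerate(text.split()):
        yield token if i == 0 else " " + token

chunks = []
for token in fake_stream("Black holes are regions of extreme gravity."):
    chunks.append(token)
print("".join(chunks))  # → Black holes are regions of extreme gravity.
```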

When tools are in use, the agent can call tools multiple times before producing the final answer. max_iterations caps how many tool-call rounds are allowed. If the cap is reached, the agent returns the string "Reached maximum tool call iterations.".

config = AgentConfig(
    model="groq/llama-3.1-8b-instant",
    max_iterations=5,  # allow up to 5 rounds of tool calls
)

The default of 10 is enough for almost every task. Lowering it prevents runaway loops in production.
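The cap can be pictured as a bounded loop around the tool round-trip. The sentinel string below matches the one documented above; everything else is a simplified sketch, not cyclops's actual loop:

```python
def run_tool_loop(wants_tool_call, max_iterations: int = 10) -> str:
    """wants_tool_call(round_index) returns a tool name to call, or None
    when the model is ready to answer. Illustrative only."""
    for round_index in range(max_iterations):
        tool = wants_tool_call(round_index)
        if tool is None:
            return f"final answer after {round_index} tool round(s)"
        # ... execute the tool and feed its result back to the model ...
    return "Reached maximum tool call iterations."

# A model that never stops calling tools hits the cap:
print(run_tool_loop(lambda i: "search", max_iterations=5))
# One that answers after two rounds finishes normally:
print(run_tool_loop(lambda i: "search" if i < 2 else None))
```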

examples/basic_agent.py
"""Basic agent usage example"""
from cyclops import Agent, AgentConfig
# Create agent with default settings
config = AgentConfig(
model="ollama/qwen3:4b", system_prompt="You are a helpful assistant."
)
agent = Agent(config)
# Run the agent
response = agent.run("What is the capital of France?")
print(response)
# Continue conversation (agent maintains history)
response = agent.run("What's the population?")
print(response)