Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills

Use Jupyter code executor to help your agent finish tasks in a smarter way

Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills. Image by DALL-E-3

In today’s tutorial, we explore how to connect your agent app to a self-hosted Jupyter server to get a powerful, stateful code runtime sandbox.

This tutorial uses a more universal approach to re-create the core tech behind commercial products like Manus and Claude agent skills. Learning it will save you hours of trial and error and help you build an enterprise-grade agent that rivals commercial tools.

As usual, the full source code is at the end of this post; grab it if you want.


Introduction

We have shown that letting an agent generate Python code and run it inside a sandbox is more flexible, more scalable, and cheaper in token cost than fixed tool interfaces like function calling or MCP. It is a great way to boost an LLM's number-crunching skills and tackle complex problems.

In a previous post, we showcased a multi-agent code execution system with planning, generation, and reflection abilities:

I Used Autogen GraphFlow and Qwen3 Coder to Solve Math Problems — And It Worked
More reliable than your math professor

This works like Claude's code execution MCP: both use a Python runtime inside a container to run LLM-generated code.

Multi-agent system based on a Python command-line sandbox. Image by Author

But after enough use, we found that even with reasoning before execution and reflecting after execution, agents still could not reliably write code on the fly to finish tasks based on live conditions.

For example, give an agent an unfamiliar CSV file, ask it to clean and analyze the data, and find insights.

Agent systems built on a command-line Python code sandbox can't handle this reliably.

To see why, let’s look at how human data analysts do analysis.

When facing unknown data, analysts first load it into a DataFrame in a Jupyter notebook, then call head() to check column names and general types.

Data analysts usually run code step by step in a Jupyter notebook. Image by Author

With column names and types in hand, they write more code to get stats like mean and median or clean null values.
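That step-by-step loop is easy to sketch in plain pandas. Here is a minimal sketch; the CSV content below is made-up sample data:

```python
import io
import pandas as pd

# A toy CSV standing in for an unfamiliar file (made-up data).
csv_text = "name,age,score\nAda,36,91.5\nLin,,78.0\nSam,29,\n"

# Step 1: load and inspect, just like an analyst's first notebook cell.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
print(df.dtypes)

# Step 2: with column names and types now known, compute stats
# and clean null values in a follow-up cell.
print("mean score:", df["score"].mean())   # 84.75
cleaned = df.dropna()
print("rows after dropna:", len(cleaned))  # 1
```

The point is that step 2 is only possible because step 1's result (the DataFrame and what the analyst learned from it) is still in memory.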

This is where a command-line Python runtime falls short: it is stateless. The next Python command cannot reuse state from the previous one, a fundamental difference from Jupyter.

The difference between Command Line Executor and Jupyter Executor. Image by Author
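You can see the difference without any containers: two separate interpreter processes share nothing, while a single long-lived namespace (which is what a Jupyter kernel keeps for you) remembers everything. A minimal sketch:

```python
import subprocess
import sys

# Command-line executor: every run is a brand-new interpreter process.
subprocess.run([sys.executable, "-c", "x = 1 + 2"])
result = subprocess.run(
    [sys.executable, "-c", "print(x)"],
    capture_output=True, text=True,
)
print(result.returncode)  # non-zero: NameError, x no longer exists

# Jupyter-style executor: one persistent namespace across runs.
namespace = {}
exec("x = 1 + 2", namespace)         # round one
exec("print('x =', x)", namespace)   # round two still sees x
```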

Most modern agent frameworks offer only stateless command-line Python code sandboxes. Some integrate Claude's code executor or Azure's dynamic code container sessions, but those hosted runtimes cost money and come with limited resources.


What I’m Bringing You

Value of this post

The goal today is to teach you how to connect your agent system to a Jupyter Server hosted by your company or by a platform like Vast.ai. This gives you big advantages:

  1. No need for expensive commercial code sandboxes, saving huge compute costs.
  2. Your code and files run on a trusted internal runtime with strong data security and compliance.
  3. Use massive internal compute resources. When processing huge datasets with GPU parallel computing, this is a huge win.
  4. You gain the ability to deploy agent systems and code sandboxes across production in a distributed way, not just on your laptop.
  5. You still get a stateful Jupyter Server-based code sandbox so agents can decide the next code based on prior execution results.

Contents of this post

  1. Use Autogen’s Docker API version to spin up a Jupyter code sandbox so you get the basic idea of a stateful runtime.
  2. Analyze problems with this Docker API approach and what features true enterprise apps need.
  3. Adapt Autogen’s modules to connect to a self-hosted Jupyter Server.
  4. Containerize and manage Jupyter Server deployment with Docker Compose for elegance and ease.
  5. Tweak Jupyter image’s Dockerfile to reclaim idle compute resources.
  6. Try all this with a simple project.
  7. Explore how frameworks like LangChain can use the power of Jupyter code sandboxes.

It’s an exclusive, detailed tutorial. Let’s dive in.


Environment Setup

Build Jupyter Kernel container

The agent code sandbox works because container tech gives safety and environment isolation. First, prepare a Docker image with Jupyter Server.

The core of a Docker container is its Dockerfile. Here is the file to save you time:

# Dockerfile.jupyter
FROM python:3.13-slim-bookworm

WORKDIR /app

COPY requirements.txt /app/requirements.txt

RUN pip install --no-cache-dir jupyter_kernel_gateway ipykernel numpy pandas sympy scipy --upgrade

RUN pip install --no-cache-dir -r requirements.txt --upgrade

EXPOSE 8888

ENV TOKEN="UNSET"
CMD python -m jupyter kernelgateway \
    --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.auth_token="${TOKEN}" \
    --JupyterApp.answer_yes=true \
    --JupyterWebsocketPersonality.list_kernels=true

I will not explain Docker basics, check DataCamp’s great course if you need background knowledge.

I use python:3.13-slim-bookworm as the base image, not a Jupyter image, because I need custom tweaks later.

I install the must-have dependencies separately from requirements.txt to make the best use of Docker layer caching.

Here is the requirements.txt content:

matplotlib
xlrd
openpyxl

I set basic Jupyter parameters, but I will add more later to build a complete Jupyter code sandbox.

After the Dockerfile is ready, run this command to build the image:

docker build -t jupyter-server .

Don’t start the Jupyter container yet; I’ll explain why later.

Install Autogen agent framework

Many agent frameworks have moved their Jupyter runtime clients into paid products. Autogen still supports the Jupyter runtime out of the box, which is why I recommend it.

Install autogen-agentchat:

pip install -U "autogen-agentchat"

To use a containerized code executor environment, also install Autogen’s Docker client lib:

pip install "autogen-ext[docker-jupyter-executor]"

After building the image and installing Autogen, you’re ready to code.


Use Jupyter Code Sandbox

Let’s start with the official API example to see Autogen code executor usage.

The Autogen modules for the Docker Jupyter sandbox are DockerJupyterCodeExecutor, DockerJupyterServer, and CodeExecutorAgent.

DockerJupyterServer calls the Docker API to launch a container from a Docker image, mount file directories, and store the Jupyter connection info.

DockerJupyterCodeExecutor holds all Jupyter Kernel API operations. Once it has connection info from the Jupyter Server, you can submit code through it.

CodeExecutorAgent is a special Autogen agent to get Python code from context and run it. Give it a model_client and it can write code and reflect on the results itself.

The roles of different modules related to the Jupyter code sandbox. Image by Author

Now that you know each module's role, let's build a code executor agent and verify that the Docker Jupyter stateful sandbox works.

Remember the jupyter-server image we built earlier? Use it to initialize DockerJupyterServer:

from autogen_ext.code_executors.docker_jupyter import (
    DockerJupyterCodeExecutor,
    DockerJupyterServer,
)

server = DockerJupyterServer(
    custom_image_name="jupyter-server",
    expose_port=8888,
    token="UNSET",
    bind_dir="temp",
)

Then use this server to init DockerJupyterCodeExecutor:

from pathlib import Path

executor = DockerJupyterCodeExecutor(
    jupyter_server=server,
    timeout=600,
    output_dir=Path("temp"),
)

When starting both the server and the executor, we mount the local temp directory into the container. Code can read and write files there, but inside the Jupyter kernel this directory appears as the current working directory, not as temp.
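To make the path mapping concrete, here is a small sketch (the file name is hypothetical):

```python
from pathlib import Path

# Host side: the local directory we mounted when creating the
# server and executor.
host_dir = Path("temp")

# Kernel side: the mount is the working directory, so generated
# code should use relative paths like this:
code_for_kernel = 'df.to_csv("cleaned.csv")'

# Once the kernel runs it, the host finds the file here:
host_path = host_dir / "cleaned.csv"
print(host_path)  # temp/cleaned.csv
```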

Next, build a CodeExecutorAgent by passing the executor into code_executor.

from autogen_agentchat.agents import CodeExecutorAgent

code_executor = CodeExecutorAgent(
    "code_executor",
    code_executor=executor,
)

Write a main to test code_executor.

import asyncio
from textwrap import dedent

from autogen_agentchat.messages import TextMessage
from autogen_core import CancellationToken

async def main():
    async with executor:
        code1 = TextMessage(
            content=dedent("""
            ```python
            x = 1+2
            print("Round one: The calculation for the value of x is done.")
            ```
            """),
            source="user",
        )
        response1 = await code_executor.on_messages(messages=[code1], cancellation_token=CancellationToken())
        print(response1.chat_message.content)

        code2 = TextMessage(
            content=dedent("""
            ```python
            print("Round two: Get the value of variable x again: x=", x)
            ```
            """),
            source="user",
        )
        response2 = await code_executor.on_messages(messages=[code2], cancellation_token=CancellationToken())
        print(response2.chat_message.content)

asyncio.run(main())

To check statefulness, we call the agent twice: the first message defines x and does a calculation, the second prints x. In a command-line Python sandbox the second call would raise a NameError, because each run starts a fresh interpreter and can't see the first run's context.

In the Jupyter Server stateful sandbox, the kernel stays alive after the first run. The second run in the same executor context can use previous variables:

The code in the next round was able to access the variables from the previous round. Image by Author

I’ve proved before that such a sandbox gives unique advantages for complex problem-solving.

How I Crushed Advent of Code And Solved Hard Problems Using Autogen Jupyter Executor and Qwen3
A detailed guide on using AI to generate code and solve puzzles automatically and quickly

This way of starting Jupyter containers from a Docker image in code is called Docker out of Docker.

Problems with Docker out of Docker

If you only test the Jupyter sandbox locally, using DockerJupyterServer directly is fine.

The biggest issue is that the Jupyter Server starts on the same machine that runs the agent code.

If you’re doing agent research that needs a lot of computing power, or you’re getting ready to deploy your agent app to production, then this way has some problems:

For data security or computing reasons, you might use your company’s Jupyter Server with high resources. For GB-sized data, you need a tens-of-GB memory server, not your laptop.

If you deploy your agent app in a container, things get tricky: because of Docker network isolation, even if the agent inside the container manages to start a Jupyter container, it may not be able to reach the network the Jupyter Server lives on.

In production, you won't host the agent app and the Jupyter service on the same web server. You'll run the Jupyter service on a compute server and let multiple agents share it for maximum hardware utilization.

You can let multiple agents access the same Jupyter Server. Image by Author

For example, I rent a GPU server on Vast.ai and run JupyterLab on it, and I'd like my agent to connect to it directly for analysis.

Let agents connect to the Jupyter Server directly

By now it's clear: if we want the Jupyter code sandbox to use separate computing resources, the agent app must connect directly to an already running Jupyter server instead of starting its own instance.

You can search all over the internet, but it’s really hard to find a solution for this.

Here’s the key: how to let your multi-agent app connect to an already deployed Jupyter Kernel Server for lower cost and higher power compared to Azure or Claude services?
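Whatever client you end up using, the connection ultimately boils down to the Jupyter Kernel Gateway's REST and WebSocket API: create a kernel over HTTP, then send execute_request messages over a WebSocket. Here is a rough sketch of the endpoints and message shape; the host and token are placeholders, and the field list follows the Jupyter messaging protocol:

```python
import json
import uuid

# Placeholders: point these at your own Jupyter Kernel Gateway.
host, port, token = "jupyter.internal.example", 8888, "UNSET"
headers = {"Authorization": f"token {token}"}

# 1. Create a kernel: POST to this URL; the response body contains its id.
create_kernel_url = f"http://{host}:{port}/api/kernels"

# 2. Open the kernel's WebSocket channel (kernel_id comes from step 1).
channels_url = f"ws://{host}:{port}/api/kernels/{{kernel_id}}/channels"

# 3. Send an execute_request message on the shell channel.
execute_request = {
    "header": {
        "msg_id": uuid.uuid4().hex,
        "username": "agent",
        "session": uuid.uuid4().hex,
        "msg_type": "execute_request",
        "version": "5.3",
    },
    "parent_header": {},
    "metadata": {},
    "channel": "shell",
    "content": {
        "code": "x = 1 + 2",
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
    },
}
payload = json.dumps(execute_request)
```

Because the kernel created in step 1 stays alive between messages, every execute_request you send through the same WebSocket shares one namespace, which is exactly the statefulness we want.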

Next, you’ll read about:

  1. How to directly connect to a self-hosted Jupyter service to set up an enterprise-level code sandbox for agents.
  2. How to use Docker Compose to make managing the Jupyter code sandbox easier.
  3. How to change the Jupyter image settings to control and free up hardware resources.
  4. How to build a simple multi-agent app and see the awesome power of solving complex problems with a Jupyter code sandbox.
  5. Whether other agent frameworks, like LangChain, can also utilize this solution.
