Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills
Use Jupyter code executor to help your agent finish tasks in a smarter way
In today’s tutorial, we explore how to connect your agent app to a self-hosted Jupyter server to get a powerful, stateful code runtime sandbox.
This tutorial uses a more universal approach to re-create the core tech behind commercial products like Manus and Claude Agent Skills. Learning this will save you hours of trial and error and can make your enterprise-grade agent even more capable than commercial tools.
As usual, the source code sits at the end of this post, grab it if you want.
Introduction
We have shown that letting an agent generate Python code and run it inside a sandbox is more flexible, more scalable, and cheaper in token cost than fixed tool interfaces like function calling or MCP. It is the best choice for boosting an LLM's number-crunching skills and tackling complex problems.
In a previous post, we showcased a multi-agent code execution system with planning, generation, and reflection abilities:

This works like Claude's code execution MCP: both use a Python runtime inside a container to run the code generated by the LLM.

But after enough use, we found that even with reasoning before execution and reflecting after execution, agents still could not reliably write code on the fly to finish tasks based on live conditions.
For example, give an agent an unfamiliar CSV file, ask it to clean and analyze the data, and find insights.
Agent systems built on stateless command-line Python sandboxes can't handle this.
To see why, let’s look at how human data analysts do analysis.
When facing unknown data, analysts first load it into a DataFrame in a Jupyter notebook, then run head() to check column names and general types.

With column names and types in hand, they write more code to get stats like mean and median or clean null values.
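Sketched in pandas, that exploratory loop looks something like this (the toy in-memory frame here stands in for the unknown CSV):

```python
# Each step depends on what the previous step revealed -- exactly the
# kind of incremental workflow an agent needs a stateful runtime for.
import io
import pandas as pd

# A toy stand-in for an unfamiliar CSV file.
csv = io.StringIO("name,age,score\nAda,36,91.5\nBob,,78.0\nCy,29,\n")
df = pd.read_csv(csv)

print(df.head())           # step 1: peek at column names and sample rows
print(df.dtypes)           # step 2: check the inferred types
print(df["score"].mean())  # step 3: compute stats on the columns that matter
df = df.dropna()           # step 4: clean null values before deeper analysis
```

Every line after `read_csv` reuses the `df` produced earlier, which is only possible when the runtime keeps state between steps.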
This is where the command-line Python runtime falls short: it is stateless. The next Python command cannot reuse the state from the previous one. This is the fundamental difference from Jupyter.
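You can see the stateless behavior directly: each `python -c` invocation below is its own process, just like a command-line sandbox running each snippet fresh.

```python
# Two separate interpreter processes, mimicking a stateless command-line
# sandbox: state created in the first invocation is gone by the second.
import subprocess
import sys

first = subprocess.run(
    [sys.executable, "-c", "x = 1 + 2; print(x)"],
    capture_output=True, text=True,
)
print(first.stdout.strip())  # works within a single run

second = subprocess.run(
    [sys.executable, "-c", "print(x)"],
    capture_output=True, text=True,
)
# The second process has no memory of the first: x is undefined.
print("NameError" in second.stderr)
```

A Jupyter kernel, by contrast, keeps `x` alive between the two submissions.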

Most modern agent frameworks only offer stateless Python command-line code sandboxes. They might give you Claude’s code executor or Azure’s dynamic code container sessions, but these runtimes cost money and have limited resources.
What I’m Bringing You
Value of this post
The goal today is to teach you how to connect your agent system to your company's Jupyter Server, or one hosted on a platform like Vast.ai. This gives you big advantages:
- No need for expensive commercial code sandboxes, saving huge compute costs.
- Your code and files run on a trusted internal runtime with strong data security and compliance.
- Use massive internal compute resources. When processing huge datasets with GPU parallel computing, this is a huge win.
- You gain the ability to deploy agent systems and code sandboxes across production in a distributed way, not just on your laptop.
- You still get a stateful Jupyter Server-based code sandbox so agents can decide the next code based on prior execution results.
Contents of this post
- Use Autogen’s Docker API version to spin up a Jupyter code sandbox so you get the basic idea of a stateful runtime.
- Analyze problems with this Docker API approach and what features true enterprise apps need.
- Adapt Autogen’s modules to connect to a self-hosted Jupyter Server.
- Containerize and manage Jupyter Server deployment with Docker Compose for elegance and ease.
- Tweak the Jupyter image's Dockerfile to reclaim idle compute resources.
- Try all this with a simple project.
- Explore how frameworks like LangChain can use the power of Jupyter code sandboxes.
It’s an exclusive, detailed tutorial. Let’s dive in.
Environment Setup
Build Jupyter Kernel container
The agent code sandbox works because container tech gives safety and environment isolation. First, prepare a Docker image with Jupyter Server.
The core of a Docker container is its Dockerfile. Here is the file to save you time:
```dockerfile
# Dockerfile.jupyter
FROM python:3.13-slim-bookworm

WORKDIR /app

COPY requirements.txt /app/requirements.txt

RUN pip install --no-cache-dir jupyter_kernel_gateway ipykernel numpy pandas sympy scipy --upgrade
RUN pip install --no-cache-dir -r requirements.txt --upgrade

EXPOSE 8888

ENV TOKEN="UNSET"

CMD python -m jupyter kernelgateway \
    --KernelGatewayApp.ip=0.0.0.0 \
    --KernelGatewayApp.port=8888 \
    --KernelGatewayApp.auth_token="${TOKEN}" \
    --JupyterApp.answer_yes=true \
    --JupyterWebsocketPersonality.list_kernels=true
```

I will not explain Docker basics; check DataCamp's great course if you need background knowledge.
I use python:3.13-slim-bookworm as the base image, not a Jupyter image, because I need custom tweaks later.
I install must-have dependencies separately from requirements.txt to make the best use of Docker layer caching.
Here is the requirements.txt content:
```
matplotlib
xlrd
openpyxl
```

I set basic Jupyter parameters for now, but I will add more later to build a complete Jupyter code sandbox.
After the Dockerfile is ready, run this command to build the image:
```shell
docker build -t jupyter-server .
```

Don't start the Jupyter container yet; I'll explain why later.
Expand your learning of Docker basics
If you’re new to Docker, DataCamp’s Introduction to Docker course is a quick, practical way to get started.
You’ll learn to write Dockerfiles, build images, and set up a secure container instance — perfect for the work in this article.
Install Autogen agent framework
Most agent frameworks have moved their Jupyter runtime client into paid products. Among those that still support a Jupyter runtime, Autogen is the one I recommend.
Install autogen-agentchat:
```shell
pip install -U "autogen-agentchat"
```

To use a containerized code executor environment, also install Autogen's Docker client lib:

```shell
pip install "autogen-ext[docker-jupyter-executor]"
```

After building the image and installing Autogen, you're ready to code.
Use Jupyter Code Sandbox
Use the recommended Docker API way
Let’s start with the official API example to see Autogen code executor usage.
Autogen modules for Jupyter Docker are: DockerJupyterCodeExecutor, DockerJupyterServer, CodeExecutorAgent.
DockerJupyterServer calls Docker API to launch a container from a Docker image, mount file dirs, and save Jupyter connection info.
DockerJupyterCodeExecutor holds all Jupyter Kernel API operations. Once it has connection info from the Jupyter Server, you can submit code through it.
CodeExecutorAgent is a special Autogen agent to get Python code from context and run it. Give it a model_client and it can write code and reflect on the results itself.

After you learn each module’s role, build a code executor agent to test if Docker Jupyter stateful sandbox works.
Remember the jupyter-server image we built? Use it to init DockerJupyterServer:

```python
server = DockerJupyterServer(
    custom_image_name="jupyter-server",
    expose_port=8888,
    token="UNSET",
    bind_dir="temp",
)
```

Then use this server to init DockerJupyterCodeExecutor:
```python
executor = DockerJupyterCodeExecutor(
    jupyter_server=server,
    timeout=600,
    output_dir=Path("temp"),
)
```

When starting both the server and the executor, we mount the local temp directory into the container. Code can read and write files there, but inside the Jupyter Kernel it appears as the current working directory, not temp.
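As a quick sanity check of the bind mount, code like the following, when executed inside the sandboxed kernel, writes to the kernel's working directory, and the same file then appears under temp/ on the host. (The filename here is just an example.)

```python
# Run inside the sandboxed kernel, this writes to the kernel's working
# directory. Because the local temp directory is bind-mounted into the
# container, the file also shows up as temp/report.txt on the host.
from pathlib import Path

Path("report.txt").write_text("analysis complete\n")
print(Path("report.txt").read_text().strip())
```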
Next, build a CodeExecutorAgent by passing the executor into code_executor.
```python
code_executor = CodeExecutorAgent(
    "code_executor",
    code_executor=executor,
)
```

Write a main function to test code_executor:
````python
async def main():
    async with executor:
        code1 = TextMessage(
            content=dedent("""
                ```python
                x = 1 + 2
                print("Round one: The calculation for the value of x is done.")
                ```
            """),
            source="user",
        )
        response1 = await code_executor.on_messages(
            messages=[code1], cancellation_token=CancellationToken()
        )
        print(response1.chat_message.content)

        code2 = TextMessage(
            content=dedent("""
                ```python
                print("Round two: Get the value of variable x again: x=", x)
                ```
            """),
            source="user",
        )
        response2 = await code_executor.on_messages(
            messages=[code2], cancellation_token=CancellationToken()
        )
        print(response2.chat_message.content)

asyncio.run(main())
````

To check statefulness, we call the executor twice: the first message defines x and runs a calculation, the second prints x. In a Python command-line sandbox the second call would error, because it cannot see the first run's context.
In the Jupyter Server stateful sandbox, the kernel stays alive after the first run. The second run in the same executor context can use previous variables:

I’ve proved before that such a sandbox gives unique advantages for complex problem-solving.

This way of starting Jupyter containers from a Docker image in code is called Docker out of Docker.
Problems with Docker out of Docker
If you only test the Jupyter sandbox locally, using DockerJupyterServer directly is fine. The biggest issue: the Jupyter Server starts on the same machine where the agent code runs.
If you're doing agent research that needs a lot of computing power, or you're getting ready to deploy your agent app to production, this approach has some problems:
- For data security or compute reasons, you might want to use your company's high-resource Jupyter Server. For GB-sized data you need a server with tens of GB of memory, not your laptop.
- If you deploy your agent app itself in a container, things get tricky: because of network isolation, even if the agent inside the container starts the Jupyter container successfully, it may not be able to reach the network the Jupyter Server is on.
- In production you won't host both the agent and the Jupyter service on one web server. You'll host a Jupyter service on a compute server and let multiple agents share it for maximum hardware utilization.

For example, I rent a GPU server on Vast.ai and run JupyterLab on it. I would like my agent to connect directly to it for analysis.
Let agents connect to the Jupyter Server directly
So by now we understand: if we want the Jupyter code sandbox to use separate computing resources, the agent app must connect directly to an already-deployed Jupyter server instead of starting its own instance.
You can search all over the internet, but it’s really hard to find a solution for this.
Here's the key question: how do you let your multi-agent app connect to an already-deployed Jupyter Kernel Server, at lower cost and with more power than Azure or Claude services?
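Before wiring any framework in, it helps to confirm that the gateway is reachable at all. The Kernel Gateway exposes Jupyter's REST API (the `list_kernels=true` flag in our Dockerfile enables kernel listing), so a stdlib-only connectivity check might look like this. The host, port, token, and helper names here are placeholders for illustration, not part of any framework.

```python
# Minimal connectivity check against a self-hosted Jupyter Kernel Gateway.
# Host/port/token below are placeholders for your own deployment.
import json
import urllib.request

def gateway_url(host: str, port: int, path: str, use_https: bool = False) -> str:
    """Build a Kernel Gateway endpoint URL such as /api/kernels."""
    scheme = "https" if use_https else "http"
    return f"{scheme}://{host}:{port}{path}"

def list_kernels(host: str, port: int, token: str, use_https: bool = False) -> list:
    """List running kernels -- a quick reachability test before attaching an executor."""
    req = urllib.request.Request(
        gateway_url(host, port, "/api/kernels", use_https),
        headers={"Authorization": f"token {token}"},  # standard Jupyter token auth header
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires a live gateway; hostname is hypothetical):
# print(list_kernels("jupyter.internal.example.com", 8888, "UNSET"))
```

If this returns a JSON list (possibly empty) instead of a 401 or a timeout, your agent host can see the gateway, and the remaining work is pointing the executor at the same host, port, and token.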
Next, you’ll read about:
- How to directly connect to a self-hosted Jupyter service to set up an enterprise-level code sandbox for agents.
- How to use Docker Compose to make managing the Jupyter code sandbox easier.
- How to change the Jupyter image settings to control and free up hardware resources.
- How to build a simple multi-agent app and see the awesome power of solving complex problems with a Jupyter code sandbox.
- Whether other agent frameworks, like LangChain, can also utilize this solution.