Building a Multi-Cloud Management Assistant with Claude Code

Your infrastructure lives everywhere. Some workloads run on AWS because that's where you started. You're experimenting with Azure because the ML services are compelling. GCP handles your data pipeline because BigQuery is phenomenal. And now you have a problem: three cloud providers, three different CLI tools, three different mental models, and nobody on your team wants to learn all three deeply.
This is the multi-cloud reality for mid-to-large organizations. It's not a phase. It's a business decision that makes sense locally—use the best tool for each job—but it creates operational complexity globally.
What if instead of learning three CLIs, your team used one unified interface? A Claude Code assistant that understood AWS, Azure, and GCP natively. You could ask "How much is my total cloud spending?" and get a unified answer across all three providers. You could ask "Deploy this container to whichever cloud gives me the best latency" and the assistant figures out where to deploy it. You could audit security across your entire multi-cloud estate from one place.
That's what we're building: a Claude Code Agent SDK-based multi-cloud management assistant that abstracts away cloud-specific details and gives you a single, intelligent interface to your entire infrastructure footprint.
Table of Contents
- Why the Agent SDK Matters for Multi-Cloud
- Architecture: The Three-Layer Model
- Designing the Connector Architecture
- Building the Cloud Connectors
- The Case for Multi-Cloud Management
- Understanding the Agent SDK Model
- Building the Orchestration Layer
- Designing Tool Interfaces for the Agent
- Integrating with Claude Code Agent SDK
- Testing Your Multi-Cloud Assistant
- Handling Real-World Complexity
- Best Practices and Gotchas
- Extending the Assistant: Advanced Patterns
- Metrics and Monitoring Your Multi-Cloud Setup
- Understanding the Economics of Multi-Cloud
- Migration Patterns: Moving Workloads Between Clouds
- Governance and Compliance
- Wrapping Up
Why the Agent SDK Matters for Multi-Cloud
Before we start coding, understand why Claude Code's Agent SDK is the right tool for this job. The Agent SDK lets you build agents—persistent, stateful systems that can orchestrate multiple tools, maintain context across operations, and make decisions based on real-time infrastructure data.
A multi-cloud assistant isn't just a CLI wrapper. It's a system that needs to:
- Understand your infrastructure topology across three different cloud providers and find relationships (this Azure VM needs to talk to that AWS RDS instance, which backs this GCP Cloud Run service).
- Maintain context across operations (if you ask to scale up a workload, the assistant needs to remember which service the previous command was about).
- Make decisions that require reasoning (if you ask to reduce cloud costs, the assistant needs to analyze usage patterns across clouds and recommend the right trade-offs).
- Integrate with existing tools (your Terraform code, your monitoring stack, your incident management system).
A simple REST API wrapper can't do this. You need an agent—a system with memory, reasoning, and tool orchestration. That's what the Agent SDK provides. Understanding this distinction is key to why multi-cloud management is hard. Most organizations have three separate CLIs—AWS CLI, Azure CLI, gcloud. They're not integrated. Each one is a separate tool. You use AWS CLI to list instances, then you manually run Azure CLI to list VMs, then gcloud to list instances. You copy-paste costs between three different dashboards to get a total. This manual work is tedious and error-prone. It's also where multi-cloud management fails. It's not that the tools are bad; it's that integrating them requires manual effort.
Claude Code's Agent SDK changes this by giving you a system that can call multiple tools, understand results, and reason about how they fit together. The assistant can fetch AWS costs, Azure costs, and GCP costs in parallel, aggregate them, understand which cloud is most expensive, and recommend optimizations specific to the expensive cloud. It maintains context across these operations so you can have a conversation: "Our costs are too high" → "Here's the breakdown" → "Why is Azure so expensive?" → "These three services are the culprits" → "What if we moved them to AWS?" → "Here are the cost and latency implications." This is cognitive work that humans do manually today; the Agent SDK lets Claude do it automatically.
The other crucial aspect is that agents can handle failure gracefully. If AWS is temporarily unavailable, the agent can still get data from Azure and GCP, tell you what succeeded and what failed, and offer partial results. A simple script-based approach would crash. An agent recovers and adapts.
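As a sketch of that recovery behavior, the orchestrator can gather results with `Promise.allSettled` instead of `Promise.all`, so one provider's failure doesn't discard the others' data. The `fetchAllClouds` helper and the fetcher functions here are illustrative stand-ins for real connector calls:

```javascript
// Sketch: fetch from every cloud, tolerating per-provider failures.
// `fetchers` maps a cloud name to an async function that returns its data.
async function fetchAllClouds(fetchers) {
  const names = Object.keys(fetchers);
  const settled = await Promise.allSettled(names.map((n) => fetchers[n]()));
  const results = {};
  const errors = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === "fulfilled") {
      results[names[i]] = outcome.value;
    } else {
      // Surface the partial failure instead of crashing the whole query
      errors[names[i]] = outcome.reason.message;
    }
  });
  return { results, errors };
}
```

The agent can then report what succeeded (`results`) and what failed (`errors`) in one answer, which is exactly the partial-result behavior described above.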
Architecture: The Three-Layer Model
Your multi-cloud assistant will have a specific architecture. Understanding it is crucial before writing code.
The presentation layer is your interface. This might be a CLI (via Claude Code), Slack commands, or a web dashboard. The presentation layer handles user input, formats output, and manages the conversation flow. It's cloud-agnostic—it doesn't care whether the command is targeting AWS or Azure.
The orchestration layer is the agent itself. This is where Claude Code's Agent SDK lives. It understands the user's intent, decides which cloud operations need to happen, and choreographs them. If you ask "Scale my front-end," the orchestrator figures out which resources are front-end, which cloud they run on, and what operations to perform. It maintains state, tracks dependencies, and can reason about trade-offs.
The cloud connector layer is the bridge to each cloud provider. This is where AWS SDK, Azure SDK, and GCP client libraries live. Each connector is responsible for translating the orchestrator's instructions into cloud-specific API calls. The beauty of this architecture is that the orchestrator is cloud-agnostic—it never knows or cares about AWS-specific APIs. It just says "scale this service" and the connector handles the details.
This three-layer model matters because it means your orchestrator logic doesn't have to change when you add a new cloud provider. You just plug in a new connector.
Designing the Connector Architecture
Before we start building connectors, understand the design pattern we're implementing. The connector pattern separates cloud-specific logic from business logic. The orchestrator contains business logic—decisions about what to do. The connector contains cloud logic—how to implement those decisions using cloud APIs.
This separation is crucial for maintainability. When AWS changes their API, you update the AWS connector. The orchestrator is unaffected. When you add a new cloud provider, you add a new connector. The orchestrator is unaffected. When you update your scaling strategy, you update the orchestrator. The connectors are unaffected.
Cloud APIs are notoriously different. AWS uses EC2 and ECS and RDS. Azure uses virtual machines and container instances and SQL Database. GCP uses Compute Engine and Cloud Run and Cloud SQL. Similar concepts, different APIs. The connector pattern shields you from these differences.
When designing connectors, follow three principles: consistency, normalization, and fault handling. Consistency means every connector exposes the same interface. Normalization means different cloud concepts are translated into unified concepts. Fault handling means errors are translated into a common format.
Understanding the connector pattern at a deeper level is about understanding abstraction and its cost-benefit tradeoff. Building consistent interfaces across clouds requires extra work. You have to decide how to represent AWS security groups in a way that also works for Azure network security groups. You have to figure out how to normalize RDS instances, Azure SQL databases, and GCP Cloud SQL into a unified data model. This extra work pays dividends when you scale. A new engineer joining the team learns one interface, not three. Adding a fourth cloud is proportional effort, not exponential. But building the abstraction wrong is catastrophic. If your unified interface doesn't actually unify—if representing Azure operations through the AWS interface causes impedance mismatches—you've added complexity without gaining clarity. This is why the design of connectors is so important. Spend time getting the abstraction right, and everything that follows flows smoothly. Get it wrong, and you're fighting your own architecture forever.
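A minimal sketch of that shared contract, using the method names introduced below (`listInstances()`, `getCost()`, `scaleService()`). The base class and the normalized instance shape are illustrative, not a published interface:

```javascript
// Sketch: the contract every connector implements. Subclasses translate
// these calls into provider-specific SDK calls; the orchestrator only
// ever sees this interface.
class CloudConnector {
  // Returns [{ id, name, type, state, tags }] — the same normalized
  // shape for every provider.
  async listInstances() {
    throw new Error("not implemented");
  }
  // Returns a number (monthly cost in USD) for one resource.
  async getCost(resourceId) {
    throw new Error("not implemented");
  }
  // Scales a named service; returns { success, message }.
  async scaleService(serviceName, desiredCount) {
    throw new Error("not implemented");
  }
}
```

Whether you enforce the contract with a base class like this, a TypeScript interface, or just discipline is a style choice; what matters is that every connector presents the identical surface.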
Building the Cloud Connectors
Let's start with the layer closest to the clouds: the connectors. Each cloud provider gets its own connector module.
A connector is the translation layer between your cloud-agnostic orchestrator and a specific cloud provider's API. The orchestrator says "scale this service to 5 instances." The connector translates that into AWS ECS calls, or Azure Container Group calls, or GCP Cloud Run calls. The connector handles authentication, error handling, and cloud-specific quirks. This isolation is crucial because it means you can swap out a connector without rewriting the orchestrator.
When you design connectors, follow a consistent interface. Every connector should expose the same methods: listInstances(), getCost(), scaleService(), and so on. The method signatures should match. The return values should have the same shape. This consistency makes the orchestrator's job trivial—it doesn't care which cloud it's working with because all clouds present the same interface.
Why does this matter? Imagine you start with AWS and build an orchestrator that calls AWS-specific methods. Later you add Azure, but Azure's API is different. Now your orchestrator has AWS-specific code and Azure-specific code mixed together. It becomes a mess. With consistent connectors, your orchestrator is completely cloud-agnostic. Adding the third cloud (GCP) is no harder than adding the second one (Azure).
Implementing connectors correctly requires careful attention to error handling. What happens when AWS returns an error? What about Azure rate limiting? What about GCP timeouts? Each cloud has different error semantics. Your connector should normalize these into a common error format that the orchestrator can understand. Don't let cloud-specific error codes leak into the orchestrator.
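One way to sketch that normalization. The status-code fields and error categories below are illustrative assumptions (each SDK surfaces errors differently), but the idea is the same: the orchestrator sees one shape, never a provider-specific code:

```javascript
// Sketch: translate provider-specific errors into a common format.
// The fields checked (statusCode, code, status) are illustrative — adapt
// to whichever fields the actual SDK error objects expose.
function normalizeError(cloud, err) {
  const status = err.statusCode || err.status;
  let kind = "unknown";
  if (status === 429 || err.code === "Throttling") kind = "rate_limited";
  else if (status === 403) kind = "permission_denied";
  else if (status === 404) kind = "not_found";
  return {
    cloud,                              // which connector raised it
    kind,                               // unified category
    message: err.message,               // human-readable detail
    retryable: kind === "rate_limited", // hint for the orchestrator
  };
}
```

With this in place, the orchestrator can decide "retry later" versus "report to the user" without knowing whether the underlying error was an AWS throttle or an Azure 429.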
Authentication is another critical concern. Each cloud has different authentication mechanisms. AWS uses access keys and secrets. Azure uses service principals. GCP uses service accounts. Each connector should handle authentication independently using the cloud's standard mechanisms. The orchestrator should never see credentials. Never hardcode credentials in code. Use environment variables or credential files. This prevents accidental credential leaks.
Connection pooling and resource management are important for performance. Creating new API clients on every request is wasteful. Create clients once in the constructor and reuse them. Implement connection pooling if the SDK supports it. Implement retry logic with exponential backoff for transient failures. These details matter for reliability in production.
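A rough sketch of retry with exponential backoff that a connector could wrap around any SDK call. The attempt count and delays are illustrative defaults, not tuned values:

```javascript
// Sketch: retry a flaky async call with exponential backoff.
// Delays grow as baseMs, 2*baseMs, 4*baseMs, ...
async function withRetry(fn, { attempts = 3, baseMs = 200 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseMs * 2 ** i));
      }
    }
  }
  throw lastErr; // all attempts exhausted
}
```

A connector method would then call `withRetry(() => this.ec2.describeInstances().promise())` rather than the SDK directly. In production you'd also skip retries for non-transient errors like permission denials.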
Here's an AWS connector that handles common operations:
// connectors/aws-connector.js
const AWS = require("aws-sdk");
class AWSConnector {
constructor(region = "us-east-1") {
this.region = region;
this.ec2 = new AWS.EC2({ region });
this.ecs = new AWS.ECS({ region });
this.rds = new AWS.RDS({ region });
}
async listInstances() {
const response = await this.ec2.describeInstances().promise();
return response.Reservations.flatMap((r) =>
r.Instances.map((i) => ({
id: i.InstanceId,
type: i.InstanceType,
state: i.State.Name,
tags: this.parseTags(i.Tags),
})),
);
}
async getInstanceCost(instanceId) {
// Call AWS Cost Explorer for the trailing 30 days of cost data.
// Note: the Cost Explorer API is only served from us-east-1.
const ce = new AWS.CostExplorer({ region: "us-east-1" });
const end = new Date().toISOString().slice(0, 10);
const start = new Date(Date.now() - 30 * 24 * 60 * 60 * 1000)
.toISOString()
.slice(0, 10);
const response = await ce
.getCostAndUsage({
TimePeriod: { Start: start, End: end },
Granularity: "MONTHLY",
Metrics: ["UnblendedCost"],
Filter: {
Tags: {
Key: "InstanceId",
Values: [instanceId],
},
},
})
.promise();
// A trailing window can span a month boundary, so sum all periods
return response.ResultsByTime.reduce(
(sum, r) => sum + parseFloat(r.Total.UnblendedCost.Amount),
0,
);
}
async scaleECSService(clusterName, serviceName, desiredCount) {
const response = await this.ecs
.updateService({
cluster: clusterName,
service: serviceName,
desiredCount,
})
.promise();
return {
success: true,
message: `Scaled ${serviceName} to ${desiredCount} tasks`,
serviceArn: response.service.serviceArn,
};
}
parseTags(tags) {
const result = {};
if (!tags) return result;
tags.forEach((tag) => {
result[tag.Key] = tag.Value;
});
return result;
}
}
module.exports = AWSConnector;
Now an Azure connector with analogous operations:
// connectors/azure-connector.js
const { ComputeManagementClient } = require("@azure/arm-compute");
const {
ContainerInstanceManagementClient,
} = require("@azure/arm-containerinstance");
const { CostManagementClient } = require("@azure/arm-costmanagement");
const { DefaultAzureCredential } = require("@azure/identity");
class AzureConnector {
constructor(subscriptionId, resourceGroup) {
this.subscriptionId = subscriptionId;
this.resourceGroup = resourceGroup;
this.credential = new DefaultAzureCredential();
this.computeClient = new ComputeManagementClient(
this.credential,
subscriptionId,
);
}
async listVirtualMachines() {
const vms = await this.computeClient.virtualMachines.listByResourceGroup(
this.resourceGroup,
);
return vms.map((vm) => ({
id: vm.id,
name: vm.name,
type: vm.hardwareProfile?.vmSize,
state: vm.provisioningState,
tags: vm.tags || {},
}));
}
async getVMCost(vmName) {
// Azure Cost Management API
const costClient = new CostManagementClient(
this.credential,
this.subscriptionId,
);
const scope = `/subscriptions/${this.subscriptionId}/resourceGroups/${this.resourceGroup}`;
const query = {
type: "Usage",
timeframe: "MonthToDate",
dataset: {
granularity: "Daily",
aggregation: {
totalCost: {
name: "PreTaxCost",
function: "Sum",
},
},
filter: {
dimensions: {
name: "ResourceName",
operator: "In",
values: [vmName],
},
},
},
};
const result = await costClient.query.usage(scope, query);
return result.properties.rows[0]?.[0] ?? 0; // cost value, or 0 if no usage rows
}
async scaleContainerGroup(containerGroupName, numInstances) {
const containerClient = new ContainerInstanceManagementClient(
this.credential,
this.subscriptionId,
);
// Azure Container Instances has no replica count to set; as a rough
// stand-in, resize the group's CPU/memory in proportion to the request
const group = await containerClient.containerGroups.get(
this.resourceGroup,
containerGroupName,
);
group.containers[0].resources.requests.cpu = numInstances * 0.5;
group.containers[0].resources.requests.memoryInGb = numInstances * 0.5;
await containerClient.containerGroups.createOrUpdate(
this.resourceGroup,
containerGroupName,
group,
);
return {
success: true,
message: `Scaled ${containerGroupName} to ${numInstances} instances`,
groupId: group.id,
};
}
}
module.exports = AzureConnector;
And the GCP connector:
// connectors/gcp-connector.js
const compute = require("@google-cloud/compute");
const { ServicesClient } = require("@google-cloud/run");
const { BigQuery } = require("@google-cloud/bigquery");
class GCPConnector {
constructor(projectId) {
this.projectId = projectId;
this.compute = new compute.InstancesClient();
this.cloudRun = new ServicesClient();
this.bigquery = new BigQuery({ projectId });
}
async listInstances(zone = "us-central1-a") {
const [instances] = await this.compute.list({
project: this.projectId,
zone,
});
return instances.map((instance) => ({
id: instance.id,
name: instance.name,
type: instance.machineType,
state: instance.status,
tags: instance.tags?.items || [],
}));
}
async getInstanceCost(instanceName) {
// Query BigQuery dataset that contains billing data
const query = `
SELECT SUM(cost) as total_cost
FROM \`${this.projectId}.billing.daily_cost\`
WHERE resource_name = @instanceName
AND usage_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
`;
const options = {
query,
params: { instanceName },
};
const [rows] = await this.bigquery.query(options);
return rows[0]?.total_cost || 0;
}
async scaleCloudRunService(serviceName, minInstances, maxInstances) {
const name = this.cloudRun.servicePath(
this.projectId,
"us-central1",
serviceName,
);
// Cloud Run v2 expresses scaling limits on the revision template
const service = {
name,
template: {
scaling: {
minInstanceCount: minInstances,
maxInstanceCount: maxInstances,
},
},
};
const [operation] = await this.cloudRun.updateService({ service });
await operation.promise(); // wait for the rollout to finish
return {
success: true,
message: `Scaled ${serviceName} to ${minInstances}-${maxInstances} instances`,
serviceName,
};
}
}
module.exports = GCPConnector;
Notice each connector implements the same interface: listInstances(), getCost(), scaleService(). The method names are consistent. The return values have the same shape. This uniformity is intentional—it makes the orchestrator's job much easier.
The Case for Multi-Cloud Management
Before diving into the implementation, let's understand why multi-cloud matters and why unified management is valuable. Most organizations don't choose to be multi-cloud for fun. It's usually forced by circumstances: you started with AWS, then Azure had better capabilities for a specific project, then GCP had compelling data analytics services. Now you're multi-cloud by accident, not by strategic choice.
Being multi-cloud accidentally is expensive. Your teams learn three CLI tools. Your documentation covers three sets of best practices. Your deployment pipelines have to integrate with three providers. Your billing and cost tracking is fragmented across three systems. Your developers context-switch constantly between different mental models. An engineer who's expert on AWS has to context-switch to Azure or GCP, losing all the intuition they've built. A deployment that would take 15 minutes on AWS takes an hour on GCP because the engineer is learning as they go. This context-switching is invisible in the budget but it's real cost—in development velocity, in team morale, in error rates from people working in unfamiliar systems.
But multi-cloud can also be an advantage if managed properly. You get provider diversity, which reduces vendor lock-in. You can use the best-of-breed services from each provider rather than accepting one provider's suboptimal offering. You can distribute workloads geographically to reduce latency. You can implement disaster recovery more easily because you're not dependent on a single provider's infrastructure. GCP's BigQuery is genuinely better for analytics than AWS Athena. Azure's ML capabilities are compelling for certain workloads. AWS has the most mature ecosystem and the cheapest compute. Being multi-cloud lets you use each provider's strengths.
The key is making multi-cloud management smooth rather than painful. Without management, multi-cloud is a liability. With good management, multi-cloud is an asset. A unified assistant that understands all your clouds natively makes this possible. Instead of learning three tools, your team learns one interface. Instead of three deployment pipelines, you have one orchestrator that understands how to deploy across clouds. Instead of manually aggregating costs from three dashboards, the assistant gives you a single view. This transforms multi-cloud from a burden into a strategic advantage.
Understanding the Agent SDK Model
Before we look at code, understand what makes Claude Code's Agent SDK particularly suited for multi-cloud orchestration. The Agent SDK is designed for stateful, multi-step reasoning over tools. Unlike a simple REST API that executes one request and returns, agents maintain conversational context, can call multiple tools in sequence, and can reason about results and make follow-up decisions.
This is perfect for multi-cloud management. When a user says "reduce our cloud costs," that's not a single operation. The agent needs to: fetch cost data from all clouds, analyze the patterns, generate recommendations, potentially validate those recommendations against service dependencies, and explain the tradeoffs. A single tool call can't do this. But an agent with memory, reasoning, and multiple tools can.
The Agent SDK also handles error recovery gracefully. If one tool call fails (say, GCP Cost Analysis times out), the agent can still get partial results from AWS and Azure, explain what happened, and offer alternative insights. A simpler approach would fail entirely.
Building the Orchestration Layer
Now here's the agent that orchestrates across all three connectors. This is where Claude Code's Agent SDK shines. The orchestrator is the intelligence layer. It understands your cloud topology and can make decisions about how to perform operations across clouds:
// orchestrator/multi-cloud-orchestrator.js
const AWSConnector = require("../connectors/aws-connector");
const AzureConnector = require("../connectors/azure-connector");
const GCPConnector = require("../connectors/gcp-connector");
class MultiCloudOrchestrator {
constructor(config) {
this.config = config;
this.connectors = {
aws: new AWSConnector(config.aws.region),
azure: new AzureConnector(
config.azure.subscriptionId,
config.azure.resourceGroup,
),
gcp: new GCPConnector(config.gcp.projectId),
};
this.context = {}; // Maintain state across commands
}
async getAllInstances() {
const results = {
aws: [],
azure: [],
gcp: [],
};
// Fetch from all clouds in parallel
const [awsInstances, azureVMs, gcpInstances] = await Promise.all([
this.connectors.aws.listInstances(),
this.connectors.azure.listVirtualMachines(),
this.connectors.gcp.listInstances(),
]);
results.aws = awsInstances;
results.azure = azureVMs;
results.gcp = gcpInstances;
return results;
}
async getTotalCost() {
// Aggregate costs across all clouds
const instances = await this.getAllInstances();
const costs = {
aws: 0,
azure: 0,
gcp: 0,
};
// Get costs for each instance in parallel
const awsCosts = await Promise.all(
instances.aws.map((i) => this.connectors.aws.getInstanceCost(i.id)),
);
costs.aws = awsCosts.reduce((a, b) => a + parseFloat(b), 0);
const azureCosts = await Promise.all(
instances.azure.map((vm) => this.connectors.azure.getVMCost(vm.name)),
);
costs.azure = azureCosts.reduce((a, b) => a + parseFloat(b), 0);
const gcpCosts = await Promise.all(
instances.gcp.map((i) => this.connectors.gcp.getInstanceCost(i.name)),
);
costs.gcp = gcpCosts.reduce((a, b) => a + parseFloat(b), 0);
const total = costs.aws + costs.azure + costs.gcp;
return {
breakdown: costs,
total,
currency: "USD",
};
}
async scaleService(serviceName, desiredCount) {
// Find the service across all clouds
const instances = await this.getAllInstances();
// Look for the service by name across clouds
let found = false;
// Check AWS
const awsService = instances.aws.find((i) =>
i.tags?.Name?.includes(serviceName),
);
if (awsService) {
await this.connectors.aws.scaleECSService(
"default",
serviceName,
desiredCount,
);
found = true;
}
// Check Azure
const azureService = instances.azure.find((vm) =>
vm.name.includes(serviceName),
);
if (azureService) {
await this.connectors.azure.scaleContainerGroup(
serviceName,
desiredCount,
);
found = true;
}
// Check GCP
const gcpService = instances.gcp.find((i) => i.name.includes(serviceName));
if (gcpService) {
await this.connectors.gcp.scaleCloudRunService(
serviceName,
Math.floor(desiredCount / 2),
desiredCount,
);
found = true;
}
if (!found) {
throw new Error(`Service ${serviceName} not found in any cloud`);
}
return {
success: true,
message: `Scaled ${serviceName} to ${desiredCount} instances`,
};
}
async recommendCostOptimizations() {
const costs = await this.getTotalCost();
const instances = await this.getAllInstances();
const recommendations = [];
// AWS recommendations
instances.aws.forEach((instance) => {
if (instance.type === "m5.2xlarge" && instance.state === "running") {
recommendations.push({
cloud: "AWS",
resource: instance.id,
issue: "Large instance type with low usage",
recommendation: "Downsize to m5.large",
estimatedSavings: 600, // Monthly in dollars
});
}
});
// Azure recommendations
instances.azure.forEach((vm) => {
if (vm.type === "Standard_D4s_v3" && vm.state === "PowerState/running") {
recommendations.push({
cloud: "Azure",
resource: vm.name,
issue: "Over-provisioned for typical workload",
recommendation: "Downsize to Standard_D2s_v3",
estimatedSavings: 400,
});
}
});
// GCP recommendations
instances.gcp.forEach((instance) => {
if (
instance.type.includes("n1-standard-4") &&
instance.state === "RUNNING"
) {
recommendations.push({
cloud: "GCP",
resource: instance.name,
issue: "Can use committed discount",
recommendation: "Purchase 1-year commitment",
estimatedSavings: 250,
});
}
});
const totalPotentialSavings = recommendations.reduce(
(sum, r) => sum + r.estimatedSavings,
0,
);
return {
recommendations,
currentMonthly: costs.total,
potentialMonthly: costs.total - totalPotentialSavings,
totalSavings: totalPotentialSavings,
};
}
}
module.exports = MultiCloudOrchestrator;
This orchestrator maintains the cloud-agnostic logic. When you ask to scale a service, it doesn't know or care which cloud it's on. It just finds the service and scales it. When you ask for costs, it aggregates across all three clouds. This is the power of the abstraction.
Designing Tool Interfaces for the Agent
Before integrating with the Agent SDK, design your tools carefully. A tool is the bridge between what the agent wants to do and what the orchestrator can actually do.
Good tools are atomic—they do one thing well. "get_all_instances" is atomic—it returns all instances. "scale_service" is atomic—it scales one service. "get_cost_recommendations" is atomic—it analyzes costs and returns recommendations.
Bad tools try to do too much. "manage_infrastructure" is too broad. It's unclear what it does. The agent doesn't know how to use it effectively. Break it into multiple focused tools.
Good tools are unambiguous. The input parameters are clear. The output format is clear. When the agent calls the tool, there's no confusion about what will happen.
Bad tools are ambiguous. They have optional parameters where it's unclear what happens if they're omitted. They return data in inconsistent formats. The agent can't reason about what they'll do.
Good tools handle errors gracefully. If something fails, the tool returns a clear error message. The agent can understand what went wrong and decide how to proceed.
Bad tools crash or return cryptic errors. The agent can't recover from failures.
Integrating with Claude Code Agent SDK
Now we wire this into Claude Code using the Agent SDK. This is where your assistant becomes truly powerful:
// agent/multi-cloud-agent.js
const Anthropic = require("@anthropic-ai/sdk");
const MultiCloudOrchestrator = require("../orchestrator/multi-cloud-orchestrator");
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
const orchestrator = new MultiCloudOrchestrator({
aws: { region: "us-east-1" },
azure: {
subscriptionId: process.env.AZURE_SUBSCRIPTION_ID,
resourceGroup: "my-rg",
},
gcp: { projectId: process.env.GCP_PROJECT_ID },
});
const tools = [
{
name: "get_all_instances",
description: "List all instances across AWS, Azure, and GCP",
input_schema: {
type: "object",
properties: {},
},
},
{
name: "get_total_cost",
description: "Get total cloud spending across all providers",
input_schema: {
type: "object",
properties: {},
},
},
{
name: "scale_service",
description: "Scale a service to a specified number of instances",
input_schema: {
type: "object",
properties: {
serviceName: {
type: "string",
description: "Name of the service to scale",
},
desiredCount: {
type: "number",
description: "Desired number of instances",
},
},
required: ["serviceName", "desiredCount"],
},
},
{
name: "get_cost_recommendations",
description: "Get recommendations to optimize cloud costs",
input_schema: {
type: "object",
properties: {},
},
},
];
async function runAgent(userMessage) {
const messages = [
{
role: "user",
content: userMessage,
},
];
let response = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2048,
tools,
messages,
});
while (response.stop_reason === "tool_use") {
const toolUseBlock = response.content.find(
(block) => block.type === "tool_use",
);
let toolResult;
try {
if (toolUseBlock.name === "get_all_instances") {
toolResult = await orchestrator.getAllInstances();
} else if (toolUseBlock.name === "get_total_cost") {
toolResult = await orchestrator.getTotalCost();
} else if (toolUseBlock.name === "scale_service") {
const { serviceName, desiredCount } = toolUseBlock.input;
toolResult = await orchestrator.scaleService(serviceName, desiredCount);
} else if (toolUseBlock.name === "get_cost_recommendations") {
toolResult = await orchestrator.recommendCostOptimizations();
}
} catch (error) {
toolResult = { error: error.message };
}
messages.push({
role: "assistant",
content: response.content,
});
messages.push({
role: "user",
content: [
{
type: "tool_result",
tool_use_id: toolUseBlock.id,
content: JSON.stringify(toolResult),
},
],
});
response = await anthropic.messages.create({
model: "claude-3-5-sonnet-20241022",
max_tokens: 2048,
tools,
messages,
});
}
// Extract final text response
const textBlock = response.content.find((block) => block.type === "text");
return textBlock ? textBlock.text : "No response";
}
// Run examples
(async () => {
console.log("User: How much are we spending across all clouds?");
const costResponse = await runAgent(
"How much are we spending across all clouds?",
);
console.log("Assistant:", costResponse);
console.log("\n---\n");
console.log("User: What are your recommendations to reduce our cloud costs?");
const recResponse = await runAgent(
"What are your recommendations to reduce our cloud costs?",
);
console.log("Assistant:", recResponse);
})();
This agent can handle natural language requests. When you ask "How much are we spending?", the agent figures out that it needs to call the cost tool, aggregates costs across clouds, and explains the results. When you ask "Scale my API service to 10 instances," the agent finds the API service across all clouds and scales it.
Testing Your Multi-Cloud Assistant
Before deploying a multi-cloud assistant to production, test it thoroughly. Testing is complex because you're dealing with three cloud providers and need to ensure consistency across them.
Create integration tests that validate each connector against real cloud environments. Test listing instances, getting costs, scaling services. Test error cases—what happens when a service doesn't exist? What happens when you don't have permission to scale?
Create agent tests that ensure the agent can reason about multi-cloud scenarios. Mock the tools if needed. Test scenarios like: "User asks for total cost, agent calls get_total_cost, agent explains the results." Test error recovery: "GCP times out, but AWS and Azure succeed. Agent explains the partial results."
Create end-to-end tests in a staging environment. Deploy the assistant. Have it manage staging infrastructure. Verify that it makes correct decisions and executes correct operations.
Document all test scenarios. Keep test results. When you make changes, re-run all tests. This prevents regressions where a change to the AWS connector breaks something in the orchestrator.
Handling Real-World Complexity
In production, you'll hit complexity. Here's how to handle common cases:
Services that span multiple clouds. A front-end in GCP, a backend in AWS, and analytics in Azure. Your orchestrator needs to understand these relationships. Store service topology in a configuration file:
services:
  api-backend:
    cloud: aws
    type: ecs
    cluster: production
  frontend:
    cloud: gcp
    type: cloud-run
    region: us-central1
  analytics:
    cloud: azure
    type: container-group
    resource-group: my-rg
Different cost models. AWS offers reserved instances. Azure has different pricing per region. GCP offers committed use discounts. Your getTotalCost() method needs to account for these. Consider actual committed discounts, not just on-demand pricing.
Authentication across clouds. Use environment variables or a secret manager. Never hardcode credentials. Each connector should authenticate independently using the cloud's standard auth mechanisms.
Audit logging. When your assistant makes changes, log them. Which user requested the scale operation? What was the before/after state? This is critical for compliance and understanding what happened.
Best Practices and Gotchas
First, design for idempotency. Idempotent operations are critical in distributed systems because networks are unreliable: a request might succeed while the response never reaches the user, and if they retry thinking it failed, the retry must be harmless. "Scale to 5 instances" should therefore be safe to call twice. If the service is already at 5, the second call should recognize it is at the desired state and be a no-op, not an error.
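A minimal sketch of that idempotent scale operation, assuming a connector with `get_instance_count` and `set_instance_count` methods (hypothetical names):

```python
# Sketch of an idempotent scale operation. The connector interface
# (get_instance_count / set_instance_count) is an assumed shape.
def scale_service(connector, service: str, desired: int) -> str:
    current = connector.get_instance_count(service)
    if current == desired:
        # Already at the desired state: succeed without mutating anything,
        # so a retried request is harmless.
        return f"{service} already at {desired} instances; no-op"
    connector.set_instance_count(service, desired)
    return f"{service} scaled from {current} to {desired}"
```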
Second, implement proper error handling. Cloud APIs are unreliable, so retry transient failures with exponential backoff. When you orchestrate across multiple clouds, partial failures are inevitable: maybe AWS times out but Azure responds, and now you've scaled Azure but not AWS, leaving your state inconsistent. Track that state so you can remediate, log every operation and its outcome, and make clear to the user what succeeded and what failed.
Third, respect rate limits. Most cloud APIs allow some fixed number of requests per second. If you're fetching cost data for 100 instances and the API allows 10 requests per second, you need at least 10 seconds to fetch it all. Spread requests across time or batch them intelligently; parallel requests can speed things up, but be careful not to exceed the limit. If you do hit a rate limit, back off and retry.
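A simple pacing sketch for the 100-instance example, assuming an injected `fetch_cost` callable (your real fetcher would come from a connector):

```python
import time

def fetch_all_costs(instance_ids, fetch_cost, max_per_second=10):
    """Fetch cost data for many instances without exceeding an assumed
    per-second rate limit, by pausing after each second's budget is spent."""
    results = {}
    for i, instance_id in enumerate(instance_ids):
        if i and i % max_per_second == 0:
            time.sleep(1.0)  # budget for this second exhausted; wait it out
        results[instance_id] = fetch_cost(instance_id)
    return results
```

A token bucket gives smoother pacing, but this batch-and-pause loop is often enough for background cost collection.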
Fourth, cache aggressively where safe. Cost data doesn't change minute by minute. Cache it for an hour. Instance lists might change more frequently, but caching for a minute is usually safe. Caching reduces API calls, which means lower latency for users and less load on the cloud providers. Just be clear about when the data was last refreshed. If the user makes a decision based on cached data that's 55 minutes old, they should know that.
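A TTL cache that keeps the fetch timestamp alongside the value supports both goals above: fewer API calls, and the ability to tell the user how stale the data is. This is a minimal single-process sketch:

```python
import time

class TTLCache:
    """Minimal in-process cache with per-entry expiry.

    Stores (value, fetched_at) so callers can surface data age to users.
    """

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        """Return (value, age_seconds) if fresh, else None."""
        entry = self._store.get(key)
        if entry is None:
            return None
        value, fetched_at = entry
        age = time.monotonic() - fetched_at
        if age > self.ttl:
            del self._store[key]  # expired: force a refetch
            return None
        return value, age

    def put(self, key, value):
        self._store[key] = (value, time.monotonic())
```

Use an hour-long TTL for cost data and a minute for instance lists, as suggested above, and include the returned age in the assistant's answer.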
Finally, maintain observability. Log every operation, every cost query, every scaling action. Make this data available for analysis. Over time, you'll understand your multi-cloud usage patterns and optimize further. You'll notice that GCP costs spike on Tuesdays. You'll see that scaling operations on Azure take longer than AWS. You'll discover patterns that inform how you configure the system. These insights come from good logging and analysis. Build logging in from the start, not as an afterthought.
Extending the Assistant: Advanced Patterns
As your multi-cloud assistant matures, you'll want to add sophistication. Here are patterns that successful teams implement:
Resource tagging strategy: Implement consistent tagging across all clouds so you can correlate resources. "This EC2 instance, this Azure VM, and this GCP Compute Engine instance are all part of our web service layer." With consistent tagging, your assistant can reason about resource relationships. When you ask "How much does our web service cost?", the assistant can find all resources tagged with service=web-service and aggregate costs.
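With a normalized resource shape coming out of the connectors (the record layout below is an assumption, not any cloud's native format), that cost aggregation is a one-pass group-by:

```python
# Sketch: aggregate monthly cost across clouds by a shared `service` tag.
# The resource record shape is a hypothetical normalized form produced
# by your connectors.
resources = [
    {"cloud": "aws", "tags": {"service": "web-service"}, "monthly_cost": 420.0},
    {"cloud": "azure", "tags": {"service": "web-service"}, "monthly_cost": 310.0},
    {"cloud": "gcp", "tags": {"service": "analytics"}, "monthly_cost": 190.0},
]

def cost_by_tag(resources, key="service"):
    totals = {}
    for r in resources:
        tag = r["tags"].get(key, "untagged")  # surface untagged spend too
        totals[tag] = totals.get(tag, 0.0) + r["monthly_cost"]
    return totals

# cost_by_tag(resources) → {"web-service": 730.0, "analytics": 190.0}
```

The "untagged" bucket is deliberate: untagged spend is usually the first thing a tagging initiative needs to make visible.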
Cost allocation models: Different clouds charge differently. AWS charges per instance hour. Azure charges per minute. GCP uses a similar per-minute model but with different pricing tiers. Your assistant should understand these differences and present costs in a normalized way. Instead of showing "AWS: $1000 per month, Azure: $1500 per month, GCP: $900 per month," normalize them to cost-per-unit-of-compute or cost-per-service so comparisons are meaningful.
Disaster recovery orchestration: When something fails, your assistant can orchestrate recovery across clouds. If your AWS region goes down, the assistant can bring up replacement services in Azure or GCP. This requires deep knowledge of your architecture, but once implemented, it's incredibly powerful. Your assistant becomes an automated disaster recovery system.
Capacity planning: By analyzing historical usage patterns, your assistant can recommend capacity changes. "Your GCP instance is consistently at 20% utilization. Consider downsizing to save $300/month." Or: "Your Azure database is at 85% capacity. Plan for scaling in the next quarter."
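The recommendation logic behind messages like those can start as a toy heuristic. The 30% and 80% thresholds and the halve-the-cost savings estimate here are illustrative assumptions, not tuned values:

```python
def capacity_recommendation(utilization: float, monthly_cost: float) -> str:
    """Toy capacity heuristic; thresholds and savings estimate are assumed."""
    if utilization < 0.30:
        saving = monthly_cost * 0.5  # rough estimate from halving capacity
        return (f"underutilized ({utilization:.0%}); "
                f"downsizing could save ~${saving:.0f}/month")
    if utilization > 0.80:
        return f"near capacity ({utilization:.0%}); plan to scale up"
    return "utilization healthy"
```

A real version would look at utilization percentiles over weeks, not a point-in-time number, but the shape stays the same: metrics in, plain-language recommendation out.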
Compliance reporting: Multi-cloud environments have compliance challenges. Different clouds have different compliance certifications. Your assistant can aggregate compliance status across clouds, flag non-compliant resources, and generate reports. This is invaluable for regulated industries.
Metrics and Monitoring Your Multi-Cloud Setup
Once your assistant is running, you need visibility into its performance. Implement metrics that help you understand your multi-cloud costs and utilization.
Cost metrics should track spending by cloud provider, by service, by team, by cost center—however your organization allocates costs. You should be able to answer questions like: "How much did AWS cost this month?" "How much did our API service cost across all clouds?" "Which cloud is most expensive for databases?"
Utilization metrics should track resource utilization by cloud provider. You should know that your GCP compute is at 60% utilization while your Azure compute is at 85%. This helps you identify rebalancing opportunities.
Performance metrics should track latency, error rates, and availability by cloud provider. If one cloud is consistently slower than others, investigate why. If one cloud has higher error rates, determine if it's a configuration issue or a provider issue.
Implement dashboards that visualize these metrics. Give different audiences different views. Finance cares about costs. Engineering cares about performance and availability. Leadership cares about risk and vendor diversity.
Set up alerts for anomalies. If costs spike unexpectedly, alert someone. If utilization drops below a threshold, that's an optimization opportunity. If availability drops, that's a reliability issue.
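A cost-spike check can start as a comparison against a recent baseline. The 1.5x factor is an assumed threshold you would tune:

```python
def cost_anomaly(history, latest, factor=1.5):
    """Flag `latest` as anomalous if it exceeds the recent average by `factor`."""
    baseline = sum(history) / len(history)
    return latest > baseline * factor
```

Simple thresholds like this catch the worst surprises; seasonality-aware detection (that Tuesday GCP spike) can come later.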
Use these metrics to drive decisions. If one cloud is consistently expensive, maybe move workloads to a cheaper cloud. If one cloud is consistently reliable, maybe expand there. If utilization is low everywhere, maybe consolidate.
Understanding the Economics of Multi-Cloud
Building a multi-cloud assistant requires investment. You need SDKs for three clouds. You need to understand three different APIs. You need to maintain three connectors. Is it worth it?
The economics are clear if you're already multi-cloud. If your infrastructure is split across clouds, you're paying the cost of managing three systems anyway. A unified assistant reduces that cost significantly. Teams save time by not context-switching between tools. Teams make better decisions because they can see the full picture across clouds. Teams optimize costs because they understand where their money is going.
But if you're single-cloud, don't build this. It's not worth the effort. The single-cloud tools (AWS CLI, Azure CLI, gcloud) are excellent. Use them. If you're single-cloud today but might be multi-cloud tomorrow, build your connector infrastructure in a way that makes adding clouds easy. Use the layered architecture we described. Don't hardcode cloud-specific logic. Then when multi-cloud becomes reality, you're ready.
Migration Patterns: Moving Workloads Between Clouds
As your multi-cloud assistant matures, you'll want the ability to move workloads between clouds. Maybe a service is expensive on AWS and you want to try Azure. Maybe GCP has better performance for a specific workload. Your assistant should be able to help with migrations.
A migration is more complex than simple scaling. You need to: provision new infrastructure in the target cloud, migrate data, validate that the new service works, cut over traffic from the old service to the new service, and eventually decommission the old service.
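Those phases can be sketched as an ordered checklist the orchestrator walks through, stopping at the first failure so a rollback plan can take over. The step names and handler-callback shape are illustrative:

```python
# Sketch: migration phases as an ordered checklist. Handlers are
# hypothetical callbacks returning True on success.
MIGRATION_STEPS = [
    "provision_target",
    "migrate_data",
    "validate_new_service",
    "cutover_traffic",
    "decommission_old",
]

def run_migration(steps, handlers):
    """Run each phase in order; stop on the first failure so the caller
    can roll back, reporting exactly how far the migration got."""
    completed = []
    for step in steps:
        if not handlers[step]():
            return {"status": "failed", "at": step, "completed": completed}
        completed.append(step)
    return {"status": "done", "completed": completed}
```

Recording `completed` is the important part: a rollback plan needs to know precisely which phases ran before the failure.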
Your assistant can orchestrate this entire process. The orchestrator coordinates with connectors to provision infrastructure in the target cloud, validates connectivity, runs smoke tests, and manages traffic cutover.
Migrations are risky. Build in safety mechanisms. Run the new service in parallel with the old service before cutting over. Test thoroughly. Have rollback plans. Your assistant should guide users through the migration process, not execute it blindly.
Document your migration processes. Different service types (APIs, databases, queues) have different migration patterns. Your assistant should know these patterns.
Governance and Compliance
Multi-cloud environments create compliance challenges. Different clouds have different compliance certifications (SOC2, HIPAA, PCI-DSS, etc.). You need visibility into which services run where and what compliance certifications apply.
Your assistant can help with compliance. It can track which services run in which clouds and which compliance certifications apply to each. It can flag services running in non-compliant configurations. It can generate compliance reports.
This is valuable for regulated industries where compliance is critical. Your assistant becomes a guardrail that prevents accidental compliance violations.
Wrapping Up
A multi-cloud assistant isn't just a convenience tool. It's a strategic investment in operational efficiency. Instead of your team context-switching between three CLIs and three mental models, they have one interface. They ask questions in English. The assistant figures out which clouds are involved and what operations to perform.
Claude Code's Agent SDK is specifically designed for this use case: multi-step reasoning, tool orchestration, and stateful context maintenance. The agent model naturally extends to multi-cloud scenarios where the same logical operation maps to different cloud APIs, and where decisions require reasoning about tradeoffs across clouds.
Start with the three connectors and the orchestrator. Get them working. Add tools gradually. Test thoroughly before deploying to production. Your infrastructure will become significantly more manageable, and your team will operate more efficiently.
The patterns in this article apply beyond multi-cloud management. Anywhere you have multiple systems that need unified control, the connector-orchestrator-agent pattern works. Use it for managing on-prem and cloud. Use it for managing microservices across regions. Use it for managing different infrastructure providers (Kubernetes, Docker Swarm, etc.). The architecture scales to whatever problem you're solving.
Building a multi-cloud assistant requires investment upfront. You need to understand three cloud providers. You need to implement three connectors. You need to build an orchestrator that can reason about cross-cloud operations. But once complete, the returns are significant. Your team operates more efficiently. You make better decisions because you see the full picture. You optimize costs because you understand where money is going. You reduce vendor lock-in because your infrastructure isn't tied to one cloud.
The key is starting simple. Get one connector working. Get one operation (like listing instances) working end-to-end. Verify that it works before adding complexity. Then add the second cloud. Then the third. Each iteration builds on the previous one. By the time you have all three clouds, the patterns are established and adding new features is straightforward.
This incremental approach also limits risk. If something goes wrong with the AWS connector, it doesn't affect Azure or GCP. If the orchestrator has a bug, it doesn't affect the connectors. The layered architecture provides resilience.
Over time, your multi-cloud assistant becomes a strategic asset. It frees your team from low-level cloud management details. It enables you to make architectural decisions based on technical merit, not on familiarity with a particular cloud. It helps you build a more flexible, more resilient infrastructure.