💡Understanding Cortex Architecture
How Cortex work
Cortex can be seen as having three layers
Callable: The infrastructure that handles all LLM-based workflows. It follows serverless function architecture (like AWS Lambda) that enables maximum scalability
Knowledge: Managed vector database for fast information retrieval and near real-time syncing
Client Interface
Copilot: is a managed UI that enables end-users to conveniently interact with AI applications
API: API and SDK clients are provided to support developers who wish to directly interact with their callables to power their own products
A Basic Workflow
When the end user interacts with a Copilot, a few things happen in the background:
Cortex Copilot client compiles users' input into a standard format defined by its UI standard protocol and includes session history. Since Callable is designed to be stateless and serverless, the copilot server will be responsible to store the history.
The Copilot client sends an API request to a specific Callable
The Callable is triggered and performs its predefined programming.
The Callable interacts with Knowledge to retrieve relevant information
The Callable sends requests to the language model services
The Callable returns a response in a predefined format to Copilot, and Copilot displays the messages to users.
Last updated