Glossary

The following is a mix of terms frequently used when working with generative AI, similarity-based search, and cloud-based services.

Agent

An agent is a series of steps performed on each run, typically preparing or transforming input, retrieving information from a vector database, and calling an LLM to generate a response.

Agents can be triggered by dynamic input, run on a schedule, or run when content is added or updated in a knowledge base.
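
As a rough illustration, here is a minimal sketch of a single agent run in Python; every function name below is a hypothetical stand-in, not part of any specific SDK.

```python
# A minimal sketch of one agent run. All names below are illustrative
# stand-ins, not a real SDK.

def prepare_input(raw: str) -> str:
    """Step 1: prepare/transform the incoming input."""
    return raw.strip()

def retrieve(query: str) -> list[str]:
    """Step 2: stand-in for a similarity search against a vector database."""
    return ["retrieved snippet 1", "retrieved snippet 2"]

def call_llm(query: str, context: list[str]) -> str:
    """Step 3: stand-in for a prompt call to an LLM."""
    return f"Response to {query!r} grounded in {len(context)} snippets"

def run_agent(raw_input: str) -> str:
    query = prepare_input(raw_input)
    context = retrieve(query)
    return call_llm(query, context)

print(run_agent("  What changed in our pricing last week?  "))
```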

Content Source

A content source represents a source of data that is indexed in a vector database as embeddings optimized for RAG use cases.

Sources like RSS feeds can be set up to automatically poll for new content, and can be configured to expire content after a set amount of time, keeping storage costs in check for content that does not need to remain available indefinitely.

Credit

Providers of embedding models and LLMs typically charge for usage by the number of tokens processed. We offer a wide variety of models from many providers, and converting each model's usage to tokens provides a unified way to understand usage and cost impact. We could have skipped tokens and translated directly to dollars, but the smallest amounts would be tiny fractions of a dollar, for example $0.00003214. Our credits are defined so most usage rates fall within 0.1 to 1,000 credits per unit.

Embedding

An embedding model translates a given input, such as a piece of text, into a vector in a high-dimensional space, such that similar inputs end up with similar coordinates in that space. The more similar the inputs, the closer the distance. Given a collection of text pieces, we can use this property to find (retrieve) the pieces most similar to a given input by storing their embeddings in a vector database and sorting by distance.
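
For example, the sketch below computes embedding similarity using the open-source sentence-transformers library and its all-MiniLM-L6-v2 model (one choice among many); similar texts score higher than unrelated ones.

```python
# A sketch of embedding similarity, assuming the open-source
# sentence-transformers library and the all-MiniLM-L6-v2 model.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

texts = [
    "How do I reset my password?",
    "Steps to recover a forgotten password",
    "Today's weather forecast",
]
vectors = model.encode(texts)  # one high-dimensional vector per text

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Similar inputs land close together in the embedding space.
print(cosine_similarity(vectors[0], vectors[1]))  # high: similar meaning
print(cosine_similarity(vectors[0], vectors[2]))  # low: unrelated
```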

Knowledge Base

A knowledge base is composed of one or more content sources, and can be searched for similarity. Agents can also be triggered when content is added or updated in a knowledge base.

Adding a content source to a knowledge base is instantaneous regardless of the amount of content in the content source. The same content source can be associated with multiple knowledge bases at the same time, allowing knowledge bases for different purposes to share overlapping content without importing it more than once.

LLM (Large Language Model)

Large Language Models are AI models that have been trained on a wide variety of information such that, given an input, they will generate output that correspondingly answers questions, summarizes information, or creates new content. In addition to text input and output, some models are also capable of receiving and producing content in image, audio, or video format.

MCP (Model Context Protocol)

The Model Context Protocol is an open standard and open-source framework introduced by Anthropic in November 2024 to standardize the way artificial intelligence systems like large language models integrate and share data with external tools, systems, and data sources.
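
As a rough sketch, the official MCP Python SDK lets a server expose a function as a tool in a few lines; the server name and tool below are illustrative.

```python
# A minimal MCP server, assuming the official `mcp` Python SDK and its
# FastMCP helper; the server name and tool are illustrative.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-server")

@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers; exposed to MCP clients as a callable tool."""
    return a + b

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```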

Organization

Organizations allow agents, content sources, and knowledge bases to be shared, managed, and used by a set of team members, who can be added to and removed from the organization by members with administrative rights.

Prompt Call

A prompt call sends input to an LLM and returns the response for further processing or display.
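
For instance, a prompt call with the OpenAI Python SDK looks roughly like this; the model name is just an example, and other providers expose similar APIs.

```python
# A sketch of a prompt call, assuming the OpenAI Python SDK; the model
# name is an example, and other providers expose similar APIs.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(response.choices[0].message.content)
```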

RAG (Retrieval Augmented Generation)

Retrieval augmented generation is a technique that supplements the information an LLM was trained on with additional context, such as information that did not exist or was not available when the model was trained, for example current news or proprietary data. Adding such context to a prompt call is an effective way to prevent the model from generating a response with outdated information or hallucinations.

The steps involved are:

  1. Add information as vector embeddings in a database.
  2. Retrieve content similar to a given input from the database.
  3. Optionally rerank the retrieved content.
  4. Pass the retrieved information to an LLM to combine the information into a coherent response.
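
The toy pipeline below sketches these four steps end to end; the embed() function is a deliberately crude stand-in for a real embedding model, and the final LLM call is represented by printing the assembled prompt.

```python
# A compact, self-contained sketch of the four RAG steps. The embed()
# function is a toy stand-in for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy 'embedding': normalized character-frequency vector."""
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# 1. Add information as vector embeddings in a database.
documents = [
    "Cats sleep up to sixteen hours a day.",
    "The Premier League season runs from August to May.",
    "Vector databases index embeddings for fast similarity search.",
]
index = [(doc, embed(doc)) for doc in documents]

# 2. Retrieve content similar to a given input.
query = "When does the Premier League play?"
q = embed(query)
ranked = sorted(index, key=lambda item: -float(np.dot(q, item[1])))

# 3. Optionally rerank the retrieved content (kept as-is here).
context = [doc for doc, _ in ranked[:2]]

# 4. Pass the retrieved information to an LLM.
prompt = f"Answer using this context: {context}\nQuestion: {query}"
print(prompt)  # a real pipeline would send this prompt in an LLM call
```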

Reranking

Reranking improves the relevance and quality of retrieval results by re-evaluating and reordering the initial results based on their relevance to the search input.
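
One common approach is a cross-encoder that scores each (query, document) pair; the sketch below assumes the sentence-transformers library and a public ms-marco reranking model.

```python
# A reranking sketch, assuming the sentence-transformers library and a
# public ms-marco cross-encoder model.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How do I rotate an API key?"
retrieved = [
    "API rate limits explained",
    "Rotating credentials: a step-by-step guide",
    "Revoking and reissuing API keys",
]

# Score each (query, document) pair, then reorder by descending relevance.
scores = reranker.predict([(query, doc) for doc in retrieved])
reranked = [doc for _, doc in sorted(zip(scores, retrieved), reverse=True)]
print(reranked)
```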

RSS (Really Simple Syndication)

RSS is a data format that makes it easy to publish information about frequently updated content, such as news sites, blogs, and podcasts.
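
For example, polling a feed with the open-source feedparser library (the feed URL is a placeholder):

```python
# A sketch of polling an RSS feed, assuming the open-source feedparser
# library; the feed URL is a placeholder.
import feedparser

feed = feedparser.parse("https://example.com/feed.xml")
for entry in feed.entries[:5]:
    print(entry.title, entry.link)
```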

Temperature

Temperature controls the randomness of an LLM's response. Given the same input, a temperature of 0.0 is expected to generate the most predictable output, whereas 1.0 is expected to generate the most varied output; a value such as 0.3 is closer to 0.0 than to 1.0 in randomness.
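
Most provider APIs expose temperature as a request parameter; the sketch below assumes the OpenAI Python SDK with an example model name.

```python
# A sketch of varying temperature, assuming the OpenAI Python SDK; the
# model name is an example, and most provider APIs take a similar parameter.
from openai import OpenAI

client = OpenAI()
for temperature in (0.0, 0.3, 1.0):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Name a color."}],
        temperature=temperature,  # 0.0: most predictable; 1.0: most varied
    )
    print(temperature, response.choices[0].message.content)
```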

Token

Large language models convert input into tokens through a process called tokenization. Some LLMs share the same tokenization algorithm, but tokenizers tend to differ from provider to provider, and even between generations of models from the same provider. Tokenizers typically break longer words into subword tokens.
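
For example, with the open-source tiktoken library (the tokenizer used by several OpenAI models; other providers use different tokenizers):

```python
# A tokenization sketch, assuming the open-source tiktoken library; other
# providers and model generations use different tokenizers.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("Tokenization splits longer words into subwords.")
print(len(tokens), tokens)
print([enc.decode([t]) for t in tokens])  # note the subword pieces
```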

Tool Call

A growing number of LLMs support a list of optional tools that the model may call while generating a response, such as searching the web, editing a file, or even running code.

Consider a prompt that asks what the latest Premier League results are. Without tool calls, the LLM will either say it doesn't know or hallucinate an answer. If we pass in an option that enables a web search tool, the LLM will likely search for the answer and generate a summary based on the search results instead.
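
The sketch below offers the model a single tool using the OpenAI Python SDK's tool-calling interface; the web_search tool schema is illustrative, and our own code would have to execute it and return the results.

```python
# A tool-call sketch, assuming the OpenAI Python SDK; web_search is a
# hypothetical tool schema that our own code would have to implement.
from openai import OpenAI

client = OpenAI()
tools = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web for current information.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Latest Premier League results?"}],
    tools=tools,
)
# If the model chooses the tool, it returns a call for us to execute.
print(response.choices[0].message.tool_calls)
```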

Vector

In math, a vector is a geometric object that points from point A to point B. In the context of RAG, point A is always implied to be the origin (0, ..., 0), while a given array of numbers always represents point B.

A vector database is a data store that can efficiently hold many millions of vector records, yet quickly return the records with the closest distance to a given vector.
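
A brute-force version of that lookup fits in a few lines of NumPy; real vector databases use approximate indexes (for example HNSW) to keep this fast at scale.

```python
# A brute-force nearest-neighbor sketch with NumPy; real vector databases
# use approximate indexes to stay fast at millions of records.
import numpy as np

rng = np.random.default_rng(0)
records = rng.normal(size=(100_000, 384))  # stored embedding vectors
records /= np.linalg.norm(records, axis=1, keepdims=True)

query = rng.normal(size=384)
query /= np.linalg.norm(query)

# On unit vectors, ranking by cosine distance reduces to a dot product.
scores = records @ query
top5 = np.argsort(-scores)[:5]  # indices of the closest records
print(top5, scores[top5])
```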