Overview & Metrics
Timeline of Recent LLM Releases
[Chart: Quality, a model performance metric]
[Chart: Cost, a model performance metric]
Know the Industry
LLM Model Overview
The following is not an exhaustive list of language models. It is a curated selection of some of the most popular and widely used models today, as well as those of particular interest to the language service industry. Each model has unique features and capabilities, making it suitable for different tasks and applications.
Llama 3
Open Source
Organization
Meta
Variants
8B
70B
405B
Llama 3 is Meta's latest open-source language model, valued for its efficiency and adaptability. It performs well on limited hardware and can be fine-tuned for a variety of tasks. Its open-source nature allows deep customization but also raises the risk of misuse.
Bloom
Open Source
Organization
BigScience Initiative
Variants
100B
BLOOM is an open-source language model developed by the BigScience initiative, designed for versatility and large-scale tasks. It excels at multilingual text generation and is highly adaptable across languages, with notably wide language support that includes Indic and Niger-Congo languages.
SEA-LION
Open Source
Organization
AI Singapore
Variants
3B
7B
SEA-LION (Southeast Asian Languages In One Network) was developed to support major Southeast Asian languages, including Thai, Vietnamese, and Indonesian. It is focused on regional applications such as translation and customer service. While it excels at handling underrepresented languages, its specialization limits its performance outside of Southeast Asian contexts.
Gemini
Organization
Google
Variants
Flash
Pro
Ultra
Gemini, Google's latest AI model, stands out for its advanced multimodal capabilities, allowing it to process and generate text, images, audio, and video simultaneously. While its largest variant, Gemini Ultra, has led a wide range of LLM benchmarks, it is not currently available to the public; smaller variants such as Gemini Pro are publicly available.
PaLM 2
Organization
Google
Variants
Gecko
Otter
Bison
Unicorn
PaLM 2 is Google's previous-generation language model, preceding Gemini, known for its strong multilingual abilities and its proficiency in reasoning and coding. It handles over 100 languages and complex tasks, making it versatile for global applications.
GPT
Organization
OpenAI
Variants
GPT-3
GPT-4
GPT-4o mini
GPT-4o
GPT is OpenAI's family of language models. Its flagship, GPT-4, is known for its impressive ability to generate coherent and contextually relevant text; it excels at understanding and producing human-like responses across a wide range of topics and can handle complex tasks such as creative writing, problem-solving, and language translation.
Claude
Organization
Anthropic
Variants
Haiku 3.0
Sonnet 3.0
Sonnet 3.5
Opus 3.0
Claude, developed by Anthropic, is a sophisticated language model designed to prioritize safety and reliability in its responses. It excels at generating clear, contextually accurate text and is built with features aimed at minimizing harmful or biased outputs. Its strengths include a robust understanding of nuanced language and a strong emphasis on ethical AI use. However, Claude can sometimes be overly cautious, which may limit its ability to provide bold or creative solutions. And while it aims to reduce bias, it may still reflect limitations inherent in its training data.
Mistral
Organization
Mistral
Variants
NeMo
Large 2
Mistral's models are based on the transformer architecture, a type of neural network that generates text by predicting the next most likely word or phrase. A couple of them (Mixtral 8x7B and 8x22B) go a step further and use a mixture-of-experts architecture, meaning they combine multiple smaller models (called "experts") that are only active at certain times, improving performance while reducing computational cost.
Speak the Language
Key Terms
Multimodal
Multimodal models are capable of processing and generating text, images, audio, and video simultaneously. They can understand and generate content across different media types, allowing for more comprehensive and contextually rich responses.
Fine-tuning
Fine-tuning is the process of adapting a pre-trained language model to a specific task or dataset. By updating the model’s parameters on a smaller, task-specific dataset, fine-tuning can improve the model's performance on specific tasks without requiring extensive training from scratch.
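As a rough illustration, here is a minimal fine-tuning sketch in PyTorch: freeze a pretrained backbone and train only a new task-specific head. The backbone here is a stand-in with random weights, not a real pretrained checkpoint.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone; in practice you would load real weights.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
head = nn.Linear(256, 2)  # new task-specific classification head

# Freeze the pretrained parameters so only the head is updated.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)        # toy batch of task-specific inputs
y = torch.randint(0, 2, (32,))  # toy labels

optimizer.zero_grad()
loss = loss_fn(head(backbone(x)), y)
loss.backward()
optimizer.step()  # one fine-tuning step on the new task
```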
Bias
Bias in AI refers to the systematic errors or inaccuracies in machine learning models that result in unfair or discriminatory outcomes. Bias can arise from the data used to train the model, the design of the model itself, or the context in which the model is deployed. Addressing bias in AI is a critical aspect of developing ethical and equitable AI systems.
Commonsense Reasoning
Commonsense reasoning is the ability to understand and make inferences about everyday situations, facts, and concepts that are not explicitly stated. It involves using background knowledge, intuition, and general understanding of the world to interpret and respond to new information. Commonsense reasoning is a fundamental aspect of human intelligence and a key challenge in AI research.
Transfer Learning
Transfer learning is a machine learning technique that involves training a model on one task or dataset and then applying that knowledge to a different but related task or dataset. By leveraging knowledge learned from one domain to improve performance in another domain, transfer learning can help models generalize better and require less data for training.
Zero-shot Learning
Zero-shot learning is a machine learning paradigm in which a model is trained to perform a task without any labeled examples of that task. Instead, the model learns to generalize from related tasks or domains and can make predictions on new tasks it has never seen before. Zero-shot learning is a form of transfer learning that enables models to adapt to new tasks with minimal supervision.
Prompt Engineering
Prompt engineering is the process of designing and refining the prompts or instructions that guide a language model toward the desired output. Small changes in wording, structure, or examples can significantly change the quality of the model's response.
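To make this concrete, here is a small, hypothetical prompt template in Python. The wording and fields are illustrative; in practice you would iterate on the instructions, examples, and output format and compare results.

```python
# A hypothetical translation prompt, parameterized so that instructions
# and output constraints can be refined independently of the task data.
TEMPLATE = """You are a professional translator.
Translate the following text from {source_lang} to {target_lang}.
Preserve formatting and proper nouns. Return only the translation.

Text: {text}"""

prompt = TEMPLATE.format(source_lang="English", target_lang="Thai",
                         text="The meeting starts at 9 a.m.")
print(prompt)
```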
RAG
Retrieval-Augmented Generation (RAG) is a model architecture that combines the strengths of retrieval-based and generation-based approaches to natural language processing. RAG models use a retriever to search for relevant information and a generator to produce responses, enabling more accurate and contextually relevant text generation.
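Below is a minimal sketch of the retrieve-then-generate loop. The `embed` function is a hypothetical stand-in for a real embedding model, and the final prompt would be sent to an LLM rather than printed.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: deterministic random vectors.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

docs = ["Llama 3 is an open-source model from Meta.",
        "SEA-LION targets Southeast Asian languages.",
        "BLOOM supports Indic and Niger-Congo languages."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list:
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]  # top-k most similar docs

question = "Which model focuses on Thai and Vietnamese?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# The generator LLM would now answer from the retrieved context.
```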
Mixture of Experts
Mixture of Experts is a machine learning architecture that combines multiple smaller models, or "experts," to improve performance on complex tasks. Each expert is specialized in a specific area and is activated based on the input data, allowing the model to adapt to different contexts and make more accurate predictions.
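A minimal sketch of top-1 routing, with each "expert" reduced to a single weight matrix: the router picks one expert per input, so only a fraction of the parameters are used on any forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
experts = [rng.standard_normal((8, 8)) for _ in range(2)]  # two tiny experts
gate = rng.standard_normal((8, 2))  # router that scores experts per input

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate             # one score per expert
    top = int(np.argmax(scores))  # top-1 routing: activate a single expert
    return x @ experts[top]       # only the chosen expert runs

y = moe_forward(rng.standard_normal(8))  # compute cost: one expert, not all
```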
Neural Network
A neural network is a computational model inspired by the structure and function of the human brain. It consists of interconnected nodes, or "neurons," that process and transmit information through weighted connections. Neural networks are used in machine learning to learn patterns and relationships in data and make predictions or decisions.
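For intuition, here is the forward pass of a tiny two-layer network in NumPy, with random weights standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)  # input -> hidden
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)  # hidden -> output

def forward(x: np.ndarray) -> np.ndarray:
    h = np.maximum(0, x @ W1 + b1)  # weighted connections + ReLU activation
    return h @ W2 + b2              # three output scores

print(forward(rng.standard_normal(4)))
```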
Transformer
The transformer is a deep learning architecture that has revolutionized natural language processing. It uses self-attention mechanisms to capture long-range dependencies in text data and has become the basis for many state-of-the-art language models. Transformers are known for their scalability, efficiency, and ability to model complex relationships in data.
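The core of the architecture is scaled dot-product self-attention. This sketch omits the learned query/key/value projections and multiple heads that real transformers use:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    # Single head: every token attends to every other token.
    d = X.shape[-1]
    Q, K, V = X, X, X                    # real models use learned projections
    scores = Q @ K.T / np.sqrt(d)        # pairwise similarity, scaled
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax over tokens
    return w @ V                         # mix token values by attention weight

X = np.random.default_rng(0).standard_normal((5, 16))  # 5 tokens, dim 16
print(self_attention(X).shape)  # (5, 16)
```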
Prompt
A prompt is a set of instructions or input provided to a language model to guide its output. Prompts can take various forms, such as questions, statements, or keywords, and are used to elicit specific responses from the model. Effective prompt design is essential for controlling the behavior and output of language models.
Language Model
A language model is a statistical model that predicts the likelihood of a sequence of words or characters in a given context. Language models are used in natural language processing tasks such as text generation, machine translation, and speech recognition. They learn patterns and relationships in language data to generate coherent and contextually relevant text.
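Formally, a language model factors the probability of a sequence into a product of next-word probabilities via the chain rule:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

Each factor is exactly the next-token prediction task described below.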
Natural Language Processing
Natural language processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and human language. NLP techniques enable machines to understand, interpret, and generate human language, allowing for tasks such as text analysis, language translation, and sentiment analysis.
Next-Token Prediction
Next-token prediction is a task in natural language processing that involves predicting the next word or token in a sequence of text. Language models use next-token prediction to generate coherent and contextually relevant text by estimating the most likely word to follow a given input.
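In practice the model emits one score (logit) per vocabulary item, and a softmax turns those scores into a probability distribution. A toy example with a four-word vocabulary:

```python
import numpy as np

vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.2, 0.3, 2.5, 0.1])  # toy scores from a model

probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax: scores -> distribution over the vocabulary

greedy = vocab[int(np.argmax(probs))]                      # picks "sat"
sampled = np.random.default_rng(0).choice(vocab, p=probs)  # or sample instead
```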
Symbolic Knowledge Distillation
Symbolic knowledge distillation is a technique that involves transferring knowledge from a large, complex language model to a smaller, more efficient model. By distilling the essential information and patterns learned by the large model, symbolic knowledge distillation can improve the performance and efficiency of smaller models.
Vector Embedding
Vector embedding is a technique used in natural language processing to represent words or phrases as dense, low-dimensional vectors. These vectors capture semantic relationships between words and enable algorithms to process and analyze text data more effectively. Vector embeddings are used in tasks such as word similarity, sentiment analysis, and document classification.
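Similarity between embeddings is usually measured with cosine similarity. The 4-dimensional vectors below are toy values chosen for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.9, 0.1, 0.0, 0.3])
kitten = np.array([0.8, 0.2, 0.1, 0.4])
car    = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine(cat, kitten))  # high: semantically close
print(cosine(cat, car))     # lower: semantically distant
```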
Knowledge Graph
A knowledge graph is a structured representation of knowledge that captures relationships between entities and concepts in a domain. Knowledge graphs are used in natural language processing and artificial intelligence to store and retrieve information, answer complex queries, and facilitate reasoning and inference.
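A knowledge graph can be as simple as a set of (subject, relation, object) triples. The facts below are drawn from the model overview above:

```python
triples = [
    ("Llama 3", "developed_by", "Meta"),
    ("Gemini", "developed_by", "Google"),
    ("PaLM 2", "developed_by", "Google"),
]

def subjects_of(relation: str, obj: str) -> list:
    return [s for s, r, o in triples if r == relation and o == obj]

print(subjects_of("developed_by", "Google"))  # ['Gemini', 'PaLM 2']
```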
Vector Database
A vector database is a type of database that stores and indexes vector embeddings of data points. Vector databases are used in machine learning and natural language processing to efficiently search and retrieve similar items based on their vector representations. They enable fast and accurate similarity search and nearest neighbor queries.
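At its core, the query a vector database answers is nearest-neighbor search. Production systems use approximate indexes (such as HNSW) rather than the brute-force scan sketched here:

```python
import numpy as np

rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, 64))  # stored embeddings

def nearest(query: np.ndarray, k: int = 3) -> np.ndarray:
    sims = index @ query / (np.linalg.norm(index, axis=1)
                            * np.linalg.norm(query))
    return np.argsort(-sims)[:k]  # row ids of the k most similar items

print(nearest(rng.standard_normal(64)))
```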
Encoder-Decoder Architecture
The encoder-decoder architecture is a deep learning model structure commonly used in sequence-to-sequence tasks such as machine translation and text summarization. The encoder processes the input sequence and generates a fixed-length representation, which is then decoded by the decoder to produce the output sequence. Encoder-decoder models are effective for tasks that involve generating variable-length outputs from variable-length inputs.
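A skeletal version of the pattern, with random weights standing in for trained ones: the encoder compresses a variable-length input into one fixed-length vector, from which the decoder produces output tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 8))  # encoder weights
W_dec = rng.standard_normal((8, 16))  # decoder weights

def encode(tokens: np.ndarray) -> np.ndarray:
    # Compress a variable-length input into a fixed-length vector.
    return np.tanh(tokens @ W_enc).mean(axis=0)

def decode_step(state: np.ndarray) -> np.ndarray:
    # Produce one output token's scores from the encoded state.
    return state @ W_dec

src = rng.standard_normal((7, 16))  # 7 input tokens
state = encode(src)                 # fixed-length representation
first_scores = decode_step(state)   # decoder would iterate from here
```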
Start Building
Helpful Tools
LangChain
Open Source
LangChain is an open-source framework designed to help developers build applications that integrate with large language models (LLMs) like GPT. It focuses on simplifying the creation of AI-powered applications that leverage natural language processing (NLP) by offering tools for prompt management, memory, chaining LLMs, and integration with various external APIs.
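A classic LangChain pattern chains a prompt template to an LLM. Treat this as a sketch: class names and import paths have shifted across LangChain versions, and it assumes an OpenAI API key is configured.

```python
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI      # any supported LLM wrapper
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["text", "lang"],
    template="Translate the following into {lang}:\n{text}",
)
chain = LLMChain(llm=OpenAI(), prompt=prompt)
print(chain.run(text="Good morning", lang="French"))
```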
LlamaIndex
Open Source
LlamaIndex (formerly known as GPT Index) is a data framework designed to make it easier for developers to integrate large language models (LLMs) with their own data, enabling more intelligent and contextually aware applications. It provides an interface to connect LLMs with diverse, large-scale data sources, such as documents, databases, or APIs, by building dynamic knowledge indices from this information.
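The canonical LlamaIndex flow is only a few lines: load your documents, build an index, and query it. This follows the older top-level imports; newer releases move these under llama_index.core.

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()  # your own files
index = VectorStoreIndex.from_documents(documents)     # embed and index them
query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about pricing?"))
```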
LlamaParse
LlamaParse is a proprietary parsing service for complex documents with embedded objects such as tables and figures. It integrates directly with LlamaIndex ingestion and retrieval, letting you build retrieval over complex, semi-structured documents.
LM Studio
LM Studio is a desktop application that enables users to run and fine-tune large language models (LLMs) on their local machines.
Ollama
Open Source
Ollama is a platform that enables running and interacting with large language models (LLMs) on local machines. It is designed to provide a more private, efficient, and cost-effective way for developers and users to work with AI models without relying on cloud-based infrastructure. By allowing LLMs to run locally, Ollama eliminates the need for internet connectivity, reducing concerns about data privacy and latency, while also avoiding cloud-based fees.
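Ollama exposes a local REST API, so any language can call it. A minimal sketch in Python, assuming the Ollama server is running and a model has been pulled (for example with `ollama pull llama3`):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])  # the model's full reply
```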
Pinecone DB
Pinecone is a fully managed vector database service designed to enable fast, scalable, and efficient storage and retrieval of vector embeddings, which are numerical representations of data. It's specifically built to handle large-scale machine learning and AI applications, where vector similarity search is crucial, such as in recommendation systems, semantic search, and natural language processing tasks.
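A rough sketch of the upsert-and-query cycle with the Pinecone Python client. The API key, index name, and vector dimension are placeholders, and the client interface has changed between major versions, so check the current docs.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # an index created beforehand

# Store an embedding with optional metadata, then search by similarity.
index.upsert(vectors=[("doc-1", [0.1] * 1536, {"source": "faq"})])
matches = index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True)
```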
Chroma DB
Open Source
Chroma is an open-source embedding database designed for building AI applications with LLMs. It stores documents alongside their vector embeddings and metadata and provides fast similarity search over them, enabling developers to add retrieval and memory to LLM applications with ease.
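A minimal sketch with the Chroma Python client, using an in-memory instance and Chroma's default embedding function:

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2"],
    documents=["Ollama runs models locally.",
               "Pinecone is a managed vector database."],
)
results = collection.query(query_texts=["local LLM hosting"], n_results=1)
print(results["documents"])
```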
Get Help
Other Resources
Yejin Choi is a professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research focuses on natural language processing, machine learning, and artificial intelligence. She is particularly interested in developing algorithms that can understand and generate human language, with a focus on commonsense reasoning and natural language understanding.
Textbooks Are All You Need
Suriya Gunasekar, Yi Zhang, Jyoti Aneja, et al.
Paper
This paper introduces a new approach to training language models that relies on textbook-quality data, demonstrating the importance of high-quality training data. The authors show that a comparatively small model trained this way can rival much larger models on code-generation benchmarks.
Segment Anything
Alexander Kirillov, Eric Mintun, et al.
Paper
Segment Anything Model (SAM) is an AI model from Meta AI that can "cut out" any object, in any image, with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training.
The Shift from Models to Compound AI Systems
Berkeley AI Research (BAIR)
Blog
This article discusses the shift from single models to compound AI systems, which combine multiple models to solve complex tasks. It explores the benefits and challenges of building compound AI systems and highlights the potential for improved performance and robustness in AI applications.
Fei-Fei Li is a renowned computer scientist, educator, and entrepreneur, best known for her pioneering work in the field of artificial intelligence (AI) and computer vision. She has made significant contributions to the development of AI technologies and is particularly recognized for her leadership in large-scale visual recognition and deep learning.
Efficiently Adapting Pretrained Language Models to New Languages
Zoltan Csaki, Pian Pawakapan, et al.
Paper
This paper introduces a new method for adapting pretrained language models to new languages with limited data. The authors propose a novel training approach that leverages multilingual data and transfer learning techniques to improve the performance of language models on low-resource languages. The method achieves state-of-the-art results on several language-specific tasks and demonstrates the effectiveness of cross-lingual transfer learning for language model adaptation.
Chapter 11 Large Language Models
Deep Learning and its Applications Lecture Notes
Website
This module is an introductory course on Machine Learning (ML), with a focus on Deep Learning. The course is offered by the Electronic & Electrical Engineering department to fourth- and fifth-year students at Trinity College Dublin.
Generative AI for Beginners
Microsoft
Course
Learn the fundamentals of building Generative AI applications with this comprehensive 18-lesson course from Microsoft Cloud Advocates.