Overview & Metrics
Timeline of Recent LLM Releases
[Chart: Quality, a model performance metric]
[Chart: Cost, a model performance metric]
Know the Industry
LLM Model Overview
The following is not an exhaustive list of language models. It is a curated selection of some of the most popular and widely used models today, as well as those of particular interest to the language service industry. Each model has unique features and capabilities, making it suitable for different tasks and applications.
Llama 3
Open Source
Organization
Meta
Variants
8B
70B
405B
Llama 3 is Meta's latest open-source language model, valued for its efficiency and adaptability. It performs well on limited hardware and can be fine-tuned for a variety of tasks. Its open-source nature allows deep customization but also raises the risk of misuse.
Bloom
Open Source
Organization
BigScience Initiative
Variants
100B
BLOOM is an open-source language model developed by the BigScience initiative, designed for versatility and large-scale tasks. It excels at multilingual text generation and is highly adaptable across languages, with notably wide language support that includes Indic and Niger-Congo languages.
SEA-LION
Open Source
Organization
AI Singapore
Variants
3B
7B
SEA-LION (Southeast Asian Languages In One Network) was developed to support major Southeast Asian languages, including Thai, Vietnamese, and Indonesian. It is focused on regional applications such as translation and customer service. While it excels at handling underrepresented languages, its specialization limits its performance outside of Southeast Asian contexts.
Gemini
Organization
Google
Variants
Flash
Pro
Ultra
Gemini, Google's latest AI model, stands out for its advanced multimodal capabilities, allowing it to process and generate text, images, audio, and video simultaneously. While its largest variant, Gemini Ultra, has led a wide range of LLM benchmarks, it is not currently available to the public; smaller variants such as Gemini Pro are publicly available.
PaLM 2
Organization
Google
Variants
Gecko
Otter
Bison
Unicorn
PaLM 2 is Google's previous-generation language model, preceding Gemini, known for its strong multilingual abilities and its proficiency in reasoning and coding. It handles over 100 languages and complex tasks, making it versatile for global applications.
GPT
Organization
OpenAI
Variants
GPT-3
GPT-4
GPT-4o mini
GPT-4o
GPT is OpenAI's family of language models. Its flagship, GPT-4, is known for its impressive ability to generate coherent and contextually relevant text; it excels at understanding and producing human-like responses across a wide range of topics and can handle complex tasks such as creative writing, problem-solving, and language translation.
Claude
Organization
Anthropic
Variants
Haiku 3.0
Sonnet 3.0
Sonnet 3.5
Opus 3.0
Claude, developed by Anthropic, is a sophisticated language model designed to prioritize safety and reliability in its responses. It excels at generating clear, contextually accurate text and is built with features aimed at minimizing harmful or biased outputs. Its strengths include a robust understanding of nuanced language and a strong emphasis on ethical AI use. However, Claude can sometimes be overly cautious, which may limit its ability to provide bold or creative solutions. And while it aims to reduce bias, it may still reflect limitations inherent in its training data.
Mistral
Organization
Mistral
Variants
NeMo
Large 2
Mistral's models are based on the transformer architecture, a type of neural network that generates text by predicting the next most likely word or phrase. A couple of them (Mixtral 8x7B and 8x22B) go a step further and use a mixture-of-experts architecture, meaning they combine multiple smaller models (called "experts") that are only active at certain times, improving performance while reducing computational cost.
Speak the Language
Key Terms
Multimodal
Multimodal models are capable of processing and generating text, images, audio, and video simultaneously. They can understand and generate content across different media types, allowing for more comprehensive and contextually rich responses.
Fine-tuning
Fine-tuning is the process of adapting a pre-trained language model to a specific task or dataset. By updating the model’s parameters on a smaller, task-specific dataset, fine-tuning can improve the model's performance on specific tasks without requiring extensive training from scratch.
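As a rough illustration, here is a minimal fine-tuning sketch in PyTorch: freeze a pretrained backbone and train only a new task-specific head. The backbone here is a stand-in with random weights, not a real pretrained checkpoint.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone; in practice you would load real weights.
backbone = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 256))
head = nn.Linear(256, 2)  # new task-specific classification head

# Freeze the pretrained parameters so only the head is updated.
for p in backbone.parameters():
    p.requires_grad = False

optimizer = torch.optim.AdamW(head.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)        # toy batch of task-specific inputs
y = torch.randint(0, 2, (32,))  # toy labels

optimizer.zero_grad()
loss = loss_fn(head(backbone(x)), y)
loss.backward()
optimizer.step()  # one fine-tuning step on the new task
```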
Bias
Bias in AI refers to the systematic errors or inaccuracies in machine learning models that result in unfair or discriminatory outcomes. Bias can arise from the data used to train the model, the design of the model itself, or the context in which the model is deployed. Addressing bias in AI is a critical aspect of developing ethical and equitable AI systems.
Commonsense Reasoning
Commonsense reasoning is the ability to understand and make inferences about everyday situations, facts, and concepts that are not explicitly stated. It involves using background knowledge, intuition, and general understanding of the world to interpret and respond to new information. Commonsense reasoning is a fundamental aspect of human intelligence and a key challenge in AI research.
Transfer Learning
Transfer learning is a machine learning technique that involves training a model on one task or dataset and then applying that knowledge to a different but related task or dataset. By leveraging knowledge learned from one domain to improve performance in another domain, transfer learning can help models generalize better and require less data for training.
Zero-shot Learning
Zero-shot learning is a machine learning paradigm in which a model is trained to perform a task without any labeled examples of that task. Instead, the model learns to generalize from related tasks or domains and can make predictions on new tasks it has never seen before. Zero-shot learning is a form of transfer learning that enables models to adapt to new tasks with minimal supervision.
Prompt Engineering
Prompt engineering is the process of designing and refining the prompts or instructions that guide a language model toward the desired output. Small changes in wording, structure, or examples can significantly change the quality of the model's response.
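To make this concrete, here is a small, hypothetical prompt template in Python. The wording and fields are illustrative; in practice you would iterate on the instructions, examples, and output format and compare results.

```python
# A hypothetical translation prompt, parameterized so that instructions
# and output constraints can be refined independently of the task data.
TEMPLATE = """You are a professional translator.
Translate the following text from {source_lang} to {target_lang}.
Preserve formatting and proper nouns. Return only the translation.

Text: {text}"""

prompt = TEMPLATE.format(source_lang="English", target_lang="Thai",
                         text="The meeting starts at 9 a.m.")
print(prompt)
```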
RAG
Retrieval-Augmented Generation (RAG) is a model architecture that combines the strengths of retrieval-based and generation-based approaches to natural language processing. RAG models use a retriever to search for relevant information and a generator to produce responses, enabling more accurate and contextually relevant text generation.
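Below is a minimal sketch of the retrieve-then-generate loop. The `embed` function is a hypothetical stand-in for a real embedding model, and the final prompt would be sent to an LLM rather than printed.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: deterministic random vectors.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(64)

docs = ["Llama 3 is an open-source model from Meta.",
        "SEA-LION targets Southeast Asian languages.",
        "BLOOM supports Indic and Niger-Congo languages."]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1) -> list:
    q = embed(query)
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(-sims)[:k]]  # top-k most similar docs

question = "Which model focuses on Thai and Vietnamese?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# The generator LLM would now answer from the retrieved context.
```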
Mixture of Experts
Mixture of Experts is a machine learning architecture that combines multiple smaller models, or "experts," to improve performance on complex tasks. Each expert is specialized in a specific area and is activated based on the input data, allowing the model to adapt to different contexts and make more accurate predictions.
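A minimal sketch of top-1 routing, with each "expert" reduced to a single weight matrix: the router picks one expert per input, so only a fraction of the parameters are used on any forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
experts = [rng.standard_normal((8, 8)) for _ in range(2)]  # two tiny experts
gate = rng.standard_normal((8, 2))  # router that scores experts per input

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate             # one score per expert
    top = int(np.argmax(scores))  # top-1 routing: activate a single expert
    return x @ experts[top]       # only the chosen expert runs

y = moe_forward(rng.standard_normal(8))  # compute cost: one expert, not all
```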
Neural Network
A neural network is a computational model inspired by the structure and function of the human brain. It consists of interconnected nodes, or "neurons," that process and transmit information through weighted connections. Neural networks are used in machine learning to learn patterns and relationships in data and make predictions or decisions.
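For intuition, here is the forward pass of a tiny two-layer network in NumPy, with random weights standing in for learned ones:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)  # input -> hidden
W2, b2 = rng.standard_normal((8, 3)), np.zeros(3)  # hidden -> output

def forward(x: np.ndarray) -> np.ndarray:
    h = np.maximum(0, x @ W1 + b1)  # weighted connections + ReLU activation
    return h @ W2 + b2              # three output scores

print(forward(rng.standard_normal(4)))
```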
Transformer
The transformer is a deep learning architecture that has revolutionized natural language processing. It uses self-attention mechanisms to capture long-range dependencies in text data and has become the basis for many state-of-the-art language models. Transformers are known for their scalability, efficiency, and ability to model complex relationships in data.
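The core of the architecture is scaled dot-product self-attention. This sketch omits the learned query/key/value projections and multiple heads that real transformers use:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    # Single head: every token attends to every other token.
    d = X.shape[-1]
    Q, K, V = X, X, X                    # real models use learned projections
    scores = Q @ K.T / np.sqrt(d)        # pairwise similarity, scaled
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)   # softmax over tokens
    return w @ V                         # mix token values by attention weight

X = np.random.default_rng(0).standard_normal((5, 16))  # 5 tokens, dim 16
print(self_attention(X).shape)  # (5, 16)
```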
Prompt
A prompt is a set of instructions or input provided to a language model to guide its output. Prompts can take various forms, such as questions, statements, or keywords, and are used to elicit specific responses from the model. Effective prompt design is essential for controlling the behavior and output of language models.
Language Model
A language model is a statistical model that predicts the likelihood of a sequence of words or characters in a given context. Language models are used in natural language processing tasks such as text generation, machine translation, and speech recognition. They learn patterns and relationships in language data to generate coherent and contextually relevant text.
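Formally, a language model factors the probability of a sequence into a product of next-word probabilities via the chain rule:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

Each factor is exactly the next-token prediction task described below.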
Natural Language Processing
Natural language processing (NLP) is a branch of artificial intelligence that focuses on the interaction between computers and human language. NLP techniques enable machines to understand, interpret, and generate human language, allowing for tasks such as text analysis, language translation, and sentiment analysis.
Next-Token Prediction
Next-token prediction is a task in natural language processing that involves predicting the next word or token in a sequence of text. Language models use next-token prediction to generate coherent and contextually relevant text by estimating the most likely word to follow a given input.
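In practice the model emits one score (logit) per vocabulary item, and a softmax turns those scores into a probability distribution. A toy example with a four-word vocabulary:

```python
import numpy as np

vocab = ["the", "cat", "sat", "mat"]
logits = np.array([1.2, 0.3, 2.5, 0.1])  # toy scores from a model

probs = np.exp(logits - logits.max())
probs /= probs.sum()  # softmax: scores -> distribution over the vocabulary

greedy = vocab[int(np.argmax(probs))]                      # picks "sat"
sampled = np.random.default_rng(0).choice(vocab, p=probs)  # or sample instead
```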
Symbolic Knowledge Distillation
Symbolic knowledge distillation is a technique that involves transferring knowledge from a large, complex language model to a smaller, more efficient model. By distilling the essential information and patterns learned by the large model, symbolic knowledge distillation can improve the performance and efficiency of smaller models.
Vector Embedding
Vector embedding is a technique used in natural language processing to represent words or phrases as dense, low-dimensional vectors. These vectors capture semantic relationships between words and enable algorithms to process and analyze text data more effectively. Vector embeddings are used in tasks such as word similarity, sentiment analysis, and document classification.
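Similarity between embeddings is usually measured with cosine similarity. The 4-dimensional vectors below are toy values chosen for illustration; real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

cat    = np.array([0.9, 0.1, 0.0, 0.3])
kitten = np.array([0.8, 0.2, 0.1, 0.4])
car    = np.array([0.0, 0.9, 0.8, 0.1])

print(cosine(cat, kitten))  # high: semantically close
print(cosine(cat, car))     # lower: semantically distant
```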
Knowledge Graph
A knowledge graph is a structured representation of knowledge that captures relationships between entities and concepts in a domain. Knowledge graphs are used in natural language processing and artificial intelligence to store and retrieve information, answer complex queries, and facilitate reasoning and inference.
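A knowledge graph can be as simple as a set of (subject, relation, object) triples. The facts below are drawn from the model overview above:

```python
triples = [
    ("Llama 3", "developed_by", "Meta"),
    ("Gemini", "developed_by", "Google"),
    ("PaLM 2", "developed_by", "Google"),
]

def subjects_of(relation: str, obj: str) -> list:
    return [s for s, r, o in triples if r == relation and o == obj]

print(subjects_of("developed_by", "Google"))  # ['Gemini', 'PaLM 2']
```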
Vector Database
A vector database is a type of database that stores and indexes vector embeddings of data points. Vector databases are used in machine learning and natural language processing to efficiently search and retrieve similar items based on their vector representations. They enable fast and accurate similarity search and nearest neighbor queries.
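At its core, the query a vector database answers is nearest-neighbor search. Production systems use approximate indexes (such as HNSW) rather than the brute-force scan sketched here:

```python
import numpy as np

rng = np.random.default_rng(0)
index = rng.standard_normal((10_000, 64))  # stored embeddings

def nearest(query: np.ndarray, k: int = 3) -> np.ndarray:
    sims = index @ query / (np.linalg.norm(index, axis=1)
                            * np.linalg.norm(query))
    return np.argsort(-sims)[:k]  # row ids of the k most similar items

print(nearest(rng.standard_normal(64)))
```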
Encoder-Decoder Architecture
The encoder-decoder architecture is a deep learning model structure commonly used in sequence-to-sequence tasks such as machine translation and text summarization. The encoder processes the input sequence and generates a fixed-length representation, which is then decoded by the decoder to produce the output sequence. Encoder-decoder models are effective for tasks that involve generating variable-length outputs from variable-length inputs.
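A skeletal version of the pattern, with random weights standing in for trained ones: the encoder compresses a variable-length input into one fixed-length vector, from which the decoder produces output tokens.

```python
import numpy as np

rng = np.random.default_rng(0)
W_enc = rng.standard_normal((16, 8))  # encoder weights
W_dec = rng.standard_normal((8, 16))  # decoder weights

def encode(tokens: np.ndarray) -> np.ndarray:
    # Compress a variable-length input into a fixed-length vector.
    return np.tanh(tokens @ W_enc).mean(axis=0)

def decode_step(state: np.ndarray) -> np.ndarray:
    # Produce one output token's scores from the encoded state.
    return state @ W_dec

src = rng.standard_normal((7, 16))  # 7 input tokens
state = encode(src)                 # fixed-length representation
first_scores = decode_step(state)   # decoder would iterate from here
```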
Start Building
Helpful Tools
LangChain
Open Source
LangChain is an open-source framework designed to help developers build applications that integrate with large language models (LLMs) like GPT. It focuses on simplifying the creation of AI-powered applications that leverage natural language processing (NLP) by offering tools for prompt management, memory, chaining LLMs, and integration with various external APIs.
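A classic LangChain pattern chains a prompt template to an LLM. Treat this as a sketch: class names and import paths have shifted across LangChain versions, and it assumes an OpenAI API key is configured.

```python
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI      # any supported LLM wrapper
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["text", "lang"],
    template="Translate the following into {lang}:\n{text}",
)
chain = LLMChain(llm=OpenAI(), prompt=prompt)
print(chain.run(text="Good morning", lang="French"))
```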
LlamaIndex
Open Source
LlamaIndex (formerly known as GPT Index) is a data framework designed to make it easier for developers to integrate large language models (LLMs) with their own data, enabling more intelligent and contextually aware applications. It provides an interface to connect LLMs with diverse, large-scale data sources, such as documents, databases, or APIs, by building dynamic knowledge indices from this information.
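The canonical LlamaIndex flow is only a few lines: load your documents, build an index, and query it. This follows the older top-level imports; newer releases move these under llama_index.core.

```python
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()  # your own files
index = VectorStoreIndex.from_documents(documents)     # embed and index them
query_engine = index.as_query_engine()
print(query_engine.query("What do these documents say about pricing?"))
```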
LlamaParse
LlamaParse is a proprietary parsing service for complex documents with embedded objects such as tables and figures. It integrates directly with LlamaIndex ingestion and retrieval, letting you build retrieval over complex, semi-structured documents.
LM Studio
LM Studio is a desktop application that enables users to run and fine-tune large language models (LLMs) on their local machines.
Ollama
Open Source
Ollama is a platform that enables running and interacting with large language models (LLMs) on local machines. It is designed to provide a more private, efficient, and cost-effective way for developers and users to work with AI models without relying on cloud-based infrastructure. By allowing LLMs to run locally, Ollama eliminates the need for internet connectivity, reducing concerns about data privacy and latency, while also avoiding cloud-based fees.
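Ollama exposes a local REST API, so any language can call it. A minimal sketch in Python, assuming the Ollama server is running and a model has been pulled (for example with `ollama pull llama3`):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
)
print(resp.json()["response"])  # the model's full reply
```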
Pinecone DB
Pinecone is a fully managed vector database service designed to enable fast, scalable, and efficient storage and retrieval of vector embeddings, which are numerical representations of data. It's specifically built to handle large-scale machine learning and AI applications, where vector similarity search is crucial, such as in recommendation systems, semantic search, and natural language processing tasks.
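A rough sketch of the upsert-and-query cycle with the Pinecone Python client. The API key, index name, and vector dimension are placeholders, and the client interface has changed between major versions, so check the current docs.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # an index created beforehand

# Store an embedding with optional metadata, then search by similarity.
index.upsert(vectors=[("doc-1", [0.1] * 1536, {"source": "faq"})])
matches = index.query(vector=[0.1] * 1536, top_k=3, include_metadata=True)
```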
Chroma DB
Open Source
Chroma is an open-source embedding database designed for building AI applications with LLMs. It stores documents alongside their vector embeddings and metadata and provides fast similarity search over them, enabling developers to add retrieval and memory to LLM applications with ease.
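A minimal sketch with the Chroma Python client, using an in-memory instance and Chroma's default embedding function:

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client in production
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2"],
    documents=["Ollama runs models locally.",
               "Pinecone is a managed vector database."],
)
results = collection.query(query_texts=["local LLM hosting"], n_results=1)
print(results["documents"])
```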
Get Help
Other Resources
Yejin Choi is a professor at the Paul G. Allen School of Computer Science & Engineering at the University of Washington. Her research focuses on natural language processing, machine learning, and artificial intelligence. She is particularly interested in developing algorithms that can understand and generate human language, with a focus on commonsense reasoning and natural language understanding.
Textbooks Are All You Need
Suriya Gunasekar, Yi Zhang, Jyoti Aneja, et al.
Paper
This paper introduces a new approach to training language models that relies on textbook-quality data, demonstrating the importance of high-quality training data. The authors show that a comparatively small model trained this way can rival much larger models on code-generation benchmarks.
Segment Anything
Alexander Kirillov, Eric Mintun, et al.
Paper
Segment Anything Model (SAM) is an AI model from Meta AI that can "cut out" any object, in any image, with a single click. SAM is a promptable segmentation system with zero-shot generalization to unfamiliar objects and images, without the need for additional training.
The Shift from Models to Compound AI Systems
Berkeley AI Research (BAIR)
Blog
This article discusses the shift from single models to compound AI systems, which combine multiple models to solve complex tasks. It explores the benefits and challenges of building compound AI systems and highlights the potential for improved performance and robustness in AI applications.
Fei-Fei Li is a renowned computer scientist, educator, and entrepreneur, best known for her pioneering work in the field of artificial intelligence (AI) and computer vision. She has made significant contributions to the development of AI technologies and is particularly recognized for her leadership in large-scale visual recognition and deep learning.
Efficiently Adapting Pretrained Language Models to New Languages
Zoltan Csaki, Pian Pawakapan, et al.
Paper
This paper introduces a new method for adapting pretrained language models to new languages with limited data. The authors propose a novel training approach that leverages multilingual data and transfer learning techniques to improve the performance of language models on low-resource languages. The method achieves state-of-the-art results on several language-specific tasks and demonstrates the effectiveness of cross-lingual transfer learning for language model adaptation.
Chapter 11 Large Language Models
Deep Learning and its Applications Lecture Notes
Website
This module is an introductory course on Machine Learning (ML), with a focus on Deep Learning. The course is offered by the Electronic & Electrical Engineering department to fourth- and fifth-year students at Trinity College Dublin.
Generative AI for Beginners
Microsoft
Course
Learn the fundamentals of building Generative AI applications with this comprehensive 18-lesson course from Microsoft Cloud Advocates.