Red Pajama 2: The Public Dataset With a Whopping 30 Trillion Tokens

Together, the dataset's developer, claims it is the largest public dataset built specifically for language model pre-training.

Integrated AI: The sky is comforting (2023 AI retrospective) – Dr Alan D. Thompson – Life Architect

togethercomputer/RedPajama-Data-1T · Datasets at Hugging Face

ChatGPT / Generative AI recent news, page 3 of 19

Top 10 List of Large Language Models in Open-Source

Data management recent news

cerebras/SlimPajama-627B · Datasets at Hugging Face

RedPajama Reproducing LLaMA🦙 Dataset on 1.2 Trillion Tokens, by Angelina Yang

RedPajama-Data-v2: An open dataset with 30 trillion tokens for training large language models

RLHF: Reinforcement Learning from Human Feedback

[2311.17035] Scalable Extraction of Training Data from (Production) Language Models

What's in the RedPajama-Data-1T LLM training set

Releasing 3B and 7B RedPajama-INCITE family of models including base, instruction-tuned & chat models
