General Concept

  • AI: Artificial Intelligence
  • Machine Learning: the machine learns patterns from data and uses them to make predictions
  • Deep Learning: uses neural networks (an input layer, multiple hidden layers, and an output layer), loosely inspired by how the brain works
  • Generative AI: built on foundation models such as LLMs, audio models, and video models
  • LLM: Large Language Model
  • Scaling Laws: test cross-entropy loss falls predictably as compute, dataset size, and parameter count grow (see the sketch after this list)
  • GPU: massively parallel floating-point compute, well suited to training and inference
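A rough sketch of the scaling-law idea, in the Chinchilla form L(N, D) = E + A/N^alpha + B/D^beta. The constants below are approximately the values reported by Hoffmann et al. (2022), but treat them as illustrative rather than authoritative:

```python
def estimated_loss(n_params: float, n_tokens: float,
                   e: float = 1.69, a: float = 406.4, b: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style scaling law: L(N, D) = E + A / N^alpha + B / D^beta."""
    return e + a / n_params ** alpha + b / n_tokens ** beta

# More parameters and more training tokens both push the predicted test loss down.
print(estimated_loss(7e9, 140e9))    # roughly a 7B-parameter model on 140B tokens
print(estimated_loss(70e9, 1.4e12))  # roughly a 70B-parameter model on 1.4T tokens
```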

LLM

Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it.
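A minimal sketch of the self-attention step in plain NumPy (toy dimensions, single head, no masking), just to show where the word-to-word relationships come from:

```python
import numpy as np

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # how strongly each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                              # each output is a weighted mix of the values

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```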

How are LLMs trained

Transformer-based neural networks are very large. These networks contain multiple nodes and layers. Each node in a layer has connections to all nodes in the subsequent layer; each connection has a weight, and each node has a bias. Weights and biases, along with embeddings, are known as model parameters. Large transformer-based neural networks can have billions of parameters. The appropriate model size is generally determined by an empirical relationship (a scaling law) between compute, the number of parameters, and the size of the training data.
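As a back-of-the-envelope illustration of how weights, biases, and embeddings add up to billions of parameters, here is a rough count for a hypothetical decoder-only transformer (all sizes below are made-up illustrative values, not any real model's config):

```python
vocab_size, d_model, n_layers, d_ff = 32_000, 4_096, 32, 16_384

embedding = vocab_size * d_model                                       # token embedding table
attention_per_layer = 4 * (d_model * d_model + d_model)                # Q, K, V, output projections (+ biases)
ffn_per_layer = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)   # two dense layers in the feed-forward block
total = embedding + n_layers * (attention_per_layer + ffn_per_layer)

print(f"{total / 1e9:.1f} billion parameters")  # a few billion for these sizes
```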

Training is performed using a large corpus of high-quality data. During training, the model iteratively adjusts parameter values until it correctly predicts the next token given the previous sequence of input tokens. It does this through self-supervised learning, which teaches the model to adjust its parameters so as to maximize the likelihood of the next tokens in the training examples.
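A minimal sketch of that objective, using PyTorch with an embedding plus a linear layer standing in for a real transformer (sizes and data are illustrative): minimizing next-token cross-entropy is the same as maximizing the likelihood of the next tokens.

```python
import torch
import torch.nn.functional as F

# Toy next-token prediction step; the "model" is just embedding -> linear.
vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)
optimizer = torch.optim.SGD(list(embed.parameters()) + list(lm_head.parameters()), lr=0.1)

tokens = torch.randint(0, vocab_size, (8, 16))   # batch of 8 sequences, 16 tokens each
logits = lm_head(embed(tokens[:, :-1]))          # predict a distribution over the next token at each position
targets = tokens[:, 1:]                          # the ground-truth next tokens
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

loss.backward()    # gradients of the loss w.r.t. every parameter
optimizer.step()   # nudge parameters toward higher likelihood of the observed tokens
```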

Benefits

  • Versatility of applications
  • Human productivity boost
  • Natural language understanding
  • Interactive at scale
  • Handles specific tasks without fine-tuning

Reference