General Concept

  • AI: Artificial Intelligence
  • Machine Learning: the machine learns patterns from data and uses them to make predictions
  • Deep Learning: uses neural networks (an input layer, multiple hidden layers, and an output layer), loosely inspired by how the brain works
  • Generative AI: built on foundation models such as LLMs, audio models, and video models
  • LLM: Large Language Model
  • Scaling Laws: test cross-entropy loss falls predictably as compute, dataset size, and parameter count grow (see the sketch after this list)
  • GPU: massively parallel floating-point compute, well suited to training and inference
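A rough sketch of the scaling-law idea, in the Chinchilla form L(N, D) = E + A/N^alpha + B/D^beta. The constants below are approximately the values reported by Hoffmann et al. (2022), but treat them as illustrative rather than authoritative:

```python
def estimated_loss(n_params: float, n_tokens: float,
                   e: float = 1.69, a: float = 406.4, b: float = 410.7,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Chinchilla-style scaling law: L(N, D) = E + A / N^alpha + B / D^beta."""
    return e + a / n_params ** alpha + b / n_tokens ** beta

# More parameters and more training tokens both push the predicted test loss down.
print(estimated_loss(7e9, 140e9))    # roughly a 7B-parameter model on 140B tokens
print(estimated_loss(70e9, 1.4e12))  # roughly a 70B-parameter model on 1.4T tokens
```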

LLM

Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consist of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meanings from a sequence of text and understand the relationships between words and phrases in it.
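A minimal sketch of the self-attention step in plain NumPy (toy dimensions, single head, no masking), just to show where the word-to-word relationships come from:

```python
import numpy as np

def self_attention(x: np.ndarray, w_q: np.ndarray, w_k: np.ndarray, w_v: np.ndarray) -> np.ndarray:
    """Scaled dot-product self-attention over a sequence x of shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v             # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])         # how strongly each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                              # each output is a weighted mix of the values

seq_len, d_model = 4, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)  # (4, 8)
```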

How are LLMs trained

Transformer-based neural networks are very large. These networks contain multiple nodes and layers. Each node in a layer has connections to all nodes in the subsequent layer; each connection has a weight, and each node has a bias. Weights and biases, along with embeddings, are known as model parameters. Large transformer-based neural networks can have billions of parameters. The appropriate model size is generally determined by an empirical relationship (a scaling law) between compute, the number of parameters, and the size of the training data.
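As a back-of-the-envelope illustration of how weights, biases, and embeddings add up to billions of parameters, here is a rough count for a hypothetical decoder-only transformer (all sizes below are made-up illustrative values, not any real model's config):

```python
vocab_size, d_model, n_layers, d_ff = 32_000, 4_096, 32, 16_384

embedding = vocab_size * d_model                                       # token embedding table
attention_per_layer = 4 * (d_model * d_model + d_model)                # Q, K, V, output projections (+ biases)
ffn_per_layer = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)   # two dense layers in the feed-forward block
total = embedding + n_layers * (attention_per_layer + ffn_per_layer)

print(f"{total / 1e9:.1f} billion parameters")  # a few billion for these sizes
```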

Training is performed using a large corpus of high-quality data. During training, the model iteratively adjusts parameter values until it correctly predicts the next token given the previous sequence of input tokens. It does this through self-supervised learning, which teaches the model to adjust its parameters so as to maximize the likelihood of the next tokens in the training examples.
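A minimal sketch of that objective, using PyTorch with an embedding plus a linear layer standing in for a real transformer (sizes and data are illustrative): minimizing next-token cross-entropy is the same as maximizing the likelihood of the next tokens.

```python
import torch
import torch.nn.functional as F

# Toy next-token prediction step; the "model" is just embedding -> linear.
vocab_size, d_model = 100, 32
embed = torch.nn.Embedding(vocab_size, d_model)
lm_head = torch.nn.Linear(d_model, vocab_size)
optimizer = torch.optim.SGD(list(embed.parameters()) + list(lm_head.parameters()), lr=0.1)

tokens = torch.randint(0, vocab_size, (8, 16))   # batch of 8 sequences, 16 tokens each
logits = lm_head(embed(tokens[:, :-1]))          # predict a distribution over the next token at each position
targets = tokens[:, 1:]                          # the ground-truth next tokens
loss = F.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))

loss.backward()    # gradients of the loss w.r.t. every parameter
optimizer.step()   # nudge parameters toward higher likelihood of the observed tokens
```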

Benefits

  • Versatility of applications
  • Human productivity boost
  • Natural language understanding
  • Interactive at scale
  • Handles specific tasks without fine-tuning

Reference