Thursday, January 30, 2025

Multimodal

It refers to the capability of a model to accept more than one type of input, for example text and images. 

Emergent Abilities

It refers to capabilities exhibited by a model that were not explicitly programmed into it; they are like a bonus emerging from the final model. For example, the ability to summarize text or to do arithmetic. 

Prompt Engineering

A discipline of crafting the input given to an LLM to increase the accuracy and coherence of its output. The idea is to shape the output by manipulating the input with specific words or formats.

Chain-of-thought prompting asks the model to generate intermediate reasoning steps before the final output. 

Few-shot and zero-shot prompting can be used to trigger chain-of-thought reasoning. 
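A minimal sketch of the two triggers, using made-up prompt wording (only the prompt strings are shown; the actual model call is omitted):

```python
question = "A shop sells pens at $3 each. How much do 7 pens cost?"

# Zero-shot CoT: append a cue phrase, with no worked examples.
zero_shot_prompt = f"{question}\nLet's think step by step."

# Few-shot CoT: prepend a worked example that shows intermediate steps.
few_shot_prompt = (
    "Q: A box holds 4 apples. How many apples are in 5 boxes?\n"
    "A: Each box holds 4 apples. 5 boxes hold 5 * 4 = 20 apples. Answer: 20.\n\n"
    f"Q: {question}\n"
    "A:"
)

print(zero_shot_prompt)
print(few_shot_prompt)
```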

Self-consistency prompting involves asking the same question multiple times and keeping the most common answer. 
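The sample-and-vote idea can be sketched as follows; `noisy_model` is a toy stand-in for a stochastic LLM call, not a real API:

```python
import random
from collections import Counter

def self_consistency(sample_answer, question, n=5):
    """Ask the same question n times and return the most common answer.
    sample_answer stands in for a (stochastic) model call."""
    answers = [sample_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Toy stand-in for a model that occasionally slips up.
random.seed(0)
def noisy_model(question):
    return "42" if random.random() < 0.8 else "41"

print(self_consistency(noisy_model, "What is 6 * 7?", n=11))
```

With enough samples, the occasional wrong answer is outvoted by the majority.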

Knowledge prompting provides additional information to the model within the prompt. 
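A tiny sketch of packing extra knowledge into the prompt (the context and wording are illustrative, not from any real system):

```python
# Knowledge prompting: supply relevant facts inside the prompt itself.
context = "Store hours: Mon-Fri 9am-6pm, Sat 10am-4pm, closed Sunday."
question = "Is the store open on Saturday afternoon?"

prompt = (
    "Using only the context below, answer the question.\n\n"
    f"Context: {context}\n\n"
    f"Question: {question}"
)
print(prompt)
```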

Wednesday, January 29, 2025

Fine tuning LLM

Fine tuning is training a pre-trained model on a specific task to sharpen its responses in that niche area. This can be done by training only part of the model while keeping the rest unchanged (frozen). 

Training LLM

The model is tweaked to maximise the likelihood of the training data (maximum likelihood estimation). 
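A worked example of maximum likelihood on the smallest possible "language model", a unigram model fitted to a toy corpus: the MLE probabilities are just the normalised counts, and we can evaluate the log-likelihood they assign to the training data.

```python
import math
from collections import Counter

corpus = "the cat sat on the mat the cat".split()

# MLE for a unigram model: probabilities proportional to counts.
counts = Counter(corpus)
total = len(corpus)
probs = {w: c / total for w, c in counts.items()}

# Log-likelihood of the training data under the fitted model.
log_likelihood = sum(math.log(probs[w]) for w in corpus)
print(round(log_likelihood, 3))
```

Any other probability assignment would give this corpus a lower log-likelihood; training an LLM does the same maximisation, just over next-token probabilities with billions of parameters.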

Autoregression predicts the next word in a sequence. Masked training masks out part of a sentence and lets the model fill in the blanks. 
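A minimal sketch of the autoregressive idea using bigram counts instead of a neural network: given the previous word, predict the most likely next word seen in training.

```python
from collections import Counter, defaultdict

# Toy "autoregressive model": predict the next word from bigram counts.
corpus = "the cat sat on the mat and the cat slept".split()

follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    # Most frequent continuation observed in training.
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))
```

Generating text is then just repeated next-word prediction; an LLM replaces the count table with a learned distribution over the whole vocabulary.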

NSP (next sentence prediction) has the model predict whether a pair of sentences appears consecutively in the training corpus. This helps the model tune narrative flow and coherence. 

The training cost is high relative to the inference cost, at a ratio of roughly 4.5 to 1.

Tokenization

Tokenization dissects text into smaller units. A unit need not be a word; it can be a character, a phrase, or a symbol. Tokenization lets us grapple with the complications of language (vocabulary, format, grammar, etc.) and allows a significant reduction in the compute and memory resources the model requires. 

Different model builds may use different tokenization methods: rule-based, statistical, or neural. The number of tokens affects the computation the model needs to process the input; therefore, models charge based on the number of input tokens. 
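A quick sketch of how the choice of tokenization scheme changes the token count (and hence the metered cost). These are crude rule-based splitters, not a production tokenizer such as BPE:

```python
import re

text = "Tokenization enables language models."

# Rule-based: split on whitespace (punctuation sticks to words).
word_tokens = text.split()

# Rule-based: separate words from punctuation symbols.
symbol_tokens = re.findall(r"\w+|[^\w\s]", text)

# Character-level: every character is a token.
char_tokens = list(text)

print(len(word_tokens), len(symbol_tokens), len(char_tokens))
```

The same text yields very different token counts, which is why pricing and context limits depend on the tokenizer in use.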

Attention

The attention mechanism is like placing bookmarks in a lengthy text. The bookmarks keep track of important or significant portions of the text, which can be used to produce output later. By focusing on these parts of the input, the output becomes more accurate and contextual. 

Self-attention is a variant that captures relationships between different parts of the input sequence, regardless of their distance in the text.
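A simplified self-attention pass in plain Python: queries, keys, and values are the raw input vectors here (real models first apply learned projections), but the core pattern holds: score every position against every other, softmax the scores, and take a weighted sum.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(vectors):
    """Simplified scaled dot-product self-attention with Q = K = V = input."""
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Similarity of this position to every position, near or far.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in vectors]
        weights = softmax(scores)
        # Output is a weighted sum of the value vectors.
        out.append([sum(w * v[i] for w, v in zip(weights, vectors))
                    for i in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(tokens))
```

Because every position attends to every other in one step, distant parts of the sequence relate as directly as adjacent ones.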