Thursday, January 30, 2025

Multimodal

It refers to the capability of a model to accept more than one type of input, for example text and images.

Emergent Abilities

It refers to capabilities exhibited by a model that were not explicitly programmed into it. They are like a bonus emerging from the final model. For example, the ability to summarize text or to do arithmetic are such capabilities.

Prompt Engineering

A discipline of crafting the input given to an LLM to increase the accuracy and coherence of its output. The idea is to shape the output by manipulating the input with specific words or formats.

Chain-of-thought prompting asks the model to generate intermediate reasoning steps before the final answer.

Few-shot and zero-shot prompting can be used to trigger chain-of-thought reasoning.

Self-consistency prompting involves asking the same question multiple times and taking the most common answer.

Knowledge prompting is to provide additional information to the model within the prompt.
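The prompting techniques above can be sketched as plain string construction. This is a minimal, hypothetical example of building a few-shot chain-of-thought prompt; the questions, worked steps, and "Let's think step by step" cue are illustrative, not tied to any specific model or API.

```python
# Hypothetical few-shot chain-of-thought prompt builder.
def build_cot_prompt(question, examples):
    """Assemble a prompt whose examples demonstrate intermediate steps."""
    parts = []
    for q, steps, answer in examples:
        parts.append(f"Q: {q}\nA: {steps} So the answer is {answer}.")
    # The trailing cue nudges the model to produce its own reasoning steps.
    parts.append(f"Q: {question}\nA: Let's think step by step.")
    return "\n\n".join(parts)

examples = [
    ("If I have 3 apples and buy 2 more, how many do I have?",
     "Start with 3 apples; buying 2 more gives 3 + 2 = 5.",
     "5"),
]
prompt = build_cot_prompt(
    "A train travels 60 km in 2 hours. What is its speed?", examples)
```

Zero-shot chain of thought would skip the examples list and keep only the final cue.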

Wednesday, January 29, 2025

Fine tuning LLM

Fine-tuning is to train a pre-trained model on a specific task to sharpen its responses in that niche area. This can be done by training only part of the model (keeping the rest unchanged).

Training LLM

The model is tweaked to maximise the likelihood of the training data (maximum likelihood estimation). 

Autoregression is to predict the next word in a sequence. Masked training is to mask out part of a sentence and let the model fill in the blanks.

NSP (next sentence prediction) has the model predict whether a pair of sentences appears consecutively in the training corpus. This lets the model tune narrative flow and coherence.

The training cost is high but the inference cost is lower, at a ratio of roughly 4.5 to 1.

Tokenization

Tokenization is to dissect text into smaller units. The unit need not always be a word; it can be a character, a phrase or a symbol. Tokenization lets us grapple with the complications of language (vocabulary, format, grammar etc.). This allows a significant reduction in the compute and memory resources required by the model.

Different models may use different tokenization methods - rule-based, statistical or neural. The number of tokens affects the computation the model requires, which is why model usage is charged by the number of input tokens.
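A minimal rule-based tokenizer can be sketched with a regular expression; this is an assumption-laden toy (real models typically use subword methods such as BPE), but it shows why token count, not character count, drives cost.

```python
import re

# Toy rule-based tokenizer: words and individual punctuation marks
# become separate tokens. Real LLM tokenizers use subword units.
def tokenize(text):
    return re.findall(r"\w+|[^\w\s]", text)

tokens = tokenize("Tokenization isn't always word-based.")
# ['Tokenization', 'isn', "'", 't', 'always', 'word', '-', 'based', '.']
cost_units = len(tokens)  # APIs typically charge per token, not per character
```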

Attention

The attention mechanism is like placing bookmarks in a lengthy text. The bookmarks let one keep track of the important or significant portions of the text, which can be used to produce output later. By focusing on these parts of the input, the output becomes more accurate and contextual.

Self-attention is a variant that captures relationships between different parts of the input sequence, regardless of their distance in the text.
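Scaled dot-product self-attention can be sketched in a few lines of numpy. Shapes and random weights below are illustrative; the point is that every token gets a weighted view of every other token, whatever the distance between them.

```python
import numpy as np

# Minimal self-attention sketch: each position attends to all positions.
def self_attention(x, Wq, Wk, Wv):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax per row
    return weights @ v, weights

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                # 5 tokens, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = self_attention(x, Wq, Wk, Wv)
```

Each row of `weights` sums to 1: it is the "bookmark" distribution telling us how much each token looks at every other token.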

Saturday, January 25, 2025

68000 unimplemented instructions

The Motorola 68000 CPU treats an instruction whose opcode begins with $A as an unimplemented instruction, which works like a software interrupt mechanism. It is akin to a supervisor call instruction on other CPUs. The data portion of the instruction contains the routine number to pass control to, and a flag indicates whether it is for the OS or the ROM. The routine number is an index into a dispatch table that contains the address of each routine.
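The decoding can be sketched in Python. The exact bit layout below (bit 11 as the OS/ROM flag, the low 11 bits as the routine number) follows the classic Macintosh A-line convention and is an assumption, not something defined by the 68000 itself; the table addresses are hypothetical.

```python
# Sketch of A-line trap decoding (hypothetical bit layout and addresses).
def dispatch(instruction, os_table, rom_table):
    assert (instruction >> 12) == 0xA        # top nibble $A: unimplemented
    is_rom = bool(instruction & 0x0800)      # flag bit: OS or ROM routine
    table = rom_table if is_rom else os_table
    routine_number = instruction & 0x07FF    # index into the dispatch table
    return table[routine_number]             # address of the routine

os_table = {0x001: 0x00400100}               # made-up routine addresses
rom_table = {0x001: 0x00408200}
os_target = dispatch(0xA001, os_table, rom_table)
rom_target = dispatch(0xA801, os_table, rom_table)
```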

Sunday, January 19, 2025

Serial protocol

UART can be used to connect two nodes using two data wires, one per direction. Start and stop bits embedded in the data stream mark the beginning and end of each transmitted frame.

I2C is from Philips. It is a master-slave setup on a shared bus. The node that gains control of the bus becomes the master. It sends out the address of the slave it wants to communicate with and the mode (read or write), followed by the data transmitted on the wire. The address of each device connected to the bus is configured either as built-in or via jumpers etc.

SPI came from Motorola. It is also a master-slave setup, but the master is usually a more intelligent controller and the slaves are less intelligent peripheral devices. The master selects a slave via a CS (chip select) wire connected to that specific slave, so the master controller may have more than one CS wire to serve multiple slaves. The master is also responsible for generating the clock signal that drives the transmission (sampling the wire for data). A daisy-chain configuration is also possible, allowing the master to communicate with multiple slaves via one CS wire, but the implementation is complex, data must pass through all slaves sequentially during transmission, and not all devices support this configuration. There are two data wires: one to transmit data from master to slave and one for the reverse direction.
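The full-duplex exchange on SPI's two data wires can be modelled as a bit-shifting loop: on each clock pulse the master shifts one bit out while sampling one bit in. This is a simplified software sketch, not real hardware timing; clock polarity/phase modes are ignored.

```python
# Toy full-duplex SPI exchange: master and slave each shift a byte out
# MSB-first while sampling the opposite wire on every clock pulse.
def spi_exchange(master_byte, slave_byte):
    miso_in = 0   # what the master accumulates from MISO
    mosi_in = 0   # what the slave accumulates from MOSI
    for i in range(7, -1, -1):            # 8 clock pulses, MSB first
        mosi = (master_byte >> i) & 1     # master drives MOSI
        miso = (slave_byte >> i) & 1      # slave drives MISO
        mosi_in = (mosi_in << 1) | mosi   # slave samples MOSI
        miso_in = (miso_in << 1) | miso   # master samples MISO
    return miso_in, mosi_in

received_by_master, received_by_slave = spi_exchange(0xA5, 0x3C)
```

Note that both sides always exchange a byte per 8 clocks: SPI has no separate read and write transactions, only simultaneous shifts.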

Saturday, January 18, 2025

Generative AI

Gen AI creates new data that is similar to its inputs. Gen AI output can also transform the input into a different domain (e.g. changing an image from night to day) or change its style (from photograph to cartoon).

Gen AI learns the distribution of features in its inputs and samples that distribution (based on probabilities) to create new data.

Discriminative AI

This type of AI algorithm classifies or distinguishes classes of target objects. It is applied in areas including NLP (sentiment classification, topic classification), recommendation (predicting user preferences) and computer vision.

Discriminative AI tries to identify the "line" that separates different classes in the feature space.
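That separating "line" can be made concrete with a linear decision boundary: classify a 2-D point by which side of w·x + b it falls on. The weights here are hand-picked for illustration, not learned from data.

```python
# Sketch of a linear decision boundary in 2-D feature space.
# With w = (1, -1) and b = 0, the boundary is the line x = y.
def classify(point, w=(1.0, -1.0), b=0.0):
    score = w[0] * point[0] + w[1] * point[1] + b
    return 1 if score > 0 else 0   # which side of the line the point is on

class_a = classify((2.0, 1.0))     # below the line x = y
class_b = classify((1.0, 2.0))     # above the line x = y
```

A trained discriminative model (e.g. logistic regression) learns `w` and `b` from labelled examples instead of having them fixed.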

Saturday, January 11, 2025

Large Language Model

An LLM is trained on a large amount of text and is used to predict the next word or sentence based on the input (prompt) and the previous interactions in the session. Older RNNs loop their output back to generate subsequent output, but they have more limited "memory" than LLMs.

Given an input, an LLM computes a probability distribution of the next word spanning the entire vocabulary. This approach is applied at the sentence level as well to produce coherent and contextually appropriate output.
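The next-word distribution is typically obtained by applying a softmax to the model's raw scores (logits). The tiny vocabulary and scores below are made up for illustration; real vocabularies have tens of thousands of tokens.

```python
import numpy as np

# Toy next-word distribution: softmax over hypothetical model scores.
vocab = ["cat", "dog", "sat", "mat"]
logits = np.array([2.0, 0.5, 1.0, -1.0])   # made-up scores for illustration

probs = np.exp(logits - logits.max())       # subtract max for stability
probs /= probs.sum()                        # normalise to probabilities

next_word = vocab[int(np.argmax(probs))]    # greedy pick of the likeliest word
```

Decoding strategies differ in how they use this distribution: greedy decoding takes the argmax, while sampling draws from the distribution to produce more varied text.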

Wednesday, January 8, 2025

GAN types

A GAN generates images randomly, which is unpredictable. By adding a condition vector to the input during training to indicate the class of image we want, we can get the GAN to generate the specific class of image we want. This is called a conditional GAN.

A controllable GAN is a GAN where we can use the random input vector to influence parts of the image, for example changing the hair colour or gender in the image by altering values of the input vector. This is equivalent to mapping the vector space to the image space. The input vector may need many more values (dimensions) to make the output more controllable, because in a low-dimensional vector one value change may affect multiple features of the image. This is called entanglement. A higher-dimensional vector is more likely to minimize the correlation of one value to many features in the image space.
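The conditional GAN input can be sketched as the noise vector concatenated with a one-hot class vector. Sizes below are illustrative assumptions.

```python
import numpy as np

# Conditional GAN input sketch: noise vector + one-hot condition vector.
def make_generator_input(noise_dim, num_classes, class_id, rng):
    noise = rng.normal(size=noise_dim)        # random noise vector
    condition = np.zeros(num_classes)
    condition[class_id] = 1.0                 # one-hot class label
    return np.concatenate([noise, condition])

rng = np.random.default_rng(0)
z = make_generator_input(noise_dim=100, num_classes=10, class_id=3, rng=rng)
# z has 110 values: 100 random ones plus the 10-way class condition.
```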

Monday, January 6, 2025

Random vector input for GAN

The input vector contains random values sampled from a normal distribution. In other words, values close to the mode are more likely to be used than values distant from the mode. The vector, called the noise vector, is equivalent to an encoding of real data (an image).
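Sampling the noise vector and checking that values cluster around the mean is a one-liner with numpy; about 68% of draws from a standard normal fall within one standard deviation of the mean.

```python
import numpy as np

# Noise vector drawn from a standard normal distribution: values near
# the mean (0) are more likely than values far from it.
rng = np.random.default_rng(0)
noise = rng.normal(size=100_000)
near_mean = np.mean(np.abs(noise) < 1.0)   # fraction within one std dev
# near_mean is close to 0.68, the standard normal's one-sigma mass.
```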

Training GAN

(1) Generate a batch of images with the generator. The initial images will be noise and easily distinguished by the discriminator.

(2) Grab a batch of real images. Mark the combined batch with labels 0 or 1 for fake or real images.

(3) Feed the batch to the discriminator. Use backpropagation to adjust the weights and biases of the discriminator so that it learns to recognise the real images.

(4) Freeze the weights and biases of the discriminator. Create a batch of fake images with the generator. Label these images as real and use the output of the discriminator to adjust the weights and biases of the generator through backpropagation and gradient descent over the complete model (from discriminator back to generator).

(5) Repeat the training until the discriminator cannot discern the fake pictures from the real ones (i.e. its output is about 0.5).
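The label bookkeeping behind the steps above can be sketched with numpy. The "images" here are placeholder arrays, not real data, and the networks are omitted; the point is which labels each phase uses.

```python
import numpy as np

rng = np.random.default_rng(0)
real = rng.normal(size=(4, 16))    # placeholder batch of "real" images
fake = rng.normal(size=(4, 16))    # placeholder batch from the generator

# Steps (2)-(3): combined batch labelled 1 for real, 0 for fake,
# used to train the discriminator.
disc_batch = np.concatenate([real, fake])
disc_labels = np.concatenate([np.ones(4), np.zeros(4)])

# Step (4): fresh fake images deliberately labelled as real (1), so the
# gradient flowing back through the frozen discriminator pushes the
# generator toward output the discriminator accepts.
gen_labels = np.ones(4)
```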

Sunday, January 5, 2025

GAN

A generative adversarial network is used to generate images from an input vector of random values. A GAN comprises two neural networks - a generator used to create images and a discriminator used to rate how real the generated image is.

The images created by the generator are fed to the discriminator together with real images. The output from the discriminator is used to train the generator so that it eventually generates images the discriminator cannot differentiate from real ones. Training is via backpropagation to adjust the weights and biases.

Once the model is trained, the discriminator is discarded. The generator is used to generate artificial images.

Wednesday, January 1, 2025

CNN architecture

A CNN consists of multiple layers of convolution and pooling. A convolution layer uses a set of multiple kernels to learn from its input. The output is fed into a pooling layer, which has no weights and biases like the convolution layer does, as it has nothing to learn. A pooling layer subsamples the output of a convolution layer by dividing the input matrix into a grid of 2x2 cells and extracting the maximum value from each cell. This reduces the spatial extent of the input fed into the next convolution layer.
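The 2x2 max-pooling step can be sketched in numpy; the input values are arbitrary.

```python
import numpy as np

# 2x2 max pooling: divide the input into 2x2 cells and keep the maximum
# of each, halving the spatial extent in both dimensions.
def max_pool_2x2(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 0, 1],
              [3, 4, 1, 0],
              [0, 1, 5, 6],
              [2, 0, 7, 8]])
pooled = max_pool_2x2(x)   # 4x4 input becomes 2x2 output
# pooled == [[4, 1], [2, 8]]
```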

The final convolution layer feeds into a conventional fully connected neural network, called the dense layer, to give the final output.