Record: December 2024

Tuesday, December 31, 2024

CNN

A traditional neural network classifies an image based on optimal weights and biases. It does not learn the semantics of the image. The CNN was invented by Yann LeCun.

A convolutional neural network (CNN) uses kernels to highlight (pick up, understand) the structure of an image, such as vertical edges, horizontal edges, orientation, texture and colour.

A traditional neural network learns just as well even if the image is scrambled (rearranging the pixels in the same way for every input picture). A CNN could not, as scrambling destroys the structural information of the picture.

A CNN “converts” the original input into a set of structural representations, which become the input to a traditional neural network that does the final classification. Images of the same class generate similar representations in the convolution layers before they are passed to the dense layers (i.e. the traditional neural network).

CNNs give a model the ability to see and are used a lot in computer vision.

Convolution

Convolution is a mathematical operation which applies a kernel to transform an input. Let the input be a matrix (representing an image, for example). A kernel is a small matrix that acts like a filter: it transforms a part of the image via multiplication and addition to derive a single number as output. The kernel is applied starting from the top left of the matrix and slides one position to the right each time, then down to the next row, until it has been applied to the entire matrix. The output matrix “enhances” certain features of the input matrix.
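
A minimal sketch of the sliding-kernel step in plain Python. The function name convolve2d, the sample image and the vertical-edge kernel are illustrations made up for these notes, not taken from any library. As in most CNN code, the kernel is applied without flipping it, and there is no padding, so the output is smaller than the input.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):          # move down one row at a time
        row = []
        for c in range(out_w):      # move right one position at a time
            total = 0
            for i in range(kh):
                for j in range(kw):
                    # multiply each covered pixel by the kernel value, then add
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output

# Dark region on the left, bright region on the right: one vertical edge.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]
feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[0, 27, 27, 0], [0, 27, 27, 0]]
```

The flat areas give 0 and the positions covering the edge give 27, which is the "enhancing" effect of the kernel.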

Stochastic Gradient Descent

Running through the whole training data set to determine the next weight and bias values is ideal, but when the data set is massive it becomes computationally challenging to complete an epoch.

A compromise is to randomly pick a subset of the inputs, called a minibatch, and use it to approximate the descent calculation. The random picking is what makes it stochastic.

The advantage of this approach is that it reduces the computational demand and at the same time provides a good approximation of the gradient of the loss function, whose shape is typically unknown at the beginning. Because the approximation is noisy, a strong gradient hint can be ignored early on, which helps to avoid falling into local minima.
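
A rough sketch of the minibatch idea, assuming the simplest possible model: a straight line y = w*x + b fitted with a squared-error loss. The function name sgd_fit, the learning rate, the batch size and the data are all made up for illustration.

```python
import random

def sgd_fit(xs, ys, lr=0.01, batch_size=4, steps=5000, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(steps):
        # The stochastic part: a small random subset instead of the whole data set.
        batch = rng.sample(indices, batch_size)
        grad_w = grad_b = 0.0
        for i in batch:
            error = (w * xs[i] + b) - ys[i]
            # Gradient of the mean squared error over the minibatch.
            grad_w += 2 * error * xs[i] / batch_size
            grad_b += 2 * error / batch_size
        # Step against the approximate gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [x / 10 for x in range(50)]
ys = [3 * x + 1 for x in xs]      # data generated from w = 3, b = 1
w, b = sgd_fit(xs, ys)
print(round(w, 2), round(b, 2))   # should end up close to 3 and 1
```

Each step looks at only 4 of the 50 points, yet the estimates still drift toward the values that generated the data.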

Monday, December 30, 2024

Training neural network

A network needs to be trained multiple times with the same training data, each time using different initial weight and bias values. After each training run (itself made up of epochs, i.e. full passes through the data), the final weights and biases generate results that form a confusion matrix. These matrices are used to compare the runs and determine the best set of weights and biases (local minima vs the global minimum).
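
A small sketch of how confusion matrices from two training runs could be compared. The labels and predictions are invented, and accuracy is only one possible way to rank the runs.

```python
def confusion_matrix(actual, predicted, num_classes):
    # Row = actual class, column = predicted class.
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for a, p in zip(actual, predicted):
        matrix[a][p] += 1
    return matrix

def accuracy(matrix):
    correct = sum(matrix[i][i] for i in range(len(matrix)))  # the diagonal
    total = sum(sum(row) for row in matrix)
    return correct / total

actual = [0, 0, 0, 1, 1, 1, 1, 0]
runs = {
    "run A": [0, 0, 1, 1, 1, 0, 1, 0],   # predictions after one set of initial weights
    "run B": [0, 1, 1, 1, 0, 0, 1, 0],   # predictions after a different set
}
matrices = {name: confusion_matrix(actual, preds, 2) for name, preds in runs.items()}
best_run = max(matrices, key=lambda name: accuracy(matrices[name]))
print(matrices["run A"], matrices["run B"], best_run)
# [[3, 1], [1, 3]] [[2, 2], [2, 2]] run A
```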

Weight Initialization

Initially, random values were used to initialize the weights. Weights cannot all be initialized to zero, as that makes the learning void. Later, algorithms were established to initialize the weight and bias values based on the node function, the number of inputs and the number of outputs, which gives better training results.
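
The notes above do not name a specific algorithm. Two widely used ones that fit the description are Glorot (Xavier) initialization, usually paired with sigmoid or tanh nodes, and He initialization, usually paired with ReLU nodes. A sketch in plain Python, with function names and layer sizes chosen only for illustration:

```python
import math
import random

def glorot_uniform(fan_in, fan_out, rng):
    # Range depends on both the number of inputs and the number of outputs.
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

def he_normal(fan_in, fan_out, rng):
    # Spread depends on the number of inputs only.
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]

rng = random.Random(42)
w_tanh = glorot_uniform(64, 32, rng)   # every value lies within +/- 0.25
w_relu = he_normal(64, 32, rng)
```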

Back propagation

A neural network calculates on its inputs and propagates the result from one layer to the next, left to right.

A loss function calculates the error. The loss function represents an n-dimensional surface, in which n equals the number of weights and biases.

Back propagation calculates how much each weight contributed to the loss, using differential calculus (the chain rule). The outcome is used to adjust the weight (add to or subtract from it). The calculation runs from output to input, right to left, and is thus called back propagation.
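
A minimal sketch of the right-to-left calculation, assuming the smallest network that still has two layers: one sigmoid hidden node feeding one linear output node, with a squared-error loss. The parameter names w1, b1, w2, b2 and all values are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, p):
    # Left to right: input -> hidden node -> output node.
    h = sigmoid(p["w1"] * x + p["b1"])
    y = p["w2"] * h + p["b2"]
    return h, y

def backward(x, target, p):
    h, y = forward(x, p)
    # Right to left: start at the loss and apply the chain rule.
    d_y = 2.0 * (y - target)               # d(loss)/dy for loss = (y - target)^2
    grads = {"w2": d_y * h, "b2": d_y}     # output layer first
    d_h = d_y * p["w2"]                    # pass the error back to the hidden node
    d_z1 = d_h * h * (1.0 - h)             # through the sigmoid
    grads["w1"] = d_z1 * x
    grads["b1"] = d_z1
    return grads

params = {"w1": 0.5, "b1": 0.0, "w2": -0.3, "b2": 0.1}
x, target, lr = 1.5, 1.0, 0.1
loss_before = (forward(x, params)[1] - target) ** 2
for _ in range(200):
    grads = backward(x, target, params)
    for name in params:
        params[name] -= lr * grads[name]   # adjust each weight against its gradient
loss_after = (forward(x, params)[1] - target) ** 2
print(loss_before, loss_after)             # the loss should shrink
```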

Neural network

Each node in the network has one or more inputs and one output. Each input has an associated weight and the node has a bias value. The value of each input is multiplied by its weight. The products are summed up and the bias is added to the result. The result is the input to a function represented by the node (the activation function), which maps the sum to the output value.
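
The node calculation as a short sketch. The function name neuron and the sample numbers are made up. ReLU and sigmoid are two common choices for the node function.

```python
import math

def neuron(inputs, weights, bias, activation):
    # Multiply each input by its weight, sum the products, add the bias.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    # The node function maps the sum to the output value.
    return activation(total)

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

inputs, weights, bias = [1.0, 2.0, 3.0], [0.5, -1.0, 0.25], 1.0
out_relu = neuron(inputs, weights, bias, relu)        # 0.5 - 2.0 + 0.75 + 1.0 = 0.25
out_sigmoid = neuron(inputs, weights, bias, sigmoid)  # sigmoid(0.25), about 0.562
```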

Training the network means finding the set of weights and biases that generates the least error. This is done using the gradient descent algorithm.

Saturday, December 28, 2024

SVM

A support vector machine (SVM) uses a mathematical model to separate the feature space into 2 classes. The data points of a class congregate on one side of the separator (line, plane, multi-dimensional plane). The distance between the separating plane and the closest vectors forms the margin. The vectors on the margin are called support vectors. SVM is the methodology for finding the optimal separating plane, the one with the largest margin.
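
A sketch of the geometry only, not of the training. It assumes the separating line x + y = 5 has already been found for six invented 2D points, then measures the margin and picks out the support vectors. All names and numbers are illustrative.

```python
import math

def signed_distance(point, w, b):
    # Distance from the point to the plane w.x + b = 0 (sign tells the side).
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm

def classify(point, w, b):
    return 1 if signed_distance(point, w, b) >= 0 else -1

points = [(1.0, 1.0), (2.0, 0.0), (0.0, 0.0), (4.0, 4.0), (5.0, 3.0), (6.0, 6.0)]
labels = [-1, -1, -1, 1, 1, 1]
w, b = (1.0, 1.0), -5.0            # assumed separator: x + y - 5 = 0

distances = [abs(signed_distance(p, w, b)) for p in points]
margin = min(distances)            # distance to the closest vectors
support_vectors = [p for p, d in zip(points, distances) if math.isclose(d, margin)]
print(round(margin, 3), support_vectors)
# 2.121 [(1.0, 1.0), (2.0, 0.0), (4.0, 4.0), (5.0, 3.0)]
```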

The advantage of SVM is that it typically requires less computing power than a neural network.

Friday, December 27, 2024

Forest and tree

A decision tree encodes a series of if/then/else questions that lead to a leaf node as the output. Growing one tree into a forest of trees can improve the accuracy of the model. Each tree in the forest is trained on a different subset of the training data, sampled so that the same data may be picked more than once. Another variation uses a subset of the features. Training the trees on these different inputs introduces randomness into the tree formation. A random forest is like the wisdom of a group.
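
A toy sketch of the forest idea, assuming each "tree" is just a single if/else question (a stump) to keep it short. Sampling with duplicates, a random feature per tree, and a majority vote are the parts that matter. All function names and data are invented.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    # Same size as the original data; the same row may be picked more than once.
    return [rng.choice(rows) for _ in rows]

def train_stump(rows, feature):
    # One question: "is feats[feature] <= threshold?", using the best threshold.
    best = None
    for threshold in sorted({feats[feature] for feats, _ in rows}):
        left = [label for feats, label in rows if feats[feature] <= threshold]
        right = [label for feats, label in rows if feats[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(l != left_label for l in left)
                  + sum(l != right_label for l in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda feats: left_label if feats[feature] <= threshold else right_label

def train_forest(rows, num_trees, rng):
    num_features = len(rows[0][0])
    forest = []
    for _ in range(num_trees):
        sample = bootstrap_sample(rows, rng)      # random subset of the data
        feature = rng.randrange(num_features)     # random feature for this tree
        forest.append(train_stump(sample, feature))
    return forest

def predict(forest, feats):
    # Wisdom of the group: every tree votes, the majority wins.
    votes = Counter(tree(feats) for tree in forest)
    return votes.most_common(1)[0][0]

rows = [((1.0, 1.2), "a"), ((1.5, 0.8), "a"), ((2.0, 1.0), "a"),
        ((6.0, 5.5), "b"), ((6.5, 6.2), "b"), ((7.0, 5.8), "b")]
forest = train_forest(rows, num_trees=25, rng=random.Random(1))
print(predict(forest, (0.9, 0.7)), predict(forest, (7.5, 6.5)))
```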

Expert system

Expert systems are hard to build and are brittle (they do not respond easily to change).

K nearest neighbours

This model does not contain a function. The data is the model itself. It is very simple to train this model, but the run-time performance is not good, because every prediction has to be compared against the stored data.
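
A short sketch showing why "the data is the model": there is no training step at all, and each prediction scans the whole stored data set. The function name knn_predict and the sample points are made up.

```python
import math
from collections import Counter

def knn_predict(training_data, query, k=3):
    # Measure the distance from the query to every stored point (the slow part).
    by_distance = sorted(training_data, key=lambda item: math.dist(item[0], query))
    # Take the labels of the k closest points and let them vote.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

training_data = [
    ((1.0, 1.0), "small"), ((1.5, 2.0), "small"), ((2.0, 1.5), "small"),
    ((6.0, 6.5), "large"), ((7.0, 6.0), "large"), ((6.5, 7.5), "large"),
]
print(knn_predict(training_data, (2.0, 2.0)))  # small
print(knn_predict(training_data, (6.0, 7.0)))  # large
```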

Extrapolation

An AI model is good at interpolation but bad at extrapolation. Interpolation means the object to classify resembles what was included in the training data. Extrapolation means the object was not seen in the samples, so the model can only attempt to estimate a match. For example, a model trained to classify dog and cat could not classify a wolf.

Therefore the completeness of training data for a specific task is important for a model to work well.

Wednesday, December 4, 2024

BCS

BCS is a KSDS file and the dataset name is the key. System-managed datasets and VSAM datasets are recorded in the VVDS. Non-VSAM datasets that are not system managed are recorded in the BCS.

BCS contains information about a dataset, such as where it resides (DASD, tape or other medium).

MVS Catalog

A catalog is a VSAM dataset which contains information about other datasets in the system. There is one master catalog (MCAT), which is required for IPL. The MCAT is provided as a parameter at IPL. The MCAT contains info on all the SYS1 datasets and points to the user catalogs.

There are multiple user catalogs. Catalogs point to the VVDS (VSAM volume data set). There is one VVDS per volume. The VVDS contains dataset info for VSAM datasets and for non-VSAM datasets under system management. The VVDS acts like an extension of the VTOC. One VVDS can be pointed to by more than one BCS (basic catalog structure = catalog).
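
A toy model of the lookup path described above, using plain Python dictionaries. It is only a way to picture the relationships and does not reflect real record layouts. The dataset names, volume names and catalog name are hypothetical, and routing to a user catalog by the first qualifier of the dataset name is an assumption of this sketch.

```python
# Toy model only: dictionaries standing in for catalog structures.
master_catalog = {
    "datasets": {"SYS1.LINKLIB": {"volume": "SYSRES", "vsam": False, "sms": False}},
    "user_catalogs": {"PAYROLL": "UCAT.PAYROLL"},  # assumed: routed by first qualifier
}
user_catalogs = {
    "UCAT.PAYROLL": {  # BCS: the dataset name is the key
        "PAYROLL.MASTER.KSDS": {"volume": "VOL001", "vsam": True, "sms": False},
        "PAYROLL.REPORT.TEXT": {"volume": "VOL002", "vsam": False, "sms": False},
    }
}
vvds_by_volume = {     # one VVDS per volume
    "VOL001": {"PAYROLL.MASTER.KSDS": {"record_type": "VSAM details"}},
    "VOL002": {},
}

def locate(dataset_name):
    # Try the master catalog first, then follow its pointer to a user catalog.
    entry = master_catalog["datasets"].get(dataset_name)
    if entry is None:
        first_qualifier = dataset_name.split(".")[0]
        ucat_name = master_catalog["user_catalogs"][first_qualifier]
        entry = user_catalogs[ucat_name][dataset_name]
    # Only VSAM or system-managed datasets have details in the volume's VVDS.
    details = None
    if entry["vsam"] or entry["sms"]:
        details = vvds_by_volume[entry["volume"]].get(dataset_name)
    return entry["volume"], details

print(locate("PAYROLL.MASTER.KSDS"))  # ('VOL001', {'record_type': 'VSAM details'})
print(locate("PAYROLL.REPORT.TEXT"))  # ('VOL002', None)
```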

Monday, December 2, 2024

Language Environment

LE provides a common run time for various languages on the mainframe. LE basic routines support starting and stopping programs, allocating storage, communicating between programs written in different languages, and handling errors.

LE common library services include math routines, date/time services, etc.

A language may call these services via its own specific call, which is mapped to an LE call underneath.

LE provides a common dump format for all languages.

POSIX

POSIX defines a set of APIs applicable not only to UNIX but to other operating systems as well. It is a set of standards describing a list of operating system components, from C language interfaces to shell interfaces for system administration, etc. POSIX originated from IEEE, was later sponsored by ISO and was incorporated in the X/Open Portability Guide (XPG).
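
A small illustration of what "a common set of APIs" looks like from a program. Python's os module exposes thin wrappers over POSIX-style calls such as pipe, write, read, close and getpid, so the same few lines should run on any system that provides them. The message text is arbitrary.

```python
import os

# Create a pipe: two file descriptors, one for reading and one for writing.
read_fd, write_fd = os.pipe()

# Write a message that includes this process's ID, then close the write end.
os.write(write_fd, b"hello from pid %d" % os.getpid())
os.close(write_fd)

# Read the message back from the other end of the pipe.
message = os.read(read_fd, 1024)
os.close(read_fd)
print(message.decode())
```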

Unix support on mainframe

MVS/ESA 4.3 OpenEdition supported the POSIX standards (1003.1/1a/1c and 1003.2). Additional function was added in MVS 5.2.2 to meet XPG4 and over 90% of XPG4.2. More function was added later and OpenEdition became an official UNIX system (from an application perspective, there is no difference).

XPG4 branded means z/OS supports a common set of UNIX APIs. Branded means z/OS is registered with X/Open and IBM is entitled to use the X/Open trademark. XPG4.2 includes all commands and utilities and C services.
