Machine Learning
Machine Learning (ML) can be described as the sub-field of Artificial Intelligence (AI) that uses data to teach a machine how to perform a task. A simple scenario is feeding images of a person's face to an algorithm so that the program learns to recognize the same person in other images [2].
Donald Michie argued that it would be impactful if machines could learn from experience and improve their efficiency by learning during program execution, and that a simple but effective rote-learning facility can be provided within the framework of a suitable programming language [14]. Computers can now learn from experience in this way, and this is called machine learning. It is the part of AI that lets a system learn and improve from experience, and it focuses on developing programs that can access data and learn from it. Learning starts from observations, data, or examples, in which the system looks for patterns that inform future decisions. The main purpose of machine learning is to let the system carry out this whole process without human intervention. There are two main methods of machine learning: supervised and unsupervised learning. In supervised learning, the training data is labeled and classified, and what has been learned from those labeled examples is applied to new data to predict future events. In unsupervised learning, the training data is neither labeled nor classified, and the purpose is to find hidden structure in the unlabeled data.
Figure 2.1, taken from Deep Learning by Goodfellow [15], shows that representation learning is part of machine learning and that machine learning is in turn a subfield of artificial intelligence. Machine learning consists of three phases: preparation, execution, and evaluation. The preparation phase gathers data for training and prepares a suitable model. The execution phase trains the model on the training data to predict outputs, updating its weights and intercepts (biases). The evaluation phase checks the generated results against evaluation data; if the results are not accurate enough, more iterations of the previous step are run to improve them.
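The three phases can be illustrated with a small Python sketch. It is not the setup used in this work: the toy data (points on the line y = 2x + 1), the linear model, the learning rate, and the names `fit` and `mean_squared_error` are all assumptions chosen to keep the example short.

```python
# Preparation: gather data and split it into training and evaluation sets.
# The data follows y = 2x + 1, an assumed toy relationship.
data = [(float(x), 2.0 * x + 1.0) for x in range(10)]
train, evaluation = data[:8], data[8:]


def mean_squared_error(samples, w, b):
    """Average squared difference between predictions and labels."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def fit(samples, iterations, learning_rate=0.01):
    """Execution: predict on the training data, then update weight and bias."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(iterations):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Evaluation: check the result on held-out data and, if it is not accurate
# enough, run the execution phase again with more iterations.
iterations = 100
w, b = fit(train, iterations)
while mean_squared_error(evaluation, w, b) > 1e-3 and iterations < 100_000:
    iterations *= 2
    w, b = fit(train, iterations)
```

The loop at the end mirrors the evaluation phase: training is repeated with more iterations only while the error on the evaluation data stays above the chosen tolerance.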
Neural Networks
Artificial Neural Networks (ANN), or simply Neural Networks (NN), consist of many processing units that communicate by sending signals to each other over a large number of connections [16]. A neural network is a way to build a computer application that learns from data. It is a collection of connected layers; the units in each layer, called neurons, perform a specific task and communicate with other layers. When the network is given a problem to solve, it attempts it again and again, each time strengthening the connections that lead to success and weakening those that lead to failure. According to Kevin Gurney, a neural network is an interconnected assembly of simple processing units whose functionality depends on the neuron. The network's processing ability is stored in the connection strengths, or weights, which are obtained by learning from a training set [17].
Neural network architecture
A neural network architecture consists of different layers. Figure 2.2 shows the basic architecture of a neural network. The leftmost layer is the input layer, and its neurons are called input neurons. The rightmost layer is the output layer, and its neurons are called output neurons; in this architecture it contains only one neuron. The middle layer is called a hidden layer, because its neurons are neither inputs nor outputs. This network has only one hidden layer, but other networks can have several [18].
The network in figure 2.3 has two hidden layers and four layers in total. Such architectures are sometimes called multilayer perceptrons (MLPs), and they commonly use the sigmoid activation function. The purpose of the activation function is to introduce non-linearity into the network.
Suppose a neuron receives inputs $x_1, x_2, \ldots$ with weights $w_1, w_2, \ldots$ attached to each input, as shown in figure 2.4. The weights are real numbers that express the importance of the respective input to the output. The output of the neuron, 0 or 1, is determined by whether the weighted sum $\sum_j w_j x_j$ is less than or greater than a specific threshold [18].
Figure 2.4: Multilayer perceptron with activation [19]
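The threshold rule described for figure 2.4 can be sketched in a few lines of Python. The function name `perceptron` and the example weights and threshold are assumptions made for illustration; they do not come from the cited source.

```python
def perceptron(inputs, weights, threshold):
    """Return 1 if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0


# Assumed example: the first input matters three times as much as the others.
weights = [6.0, 2.0, 2.0]
threshold = 5.0

fires = perceptron([1, 0, 0], weights, threshold)   # weighted sum 6 > 5
silent = perceptron([0, 1, 1], weights, threshold)  # weighted sum 4 <= 5
```

Because the first weight alone exceeds the threshold, the first input decides the output on its own, while the other two inputs together cannot.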
Most ANNs follow a learning rule: they start from random weights, which become more accurate over time. The biases and activation function are applied to the network, then the error and delta are calculated through backpropagation and the weights and biases are adjusted. Once the weights and biases are accurate for the given inputs, the system produces good results. Figure 2.5 explains the architecture of neural networks step by step. Several methods are involved, such as feedforward, the loss function, and backpropagation.
Neural networks use many different rules for learning; one of the most common is the delta rule. This rule is mostly used by the most common class of ANNs, called backpropagational neural networks (BPNN), where backpropagation is an abbreviation for the backward propagation of errors. Backpropagation calculates the error and delta and adjusts the weights and biases of the previous layer, repeating until a stopping condition is fulfilled or the iterations are finished.
Hidden layers use a sigmoidal activation function, which helps to polarize and stabilize. The learning rate determines how much a specific iteration affects the weights and biases, and the momentum determines how much the outcome of the past iteration affects them. The change in a given bias can be calculated with the formula
$$\text{change} = (\text{learning rate} \times \text{delta} \times \text{value}) + (\text{momentum} \times \text{past change}) \tag{2.1}$$
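The interplay of learning rate and momentum can be sketched in Python as follows. This is an illustration under assumptions: the function name `parameter_change`, the starting bias, the delta values, and the settings 0.5 and 0.9 are hypothetical, and whether the change is added to or subtracted from the parameter depends on the sign convention chosen for delta.

```python
def parameter_change(learning_rate, delta, value, momentum, past_change):
    """Combine the current iteration's contribution with the previous change."""
    return (learning_rate * delta * value) + (momentum * past_change)


# Assumed settings and deltas, chosen only to show the mechanics.
learning_rate, momentum = 0.5, 0.9
bias = 0.35
past_change = 0.0
history = []

for delta in (0.2, 0.1, 0.05):
    # For a bias, the incoming value is the constant 1.
    change = parameter_change(learning_rate, delta, 1.0, momentum, past_change)
    bias += change
    history.append(change)
    past_change = change
```

Even though the deltas shrink from step to step, the changes stored in `history` grow, because the momentum term carries most of the previous change forward.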
As another example, consider a neural network that takes two inputs, $i_1$ and $i_2$, and has two hidden neurons and two output neurons.
The hidden and output layers each include a bias in addition to their weights. The basic structure is shown in figure 2.6.
Training starts by assigning initial random weights and biases to the network.
The purpose of backpropagation is to update the weights so that the neural network learns to predict the correct output for a given input. In the feedforward pass, the net input of each neuron is calculated as the dot product between its weights and the output activations of the previous layer, plus the bias. The output of the neuron is then obtained by applying the activation function, and the process repeats for all neurons in all layers. For the hidden neuron $h_1$ the formula is [21]
$$net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1 \tag{2.2}$$
The same formula applies to an output neuron, whose inputs are the outputs of the hidden layer. For the output neuron $o_1$ the total net input is
$$net_{o1} = 0.4 \cdot 0.593269992 + 0.45 \cdot 0.596884378 + 0.6 \cdot 1 = 1.105905967 \tag{2.3}$$
The hidden-layer outputs used here are obtained by squashing each net input with the logistic function. For $h_1$, with $net_{h1} = 0.3775$,
$$out_{h1} = \frac{1}{1 + e^{-net_{h1}}} = \frac{1}{1 + e^{-0.3775}} = 0.593269992 \tag{2.4}$$
Repeating the process gives the values of the other neurons. The error of each output neuron is calculated with the squared error function, and the errors are summed to obtain the total error:
$$E_{total} = \sum \frac{1}{2} (target - output)^2 \tag{2.5}$$
For the two output neurons of this network, the total error is
$$E_{total} = E_{o1} + E_{o2} \tag{2.6}$$
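The feedforward pass and error calculation can be reproduced with a short Python sketch. Only the values 0.4, 0.45, 0.6, and 0.3775 and the two hidden-layer outputs appear in the text; every other input, weight, bias, and target below is an assumption, chosen so that the quoted numbers come out. The weight names beyond the first two are hypothetical as well.

```python
import math


def logistic(x):
    """Squash a net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Assumed inputs, hidden-layer weights, and targets (not given in the text).
i1, i2 = 0.05, 0.10
w1, w2, w3, w4, b1 = 0.15, 0.20, 0.25, 0.30, 0.35
target_o1, target_o2 = 0.01, 0.99
# 0.40, 0.45, and 0.60 appear in the text; w7 and w8 are assumptions.
w5, w6, w7, w8, b2 = 0.40, 0.45, 0.50, 0.55, 0.60

# Hidden layer: net input, then logistic activation.
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
out_h1 = logistic(net_h1)
net_h2 = w3 * i1 + w4 * i2 + b1 * 1
out_h2 = logistic(net_h2)

# Output layer: its inputs are the hidden-layer outputs.
net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1
out_o1 = logistic(net_o1)
net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1
out_o2 = logistic(net_o2)

# Squared error per output neuron, summed into the total error.
error_o1 = 0.5 * (target_o1 - out_o1) ** 2
error_o2 = 0.5 * (target_o2 - out_o2) ** 2
error_total = error_o1 + error_o2
```

With these assumed values, `net_h1`, `out_h1`, `out_h2`, and `net_o1` match the figures quoted in the text to the precision shown there.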
The goal of backpropagation is to update every weight in the network so that the network output comes closer to the target output, thereby minimizing the error of each output neuron and of the network as a whole. For example, for the weight $w_5$, the change in the total error with respect to that weight is obtained by applying the chain rule [18].
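As an illustration of the chain rule step, the Python sketch below computes the gradient of the total error with respect to one output-layer weight (called `w5` here) and checks it against a finite-difference estimate. The hidden-layer outputs and the values 0.4, 0.45, and 0.6 are those quoted in the text; the remaining weights, the targets, and the learning rate of 0.5 are assumptions.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# Hidden-layer outputs and output-layer parameters quoted in the text.
OUT_H1, OUT_H2 = 0.593269992, 0.596884378
W6, B2 = 0.45, 0.60
# Assumed values: remaining weights and the targets are not given in the text.
W7, W8 = 0.50, 0.55
TARGET_O1, TARGET_O2 = 0.01, 0.99


def total_error(w5):
    """Total squared error of both output neurons as a function of w5."""
    out_o1 = logistic(w5 * OUT_H1 + W6 * OUT_H2 + B2)
    out_o2 = logistic(W7 * OUT_H1 + W8 * OUT_H2 + B2)
    return 0.5 * (TARGET_O1 - out_o1) ** 2 + 0.5 * (TARGET_O2 - out_o2) ** 2


def gradient_w5(w5):
    """Chain rule: dE/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5."""
    out_o1 = logistic(w5 * OUT_H1 + W6 * OUT_H2 + B2)
    d_error_d_out = out_o1 - TARGET_O1     # derivative of 1/2 (target - out)^2
    d_out_d_net = out_o1 * (1.0 - out_o1)  # derivative of the logistic function
    d_net_d_w5 = OUT_H1                    # net_o1 is linear in w5
    return d_error_d_out * d_out_d_net * d_net_d_w5


w5 = 0.40
analytic = gradient_w5(w5)

# Central finite difference as an independent check of the chain rule.
eps = 1e-6
numeric = (total_error(w5 + eps) - total_error(w5 - eps)) / (2 * eps)

# Gradient descent step with an assumed learning rate.
learning_rate = 0.5
updated_w5 = w5 - learning_rate * analytic
```

Only the first output neuron depends on `w5`, which is why the second neuron's error drops out of the gradient even though it contributes to the total error.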
Convolutional Neural Networks
Learning about thousands of objects from millions of images requires a model with a large learning capacity. The complexity of object recognition makes it a difficult task: the problem cannot be fully specified by any single dataset, so the model needs some prior knowledge to remain suitable across datasets. Compared to a traditional feedforward neural network of the same size, a convolutional neural network has fewer connections and parameters, so it is easier to train [22].
“To learn about thousands of objects from millions of images, need such model with large learning capacity” [23]. The convolutional neural network is the most popular artificial neural network used for image analysis. “A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and be able to differentiate one from the other” [23]. Image analysis is not the only use case for convolutional neural networks; they can be used for data analysis and classification problems as well. More generally, a convolutional neural network is an artificial neural network specialized in picking out, or detecting, patterns. This pattern detection is what makes convolutional neural networks useful for image analysis.
What sets convolutional neural networks apart from other networks is their convolutional hidden layers, which receive an input and transform it in a way that produces a more useful output for the next layer. This transformation is the convolution operation. Each layer has several filters that detect patterns. In early layers the patterns can be edges, circles, or corners; later layers can detect larger structures such as an eye or a finger, and the final layers can detect the whole object.
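How a single filter detects a pattern can be shown with a minimal Python sketch of the convolution operation. The function name `convolve2d`, the 4 × 4 toy image, and the 2 × 2 vertical-edge filter are assumptions made for illustration; as in most deep learning libraries, the filter is slid over the image without being flipped, with no padding and a stride of 1.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum products."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(k_rows)
                for j in range(k_cols)
            )
            for c in range(out_cols)
        ]
        for r in range(out_rows)
    ]


# Toy image: dark left half (0) and bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Filter that responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, vertical_edge)
```

The resulting feature map is zero over the flat regions and non-zero only in the middle column, where the filter straddles the boundary between the dark and bright halves.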
Top-performing deep convolutional neural network architectures are typically organized into alternating convolutional and max-pooling layers, followed by several dense, fully connected layers, as described by Krizhevsky [22]. The input layer is the 3D (RGB) image representation, and each layer transforms its input into a new 3D volume that is fed to the following layer. The example in figure 2.9 has five convolutional layers, three max-pooling layers, and three fully connected layers at the end [24].
Figure 2.9: Deep learning CNN for image classification [22]
First, a convolutional layer with a stride of 4 and a filter size of 5 × 5 is applied to extract different features, compressing small patterns into a small number of parameters. A max-pooling layer is then applied to reduce the spatial resolution, but max pooling should not be applied so many times that the network loses too much spatial resolution. This process is repeated until the required results are obtained. Fully connected layers are then applied, converting the 3D or 2D results into 1D, followed by an activation function such as Tanh, Sigmoid, ReLU, or leaky ReLU to produce the final outputs; Sigmoid, for example, maps the results to the range between 0 and 1.
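The effect of stride, max pooling, and flattening on the data shape can be sketched in Python. The helper names, the 4 × 4 feature map, and the input width of 32 are assumptions for illustration; only the stride of 4 and the 5 × 5 filter size come from the text.

```python
import math


def conv_output_size(input_size, filter_size, stride):
    """Output width of a convolution without padding."""
    return (input_size - filter_size) // stride + 1


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the largest value of each size x size window."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def flatten(feature_map):
    """Convert a 2D feature map into a 1D list for a fully connected layer."""
    return [value for row in feature_map for value in row]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed input width of 32 with the 5 x 5 filter and stride 4 from the text.
width_after_conv = conv_output_size(32, 5, 4)

# Assumed toy feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 2],
    [2, 2, 3, 4],
]
pooled = max_pool2d(feature_map)
flat = flatten(pooled)
squashed = [sigmoid(value) for value in flat]
```

Each pooling step halves the width and height here, which is why repeated pooling quickly exhausts the spatial resolution of a small feature map.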
Generative Adversarial Networks
Generative Adversarial Networks (GANs) are relatively new and have been well known since 2014 [1]. GANs belong to a set of algorithms called generative models. These algorithms belong to the field of unsupervised learning, a subset of machine learning whose purpose is to study algorithms that learn the structure of the given data without any specific target value [25].
A GAN consists of two deep convolutional neural networks, one called the discriminator and the other called the generator. A GAN learns a generative model by training one network, the discriminator, to distinguish real data from fake data, while at the same time training the other network, the generator, to produce data that resembles the real data from noise. Training continues until the generator fools the discriminator [26].
GANs architecture
To understand the generative adversarial network, one must understand its discriminative and generative algorithms. Discriminative algorithms try to classify the data they are given, making predictions based on what they have learned. In figure 2.10, one model, the generator, generates data that resembles the original so that it can trick the discriminator into accepting the data as original, like a forger. The other model, the discriminator, checks whether the generated sample is like the real samples or not. Ian J. Goodfellow, who proposed the GAN framework, describes it as a generative model pitted against an adversary: a discriminative model that determines whether the input sample comes from the real data or is fake. The generative model learns from previous experience, updates itself, and generates new samples. The competition continues until the discriminative model cannot distinguish real data from fake data [1].
Assume the real data is x and the latent representation of an image is z. The discriminator D takes a sample x from the real data and assigns it a value of 1, meaning that it is real, and assigns 0 to a generated sample G(z), meaning that it is fake. The neural network G(z, θ₁) models the generator described above: it maps the input noise z to the desired data space x. The other neural network, D(x, θ₂), models the discriminator and outputs a probability in the range 0 to 1. In both networks, θᵢ represents the weights or parameters of the respective network [2].
GANs training
Both networks are trained in alternating phases. In the first phase, the discriminator D is trained to classify data correctly as real or fake. Its weights are updated to maximize the probability that a sample x from the real dataset is classified as real, while minimizing the probability that a fake image is classified as real. In technical terms, its loss function maximizes D(x) and minimizes D(G(z)). The discriminator trains on both real and fake data and maximizes its reward by minimizing its loss, using Stochastic Gradient Descent (SGD) with backpropagation. The generator is not trained at this stage.
In the second phase, the generator G is trained using feedback from the discriminator. The generator's weights are updated to maximize the probability that a fake image is classified as real, so the objective used for this network is to maximize D(G(z)).
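The two alternating phases can be sketched with a deliberately tiny GAN in plain Python. This is not the convolutional architecture used in this work: the one-dimensional data, the linear generator, the logistic discriminator, the hand-derived gradients, and all names and hyperparameters are assumptions. The generator objective is implemented as minimizing -log D(G(z)), one common way of pushing D(G(z)) toward 1.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(x, d):
    """D(x): probability that x is real; d = [weight, bias]."""
    return sigmoid(d[0] * x + d[1])


def generator(z, g):
    """G(z): map noise z to a sample; g = [scale, shift]."""
    return g[0] * z + g[1]


def discriminator_loss(real, fake, d):
    """Cross-entropy loss: low when D(x) is near 1 and D(G(z)) is near 0."""
    eps = 1e-12
    terms = [-math.log(discriminator(x, d) + eps) for x in real]
    terms += [-math.log(1.0 - discriminator(x, d) + eps) for x in fake]
    return sum(terms) / len(terms)


def discriminator_step(real, fake, d, lr):
    """Phase 1: one gradient descent step on D; the generator is untouched."""
    grad_w = grad_b = 0.0
    for x in real:   # gradient of -log D(x)
        p = discriminator(x, d)
        grad_w += -(1.0 - p) * x
        grad_b += -(1.0 - p)
    for x in fake:   # gradient of -log(1 - D(G(z)))
        p = discriminator(x, d)
        grad_w += p * x
        grad_b += p
    n = len(real) + len(fake)
    return [d[0] - lr * grad_w / n, d[1] - lr * grad_b / n]


def generator_step(noise, g, d, lr):
    """Phase 2: one gradient descent step on G; the discriminator is fixed."""
    grad_a = grad_b = 0.0
    for z in noise:  # gradient of -log D(G(z)) through the discriminator
        p = discriminator(generator(z, g), d)
        common = -(1.0 - p) * d[0]
        grad_a += common * z
        grad_b += common
    n = len(noise)
    return [g[0] - lr * grad_a / n, g[1] - lr * grad_b / n]


def train(steps=200, batch=16, lr=0.05, seed=0):
    """Alternate the two phases on assumed 1D data centered at 4.0."""
    rng = random.Random(seed)
    d, g = [0.1, 0.0], [1.0, 0.0]
    for _ in range(steps):
        real = [rng.gauss(4.0, 0.5) for _ in range(batch)]
        fake = [generator(rng.gauss(0.0, 1.0), g) for _ in range(batch)]
        d = discriminator_step(real, fake, d, lr)
        noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        g = generator_step(noise, g, d, lr)
    return d, g
```

Calling `train()` returns the final discriminator and generator parameters. Keeping the two update functions separate makes it explicit that each phase changes only one network's parameters while the other network is held fixed.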
The two networks thus optimize opposing objectives, as in a minimax game with value function V(G, D): the generator tries to maximize D(G(z)) while the discriminator tries to minimize that same value. The value function is
$$\min_G \max_D V(G, D) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \tag{2.8}$$
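The value function V(G, D) can be estimated from batches of discriminator outputs, as the following Python sketch shows. The function name `value_function` and the example probabilities are assumptions for illustration.

```python
import math


def value_function(d_real, d_fake):
    """Batch estimate of V(G, D).

    d_real holds D(x) for real samples, d_fake holds D(G(z)) for generated ones.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from fake well yields a high value.
confident = value_function([0.9, 0.8], [0.1, 0.2])

# A fooled discriminator outputs 0.5 everywhere, which lowers the value.
fooled = value_function([0.5, 0.5], [0.5, 0.5])
```

The comparison shows the opposing interests: the discriminator gains when the value is high, and the generator gains when it pushes the value down toward the fooled case.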
Once both the generator and the discriminator are modeled as neural networks, a gradient-based optimization algorithm can be used to train the GAN. Stochastic gradient descent, which has proven successful in multiple fields, also works well for GANs [2].
Depth Image-Based Rendering
3D video is a new type of visual media that greatly expands the user's sensation compared to 2D video. The development of display technologies and growing user expectations for 3D video have given researchers a new area to explore: how to represent and render realistic 3D impressions. A user perceives a 3D depth impression when each eye receives its corresponding view [27]. The 3D impression is attractive in applications such as imaging [28] [29], multimedia services [30], and 3D reconstruction [31].
Table of contents:
1 Introduction
1.1 Background and problem motivation
1.2 Problem Description
1.3 Tools, Requirements
1.4 Milestones
1.5 Overall Aim
1.6 Goals
1.7 Scope
1.8 Outline
1.9 Contribution
2 Theory
2.1 Machine Learning
2.2 Neural Networks
2.2.1 Neural network architecture
2.2.2 Neural network training
2.3 Convolutional Neural Networks
2.4 Generative Adversarial Networks
2.4.1 GANs architecture
2.4.2 GANs training
2.5 Depth Image-Based Rendering
2.6 Related Work
3 Methodology
3.1 Data Preparation
3.2 GAN Training
Disocclusion Inpainting using
Generative Adversarial Networks
Nadeem Aftab
3.3 Disocclusion inpainting with GAN
3.4 Inpainting performance measure
3.5 Analysis of results
3.6 Libraries
3.7 Hardware
4 Implementation
4.1 Data Preparation Implementation
4.2 GAN Implementation and training
4.2.1 Generator Implementation
4.2.2 Discriminator Implementation
4.2.3 GAN Training
4.3 Inpainting on prepared data with GAN Implementation
4.4 Inpainting performance measure
4.5 Implementation for analysis of results
5 Results
5.1 DIBR generated data
5.2 GAN Results
5.3 Inpainting Method Results
5.4 Performance measure Results
5.5 Analysis of Results
6 Conclusion
6.1 Evaluation according to goals
6.2 Conclusion of inpainting disocclusion
6.3 Future Work
6.4 Ethical and Social Impact
References