Adaptive Structured Noise Injection for shallow and deep neural networks 


Machine learning and bioinformatics

In all areas of biological and medical research, the role of the computer has been dramatically enhanced in the last two decades. One particular area that best reflects this influence is genome sequencing, which aims at mapping the base pairs of an organism’s DNA or genome (≥ 3 billion pairs for the human genome). These base pairs, also called nucleotides, are of four types, A, T, C and G, and are the building blocks of DNA. DNA sequencing methods used in the 1970s and 1980s were mainly manual, for example the original Maxam-Gilbert sequencing (Maxam and Gilbert, 1977) and the original Sanger sequencing (Sanger and Coulson, 1975). The shift to more rapid, automated sequencing methods in the 1990s finally allowed for sequencing of whole genomes and, as a result, enhanced the generation of more reliable and precise data (Smith et al., 1986; Hunkapiller et al., 1991). Sequencing of nearly an entire human genome was first accomplished in 2000, partly through the use of simple or hierarchical shotgun sequencing technologies (Consortium et al., 2001; Venter et al., 2001). Since then, and thanks to advances both in computing power and in the biological research that accelerated DNA sequencing, costs have decreased rapidly, largely outpacing Moore’s law by 2008 with the advent of what are termed second-generation sequencing techniques, also known as next-generation sequencing (NGS), which rely on massively parallel sequencing of short DNA fragments. As shown by figure 1.2, this resulted in a dramatic increase in the number of genomes sequenced, further increasing the pressure towards modelling, synthesis and interpretability of the obtained data and its analysis.

Bridging the gap: towards data-dependent regularisation

We presented through the last sections different regularisation methods, relying on different assumptions, either explicitly or implicitly. Explicit regularisations, such as adding a penalty to the ERM problem, are well studied and different efficient implementations have been developed (Sra et al., 2012). Beyond their individual limitations, we have seen that these methods enforce properties such as smoothness or sparsity that often do not come from prior knowledge but are rather generic assumptions that might later prove useful, for instance for feature selection (Bishop, 2006). Many approaches have been designed to complement these methods with existing prior knowledge about the data, the model or the task at hand (Lauer and Bloch, 2008; Huang et al., 2011b). Structured sparsity-enforcing regularisation methods, for instance, additionally allow one to incorporate particular prior assumptions on the structure of the input variables, such as overlapping groups, non-overlapping groups and acyclic graphs (Yuan and Lin, 2006; Obozinski et al., 2011). Examples of uses of structured sparsity methods include face recognition (Jia et al., 2012), magnetic resonance image (MRI) processing (Chen and Huang, 2012), and analysis of genetic expression in breast cancer (Jacob et al., 2009), among other applications.
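To make the penalised form of the ERM problem mentioned above concrete, the display below is a minimal sketch in our own notation (weights w, per-example loss ℓ, predictor f_w, penalty Ω and regularisation parameter λ); the thesis may use different symbols and conventions.

\[
\min_{w}\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(y_i, f_w(x_i)\big) \;+\; \lambda\,\Omega(w),
\qquad
\Omega(w)=\|w\|_2^2 \ \text{(ridge)},\quad
\Omega(w)=\|w\|_1 \ \text{(lasso)},\quad
\Omega(w)=\sum_{g\in\mathcal{G}}\|w_g\|_2 \ \text{(group lasso)}.
\]

The group-lasso penalty, with groups g ∈ 𝒢 chosen over the input variables (overlapping or not), is one instance of the structured sparsity-enforcing regularisers cited above: the choice of groups is precisely where prior knowledge about the structure of the inputs enters the problem.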
Despite these several successful applications, and perhaps also because of them, the proliferation and diversity of these purpose-built explicit regularisation methods, each exploiting a particular form of prior knowledge, can sometimes be confusing. Moreover, the effect of adding the same prior knowledge to the learning problem depends drastically on the training dataset and on the particular regularisation method (Lavi et al., 2012).

On the other hand, implicit regularisation methods such as data augmentation have also, despite being classical techniques, witnessed a growing popularity in recent years. As already described, in this case there is a clear lack of generic methods for augmenting the data, although interesting research that aims to automatically learn useful data transformations has been published very recently (Lemley et al., 2017; Cubuk et al., 2018).


The Input Noise Injection (INI) framework and a new approximation of dropout

The idea of injecting noise into the data from which we want to learn seems at first sight quite counterintuitive. However, the idea of adding randomness to algorithms has been present for decades in the computer science community, and it started to emerge early in the machine learning community with the development of optimisation algorithms for neural networks three decades ago. This technique has recently recaptured the community’s interest with the renewed popularity of multilayer neural networks and their increasing complexity, which emphasises the need for their regularisation. Despite this revival of research interest around noise injection variants and applications, there is still a lack of a clear definition for this set of methods and of interpretations of them from the viewpoints of different frameworks. In this chapter:
1. We provide an overview of Noise Injection in supervised learning from the point of view of different supervised learning settings.
2. We reformulate the Input Noise Injection framework in the supervised learning setting with general distributions and noising functions.
3. We summarise intuitions, algorithms and some of the theoretical justifications (in linear models) around the use of (INI); a minimal illustrative sketch of such an algorithm is given after this list.
4. We present a brief overview of the popular dropout method and related works.
5. We provide a novel approximation for dropout leading to new insights for linear and non-linear models and with general (potentially non-smooth) loss functions.
6. We complement this chapter with a series of experiments on the effectiveness of (INI) in improving the generalisation performance of linear models in the supervised learning setting. These experiments are performed on simulations and on benchmark datasets in vision recognition, document classification and cancer prognosis tasks.
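As a rough illustration of the kind of procedure summarised in point 3, the following Python sketch trains a linear least-squares model with noise-injected inputs. The function names (noise, train_ini), the two noise distributions shown and all hyperparameter values are our own illustrative assumptions; they do not correspond to the exact algorithms or notation developed in the chapter.

# Minimal sketch (our own notation, not the thesis code): stochastic gradient
# descent for a linear model with Input Noise Injection. A fresh noise draw is
# applied to the mini-batch inputs at every update; with multiplicative
# Bernoulli noise this reduces to the usual dropout mask on the input layer.
import numpy as np

rng = np.random.default_rng(0)

def noise(x, kind="dropout", p=0.5, sigma=0.1):
    """Apply a noising function to a batch of inputs x (shape: batch x features)."""
    if kind == "dropout":      # multiplicative Bernoulli noise, rescaled so E[noised x] = x
        mask = rng.binomial(1, 1.0 - p, size=x.shape)
        return x * mask / (1.0 - p)
    if kind == "gaussian":     # additive Gaussian noise
        return x + sigma * rng.normal(size=x.shape)
    return x

def train_ini(X, y, kind="dropout", lr=0.1, epochs=50, batch=32):
    """Least-squares linear regression trained with noise-injected inputs."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            Xb = noise(X[idx], kind)                 # fresh noise at every update
            grad = Xb.T @ (Xb @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

With kind="dropout" the sketch applies a rescaled multiplicative Bernoulli mask to the inputs, i.e. dropout on the input layer; drawing a fresh noise realisation at every gradient step means the procedure stochastically minimises the expected loss under the noising distribution, which is the usual starting point for interpreting INI as a regulariser.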

Table of contents:

Abstract
Résumé
Acknowledgements
Contents
List of figures
1 Introduction 
1.1 Machine learning: past and present
1.2 Machine learning and bioinformatics
1.3 The supervised learning setting
1.4 The overfitting phenomenon
1.5 Assessing overfitting
1.6 Preventing overfitting
1.7 Thesis and contributions
2 The RA responder challenge 
2.1 Introduction
2.2 The challenge
2.3 Methods
2.4 Results
2.5 Conclusions and acknowledgements
3 Noise injection in the input data 
3.1 Introduction
3.2 Formulation
3.3 INI as a regulariser
3.4 Algorithms
3.5 The special case of dropout
3.6 Another approximation for Dropout
3.7 Experiments and empirical insights
3.8 Discussion and remarks
4 DropLasso: A robust variant of Lasso for single cell RNA-seq data 
4.1 Introduction
4.2 Methods
4.3 Results
4.4 Discussion
5 ASNI: Adaptive Structured Noise Injection for shallow and deep neural networks 
5.1 Introduction
5.2 Dropout and multiplicative noise
5.3 Structured noise injection (SNI)
5.4 Regularisation effect
5.5 Effect on learned representation
5.6 Experiments
5.7 Discussion
5.8 Availability
6 Conclusion 
A Supplementaries I
A.1 Supplementary tables
A.2 Supplementary figures

