About

After studying biology up to the bachelor's level, I switched to a master's degree in fundamental computer science, specializing in machine learning and data mining. I then completed a PhD thesis on the compression of machine learning models, with a focus on neural networks, followed by a post-doc on compressive learning at ENS Lyon, before being hired as a data scientist at Yanport, a proptech company. There, I build data science solutions for the statistical analysis of the real-estate market. I am currently looking for new opportunities.

Location:
Lyon, France
Age:
Languages:
French, English
Publications

2021

IJCNN

Luc Giffon, Stéphane Ayache, Hachem Kadri, Thierry Artières, Ronan Sicre

Over-parameterization of neural networks is a well-known issue that comes along with their great performance. Among the many approaches proposed to tackle this problem, low-rank tensor decompositions are widely investigated to compress deep neural networks. Such techniques rely on a low-rank assumption on the layer weight tensors that does not always hold in practice. Following this observation, this paper studies sparsity-inducing techniques to build new sparse matrix product layers for high-rate neural network compression. Specifically, we explore recent advances in sparse optimization to replace each layer's weight matrix, either convolutional or fully connected, with a product of sparse matrices. Our experiments validate that our approach provides a better compression-accuracy trade-off than the most popular low-rank-based compression techniques.
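As an illustration of what a sparse matrix product layer might look like, here is a minimal PyTorch sketch. It is not the paper's implementation: sparsity is imposed with fixed random binary masks rather than obtained by sparse optimization, and the class name, dimensions and density are purely illustrative.

```python
import torch
import torch.nn as nn

class SparseProductLinear(nn.Module):
    """A linear layer whose weight matrix is a product of sparse factors."""

    def __init__(self, in_features, out_features, n_factors=3, density=0.1):
        super().__init__()
        # Inner dimensions kept at max(in, out) for simplicity.
        dims = [in_features] + [max(in_features, out_features)] * (n_factors - 1) + [out_features]
        self.factors = nn.ParameterList(
            nn.Parameter(0.01 * torch.randn(d_out, d_in))
            for d_in, d_out in zip(dims[:-1], dims[1:])
        )
        # Fixed binary sparsity masks, one per factor (a simplification of the
        # sparse optimization procedure described in the paper).
        self.masks = [(torch.rand_like(p) < density).float() for p in self.factors]

    def forward(self, x):
        # Multiply by each masked factor in turn instead of one dense matrix.
        for w, mask in zip(self.factors, self.masks):
            x = nn.functional.linear(x, w * mask)
        return x

# Usage: stand-in for a 512 -> 256 dense layer.
layer = SparseProductLinear(512, 256)
print(layer(torch.randn(8, 512)).shape)  # torch.Size([8, 256])
```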

2021

Machine Learning Journal

Luc Giffon, Valentin Emiya, Liva Ralaivola, Hachem Kadri

K-means -- and the celebrated Lloyd algorithm -- is more than the clustering method it was originally designed to be. It has indeed proven pivotal to help increase the speed of many machine learning and data analysis techniques such as indexing, nearest-neighbor search and prediction, data compression, and Radial Basis Function networks; its beneficial use has been shown to carry over to the acceleration of kernel machines (when using the Nyström method). Here, we propose a fast extension of K-means, dubbed QuicK-means, that rests on the idea of expressing the matrix of the K centroids as a product of sparse matrices, a feat made possible by recent results devoted to finding approximations of matrices as products of sparse factors. Such a decomposition squashes the complexity of the matrix-vector product between the factorized K x D centroid matrix U and any vector from O(KD) to O(A log A + B), with A = min(K, D) and B = max(K, D), where D is the dimension of the training data. This drastic computational saving has a direct impact on the assignment of a point to a cluster, meaning that it is tangible not only at prediction time but also at training time, provided the factorization procedure is performed during Lloyd's algorithm. We show precisely that resorting to a factorization step at each iteration does not impair the convergence of the optimization scheme and that, depending on the context, it may entail a reduction of the training time. Finally, we provide discussions and numerical simulations that show the versatility of our computationally efficient QuicK-means algorithm.
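For readers curious how factorized centroids translate into a faster assignment step, here is a small numpy/scipy sketch of the idea. Random sparse factors stand in for the ones QuicK-means would actually learn; the factorization procedure itself is not reproduced here.

```python
import numpy as np
import scipy.sparse as sp

K, D = 64, 256
rng = np.random.default_rng(0)

# Stand-in sparse factors whose product plays the role of the K x D centroid matrix U.
factors = [sp.random(K, K, density=0.05, format="csr", random_state=0),
           sp.random(K, D, density=0.05, format="csr", random_state=1)]
U = factors[0] @ factors[1]  # reference centroid matrix (sparse here, dense in general)

def fast_assign(x, factors, sq_norms):
    """Assign x to its nearest centroid using only sparse matrix-vector products."""
    Ux = x
    for S in reversed(factors):          # right-to-left product: S_1 (S_2 x)
        Ux = S @ Ux
    # argmin_k ||u_k - x||^2 = argmin_k (||u_k||^2 - 2 u_k . x)
    return int(np.argmin(sq_norms - 2 * Ux))

sq_norms = np.asarray(U.power(2).sum(axis=1)).ravel()  # ||u_k||^2, precomputed once
x = rng.standard_normal(D)
print(fast_assign(x, factors, sq_norms))
```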

2021

PhD Thesis

Luc Giffon

The explosion of computing power and massive data collection has led to the current golden age of artificial intelligence research. This overabundance of resources has favoured the design of algorithms that are certainly impressive from an application point of view, but also excessively demanding in terms of training data and/or computing power. This thesis studies and experimentally validates the benefits, in terms of the amount of computation and data needed, that kernel methods and sparse approximation methods can bring to existing machine learning algorithms.

Convolutional neural networks have revolutionized the processing of structured data such as images or text. However, ever more powerful neural architectures contain more and more parameters to adjust, which requires a very large volume of data. Conversely, kernel methods, based on the use of a function called a "kernel", are less efficient in practice than neural networks but require little data. One can therefore naturally wonder whether it is possible to combine these techniques to keep the best of both worlds. In the first part of this thesis, we propose a new type of neural architecture that uses a kernel function to reduce the number of learnable parameters, making it robust to overfitting in a regime where few labeled observations are available.

The computational power required to run recent learning algorithms is closely related to their "complexity". A possible approach to reducing this complexity is the use of sparse approximations. For example, one can try to decompose a matrix into a product of sparse matrices: the resulting operator contains fewer non-zero values than the initial matrix and can therefore be used in its place to compute matrix products at low cost. As another example, a signal can be approximated by a sparse linear combination of vectors from a dictionary. In the second part of this thesis, we seek to reduce the complexity of existing machine learning models by introducing sparse approximations. First, we propose an alternative to the K-means algorithm which speeds up the inference phase by expressing the centroids as a product of sparse matrices. In addition to convergence guarantees for the proposed algorithm, we provide an experimental validation of both the quality of the centroids thus expressed and their benefit in terms of computational cost. Then, we explore the compression of neural networks by replacing the matrices that constitute their layers with products of sparse matrices. Finally, we hijack the Orthogonal Matching Pursuit (OMP) sparse approximation algorithm to make a weighted selection of decision trees from a random forest; we analyze the effect of the obtained weights and propose a non-negative variant of the method that outperforms all other tree selection techniques considered on a large panel of data sets.
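The two sparse approximation ideas mentioned in the abstract (a signal approximated by a few dictionary atoms, and a dense matrix replaced by a product of sparse factors) can be illustrated with a short, self-contained example. The numbers below are arbitrary and only meant to show the mechanics, not to reproduce any result from the thesis.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(0)

# (1) Sparse coding: recover a 5-sparse combination of 100 atoms in dimension 50.
D = rng.standard_normal((50, 100))                 # dictionary, one atom per column
D /= np.linalg.norm(D, axis=0)
true_coef = np.zeros(100)
true_coef[rng.choice(100, size=5, replace=False)] = rng.standard_normal(5)
y = D @ true_coef
coef = orthogonal_mp(D, y, n_nonzero_coefs=5)      # OMP greedily picks 5 atoms
print("recovered support:", np.flatnonzero(coef))

# (2) Storage saving: a 1024 x 1024 dense matrix versus 3 sparse factors
# with roughly 5% non-zero entries each.
dense_params = 1024 * 1024
sparse_params = 3 * int(0.05 * 1024 * 1024)
print(f"dense: {dense_params} values, sparse product: {sparse_params} values")
```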

2020

Conference CAP

Luc Giffon, Charly Lamothe, Léo Bouscarrat, Paolo Milanesi, Farah Cherfaoui, Sokol Koço

In this paper we propose a new method to reduce the size of Breiman's Random Forests. Given a Random Forest and a target size, our algorithm builds a linear combination of trees that minimizes the training error. The selected trees, as well as the weights of the linear combination, are obtained by means of the Orthogonal Matching Pursuit (OMP) algorithm. We test our method on many public benchmark datasets, both for regression and binary classification, and compare it to other pruning techniques. Experiments show that our technique performs significantly better than, or on par with, the alternatives on many datasets. We also discuss the benefits and shortcomings of learning weights for the pruned forest, which leads us to propose a non-negative constraint on the OMP weights for better empirical results.
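The selection procedure described above can be sketched in a few lines of scikit-learn. The snippet below is an illustrative reconstruction, not the paper's code: it omits the non-negative variant and the experimental hyperparameters, and all names are ours.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import OrthogonalMatchingPursuit

X, y = make_regression(n_samples=500, n_features=20, random_state=0)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Column t holds the predictions of tree t on the training set.
tree_preds = np.column_stack([t.predict(X) for t in forest.estimators_])

# Select (and weight) a target number of trees by regressing y on the tree outputs.
target_size = 10
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=target_size).fit(tree_preds, y)
selected = np.flatnonzero(omp.coef_)
weights = omp.coef_[selected]

def pruned_predict(X_new):
    """Prediction of the pruned, weighted forest."""
    preds = np.column_stack([forest.estimators_[i].predict(X_new) for i in selected])
    return preds @ weights + omp.intercept_

print(len(selected), "trees kept out of", len(forest.estimators_))
```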

2018

Conference IJCNN

Luc Giffon, Stéphane Ayache, Thierry Artières, Hachem Kadri

Recent work has focused on combining kernel methods and deep learning to exploit the best of the two approaches. Here, we introduce a new neural network architecture in which we replace the top dense layers of standard convolutional architectures with an approximation of a kernel function, relying on the Nyström approximation. Our approach is simple and highly flexible: it is compatible with any kernel function and it allows exploiting multiple kernels. We show that our architecture reaches the same performance as standard architectures on datasets such as SVHN and CIFAR100. One benefit of the method lies in its limited number of learnable parameters, which makes it particularly suited to small training sets, e.g. from 5 to 20 samples per class.
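As an illustration of the kind of architecture described above, here is a minimal PyTorch sketch of a Nyström-based classification head. It is not the authors' implementation: it uses an RBF kernel and random landmark points instead of points sampled from the training representations, and the class and parameter names are ours.

```python
import torch
import torch.nn as nn

class NystromHead(nn.Module):
    """Replaces a dense head with a Nyström feature map followed by a small classifier."""

    def __init__(self, feat_dim, n_landmarks, n_classes, gamma=0.1):
        super().__init__()
        self.gamma = gamma
        # Landmarks would normally be sampled from training representations.
        self.landmarks = nn.Parameter(torch.randn(n_landmarks, feat_dim), requires_grad=False)
        self.classifier = nn.Linear(n_landmarks, n_classes)
        # W = K(landmarks, landmarks)^(-1/2), precomputed once.
        K = self.rbf(self.landmarks, self.landmarks)
        eigvals, eigvecs = torch.linalg.eigh(K)
        eigvals = eigvals.clamp(min=1e-6)
        self.register_buffer("W", eigvecs @ torch.diag(eigvals.rsqrt()) @ eigvecs.T)

    def rbf(self, a, b):
        return torch.exp(-self.gamma * torch.cdist(a, b) ** 2)

    def forward(self, features):
        # Nyström feature map k(x, landmarks) @ W, then a small linear classifier.
        return self.classifier(self.rbf(features, self.landmarks) @ self.W)

head = NystromHead(feat_dim=128, n_landmarks=32, n_classes=100)
print(head(torch.randn(4, 128)).shape)  # torch.Size([4, 100])
```

The number of learnable parameters is dominated by the small classifier (n_landmarks x n_classes), which is what makes this kind of head attractive when few labeled samples are available.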

2018

Conference CAP

Luc Giffon, Stéphane Ayache, Thierry Artières, Hachem Kadri

Kernel methods and deep learning models have essentially been studied separately until now. Recent work has focused on combining these two approaches in order to get the best of both. In this spirit, we introduce a new neural network architecture that benefits from the low space and time cost of the Nyström approximation. We show that this architecture reaches state-of-the-art performance on image classification on the MNIST and CIFAR10 datasets while requiring only a reduced number of parameters.