We invite contributions on the theoretical and methodological aspects of Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization approaches, and closely related topics, including:

• Data analysis and data visualization
• Temporal and incremental data mining
• Various mathematical approaches including information theory and mathematical statistics
• Software and hardware implementations
• Architectural solutions including hierarchical and growing networks, ensemble models and special metrics
• Neuro-cognitive studies that compare modeling and empirical results at different levels
• Models, experimental investigations and applications of autonomous mental development

We also call for scientific and practice-oriented papers that demonstrate the use of Self-Organizing Maps, Learning Vector Quantization, Clustering and Data Visualization approaches in different application areas, including but not limited to:

• Data mining
• Pattern recognition
• Signal processing
• Knowledge management
• Time series processing
• Modeling dynamic phenomena
• Anomaly and fault detection
• Industrial applications
• Bioinformatics
• Biomedical applications
• Telecommunications
• Financial analysis
• Applications for transport optimization
• Cognitive modeling
• Language modeling
• Robotics and intelligent systems
• Image processing and vision
• Speech processing
• Text and document analysis

Confirmed invited speakers

Etienne Côme, IFSTTAR

Generative models for urban mobility analysis

Abstract: The development of smart technologies and the advent of new observation capabilities have increased the availability of massive urban datasets that can greatly benefit urban studies. For example, a large amount of urban data is collected by various sensors, such as smart meters, or provided by GSM, Wi-Fi or Bluetooth records, ticketing data, geo-tagged posts on social networks, etc. Analysis of such digital records can help to build decision-making tools (for analytical, forecasting and display purposes) with a view to better understanding how urban systems operate, enabling urban stakeholders to plan infrastructure extensions better, and providing better services to citizens. Although some of the devices used to record these datasets were not initially designed for the analysis of urban mobility, their usefulness is obvious. We will detail several analyses of such massive datasets using generative modeling tools, which are particularly well suited to encoding the prior knowledge available on the task at hand. Our presentation will focus on the analysis of bike-sharing data and ticketing logs, with case studies in Rennes and Paris.

Jean-Daniel Fekete, INRIA, Université Paris Sud

Yann Guermeur, LORIA

Rademacher Complexity of Margin Multi-category Classifiers

Abstract: In the framework of agnostic learning, one of the main open problems of the theory of multi-category pattern classification is the characterization of the way the confidence interval of a guaranteed risk should vary as a function of the fundamental parameters, which are the sample size m and the number C of categories. This is especially the case when working under minimal learnability hypotheses. We consider margin classifiers based on classes of vector-valued functions with one component function per category. The classes of component functions are uniform Glivenko-Cantelli and the vector-valued functions take their values in a hypercube of R^C. For these classifiers, a well-known guaranteed risk based on a Rademacher complexity applies. Several studies have dealt with the derivation of an upper bound on this complexity. This article establishes a bound which is based on a new generalized Sauer-Shelah lemma. Under the additional assumption that the \gamma-dimensions of the classes of component functions grow no faster than polynomially with \gamma^{-1}, its growth rate with C is O(\sqrt{C} \ln(C)). This behaviour holds true irrespective of the degree of the polynomial.

Pascal Massart, Université Paris Sud

Estimator selection: the calibration issue

Abstract: Estimator selection has become a crucial issue in nonparametric estimation. Two widely used methods are penalized empirical risk minimization (such as penalized log-likelihood estimation) and pairwise comparison (such as Lepski’s method). Our aim in this talk is twofold. We shall first give some general ideas about the calibration issue of estimator selection methods. We shall review some known results, putting the emphasis on the concepts of minimal and optimal penalties, which are helpful for designing data-driven selection criteria. Secondly, we shall present a new method for bandwidth selection within the framework of kernel density estimation, which is in some sense intermediate between the two main methods mentioned above. We shall provide some theoretical results which lead to a fully data-driven selection strategy.

Alfredo Vellido, Universidad Politécnica de Cataluña

The eye of the beholder: visualization and interpretability in practical applications

Abstract: Modern data science has knowledge discovery processes at its core. The road from raw data to manageable information and, from there, to knowledge extraction is by no means straightforward. In many practical applications of data analysis, knowledge extraction may not even be enough by itself, unless such knowledge is shown to be actionable. Machine Learning, hand in hand with Statistics, is playing an increasingly important role in these applications. Such a role is likely to be curtailed, though, unless we guarantee the interpretability of models and results. At a time when society at large and the natural sciences have become data-rich, interpretability becomes key, as legislation at the European Union level is about to be passed that will grant subjects a "right to explanation", guaranteeing individuals the right to ask for an explanation of any algorithmic decision made about them. This brief talk will focus on unsupervised learning and visualization to illustrate the problem of model interpretability. It will draw examples mostly from the field of biomedicine, in which interpretability can be a hard constraint on the applicability of Machine Learning methods.

Nathalie Villa-Vialaneix, INRA Toulouse

Stochastic Self-Organizing Map variants with the R package SOMbrero

Abstract: Self-Organizing Maps (SOM) are a popular clustering and visualization algorithm. Several implementations of the SOM algorithm exist in different mathematical and statistical software packages, the main one probably being the SOM Toolbox [Kohonen, 2014]. In this presentation, we will introduce an R package, SOMbrero, which implements several variants of the stochastic SOM algorithm. The package includes several diagnostic tools and graphics for interpreting the results and is provided with complete documentation and examples.


The proceedings of WSOM+ 2017 will be published in IEEE Xplore. LaTeX and Word templates are available for authors. Manuscripts should be no longer than eight pages.

Selected papers will be published in a special issue of the journal Neural Computing & Applications.