The Single Most Important Equation for Brain-Computer Information Interfaces
The Kullback-Leibler Divergence Equation for Brain-Computer Information Interfaces
The Kullback-Leibler equation is arguably the best starting point for thinking about information theory as applied to Brain-Computer Interfaces (BCIs), or Brain-Computer Information Interfaces (BCIIs).
The Kullback-Leibler equation is given as:

$$I(f, g) = \int f(x) \, \ln\!\left( \frac{f(x)}{g(x \mid \theta)} \right) dx$$
We seek to express how well our model of reality matches the real system. Or, just as usefully, we seek to express the information-difference when we have two different models for the same underlying real phenomena or data.
The K-L information is a measure, or heuristic distance, between an approximating model g and a reference distribution f, which for our purposes can be either another model of the same data or the actual data distribution itself.
Because the K-L measure is not symmetric (in general, I(f, g) ≠ I(g, f)), it is not appropriate to call it a distance. Instead, we refer to this quantity as the K-L divergence.
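As a quick numerical check of this asymmetry, here is a minimal Python sketch (assuming NumPy is available; the toy distributions are made-up values for illustration) that evaluates the discrete form of the K-L divergence, given further below, in both directions:

```python
import numpy as np

def kl_divergence(f, g):
    """Discrete K-L divergence I(f, g) = sum_i f_i * ln(f_i / g_i).

    f and g are probability vectors over the same support; both should
    sum to 1, and g must be nonzero wherever f is nonzero.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = f > 0  # terms with f_i = 0 contribute nothing by convention
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

# Two toy distributions over three states
f = [0.5, 0.4, 0.1]
g = [0.3, 0.3, 0.4]

print(kl_divergence(f, g))  # I(f, g)
print(kl_divergence(g, f))  # I(g, f) -- generally a different value
```

Reversing the arguments generally gives a different number, which is why “divergence” rather than “distance” is the right word.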
Kullback-Leibler Divergence Notation
- $I(f, g)$, the K-L information (or divergence), is the “information” lost when model g is used to approximate f,
- f itself can be either the real data distribution, or a different model of the data – so that we are either comparing a model against data, or a model against another model,
- f and g are probability distributions over the n-dimensional domain x, and
- The set of parameters underlying the observed and modeled states is denoted $\theta$.
Kullback-Leibler Divergence: Continuous and Discrete Formalisms
We can apply the K-L divergence to either continuous data (or continuous models) or to the discrete case. For this, we’ll drop the explicit notation for the parameter set $\theta$, noting that either f or g can be a function of parameters as well as of the data space.
Continuous

$$I(f, g) = \int f(x) \, \ln\!\left( \frac{f(x)}{g(x)} \right) dx$$

Discrete

$$I(f, g) = \sum_{i} f\bigl(d(i)\bigr) \, \ln\!\left( \frac{f\bigl(d(i)\bigr)}{g\bigl(d(i)\bigr)} \right)$$
Here, the sum is taken over the dataset {d(i)}.
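For the continuous case, here is a brief sketch (my own illustration, not taken from the references below; the Gaussian parameters and integration grid are arbitrary choices) that estimates the integral numerically for two univariate Gaussians and compares it against the known closed-form Gaussian-to-Gaussian expression:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Grid wide enough that both densities are negligibly small at the edges
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

mu_f, sigma_f = 0.0, 1.0   # the "true" distribution f
mu_g, sigma_g = 1.0, 1.5   # the approximating model g

f = gaussian_pdf(x, mu_f, sigma_f)
g = gaussian_pdf(x, mu_g, sigma_g)

# Numerical estimate of I(f, g) = integral of f(x) ln(f(x)/g(x)) dx
kl_numeric = np.sum(f * np.log(f / g)) * dx

# Closed-form K-L divergence between two univariate Gaussians
kl_exact = (np.log(sigma_g / sigma_f)
            + (sigma_f ** 2 + (mu_f - mu_g) ** 2) / (2.0 * sigma_g ** 2)
            - 0.5)

print(kl_numeric, kl_exact)  # the two estimates should agree closely
```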
Goal in Using the Kullback-Leibler Divergence
The goal is to minimize the information lost, or the divergence.
We note that if f(x) = g(x) for all x, then ln(f(x)/g(x)) = ln(1) = 0, and thus I(f, g) = 0, which means that no information is lost when the “real” situation is used to model itself.
However, this “real” situation described by f may be very complex; we are looking for an approximating model g(x | $\theta$). Here, g is not only a function of the (n-dimensional) space x which is the domain of f, but also of a set of parameters $\theta$, which allows us to tune the approximating model.
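As a sketch of this tuning process (my own illustration, not drawn from the references): take a deliberately “complex” f, here a two-component Gaussian mixture, and grid-search the parameters $\theta$ = (mu, sigma) of a single-Gaussian approximating model g(x | $\theta$) for the values that minimize the numerically estimated I(f, g). The specific distributions and grids are arbitrary.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

# A "complex" true distribution f: an equal mixture of two Gaussians
f = 0.5 * gaussian_pdf(x, -1.5, 0.8) + 0.5 * gaussian_pdf(x, 2.0, 1.2)

def kl_to_f(mu, sigma):
    """Numerical estimate of I(f, g) for the candidate model g(x | mu, sigma)."""
    g = gaussian_pdf(x, mu, sigma)
    return np.sum(f * np.log(f / g)) * dx

# Crude grid search over the parameter set theta = (mu, sigma)
mus = np.linspace(-3.0, 3.0, 61)
sigmas = np.linspace(0.5, 4.0, 36)
best = min(((kl_to_f(m, s), m, s) for m in mus for s in sigmas))

print("minimum I(f, g) = %.4f at mu = %.2f, sigma = %.2f" % best)
```

Within this single-Gaussian family, minimizing I(f, g) recovers (approximately) the mean and variance of f; in practice one would use a proper optimizer rather than a grid, but the grid keeps the sketch transparent.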
Note: This description of the Kullback-Leibler formalism is taken from Strimmer (2013), Burnham & Anderson (2001), and White (date unknown); see references below.
All Models Are Wrong
George E.P. Box, a renowned statistician (1919-2013), is famously credited with having said:
All models are wrong, but some are useful.
Perhaps more germane to our point: we are now having some success with a range of BCIIs. Having shown that this can be done, our focus turns to questions such as how do we:
- Achieve consistent and accurate performance?
- Expand the range of capabilities?
- (Perhaps most important) Measure our results?
The realm of Brain-Computer Information Interfaces (BCIIs) addresses these issues.
To achieve our goals in creating effective Brain-Computer Information Interfaces, we need statistics.
As Bell said,
We have a large reservoir of engineers (and scientists) with a vast background of engineering know how. They need to learn statistical methods that can tap into the knowledge. Statistics used as a catalyst to engineering creation will, I believe, always result in the fastest and most economical progress… (Statement of 1992, quoted in Introduction to Statistical Experimental Design — What is it? Why and Where is it Useful? (2002) Johan Trygg & Svante Wold)
The Next Post on the Kullback-Leibler Divergence
This series of blog posts on the K-L divergence will continue. I will link future posts back to this one, and add links from here to those posts as they appear.
The direction that this will take – after a bit more of a theoretical overview – will be to focus on applying the K-L divergence to Brain-Computer Information Interfaces (BCIIs):
- How is the K-L divergence being used in current BCI/BCII work today?
- How can we envision it being applied to future BCI/BCII developments?, and (perhaps most importantly)
- What are the limitations with this method (and with the family of related methods), and what else can we use?
References
Information Theory – Seminal Papers
- Kullback, S., and Leibler, R.A. (1951). On Information and Sufficiency. Ann. Math. Statist., 22(1), 79-86. pdf
- Shannon, C.E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, XXVII(3), 379-423.
Information Theory – Tutorials
- Burnham, K.P., and Anderson, D.R. (2001). Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research, 28, 111-119. pdf (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than a deeply mathematical review – good for understanding basics.)
- Strimmer, K. (2013). Statistical Thinking (Draft), Chapter 4: What Is Information?. pdf (Very nice, clear intro – v. good for understanding basics)
- White, G. (date unknown). Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference. Course Notes for FW663, Lecture 5. pdf (Another very nice, clear intro – v. good for understanding basics)
- Yu, B. (2008). Tutorial: Information Theory and Statistics – ICMLA (ICMLA, 2008, San Diego). pdf (Very rich and detailed 250-pg PPT deck, excellent for expanding understanding once you have the basics.)