The Single Most Important Equation for Brain-Computer Information Interfaces
The Kullback-Leibler Divergence Equation for Brain-Computer Information Interfaces
The Kullback-Leibler equation is arguably the best starting point for thinking about information theory as applied to Brain-Computer Interfaces (BCIs), or Brain-Computer Information Interfaces (BCIIs).
The Kullback-Leibler equation is given as:

$$I(f, g) = \int f(x) \, \ln\!\left( \frac{f(x)}{g(x \mid \theta)} \right) dx$$
We seek to express how well our model of reality matches the real system. Or, just as usefully, we seek to express the information-difference when we have two different models for the same underlying real phenomena or data.
The K-L information is a measure, or heuristic distance, between an approximating model g and a reference distribution f, which for our purposes can be either another model of the same data or the actual data distribution itself.
Because the K-L measure is not symmetric (in general, I(f, g) ≠ I(g, f)), it is not appropriate to call it a distance. Instead, we refer to this quantity as the K-L divergence.
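As a quick numerical check of this asymmetry, here is a minimal Python sketch (assuming NumPy is available; the toy distributions are made-up values for illustration) that evaluates the discrete form of the K-L divergence, given further below, in both directions:

```python
import numpy as np

def kl_divergence(f, g):
    """Discrete K-L divergence I(f, g) = sum_i f_i * ln(f_i / g_i).

    f and g are probability vectors over the same support; both should
    sum to 1, and g must be nonzero wherever f is nonzero.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    mask = f > 0  # terms with f_i = 0 contribute nothing by convention
    return float(np.sum(f[mask] * np.log(f[mask] / g[mask])))

# Two toy distributions over three states
f = [0.5, 0.4, 0.1]
g = [0.3, 0.3, 0.4]

print(kl_divergence(f, g))  # I(f, g)
print(kl_divergence(g, f))  # I(g, f) -- generally a different value
```

Reversing the arguments generally gives a different number, which is why “divergence” rather than “distance” is the right word.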
Kullback-Leibler Divergence Notation
- $I(f, g)$, the K-L information (or divergence), is the “information” lost when model g is used to approximate f,
- f itself can be either the real data distribution, or a different model of the data – so that we are either comparing a model against data, or a model against another model,
- f and g are probability distributions over the n-dimensional domain x, and
- The set of parameters underlying the observed and modeled states is denoted $\theta$.
Kullback-Leibler Divergence: Continuous and Discrete Formalisms
We can apply the K-L divergence to either continuous data (or continuous models) or to the discrete case. For this, we’ll drop the explicit notation for the parameter set $\theta$, noting that either f or g can be a function of parameters as well as of the data space.
Continuous

$$I(f, g) = \int f(x) \, \ln\!\left( \frac{f(x)}{g(x)} \right) dx$$

Discrete

$$I(f, g) = \sum_{i} f\bigl(d(i)\bigr) \, \ln\!\left( \frac{f\bigl(d(i)\bigr)}{g\bigl(d(i)\bigr)} \right)$$
Here, the sum is taken over the dataset {d(i)}.
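For the continuous case, here is a brief sketch (my own illustration, not taken from the references below; the Gaussian parameters and integration grid are arbitrary choices) that estimates the integral numerically for two univariate Gaussians and compares it against the known closed-form Gaussian-to-Gaussian expression:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Grid wide enough that both densities are negligibly small at the edges
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]

mu_f, sigma_f = 0.0, 1.0   # the "true" distribution f
mu_g, sigma_g = 1.0, 1.5   # the approximating model g

f = gaussian_pdf(x, mu_f, sigma_f)
g = gaussian_pdf(x, mu_g, sigma_g)

# Numerical estimate of I(f, g) = integral of f(x) ln(f(x)/g(x)) dx
kl_numeric = np.sum(f * np.log(f / g)) * dx

# Closed-form K-L divergence between two univariate Gaussians
kl_exact = (np.log(sigma_g / sigma_f)
            + (sigma_f ** 2 + (mu_f - mu_g) ** 2) / (2.0 * sigma_g ** 2)
            - 0.5)

print(kl_numeric, kl_exact)  # the two estimates should agree closely
```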
Goal in Using the Kullback-Leibler Divergence
The goal is to minimize the information lost, or the divergence.
We note that if f(x) = g(x) for all x, then ln(f(x)/g(x)) = ln(1) = 0, and thus I(f, g) = 0, which means that no information is lost when the “real” situation is used to model itself.
However, this “real” situation described by f may be very complex; we are looking for an approximating model g(x | $\theta$). Here, g is not only a function of the (n-dimensional) space x which is the domain of f, but also of a set of parameters $\theta$, which allows us to tune the approximating model.
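As a sketch of this tuning process (my own illustration, not drawn from the references): take a deliberately “complex” f, here a two-component Gaussian mixture, and grid-search the parameters $\theta$ = (mu, sigma) of a single-Gaussian approximating model g(x | $\theta$) for the values that minimize the numerically estimated I(f, g). The specific distributions and grids are arbitrary.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

x = np.linspace(-10.0, 10.0, 4001)
dx = x[1] - x[0]

# A "complex" true distribution f: an equal mixture of two Gaussians
f = 0.5 * gaussian_pdf(x, -1.5, 0.8) + 0.5 * gaussian_pdf(x, 2.0, 1.2)

def kl_to_f(mu, sigma):
    """Numerical estimate of I(f, g) for the candidate model g(x | mu, sigma)."""
    g = gaussian_pdf(x, mu, sigma)
    return np.sum(f * np.log(f / g)) * dx

# Crude grid search over the parameter set theta = (mu, sigma)
mus = np.linspace(-3.0, 3.0, 61)
sigmas = np.linspace(0.5, 4.0, 36)
best = min(((kl_to_f(m, s), m, s) for m in mus for s in sigmas))

print("minimum I(f, g) = %.4f at mu = %.2f, sigma = %.2f" % best)
```

Within this single-Gaussian family, minimizing I(f, g) recovers (approximately) the mean and variance of f; in practice one would use a proper optimizer rather than a grid, but the grid keeps the sketch transparent.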
Note: This description of the Kullback-Leibler formalism is taken from Strimmer (2013), Burnham & Anderson (2001), and White (date unknown); see references below.
All Models Are Wrong
George E.P. Box, a renowned statistician (1919-2013), is famously credited with having said:
All models are wrong, but some are useful.
Perhaps more germane to our point: we are now having some success with a range of BCIIs. Having shown that this can be done, our focus turns to questions such as how do we:
- Achieve consistent and accurate performance?
- Expand the range of capabilities?
- (Perhaps most important) Measure our results?
The realm of Brain-Computer Information Interfaces (BCIIs) addresses these issues.
To achieve our goals in creating effective Brain-Computer Information Interfaces, we need statistics.
As Bell said,
We have a large reservoir of engineers (and scientists) with a vast background of engineering know how. They need to learn statistical methods that can tap into the knowledge. Statistics used as a catalyst to engineering creation will, I believe, always result in the fastest and most economical progress… (Statement of 1992, quoted in Introduction to Statistical Experimental Design — What is it? Why and Where is it Useful? (2002) Johan Trygg & Svante Wold)
The Next Post on the Kullback-Leibler Divergence
This series of blog posts on the K-L divergence will continue. I will link future posts back to this one, and add links from here to those posts as they appear.
The direction that this will take – after a bit more of a theoretical overview – will be to focus on applying the K-L divergence to Brain-Computer Information Interfaces (BCIIs):
- How is the K-L divergence being used in current BCI/BCII work today?
- How can we envision it being applied to future BCI/BCII developments?, and (perhaps most importantly)
- What are the limitations with this method (and with the family of related methods), and what else can we use?
References
Information Theory – Seminal Papers
- Kullback, S., and Leibler, R.A. (1951). On Information and Sufficiency. Ann. Math. Statist., 22(1), 79-86. pdf
- Shannon, C.E. (1948). A Mathematical Theory of Communication. The Bell System Technical Journal, XXVII(3), 379-423.
Information Theory – Tutorials
- Burnham, K.P., and Anderson, D.R. (2001). Kullback-Leibler information as a basis for strong inference in ecological studies. Wildlife Research, 28, 111-119. pdf (Reviews concepts and methods – including K-L – in the context of practical applications to experimental data, rather than a deeply mathematical review – good for understanding basics.)
- Strimmer, K. (2013). Statistical Thinking (Draft), Chapter 4: What Is Information?. pdf (Very nice, clear intro – v. good for understanding basics)
- White, G. (date unknown). Information Theory and Log-Likelihood Models: A Basis for Model Selection and Inference. Course Notes for FW663, Lecture 5. pdf (Another very nice, clear intro – v. good for understanding basics)
- Yu, B. (2008). Tutorial: Information Theory and Statistics – ICMLA (ICMLA, 2008, San Diego). pdf (Very rich and detailed 250-pg PPT deck, excellent for expanding understanding once you have the basics.)