Entropy Trumps All (First Computational Results for the 2-D CVM)
Computational vs. Analytic Results for the 2-D Cluster Variation Method
Three lessons learned from the first computational results for the 2-D Cluster Variation Method, or CVM. These first comparisons between analytic predictions and actual computational results tell us three things: (1) the analytics are a suggestion, not a prediction of actual values, and the further we move from zero values for the two enthalpy parameters, the more the two diverge; (2) topography is important (VERY important); and (3) entropy rules the ‘Verse.
Entropy Rules the ‘Verse
We’ll start with the third observation: entropy maximization will always trump the analytic prediction.
One ring to rule them all, one ring to find them,
One ring to bring them all and in the darkness bind them.
– From The Lord of the Rings, by J.R.R. Tolkien
Practically speaking, the “one ring” that rules our ‘Verse is entropy. As the computational results that we’ll see here show, even when we start with a system that is initially configured to be highly non-entropic, and use a very strong interaction enthalpy parameter to keep it that way, entropy still drives the system dynamics.
This holds even in the face of an analytic solution that suggests otherwise.
A Quick Side Note
Time constraints require that I dole out these computational results with an eye-dropper. I’m not deliberately withholding; it’s just the demands of a high teaching load. (More about that in some other post; suffice to say that Northwestern’s Master of Science in Data Science program now offers a full-out AI specialization.)
I spent most of 2018 developing the two AI courses currently offered: MSDS 458 (AI and Deep Learning) and MSDS 453 (Natural Language Processing). We’ll have two more AI courses starting next quarter, developed (thankfully!) by other faculty members. Suffice to say, course development ate most of my time last year, and we should be able to continue these blog conversations a bit more, starting now in 2019.
That said, I’m teaching three classes, and doing a good job with each takes time. This means that we’ll be looking at computational results for the 2-D CVM over a bit more of an extended timeframe.
I put together the basics for this post a few weeks ago, and selected the highly non-entropic example below as our kick-off. However (surprise!), even with a much more entropically-styled example (next post), we’ll see that the force of entropy maximization wins out over the enthalpy contributions: for a given starting pair of enthalpy parameters, the resulting set of triplet configuration variables was surprisingly different from the original set of configuration variables.
The Starting Point: A Highly Structured (Low-Entropy) 2-D Grid
Let’s kick off with a very non-entropic system.
Recall (if you’re new to these posts) that we’re working with the Cluster Variation Method. This is a means for expanding the entropy in an (otherwise classic) Ising model to include, in addition to the “on” (A) and “off” (B) nodes, the configuration variables. These are:
- Y(i), where i = 1..3: the nearest-neighbor pairs,
- W(i), where i = 1..3: the next-nearest-neighbor pairs,
- Z(i), where i = 1..6: the triplets.
Recall also that for the Y(i) and W(i), there is a degeneracy factor of 2 when i = 2 (because there are two ways of counting the unlike pairs, e.g., A–B and B–A). The degeneracy factor is also 2 for the i = 2 and i = 5 cases of the Z(i) triplet variables. What the CVM does is add terms to the entropy part of the basic free energy equation, accounting for the influence of these configuration variables. For more on the free energy equation, see The Big, Bad, Scary Free Energy Equation (and New Experimental Results).
Recall also – just a notational convention – that capital letters Y(i), W(i), and Z(i) refer to the actual counts of the various configuration variables in a grid, and lower case letters y(i), w(i), and z(i) refer to the fractional values.
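Since I’m not sharing my actual research code just yet, here’s a quick illustrative sketch of that notation in Python. It assumes that the degeneracy factors enter the normalization as weights (my convention), and the counts shown are purely hypothetical:

```python
import numpy as np

def fractions(counts, degeneracy):
    """Convert configuration-variable counts (capital letters) to
    fractional values (lower-case), with each count weighted by its
    degeneracy factor in the normalization."""
    counts = np.asarray(counts, dtype=float)
    weights = np.asarray(degeneracy, dtype=float)
    return counts / np.sum(weights * counts)

# Pair variables Y(1)..Y(3): degeneracy factor 2 on the unlike (i = 2) pairs.
y = fractions([120, 40, 96], degeneracy=[1, 2, 1])       # hypothetical counts
# Triplet variables Z(1)..Z(6): degeneracy factor 2 on i = 2 and i = 5.
z = fractions([80, 30, 6, 6, 30, 80], degeneracy=[1, 2, 1, 1, 2, 1])
```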
The experiments that I’m running all involve a 2-D CVM grid of 16×16, or 256, nodes. Each node can be “on” (A) or “off” (B). The grid is designed to be a full envelope; that is, it wraps around from the right edge to the left, and from top to bottom. This way, there is no point in the grid for which the full set of configuration variables would be incomplete. Since I’m using such a small system, this completeness makes our results more accurate.
Many of the figures that I use to illustrate these grids have a “wraparound” on the far right and top of the grid; these just show what the values on the left-hand side and the bottom are, and help us visualize the nearest neighbors and triplets that continue past the right and top edges.
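For anyone coding along at home, the wraparound itself is just modulo arithmetic on the row and column indices. A minimal sketch (illustrative only, not my actual code):

```python
import numpy as np

def neighbor(grid, row, col, d_row, d_col):
    """Value of the node offset by (d_row, d_col), wrapping both edges
    so that every node has a complete set of neighbors."""
    n_rows, n_cols = grid.shape
    return grid[(row + d_row) % n_rows, (col + d_col) % n_cols]

grid = np.random.randint(0, 2, size=(16, 16))   # 1 = A ("on"), 0 = B ("off")
right_of_edge = neighbor(grid, 5, 15, 0, 1)     # wraps around to column 0
```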
Here’s a grid that I designed to test the system in a highly non-entropic state (Figure 1). As you can see, all the “on” (A) nodes have been pushed to the left and right sides. Since the grid wraps around (left-to-right and top-to-bottom), this is actually one big contiguous mass of A nodes.
Now, since we have a specific and well-defined system, and we can count the diverse configuration variables, we can actually compute the free energy. Specifically, based on just the configuration variables alone, we can compute the entropy.
In this system, I’ve got the activation enthalpy per node (epsilon0) set to zero. That means that the first term in the enthalpy equation (the overall activation enthalpy, a simple linear function of the relative fraction of A nodes) is zero. The multiplying epsilon0 parameter is zero, so the whole term is zero.
This also means that the equilibrium state is found when the number of A nodes is the same as the number of B nodes.
The reason that this grid is slightly “bulgey,” that is, has a fat bump in the mass of A nodes on the lower left (and a corresponding concavity on the lower right) is that we need to have some nodes creating next-nearest neighbors and triplets that would NOT be found if we just had a very symmetric system. That is, we need some triplets of the A–B–A and B–A–B nature. We could get these by creating small “islands” of A in the sea of B, or we can make an “irregular coastline” in the continental A mass. If we don’t create at least a little bit of each kind of triplet, then our entropy equation tries to take the logarithm of zero, and we all know that doesn’t work out very well.
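The log-of-zero problem is easy to see in code. The entropy terms are built from expressions of the form x*ln(x) (see the free energy post), so if any configuration variable is exactly zero, the logarithm blows up. A tiny illustration:

```python
import numpy as np

def Lf(x):
    """The x * ln(x) building block of the CVM entropy terms.
    A configuration variable of exactly zero gives ln(0) = -infinity."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0.0):
        raise ValueError("every configuration variable must be > 0; "
                         "give the grid an irregular coastline or some islands")
    return x * np.log(x)
```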
Another Quick Side Note – the Z(i) Triplet Values are Approximate
If you’ve been following these posts (and yes, it’s been a while, so you’ve probably forgotten this point), my current code is an approximation. That is, it doesn’t compute the full values for the Z(i) triplet configurations. This is because I wrote that code to read the triplets moving across a row, left-to-right, where the horizontal triplets require looking one row up and one row down (for each of two different kinds of horizontal triplets, for a given node). What I didn’t include were the vertical triplets: triplets that (given a specific starting node) require looking two rows up and two rows down (for each of the two different vertical triplets, respectively).
The code to compute the W(i) (next-nearest-neighbor) values is correct. (I thought I’d missed on that … but went back and checked, and that part is fine.) So what we’re running with is a situation where I have accurate counts of the Y(i) and W(i), and half of the Z(i) triplets. (Also, this code – for what it does – has been verified and validated to the hilt; I believe in it by now.) So I’m running with half the Z(i) count, but dividing by half the proportionality factor, so the overall values for the z(i) are (I believe) approximately correct.
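To give the flavor of the triplet counting (in simplified form – this row-only sketch is even simpler than my actual code, which reaches into adjacent rows), here’s what a classify-and-count pass with wraparound looks like:

```python
import numpy as np

# Triplet categories (1 = A, 0 = B), with z3 = A-B-A as noted later:
# z1 = A-A-A, z2 = A-A-B / B-A-A, z3 = A-B-A,
# z4 = B-A-B, z5 = B-B-A / A-B-B, z6 = B-B-B.
TRIPLET_INDEX = {
    (1, 1, 1): 0, (1, 1, 0): 1, (0, 1, 1): 1, (1, 0, 1): 2,
    (0, 1, 0): 3, (0, 0, 1): 4, (1, 0, 0): 4, (0, 0, 0): 5,
}

def count_row_triplets(grid):
    """Count left-to-right triplets within each row, wrapping around
    the right edge. This deliberately omits triplets spanning multiple
    rows -- the same spirit as the approximation described above."""
    n_rows, n_cols = grid.shape
    Z = np.zeros(6, dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            key = (int(grid[r, c]),
                   int(grid[r, (c + 1) % n_cols]),
                   int(grid[r, (c + 2) % n_cols]))
            Z[TRIPLET_INDEX[key]] += 1
    return Z
```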
I’m probably not going to go back and modify the current code to update the Z(i) computations, because what I really need is to transition the whole thing to object-oriented Python, and that will be a time-consuming adventure. So I think that the approximate results right now are good enough to draw preliminary conclusions, which will be new and interesting and worth some conversation.
And yes, I will share code. Just let me get the first couple of papers written and at least up on arXiv first.
Analytic Results: Part 1
Whenever we propose a specific pattern, we can always go ahead and get the associated free energy. Specifically, we can get the (non-varying) entropy for any specific pattern. Then, depending on our activation enthalpy (epsilon0) and interaction enthalpy (epsilon1) parameter values, we can get the two enthalpy terms; one each for activation and interaction enthalpies.
As mentioned previously, if our activation enthalpy (epsilon0) parameter is zero, we should have equal numbers of A and B nodes, and our total activation enthalpy should be zero.
If we specify the other enthalpy parameter, epsilon1, we can get the interaction enthalpy term. This also contributes to the free energy.
Just getting the free energy, even though we can do it, is not very interesting. The reason is that we have no way of knowing whether what we get is a free energy minimum.
Specifically (and let’s stay with the case where epsilon0 = 0), when we propose a value for epsilon1, we’re saying that there is a certain energy (enthalpy) involved in each pairwise interaction. If epsilon1 = 0, then we have no interaction enthalpy, and we’d expect the distribution of configuration variables (pairs and triplets) to be completely random.
Clearly, in the figure that we have here, the distribution is anything BUT random. It’s like going to a party and finding all the girls on one side, and all the boys on the other side. Strictly like-with-like interactions, except at the boundary edge of the A continent.
If we have a positive value for epsilon1, and an interaction enthalpy term configured so that it is negative in the y1 and y3 (like-with-like, or A–A and B–B) pairs, then we’d expect that increasing the y1 and y3 pairs (as fractions of the total y’s) would lead to a more negative interaction enthalpy term. This should lower the free energy.
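In sketch form (with the caveat that the exact term lives in the Big, Bad, Scary Free Energy Equation post; the particular expression below is just one illustrative form that behaves as described):

```python
def interaction_enthalpy(y1, y2, y3, epsilon1):
    """One form that behaves as described above (an illustrative
    assumption, not my exact equation): with positive epsilon1, the
    like-with-like y1 and y3 contributions are negative, so growing
    them drives this term -- and hence the free energy -- downward."""
    return epsilon1 * (2.0 * y2 - y1 - y3)
```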
Visually, if we increase epsilon1, we’d expect to see more and more clusters – that is, bigger and more cohesive clusters. We’d look for a grid topography that emphasizes like-with-like pairing.
Of course, the more that we do that, the more that we’re playing games with our entropy. We’re becoming less “distributed across all possible states,” and more structured. This means that we’d likely be lowering our entropy, or increasing our negative entropy. This would move us away from a free energy minimum.
So … it all depends. Will a strong-enough value for epsilon1 trump the entropy term, and force us into a highly structured situation? Or will entropy dominate?
Just looking at a graph and seeing the free energy value (for a given epsilon1), we’d have no idea.
There are, however, two distinct ways in which we can find out whether we’re at a free energy minimum or not.
Analytic Results: Part 2
As luck (and some devilishly tricky mathematics) would have it, there actually IS an analytic solution to the free energy equation for the specific case where the activation enthalpy parameter (epsilon0) is zero; that is, when the numbers of A and B nodes are the same (x1 = x2 = 0.5).
This analytic solution was originally presented in Kikuchi and Brush (1967; see References, below). I’ve replicated the solution and presented the calculation details a few different times. (Again, see the References.)
So … this would lead us to think that for the case where epsilon0 = 0, and epsilon1 is some specific value, we can compute (analytically) the values for each of the configuration variables (the y(i), w(i), and z(i)), and the associated thermodynamic variables – specifically the free energy.
Further, since this analytic solution is based on the notion that we ARE at a free energy minimum (we solved the equation obtained by setting the derivative of the free energy equal to zero), we think that we should trust this number.
Well, it turns out that we CAN get analytically-based configuration variables for a given epsilon1. But whether or not the system is truly at a free energy minimum … well, that’s the next section.
(Oops … forgot to mention. The h-value is a function of epsilon1. Specifically, h = exp(2*epsilon1), and this is used in deriving the analytic solution.)
There IS, however, one thing that we can do with our analytic equation. If we create a pattern, we should be able to back-work to the h-value that we THINK should be associated with that pattern. That is, we use our functions of the configuration variables in terms of h in reverse: instead of using h to predict what we should have for z (and the other configuration variables), we actually count our z’s, look at the graph, and find what we believe should be the corresponding h-value.
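The conversions between h and epsilon1 are one-liners, and the back-working is just a one-dimensional root-find against the analytic curve. A sketch (here, z3_analytic is a placeholder for the actual analytic expression from Kikuchi & Brush; I’m not reproducing it in this post):

```python
import numpy as np
from scipy.optimize import brentq

def h_from_eps1(epsilon1):
    """h = exp(2 * epsilon1), as stated above."""
    return np.exp(2.0 * epsilon1)

def eps1_from_h(h):
    """The inverse: epsilon1 = ln(h) / 2.
    E.g., h = 1.65 -> epsilon1 ~ 0.25; h = 2.0 -> epsilon1 ~ 0.35."""
    return 0.5 * np.log(h)

def back_work_h(z3_counted, z3_analytic, h_lo=1.01, h_hi=5.0):
    """Find the h whose analytic z3 prediction matches the counted z3.
    z3_analytic is a placeholder for the analytic expression."""
    return brentq(lambda h: z3_analytic(h) - z3_counted, h_lo, h_hi)
```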
Computational Results
You know what I really trust? It’s not the analytic solution. That makes some assumptions about equivalence between terms. Also, when I plotted this out (configuration variables versus h) some years ago, some very wonky things started to happen for large h-values. In the course of creating the analytic solution, we create a denominator term. For certain h-values, this denominator goes to zero. This means that everything in that vicinity of h is going to be massively untrustworthy.
Instead of trusting the analytic solution, there’s something a lot more practical and straightforward: Create the grid. Specify an h-value. Create some starting grid pattern. Obtain the configuration variable values and the thermodynamic values. Note the free energy value.
Then (a minimal code sketch follows the list):
- Randomly (at least for now), select a pair of nodes. Ascertain that one is an A node, and the other is B.
- Flip the two nodes; A becomes B, and vice versa.
- Get the new configuration variable values, and the new thermodynamic values.
- Do the test: is the new free energy value lower than the previous one? If so, keep the flip. If not, reverse the flip.
- Rinse and repeat, for a specified number of times.
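Here’s the sketch promised above. The free_energy function is a stand-in for the full computation (count the configuration variables, then assemble the entropy and enthalpy terms); the loop itself is exactly the swap-and-test procedure just described:

```python
import numpy as np

def minimize_free_energy(grid, free_energy, n_steps=10000, rng=None):
    """Swap-and-test free energy minimization, as described above.
    `free_energy(grid)` stands in for the full CVM computation."""
    rng = rng or np.random.default_rng()
    f_current = free_energy(grid)
    for _ in range(n_steps):
        # Randomly select one A node and one B node.
        a_rows, a_cols = np.where(grid == 1)
        b_rows, b_cols = np.where(grid == 0)
        if len(a_rows) == 0 or len(b_rows) == 0:
            break
        i = rng.integers(len(a_rows))
        j = rng.integers(len(b_rows))
        # Flip the pair: A becomes B, and vice versa.
        grid[a_rows[i], a_cols[i]] = 0
        grid[b_rows[j], b_cols[j]] = 1
        f_new = free_energy(grid)
        if f_new < f_current:
            f_current = f_new                  # keep the flip
        else:
            grid[a_rows[i], a_cols[i]] = 1     # reverse the flip
            grid[b_rows[j], b_cols[j]] = 0
    return grid, f_current
```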
Now, this is hugely labor-intensive. Also, the code is complex. Figuring out those configuration variables — all that counting — is neither fun nor easy.
But, after two years of code development (and MUCH V&V testing – code results checked against manual counts, in so many different ways), I can do this.
So here’s what happens when we have a pattern such as the one shown in Figure 1. I used the analytic function to predict what the h-value should have been with that pattern. This is shown in Figure 2.
Then, just to go out on a limb, I increased the h-value a bit further. Just for fun.
The figure suggested that h = 1.65 would be about right for the Figure 1 pattern; that is, that value of h would correspond to the values for z1 and z3.
So I ran the program with h = 2.0, instead of h = 1.65. This really pushed the “like-with-like” dynamic in the system.
Here’s what happened when I did that, shown in Figure 3.
Summing Up
We have something that hasn’t been seen before: actual free energy minimization results in a 2-D CVM grid. We’re seeing that a computational approach – creating a grid pattern and then making little, step-by-step alterations, testing for free energy minimization each time – gives us actual results. Figure 4 shows the evolution of this process.
We can see that not too many swaps are needed to get to a stable, free energy-minimum system.
Here are the initial and final grid patterns, side-by-side.
Here’s an illustration of the resulting configuration variables, in an average over three runs.
The primary characteristic of this new pattern configuration is that it has greatly increased the relative fraction of the configuration variables that were very low in the starting pattern. As a specific example, the z(3) value increased from about 0.02 to greater than 0.06. The z(3) triplet configuration variable represents the A–B–A triplet, which occurred very infrequently in the original “rich club-like” pattern, and which now occurs much more often after free energy minimization; it is more than three times the original value.
Clearly, the system likes to have a bit more entropy. A bit more distribution among possible states.
What we’re seeing now is just the statistical physics. We’re getting a (first time, to my knowledge) look at free energy minimization across a 2-D CVM grid.
We’re not yet trying to insert this into a neural network, nor are we attempting any form of advanced machine learning work. (However, I’m revising my 40+ page tutorial on Karl Friston’s work, and will use this to illustrate how a model can be applied to a representation of an external state … look for that in coming posts.)
We’re just laying the groundwork, so far.
One final point: topography is important. I’ll pick up on that theme in future posts.
Live free or die, my friend –
AJ Maren
Live free or die: Death is not the worst of evils.
Attr. to Gen. John Stark, American Revolutionary War
Key References
Cluster Variation Method – Essential Papers
- Kikuchi, R. (1951). A theory of cooperative phenomena. Phys. Rev. 81, 988–1003. (pdf, accessed 2018/09/17.)
- Kikuchi, R., & Brush, S.G. (1967). Improvement of the Cluster-Variation Method. J. Chem. Phys. 47, 195. (Online for purchase through the American Institute of Physics; $30.00 for non-members.)
- Maren, A.J. (2016). The Cluster Variation Method: A Primer for Neuroscientists. Brain Sci. 6(4), 44. https://doi.org/10.3390/brainsci6040044 (online access, pdf; accessed 2018/09/19.)
The 2D CVM – Code, Documentation, and V&V Documents (Including Slidedecks)
- GitHub for 2-D CVM Cluster Variation Method (Public Repository)
- Documentation / Experimental results showing computation versus analytic: 2D-CVM_Expts-vary-eps0-and-eps1_computational_2019-01-19.pptx (New PPT with experimental results).
NOTE: I’m not making my code for this available just yet. I HAVE put a copy of my PPT slidedeck up on my public GitHub repository, the one listed above. It’s got all kinds of juicy things; we’ll discuss them over the next few weeks. But not the code. Not just yet.
Let me get a paper or two published, and then we’ll see.
At over 4,400 lines, it’s not trivial.
Previous Related Posts
- Filling Out the Phase Space Boundaries
- Obvious, But Useful (Getting the Epsilon0 Value when the Interaction Enthalpy Is Zero)
- An Interesting Little Thing about the CVM Entropy (With Code)
- Expressing Total Microstates (Omega) for the 1-D and 2-D CVM (Part 2)
- Expressing Total Microstates (Omega) for the 1-D Cluster Variation Method – Part 1
- Wrapping Our Heads around Entropy
- Figuring Out the Puzzle (in a 2-D CVM Grid)
- 2-D Cluster Variation Method: Code Verification and Validation
- The Big, Bad, Scary Free Energy Equation (and New Experimental Results)