Blog

"Automatic Discovery of Similar Words" – Chapter 2 in Survey of Text Mining II

This post begins a review of “Automatic Discovery of Similar Words,” by Pierre Senellart and Vincent D. Blondel, published as Chapter 2 in Berry and Castellanos’ Survey of Text Mining II. This is an excellent and useful chapter, in that it:
1) Addresses the broad issue of computational methods for discovering “similar words” (including synonyms, near-synonyms, and thesaurus-generating techniques) from large data corpora,
2) Illustrates the different leading mathematical methods, giving an excellent overview of the state of the art (SoA),
3) Competently discusses how different methods…
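To give a feel for one family of techniques in this space (vector-space approaches, where words that share contexts are treated as similar), here is a minimal sketch that builds co-occurrence vectors from a toy corpus and ranks words by cosine similarity. This is a generic illustration, not Senellart and Blondel’s own algorithm; the corpus, the window size, and the function names are hypothetical.

```python
from collections import Counter, defaultdict
import math

def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def most_similar(word, vectors, top_n=3):
    """Rank every other word by cosine similarity of its context vector to `word`'s."""
    target = vectors.get(word, Counter())
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]

# Hypothetical toy corpus
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a mouse",
    "a dog chased a ball",
]
vecs = cooccurrence_vectors(corpus)
print(most_similar("cat", vecs))
```

In this toy corpus “cat” and “dog” appear in identical contexts, so “dog” ranks first. Real systems add weighting (e.g., tf-idf or mutual information) and far larger corpora; the chapter also covers graph-based methods that this sketch does not attempt.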

Follow-on Thoughts: Clustering Algorithm Improvements for Text-based Data Mining

A good night’s sleep is excellent for clearing away mental cobwebs, and it has given me more perspective on Chapter 1, “Cluster-Preserving Dimension Reduction Methods,” by Howland and Park in Survey of Text Mining II: Clustering, Classification, and Retrieval (ed. by Berry & Castellanos). Recall that the Howland & Park work proposed a two-step dimensionality reduction method. They successfully reduced over 20,000 “dimensions” (the distinct words found across the overall corpus collection) to four dimensions, and…
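The excerpt doesn’t spell out Howland and Park’s exact algorithm, so as a hedged sketch of the general two-step idea, here is a generic pipeline: a truncated SVD (LSI-style) to shrink the term space, followed by classical linear discriminant analysis (LDA), which yields at most k − 1 useful dimensions for k clusters. Using SVD for the first stage, the function name `two_stage_reduce`, and the synthetic data are all assumptions for illustration, not the chapter’s method.

```python
import numpy as np

def two_stage_reduce(X, labels, r, out_dim):
    """X: (n_docs, n_terms) matrix; labels: cluster id per document.
    Stage 1: project onto the top-r right singular vectors (LSI-style).
    Stage 2: run LDA in that r-dim space, keeping out_dim <= k - 1 directions.
    Returns an (n_terms, out_dim) projection matrix."""
    labels = np.asarray(labels)
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_r = Vt[:r].T                       # (n_terms, r)
    Y = Xc @ V_r                         # documents in the reduced space
    mu = Y.mean(axis=0)
    Sw = np.zeros((r, r))                # within-cluster scatter
    Sb = np.zeros((r, r))                # between-cluster scatter
    for c in np.unique(labels):
        Yc = Y[labels == c]
        mc = Yc.mean(axis=0)
        D = Yc - mc
        Sw += D.T @ D
        diff = (mc - mu)[:, None]
        Sb += len(Yc) * (diff @ diff.T)
    # Solve Sw^{-1} Sb w = lambda w and keep the leading eigenvectors
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(evals.real)[::-1][:out_dim]
    W = evecs[:, order].real             # (r, out_dim)
    return V_r @ W

# Hypothetical synthetic data: 5 clusters of 40 "documents" over 200 "terms"
rng = np.random.default_rng(0)
k, n_per, n_terms = 5, 40, 200
centers = rng.normal(scale=5.0, size=(k, n_terms))
X = np.vstack([c + rng.normal(size=(n_per, n_terms)) for c in centers])
labels = np.repeat(np.arange(k), n_per)
G = two_stage_reduce(X, labels, r=20, out_dim=k - 1)
Z = (X - X.mean(axis=0)) @ G
print(Z.shape)
```

The first stage matters because LDA needs a nonsingular within-cluster scatter matrix, which a raw 20,000-term space with far fewer documents cannot provide.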

Survey of Text Mining II: Cluster-Preserving Dimension Reduction Methods (Chapter 1)

Some time ago, I promised a colleague a review of an excellent book, Survey of Text Mining II: Clustering, Classification, and Retrieval, edited by Michael W. Berry and Malu Castellanos. Overall, this book would serve well as the basis for a one-semester graduate course in specialized methods for (textual) data analytics. It presupposes an expert’s (or at least a solid journeyman’s) understanding of basic algorithms, as well as of the issues in textual data mining/analytics. Each chapter presents a new…

Graph Theory — Becoming "Organizing Framework"

Something I’ve been noting, both on my own and in conversations with Jenn Sleeman, who’s in touch with the academic world at UMBC: graph theory is growing in centrality as a fundamental organizing framework for many current and emerging computational processes. Specifically, anything more complex than a simple “tuple” (RDF or OWL, etc.) needs to be matched against a graph or partial graph. One good “integrative” paper is Understanding Belief Propagation and its Generalizations by J.S. Yedidia,…
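To make the belief-propagation reference a little more concrete: on a tree-structured graph, sum-product message passing computes exact marginals. Below is a minimal sketch on a three-node chain of binary variables, checked against brute-force enumeration. The potentials are arbitrary made-up numbers and the function names are hypothetical; on graphs with cycles the same updates (“loopy” BP) give only approximations.

```python
import itertools
import numpy as np

# Hypothetical model: chain x0 - x1 - x2, each variable binary.
unary = [np.array([0.7, 0.3]), np.array([0.4, 0.6]), np.array([0.5, 0.5])]
pair = np.array([[0.9, 0.1],
                 [0.1, 0.9]])  # pair[a, b]: compatibility of neighbors; favors agreement

def bp_chain_marginals(unary, pair):
    """Sum-product BP on a chain: one forward and one backward sweep of messages."""
    n = len(unary)
    fwd = [np.ones(2) for _ in range(n)]  # message into node i from its left neighbor
    bwd = [np.ones(2) for _ in range(n)]  # message into node i from its right neighbor
    for i in range(1, n):
        m = pair.T @ (unary[i - 1] * fwd[i - 1])  # sum over x_{i-1}
        fwd[i] = m / m.sum()
    for i in range(n - 2, -1, -1):
        m = pair @ (unary[i + 1] * bwd[i + 1])    # sum over x_{i+1}
        bwd[i] = m / m.sum()
    beliefs = [unary[i] * fwd[i] * bwd[i] for i in range(n)]
    return [b / b.sum() for b in beliefs]

def brute_force_marginals(unary, pair):
    """Exact marginals by enumerating all 2^n joint assignments."""
    n = len(unary)
    marg = [np.zeros(2) for _ in range(n)]
    for xs in itertools.product([0, 1], repeat=n):
        p = np.prod([unary[i][xs[i]] for i in range(n)])
        p *= np.prod([pair[xs[i], xs[i + 1]] for i in range(n - 1)])
        for i in range(n):
            marg[i][xs[i]] += p
    return [m / m.sum() for m in marg]

for b, m in zip(bp_chain_marginals(unary, pair), brute_force_marginals(unary, pair)):
    print(np.round(b, 4), np.round(m, 4))
```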

Non-Equilibrium Information Theory (DARPA group)

Of possible interest: a DARPA group is attempting to use non-equilibrium information theory to study mobile ad hoc wireless networks (MANETs). There are lots of information theory publications; I’m not yet sure they’ve really pinned down what constitutes “non-equilibrium,” but the work is worth investigating.

Non-Equilibrium Theory: Basic References

Core References for Non-Equilibrium Theory, and Initial Discussion on the Financial Meltdown of 2008-2009, Lehman Brothers, and Examples of Webposts Increasing over Time

Gathering up two of the most classic sources: Prigogine’s Thermodynamics of Irreversible Processes, and Kubicek and Marek’s Computational Methods in Bifurcation Theory and Dissipative Structures. So here’s an interesting little do-at-home experiment: study the meltdown of Lehman Brothers, which started the whole stock market sell-off in September/October of 2008. Using Google (which is a horrible…

Quick Note: Helmholtz vs. Gibbs Free Energy

Using this blog as an online set of research notes (on topics I don’t mind sharing), suppose that we try using an equilibrium-based approach of some sort for modeling what we all know is a very non-equilibrium world. Which formulation, Helmholtz or Gibbs, works best for us? Helmholtz free energy is the natural quantity at constant temperature and volume. It is denoted A, with the defining equation A = U - TS, where U is internal energy, T is temperature, and…
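For reference, here are the standard textbook relations side by side, with U the internal energy, H = U + PV the enthalpy, P pressure, V volume, T temperature, and S entropy. The differentials hold for a closed system doing only pressure-volume work.

```latex
\begin{aligned}
A &= U - TS,             &\qquad dA &= -S\,dT - P\,dV \\
H &= U + PV,             &\qquad dH &= T\,dS + V\,dP \\
G &= H - TS = A + PV,    &\qquad dG &= -S\,dT + V\,dP
\end{aligned}
```

The natural variables are what bear on the question above: at equilibrium, A is minimized when T and V are held fixed, while G is minimized when T and P are held fixed.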

Equilibrium and Utility: Two Different Realms

Continuing with Beinhocker’s Origin of Wealth, it is important to separate out some of the ideas that Beinhocker is expounding. While overall he does a good job of bringing in many related thoughts and ideas, there is a slight tendency towards “mushing.” On that note, I’d like to suggest that we distinguish carefully between ideas involving utility (Origin, hardcover, pp. 34 and 37) and ideas involving equilibrium. On pg. 34, Beinhocker begins a discussion of how utility is an underlying…

"Origins of Wealth" – A (Multi-Part) Critical Review

Over the last few months, questions not only of wealth and finances, but of the underpinnings of our entire financial structure, have become paramount in many of our minds. We (which usually means you and me, and right now means the world collectively) have largely misunderstood the world’s financial structure in recent years. (Those who HAVE accurately understood it are not only more secure, but substantially richer by now.) Most of us are current on “what went wrong.” Most…
