Patent – Knowledge Discovery
Multiple Representation Levels Key to a Knowledge Discovery Architecture
Patents introducing the Knowledge Discovery Architecture with Seven Representation Levels
These two patents introduce the seven-layer knowledge discovery architecture, which is key to figuring out what processing steps are needed to get the results you want.
The Knowledge Discovery Architecture has seven distinct layers; five are representation levels and two are feedback control loops (one larger and outer, the other smaller, inner, and faster).
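To make the layer count concrete, here is a minimal Python sketch that lists the seven layers as plain data. It is illustrative only. The `Layer` and `LayerKind` types and the short layer labels are hypothetical. The representation-level names are paraphrased from the Level 1 to Level 5 descriptions in this article. This page does not say which feedback loop is Level 6 and which is Level 7, so the two loops are described rather than numbered.

```python
from dataclasses import dataclass
from enum import Enum


class LayerKind(Enum):
    REPRESENTATION = "representation level"
    FEEDBACK = "feedback control loop"


@dataclass(frozen=True)
class Layer:
    name: str
    kind: LayerKind


# Hypothetical labels, paraphrased from this article's level descriptions.
ARCHITECTURE = [
    Layer("Level 1: entity and concept extraction", LayerKind.REPRESENTATION),
    Layer("Level 2: proximity-based association", LayerKind.REPRESENTATION),
    Layer("Level 3: syntactic relationships (e.g., sentiment)", LayerKind.REPRESENTATION),
    Layer("Level 4: supporting context", LayerKind.REPRESENTATION),
    Layer("Level 5: ontology/taxonomy backbone", LayerKind.REPRESENTATION),
    # Which loop is Level 6 and which is Level 7 is not stated here.
    Layer("Feedback loop: inner, smaller, faster", LayerKind.FEEDBACK),
    Layer("Feedback loop: outer, larger", LayerKind.FEEDBACK),
]


def count_by_kind(layers):
    """Count how many layers are representation levels vs. feedback loops."""
    counts = {kind: 0 for kind in LayerKind}
    for layer in layers:
        counts[layer.kind] += 1
    return counts


if __name__ == "__main__":
    for kind, n in count_by_kind(ARCHITECTURE).items():
        print(f"{kind.value}: {n}")
```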
- A. J. Maren and S. V. Campbell, Knowledge Discovery Method with Utility Functions and Feedback Loops, US Patent US 7333997 B2; online at Knowledge Discovery System.
- A. J. Maren et al., Knowledge Discovery System, US Patent Application US 20050278362 A1 (2005); online at Knowledge Discovery System.
A Useful Way to Think About Knowledge Discovery and Text Analytics
Text analytics is a tough problem because it requires using several very different kinds of technologies. This means that text analytics, done right, is an interdisciplinary task.
NOTE: The following discussion uses seven levels of knowledge representation. Levels 1–5 make up the knowledge discovery architecture itself, and Levels 6 and 7 are "feedback control" levels. In more recent writing, particularly for the Northwestern University PREDICT 453 Text Analytics course, I collapse these into a much simpler set of three levels, corresponding to the common notions of statistical, syntactic/semiotic, and semantic processing.
We can estimate how complex a text mining algorithm is, and how much processing time it will take, by thinking of algorithms as working at different levels of processing. (To be precise, we should use the artificial intelligence term and call them representation levels.) There are five basic levels:
- Level 1: Find and identify the entities and concepts in a document. Extract them, then match them against a dictionary of concepts and entities to see which are already known and which are new; this is largely a statistical process,
- Level 2: Find out which concepts and entities show up near each other. Simple proximity is often a strong clue that two entities or concepts are related; this is another statistical process, and while it is more complex than Level 1, it does not require very difficult processing,
- Level 3: Find the real relationships between entities and concepts that are close to each other (this is where SENTIMENT ANALYSIS happens). This is a much more difficult task that requires syntactic, not just statistics-based, algorithms; done right, it produces relationships that tell a lot about how people feel about people and things,
- Level 4: Give context to what we’ve extracted – sometimes, there are important clues in supporting information, such as geographic, demographic, and financial data, along with images and videos that are connected to a topic, and finally
- Level 5: Make sense by connecting everything to your world-view. Concepts and entities, which are really just words, make sense when they are related to each other through a backbone of ontologies and taxonomies that express your understanding of what-connects-to-what in the world. This gives meaning and a frame of reference to your text analysis results.
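The two statistical levels can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented method. The `KNOWN` dictionary, the capitalized-word heuristic for spotting new entities, and the five-token `window` are all hypothetical choices made for illustration. Level 1 matches tokens against a dictionary and flags unknown candidates. Level 2 counts pairs of entities that occur within a few tokens of each other.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical dictionary of already-known entities and concepts (lowercased).
KNOWN = {"acme", "widget", "boston"}


def tokenize(text):
    """Split text into word tokens, keeping the original casing."""
    return re.findall(r"[A-Za-z][A-Za-z'-]*", text)


def level1_extract(tokens, known=KNOWN):
    """Level 1 (sketch): separate dictionary hits from possible new entities.

    Assumption: any capitalized token not in the dictionary is treated as a
    new candidate; a real extractor would be far more careful.
    """
    found, new = set(), set()
    for tok in tokens:
        key = tok.lower()
        if key in known:
            found.add(key)
        elif tok[0].isupper():
            new.add(key)
    return found, new


def level2_cooccurrence(tokens, entities, window=5):
    """Level 2 (sketch): count entity pairs appearing within `window` tokens."""
    positions = [(i, t.lower()) for i, t in enumerate(tokens) if t.lower() in entities]
    pairs = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and abs(j - i) <= window:
            pairs[tuple(sorted((a, b)))] += 1
    return pairs


if __name__ == "__main__":
    tokens = tokenize("Acme shipped a new widget to Boston while Globex watched.")
    found, new = level1_extract(tokens)
    print("known:", sorted(found), "new:", sorted(new))
    print(level2_cooccurrence(tokens, found | new))
```

In the sample sentence, "Acme" and "Boston" are six tokens apart, so they are not counted as a pair, while "Boston" and "Globex" are. Proximity only suggests a relationship; working out what that relationship actually is belongs to Level 3 and above.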
From this description of the processing levels, it is clear that sentiment analysis, a Level 3 process, is much more complex than a simple Level 1 (entity extraction) or Level 2 (entity association) process.
Since each processing level is roughly an order of magnitude (ten times) more complex than the one before it, sentiment analysis is about 100 times (10 × 10) more complex than simply getting the names out of a document.
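The rule of thumb is simple arithmetic: going from Level 1 to Level 3 is two steps of roughly 10× each, or 10² = 100. The sketch below just encodes that estimate. The `relative_cost` function and its `factor` parameter are hypothetical, and the factor of ten is the article's rough figure, not a measured cost.

```python
def relative_cost(level, base_level=1, factor=10):
    """Rough complexity of `level` relative to `base_level`, assuming each
    level is about `factor` times as complex as the one before it."""
    if not 1 <= base_level <= level <= 5:
        raise ValueError("levels must satisfy 1 <= base_level <= level <= 5")
    return factor ** (level - base_level)


if __name__ == "__main__":
    # Sentiment analysis (Level 3) versus entity extraction (Level 1).
    print(relative_cost(3))
```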
In fact, sentiment analysis is even more complex than that, because without Level 5 (making sense of the world) it just doesn't work. So sentiment analysis, done right, is one of the most complex tasks in text analytics.