OnBlog: The Onboard Informatics Blog

Conversations on the Art and Science of Information

Posts Tagged ‘data’

DIKW: Data, Information, Knowledge, Wisdom

with 2 comments

Here’s the thing…data is useless.

Now, given what we do—or are at least perceived by the world at large to do—I should probably qualify that, huh? Honestly, though, I think the statement can stand on its own. While data seems like it’s useful, it’s trash, and this fact causes me no end of angst. We’re constantly referred to as “data providers”, even by members of our own team and marketing collateral, but in actuality we do not provide data.

We don’t provide data because it’s useless, meaningless, without value. Data is a collection of unrelated facts, the “product of observation”, with no meaning beyond its own existence. At their most basic levels, our products and services provide information, and go up from there. We’re information providers, and—much more importantly—knowledge providers. If you’re a data-geek, those distinctions and the DIKW “knowledge hierarchy” concepts probably aren’t new and the next bits are going to seem a bit “Applied Information Science 101” to you—go on and bail, my feelings won’t be hurt. If you’re not a data-geek, but interested enough in information science for business or other reasons to be reading this blog, it’s probably a good distinction to start making. Note: if you are a data-geek and you haven’t read Ackoff’s “From Data to Wisdom”—go read that instead of this.


  • I have three Things.
  • One of the Things is reddish-brown, two Things are grey.
  • One Thing weighs about two tons, one Thing about 10lbs., and one about 20g.
  • One Thing has a trunk, one Thing has a tail, and one Thing has a publicist. Yep, a publicist.

No fair drawing correlations and/or conclusions yet! “Data” is exactly what you see there in that pile. We now have some facts (and even that’s an assumption at this point) about three “Things”. That’s it, no more, no less. That’s data. See what I mean? Useless. So if data is useless, what the hell are we doing? Well, if you take data and apply some processes to clean it, standardize it, and create some relationships between its constituent bits and pieces, you get information.



#   Color           Weight    Feature
1   reddish-brown   10lbs.    has a publicist
2   grey            20g       has a tail
3   grey            2 tons    has a trunk
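For the code-minded, here is a minimal Python sketch of that clean/standardize/relate step. Everything in it is an assumption made for illustration, not a description of Onboard's actual process: the `thing_id` key that lets us relate facts to each other, the field names, the choice of kilograms, and reading "ton" as a US short ton.

```python
from collections import defaultdict

# Conversion factors to kilograms. Reading "ton" as a US short ton is an assumption.
KG_PER_UNIT = {"g": 0.001, "lbs": 0.45359237, "tons": 907.18474}


def standardize_weight(raw):
    """Convert a string such as '10 lbs' into kilograms."""
    amount, unit = raw.split()
    return float(amount) * KG_PER_UNIT[unit]


def build_records(observations):
    """Relate loose (thing_id, field, value) facts into one record per Thing."""
    records = defaultdict(dict)
    for thing_id, field, value in observations:
        value = value.strip().lower()          # clean
        if field == "weight":
            value = standardize_weight(value)  # standardize
        records[thing_id][field] = value       # relate
    return dict(records)


# Hypothetical observations, deliberately messy and out of order.
observations = [
    (3, "feature", "Trunk"),
    (1, "color", "reddish-brown"),
    (2, "weight", "20 g"),
    (1, "feature", " publicist"),
    (3, "weight", "2 tons"),
    (2, "color", "Grey"),
    (1, "weight", "10 lbs"),
    (3, "color", "grey "),
    (2, "feature", "tail"),
]

records = build_records(observations)
for thing_id in sorted(records):
    print(thing_id, records[thing_id])
```

Note that without something like `thing_id` there is nothing to join on, which is rather the point: the relationships are manufactured by the process, not found in the data.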

I’d argue that this stuff—information—is only mildly less useless than data, but it’s a start. It’s organized and has at least the potential(!) for allowing us to manufacture knowledge from it. It’s important only because if you get this part wrong, then any derivative knowledge is also suspect. Truthfully though, unless you know what you’re doing this stuff is almost more dangerous than raw data (more on that in a minute). The only reason we provide it in this form is because some of our customers have the desire and acumen to manufacture their own knowledge, and just want to make certain that they have the very best raw materials for doing so, and advice on the best way to go about the process.

But that raises the question: what is knowledge? Basically, you take your set of information and apply a cognitive process to it, one which actually draws correlations and conclusions, hypothesizes causal relationships, etc. This is done using a variety of mechanisms, which all boil down to human analysis. Algorithms, models, simulations: at the end of the day it’s just what some human being or a group thereof decided would be a useful way to process information into knowledge, signal from noise.



#   Color           Weight    Feature            Identity
1   reddish-brown   10lbs.    has a publicist    tabby cat
2   grey            20g       has a tail         mouse
3   grey            2 tons    has a trunk        elephant

Well, that’s much more useful! It tells us what each of the entity-instances (records) is, and some of their attributes (fields). Feeling warm and fuzzy, now? Here’s the punchline: this last table, the one describing the knowledge we rendered from the information, which was in turn cobbled together from the data…as described herein, it has the potential to be both incorrect and incomplete. Remember the old adage about “it’s not what you don’t know that messes you up, it’s what you know that isn’t so?” I’m paraphrasing, of course.

So how could our example be wrong? In the knowledge set we drew the not unnatural conclusion that #3 was an elephant. What if it’s a Chrysler 300? That fits the available information (grey/2 tons/trunk). It could be something else altogether, though. How might our example be incomplete? In #1 we correctly assessed the “Thing” to be a tabby cat, but failed to differentiate it as Morris the Cat (ergo, the publicist). That is a fairly important piece of knowledge, and a conclusion that might realistically have been drawn by a sophisticated enough model. Now take it up a step, to the information. What if the aggregation process failed and the #2 record has the trunk, #3 the tail? Well, the probability that #3 is, in fact, an elephant just increased. But maybe #2 is actually Stuart Little, or Fievel. I mean, how many other mice do you know with trunks?
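One way to make that ambiguity visible is to have the "knowledge" step return every candidate that fits the information, rather than the first one that comes to mind. The sketch below is purely illustrative: the `PROFILES` table, its attribute values, and the matching rule are all assumptions invented for this example.

```python
# Hypothetical reference profiles; every value here is an illustrative assumption.
PROFILES = {
    "tabby cat": {"color": "reddish-brown", "weight": "10lbs.", "features": {"tail"}},
    "Morris the Cat": {
        "color": "reddish-brown",
        "weight": "10lbs.",
        "features": {"tail", "publicist"},
    },
    "mouse": {"color": "grey", "weight": "20g", "features": {"tail"}},
    "elephant": {"color": "grey", "weight": "2 tons", "features": {"trunk", "tail"}},
    "Chrysler 300": {"color": "grey", "weight": "2 tons", "features": {"trunk"}},
}


def candidates(record):
    """Return every profile consistent with the record, not just the first match."""
    return [
        name
        for name, profile in PROFILES.items()
        if profile["color"] == record["color"]
        and profile["weight"] == record["weight"]
        and record["feature"] in profile["features"]
    ]


# Record 3 as aggregated: two candidates fit equally well.
as_aggregated = candidates({"color": "grey", "weight": "2 tons", "feature": "trunk"})

# Record 3 if the aggregation step had swapped the trunk and the tail.
after_swap = candidates({"color": "grey", "weight": "2 tons", "feature": "tail"})

# Record 2 after the same swap: a 20g grey Thing with a trunk matches nothing.
orphan = candidates({"color": "grey", "weight": "20g", "feature": "trunk"})

print(as_aggregated, after_swap, orphan)
```

Under these made-up profiles, a list with more than one entry flags knowledge that may be wrong, and an empty list flags information that may have been broken upstream.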

Which brings us to wisdom. Wisdom is basically a local phenomenon—strangely topical given that the focus of recent conversations in the RE.net seems to be revolving heavily around localism as the most significant agent/broker value proposition. I’ve heard it phrased as “local knowledge”. Not to belabor the semantics, but I feel the phrase “local wisdom” is more applicable.

I mean, we have knowledge. From evaluating the information Onboard organizes from the data that we aggregate, I “know” that the schools are “great” in an area, and maybe I can therefore help home-buying parents find a starter home. The local agent, though, can tell them that the HOA for the home they’re looking at just voted in a real PITA who hates kids and doesn’t let them ride their bikes without sign off in triplicate, and that speeding seems to be a problem. You probably won’t find that in our databases. Yet :-).

For the purists, I know I skipped Ackoff’s “Understanding” layer, formally defined as the “appreciation of why”, as opposed to “who, what, where, when, and how”, and nested between knowledge and wisdom. This is by design. First, the common wisdom (loaded word in this context) in information science circles seems to be to steer away from some of the more…metaphysical aspects connoted by his treatment of the subject. Second, if you look at my treatment of the “Knowledge” layer you’ll see that I tend to combine in the one layer both the deterministic processes defined by Ackoff’s version and the probabilistic/interpolative processes he espouses for his “Understanding” layer. I don’t really see the benefit in a separation, and actually feel that the cognitive processes involved are complementary enough to warrant combination as a matter of course. And if he doesn’t like it, he can just come find me, huh? Battle Royale!

What’s the point of all this? The point is that data is useless, information is only as good as the systems and assumptions used to process it, and the quality of a knowledge set is a function of both its constituent information and the cognitive processes used to manufacture that knowledge. Ultimately, the way you determine whether a “data provider” is worth a damn is by looking at the people who make up the team which aggregates and organizes that data into information, and whose grey matter and diligence are responsible for transforming that refined information into useful knowledge.

Final note: this shouldn’t be construed as the only legitimate treatment of knowledge management, or as a comprehensive description of our thought processes at Onboard. In many ways this methodology is limited, and doesn’t (at least not intuitively) take into account the dynamism inherent in knowledge of any significantly useful complexity. My intention was to use this as an introduction to the amount and nature of thought that goes into creating knowledge, and to identify the sharp difference between a product and “data”. Data may be fungible, but knowledge…is…not. And, knowledge-wise? I’ll put our team up against anyone else’s.

As for wisdom? It’s probably overrated, and almost certain to remain a uniquely ineffable human endeavor. We’re working on it though. This’ll have to do for now:

Information is not knowledge
Knowledge is not wisdom
Wisdom is not truth
Truth is not beauty
Beauty is not love
Love is not music
Music is THE BEST…
Wisdom is the domain of the Wis

– lyrics from Frank Zappa’s rock opera, “Joe’s Garage”, Act III, Scene XVI

Written by liamdayan

August 20, 2008 at 8:37 pm

Why More Data Makes People Happier


When potential buyers consider what they want in a community, what comes to mind? Are they young hipsters looking for an apartment closest to the most live music venues? Or are they looking for a chiropractor in the vicinity? The priorities consumers have when buying are as varied as the consumers themselves, and it takes a warehouse full of information to satisfy them all.

Local Amenities, one of the most important products and services Onboard Informatics provides for its customers, is the transformation of raw data into meaningful information about the services available in a particular community. Each month sees new additions to the data records we maintain, meaning that the overall picture of a community that can be created is even more inclusive.

Last year at this time, Onboard was supplying 2,221,609 Amenities records to its clients. Currently that number stands at 4,084,253 records, almost double that. New categories have been added that give a more comprehensive overview of community offerings. If you can think of something you’d want to have in the place you live, chances are that the data is there to tell you whether or not it’s available.
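For anyone who wants to check the "almost double" claim, here is a quick Python sketch using only the two record counts quoted above. The variable names are just illustrative.

```python
# Record counts quoted in the post.
records_last_year = 2_221_609
records_now = 4_084_253

added = records_now - records_last_year
ratio = records_now / records_last_year
percent_increase = 100 * added / records_last_year

print(f"{added:,} records added, {ratio:.2f}x last year, up {percent_increase:.1f}%")
```

The ratio works out to roughly 1.84 times last year's count, so "almost double" is a fair description.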

In 2007, if you were a health nut looking for eating and drinking establishments that specifically sold health foods, you were out of luck. But organic food fans, never fear — new Amenities records can help you find the nearest Whole Foods. Working parents who need childcare services? The most recent Amenities data lets you search for nearby Gymborees.

As for education, new data about student housing and vocational schooling has been added to existing records on Catholic, public, and private schools (as well as higher education). In addition to the information Onboard offers about education under Amenities, clients also have access to School Profiles and Reviews. The increase in data records can be seen here, too, since we’ve added about 5,600 reviews over the most recent month.

The more data you have, the more informed you are. As data records in areas like Local Amenities and School Reviews continue to increase, the results can only be more knowledgeable and more satisfied clients. Plus, now they’ll be able to search for the nearest spas in a community — does it get any better?

Written by Tara Powers

August 5, 2008 at 6:19 pm

Bulldog Pride! …Or Why I Will Never Look at School Mascots the Same Way Again

with 2 comments

bow wow wow

I’ll bet you thought your school mascot was pretty awesome. No other school’s paltry representative could hold a candle to your Mustangs or Lions or Wildcats.

Well, guess what? There are hundreds and thousands and millions (okay, maybe I’m exaggerating) of other Mustangs, Lions, and Wildcats out there. How do I know this? Because I’ve tracked down all of their Web sites.

Well, maybe not all of them (although at times it felt like that). For the past several weeks, part of my work here at Onboard Informatics has included updating our (rather extensive) list of invalid school Web site URLs.

For instance, a link to Willow Grove Elementary School may not lead where it’s supposed to because:

a) there might be a misspelled or incorrect address in our database,

b) the link was correct at one time, but now isn’t because the school district has updated or moved its Web sites, or

c) it actually does lead where it’s supposed to, but takes longer to load and so is coming up invalid.
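To make those three cases concrete, here is a rough Python sketch of how a link checker might tell them apart. This is an assumption-laden illustration, not Onboard's actual tooling: the function names, the category labels, and the ten-second timeout are all made up, and in practice a dead hostname could just as easily mean the district moved (case b) as a typo (case a).

```python
import socket
import urllib.error
import urllib.request


def fetch_status(url, timeout):
    """Return the HTTP status code for url; raises if the request fails."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def classify_url(url, fetch=fetch_status, timeout=10.0):
    """Sort a URL into 'ok', 'bad_address', 'moved_or_missing', or 'slow'."""
    try:
        status = fetch(url, timeout)
    except urllib.error.HTTPError:
        # The server answered, but the page is gone (for example a 404): case (b).
        return "moved_or_missing"
    except urllib.error.URLError as err:
        # A timeout while connecting arrives wrapped inside URLError.
        if isinstance(err.reason, (socket.timeout, TimeoutError)):
            return "slow"
        # Otherwise the host could not be reached at all: most likely case (a).
        return "bad_address"
    except (socket.timeout, TimeoutError):
        # The server connected but took too long to respond: case (c).
        return "slow"
    except ValueError:
        # Malformed address, such as a missing "http://": case (a).
        return "bad_address"
    return "ok" if 200 <= status < 400 else "moved_or_missing"
```

The `fetch` parameter exists so the classifier can be exercised without touching the network. A "slow" result is worth retrying with a longer timeout before anyone spends time hunting for a replacement link.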

What to do? Well, since Onboard has the school name, address, and district information available as well, we go out to the Internet to search for that school’s current, working Web site. Oftentimes, searching by district is the easiest way to go about tracking down these schools. Since the data is grouped by district in our file, a group of schools that all come from the same district can be taken care of by finding just one district home page.

But of course, nothing is as easy as it sounds.

For one thing, did you ever stop to think about exactly how many Springfield School Districts there are? (Answer: A lot more than you’d think. One each in Massachusetts, Oregon, New Jersey, Missouri, and Illinois, and that’s just the first page of Google results.)

What about those pesky Colorado school districts that follow every normal name with an alpha-numeric code? And don’t even get me started on Missouri and its Roman numerals (Harrisburg R-VIII? Really?).

Then there are the schools that, try as you might, you just can’t find. Maybe they’ve closed, or the district’s Web site really isn’t functioning, or they’re in a rural area whose schools may not have set up Web sites yet. In cases like those, we delete the invalid URL that had previously been misdirecting users, but we leave the field blank — from a data perspective, it’s better not to have a Web site listed for a particular school than to supply an incorrect one.
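Put together, the district-first approach and the "blank beats wrong" rule might look something like the Python sketch below. The record layout, the field names, and the `.example` addresses are all hypothetical. Keying districts by state as well as name is my own assumption, prompted by all those Springfields.

```python
def repair_urls(schools, district_home_pages):
    """Return copies of the school records with invalid URLs repaired or blanked."""
    repaired = []
    for school in schools:
        fixed = dict(school)
        if not fixed["url_valid"]:
            key = (fixed["state"], fixed["district"])
            # .get() returns None when no home page was found, which blanks the field.
            fixed["url"] = district_home_pages.get(key)
            fixed["url_valid"] = fixed["url"] is not None
        repaired.append(fixed)
    return repaired


# Hypothetical records: three broken links and one that already works.
schools = [
    {"name": "School A", "state": "MO", "district": "Springfield",
     "url": "http://old.example/a", "url_valid": False},
    {"name": "School B", "state": "MO", "district": "Springfield",
     "url": "http://old.example/b", "url_valid": False},
    {"name": "School C", "state": "OR", "district": "Springfield",
     "url": "http://gone.example/c", "url_valid": False},
    {"name": "School D", "state": "OR", "district": "Springfield",
     "url": "http://fine.example/d", "url_valid": True},
]

# One district home page found by hand; nothing turned up for the other district.
district_home_pages = {("MO", "Springfield"): "http://springfield-mo.example"}

repaired = repair_urls(schools, district_home_pages)
for school in repaired:
    print(school["name"], school["url"])
```

One lookup fixes both Missouri schools, the Oregon school with no findable site ends up blank rather than wrong, and the working link is left alone.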


When the sons of Eli break through the line

All of these school URLs are helpful to have on hand when providing information about the offerings of districts in a particular neighborhood. At Onboard, we have valid, school-specific URLs populated for almost 40,000 of our schools — roughly 33 percent of our total listings. We also have a school or district URL populated for close to 80 percent, or 100,000 schools.

Out of about 36,000 distinct school URLs that were validated, 3,500 had changed over the last year. Of the remainder, we were able to provide valid URLs for about 15,000 schools for which we previously had no information.

After going through all 36,000+ of those school URLs, we ran the modified data through a check to pull out any links that were still broken: only about 1,000 (a much more manageable number, relatively speaking). Averaged with the approximately 6,000 invalid URLs we finished with last quarter, that comes to around 3,500 broken links Onboard deals with per quarter.
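The per-quarter figure is just a two-point mean, which a couple of lines of Python will confirm. The dictionary labels are illustrative, and the counts are the rounded ones quoted above.

```python
from statistics import mean

# Approximate counts of still-broken links at the end of each quarter.
broken_by_quarter = {"last quarter": 6000, "this quarter": 1000}

average_broken = mean(broken_by_quarter.values())
print(f"Average broken links per quarter: {average_broken:,}")
```

With only two quarters in the sample, treat the average as a rough workload estimate rather than a trend.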

Making sure that the data out there is as clean and accurate as possible is a vital part of what we do at Onboard, and keeping data that changes as constantly as school data up to date is an ongoing task.

So maybe your school’s mascot isn’t the one-of-a-kind Golden Eagle (or other unstoppable creature) you thought it was, but you can still take pride in the fact that your school has a fully functioning website. Just do all of us data collectors a favor — don’t set all 25 links on your home page to blink simultaneously. Trust me, that’s never a good design…


Quick and Random (and not statistically accurate) Fun Facts:

Most popular mascots — Bulldogs, Eagles, Tigers, Lions
Most unique mascots — Winged Beavers, Atom Smashers, Awesome Blossoms, Fighting Quakers, Cheese Makers
Most “interesting” school names — Slaughter Elementary School, Stalker Elementary School

Written by Tara Powers

July 30, 2008 at 10:41 pm

Data in Real Estate (Part 2): Creating Quality


With a foundational knowledge of data and its application to the geographic sphere of real estate established, the ability to determine what sort of data will be most valuable for a company’s business ventures becomes even more important. In the most basic sense, data quality refers to the degree of excellence with which the data portrays the geographic “phenomena” being examined, all of which contributes to the data’s fitness for use.

How can we say what makes “good” data? When you talk about a good book or a good movie, isn’t your judgment dependent on certain subjective qualities — interests, mood — that are individual to you? To a degree, yes, but there are also certain aspects that must be present without fail in order for a book or movie to be considered “quality.” A book must be free of unintentional spelling and grammatical errors, for instance, and a movie needs to have clearly identified characters and some form of plot.

The overall quality of data can be thought of in the same way. While the specifics of what makes good data will vary according to the type of data you’re seeking (real estate data as opposed to sports statistical data, for instance), there are non-negotiable elements that apply to data as a whole.

Written by Tara Powers

July 25, 2008 at 5:58 pm

Posted in Informatics


Data in Real Estate (Part 1): Creating Accessibility


The real estate industry has been affected by the nearly infinite amount of information available through the Internet in the same way that all industries have been. Consequently it is now more important than ever that the information clients receive is accurate and reliable. Understanding the nature of this data and the way in which it is interpreted and coordinated by real estate Web sites can contribute to an awareness of the complexity surrounding such data management, as well as the way in which that complexity is being simplified for the clients served.

But what is data, exactly? Raw data, data normalization, aggregate data — the word is thrown around with such frequency that it may prove difficult to come up with a concrete and coherent definition of such a broad term, even when applied to the real estate field.

Before a company can provide the data its clients want (quality data), it is vital that it has at least a basic understanding of what data, and the terminology associated with it, mean in real estate.

Written by Tara Powers

July 25, 2008 at 4:44 pm

Posted in Informatics
