OnBlog: The Onboard Informatics Blog

Conversations on the Art and Science of Information

Archive for the ‘Informatics’ Category

DIKW: Data, Information, Knowledge, Wisdom

with 2 comments

Here’s the thing…data is useless.

Now, given what we do—or are at least perceived by the world at large to do—I should probably qualify that, huh? Honestly, though, I think the statement can stand on its own. While data seems like it’s useful, it’s trash, and this fact causes me no end of angst. We’re constantly referred to as “data providers”, even by members of our own team and marketing collateral, but in actuality we do not provide data.

We don’t provide data because it’s useless, meaningless, without value. Data is a collection of unrelated facts, the “product of observation”, with no meaning beyond its own existence. At their most basic levels, our products and services provide information, and go up from there. We’re information providers, and—much more importantly—knowledge providers. If you’re a data-geek, those distinctions and the DIKW “knowledge hierarchy” concepts probably aren’t new and the next bits are going to seem a bit “Applied Information Science 101” to you—go on and bail, my feelings won’t be hurt. If you’re not a data-geek, but interested enough in information science for business or other reasons to be reading this blog, it’s probably a good distinction to start making. Note: if you are a data-geek and you haven’t read Ackoff’s “From Data to Wisdom”—go read that instead of this.

Data:

  • I have three Things.
  • One of the Things is reddish-brown, two Things are grey.
  • One Thing weighs about two tons, one Thing about 10lbs., and one about 20g.
  • One Thing has a trunk, one Thing has a tail, and one Thing has a publicist. Yep, a publicist.

No fair drawing correlations and/or conclusions yet! “Data” is exactly what you see there in that pile. We now have some facts (and even that’s an assumption at this point) about three “Things”. That’s it, no more, no less. That’s data. See what I mean? Useless. So if data is useless, what the hell are we doing? Well, if you take data and apply some processes to clean it, standardize it, and create some relationships between its constituent bits and pieces, you get information.

Information:

 

ID | COLOR         | WEIGHT | OTHER
1  | reddish-brown | 10lbs. | has a publicist
2  | grey          | 20g    | has a tail
3  | grey          | 2 tons | has a trunk
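
For the code-minded, here is a minimal sketch of that "clean it, standardize it, relate it" step in Python. The names (`Thing`, `to_grams`) and the rough conversion factors are my own assumptions for illustration, not anything from a real pipeline:

```python
from dataclasses import dataclass

# Approximate conversion factors, assumed for illustration only.
GRAMS_PER_UNIT = {"g": 1.0, "lbs": 453.592, "tons": 907_185.0}  # US short ton


def to_grams(amount: float, unit: str) -> float:
    """Standardize a weight to grams so records can be compared."""
    return amount * GRAMS_PER_UNIT[unit]


@dataclass
class Thing:
    id: int
    color: str
    weight_g: float
    other: str


# The loose observations, now related to one another as rows.
things = [
    Thing(1, "reddish-brown", to_grams(10, "lbs"), "has a publicist"),
    Thing(2, "grey", to_grams(20, "g"), "has a tail"),
    Thing(3, "grey", to_grams(2, "tons"), "has a trunk"),
]

# With one unit of measure, a question like "which is heaviest?" becomes answerable.
heaviest = max(things, key=lambda t: t.weight_g)
print(heaviest.id)
```

Nothing here says what any Thing actually is. The sketch only makes the facts comparable, which is all information claims to do.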

I’d argue that this stuff—information—is only mildly less useless than data, but it’s a start. It’s organized and has at least the potential(!) for allowing us to manufacture knowledge from it. It’s important only because if you get this part wrong, then any derivative knowledge is also suspect. Truthfully though, unless you know what you’re doing this stuff is almost more dangerous than raw data (more on that in a minute). The only reason we provide it in this form is because some of our customers have the desire and acumen to manufacture their own knowledge, and just want to make certain that they have the very best raw materials for doing so, and advice on the best way to go about the process.

But that raises the question: what is knowledge? Basically, you take your set of information and apply a cognitive process to it, one which actually draws correlations and conclusions, hypothesizes causal relationships, etc. This is done using a variety of mechanisms, which all boil down to human analysis. Algorithms, models, simulations: at the end of the day, each is just what some human being (or a group thereof) decided would be a useful way to process information into knowledge, to separate signal from noise.

Knowledge:

 

ID | COLOR         | WEIGHT | OTHER           | RECORD_TYPE
1  | reddish-brown | 10lbs. | has a publicist | tabby cat
2  | grey          | 20g    | has a tail      | mouse
3  | grey          | 2 tons | has a trunk     | elephant

Well, that’s much more useful! It tells us what each of the entity-instances (records) is, and some of their attributes (fields). Feeling warm and fuzzy, now? Here’s the punchline: this last table, the one describing the knowledge we rendered from the information, which was in turn cobbled together from the data…as described herein, it has the potential to be both incorrect and incomplete. Remember the old adage about “it’s not what you don’t know that messes you up, it’s what you know that isn’t so?” I’m paraphrasing, of course.

So how could our example be wrong? In the knowledge set we drew the, not unnatural, conclusion that #3 was an elephant. What if it’s a Chrysler 300? That fits the available information (grey/2 tons/trunk). It could be something else altogether, though. How might our example be incomplete? In #1 we correctly assessed the “Thing” to be a tabby cat, but failed to differentiate it as Morris the Cat (ergo, the publicist)—a fairly important piece of knowledge, and a conclusion that might have realistically been drawn by a sophisticated enough model. Now take it up a step, to the information. What if the aggregation process failed and the #2 record has the trunk, #3 the tail? Well the probability that #3 is, in fact, an elephant just increased. But maybe #2 is actually Stuart Little, or Fievel. I mean, how many other mice do you know with trunks?
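
To make the "it could be wrong" point concrete, here is a rough Python sketch of a rule-based classifier that returns every record type consistent with the available information instead of just one. The `PROFILES` table and its weight ranges are hypothetical values I made up for the example:

```python
# Hypothetical reference profiles; the values are assumptions for illustration.
PROFILES = {
    "elephant": {"color": "grey", "feature": "trunk", "min_kg": 1500, "max_kg": 7000},
    "Chrysler 300": {"color": "grey", "feature": "trunk", "min_kg": 1500, "max_kg": 2200},
    "mouse": {"color": "grey", "feature": "tail", "min_kg": 0.01, "max_kg": 0.05},
    "tabby cat": {"color": "reddish-brown", "feature": "publicist", "min_kg": 3, "max_kg": 7},
}


def candidates(color: str, weight_kg: float, feature: str) -> list[str]:
    """Return every record type consistent with the observed attributes."""
    return [
        name
        for name, p in PROFILES.items()
        if p["color"] == color
        and p["feature"] == feature
        and p["min_kg"] <= weight_kg <= p["max_kg"]
    ]


# Record #3: grey, about two tons (roughly 1,800 kg), has a trunk.
print(candidates("grey", 1800, "trunk"))

# Record #2 after a botched aggregation: a 20 g grey Thing with a trunk.
print(candidates("grey", 0.02, "trunk"))
```

Under these assumed profiles, record #3 matches both the elephant and the Chrysler, and the 20 g Thing with a trunk matches nothing at all. A model that always returns exactly one answer hides both situations.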

Which brings us to wisdom. Wisdom is basically a local phenomenon—strangely topical given that the focus of recent conversations in the RE.net seems to be revolving heavily around localism as the most significant agent/broker value proposition. I’ve heard it phrased as “local knowledge”. Not to belabor the semantics, but I feel the phrase “local wisdom” is more applicable.

I mean, we have knowledge. From evaluating the information Onboard organizes from the data that we aggregate, I “know” that the schools are “great” in an area, and maybe I can therefore help home-buying parents find a starter home. The local agent, though, can tell them that the HOA for the home they’re looking at just voted in a real PITA who hates kids and doesn’t let them ride their bikes without sign off in triplicate, and that speeding seems to be a problem. You probably won’t find that in our databases. Yet :-).

For the purists, I know I skipped Ackoff’s “Understanding” layer, formally defined as the “appreciation of why” (as opposed to “who, what, where, when, and how”) and nested between knowledge and wisdom. This is by design. First, the common wisdom (loaded word in this context) in information science circles seems to be to steer away from some of the more…metaphysical aspects connoted by his treatment of the subject. Second, if you look at my treatment of the “Knowledge” layer you’ll see that I tend to combine in the one layer both the deterministic processes defined by Ackoff’s version and the probabilistic/interpolative processes he espouses for his “Understanding” layer. I don’t really see the benefit in a separation, and actually feel that the cognitive processes involved are complementary enough to warrant combination as a matter of course. And if he doesn’t like it, he can just come find me, huh? Battle Royale!

What’s the point of all this? The point is that data is useless, information is only as good as the systems and assumptions used to process it, and the quality of a knowledge set is a factor of both its constituent information and the cognitive processes used to manufacture that knowledge. Ultimately, the way you determine whether a “data provider” is worth a damn is by looking at the people who make up the team which aggregates and organizes that data into information, and whose grey matter and diligence is responsible for transforming that refined information into useful knowledge.

Final note: this shouldn’t be construed as the only legitimate treatment of knowledge management, or as a comprehensive description of our thought processes at Onboard. In many ways this methodology is limited, and doesn’t—at least not intuitively—take into account the dynamism inherent in knowledge of any significantly useful complexity. My intention was to use this as an introduction into the amount and nature of thought that goes into creating knowledge, and identify the sharp difference between a product and “data”. Data may be fungible, but knowledge…is…not. And, knowledge-wise? I’ll put our team up against anyone else’s.

As for wisdom? It’s probably overrated, and almost certainly to remain a uniquely ineffable human endeavor. We’re working on it though. This’ll have to do for now:

Information is not knowledge
Knowledge is not wisdom
Wisdom is not truth
Truth is not beauty
Beauty is not love
Love is not music
Music is THE BEST…
Wisdom is the domain of the Wis

– lyrics from Frank Zappa’s rock opera, “Joe’s Garage”, Act III, Scene XVI


Written by liamdayan

August 20, 2008 at 8:37 pm

Picture This


Imagine being able to walk the streets of a neighborhood from virtually anywhere.  You get to see what it looks like.  You see who lives, works, and plays there.  Yet you haven’t even taken a step in its direction.  You are viewing all of this from 1,000 miles away.

The ultimate experience may still be science fiction, but photo blogging gets us closer than ever before.  Photo blogging simply combines two things: blogging and photos.  Blogging is the ability to quickly post content to the internet, and photos are the content most often posted.

Applied to the real estate market this could be especially useful in conveying the look and feel of an area to prospective home buyers.  A scenario I could envision would be a website, or section of your website, devoted to a geographic area (neighborhood, subdivision, community).  A group of trusted users, agents and home sellers, would be able to quickly and easily add photos, videos, and other content about that area.

Shutterfly launched Share Sites, which makes this possible.

Although it suffers from some of the issues common to free services (lack of customization, external links), it does provide a model for giving mainstream users an easy way to share content.  The New York Times also points out that Shutterfly intends to expand its service offering by enabling users to upload video and embed the content into any blog or social network.

To illustrate my point, take a 20 minute walk around our office at 90 Broad St.

Written by Ira Monko

August 14, 2008 at 3:40 pm

Why More Data Makes People Happier


When potential buyers consider what they want in a community, what comes to mind? Are they young hipsters looking for an apartment closest to the most live music venues? Or are they looking for a chiropractor in the vicinity? The priorities consumers have when buying are as varied as the consumers themselves, and it takes a warehouse full of information to satisfy them all.

Local Amenities, one of the most important products and services Onboard Informatics provides for its customers, is the transformation of raw data into meaningful information about the services available in a particular community. Each month sees new additions to the data records we maintain, meaning that the overall picture of a community that can be created is even more inclusive.

Last year at this time, Onboard was supplying 2,221,609 Amenities records to its clients. Currently that number stands at 4,084,253 records, almost double that.  New categories have been added that give a more comprehensive overview of community offerings. If you can think of something you’d want to have in the place you live, chances are that the data is there to tell you whether or not it’s available.
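
As a quick check on that "almost double," here is the arithmetic in Python, using only the two record counts quoted above:

```python
last_year = 2_221_609
current = 4_084_253

ratio = current / last_year   # how many times larger the record set is now
added = current - last_year   # records added over the year

print(f"{ratio:.2f}x, {added:,} records added")
```

That works out to roughly 1.84 times last year's count, or a little under 1.9 million additional records.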

In 2007, if you were a health nut looking for eating and drinking establishments that specifically sold health foods, you were out of luck. But organic food fans, never fear — new Amenities records can help you find the nearest Whole Foods. Working parents who need childcare services? The most recent Amenities data lets you search for nearby Gymborees.

As for education, new data about student housing and vocational schooling has been added to existing records on Catholic, public, and private schools (as well as higher education). In addition to the information Onboard offers about education under Amenities, clients also have access to School Profiles and Reviews. The increase in data records can be seen here, too, since we’ve added about 5,600 reviews over the most recent month.

The more data you have, the more informed you are. As data records in areas like Local Amenities and School Reviews continue to increase, the results can only be more knowledgeable and more satisfied clients. Plus, now they’ll be able to search for the nearest spas in a community — does it get any better?

Written by Tara Powers

August 5, 2008 at 6:19 pm

The Process Behind the Alienation: A glance into the art and science used for a “Best of” story…


As Onboard Informatics continues to be “the” data provider for publishers, we are proud to be the unsung heroes of the success of this year’s latest and greatest place to reside story.

But how do we go about selecting places to be highlighted by our publisher clients? Well, when we aren’t throwing darts at a big map of the United States while counting our bribe money for letting Gary, Indiana on the list, we actually use very exact methodologies that produce the best results possible…

I want to give you a very high-level overview of how the process goes; I don’t want to be too specific, so as not to reveal our secret sauce. (As much as I wish it were, it is not Thousand Island dressing… :-( )

So our hypothetical magazine will be Murph Digest, and they want to do a story that will highlight the Best Big City to Live.

The very beginning of the process is where we unfortunately start to alienate some really great places to live. Sorry, Gary, Indiana; next year will be your year… But in all seriousness, good screening criteria are vital! They ensure that the places selected truly represent the focus of the story. Because our hypothetical story will center on large cities, we will establish a population threshold that Murph Digest considers large enough to qualify a place as a “big” city. Every city that is left will therefore at least meet the bare minimum in the population field.

Now that we are left only with places that qualify as “big” cities, we can proceed to the next step. While working with the client, we identify what data points they are interested in for their story. We want to know who their readers are and what those readers are interested in, i.e., what market they are targeting. Onboard Informatics’ experience and expertise are invaluable during this part of the process. Besides the obvious data, we have data on some of the most outlandish things and methodologies for aggregating it that never cease to amaze our clients. So for Murph Digest, low crime rates and the number of divorced women are the only two fields that they are interested in.

After gathering the crime rate data and the number of divorced women for all the places that qualify as a “big” city, we can rank the places. Getting input from the client, we select the best weighting technique (lots of secret sauce here), and we can send over a simple spreadsheet to the client with the data filtered ahead of time. Once the spreadsheet is in the client’s hands, they can play with the weights and base their decisions on their own preferences; and apparently low crime is only worth 10% and the number of divorced women is worth 90% to Murph Digest. I guess their readers are cougar hunters!!!
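
Without giving away any secret sauce, the general shape of "screen, then weight, then rank" can be sketched in Python. To be clear, this is a generic stand-in and not our actual method: the min-max normalization, the weighted sum, the `City` fields, and every number below are assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class City:
    name: str
    population: int
    crime_rate: float     # incidents per 1,000 residents; lower is better
    divorced_women: int   # higher is better for this particular story


def min_max(values: list[float]) -> list[float]:
    """Rescale values to the 0..1 range (all 0.0 if they are identical)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def rank(cities: list[City], min_population: int,
         w_crime: float, w_divorced: float) -> list[tuple[str, float]]:
    """Screen by population, then sort by a weighted sum of normalized fields."""
    big = [c for c in cities if c.population >= min_population]
    if not big:
        return []
    crime = min_max([c.crime_rate for c in big])
    divorced = min_max([float(c.divorced_women) for c in big])
    scored = [
        (c.name, w_crime * (1.0 - cr) + w_divorced * dv)  # invert crime: low is good
        for c, cr, dv in zip(big, crime, divorced)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up cities and numbers, purely for illustration.
cities = [
    City("Alphaville", 900_000, 12.0, 40_000),
    City("Betaburg", 650_000, 4.0, 25_000),
    City("Gammatown", 80_000, 1.0, 5_000),  # screened out: below the threshold
]

for name, score in rank(cities, 500_000, w_crime=0.10, w_divorced=0.90):
    print(f"{name}: {score:.2f}")
```

With Murph Digest's 10/90 split, the made-up Alphaville comes out on top despite its crime rate. Flip the weights to 90/10 and Betaburg wins instead, which is exactly why the client gets to play with the weights.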

Written by John Paul Murphy

August 1, 2008 at 3:52 pm

A response to “The Youth Myth: Why It’s Hip To Be Square in Real Estate Brokerage”


Falling within the demographic group of real estate’s red-headed stepchild, ages 18 to 34, I felt it necessary to defend my brethren from a realtor who clearly doesn’t understand our relevance. Now, I’m no pied-piper, I’m not a leader of men, but I do know bs when I see it. I think that over the next 7 years, 18 to 34 year olds are going to be buying a significant number of houses; and, subsequently, realtors should pay us more attention than we’re currently getting, which is where Mr. Brady and I differ. . .

[Brady]

You’re just back from Inman Connect? Forget everything you heard there. Chasing the hip, young 18 to 34 market is great if you’re selling sneakers but could be detrimental to the health of YOUR business for the next 7 years. Here’s why:

1- They ain’t got no money.

[Me] I disagree with my grammatically ignorant friend (“ain’t,” really?), to quote Jay-Z, “I’ve got ninety-nine problems but my bank account isn’t one” . . . or something like that. The monetary situation of 18-34 year olds is what it has been for the last few years:

· According to the report, members of Generation Y command more spending power than preceding generations at the same stage of life “because they are well-educated and have higher starting salaries out of college.”

· In 2006: “first-time buyers, the median age was 32.”

· In 2007: 25 -34 year olds bought more homes than any other demographic group

· These unique home buyers are the youngest of the home buying segment, and are the most likely to purchase a home in the next two years in comparison to any other age group.

· The long-term demand for second homes looks favorable because there are large numbers of people buying second homes. “Currently . . . 40.9 million are between ages of 30 to 39. These younger segments will drive the second-home market over the next decade.”

Don’t get me wrong, I’m not saying that my age demographic is the only action in town; but to say we “ain’t got no money” is not only offensive it’s just plain wrong. Maybe we don’t have as much money as some other age groups; maybe we don’t have as much money as Mr. Brady would like, but how much money do we need in order to be perceived as “worthy of a realtor’s time?”

 

[Brady]

2- They don’t trust real estate as an investment. This demographic believes that real estate is either perpetually overpriced or that it is dangerous. Some eschewed the asset class, some leveraged it irresponsibly and lost. It’s not that they don’t trust you because you’re a shady REALTOR, they don’t trust your product.

3- They view you as a functionary. Your value hasn’t been established to them because they haven’t had good experiences with real estate. They see you as an over-priced clerk because they watched you make “easy money” while they chased the overpriced asset.

[Me] I’m addressing both #2 and #3 together because I think they both reek of bitterness. In #2, I think things are a bit backwards. We trust real estate as an investment; we just don’t trust shady realtors. And I think that the mistrust derives from the difficulties, the dragging of feet, that have occurred within the RE community with regard to accessible community data and housing listings online. If it’s not public: it’s a secret, it’s private; it’s trespassing; it’s not cool; it’s irritating to a generation of people that (for the most part) don’t know what the world would be like without the internet. And despite all the red tape:

· Fifty-two percent of the Generation Y age group think a home is a better financial investment than stocks.

In #3, I think someone sounds like they’re in desperate need of a hug. If it makes you feel any better, I value you, Mr. Brady. I wouldn’t buy a house without you and I don’t think I’m alone:

· 82 percent of Generation Y purchased their home through a real estate agent or broker, more likely than any other age group.

[Brady]

4- They need a lot of education…lots of it. Since old is now new (in lending), the young are basically dinosaurs.

[Me] Do we really need an education? If old is now new in the world of lending (with FHA mortgages—a.k.a the first time buyer mortgage—making a comeback) I have to wonder what age group would be fueling such a trend. Because I would assume, if Baby-Boomers wanted to purchase more real estate, they would walk across their summer home’s beautifully landscaped lawn and shake their money tree . . . No lending necessary!!

[Brady]

5- They really don’t have any “pain”. They’ll be focusing on mitigating losses rather than maximizing wealth. Their “pain” is best served by loss mitigation specialists and not wealth maximizers.

[Me] I can’t believe that such a broad generalization could actually be conceived as sound reasoning.

[Brady]

So…if that’s true, why the hell are you screwing around on Facebook and Twitter? Because the fastest growing user groups on those two social networks are the cheese, baby…the 45-65 age group.

[Me] Erroneous!! One particular month is not indicative of Facebook’s overall growth last year. In fact, if you read the article hyperlinked in Mr. Brady’s post, and check the original press release it references, you’ll see that “The most dramatic growth occurred among 25-34 year olds (up 181 percent), while 12-17 year olds grew 149 percent and those age 35 and older grew 98 percent.” Baby-Boomers were the 2nd slowest growing user group. And Twitter never even came up.

[Brady]

PS: I’m generalizing when I categorize the demographic groups. There are a lot of successful and responsible 18-34 year-olds but your odds are better with their parents until 2015. The cool part is that 80% of your competition will buy into the Youth Myth while you clean up on the Boomers.

[Me] I tip my cap for recognizing the generalizations, but I’m still not sold on the Baby-Boomers being in a situation to buy more homes in the next 7 years. I mean, unless the Boomers have severed all ties with their children, I think their situation isn’t exactly “pain free.” If I’m doing the math right, a Boomer’s kid (depending on their age) is in need of college tuition, help with paying for their wedding, purchasing a home, clothes for grandkids . . . good parents do these sorts of things so that their kids don’t have to worry about starting their adult lives in debt and they can (for argument’s sake) buy a house. I’d say that Boomers aren’t going to be looking for a retirement home, investment property, or vacation home for . . . about 7 years.

Written by Michael Demetriou

July 31, 2008 at 9:56 pm

Bulldog Pride! …Or Why I Will Never Look at School Mascots the Same Way Again

with 2 comments

bow wow wow

I’ll bet you thought your school mascot was pretty awesome. No other school’s paltry representative could hold a candle to your Mustangs or Lions or Wildcats.

Well, guess what? There are hundreds and thousands and millions (okay, maybe I’m exaggerating) of other Mustangs, Lions, and Wildcats out there. How do I know this? Because I’ve tracked down all of their Web sites.

Well, maybe not all of them (although at times it felt like that). For the past several weeks, part of my work here at Onboard Informatics has included updating our (rather extensive) list of invalid school Web site URLs.

For instance, a link to Willow Grove Elementary School may not lead where it’s supposed to because:

a) there might be a misspelled or incorrect address in our database,

b) the link was correct at one time, but now isn’t because the school district has updated or moved its Web sites, or

c) it actually does lead where it’s supposed to, but takes longer to load and so is coming up invalid.

What to do? Well, since Onboard has the school name, address, and district information available as well, we go out to the Internet to search for that school’s current, working Web site. Oftentimes, searching by district is the easiest way to go about tracking down these schools. Since the data is grouped by district in our file, a group of schools that all come from the same district can be taken care of by finding just one district home page.

But of course, nothing is as easy as it sounds.

For one thing, did you ever stop to think about exactly how many Springfield School Districts there are? (Answer: A lot more than you’d think. One each in Massachusetts, Oregon, New Jersey, Missouri, and Illinois, and that’s just the first page of Google results.)

What about those pesky Colorado school districts that follow every normal name with an alpha-numeric code? And don’t even get me started on Missouri and its Roman numerals (Harrisburg R-VIII? Really?).

Then there are the schools that, try as you might, you just can’t find. Maybe they’ve closed, or the district’s Web site really isn’t functioning, or they’re in a rural area whose schools may not have set up Web sites yet. In cases like those, we delete the invalid URL that had previously been misdirecting users, but we leave the field blank — from a data perspective, it’s better not to have a Web site listed for a particular school than to supply an incorrect one.
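
For the curious, the "slow is not the same as broken" distinction and the "blank beats wrong" rule can be sketched in a few lines of Python. This is a hedged illustration using only the standard library, not our actual tooling; the function names, the 30-second timeout, and the choice to keep slow links pending a re-check are all my assumptions.

```python
import socket
import urllib.error
import urllib.request
from typing import Callable, Optional


def check_url(url: str, timeout: float = 30.0) -> str:
    """Return "ok", "slow", or "broken" for a single URL (simplified sketch)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return "ok"
    except urllib.error.URLError as err:
        # A timeout while connecting arrives wrapped inside URLError.
        timed_out = isinstance(err.reason, (TimeoutError, socket.timeout))
        return "slow" if timed_out else "broken"
    except (TimeoutError, socket.timeout):
        return "slow"    # timed out while reading the response
    except (OSError, ValueError):
        return "broken"  # malformed address or a lower-level network failure


def cleaned_url(url: Optional[str],
                checker: Callable[[str], str] = check_url) -> Optional[str]:
    """Keep a URL unless it is known to be broken; otherwise leave it blank."""
    if not url:
        return None
    # Assumption: a slow site is kept and re-checked later, since it is not
    # proven wrong; only a confirmed-broken link is blanked out.
    return None if checker(url) == "broken" else url


# Demonstration with stub checkers, so nothing here touches the network.
print(cleaned_url("http://old-district.example/school", checker=lambda u: "broken"))
print(cleaned_url("http://slow-district.example/school", checker=lambda u: "slow"))
```

The generous timeout is there to cut down on case (c) above, where a site that merely loads slowly gets flagged as invalid.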

___________

When the sons of Eli break through the line

All of these school URLs are helpful to have on hand when providing information about the offerings of districts in a particular neighborhood. At Onboard, we have valid, school-specific URLs populated for almost 40,000 of our schools — roughly 33 percent of our total listings. We also have a school or district URL populated for close to 80 percent, or 100,000 schools.

Out of about 36,000 distinct total school URLs that were validated, 3,500 changed their URLs over the last year. Of the remainder, we were able to provide valid URLs for about 15,000 schools for which we previously had no information.

After going through all 36,000+ of those school URLs, we ran the modified data through a check to pull out any links that were still broken — only about 1,000 (a much more manageable number, relatively speaking). And when that 1,000 is compared to the approximately 6,000 invalid URLs we finished with last quarter, that averages out to around 3,500 broken links Onboard deals with over a quarter.

Making sure that the data out there is as clean and accurate as possible is a vital part of what we do at Onboard, and keeping data that is constantly being modified, the way school data is, up-to-date is an ongoing task.

So maybe your school’s mascot isn’t the one-of-a-kind Golden Eagle (or other unstoppable creature) you thought it was, but you can still take pride in the fact that your school has a fully functioning website. Just do all of us data collectors a favor — don’t set all 25 links on your home page to blink simultaneously. Trust me, that’s never a good design…

___________

Quick and Random (and not statistically accurate) Fun Facts:

Most popular mascots — Bulldogs, Eagles, Tigers, Lions
Most unique mascots — Winged Beavers, Atom Smashers, Awesome Blossoms, Fighting Quakers, Cheese Makers
Most “interesting” school names — Slaughter Elementary School, Stalker Elementary School

Written by Tara Powers

July 30, 2008 at 10:41 pm

Data in Real Estate (Part 2): Creating Quality


Having established a foundational knowledge of data and its application to the geographic sphere of real estate, the ability to determine what sort of data will be most valuable for a company’s business ventures is even more important. In the most basic sense, data quality refers to the degree of excellence in relation to the portrayal of the geographic “phenomena” being examined, all contributing to the data’s fitness for use.

How can we say what makes “good” data? When you talk about a good book or a good movie, isn’t your judgment dependent on certain subjective qualities — interests, mood — that are individual to you? To a degree, yes, but there are also certain aspects that must be present without fail in order for a book or movie to be considered “quality.” A book must be free of unintentional spelling and grammatical errors, for instance, and a movie needs to have clearly identified characters and some form of plot.

The overall quality of data can be thought of in the same way. While the specifics of what makes good data will vary according to the type of data you’re seeking (real estate data as opposed to sports statistical data, for instance), there are non-negotiable elements that apply to data as a whole.

Written by Tara Powers

July 25, 2008 at 5:58 pm

Posted in Informatics
