Fred Wilson has a great post on his blog this morning about the semantic web (Making The Web Smarter). Beyond the mention of my company InfoNgen, it also provided an interesting perspective on how the web is evolving in practice. This is a subject I’m passionate about, so I couldn’t pass up the opportunity to throw in my two cents.
With InfoNgen, I spend a great deal of time thinking about potentially new and innovative ways to analyze and classify content – including a broad range of web-based content. Without a doubt, the research going on around the semantic web is some of the most interesting in this field. While there has been some really exciting progress in applying this research to many constrained information domains, creating this self-describing, intelligent network of information on an “internet-wide scale” is still an incredibly daunting task.
And as Fred points out, it isn’t one we are making a lot of progress in.
I am struck by the similarities between the efforts happening here and the work that took place from the ’70s to the early ’90s in the field of artificial intelligence. In computer circles, A.I. was the cutting-edge discipline of its day. Until the arrival of the Internet, it was a magnet for creative engineers and scientific talent. People saw it as the next great revolution in technology. Encouraged by successes like chess-playing computers that could beat grandmasters and medical expert systems that demonstrated real value in clinical situations, expectations were high that we would soon see computers that would be able to interact with us conversationally – personal assistants that could carry out spoken directions and provide us with relevant advice and information. This video – done by Apple in 1987 – is a great example of what people were hoping computers would soon be able to do for them:
It’s more than 15 years later, and we’re still a very long way off from the promise shown in this video.
Today’s efforts to create the foundation of the semantic web are in some ways like a reemergence of artificial intelligence – but now repackaged for a web-centric world. Many of the concepts and technical disciplines that sat behind A.I. – Bayesian inference, natural language processing, weighted decision trees, classifiers, and knowledge bases, just to name a few – are now in some form or fashion powering various commercial and open efforts to realize the semantic web. And while they do share a common set of technologies, that doesn’t mean they need to share a common fate.
But to be successful, things will need to start coming together in a different way.
This time around, these technologies will need to leverage the core social fabric inherent in the web architecture. Analysis needs to be pushed out to the edge and become an integral and interactive part of the content creation process. Such analysis could not only suggest tags or other meta-level markup, but also offer potential summaries for quick display, highlight ambiguous terms or content blocks for refinement, and suggest unique topical terms that could be included in the content to improve discoverability. The human-generated editorial insights that exist in trusted content sets need to be leveraged to mine for relationships in other content sets that exist more broadly. (Fair use/copyright law will need to be updated and clarified to keep up with innovations in this area.) Most importantly, the creation of public databases, taxonomies, and ontologies needs to become a priority for open source efforts, potentially leveraging a DBpedia-style model of publication and quality control. Freely available datasets will be the fuel that powers many of these efforts going forward. Overall, any successful approach here needs to blend the things people do well with technologies that can amplify and extend them, producing something neither could accomplish well on its own.
With all of that said, I’m not naive. I don’t believe we will ever have a truly global, harmoniously classified semantic web. There are simply too many perspectives to rationalize in a way everyone can agree on, and too many people looking to game the process for their own gain. The Utopian model discussed academically is really an idealized goal that isn’t achievable on a practical level. But I strongly believe that it will be possible to offer to the broad web community the same improved web experience currently provided by vertically focused solution providers like InfoNgen. Meaningful progress at this level will require more than the isolated technological breakthroughs of any single company or organization. Though it can be anchored around the same core semantic concepts, getting the scale and scope needed to succeed here will require some kind of cooperative framework to share and enhance the currently disconnected efforts and innovations that are taking place today. Without having some mutually beneficial relationship exist between the various commercial and open sourced initiatives, it is likely that the global semantic web will end up hitting the same kind of wall that the original efforts in A.I. did.
While a technical discussion of the various solutions in this space may be interesting, the end goal of the semantic web is to make it easier for individuals and organizations to discover and apply information that is relevant to them. This means that access to content needs to become more flexible, and conform to the variety of ways people may think about it and want to consume it. This is in sharp contrast to the rigid way publishers have traditionally wanted to package and present it.
None of this will be easy, but getting publishers to embrace this kind of change may be the biggest challenge of all.
If the semantic web is really going to take off on a large scale, it needs to happen first on a small scale. Semantics need to become an everyday part of the way individuals deal with information at a personal level.


