Thursday, July 17, 2008

Social network aggregation

It would be really cool if there were only one social network to belong to, with the possibility to tag the kind of relationship one has to each contact. This is why we now have social network aggregators, which is a good idea to begin with. However, not all of the content our contacts produce is relevant in a given context, so I start to think that some intelligent filtering on top of contact aggregation would be helpful. This might also help in discovering new, potentially interesting contacts relative to what my current concerns are.

Tagging is the first idea that comes to mind. However, as any string of characters can be a tag, I feel that some semantic support is necessary. What I am thinking of is some controlled hierarchy with additional cross-links between terms, so that any given term would potentially have several paths to some root node. Nothing new here.
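
To make the idea of a hierarchy with cross-links a bit more concrete, here is a minimal sketch in Python. The terms and the `BROADER` mapping are made up for illustration, and the sketch assumes the hierarchy contains no cycles; it is one possible reading of the idea, not a worked-out model.

```python
# Hypothetical controlled vocabulary: each term maps to its broader terms.
# A term with more than one broader term carries a cross-link.
BROADER = {
    "thing": [],                       # root node
    "food": ["thing"],
    "place": ["thing"],
    "restaurant": ["place", "food"],   # cross-link: two broader terms
    "pizzeria": ["restaurant"],
}


def paths_to_root(term, broader=BROADER):
    """Return every path from `term` up to a root (a term without broader terms).

    Assumes the hierarchy is acyclic.
    """
    parents = broader[term]
    if not parents:
        return [[term]]
    paths = []
    for parent in parents:
        for tail in paths_to_root(parent, broader):
            paths.append([term] + tail)
    return paths


def broader_terms(term, broader=BROADER):
    """Return all terms that lie on any path above `term`."""
    found = set()
    for path in paths_to_root(term, broader):
        found.update(path[1:])
    return found
```

With this toy vocabulary, `paths_to_root("pizzeria")` yields two paths, one via "place" and one via "food", so a contribution tagged "pizzeria" could be found from either direction.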

When it comes to some well-defined contribution (e. g. a video about a place, a review of some restaurant), this may not be too difficult to integrate, but how about short utterances, e. g. via twitter? If such an utterance contains a link, it might not be too difficult, provided it is possible to extract the kind of relationship by looking at the destination. But what if the utterance is just some kind of statement like "I just can't stand the weather", or if one utterance is a follow-up to another?

I would be interested in any kind of thoughts regarding the semantic integration of micro-blogging - feel free to comment or to directly get in touch with me! Thanks in advance.

Thursday, July 10, 2008

Virtual worlds - do they help in collaboration?

Today I found an announcement in my mailbox that Google has released Lively, where you can create a virtual alter ego (also called an avatar) together with all sorts of apparel and virtual spaces. Well, after having played around with it a bit (it's free of charge, being paid for by advertisements), I can say that, while it is easy to use, it takes a bit of time to get going, and I was not really able to see what the benefit is. Maybe one thing that is missing is a number of templates, involving rooms, avatars, furniture, clothing and the like, which would get you started really quickly. But on the other hand, I may be one of those people who are getting too old to really grasp the benefits and perhaps the fun that may be involved in using this kind of virtual space.

But the question was whether virtual worlds may help in collaboration, that is, in a professional context. If I do not see the person I am working with (for example in international project work), I personally do not mind, as I am likely to either know who that person is, or, if not, I am able to at least see a picture via their web page or get some other information via their blog etc. What counts for me is to have an impression of how well I can work with that person, and on what basis. In other words, I need to know what language (technical, marketing, research, administration) my partner is used to, as well as how well I can get along with that person (although personal sympathy should probably not have too much of an influence). So, why do I need the support of virtual gadgets? After all, I am not earning my money fooling around with cool stuff, but working on my projects and getting my tasks done.

Tuesday, June 03, 2008

Personalization and Context

Well, it's been some time since I last posted here. While some say that blogging needs to occur at least a couple of times a week, I take the liberty to post whenever I think I have something to say. This time, it's about personalization, a term that seems kind of fuzzy, as it seems to refer to services that are tailored to the current needs of their users. These needs are related to the current user context (e. g., location, time and weather), as well as their areas of interest.

To take the example of someone looking for a restaurant, their context will be defined by where they are and what time it is (e. g. looking for lunch as opposed to having dinner), while their area of interest may be partially determined by the kind of food they like.

Personalization and mobility often go hand in hand, as it is rather tedious to enter terms in search fields of mobile browsers. As someone's needs and tastes do not change very quickly, it makes sense to manage them via user profiles, as opposed to context information, which may vary a lot for someone who often changes locations.
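
The split between a slowly changing profile and a quickly changing context can be sketched in a few lines of Python, using the restaurant example from above. All class names, fields and sample data below are hypothetical, and the distance to each restaurant is assumed to be already known relative to the user's position; this is only a sketch of the idea, not a real context model.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Restaurant:
    name: str
    cuisine: str
    distance_km: float  # assumed to be precomputed from the user's location
    opens: time
    closes: time


@dataclass
class Profile:
    """Changes rarely, so it can be stored once per user."""
    liked_cuisines: set


@dataclass
class Context:
    """Changes often, so it is captured anew for each request."""
    now: time
    max_distance_km: float


def suggest(restaurants, profile, context):
    """Filter by context first, then rank by profile (liked cuisines first)."""
    open_nearby = [
        r for r in restaurants
        if r.opens <= context.now <= r.closes
        and r.distance_km <= context.max_distance_km
    ]
    return sorted(
        open_nearby,
        key=lambda r: (r.cuisine not in profile.liked_cuisines, r.distance_km),
    )


# Made-up sample data for illustration only.
RESTAURANTS = [
    Restaurant("Trattoria", "italian", 1.2, time(11, 30), time(14, 30)),
    Restaurant("Sushi Bar", "japanese", 0.4, time(12, 0), time(22, 0)),
    Restaurant("Steakhouse", "steak", 0.8, time(18, 0), time(23, 0)),
    Restaurant("Far Pizza", "italian", 9.0, time(11, 0), time(23, 0)),
]
```

The same profile gives different suggestions at lunch and at dinner, because only the `Context` object changes between the two requests.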

This said, I am interested in discovering context models and research pointing in that direction. I do explicitly welcome any insights and hints that may help me understand the issue of personalization a bit better.

Wednesday, February 13, 2008

Online Encyclopedias and Reliability

Which product name first comes to mind when talking about encyclopedias? Well, I suppose it is Wikipedia (and not the Encyclopedia Britannica). I also assume that many people my age do have a multi-volume encyclopedia at home, but when did you last use it? As most people nowadays (including myself) use Google for search, and as Wikipedia entries always show up among the first couple of search results, printed encyclopedias (or their counterparts on CD-ROM or DVD) increasingly seem to be a phenomenon of the past.

There are many criteria by which to judge an encyclopedia, some of which are timeliness, trustworthiness, comprehensibility, level of detail and elaborateness. As for Wikipedia, whose idea was that everyone can contribute, correct and expand, one criticism was that you would never know if a given entry was reliable. On the other hand, the timeliness of Wikipedia seems to be unbeatable. Now, even though there is a team of editors that takes care of surveying updates, the question is whether they really have the expertise to judge whether a contribution is not only correct, but also covers the major issues in a concise way.

Enter Brockhaus, which is more or less the German equivalent of the Encyclopedia Britannica, and which will release its online encyclopedia "for free" to the general public. (As nothing really is for free, this service will be paid for via advertisements, and I do hope that I will not be annoyed by flashy banners). Does this mean that Wikipedia and Brockhaus entries will now be displayed side by side in Google (at least for German-speaking users)? Will there be a translated version of Brockhaus articles? What will be the return on investment (assuming that it also means the death of the printed version of the encyclopedia)? What will be done to keep the posted articles up to date? What feedback and collaboration mechanisms will there be in order to involve selected readers in an editorial process?

As it seems, timeliness is an increasingly important factor. While it seems sufficient for ordinary (non-fiction) publications to have an updated version every couple of years, this is not enough for encyclopedias. And, last but not least, how can one make sure that copyrighted articles in encyclopedias by ordinary publishing houses will not be copied to collaborative encyclopedias such as Wikipedia?

Thursday, January 24, 2008

Motivation and trust

Web 2.0 applications live on the contributions of participating individuals. With respect to services that are considered a private matter (e. g. writing one's own weblog or participating in writing Wikipedia articles), intrinsic motivation seems to be enough to ensure regular usage. (Well, this blog is not quite a positive example for this hypothesis - but this may be more related to the fact that my time for coming up with new and interesting thoughts is rather limited. If I were only writing it as a personal diary, it might be different, but I don't think it would be too interesting for the general public). When talking about corporate usage of social software, extrinsic motivation does not really seem to result in employees making more use of the systems that are being offered to them as means to share their knowledge. Some more thoughts about motivation can be found here. Intrinsic motivation may be good, but then why should an employee be willing to post what they know to a public they do not know? I may be willing to share my expertise with people I like or trust (or both), but why go beyond? Why should social software change the current situation of sharing information in an enterprise where content management has not really helped?

With these (and other) questions in mind, addressing the observed situation of selective motivation, it may be helpful to install social software with the possibility to mutually define who should be able to read one's own contributions. This means that by default, the documents that are being written by an author are not visible to anyone else, but the author may invite others to join his personal network. As the network grows, it is likely that not everyone in the list of contacts should have the same visibility with regard to the author's contributions. This means that it will be necessary to introduce levels of privacy, e. g. on a scale between 1 and 5, which may be added to any contribution and any link between two users, which will result in the lists of contacts being segmented with regard to (mutual) trust.
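
As a rough illustration of how such levels could work together, here is a minimal Python sketch. The comparison rule is my own assumption (a contact may read a contribution if the trust level on the link from the author to that contact is at least the privacy level of the contribution), and all names and sample data are made up.

```python
# Hypothetical trust levels on links, from author to contact (1 = low, 5 = high).
# A missing link means "not in the author's network", i.e. no access by default.
TRUST = {
    ("alice", "bob"): 4,
    ("alice", "carol"): 2,
}

# Hypothetical contributions: (author, title, privacy level 1..5).
CONTRIBUTIONS = [
    ("alice", "Lunch menu notes", 1),
    ("alice", "Project lessons learned", 3),
    ("alice", "Draft salary proposal", 5),
]


def can_read(reader, author, privacy_level, trust=TRUST):
    """Assumed rule: the link's trust level must reach the privacy level."""
    if reader == author:
        return True  # authors always see their own documents
    return trust.get((author, reader), 0) >= privacy_level


def visible_to(reader, contributions=CONTRIBUTIONS, trust=TRUST):
    """Return the titles of all contributions that `reader` may see."""
    return [
        title
        for author, title, level in contributions
        if can_read(reader, author, level, trust)
    ]
```

In this toy setup, bob (trust 4) sees the first two documents, carol (trust 2) sees only the first, and someone without a link to alice sees nothing.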

I admit that these are only first thoughts without being further elaborated - but whereas the technical side does not really seem to be a big issue, the question remains how to convince the management to support such a personal information ecosystem. If I have some more thoughts about it, I may come back to this issue later ...

Thursday, November 08, 2007

Applications or data?

When I look at Facebook and the huge number of applications I can add to my profile, I tend to get lost. Not on my own profile page, which has a fairly limited number of applications, but when I see other pages with 50 or more applications added. What is this good for? Far from wanting to judge other people by the kind of applications they add (e. g. how active is your sex life), I can only speak for myself. Doing so, I would say that I am focused on those applications which will display my interests (music I listen to, books I read, reviews I wrote elsewhere, postings). At least, this is what I would consider the personal benefit for other people looking at my profile. They will probably not care as much about the services I use, but rather about what output they produce. (On a side note, I really do not know how many applications I am missing because there are just too many. But this also applies to the desktop computer world). But is Facebook really the social network operating system?

Another approach is Google's OpenSocial initiative, a set of APIs to integrate multiple social services. Personally, I am registered with so many services that I would rather manage my profile and network in one central place. But this would mean not only access to the data, but also their full integration - which OpenSocial does not seem to support. Nevertheless, I still hope that I will be proven wrong, so that developers will be able to implement applications integrating data from multiple sources and services which will be of true benefit for the user, as they will be able to take my full context into account.

Monday, October 22, 2007

People networks - where to register?

We are in a time in which so-called social services are more abundant than ever before. This means that every week there is an abundance of new services with the basic idea of connecting people. As it seems difficult to know where to register (I just joined the third virtual media shelf and am starting to lose track of how many services I am registered with at all), I would like to propose the following characterization:

  • Media-based recommendation networks - aggregate books, music and videos you own or like to find people with similar media consumption preferences and have the service recommend other media items or people with similar (media) preferences. Examples are librarything, shelfmates, moviepilot (in German), to name only a few
  • Sequential media recommendation services: specify a preferred media item and have the service recommend you similar items, such as last.fm or Pandora
  • Location-based networks - share ratings about localizable entities (e. g. shops, restaurants, monuments, museums) and use the service as a kind of collaborative tourist guide. These services are often available in web-based and / or mobile versions, and some of them include automatic localization. Examples are qype, qiro, townster.
  • Offer-and-demand-based networks - share your professional experience, your hobbies or your needs in order to find what you're looking for, e. g. a new job, a relationship, a professional (e. g. craftsman) to get a job done, etc. Examples include Xing, LinkedIn, MyHammer (in German), Friendster, not to mention the zillions of dating platforms.
  • (Micro)publishing sites, such as weblog hosting services, twitter or jaiku

I am not sure where resource aggregation services for photos, videos or bookmarks fit into that classification, as the social network and recommendation factor does not seem to be prominent with regard to services such as Flickr, YouTube and del.icio.us.

Also, as this is only a first attempt, I do welcome comments that are geared towards expanding this classification. More specifically, does anyone know of other attempts to classify all these fancy services that claim to be Web2.0?

Thursday, October 04, 2007

Finding what you're searching for

I do not have precise numbers regarding how much the amount of information on the World Wide Web has been increasing in the past couple of years, but from my searches I get the impression that the same information is available in an exceedingly high number of copies (so to say), be it news, frequently asked questions, or product information, just to name a few.

Every now and then, the magic word natural language search pops up, producing about 104 million hits on Google if I just type in the words, and still 226,000 hits if put into quotation marks - which is way too much to handle. Others have written about the topic before, so I am not going to repeat what has already been said, but the question is what can be done to find the information in something that seems like a huge haystack of Web pages.

Why is everyone using Google? Because it actually does quite well, and I can second that I mostly do not have to go beyond the first couple of result pages to find the information I want. If it does not appear, either my search turns out to have been too unspecific, or the information is not available at all.

Enter startup companies such as Powerset that claim to revolutionize search. I doubt that, given that it is hard to extract any semantics from most searches, which contain only about three significant search terms or fewer. I would assume that natural language search may be able to yield decent results, but is the benefit (from the point of view of the users) really as significant as claimed? I am not so sure about this.

The problem is not that our search technologies are not good enough. It is that there is too much information to search within. So, I suspect that the future lies in dedicated search engines for specific domains (e. g. news) rather than a new universal search engine.

Friday, September 07, 2007

The Future of (IP)TV

The start of internet-based TV (IPTV) is often claimed to be as much of a step forward as the introduction of color television. The most significant change, from a consumer's point of view, is the potential use of a backchannel, turning a former broadcasting device into an interactive media center.

Two alternative approaches are known: on one hand, the set-top-box based delivery, on the other, P2P based interactive television. The former seems a suitable way to sell high speed broadband connections for telecommunication providers, the latter is yet another attempt at bringing P2P platforms to a wider clientele, with competitors such as Joost, Babelgum or Zattoo. (A posting comparing these three P2P platforms can be found at ReadWriteWeb).

Perhaps it is too early to say who will be winning the competition in the long run, but the following factors seem to be essential for IPTV, whether P2P or not:

  • Attractive and high-quality content: In order to substitute and / or expand regular TV, partnerships with both traditional broadcasting houses and niche providers of video content are a prerequisite to offer a decent selection of (streamed) media. This is also a great opportunity for professional content producers. However, it is important that the main focus should not be user-generated content (such as YouTube), although this may be offered as an addition.
  • Audio and video (technical) quality: This should be considerably superior to PAL or NTSC standards, otherwise there is no point for end users to give up conventional TV
  • Extensible widgets on the software that delivers content and functions to the customer: Like Facebook that opened its interfaces for third-party application providers, additional functions that enable interactivity may help turning TV into a collaborative experience. More precisely, this is essential for any kind of personalized content delivery that suggests specific programs based on past viewing or permits user-triggered suggestions (e. g. forwarding of a program to specific user groups as a suggestion). It also permits IPTV providers to focus on developing their core platform while remaining open for future development.
  • Integration of external information sources (such as news portals, weblogs, discussion boards) via RSS feeds, with the option of filtering the currently delivered feed against the characteristics of the currently delivered media stream - call it personalized aggregation of multimedia channels
  • Intelligent filtering and forecast: The more diverse the delivered content (e. g. number of broadcasting channels), the more efficient the filtering and search mechanisms need to be. While personal recommendations require some kind of user profiling, quick access to all available content needs to be ensured via an efficient combination of search and navigation techniques.

IPTV has the potential not only to substitute broadcast TV, but also to offer media distribution to a broader public (such as video on demand) and turn uni-directional viewing into a communicative experience. However, there is still quite a way to go before it becomes a true competitor in the mass market.

Wednesday, September 05, 2007

Local portals: bridging the gap between reality and virtuality

How do you find special offers in your town? You either know what store you want to go to (either in reality or via their web page), or you could just go and see (trial and error), but where can you possibly find specific products or services? For the former, you have eBay, where you can look for any goods which are either offered via auctions or static (instant) purchases, assuming that the provider of the product offers some means of sending his products to the requesting customer. For the latter (services), you have auctioning platforms such as MyHammer, bringing demand and users together. But what about the real world, e. g. you have a need for a product that you would like to buy locally, wanting to find the best value for your money (which does not necessarily mean the lowest price)? Local platforms such as Qype may help in finding opinions regarding local places, but although it is possible for stores and service providers to enter their own information, they seem to be reluctant to do so. Perhaps this is so because of lack of time, or fear of being bashed, but I am not sure. My feeling is that these platforms can help in raising awareness for local product or service providers, but their task does not seem to be to answer specific demands.

Owners of small businesses often have neither the time nor the money to invest in their own web presence. For them, some framework where they can easily place their products, specific offers and background information can be helpful. On the other hand, before investing their time (and possibly, money), they need to be sure that such a platform will not only help them in improving their business (i. e. more customers), but that they can also trust in the platform's persistence and reliability.

One example in Germany, targeted at just that, is CityPedia (not to be confused with the British platform bearing the same name). Of course time will tell whether it will succeed in attracting businesses and private users alike, but I am quite confident that its founders will succeed provided they manage to emphasize the benefits and potential of their platform for both businesses and end users.

Friday, August 31, 2007

Local portals - what's the benefit?

If you look for places related to a city or town (e. g. sights to see, restaurants, shops, businesses), there are basically three approaches: either you use your favorite search engine and click on one of the results (trial and error), you know their website and load their own presentation, or you use a service driven by user-generated content (I am deliberately refusing to use the term web2.0) such as Qype to see whether there is some contribution about that place by some user.

The first approach will possibly lead to a small number of pages that may have some helpful background information, but the majority will just have the address and possibly a link to their homepage (if available). The second approach will only work for a small fraction of those companies that have the money or manpower to have their own web presentation (and you don't really know whether it's trusted information or more like a biased advertising). Finally, the third approach depends on whether other users have contributed to that location or not.

My personal strategy is (in that order) 3, 1 (for additional information I may have missed) and 2 (for restaurants, I would like to see their menu before actually going there). To me, the disadvantage is that I have to look in different places, especially relating to option 1, and that information is often duplicated (even if the wording is different). Instead, I would like to have the information I am looking for aggregated in one single place.

For entrepreneurs who are thinking of establishing yet another local information page, I suppose that the key to success is that they have a clear understanding of what kind of information they would like to offer, and in what context they would like to see their offering related to other services. While I understand that competition is helpful for some time (taking the example of location based information services), it does not really help the user to have to navigate through several offerings that depend on user content and share more or less the same functions.

In summary, assuming that there is some truth behind the buzz phrase that content and context is king, it has to be made really clear to investors, shareholders and end users what this really means. For instance, one argument could be reliability and trustworthiness. Second, any information service related to the real world should provide a link between virtuality and reality that will provide added value. If your users will access your service on a regular basis, this may be an indication that your proposition is working.

Thursday, July 19, 2007

Mobile Tagging

Apparently a white paper regarding mobile tagging was just published, but somehow it is not available currently. That seems like a new term, somewhat confusing to me, because tagging means to place a tag (i. e. a word) to describe a resource (most often, a piece of text). On the other hand, mobile tagging means to decode a barcode (either EAN13 or 2D) via a mobile phone and use this ID to access information regarding the "tagged" resource.

Well, the first benefit seems obvious, namely the improvement of user interaction, as everyone knows how tedious it is to enter URLs on a mobile device. The second one relates to bridging the divide between the real and virtual world: any resource in real life can be associated with a tag, and any function can be associated that takes the resource's ID and performs an action, such as retrieving information, placing an eBay auction, buying a concert ticket or initiating a media download - the possibilities seem endless.
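
A small Python sketch may illustrate the second point. The check digit rule for EAN-13 is the standard one (weights 1 and 3 alternating over the first twelve digits); everything else, i. e. the action names, the handler functions and the strings they return, is hypothetical and only stands in for whatever a real service would do with the decoded ID.

```python
def is_valid_ean13(code: str) -> bool:
    """Check the EAN-13 check digit (weights 1, 3, 1, 3, ... from the left)."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(c) for c in code]
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == digits[12]


# Hypothetical actions that take the decoded ID of the tagged resource.
def show_info(resource_id: str) -> str:
    return f"product information for {resource_id}"


def start_auction(resource_id: str) -> str:
    return f"auction draft for {resource_id}"


ACTIONS = {
    "info": show_info,
    "auction": start_auction,
}


def handle_scan(code: str, action: str) -> str:
    """Validate a scanned code, then hand its ID to the requested action."""
    if not is_valid_ean13(code):
        raise ValueError(f"not a valid EAN-13 code: {code!r}")
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return ACTIONS[action](code)
```

The point of the dictionary is that new functions can be attached to the same ID without touching the decoding step, which is what makes the possibilities seem endless.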

Wednesday, July 18, 2007

Exodus from Second Life

I should start these thoughts by admitting that I have never been a user of LindenLab's Second Life platform, nor do I intend to use this service. Why? Simply because I have the impression that it is all about a virtual environment you can create, without having an equivalent in the real world. Already some time ago, there were comments about the platform being increasingly unstable. It has been observed that the computer manufacturer Dell and hotel chain Starwood already left their virtual islands due to a lack of visitors. But honestly, what could be the reason for me to visit a company's virtual other self? If it's about an Internet-based service, I would probably have a look at their web presence, but what additional use can a virtual island have? I honestly have no explanation.

When it is about community building and information / experience sharing, then virtual communities are a great means of communication between individual end users. But when it gets too commercial, then this may mean the beginning of an exodus, such as seen with Second Life. Of course, that's a personal opinion, but if I want to buy something I will probably go to the relevant internet shops or auction platforms.

The remaining question, then, is whether the users are going to come back. In June, the number of active users decreased by 2.5 percent, and out of 8 million registered users, only 40,000 are active at peak times. As I read it, the companies' exodus was a consequence of the decreasing number of users. Thus, if this trend continues, more companies are going to leave their virtual residences.

Another way to perceive this, however, is that the companies' virtual representation was not attractive enough for end users. But is this really the case? I don't know, but would welcome any further thoughts on this.

Bottom line, however, is the challenge to link the real to the virtual world in order to achieve a blended experience for end users.

Monday, July 02, 2007

Cooperative Tags

Tags are a great facility to assign meaning to contributions. They are very useful when it comes to personal information management. However, they are problematic regarding collaborative information management - every user will generate different tags relative to their experience, intentions, etc.

Much has been written about the issue of folksonomies, taxonomies and tags. I do understand that taxonomies are not considered very user-friendly - but on the other hand, the evolution of folksonomies does not seem to be very goal-directed. Which goal, you may ask? To help find information based on specific concepts that may come to mind.

The most well-known approach is to assign user-specific labels to a resource, or a chunk of information. For very popular resources, you may end up with lots of different tags, while other resources may end up with only the tags its author had thought of.

Today I discovered an interesting approach to help keep folksonomies tidy by adding a means to rate them (seen on MovieLens). By rating the appropriateness of a tag (on a scale between 0 and 1), based on a sufficiently large number of users, the so-called "wisdom of the crowds" should lead to an improvement of supplied tags. As tags are always relative to a tagged entity, the other question that should also be addressed is how to appropriately monitor tag evolution. Should inappropriate tags (i. e. whose rating relative to a resource does not exceed a given threshold) be automatically removed? Should tags which are considered as very useful be added to a tag dictionary?
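
To think through the questions about thresholds, I find a small Python sketch helpful. The function name, the threshold values and the minimum number of votes are all assumptions of mine, not something taken from MovieLens; the sketch only shows how averaged ratings could lead to one of several decisions per tag.

```python
from statistics import mean


def review_tags(ratings, min_votes=3, remove_below=0.3, promote_above=0.8):
    """Decide what to do with each tag of ONE resource.

    `ratings` maps a tag to the list of user ratings (each between 0 and 1)
    it received for that resource. All thresholds are hypothetical.
    """
    decisions = {}
    for tag, votes in ratings.items():
        if len(votes) < min_votes:
            decisions[tag] = "undecided"  # too few raters to trust the crowd
            continue
        average = mean(votes)
        if average < remove_below:
            decisions[tag] = "remove"
        elif average >= promote_above:
            decisions[tag] = "promote"  # candidate for a tag dictionary
        else:
            decisions[tag] = "keep"
    return decisions
```

For example, calling `review_tags` with the ratings `{"sci-fi": [0.9, 1.0, 0.8], "boring": [0.1, 0.2, 0.0], "new": [1.0]}` would promote the first tag, remove the second and leave the third undecided until more users have rated it.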

Last, but not least: when combining taxonomies and folksonomies, what should be done to relate them, e. g. should there be associated (recommended) tags for a term in the taxonomy?

As I am surely not the first to raise these questions, I would welcome any feedback on this issue.

Thursday, June 28, 2007

Keeping track of community services

The current hype is communities with user-generated content - starting from pure weblogs with their blogrolls, media sharing sites (Flickr, YouTube), bookmarking services (del.icio.us, Mr. Wong), recommendation sites (Qype, DaWanda), genealogy sites and others. (I'm sure someone out there must have a more comprehensive overview). Some of these services intentionally try to look flashy and innovative (especially the ones being implemented in Flash). But how do I really keep track of all these services I am subscribed to (and I am not going to ask how to find the time to take care of all this)?

Enter Facebook, which is called a social utility by its creators (for some more information look into mashable, TechCrunch and Wikipedia). Actually, it's a kind of service aggregator plus social network, which may help in structuring one's own set of subscribed services.

Like most of the cited services, this one also lives on advertisements - but I am not sure what the consequences of alternative ad revenue models such as pay-per-action will be. While it seems relatively easy to find potential investors for services which label themselves as Web 2.0, only time will tell if the revenues will be sufficient in the long run, especially if there are many competitors.

Thursday, June 21, 2007

Ratings and Trust

Some online communities offer some kind of rewards, e. g. points that one can collect (associated with specific actions like writing a contribution, sharing one's knowledge, number of contacts). Assuming that everyone acts according to fair principles, there seems to be no problem with that. However, taking the example of rating other people's contributions, it is not uncommon for people to create additional accounts from which they rate their own contributions (accounts that belong to them, without the platform being aware of it).

There seem to be several approaches to cope with this issue:

  • Require full address upon registration together with phone number in order to check it against phone listings. This has the disadvantage that not every potential user may be listed in some given directory (e. g. students sharing an apartment, where one phone is shared among several people)
  • Require first and last names at registration (with the possibility to choose a nickname for users who do not want to unveil their identity to the general public).
  • Require passport or ID card upon registration. This requires a mechanism to verify users given their passport number.
  • Only allow one account per email address. Of course, I may be generating a large number of email addresses to circumvent this, but still it may help
  • Require user photographs for any active account. But, on the other hand, how would it be possible to verify that a photo really shows the user, and not some other person?
  • Allow only active accounts, where activity relates to productive actions (such as writing a contribution). That is, remove accounts whose users have not been showing any social interaction with their peers for a given time (e. g. a week, a month)
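
The last approach in the list is the easiest one to sketch. The following Python snippet is only an illustration under my own assumptions: the platform is assumed to record the time of each user's last productive action, and the 30-day limit is an arbitrary example value.

```python
from datetime import datetime, timedelta


def active_accounts(last_action, now, max_idle=timedelta(days=30)):
    """Return the users whose last productive action is recent enough.

    `last_action` maps a user name to the time of their last productive
    action (e. g. writing a contribution). `max_idle` is a hypothetical limit.
    """
    return {
        user
        for user, timestamp in last_action.items()
        if now - timestamp <= max_idle
    }
```

Accounts that are not in the returned set would be candidates for removal, which also makes it more tedious to keep a number of extra accounts alive just for rating one's own contributions.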

As I am only starting to think about valid mechanisms to ensure a community of trust, I welcome any ideas that expand on my own thoughts.

Thursday, June 14, 2007

More semantics for social networks!

Profiles are difficult to generate (you mostly need metadata). And profiles are difficult to match (takes a substantial amount of computing power). That is why we see a large variety of services on the Web which work based on communities, such as Qype. So far, it's been German only, but today, Qype has their UK launch party, so I'm quite excited about how this interactive city magazine will evolve. Recommendations in Qype and elsewhere are, then, based on who your friends or acquaintances are. That is, if you trust someone to write good reviews and add that person to your list, then you're regularly updated on what that person writes about.

What's even better is that some services, including blogs and other regularly updated sites that you like reading, have RSS feeds, which you may nicely feed into twitter - and then, you can get a mixture of interesting contributions (based on your "profile") on your mobile phone, wherever you are.

Besides these technical issues, what I learnt is that if you base your service on virtual communities, they need some real-world equivalent on the one hand (i. e. meeting in real life the people, or at least some of them, whose contributions you like). On the other hand, you need to take care of your community by giving its members some motivation to stay tuned.

These are all nice experiences, but what happens if your personal network grows too big? Then you probably need some more semantics to decide which contributions should be fed to you first. But I am only starting to think about all this, so any thoughts on this are welcome!

Monday, June 04, 2007

Geotagged media

Well, the association of resources with geocoordinates, also known as geotagging, is not new, but one of the reasons for me to write about Panoramio, a picture-sharing platform where photos are associated with locations, is its recent announcement of being taken over by Google (after the two had already cooperated closely). While Google does not yet reveal how this service will be integrated, it makes a lot of sense also from a user perspective to make Google Maps a more personalized experience. Looking at the world in hybrid mode (satellite pictures) is nice, but the pictures are not up to date. And considering the effort that Google is taking in photographing the world (well, at least some cities, as it seems), why not take advantage of the Google community taking digital pictures and uploading them for other users to look at?

Of course it's all about gathering user-related information, and I assume that what they will actually do with all this data will remain one of Google's best-kept secrets. Having Google Mail, Google Documents and now something that might be called Google Media, user context (time, location, interest) becomes as enriched as you could possibly think of. But hey, you are not forced to use any of these services, right?

Anyway, if pictures can be geo-tagged, any other kind of resource will also do. Videos, mash-ups, "office" documents, newspaper articles, podcasts, blogs - anything that may exist in digital form. And if you think mobile, you may get all these geo-tagged resources while walking by some marked location, or instantly leave your own photographs or videos right after shooting them.

Oh, I forgot about Orkut. This may be the foundation to share digital resources among persons you directly relate to, as an alternative to writing your friends a postcard or SMS from your vacation.

I am sure this is only the tip of the iceberg - many more possible scenarios I have not been thinking of yet ...

Wednesday, May 30, 2007

ReCaptcha, digital libraries and OCR

Over 10 years ago, digital libraries were a hot research topic. Back then, I participated in a number of projects, which finally led to my dissertation. Now, many attempts are known to make books available in digital form. For those sources which are not available in digital format, the only way seems to be OCR. Unfortunately enough, it is subject to errors, which cannot always be corrected automatically.

Enter ReCaptcha, a collaborative approach which helps protect web sites from spam (comparable to Captcha, but with real words instead of just a bunch of characters). The idea is to present two words to the user: one of them was correctly identified via OCR, the other one produced an error. Assuming that someone who is able to correctly identify the first word will also be able to correctly identify the second one, the set of words incorrectly identified via OCR can be reduced dramatically, as a side effect of the original purpose of Captchas. Here's a demonstration of how ReCaptchas work.
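
The two-word idea can be sketched in a few lines of Python. This is only my own illustration of the principle described above, not the actual reCAPTCHA implementation or API: the function names, the vote counter and the agreement threshold of three users are all assumptions.

```python
from collections import Counter


def check_answer(control_word, control_guess, unknown_guess, votes):
    """Accept the user only if the known (control) word was typed correctly.
    If so, record the guess for the unknown word as one vote."""
    if control_guess.strip().lower() != control_word.lower():
        return False
    votes[unknown_guess.strip().lower()] += 1
    return True


def resolved_word(votes, min_votes=3):
    """Return the most common guess once at least min_votes users agree
    (hypothetical threshold), otherwise None."""
    if not votes:
        return None
    word, count = votes.most_common(1)[0]
    return word if count >= min_votes else None


# Hypothetical example: the OCR knows "library" but failed on a second word.
votes = Counter()
for guess in ["digital", "digital", "digital"]:
    check_answer("library", "library", guess, votes)
print(resolved_word(votes))
```

A user who fails the control word is rejected and their guess for the unknown word is simply ignored, which keeps spam bots from polluting the transcription.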

And for those who would like to use ReCaptchas, Google Code offers plugins and libraries for the reCAPTCHA API. Well done!

Monday, April 16, 2007

Personal recommendations

Regarding recommender systems, the most familiar ones relate to audio content, i. e. personal radio. For instance, there's Pandora, which works based on manually generated metadata, and it also explains why a title was recommended. On the other hand, there's Last.fm, subtitled the social music revolution, which apparently uses some kind of collaborative filtering (a similar approach is already familiar from Amazon). In both cases, the service "learns" from user ratings to better serve the end user. Other recommendation approaches rely on communities or content analysis.
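
For readers who have not come across collaborative filtering before, here is a minimal user-based sketch in Python. I do not know how Last.fm or Amazon actually implement their recommendations, so this is just a textbook variant under my own assumptions: ratings are stored as a dictionary per user, users are compared with cosine similarity, and all names are made up for illustration.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two rating dictionaries (item -> rating)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Suggest items the user has not rated yet, weighting each other
    user's ratings by how similar that user is to the given one."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    # Highest score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]


# Hypothetical listening data: user -> {track: rating}.
ratings = {
    "me": {"track_a": 5, "track_b": 4},
    "friend": {"track_a": 5, "track_b": 4, "track_c": 5},
    "stranger": {"track_d": 5},
}
print(recommend("me", ratings))
```

In this toy example only the "friend" shares rated tracks with "me", so only the friend's unheard track gets suggested; the "stranger" has no overlap and contributes nothing.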

Now there may be several criteria for the popularity of recommendation-based systems, but those that are really built around such a feature (unlike Amazon, which uses it as an additional feature to better serve its customers) seem to depend on two core issues: the quality of their recommendations, as experienced by the end user, and the effort required to handle the available content (e. g. metadata management).

If IP-based entertainment services are to succeed, as compared to good old radio or television, personalization seems to be a must. I am sure that broadcasting companies, many of which are already working on providing their programs in digital archives, will understand this as an added value and, possibly, an opportunity to generate additional business.