Tag Archives: semantic web

Challenges in linked data

I recently referenced Tim Berners-Lee’s encouragement to everyone looking to publish linked open data to use the Resource Description Framework (RDF).  I also referenced in this blog recent work completed by the New York Times in this field.  The New York Times initiative has attracted a good deal of comment in the technical community, identifying teething issues and errors in the data as published.

Stefano Mazzocchi’s recent post, Data Smoke and Mirrors, speaks to some of the issues associated with publishing lots of linked data using RDF.  Stefano has reviewed a triplification of all the data from data.gov – and has been left somewhat bemused.  The posting itself provides some examples.

The point here is that we want to see the data published, we want to see the standards used – but it’s far from simple and publishing for the sake of publishing or triplifying for the sake of triplifying may be self defeating.  As a community we need to focus on quality and the end user of the data.

Semantic web and the subprime crisis

Nice piece by Michael Cataldo outlining potential benefits of semantic web – in terms of making it easier to access data on the web and cross reference/ correlate the data.  Michael makes the point that fuller adoption of semantic web principles at an earlier date may have assisted in preventing some of the elements of the subprime crisis.

I am very much a fan of the semantic web and indeed of the movement towards linked open data.  However it is interesting to read reports of Tim Berners-Lee’s own frustrations with the pace of advances in linked open data, e.g. the fact that data is being published on data.gov in non-RDF formats (thereby limiting the ability of people to browse from this data to other RDF marked up data).

I think Michael Cataldo, in looking to demonstrate potential benefits of the semantic web, may be stretching things a little far with respect to the subprime crisis – were people motivated to make the data easily understood, or was obfuscation part of the intent?

Thinking about the scope of semantic web

Read an excellent summary paper by Mills Davis of Project 10X.  Interesting description of the ‘notion’ of semantic web: The key notion of semantic technology is to represent meanings and knowledge (e.g., knowledge of something, knowledge about something, and knowledge how to do something, etc.) separately from content or behavior artifacts, in a digital form that both people and machines can access and interpret.

Would recommend the summary paper to anyone looking to gain an insight into the semantic web 3.0 and its potential.

Semantic Web 1: Semantics – what is an ontology?

To a computer, the Web is a flat, boring world, devoid of meaning. This is a pity, as in fact documents on the Web describe real objects and imaginary concepts, and give particular relationships between them. For example, a document might describe a person. The title document to a house describes a house and also the ownership relation with a person. Adding semantics to the Web involves two things: allowing documents which have information in machine-readable forms, and allowing links to be created with relationship values. Only when we have this extra level of semantics will we be able to use computer power to help us exploit the information to a greater extent than our own reading.

- Tim Berners-Lee “W3 future directions” keynote, 1st World Wide Web Conference Geneva, May 1994

When we speak of web 3.0 and the semantic web we focus on computer processing/ understanding of web content.  Currently web sites are ‘marked up’ to make them easier for us as readers of the site to follow them.  Using HTML certain text is marked as a ‘header’, certain text is marked as ‘bold’, as indented, etc.  All of this facilitates us, as humans, in reading and following/ understanding the content.  But, more importantly, we understand much of the content based on our own knowledge, the context of each phrase/ sentence, etc.

So, how much of this data on the web could be processed (‘understood’) by computers, analysed and presented back to us as humans in a useful format e.g. categorised, annotated, summarised, ranked, etc?  Broadly there are two possible ways forward: software which can figure out what the content is about (Natural Language Processing etc.) or some additional ‘marking-up’ of the content – to flag what specific terms/ words/ phrases mean.

Natural Language Processing is a major area – a huge amount of research completed and ongoing, with major advances made over the years.

On the mark up front there have also been significant advances and product offerings.
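To give a feel for what the 'marking-up' route can look like, here is a minimal sketch using the house example from the Tim Berners-Lee quote above. It assumes statements are stored as simple (subject, relationship, object) triples, which is the basic idea behind RDF. All the identifiers (the deed, the house, the person) are made up for illustration, and this is plain Python rather than any real RDF library.

```python
# Each statement is a (subject, relationship, object) triple.
# All identifiers below are hypothetical, chosen only for illustration.
statements = [
    ("doc:title-deed-17", "describes", "house:17-main-street"),
    ("house:17-main-street", "ownedBy", "person:mary"),
    ("person:mary", "name", "Mary Byrne"),
]


def objects(subject, relationship, statements):
    """Return every object linked to `subject` by `relationship`."""
    return [o for s, r, o in statements if s == subject and r == relationship]


def owner_names_for_document(doc, statements):
    """Follow document -> house -> owner -> name, one typed link at a time."""
    names = []
    for house in objects(doc, "describes", statements):
        for owner in objects(house, "ownedBy", statements):
            names.extend(objects(owner, "name", statements))
    return names


print(owner_names_for_document("doc:title-deed-17", statements))
```

The point is that the program never 'reads' the deed. It only follows links whose relationship is stated explicitly, which is what the extra level of semantics makes possible.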

One core element in all of this computer processing/ ‘understanding’ is agreement on the meaning of terms/ concepts – hence the use of the term ‘semantics’.  We are all familiar with the phrase often used in trying to resolve/ advance arguments: ‘it’s a question of semantics’.  Generally the intent of the phrase is to say that the antagonist and protagonist agree conceptually but that much of the disagreement is accounted for by misunderstanding/ different understanding of the terms being used by either party.

Dealing with concepts, their relationships and meanings is addressed using ONTOLOGIES.  The semantic web has given rise to a whole field in the development, publication and maintenance of ontologies.  Rather than trying to explain ‘ontologies’ in detail here I think this short video – focused on introducing ‘biomedical ontologies’ – does a great job of explaining the concept of and use for ontologies.
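As a rough illustration of what an ontology buys you, the sketch below holds a tiny, invented biomedical-style hierarchy as triples and lets a program work out facts that were never stated directly. The class names and the `type` and `subClassOf` relationship names are assumptions made for this example; real ontologies are far richer and are normally written in dedicated languages such as OWL.

```python
# Hypothetical mini-ontology: every name below is invented for illustration.
SUBCLASS_OF = "subClassOf"
TYPE = "type"

triples = {
    ("Aspirin", TYPE, "Analgesic"),
    ("Analgesic", SUBCLASS_OF, "Drug"),
    ("Drug", SUBCLASS_OF, "ChemicalSubstance"),
    ("Aspirin", "treats", "Headache"),
}


def superclasses(cls, triples):
    """Return every class that `cls` is a direct or indirect subclass of."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def classes_of(entity, triples):
    """Return the stated classes of `entity` plus those implied by the hierarchy."""
    direct = {o for s, p, o in triples if s == entity and p == TYPE}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return direct | inferred


print(sorted(classes_of("Aspirin", triples)))
# prints ['Analgesic', 'ChemicalSubstance', 'Drug']
```

Nobody stated that Aspirin is a Drug. The program concludes it from the agreed hierarchy, and that shared agreement on what the terms mean is the job an ontology does.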

Practical use for semantics

We spend a great deal of time ourselves online trying to find information, comparing and contrasting data from different web sites.  A number of us are well used to using sites such as www.kayak.com to assist in checking out travel options.

Read an interesting piece on www.cazoodle.com.  For now offering comparison shopping re electronic goods and apartment rental (in US).

Authors claim to be using the power of their semantic search engine to extract the relevant data from multiple sites to present detailed product purchasing options and comparisons.  In presenting the apartment data they include very good mashups to present the locations.  In the case of electronic goods still seems to me that there is a lot of scope for variation in the additional items e.g. additional memory for a camera.  However, even allowing for this, certainly shows the power of applications which can process data presented on web sites – and that is a basic objective for web 3.0/ semantic web.

Driving success of semantic web

Read an interesting survey re traction around the semantic web.  Listed a number of barriers to adoption of semantic web:

  1. organisational culture
  2. the complexity of the technology
  3. a general lack of experts
  4. a lack of success stories
  5. a lack in quality of available software and
  6. the problem to quantify the benefits

I thought it would be interesting to consider each of these in some more detail in a series of postings – designed to assist in promoting a greater understanding of semantic web and its potential use.  Would welcome any feedback/ ideas on this subject.

The referenced survey targeted a fairly technical, web-savvy group across Europe.  Am keen to engage more directly with business people – amongst many of whom I am not sure there is a clear understanding of, or interest in, the semantic web.

Is the person and technology becoming one?

Have just spent a couple of weeks on vacation – without broadband access at my fingertips.  Continued to monitor email and SMS – from my phone.  Probably online three times over the fortnight – had to make an effort.  Posted a few photos to facebook from the phone.

Real difference was not interacting with twitter and other social networks on a regular basis throughout the day.  Also – listened to the radio for news and read a few newspapers.

Just watched Kevin Kelly video/ presentation on future of the web.  KK (of Wired) sees the internet as one computer.  We use various devices to access the one computer.  ‘Things’ e.g. cars, clothes, devices which incorporate chips (e.g. RFID) are effectively part of the one computer.  And, indeed, we are in many respects sensors for this one computer – as more and more information ends up in the one computer.

This is enough to scare off a lot of people.  In the Q&A session KK fields a number of interesting questions, including what are the opt out options, is the one computer and the human race in conflict?  Interestingly seems that most people are happy to go along with what’s happening.  He has a great line ‘No personalisation without transparency’.  Effectively you have to open up, provide information about yourself, your business, whatever, if you want a personalised experience.

This morning read a posting about Gordon Bell – a Microsoft researcher who is attempting to record everything in his life digitally.

Interesting line in this from GB: ‘By using e-memory as a surrogate for meat-based memory, he argues, we free our minds to engage in more creativity, learning, and innovation (sort of like Getting Things Done without all those darn Post-its)’.

I have often thought that this is the case.  An example being that sometimes overprep for a meeting (reading all the material, anticipating the questions, etc) results in a less creative, open discussion.  Another example would be whether examinations are still bogged down in being largely tests of memory rather than tests of reasoning.

All of this relates closely to one of my own areas of primary interest – linked data and the semantic web.  Linked data requires entities to share more data – for the benefit of being able to correlate this with other shared data.  The semantic web aims to enable ‘intelligent’ processing of data by computers – i.e. the one computer referenced by KK.

I think KK is right.  The one computer is more and more a fact of life.  There are many benefits – and a number of threats.  While there are opt outs – and ways to escape e.g. go and live on a deserted island off the west coast of Ireland – inevitably the internet continues to be more pervasive (and invasive).

Looking forward to another few days of restricted broadband access.  And then back to life interacting with the one computer.

Ireland – leading the way in eLearning and semantic web

Spent the morning at a workshop run by DERI (Digital Enterprise Research Institute) at Enterprise Ireland.  If we spent more time focusing on what we can achieve through the likes of DERI and the Irish Learning Alliance (ILA) we might begin to dig ourselves out of our current difficulties.

Excellent presentations by Johnny Parkes, Bill McDaniel, Liam Moran and Mark Leyden.

Web 3.0 – in terms of getting at the data across the web – has great potential.  It poses interesting challenges/ questions for organisations traditionally obsessed with the confidentiality of their data.  However, for those who understand and resolve the conundrum (sharing their data), web 3.0 offers the potential of much greater insights and better decision making.

Case study – social networking in travel industry

Contributed to a case study in the Innovation section (pp42 – 44, Experts’ Advice – P44) of today’s Irish Times – looking at how a ski adventure company could use social networking to market their business.

Text of my advice in the case study:

BlackRun: Online for off piste

This is a typical 2009 scenario in Irish business – someone from the Facebook generation (‘gen f’) bringing ideas about social networking to the owners. The concerns are classic: fad or not, geeky or not? Simone is right – at least half BlackRun’s target audience is social network friendly. So it’s a ‘no brainer’ – need to get on board. The good news: with some upfront planning this can be achieved, without swamping the team.

BlackRun needs a basic web site, optimised for search – integrated with a blog (could use software such as WordPress). Ruth & Simone need to set targets for blog posting frequency e.g. 3 times per week. Team members should be profiled in the blog and encouraged to post. Twitter, LinkedIn and Facebook accounts should be established – using auto notification of postings on the BlackRun blog. Worthwhile Twitter accounts should be identified and ‘followed’. BlackRun should aim to tweet daily – ask questions, answer queries, use hashtags. Facebook advertising should be considered.

There are great tools available to assist in managing online presence e.g. google webmaster, WordPress utilities, Tweetdeck, Nexus (Facebook). BlackRun needs to avail of these.

Finally, management should commit to measuring the effectiveness of these initiatives on a weekly basis.

Barry O’Gorman consults in social networking, collaboration and semantic web.