Warning: this is a fairly geeky post, with some technical stuff that may cause brain explosion. You’ve been warned.

My PhD work is focused on virtual patients; I’m particularly interested in how we can use the semantic web to create more flexible, smarter virtual patient systems.
But why bother with semantic web technologies when a simple relational database could do? Well, the fact is that semantic web technologies have several benefits over relational databases for the design of web-based virtual patients.

First, the semantic web is built on common standards that are well formalized and clearly designed to be used independently of any particular implementation. Thus, the data is understood in the same way across different systems. By contrast, SQL, despite being a standardized language, is not implemented in the same way by all software vendors, which makes it harder to share data and queries between systems. Classic databases are also not designed to work on the web specifically, in the sense that you normally can’t query an SQL database from the web directly. The semantic web, on the other hand, provides SPARQL endpoints to do just that.
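To give an idea of what querying an endpoint over the web looks like, here is a minimal sketch in Python. It only builds the request URL, following the SPARQL protocol convention of passing the query in a `query` parameter, and sends nothing over the network. The endpoint address and the `build_query_url` helper are assumptions made up for this illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address, used only for illustration.
ENDPOINT = "http://example.com/sparql"

# Ask for every resource typed as a virtual patient.
QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?patient WHERE {
  ?patient rdf:type <http://example.com/VirtualPatient> .
}
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return the GET URL that carries a SPARQL query to an endpoint."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)

# Decoding the URL again gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Any HTTP client could then fetch that URL, which is the point: the query travels over plain web infrastructure.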
Then, RDF (one of the building blocks of the semantic web) provides clear semantics for making simple statements about virtual patients (or anything else for that matter). The statement “Mr. X is a virtual patient” can be represented using a very simple triple:
<http://www.example.com/> rdf:type <http://example.com/VirtualPatient>
With a relational database, there are a number of ways to represent this statement, which depend on each individual developer’s choices and vary widely in complexity and clarity. As a consequence, doing anything complex with the data, such as generating dynamic feedback (as I’ll demonstrate later), also becomes very idiosyncratic, and it’s easy to end up with a system that only you and your team can understand and maintain. This is prohibitive in a world where collaboration is badly needed due to lack of resources.
Sadly, this is the current state of affairs in the field of virtual patients: ad hoc systems designed to work on a given platform, in a given medical school. The efforts to standardize virtual patients using XML with MedBiquitous have gone a long way towards changing that, but mere XML lacks the powerful features available with the semantic web.
Finally, the interoperability and clarity offered by the semantic web allow us to reuse data from external sources on the web to enrich the data we already have about the virtual patient. You can query knowledge from DBpedia, Freebase and others and use that to generate feedback or add pictures and information about specific symptoms or conditions. Of course, you need to provide a clear way for virtual patient authors to verify the knowledge that’s pulled from these external sources, as it might not be accurate.
What the model looks like
To see if the semantic web can actually generate useful feedback, I created a very basic model to represent a virtual patient and the knowledge we have about it.
This picture shows, for instance, how to represent the following interconnected statements in a machine-readable manner:
- Mr Smith is a Virtual Patient,
- Mr Smith has a Muscle Ache,
- Mr Smith has Fever,
- Muscle Ache is a symptom of the Flu,
- Fever is a symptom of the Flu,
- Mr Smith has the Flu.
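The statements above can be sketched as plain (subject, predicate, object) triples. This is only an illustration in Python, not the actual model: the `http://example.com/` namespace, the property names `hasSymptom`, `symptomOf` and `hasCondition`, and the `match` helper are all assumptions I made up for the example.

```python
# Hypothetical namespace and property names, chosen for illustration only.
EX = "http://example.com/"
RDF_TYPE = "rdf:type"

# Each statement from the list is one (subject, predicate, object) triple.
triples = {
    (EX + "MrSmith", RDF_TYPE, EX + "VirtualPatient"),
    (EX + "MrSmith", EX + "hasSymptom", EX + "MuscleAche"),
    (EX + "MrSmith", EX + "hasSymptom", EX + "Fever"),
    (EX + "MuscleAche", EX + "symptomOf", EX + "Flu"),
    (EX + "Fever", EX + "symptomOf", EX + "Flu"),
    (EX + "MrSmith", EX + "hasCondition", EX + "Flu"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# All the symptoms recorded for Mr Smith.
symptoms = [o for _, _, o in match(triples, s=EX + "MrSmith", p=EX + "hasSymptom")]
```

The same pattern-matching idea is what a SPARQL query does against a real triple store, only with far more expressive patterns.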
Representing Students’ Actions
To generate feedback, we have to design a model representing what the student is doing. This picture shows a model of how we represent the fact that Tim, the student, has asked a question about aching or stiffness.
The model is based on the notion of a “work session”. A session is linked to a virtual patient and to a student, which means that every time a student logs in and starts a virtual patient, a new, unique session is created. This also means that if a student stops working and logs off, all the previously asked questions are still on record and can be displayed the next time he or she comes back.
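Here is one way the work session idea could be sketched in Python. It reflects my reading of the description above rather than any real implementation: the `WorkSession` and `SessionStore` names, the integer session ids and the `resume` method are assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import count

# Simple counter used to hand out unique session ids (an assumption).
_session_ids = count(1)


@dataclass
class WorkSession:
    """Links one student to one virtual patient and records the questions asked."""
    student: str
    patient: str
    session_id: int = field(default_factory=lambda: next(_session_ids))
    questions: list = field(default_factory=list)

    def ask(self, question: str) -> None:
        self.questions.append(question)


class SessionStore:
    """Keeps sessions around so that earlier questions survive a log-off."""

    def __init__(self):
        self._sessions = {}

    def start(self, student: str, patient: str) -> WorkSession:
        # Starting a virtual patient always creates a new, unique session.
        session = WorkSession(student, patient)
        self._sessions[session.session_id] = session
        return session

    def resume(self, session_id: int) -> WorkSession:
        return self._sessions[session_id]


store = SessionStore()
session = store.start("Tim", "Mr Smith")
session.ask("Aching or stiffness")

# Later on, the recorded questions can be retrieved and shown again.
previous_questions = store.resume(session.session_id).questions
```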
Using the data linked to the student’s work session, we know everything the student has done. And because of the information we have about the patient’s symptoms, we can also infer that the student should probably think about flu as a potential diagnosis. If we add more data, we can infer many other plausible conditions that the student should look at, and generate feedback dynamically based on each student’s choices.
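As a rough illustration of that inference step, here is a small Python sketch. It is deliberately naive and is not how the real feedback engine works: the dictionaries, the mapping from questions to symptoms and the `candidate_diagnoses` function are all assumptions invented for this example.

```python
# Hypothetical knowledge about symptoms and conditions.
symptom_of = {"MuscleAche": {"Flu"}, "Fever": {"Flu"}}

# Hypothetical record of what each virtual patient suffers from.
patient_symptoms = {"MrSmith": {"MuscleAche", "Fever"}}

# Hypothetical mapping from a question to the symptom it explores.
question_about = {"Aching or stiffness": "MuscleAche"}


def candidate_diagnoses(patient, asked_questions):
    """Return the conditions suggested by the symptoms the student has uncovered."""
    explored = {question_about[q] for q in asked_questions if q in question_about}
    uncovered = explored & patient_symptoms[patient]
    conditions = set()
    for symptom in uncovered:
        conditions |= symptom_of.get(symptom, set())
    return sorted(conditions)


def feedback(patient, asked_questions):
    """Turn the candidate conditions into a short hint for the student."""
    conditions = candidate_diagnoses(patient, asked_questions)
    if not conditions:
        return "Keep asking questions to uncover more symptoms."
    return "You may want to consider: " + ", ".join(conditions)


hint = feedback("MrSmith", ["Aching or stiffness"])
```

With more symptoms and conditions in the data, the same lookup would return several candidates instead of one.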
I will explain how this feedback generation works in more detail in my next post. In the meantime, please leave a comment if you have ideas about what a model of a virtual patient should include, or if you have any other questions.

