Approximative Queries on Semistructured Corpora
This dissertation project has a long history. It grew out of a position at Regensburg University, where I investigated how historians could use digital corpora. I found that the most fundamental precondition was the availability of a search engine, and in my Master's thesis I set out to write one. My corpus was the PHI / TLG corpus of Greek and Latin texts. Unfortunately, this corpus did not seem to be receiving any further development, although I had heard that there were plans to convert it into XML. But that was a matter for the period after I had left the university.
Later I decided that it was worth taking another look at the topic and redesigning my search engine from the ground up so that it could search non-orthographic, XML-encoded historical texts.
Eventually, after meandering through various sub-issues, I ended up modifying the open source XML database BerkeleyDB-XML by adding an approximative matching function as part of XPath.
See my list of publications.