Database Design and the Hellenistic Greek Lexicon

Over the last few days I have had some interesting conversations with people who design databases (including my eldest son) regarding what would be involved in designing a database to allow lexical information for Ancient Greek to be presented in an extremely flexible manner so that the user of the lexicon could use any form of a given Greek word as the lemma under which to organize the lexicon.

If you are familiar with database design, I would love to hear your comments on this issue. If you are not, I’d be glad to address any questions I’m competent to answer.

How do you think the database should be structured to support the kind of lexicon I have proposed?

10 Replies to “Database Design and the Hellenistic Greek Lexicon”

  1. I use SIL’s FieldWorks Language Explorer for my own work, though I haven’t done a whole lot with the lexicon in it yet since thus far my focus has been morphology, but there’s plenty of potential there. The information is structured, so there’s the freedom in adapting the presentation to whatever you might need at a particular moment.

  2. Hi Micheal,

    My first thought was the same as Mike’s. If FieldWorks (FW) does the work, why develop a custom-made database? About 70% of software development time goes to maintenance. That is a lot of time. You may have a perfectly good reason for it, but your lexicon design blog is not yet specific enough to let us evaluate what is missing from FW, other than the fact that it does not run on Macs.

    If in the end you decide to do it, then please make the database open source, and please make it work on a Mac also. 😉

    If you want to be able to do collaboration, maybe CouchDB could be a good database engine. Or are you thinking of an XML-based database?

    You say this will be a life-long project, which is good, since it will be.
    Just two ideas:
    1. I have found the big Liddell-Scott to be the best lexicon in existence, compared to Bauer, Kittel etc. Please include every category that is there in their lemma articles.
    2. Louw-Nida is very good for semantic analysis. Please include semantic domain information also. But please do not hesitate to improve upon their analysis.

    Since this is a big project, it might be a good idea to invite some Greek scholars to collaborate with you. Dictionary work can easily be threaded or given to many processors or professors. Thus the collaborators might work on some lexemes that you are not working on. The main thing is to agree on the procedure and principles. And you could retain the right to veto their work.

    Kari Valkama

    1. Semantic domain information will be included in what I hope to do. An electronic lexicon will be able to do much more with that than the print lexicon that Louw and Nida produced. Encoding the various semantic domains to which each word is connected will allow for some powerful ways to reconfigure the lexicon on the fly.
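
To make the idea of reconfiguring the lexicon by semantic domain concrete, here is a minimal sketch. It assumes each entry simply carries a list of domain labels. The labels used here ("communication", "reasoning", "release") are illustrative placeholders, not Louw and Nida's actual domain names or numbering, and `group_by_domain` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical entries. The domain labels are illustrative placeholders,
# not the actual Louw-Nida domains.
entries = [
    {"lemma": "λόγος", "domains": ["communication", "reasoning"]},
    {"lemma": "λέγω", "domains": ["communication"]},
    {"lemma": "λύω", "domains": ["release"]},
]


def group_by_domain(entries):
    """Build a domain -> lemmas index so the lexicon can be regrouped by domain."""
    index = defaultdict(list)
    for entry in entries:
        # A word connected to several domains is listed under each of them.
        for domain in entry["domains"]:
            index[domain].append(entry["lemma"])
    return dict(index)


domain_index = group_by_domain(entries)
for domain in sorted(domain_index):
    print(domain, domain_index[domain])
```

Because a word can belong to more than one domain, the same lemma may appear under several headings in the regrouped view.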

      There is one fundamental difference between what I am proposing and an electronic version of the Liddell-Scott lexicon: the new lexicon that I have in mind will be specifically Hellenistic. Liddell-Scott is an invaluable resource precisely because of the historical breadth it covers. I am shooting for something much more limited: a relatively synchronic lexicon of the Hellenistic Period. Of course, a perfectly synchronic lexicon is not possible if you want to cover the breadth of the Hellenistic literature, but I want to limit the lexicon as far as is reasonable to give an accurate picture of usage near the turn of the era. I envision a historical slice of no more than about 300 years.

      The project I propose will also differ fundamentally from current lexica of biblical Greek in that I intend to cover a much broader swath of the Hellenistic literature. I do not see this as a luxury, but as an absolute necessity for a valid lexicon. For too long we have been satisfied with tools that ignore the majority of the literature from the period. This is simply not acceptable.

      Of course you are right about the collaboration issue. I’m not sure how best to structure the tools necessary to encourage collaboration, but I would love to have the participation/cooperation/collaboration of several other scholars in this endeavor.

  3. Everything I do at is tested on MAC, Linux, and Windows. The lexicon will not be an exception to this.

    I am also a strong supporter of open source software. My first thought for the lexicon database was to do it in MySQL. Yep… editing it directly. At this point I’m thinking of making the database as flexible as possible so that I can issue API keys, etc. to allow it to work with other pieces of software that I have not yet envisioned.

    FieldWorks is a great piece of software, but it doesn’t run on a MAC. Since I do most of my work on a MAC, that’s a problem.

    I will respond more fully from home this evening. You have all presented some great ideas.

    1. FieldWorks is a great piece of software, but it doesn’t run on a MAC. Since I do most of my work on a MAC, that’s a problem.

      It will run on Mac in the future…unfortunately, the Linux development team and the Mac development team are the same person and the free OS takes priority for minority languages.

      1. Actually, I think working on the Linux version first makes sense. I’m all for increasing the use of Linux for academic computing. It’s a great operating system, very stable, and free. I have a Linux box at home that I use for testing and some basic computing.

        I’m writing on a Lenovo Thinkpad right now, but it doesn’t belong to me. I just find that working with Greek is a lot easier on the MAC. Still, I use all three of the most popular operating systems.

        I will be glad to see both the Linux and MAC versions of FieldWorks when they appear.

        My main concern right now, though, is the categories that should be available in the database, independent of what software is used to manage it. Let’s think just about verbs for a minute, and I think you’ll see what I mean. For each verb I would like to have a field for each of the “principal parts” plus one for each of the two infinitive forms. I would also like to have a separate field for each definition of the verb in question and fields for each of its arguments (AGENT, PATIENT, etc.).
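
As a rough illustration of the verb fields described above, here is a minimal sketch of a verb record, independent of any particular database software. All field names are hypothetical. It assumes the six traditional principal parts, and it assumes the "two infinitive forms" are the present and aorist active infinitives, which may not be what is ultimately wanted.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical names for the form fields a user could pick as the lemma.
FORM_FIELDS = (
    "present", "future", "aorist_active", "perfect_active",
    "perfect_middle_passive", "aorist_passive",
    "present_infinitive", "aorist_infinitive",
)


@dataclass
class Sense:
    definition: str
    # Semantic roles of the verb's arguments for this sense.
    arguments: List[str] = field(default_factory=list)


@dataclass
class VerbEntry:
    # One field per principal part; None marks a form that is not recorded.
    present: Optional[str] = None
    future: Optional[str] = None
    aorist_active: Optional[str] = None
    perfect_active: Optional[str] = None
    perfect_middle_passive: Optional[str] = None
    aorist_passive: Optional[str] = None
    # Assumption: the two infinitives are present and aorist active.
    present_infinitive: Optional[str] = None
    aorist_infinitive: Optional[str] = None
    # One Sense per definition.
    senses: List[Sense] = field(default_factory=list)

    def headword(self, form_field: str = "present") -> Optional[str]:
        """Return the form the user has chosen to organize the lexicon by."""
        if form_field not in FORM_FIELDS:
            raise ValueError(f"unknown form field: {form_field}")
        return getattr(self, form_field)


luo = VerbEntry(
    present="λύω", future="λύσω", aorist_active="ἔλυσα",
    perfect_active="λέλυκα", perfect_middle_passive="λέλυμαι",
    aorist_passive="ἐλύθην",
    present_infinitive="λύειν", aorist_infinitive="λῦσαι",
    senses=[Sense("loosen, release", ["AGENT", "PATIENT"])],
)

print(luo.headword())                     # present form as the lemma
print(luo.headword("aorist_infinitive"))  # aorist infinitive as the lemma
```

Keeping every form in its own field is what lets the same entry be listed under whichever form the user prefers as the lemma.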

        For nouns, a different set of fields would be necessary, of course. To make a data set as enormous as a dictionary of Hellenistic Greek run at an acceptable speed, this would mean the database would need to access different tables, one or more for verbs, another one or more for nouns, etc.
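
The split into separate tables could look something like the following sketch. It uses SQLite through Python's standard `sqlite3` module purely for illustration. The table and column names are hypothetical, only a few verb columns are shown, and `find_verb_senses` is an assumed helper, not part of any existing tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lexeme (
    id INTEGER PRIMARY KEY,
    part_of_speech TEXT NOT NULL
);
-- Verb-specific fields live in their own table.
CREATE TABLE verb_form (
    lexeme_id INTEGER PRIMARY KEY REFERENCES lexeme(id),
    present TEXT,
    future TEXT,
    aorist_active TEXT
);
-- Nouns need a different set of fields.
CREATE TABLE noun_form (
    lexeme_id INTEGER PRIMARY KEY REFERENCES lexeme(id),
    nominative_singular TEXT,
    genitive_singular TEXT,
    gender TEXT
);
-- Definitions are shared across parts of speech.
CREATE TABLE sense (
    id INTEGER PRIMARY KEY,
    lexeme_id INTEGER NOT NULL REFERENCES lexeme(id),
    definition TEXT NOT NULL
);
""")

conn.execute("INSERT INTO lexeme VALUES (1, 'verb')")
conn.execute("INSERT INTO verb_form VALUES (1, 'λύω', 'λύσω', 'ἔλυσα')")
conn.execute("INSERT INTO sense VALUES (1, 1, 'loosen, release')")
conn.execute("INSERT INTO lexeme VALUES (2, 'noun')")
conn.execute("INSERT INTO noun_form VALUES (2, 'λόγος', 'λόγου', 'masculine')")
conn.execute("INSERT INTO sense VALUES (2, 2, 'word, account')")

# Only these columns may be used as the lookup form.
VERB_COLUMNS = {"present", "future", "aorist_active"}


def find_verb_senses(conn, column, form):
    """Look up a verb's definitions by whichever form column is chosen."""
    if column not in VERB_COLUMNS:
        raise ValueError(f"unknown verb column: {column}")
    rows = conn.execute(
        f"SELECT s.definition FROM verb_form v "
        f"JOIN sense s ON s.lexeme_id = v.lexeme_id "
        f"WHERE v.{column} = ? ORDER BY s.id",
        (form,),
    ).fetchall()
    return [row[0] for row in rows]


print(find_verb_senses(conn, "aorist_active", "ἔλυσα"))
```

The column name is checked against a fixed set before being placed in the query, since column names cannot be passed as bound parameters.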

        What I’m interested in discussing is not the intricacies of software to do this, but what the fields ought to be. Regardless of how we manage the database, making the right decisions about fields at the outset will greatly reduce the headaches that can emerge later.

        What fields do you suggest for verbs, nouns, adjectives, adverbs, prepositions, etc.?

  4. Well, then to get back to your specific question of categories and fields…

    I would advocate distinguishing the pronouns so that those pronouns that refer to actual interlocutive participants (1st and 2nd person) are more clearly distinguished from the rest, rather than grouping 1st, 2nd, and 3rd person pronouns together. Both in function and in inflection, the 3rd person patterns more closely with the demonstrative than with the 1st & 2nd person.

      1. Yes. On the basis of semantics, the range of syntactic functions, and morphology, the third person forms are quite distinct from the first and second person pronouns.
