Wikiquote:Themebot

ThemeBot is a proposed automatic theme-generator bot.

Algorithm
This bot would do the following:
 * For each page linked from List of people by name (except for List of people by occupation):
 * Parse each quote or Transliteration / Translation.
 * For each noun or verb (or some subset to be decided on):
 * If a theme page does not exist for that word, create one.
 * Change the noun or verb on the person page into a link to the theme page (i.e. prefix it with [[ and suffix it with ]]).
 * If the quote from the person page does not yet exist in the list of quotes on the theme page, add it in alphabetical order. (If another quote already exists on the theme page with more than 90% of the same words, replace it instead of adding a near duplicate. This would prevent the same quote being added if minor changes were made to it such as fixing of typos.)
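
The last step above (add in alphabetical order, but replace a near duplicate) could be sketched roughly as follows. This is only an illustration, not a proposed implementation: the names `word_overlap` and `upsert_quote` are made up here, and "90% of the same words" is read as the share of distinct words two quotes have in common, measured against the larger of the two word sets. Other readings of the rule are possible.

```python
import re

SIMILARITY_THRESHOLD = 0.9  # assumed reading of "more than 90% of the same words"


def words(quote):
    """Lower-cased words of a quote, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", quote.lower())


def word_overlap(a, b):
    """Share of distinct words two quotes have in common (0.0 to 1.0)."""
    set_a, set_b = set(words(a)), set(words(b))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / max(len(set_a), len(set_b))


def upsert_quote(theme_quotes, new_quote):
    """Return the theme page's quotes with new_quote added in alphabetical order.

    Any existing quote that overlaps new_quote by more than the threshold
    is dropped, so the new version replaces it instead of sitting beside it.
    """
    kept = [q for q in theme_quotes
            if word_overlap(q, new_quote) <= SIMILARITY_THRESHOLD]
    kept.append(new_quote)
    return sorted(kept, key=str.lower)
```

With this reading, a one-word typo fix in a fifteen-word quote counts as a replacement, while the same fix in a five-word quote falls below the threshold and is added as a separate entry.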

Note that quotes could be added to theme pages manually and they would not be deleted by the ThemeBot.

If this were created and worked well, it could be possible to add quotes based on related terms rather than exact word matches, e.g. a quote containing the word Christianity could be added to the Religion theme page. The link to the theme page would then be [[Religion|Christianity]].
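
A rough sketch of the related-terms idea, assuming the bot is given a hand-maintained lookup table from terms to theme pages. The table contents and the name `link_themes` are hypothetical; the only thing taken as given is the wiki's piped link syntax, [[Page|label]]. Text that is already linked is left alone.

```python
import re

# Hypothetical lookup table: related term (lower case) -> theme page title.
RELATED_TERMS = {
    "christianity": "Religion",
    "religion": "Religion",
}

# Matches either an existing [[...]] link or a plain run of letters.
TOKEN = re.compile(r"\[\[.*?\]\]|[A-Za-z]+")


def link_themes(quote, related_terms=RELATED_TERMS):
    """Wrap each known term in a link to its theme page."""
    def to_link(match):
        text = match.group(0)
        if text.startswith("[["):
            return text  # already a link: leave untouched
        theme = related_terms.get(text.lower())
        if theme is None:
            return text  # not a known term
        if theme == text:
            return "[[" + text + "]]"
        return "[[" + theme + "|" + text + "]]"
    return TOKEN.sub(to_link, quote)
```

Keeping the original word as the link label means the quote still reads exactly as it was entered.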

Issues with the ThemeBot
Add Pros, Cons, and Retorts as appropriate.
 * Pro: most of the manual creation of theme-based pages could be automated, which would allow us to concentrate on increasing the number of person pages.
 * Con: too many themes may easily be created, swamping Wikiquote with thousands of themes.
 * Retort: initially we could restrict it to creating theme pages for a smallish, specified list of words. This list could be increased over time. Maybe we could have a Requested Themes page to see what themes people want.
 * Retort: A requested theme page is rather pointless, as one could just as easily add the theme manually.
 * Also: the percentage of theme based pages in Wikiquote as a whole would decrease as the number of people pages increased into the thousands.
 * Con: duplicate or near duplicate quotes may still be added if the number of same words drops below 90%.
 * Retort: these could be removed manually when noticed, and would not be added again by the ThemeBot.
 * Con: not all quotes would be parsed correctly if they were not entered according to the template.
 * Retort: if in doubt, ThemeBot could ignore them.
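
The retort about restricting the bot to a smallish, specified list of words could look something like this. It is a sketch only: the list contents and the name `candidate_themes` are invented for illustration, and no attempt is made to tell nouns from verbs.

```python
import re

# Hypothetical starting list; it could be extended over time.
APPROVED_THEMES = {"love", "war", "religion"}


def candidate_themes(quote, approved=APPROVED_THEMES):
    """Return the approved theme words found in a quote, in order, without repeats."""
    found = []
    for word in re.findall(r"[a-z]+", quote.lower()):
        if word in approved and word not in found:
            found.append(word)
    return found
```

Anything not on the list is simply ignored, so the number of theme pages can never exceed the length of the list.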

And a <-- pagetheme: theme1, theme2; --> + <-- pagealso: synonym1, antonym1; --> for a page-wide theme. As soon as a page is submitted (Save page), the ThemeBot could parse that one page and update the links, and the page would have meaningful themes.
 * Alternative: Unless the bot uses a lookup list for synonyms, the number of generated themes will be silly if all nouns are used. And with multiple languages to be included, how will this be parsed, as indeed Marco d'Itri remarked below? Suggestion: use a visible tag for the quotes themselves:
 * See also: more pages on Theme
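
A minimal sketch of how the page-wide tags could be read on save, assuming they are written exactly as in the example above (tag name, colon, comma-separated items, optional trailing semicolon). The function name `parse_page_tags` is made up for illustration.

```python
import re

# Matches e.g. "<-- pagetheme: theme1, theme2; -->"
TAG_PATTERN = re.compile(
    r"<--\s*(pagetheme|pagealso)\s*:\s*(.*?);?\s*-->", re.DOTALL
)


def parse_page_tags(page_text):
    """Collect the items of every pagetheme / pagealso tag on a page."""
    tags = {"pagetheme": [], "pagealso": []}
    for name, body in TAG_PATTERN.findall(page_text):
        for item in body.split(","):
            item = item.strip()
            if item and item not in tags[name]:
                tags[name].append(item)
    return tags
```

Because the themes are stated outright by the contributor, no guessing from nouns and verbs is needed for pages that carry these tags.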

Hooloovoo 18:20 Jan 2, 2004 (UTC) sorry for the expansive addition :)

Votes for or against the concept

 * For: Nanobug, Formulax, Angela
 * Against: Fonzy (for the moment; maybe in a year), LittleDan (all bot-made content is IMO worthless), Marco d'Itri (parsing natural language for concepts is hard... what about adding hidden theme metadata to quotes or whole pages?), Basil Fawlty (themes will be added as needed, such is the nature of a wiki), Gaurav, Kalki, Technopilgrim

Current status
In limbo: if and when a significant majority of people are in favour of the proposal, design and then coding will commence. This seems unlikely to happen in the short to medium term.

Comments

 * "People willing to help with the coding." What does that mean? I'm willing to help, but I don't know much PHP (I'm assuming that's what's going to be used?). I'm kinda learning as I go with PHP. In short, I'm willing, but I don't know if I'm useable. --Sasha--


 * A bot cannot possibly have enough intelligence to determine the theme of a quote just by parsing it for nouns and verbs. Is the following about God or love? No: "My God, I love Monet." It would make much more sense to allow the contributor to create links in the quote based on its theme. A bot should simply check each theme page for "what links here" and grab the appropriate quote. "We will fight them on the beaches..." is not about beaches! It would better look like "We will fight them on the beaches..." A bot could easily determine that the quote applies to the themes England, WWII, and war by checking "what links here" on those theme pages. --Steve Sliva--
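
The link-based approach in the comment above could be sketched as follows: the bot does no language parsing at all and only collects the links contributors have already placed inside quotes. This assumes the usual [[Page]] and [[Page|label]] link syntax; the names `themes_of` and `build_theme_index` are hypothetical, and a real bot would more likely ask the wiki for "what links here" than scan text itself.

```python
import re
from collections import defaultdict

# Captures the target page of [[Page]] or [[Page|label]].
LINK_PATTERN = re.compile(r"\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def themes_of(quote):
    """Theme page titles a contributor linked from inside the quote."""
    return [target.strip() for target in LINK_PATTERN.findall(quote)]


def build_theme_index(quotes):
    """Map each linked theme title to the quotes that link to it."""
    index = defaultdict(list)
    for quote in quotes:
        # dict.fromkeys drops repeated links while keeping their order
        for theme in dict.fromkeys(themes_of(quote)):
            index[theme].append(quote)
    return dict(index)
```

A quote with no links ends up on no theme page, which matches the point that the contributor, not the bot, decides what a quote is about.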

People willing to help with the coding

 * Nanobug

Implementation considerations

 * Frequency of bot: off-peak so as not to interfere with normal editing.
 * Output: strictly limited initially and only increased slowly over time.
 * Language written in: ?.

Additional Links

 * Wikipedia:Bots - bots are bad, at least some of the time.
 * Rambot - the classic Wikipedia bot.