FIELD MARKS FOR WEBSTER 1913 and CIDE
=====================================

Tagset.web: Explanations of the tags used to mark the Webster 1913
dictionary and the CIDE (Collaborative International Dictionary of
English).

Note that the list of tags used to mark the public domain version of
this dictionary is shorter than the full set described here. If any
tag is not listed here, it is either (1) one of the "point" (font
size) or "type" (font style) tags, which should be self-explanatory;
or (2) a functional field with no effect on the typography.

Last modified March 12, 1999.

For questions, contact:
    Patrick Cassidy
    cassidy@micra.com
    735 Belvidere Ave.
    Plainfield, NJ 07062
    (908) 561-3416 or (908) 668-5252
-------------------------------------------------------------

A separate file, webfont.asc, contains the list of the individual
non-ASCII characters represented by either higher-order hexadecimal
character marks (e.g., \'94, for o-umlaut) or by entity tags (e.g., .

Note: The tags on this list are similar in structure to SGML tags.
Each tag on this list marks a field; each field opens with a tagname
between angle brackets thus: , and closes with a similar tag
containing the forward slash thus: . No tags are used without closing
tags. Thus the HTML
to indicate a line break is symbolized here as an entity,
has a corresponding

. The absence of an end-field tag, or the presence of an end-field tag
without a prior begin-field tag, constitutes a typographical error, of
which there may be a significant number. Any errors detected should be
brought to the attention of PJC or the appropriate editor.

Most of the tagged fields are presented in the text in italic type,
with a number of exceptions. Where a word is contained within more
than one field, the innermost field determines the font to be used.
Wherever recognizable functional fields were found, an attempt was
made to tag the field with a functional mark, but in many cases words
were italicised only to represent the word itself as a discourse
entity, and in some such cases the "italic" mark was used, implying
nothing regarding functionality of the word.

The base font is considered "plain". Where an italic field is
indicated, parentheses or brackets within the field are not
italicised. Where no font is specified for a tag, the tag is merely a
functional division, and was printed in plain font unless otherwise
tagged. This type of segment is marked by an asterisk (*) where the
font name would be.

The size of the "plain" font in the original text is about 1.6 mm for
the height of capitalized letters.

=============================================================
Explicit typographical tags:

These were used where the purpose of a different font was merely to
distinguish a word from the body of the text, and no explicit
functional tag seemed appropriate.
-----------------------------------
Tag     Font
-----------------------------------
Explicit formatting tags:
. . . . . . . . . . . . . . . . . .
plain font (that used in the body of a definition) -- normally not
        marked, except within fields of a different font.
italic (in master files)
italic (for use in HTML presentation)
bold (in master files)
bold (for use in HTML presentation)
small caps (used mostly for "a. d.", "b.
c.")
This is the same font as , but has no functional or semantic
        significance
A squared bold font without serifs approximating the "universe bold"
        font on the HP Laserjet4, slightly larger than the capitals in
        a definition body. Used in expositions describing shapes, such
        as "Y", "T", "U", "X", "V", "F".
Font the same as the headword , though the field is not a headword.
        Used only once.
subscript
superscript
superscript
<...type> A series of tags, many unique, designating certain unusual
        fonts, such as "bourgeoistype" for "bourgeois type", in the
        section on typography.
Sans-serif font
A series of point size markers, many unique.
Vertically organized column.
Vertically organized column -- only part of a table which needs to be
        completed. Used once.

=============================================================
Tags with semantic content:
. . . . . . . . . . . . . . . . . . . . . . . . . . .
*       Alternative spelling segment. Almost always contained within
        square brackets after the main definition segment. Expository
        words such as "Spelled also" are in plain font; the actual
        alternative spelling is marked by ... tags within this segment.
italic  Antonym.
italic  Alternative spelling. The actual word which is an alternative
        spelling to the headword. These are functionally synonyms of
        the headword. In most cases these also occur as headwords,
        with reference to the word where the actual definition is
        found, but not all such words are listed separately,
        particularly if the spelling is close enough to the headword
        to be found at the same point in the dictionary. Whether
        listed separately or not, these words should be indexed at
        this location, also.
italic (may be right-justified. See the section on formatting).
        Authority or author. Used where an authority is given for a
        definition, and also used for the author, where a quotation
        within double quotes is given in the same paragraph as the
        definition.
The double quotes are indicated by the open-quote (\'bd) and close-quote (\'b8). In both cases, it is typically right-justified, almost always fitting on the same line with the last line of the definition or quotation. Within collocation segments, it is usually used only after quotations, and is not right- justified, except occasionally where it would be close to the right margin, and then apparently is is right-justified. We have not explicitly marked those which are right-justified, but they can be recognized because they are on a line by themselves, preceded by two carriage returns. * Marks a biography. Should be longer than a short mention of who a person was, which is typically included as a definition. italic Marks the name of a book, pamphlet, or similar document. * A field of knowledge which of which the headword is a division. * tags the CAS (Chemical Abstracts Service) registry number for a chemical substance. italic tags the infectious disease caused by the headword. Implied type of the agent is a microorganism, and the tag must mark a disease. * Same as without the italic type. * Same as without the italic type. italic inverse of causes: tags the causative agent of an infectious disease, which is the headword . the tag must mark a microorganism, virus, or prion, and the implied type of the headword is a disease. Used only for The single letter in the headers to each letter of the alphabet. * marks the proper name of a city. Used only occasionally and not consistently at this stage. italic Converted to: used to tag substances which are products prepared by conversion from the headword. Usually chemicals. Rarely used up to 1998. italic Composed of. Tags a substance of which the headword is at least partly composed. The substance may be particulate, such as diatoms composing diatomaceous earth. italic Contrasting word. Not exactly an antonym, which is marked , but a contrasting word which is often introduced as "opposite to" or "contrasts with". 
italic Collocation reference. A reference to a collocation. Each such collocation should have its own entry, marked by ... tags, and these references should function as hypertext buttons to access that entry. * Date-with-year tags a date containing a year. * definition. The definition may have subfields, particularly (an illustrative phrase starting with "as" or "thus" and containing the headword (or a morphological derivative). The , \'bd...\'b8 quotations (left and right double quotes) and fields may be found within a definition field, but should and usually are located outside the definition proper. The marking macro was inconsistent in this placement, and the exclusion of the , and quotations needs to be completed by the proof-readers. Certain definitions contain fields within them, where the headword is an irregular derivative of another headword. In these cases, the field follows immediately after the tag, and these entries do not have a separate field. In such cases, the field is italic, as usual. * Marks an education institution, a subtype of organization.
Just a place-holder for illustrations, but seldom used.
italic  Marks the name of a movie film.
italic  Field of specialization. Most often used for Zoology and
        Botany, but many "fields of specialization" are marked for
        technical terms. The parentheses are usually within this
        field, but are not themselves in italics.
*       Hypernym. Points to the hypernym from WordNet 1.5. Initially,
        used only for entries extracted from WordNet 1.5. Not present
        in the original 1913 version.
*       Illustration place-holder. Seldom used.
*       Designates one item in a row of a table. Used only when
        intervening spaces do not serve properly as natural field
        separators.
*       Always a filled rectangular array.
*       Music figure. Only in a note under the entry "Figure", the two
        numbers of each such field are bold, 20 point type, stacked as
        in a fraction with a bar between them, but also having a
        horizontal stroke midway through each numeral. Unique to this
        entry.

* paragraph tag, used always in pairs. Line breaks may be embedded inside the paragraphs. * Marks the name of a publication other than book, which is marked by . * Collocation, plain text -- used to tag phrases that should be parsed as a unit, but has no typographical significance. italic Always right-justified, as described for . * Marks the set of references used for a longer article such as a biography. * Right justified * Designates a row in a table. * Supra. The two parts of each such field are stacked, one over the other, *without* a horizontal bar between (as in a fraction). Used only in one entry, for a musical notation. * Always a filled rectangular array, having and elements. ==================================================================== Functional Tags -------------------------------------------------------------------- Tag Font Meaning (Comparatives are relative to the plain font.) ----------------------------------------------------------------------- <-- --> * Comment, not a tag. These segments should be deleted from the written or printed text. Page numbers of the original text are indicated within such comments; these may be left in, if desired. * HTML-style comment. Used to indicate page numbers in the public domain version. small caps Tags for the actual adjective or adverb comparatives or superlatives. Should be indexed. See also conjf (verbs) and decf (nouns). italic Alternative name. Usually for plants or animals, but also used for other cases where words are introduced by "also called", "called also", "formerly called". These are functionally *synonyms* for that word-sense. italic Same as , but the marked word is a plural form, whereas the headword is singular. * Adjective morphological segment, primarily the comparative and superlative forms. The occasional adverb morphology is also tagged this way. * A segment occurring within the definitional sentence, providing an example of usage of the headword. 
Not conceptually a part of the actual definition.
smaller spacing
        Collocation definition. Similar in structure to headword
        definitions (the field). May contain an field. Plain type, but
        with closer spacing than main definitions.
bold, smaller by 1 point
        Collocation. A word combination containing the headword (or a
        morphological derivative). The collocations do not have an
        explicitly marked part of speech. See also , tagging embedded
        collocations.
Collocation, no typographic significance. Used to mark a word
        combination defined in the dictionary without effect on font.
small caps
        The conjugated (non-infinitive) forms of verbs. imp. & p. p.
        is common, as well as p. pr. & vb. n. Irregular variants of
        these are less common. Words in this field perhaps should be
        indexed.
smaller vertical spacing
        Collocation segment. The font and size are normal in a cs, but
        the spacing between lines is smaller (0.9 mm between
        lower-case letters, rather than 1.1 mm in the main body of the
        definition). For an on-line dictionary, reproducing this
        typography is probably pointless.
small caps
        The actual morphological variants of nouns or pronouns. Should
        be indexed.
*       Embedded Collocation. A word combination containing the
        headword (or a morphological derivative), embedded within a
        definition without a separate definition of its own. These
        collocations should be defined implicitly by the text of the
        definition in which they are embedded. See also , tagging
        explicitly defined collocations.
Small Caps
        Entry reference. References to headwords within the
        "etymology" section are in small caps. Such references also
        occur in the body of definitions, and in "usage" segments.
        Such entry references should function as hypertext buttons to
        access that entry.
*       Etymology. Always contained within square brackets. Normal
        type is used for explanatory comments, and italics for the
        actual words (marked ) considered as etymological sources.
italic  Etymological source.
Words from which the headword was derived, or to which it is related. The Greek words within an etymology segment are invariably etymology sources, and should be marked as such, but are not so marked, even in the rare cases where the Greek word transliteration has been written in. italic Etymological source, being the name of a person or geographical location which is the eponym for the concept. This is used to distinguish eponymous etymologies from others, and can also be found in the body of a definition or note, not only in the etymology field. Very few of the names that should be marked this way have actually been so marked, as of version 0.42. In cases where such eponymous names have not yet been thus marked, they will usually be marked by , the non-semantic italic-font marker, or, in etymologies, by . italic Example. An example of usage of the headword, usually found within an or segment. * Frequency of use, ordinal rank. This is used for WordNet entries, in which the synonyms were ranked in order of frequency of use. 1 indicates that the headword is the first word on the list of synonyms. * First use. A date at or around which the first use of this word in writing is recorded. Not in the original 1913 Webster, and usu. taken from a recent dictionary. Only a few such fields have been entered as of version 0.41 transliteration Greek. The Greek words have been transliterated using the equivalents explained in the file "webfonts.asc". In most cases, the transliterations are typical for Greek letters, except for theta (transl = q), phi (transl. = f), eta (transl. = h), and upsilon (transl. = y, whether pronounced as y or u). This was to eliminate any ambiguity. These words occur primarily in etymologies, and to conform to the usage of should also be marked by , but as of version 0.41 they are not usually thus marked. bold, headword. Each main entry begins with the larger by mark, and ends at the next mark. 
The 2 points main entries are not otherwise explicitly marked as a distinctive field. The same word may appear as a headword several times, usually as different parts of speech, but sometimes with different entries as the same part of speech, presumably to indicate a different etymology. Within the hw field the heavy accent is represented by double quote ("), the light accent by open-single-quote (`), and the short dash separating syllables by an asterisk (*). A hyphen (-) is used to represent the hyphen of hyphenated words. italic, Usage mark. Almost always within square brackets, occasionally in parentheses or without any bracketing. but The most common usage marks, explanatory "Obs." = obsolete "R." = rare, "Colloq." = may be plain. colloquial, "Prov. Eng." = Provincial England, etc. are in italics. Some usage notes are also marked with , but are in plain. For simplicity, all words in this field may be italic, until additional explicit marks are added. * A usage mark in plain type (not italic). Found within a definition, when there are more than one sense-number listed. "Fig." at the head of an entry is the most common case. * Multiple collocation. Similar to multiple headword, when two or more collocations share one definition; however, the two collocations are in-line, rather than stacked or justified. There may be "or" or "and" words (italicised), or an "etc." (plain type) within this field. In many cases, the * Multiple headword. This field is used where more than one headword shares a single definition. In the dictionary, the (usually) two headwords are left-justified one below the other in the column, and are tied together on the right side of the headwords by a long right curly brace. This division is strictly functional, for analytical purposes, and does not affect the typography. * Noun morphology section. Rarely used, mostly for irregular personal pronouns. * Explanatory note. No explicit font is indicated. 
These segments may be separate, as in the separate paragraphs starting * Plural. The "plural" segment starts with a "pl." which is italicised, but in this segment is not otherwise marked as italicised. Other words occurring in this segment are plain type. The "pl." can be easily explicitly marked if necessary. italic Part of speech. Always an abbreviation: e.g., n.; v. i.; v. t.; a.; adv.; pron.; prep. Combinations may occur, as "a. & n.". small caps Plural word. The actual plural form of the word, found within a segment. * pronunciation. The default font is normal, but many non-ASCII characters are used. The pronunciation field may have more than one pronunciation, separated by an " smaller by Quotation. No bracketing quotation marks, two points, though occasionally \'bd-\'b8 quotations occur centered, within these quotations. These quotations Separate tend to be more complete sentences, rather paragraph than just phrases, such as are contained within quotation marks within the definition paragraph. italic, Quotation author. Used only for the quotations right justified marked with that are centered in their own paragraphs. italic Quotation example. An example of usage of the headword, within quotations marked by .. tags. italic Subdefinition, marked (a), (b), (c), etc. THese are finer distinctions of word senses, used within numbered word-sense (for main entries), and also used for subdefinitions within collocation segments, which have no numbering of senses. The letter is italic, the parentheses are not. This tag is also used to indicate the lettered subdefinition when it is referred to at another point in the text. italic The name of a ship. Rarely used. * Singular. Analogous to the segment, but more rarely used, mostly for Indian tribes, which are listed in the plural form. small caps Singular word. The singular form of the plural-form headword. bold, Sense number. A headword may have over 20 larger by different sense numbers. 
Within each numbered 2 points sense there may be lettered sub-senses. See the (sub-definition) field. italic Source. The author of the definition. Used only for definitions not originally present in Webster 1913, and not present in the original version intended to mimic the 1913 printed dictionary. This source is used for each word sense, and may differ for different senses of a word, especially where a Web1913 definition was substantially modified, or a new word sense was added to a previously defined word. italic Species name. Used to mark the taxonomic names of living things which are represented in italic font in the original printed version. Originally, not only species, but genera, orders and families were also thus marked. The conversion from to , , or is not completed, and may stil be found marking such groups. However, orders and families are also frequently mentioned in the original in normal font, and in such cases are not marked with any tag. So, this mark is not a reliable indicator of all mentions of taxonomic names. plain Synonyms. A list of synonyms, sometimes followed by a segment. narrower Comparisons of word usage for words which are spacing sometimes confused. As with collocation segments, font is plain, but spacing is smaller than normal definition spacing. This seems pointlessly complicating for an on-line display. * Verb morphology (conjugation) segment, delimited by square brackets. * Morphological derivatives not contained in the bracketed segments, as above. For nouns derived from adjectives, adverbs from adjectives, etc. This segment is usually found at the end of the main entry. The adverbial and nominalized derivatives at the end of a main entry are usually introduced by an em dash [represented as two hyphens (--)]. bold, Same font as , with accents and syllable larger by breaks marked as in the headword. 2 points Marks the actual morphological forms within a segment; typically, adverbial or nominalized form of an adjective. 
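As an illustration of the headword conventions described above (double
quote for the heavy accent, open-single-quote for the light accent,
asterisk for the syllable break), the following Python sketch splits a
marked headword into its plain spelling and its syllables. It assumes,
beyond what this file states, that each accent mark follows the
syllable it stresses and also ends that syllable. The function name
decode_headword and the sample string are illustrative only.

```python
# Accent marks used inside a headword field, per the conventions above.
ACCENTS = {'"': "heavy", "`": "light"}


def decode_headword(hw):
    """Return (plain_spelling, syllables) for a marked headword.

    syllables is a list of (text, stress) pairs, where stress is
    "heavy", "light", or None.
    Assumption: an accent mark closes the syllable it follows.
    """
    syllables = []
    current = ""
    for ch in hw:
        if ch in ACCENTS or ch == "*":
            # "*" is a plain syllable break; accents also carry stress.
            if current:
                syllables.append((current, ACCENTS.get(ch)))
            current = ""
        else:
            # Ordinary letters and real hyphens are kept as spelled.
            current += ch
    if current:
        syllables.append((current, None))
    plain = "".join(text for text, _ in syllables)
    return plain, syllables


plain, parts = decode_headword('A*ban"don')
print(plain)   # spelling with the marks removed, suitable for indexing
print(parts)
```

The plain spelling is what an index would most likely use, while the
syllable list keeps enough information to redraw the accents.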
* Second definition (occasionally, a third definition is present). This is used where a second or third part of speech with the same orthography is placed under one headword. Within this segment, there will be a field, and sometimes a and/or a quotation. * "Specifically:" Used to mark the words "specifically", "Hence", "as" which are used to introduce a second definition typically more specific than the first, but in general derived by extension of the initial definition. This functions as a warning of multiple definitions where the sense-numbers are not explicitly used. It is also useful in separate senses, to tag polysemous definitions which may be specializations or generalizations of the preceding definition. italic. Plural form. Used exclusively to mark the "pl." abbreviation, which introduces a definition for the headword, *when used in the plural form*. Not related to , which spells out the plural form, but does define it. italic Usage example. Used only a few times, within segments. italic supertype (hypernym) the inverse of and identical to but not derived from WordNet. plain, Chemical formula. The letters are plain font, numbers but the numbers are subscript. This is mostly subscript useful as a functional mark to pinpoint chemicals. plain, Chemical formula same as , but not processed specially by the tag-converter program. The letters are plain font, but the numbers are subscript. Used in place of when the formula has a tag inside, which cannot now be processed by the processing routine. * chemical name. Used to allow a IUPAC chemical name to be processed as a unit in spite of embedded dashes, parentheses, and commas. * "see" reference to related words, outside of the main definition field. italic Mathematical expression. In this dictionary, essentially all letters (used as variable labels) in math expressions are in italic font. The "+" and "-" may also appear typographically different from elsewhere in the dictionary. 
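The chemical formula fields described above print letters in plain
font and numbers as subscripts. A minimal Python sketch of that
rendering for plain-text display follows. It assumes the formula text
has already been taken out of its field and contains no embedded tags.
The function name render_formula is hypothetical, and every digit is
treated as a subscript, so leading coefficients or charges would need
separate handling.

```python
# Map ASCII digits to the Unicode subscript digits.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")


def render_formula(formula):
    """Render the digits of a chemical formula as subscripts."""
    return formula.translate(SUBSCRIPTS)


print(render_formula("H2SO4"))
```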
italic Also a mathematical expression, but the colon and double colon may have a different typography than usual., as in a:b italic Singular form. Analogous to , to define the singular word where the headword is the plural form. ** only modifies the word "sing." * Morphological derivation. Used to mark the entry-reference portions of those entries which are defined as morphological derivatives (plural, p. p., imp.) of other headwords. Used just as an attempt to mark and regularize the entry format. May be ignored typographically. a stack, Fraction. Used for non-numerical fractions with which cannot be expressed as a superscript, Exponential. Used in mathematical expressions. smaller font. italic Translation (e.g. of Greek), in the body of a definition or etymology. Used only twice. italic Word translated: the word in italic is translated by a subsequent word. Usually in etymologies, where the word translated is not actually etymologically related to the headword. The translated word is not necessarily English. italic translation of the preceding word (or of the headword) into English. bold, Collocation font. Same font as used in collocations. smaller This is used only in the list of "un-" words not by 1 point actually defined in the dictionary. Probably could be replaced by a segment mark for the entire list! The "un-" words should be indexed as headwords. * Functional expression (math). The function names are in plain type, the variables are italic. italic Illustration reference. Used ony occasionally, not yet (v. 0.41) consistently. italic Figure reference. * Chemical reaction. Similar to chemical formulas (which are contained but not explicitly marked), with some other symbols. italic Verb Particle. Only a few particles were actually marked, but in a future version more may be. ? Table Title. Used only once. italic Title of a literary work, movie, opera, musical composition, etc. Used rarely but should be used in every case, except in <au> references. 
<root>       *        Square root -- differs from the entity <root/,
                      which is a square root sign that does not extend
                      beyond the number following it. The <root> field
                      has a bar (vinculum) over the expression within
                      the field, as well as the square root symbol
                      preceding the expression in the field. Used only
                      once.
<vinc>       *        Vinculum. In a mathematical expression, a bar
                      extending over the expression within the field.
                      Used only once. This apparently serves the same
                      function as parentheses, causing the expression
                      within the field to be evaluated and the result
                      used as the (mathematical) value of the field.
<nul>        plain    Nultype. An older version of <plain>.
<cd2>        *        Second collocation definition. Somewhat similar
                      to <def2>. Purely a mark to reduce functional
                      ambiguity, with no effect on the typography.
<hypen>      *        Hypernym. Mark introduced for the World Wide
                      Webster, when adding words from WordNet. In most
                      cases, this tag marks the WordNet hypernym (for
                      nouns and verbs). Where the <au> mark is PJC or
                      includes a +PJC, the hypernym may not be the same
                      as in WordNet. The words marked by this tag need
                      to be bracketed in some way, but this is deferred
                      until the definitions included with the hypernyms
                      have been deleted, and other disambiguating marks
                      substituted.
<stype>      italic   Subtype. A functional mark, to point out words
                      which are conceptually subtypes of the headword.
<styp>       *        Subtype. A functional mark, to point out words
                      which are conceptually subtypes of the headword,
                      but with no *typographical* significance.
<simto>      *        Similar-to. A semantic relational mark for
                      closely related words which are not quite
                      synonyms, nor hypernyms, nor hyponyms. Introduced
                      with WordNet data.
<conseq> or  *        Consequence. For adjectives, an attribute which
<hascons>             is a consequence of possessing the headword
                      attribute. Introduced with WordNet data.
<consof>     *        Consequence of. For adjectives, an attribute
                      which implies the headword as a natural
                      consequence.
<part>       italic   Part. Marks a word designating something which is
                      conceptually a part of the headword. Rarely used.
<parts>      italic   Part, plural form. Same as <part>, but marks the
                      name of the part in its plural form.
<partof>     *        Marks a word designating something of which the
                      headword is conceptually a part. Inverse of
                      <part>. This is very broad, and may mean
                      constituent or separable part. Rarely used.
<contxt>     *        Context. Used only for introductions to
                      definitions, giving the context of usage, which
                      are not part of the definition proper, as:
                      <contxt>when used of a person:</contxt>
<grp>        *        Marks the name of a group of people not formally
                      organized.
<membof>     italic   marks a group of which the headword is a member.
                      This is rarely used, but should be indexed as an
                      entry word or phrase.
<member>     italic   marks a member of a group defined by the
                      headword. This is rarely used, but should be
                      indexed as an entry word or phrase.
<members>    italic   Same as <member>, but marks a plural word,
                      designating the name of the members in its plural
                      form, to avoid ambiguity.
<person>     *        marks the proper name of a person. Used only
                      occasionally, but should be used more frequently
                      for cases where first names are abbreviated, to
                      reduce ambiguity of the period for automatic
                      analysis.
<persn>      *        marks the name of a person, when only one name
                      (usually the last name) is given. Not used
                      consistently where it should be.
<corpn>      *        Name of a business company, corporation, or
                      partnership. Started using November 1988. Rare.
<corr>       italic   Correlative. A word intimately associated with
                      the headword in a manner such that one cannot
                      appear without the other. Not exactly an inverse.
<qperson>    italic   marks the name of a person, quoted in a dialogue.
                      Used only in <q> blockquotes as of vers. 0.45.
<org>        *        marks the name of an organization; sometimes used
                      for the names of groups of people not formally
                      organized (see also <grp>).
<prod>       italic   produces. Designates a substance produced by a
                      living organism. Rarely used.
<prodp>      *        produces (plain font). Designates a substance
                      produced by a living organism. Same as <prod>,
                      but does not affect font. Rarely used.
<prodby>     *        produced by. Designates a living organism which
                      produces the headword substance. Rarely used.
<stage>      italic   life stage of an organism. Used to indicate
                      variant forms of an organism defined by the
                      headword. Rarely used.
<stageof>    *        an organism one of whose life stages is the
                      headword. Inverse (correlative) of <stage>.
                      Rarely used.
<inv>        italic   inversely related to headword -- e.g. depository
                      is the inverse of depositor; buyer is the inverse
                      of seller. Called "correlative" in the Webster
                      1913 and the CIDE. Rarely used.
<methodfor>  italic   is a method to accomplish the action defined by
                      the headword. Rarely used, and only in the
                      supplemental section.
<examp>      italic   example or instance of the headword, where the
                      emphasized word is not a proper subtype.
<stage>      italic   a stage of life of the headword -- for living
                      things, such as insects, whose life stages may
                      take different names.
<unit>       italic   a unit of measure, usually preceded by a number.
                      Also used to tag the unit of a measure which is
                      the headword.
<uses>       italic   tags a tool or method used by the headword, which
                      is usually some process.
<usedfor>    *        tags a method or process for which the headword
                      is a tool.
<usedby>     italic   tags a tool or method which uses the headword,
                      which is usually a physical object.
<perf>       italic   performs -- tags a word which is a process or
                      activity performed by the headword.
<recipr>     italic   reciprocal -- used for cases where the tagged
                      word is a reciprocal participant in an action,
                      such as donor and recipient. The difference
                      between this and <inv> inverse has not yet been
                      systematically settled. Used seldom, and mostly
                      in the supplemented version.
<sig>        italic   significance, meaning -- used in definitions
                      where the actual meaning is prefixed with
                      commentary explaining usage or other attributes
                      of the word, as with prefixes or suffixes.
<wns>        italic   WordNet sense. Where known, the correspondence of
                      the sense of an entry with that of WordNet 1.6 is
                      given after the definition, in a tag of the form:
                      <wns>[wns=3]</wns>, in which the number is the
                      numbered sense in WordNet.

=============================================================
Biological classifications:
---------------------------
<phylum>     italic   Taxonomic phylum name.
<subphylum>  italic   Taxonomic subphylum name.
<class>      italic   Taxonomic class name.
<subclass>   italic   Taxonomic subclass name.
<ord>        italic   Taxonomic order name. Also used for suborders,
                      initially.
<subord>     italic   Taxonomic suborder name.
<fam>        italic   Taxonomic family name. Also used to tag "tribes".
<subfam>     italic   Taxonomic subfamily name.
<gen>        italic   Taxonomic genus name.
<var>        italic   Variety. Used to mark subspecies or varieties
                      below the level of species in living organism
                      systematic names.
<varn>       italic   Variety. Used to mark subspecies or varieties
                      below the level of species in living organism
                      systematic names. Duplicative variant of <var>.
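
Because every field is supposed to open and close with a matching
pair, and a missing or stray end-field tag counts as a typographical
error, a simple stack-based check can locate such errors. The Python
sketch below is one possible approach, not the tooling actually used
for this dictionary. It assumes that tag names consist of letters and
digits, that fields nest properly, and that entities written in the
"<name/" form (such as <root/) and "<-- -->" comments are to be
skipped. The function name find_unbalanced is hypothetical.

```python
import re

# Matches <name> and </name>. Entities such as "<root/" and comments
# such as "<-- ... -->" do not match and are therefore ignored.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def find_unbalanced(text):
    """Return a sorted list of (offset, message) for unpaired tags."""
    stack = []    # open fields awaiting their end-field tag
    errors = []
    for match in TAG_RE.finditer(text):
        closing = match.group(1) == "/"
        name = match.group(2)
        if not closing:
            stack.append((name, match.start()))
        elif stack and stack[-1][0] == name:
            stack.pop()
        else:
            errors.append((match.start(), f"unexpected </{name}>"))
    # Anything still open at the end never received its end-field tag.
    errors.extend((pos, f"unclosed <{name}>") for name, pos in stack)
    return sorted(errors)


print(find_unbalanced("<contxt>when used of a person:</contxt>"))
print(find_unbalanced("<q>A line<root/ of text"))
```

A field closed out of order is reported twice, once as an unexpected
end tag and once as an unclosed field, which is usually enough to find
the spot by hand.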