American Association for the Advancement of Slavic Studies (AAASS)
Bibliography & Documentation (B & D) Committee home
Subcommittee on Digital Projects
The current scheme attempts to capture all the distinctions that users of the inventory will want to limit searches by, as well as include other information they would be interested in reading about. See the brief summary of how the inventory addresses each of the original oppositions in the charge.
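There is a many-to-many relationship between projects and collections: a project can produce several collections, and a collection can belong to more than one project. The database will therefore also hold pairs of keys associating rows in the project table with rows in the collection table.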
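One way the key association might look is sketched below using Python's standard `sqlite3` module. This is only an illustration: the table names, column names, and sample rows are all hypothetical, and the inventory's actual database design has not been decided.

```python
import sqlite3

# In-memory database for illustration; every table and column name here
# is a hypothetical choice, not part of the scheme itself.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        project_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE collection (
        collection_id INTEGER PRIMARY KEY,
        title         TEXT NOT NULL
    );
    -- Linking table: one row per (project, collection) association.
    CREATE TABLE project_collection (
        project_id    INTEGER NOT NULL REFERENCES project(project_id),
        collection_id INTEGER NOT NULL REFERENCES collection(collection_id),
        PRIMARY KEY (project_id, collection_id)
    );
""")

conn.executemany("INSERT INTO project VALUES (?, ?)",
                 [(1, "Project A"), (2, "Project B")])
conn.executemany("INSERT INTO collection VALUES (?, ?)",
                 [(1, "Collection X"), (2, "Collection Y")])
# Project A has two collections; Collection Y is shared by both projects.
conn.executemany("INSERT INTO project_collection VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 2)])


def collections_for(project_id):
    """Return the titles of all collections linked to a project."""
    rows = conn.execute(
        """SELECT c.title FROM collection c
           JOIN project_collection pc ON pc.collection_id = c.collection_id
           WHERE pc.project_id = ?
           ORDER BY c.title""",
        (project_id,),
    )
    return [title for (title,) in rows]


def projects_for(collection_id):
    """Return the names of all projects linked to a collection."""
    rows = conn.execute(
        """SELECT p.name FROM project p
           JOIN project_collection pc ON pc.project_id = p.project_id
           WHERE pc.collection_id = ?
           ORDER BY p.name""",
        (collection_id,),
    )
    return [name for (name,) in rows]
```

Because the association lives in its own table, a collection shared by two projects is stored once and linked twice, rather than duplicated.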
Sometimes, instead of digitizing or creating a collection, a project digitizes or creates just one resource. While "collection" seems like too specific a term, "resource" isn't appropriate when there are collections of individual resources. Furthermore, "collection" suggests selection principles, access points, and other principles of librarianship are at work.
Any field whose value has an associated URL (such as a personal or institutional homepage, or an online description of a project) should be recorded in a uniform way. The exact format for recording it must still be decided; for now, HTML hyperlinks are embedded in the sample records.
| Project | | | | |
|---|---|---|---|---|
| Field number | Field name | Repeatable? | Data type | Notes |
| 1 | Name | yes | free text | Follow AACR for transcribing? Repeatable because some sites store your language preference or redirect you to a version in your language based on your IP address, so no language is listed first. URL is associated when applicable. |
| 2 | Creator | yes | LC Name Authority File or form generated according to AACR2r2002r | |
| 3 | Manager | yes | LC Name Authority File or form generated according to AACR2r2002r | Use only if the person/corporate body carrying on the work is not the original creator. |
| 4 | Participating | yes | LC Name Authority File or form generated according to AACR2r2002r | Also include persons/corporate bodies that have participated in the project in the past? If collections created by the project list participants, should we just put all of them here regardless of which collection they're involved with? That's what I've been doing. |
| 5 | Funder | yes | LC Name Authority File or form generated according to AACR2r2002r | Separate funding organization and grant number by " -- ". Separate grant numbers by "; ". |
| 6 | Host | yes | LC Name Authority File or form generated according to AACR2r2002r | Use only for websites, when the project is hosted by a person/corporate body other than the creator or manager. |
| 7 | Description | no | free text | |
| 8 | Goal | yes | controlled vocabulary: | Do we need other values? Make this required? Should there be criteria for these values so that a project that preserves poorly or provides poor access doesn't qualify? |
| 9 | Digital processes | yes | controlled vocabulary: | Make this required? If a project modifies files created by others, do we include digital processes used by the first party? |
| 10 | Inception | no | YYYY-MM-DD | Date the project was begun. |
| 11 | Future plans | no | free text | Plans to use new digital processes; plans to include wholly new collections. |
| Collection | | | | |
| Field number | Field name | Repeatable? | Data type | Notes |
| 1 | Title | yes | free text | Follow AACR for transcribing? Need to make it repeatable because some sites store your language preference, so no language is listed first. The URL where an end user could access the collection (the primary entry point for the collection) is associated when applicable. |
| 2 | Creator | yes | LC Name Authority File or form generated according to AACR2r2002r | Make this required? For now, I'm only filling this in if it's different from the project creator. |
| 3 | Description | no | free text | |
| 4 | Subject focus | yes | controlled vocabulary | Make this required? Use LC headings? If so, do we really want non-librarians assigning subject headings to collections they submit? |
| 5 | Geographical focus | yes | controlled vocabulary | Make this required? Use English terms from the Getty Thesaurus of Geographic Names? Or LC geographic subdivisions, used without the associated LC subject headings? If we use LC headings for the subject focus, we need this because sometimes LC subject headings don't subdivide geographically. Maybe this field can be excluded. |
| 6 | Chronological focus | yes | controlled vocabulary | Make this required? What kind of vocabulary should we use? Maybe this field can be excluded. |
| 7 | Language of items | yes | ISO 639-2 | |
| 8 | Size of collection | ? | free text | Byte size, number of files, or number of cataloged items? (E.g., does every page image of a book count as an item?) Repeatable if we allow more than one of these measures. |
| 9 | Format of original items | yes | controlled vocabulary (for now, AACR General Material Designations) | Use AACR General Material Designations, plus "born digital" or "created de novo"? Or develop our own controlled vocabulary that distinguishes, for example, newspapers, serials, and monographs? What's a better name for this field? |
| 10 | Source type | yes | controlled vocabulary | "Bibliographic" only applies to reformatting projects. We could use "tertiary" instead of "bibliographic", but "tertiary" usually includes indexes, directories, reference sources, and databases. |
| 11 | Identifier for original items | yes | LC Name Authority File or form generated according to AACR2r2002r for the works, or OCLC numbers for the items being digitized | Not applicable for de novo collections. |
| 12 | Location of original items | yes | LC Name Authority File or form generated according to AACR2r2002r | Not applicable for de novo collections. If copies of the item scanned are owned by other institutions as well, is it worth recording this? Should this be "location" or "owner"? |
| 13 | Format of surrogate items | yes | Internet MIME types | Should we give types for only those items presented to the user? For example, many text encoding projects encode in SGML or XML but deliver in HTML, or save archival copies of images in TIFF format but deliver GIFs or JPEGs. For now, I'll list all known. |
| 14 | Metadata/encoding scheme | yes | controlled vocabulary: | |
| 15 | Medium of collection | yes | controlled vocabulary: | Is there a less awkward name for this field? How about something better than "no medium"? |
| 16 | Web services | yes | controlled vocabulary: | |
| 17 | Access conditions, rights asserted | no | free text | A statement of any access restrictions placed on the digital collection; information about rights (copyrights, etc.) held in and over the digital collection. Do we indicate purchasing as an access condition, even if there is no license for use? Do we indicate that a resource requires the user to download and install a free plugin (in order to use a proprietary format)? Do we give a statement that appears on the host's page but is not linked to from the collection? |
| 18 | Made available | no | YYYY-MM-DD | Date the collection was made publicly available. |
| 19 | Frequency of additions | no | free text | Frequency of additions to the digital collection (such as closed, irregular, daily, weekly, monthly, yearly). |
| 20 | Future plans | no | free text | Plans to include new materials in collections. |
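Two of the conventions in the tables above are mechanical enough to sketch in code: the Funder delimiters (" -- " between funding organization and grant numbers, "; " between grant numbers) and the YYYY-MM-DD form of the Inception and Made available dates. The function names and sample values below are hypothetical, and this is only one possible reading of the conventions.

```python
import re
from datetime import datetime

# Exactly four digits, two digits, two digits, separated by hyphens.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_funder(value):
    """Split one Funder value into (organization, [grant numbers]).

    Assumes " -- " separates the organization from its grant numbers and
    ";" separates the grant numbers, as the Funder note describes.
    """
    organization, separator, grants = value.partition(" -- ")
    if not separator:
        # No grant numbers were recorded for this funder.
        return organization.strip(), []
    grant_numbers = [g.strip() for g in grants.split(";") if g.strip()]
    return organization.strip(), grant_numbers


def parse_record_date(value):
    """Parse a YYYY-MM-DD string; raise ValueError for anything else."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    # strptime also rejects impossible dates such as 2003-02-30.
    return datetime.strptime(value, "%Y-%m-%d").date()
```

For example, `parse_funder("Example Foundation -- G-001; G-002")` would yield the organization name and a list of two grant numbers. A heading that itself contains " -- " would be split at its first occurrence, so the delimiter choice may need revisiting if such headings turn up.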
The project-collection (or project-resource) distinction remains problematic. Many digital projects do not distinguish the project from what it creates. If a collection or resource exists without a defined project, we could:
Another approach to the problematic distinction is to have more field values shared between projects and collections, with the values inherited from project to collection unless stated otherwise. For example, the Access conditions, rights asserted field would have one value for the project that would apply to all collections unless stated otherwise.
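The inheritance idea can be sketched as a simple lookup that falls back from collection to project. The record layout, field keys, and sample values here are hypothetical, and which fields should be inheritable is an open question; only Access conditions, rights asserted is named above.

```python
# Hypothetical set of fields a collection may inherit from its project.
INHERITABLE_FIELDS = {"access_conditions"}


def effective_value(field, collection, project):
    """Return the collection's own value, else the project's if inheritable."""
    own = collection.get(field)
    if own is not None:
        return own
    if field in INHERITABLE_FIELDS:
        return project.get(field)
    return None


# Sample records (invented for illustration).
project = {"name": "Project A", "access_conditions": "Free to all users"}
collection_x = {"title": "Collection X"}  # states nothing, so it inherits
collection_y = {"title": "Collection Y",
                "access_conditions": "Subscribers only"}  # overrides

inherited = effective_value("access_conditions", collection_x, project)
overridden = effective_value("access_conditions", collection_y, project)
```

Here `inherited` takes the project's statement and `overridden` keeps the collection's own, which is the "unless stated otherwise" behavior described above.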
Do we give current affiliations of personal names or those at the time of creation or administration? If always current, it would be better to store personal and corporate names as a separate entity, so when a name is updated once, it's reflected in all projects and collections with which it's associated. Do we give affiliations for non-academics? Place of work even if a person creates the project outside of his/her job?
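If affiliations are always current, storing names as a separate entity might look like the sketch below: records hold only name identifiers, so a single update shows up everywhere. The identifiers, record layout, and sample names are all hypothetical.

```python
# Names stored once, keyed by a hypothetical identifier.
names = {
    1: {"name": "Hypothetical, Person", "affiliation": "Old University"},
}

# Project and collection records refer to names by identifier only.
projects = [{"name": "Project A", "creator_ids": [1]}]
collections = [{"title": "Collection X", "creator_ids": [1]}]


def creators(record):
    """Render a record's creators as 'personal name | institution'."""
    return [
        f'{names[i]["name"]} | {names[i]["affiliation"]}'
        for i in record["creator_ids"]
    ]


# One update to the shared name entry...
names[1]["affiliation"] = "New University"

# ...is reflected in every project and collection that refers to it.
project_creators = creators(projects[0])
collection_creators = creators(collections[0])
```

Recording affiliations as of the time of creation would instead require storing the affiliation on each project or collection record, which this layout deliberately avoids.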
Should planned collections and parts of collections be listed in the inventory, without values for the Made available field? This would make it easier to check whether anyone else is planning to digitize an item.
When a personal name is affiliated with an institution, give it in the form personal name | institution. Should we include all of the institutional hierarchy (department, college/faculty/school, university)? Should the hierarchy be in parentheses or separated by full stops? Just give whichever part of the hierarchy LC gives?
Which fields should be required for every record?
Currently, the controlled vocabulary values are in English, or they're from LC authority files. Will we allow free text (such as for descriptions) to be in other languages? Which languages? When a field value is given in more than one language, should we explicitly tag segments as being in one language or another?
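If segments were tagged explicitly, one possible representation is a list of (language code, text) pairs, reusing the ISO 639-2 codes already proposed for the Language of items field. The class name, function name, and sample text below are hypothetical; this illustrates one answer to the question, not a decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One stretch of free text tagged with an ISO 639-2 language code."""
    lang: str
    text: str


# A Description value given in two languages (sample text invented).
description = [
    Segment("eng", "A sample description."),
    Segment("rus", "Пример описания."),
]


def text_in(segments, lang):
    """Return the text of every segment tagged with the given language."""
    return [s.text for s in segments if s.lang == lang]


def languages_used(segments):
    """Return the distinct language codes, in order of first appearance."""
    seen = []
    for s in segments:
        if s.lang not in seen:
            seen.append(s.lang)
    return seen
```

Tagging at the segment level would let a search interface show users only the languages they ask for, at the cost of more work for whoever enters the record.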
Are we interested in recording particular software packages used in projects?