Database Definition

Standard definitions for commonly used terms about databases.


Database - Definitions from Heavily-Referenced Books


Short & simple: Dictionary definitions: Pet Definitions from WikiZens:

These definitions are actually quite nice, because a mass or structured set of data does not imply a particular bias, such as toward an enterprise. An individual, or anyone else, can store a mass of data for retrieval, expansion, and updating. The data is obviously arranged.

However, I'm not sure whether "computer" should be part of the definition, since one could store data in a card system or other offline storage without a computer. But would that then be a database?

Is the human brain a special form of database?


Typical Services Provided by Database Management Systems


Database System, Database Management System: A general-purpose software system which can manage databases for a very large class of the possible application worlds. -Rishe '92

AnIntroductionToDatabaseSystems defines these functions (areas of responsibility) for a DBMS:

<quote type=approximative>
  1. Data definition. A DBMS must be able to accept data definitions in source form (a Data Definition Language) and convert them to the appropriate object form.
  2. Data manipulation. A DBMS must be able to handle requests to retrieve, update, or delete existing data in the database, or to add new data to the database. Generally this is done through a Data Manipulation Language.
  3. Data security and integrity. The DBMS must monitor user requests and reject any attempts to violate the security and integrity constraints defined by the DBA.
  4. Data dictionary. The DBMS must provide a data dictionary function. The dictionary contains data about data (metadata or descriptors) (DataDictionary).
  5. Performance. It goes without saying that the DBMS should perform all the tasks identified above as efficiently as possible.
</quote>
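As a rough illustration only, the first four responsibilities can be seen in SQLite through Python's standard-library sqlite3 module. This is a sketch under assumptions: the account table and its columns are hypothetical, SQLite has no user accounts so only the integrity half of item 3 is shown, and performance (item 5) is not demonstrated at all.

```python
import sqlite3

# In-memory SQLite database; the schema below is a hypothetical example.
conn = sqlite3.connect(":memory:")

# 1. Data definition: DDL source text becomes a stored schema object.
conn.execute("""
    CREATE TABLE account (
        id      INTEGER PRIMARY KEY,
        owner   TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")

# 2. Data manipulation: add, update, and retrieve data via DML.
conn.execute("INSERT INTO account (owner, balance) VALUES (?, ?)", ("alice", 100))
conn.execute("UPDATE account SET balance = balance - 30 WHERE owner = ?", ("alice",))
balance = conn.execute(
    "SELECT balance FROM account WHERE owner = ?", ("alice",)).fetchone()[0]

# 3. Integrity: the DBMS rejects a request that violates a declared constraint.
try:
    conn.execute("UPDATE account SET balance = -5 WHERE owner = ?", ("alice",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# 4. Data dictionary: metadata about the schema is itself queryable.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

print(balance, rejected, tables)  # 70 True ['account']
```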


Re: "The DBMS must monitor user requests and reject any attempts to violate the security and integrity..."

I disagree with this as an absolute requirement. One can have a large and usable database [DBMS] without any formal validation or checking. If one turns off all checking in a given database, and moves that functionality to applications, the DB is still usable. One may say the "quality went down", but being a database and being a good database may be different criteria.
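A minimal sketch of the arrangement described above, using Python's sqlite3 module with a hypothetical account table and a hypothetical app_insert helper. The table declares no constraint, so validation lives in application code. The database stays usable, but a client that bypasses the application's check can still store a negative balance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No CHECK constraint: the DBMS itself performs no validation on balance.
conn.execute("CREATE TABLE account (owner TEXT, balance INTEGER)")

def app_insert(conn, owner, balance):
    """Hypothetical application-side helper: the validation lives here, not in the DB."""
    if balance < 0:
        raise ValueError("balance must be non-negative")
    conn.execute("INSERT INTO account VALUES (?, ?)", (owner, balance))

app_insert(conn, "alice", 100)      # passes the application's check
try:
    app_insert(conn, "bob", -5)     # rejected, but only by this application
except ValueError:
    pass

# A second client that skips app_insert is not stopped by the database.
conn.execute("INSERT INTO account VALUES (?, ?)", ("carol", -5))

rows = conn.execute("SELECT owner, balance FROM account ORDER BY owner").fetchall()
print(rows)  # [('alice', 100), ('carol', -5)]
```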

Certainly, and we can also call it a DBMS if it randomly forgets data, or if it randomly corrupts or transforms or invents data, or even if it randomly refuses 95% of all service requests. Maybe not a 'good' DBMS, but still a DBMS...

[Explicit constraints (including security) and higher data integrity essentially distinguish a DBMS from a file system.]

I disagree. A file system is not a database because you cannot add new attributes. If you put constraints on a file system, that would not make it a database.


Database Model: A convention for specifying the concepts of the real world in a form understandable by a DBMS. -Rishe 92

Examples: relational model, network model, hierarchical model, ODMG object model, semantic binary model, deductive database model, etc.
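To make "a convention for specifying the concepts of the real world" a little more concrete, here is an illustrative Python sketch of the same facts under a hierarchical convention (parents own children) and under a relational one (a set of tuples). The department and employee data are hypothetical, and the sketch is not modeled on any particular DBMS.

```python
# Hierarchical model: each department owns its employees (parent -> children).
hierarchical = {
    "Sales": ["Ann", "Bo"],
    "R&D": ["Cy"],
}

def to_relation(tree):
    """Flatten a one-level hierarchy into a set of (dept, employee) tuples."""
    return {(dept, emp) for dept, emps in tree.items() for emp in emps}

# Relational model: the same facts as tuples, with no built-in nesting.
works_in = to_relation(hierarchical)

# A query from the "child" side needs no special access path in the relational form.
depts_of_ann = {d for (d, e) in works_in if e == "Ann"}

print(sorted(works_in))  # [('R&D', 'Cy'), ('Sales', 'Ann'), ('Sales', 'Bo')]
print(depts_of_ann)      # {'Sales'}
```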


Why do we need standard definitions? Because people very often forget the basics of the most commonly used concepts, and substitute metaphors for definitions.

Related to the above definitions, there are some principles that need to be taken into consideration:


Some Definitions from People Who Haven't Written Highly Referenced Books:

A database can also be considered a master, general-purpose (or semi-general-purpose) AbstractDataType.

TopMind's definition: They are "Attribute managers". They pre-package many common attribute and collection management idioms/abstractions into a single tool.


Shared-ness

This is to explore the shared-ness idea raised above. Under this, a DBMS is a tool that shares attribute-handling and collection-oriented idioms among:

As a test definition, let's assume that two out of three of these must be true to qualify.


Attributes and Weighted Definitions

Re: "To avoid hypocrisy on your part, 'you need to provide clear, precise rules/algorithms/formulas' to clarify exactly what 'attribute' means."

I don't think I can provide one. A WeightedDefinition may be more appropriate, perhaps including for "database" itself. "Value", "Data", "Fact", "Information", etc. will likely have similar problems. --top
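One possible reading of a weighted definition, sketched in Python: each characteristic feature carries a weight, and a candidate counts as a "database" when its total passes a threshold. The feature names, weights, and threshold are all hypothetical, chosen only to show the mechanism; they are not proposed as the definition.

```python
# Hypothetical feature weights for "database"; the numbers are illustrative only.
WEIGHTS = {
    "persistent_storage": 3.0,
    "query_language": 2.0,
    "concurrency_control": 1.5,
    "integrity_constraints": 1.5,
    "metadata_catalog": 1.0,
}
THRESHOLD = 5.0  # hypothetical cut-off

def database_score(features):
    """Sum the weights of the features a candidate system has (unknown features score 0)."""
    return sum(WEIGHTS.get(f, 0.0) for f in features)

def looks_like_database(features):
    return database_score(features) >= THRESHOLD

card_file = {"persistent_storage"}
sqlite_like = {"persistent_storage", "query_language",
               "integrity_constraints", "metadata_catalog"}

print(database_score(card_file), looks_like_database(card_file))      # 3.0 False
print(database_score(sqlite_like), looks_like_database(sqlite_like))  # 7.5 True
```

The design choice is that no single feature is decisive; a borderline case just lands near the threshold instead of forcing a yes/no argument over one criterion.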

Provide a clear rule/algorithm/formula for precisely determining 'attribute-ness', then. Or is WeightedDefinition a page you created in an attempt to justify your hypocrisy?

I never implied it was clear, unlike you with "fact".

"Facts", used in DatabaseIsRepresenterOfFacts, was clarified in the first paragraph on the page ("a set of propositions that are believed to be true"). And, TopMind, I have never insisted that a rule/algorithm/formula is needed (or even appropriate) for a 'clear' definition. You aren't a hypocrite for consistently failing to meet my standards.

{A "fact" is a proposition that evaluates to 'true'. How much clearer can you get than that?}

RDBMSes don't typically store expressions; they store attributes. Sure, one can view or re-write them as propositions, but this is true of just about any information, and thus provides an insufficient falsification test for "fact", leaving it too wide open. -t
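The re-writing mentioned above can be sketched in Python: each stored tuple is read as the proposition it asserts, and under a closed-world assumption a proposition counts as true exactly when its tuple is present. The relation, its contents, and the predicate wording are hypothetical.

```python
# A relation as a set of tuples; the predicate template is an illustrative assumption.
employee_dept = {("Ann", "Sales"), ("Cy", "R&D")}
PREDICATE = "{0} works in {1}"

def as_propositions(relation, template):
    """Render each stored tuple as the proposition it asserts."""
    return sorted(template.format(*row) for row in relation)

def holds(relation, *row):
    """Closed-world assumption: a proposition is true iff its tuple is stored."""
    return tuple(row) in relation

print(as_propositions(employee_dept, PREDICATE))  # ['Ann works in Sales', 'Cy works in R&D']
print(holds(employee_dept, "Ann", "Sales"), holds(employee_dept, "Ann", "R&D"))  # True False
```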

RDBMSs generally store relationships, not attributes. You've married your DatabaseDefinition to the concept of 'entity' - something that can have attributes (as per EntityAttributeValue and EntityRelationshipDiagram). You also seem intent on marrying DatabaseDefinition to RDBMS, rather than simply ensuring that DatabaseDefinition includes RDBMS.

The broader world of databases - of which TopMind is apparently ignorant (though he may instead be a relational zealot in active denial) - allows users to manage and store constraints (X is between Y and Z, or X is the same as Y though Y is unknown), contingencies (X is true if Y is true), definitions (ancestor is parent or ancestor of parent), abstractions and heuristics (most X are Y, some X are Y, all X are Y, if X and Z then likely Y), even fuzzy propositions (X is like a chair), and so on - any sort of proposition that you might imagine to be true and wish to manage. Any given DBMS needn't manage all facts of all sorts; rather, any given DBMS will be managing some facts of some sorts. Perhaps TopMind is opposed to 'DatabaseIsRepresenterOfFacts' because he incorrectly reads it as 'DatabaseIsRepresenterOfAllFacts'. It seems silly to me that TopMind would assume a database must be omniscient. Any sane person would assume 'DatabaseIsRepresenterOfSomeFacts', which does not require supporting all sorts of facts.
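The "ancestor is parent or ancestor of parent" example is a recursive definition of the kind deductive databases (e.g., Datalog) manage. The following is a naive fixpoint evaluation in Python, a sketch rather than how any particular system evaluates such rules; the parent facts are hypothetical.

```python
def ancestors(parent):
    """Derive ancestor(X, Z) ("X is an ancestor of Z") from parent facts.

    Rules, Datalog-style:
        ancestor(X, Z) :- parent(X, Z).
        ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).
    Naive fixpoint: apply the recursive rule until nothing new is derived.
    """
    ancestor = set(parent)
    while True:
        derived = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
        if derived <= ancestor:
            return ancestor
        ancestor |= derived

# parent(X, Y): X is a parent of Y (hypothetical facts).
parent = {("ann", "bo"), ("bo", "cy"), ("cy", "di")}
result = ancestors(parent)
print(sorted(result))
```

Only the three parent facts are stored; the three grandparent-and-beyond facts are derived from the stored definition.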

In any case, it seems, TopMind, as though you are frustrated about irrelevant distinctions between 'information' and 'facts', while stupidly ignoring various relevant distinctions (as between 'fact' and 'proposition', or 'fact' and 'DomainValue', or 'fact' and 'management service'). It only takes one relevant distinction to 'falsify' a definition, TopMind. For your elucidation, 'information' is pretty much synonymous with 'data' excepting its contextual connotation: information (deriving from "inform", to teach, shape the mind) connotes something communicated, whereas 'data' (deriving from datum - a 'given') connotes something held or stored. The difference between information and data was never in the substance, but rather in where you found it. In English, you can find like differences between 'asteroid' vs. 'meteor' or 'lava' vs. 'magma'.

If it's "all about the context", then the definition becomes an ever more complicated case-by-case Sherlock Puzzle, so much so that it grows useless to most people. Don't tell me: it will eventually rely on the definition of "intent" if we probe enough. All roads lead to your own personal Rome. My definition gets one 90% of the way there with 1/100th the complexity of yours. I'm sure you are going to argue that if one wants to "do it right" and get 91% instead, they'll have to read 80 boring academic books written by your buddies, which are probably their only audience. -t

It isn't "all about the context". DatabaseIsRepresenterOfFacts does not depend on a distinction between 'information' and 'facts'. If you were literate, you'd have read the first sentence in the prior paragraph which described this as an irrelevant distinction.

When it comes to YOUR writing, I am not literate, but rather dumb as a drunk 1st grader. Your writing sucks that bad.

[Top, I wouldn't be too quick to dismiss the possibility that his writing is just fine, and that you either have serious comprehension problems or you aren't very bright. I'm betting it's a combination of both.]

You don't try very hard to be clear. You just throw it up on the wall as-is and patronize anybody who cannot decipher your convoluted internal mental model, which barely tries to be anything more than parsable English.

[Go read AnIntroductionToDatabaseSystems cover to cover. Come back when you're done. Is that clear enough for you?]

I don't see where it attempts to state a semi-formal and compact definition. I have the 6th edition. Is there somewhere specific you wish to reference?

[In the 8th edition, section 1.3 provides the following definition: "A database is a collection of persistent data that is used by the application systems of some given enterprise." That's both semi-formal and compact. However, my point was actually a bit more subtle, which is that the whole book, in a sense, defines "database". Unless you wish to establish a definition to support a particular academic argument -- which inherently means your definition will be highly constrained and specific -- it is unlikely you will be able to arrive at a general definition that everyone can agree upon, because there are varying informal views of what a "database" is, and there is no body to legislate one definition as correct and the others as wrong. Therefore, multiple, possibly contradictory, detailed definitions for "database" will be equally valid. Thus, trying to arrive at a single, semi-formal, and compact definition that is superior to Date's is a rather pointless endeavour.]

In my opinion it's a good exercise to find and document the similarities and the differences. There is value in documenting disagreement. It may fuel a better def in the future. Wouldn't you like to see the feature tradeoff decisions and mental juggling that the authors of your favorite or shop language performed? This is a nice feature of C2 that the encyclopedia style desired by some WikiZens wouldn't cover well. -t


Foo-Base

Here's a working model of the definition to explore. We have things like "databases" and "knowledgebases" where the atoms and operations are different. Perhaps we can define a broader concept of an "ia-base", where "i" is "idiom" and "a" is "atom". "Data" generally means the idiom of atoms called "attributes". If they had called it an "attribute-base", managers wouldn't purchase it. But generally "attribute" is the better word, in my opinion.

Generally an ia-base at least facilitates the storage and retrieval of the idiom atoms as-is, meaning that you can get out what you put in without too much hassle, and in its original form, assuming another user didn't change it. Common features associated with storage and retrieval are often included, such as concurrency management (e.g., when two or more people want to change the same idiom at the same time).
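A minimal sketch of "get out what you put in" plus one simple form of concurrency management, optimistic versioning, in Python. AtomStore and VersionConflict are hypothetical names; real systems may instead use locking, MVCC, or other schemes.

```python
import threading

class VersionConflict(Exception):
    """Raised when a writer's view of an atom is stale."""

class AtomStore:
    """Hypothetical store: returns atoms as stored, with optimistic concurrency control."""

    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._lock = threading.Lock()   # makes each check-and-write atomic

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, None))

    def put(self, key, value, expected_version):
        """Write only if nobody else changed the atom since it was read."""
        with self._lock:
            current, _ = self._data.get(key, (0, None))
            if current != expected_version:
                raise VersionConflict(
                    f"{key}: expected v{expected_version}, found v{current}")
            self._data[key] = (current + 1, value)
            return current + 1

store = AtomStore()
v0, _ = store.get("color")           # two users both read version 0
store.put("color", "red", v0)        # first writer wins -> version 1
conflict_raised = False
try:
    store.put("color", "blue", v0)   # second writer is working from a stale read
except VersionConflict as e:
    conflict_raised = True
    print("conflict:", e)            # conflict: color: expected v0, found v1
print(store.get("color"))            # (1, 'red')
```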

But in addition, it also handles common processing idioms or features associated with the atoms. (Processing idioms are a subset of idioms, here). For example, generally there will be some form of aggregation, such as counts. Counting the atoms in some way is almost always a necessity regardless of the domain of the atoms.

However, different super-domains may have different forms of aggregation. For a KnowledgeBase (AI), an operation like "sum" may not be appropriate, but there may be other common aggregation or aggregation-like operations that a KnowledgeBase will typically need, such as a probability for a truth test. Generally these common processing idioms will be common to multiple applications or systems that use similar atoms. A GIS-base would likely support things like searching within a radius, closest-to, and filtering for objects within one or more bounding polygons, because these are common processing idioms of GIS and not just of a specific GIS app; they are something that a GIS-er will likely eventually need regardless of which company or area they are working in.
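The GIS processing idioms named above (within a radius, closest-to, inside a bounding polygon) can be sketched in Python as follows. This assumes planar coordinates and Euclidean distance and uses a standard ray-casting point-in-polygon test; the points are hypothetical, and a real GIS would typically also use spatial indexes and geodesic distances.

```python
import math

# Hypothetical point objects: (name, x, y) in planar coordinates.
# Assumption: Euclidean distance, no spatial index.
points = [("cafe", 1.0, 1.0), ("park", 4.0, 5.0), ("school", -2.0, 0.5)]

def within_radius(points, cx, cy, r):
    """Names of points no farther than r from (cx, cy)."""
    return [name for name, x, y in points if math.hypot(x - cx, y - cy) <= r]

def closest_to(points, cx, cy):
    """Name of the point nearest to (cx, cy)."""
    return min(points, key=lambda p: math.hypot(p[1] - cx, p[2] - cy))[0]

def in_polygon(x, y, poly):
    """Ray-casting point-in-polygon test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

square = [(0, 0), (3, 0), (3, 3), (0, 3)]
print(within_radius(points, 0, 0, 2.5))                       # ['cafe', 'school']
print(closest_to(points, 0, 0))                               # cafe
print([n for n, x, y in points if in_polygon(x, y, square)])  # ['cafe']
```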

Of course the line between "super" domain processing idioms and narrow domain idioms can be blurry. For example, many RDBMS have financial functions. Perhaps the derivatives industry wants even more of these, and many that are specific to derivatives. Such additions would shift the tool away from being an "attribute-base" (database) and toward being a "financial-base". (Some argue that existing RDBMS are a bit business-centric and thus are really business-bases and not data-bases.) If one particular derivatives shop wanted proprietary, single-company functions just for their particular company, it shifts toward being an application and less of a something-base. It begins to blur the line.

We could draw a hard line saying if it handles something that two or more applications would otherwise have to implement, then it's a thing-base instead of an application. But this may be premature dicing and trigger definition battles over "application". Two or more companies may also be a boundary to consider.

In short, a foo-base is a system that handles the storage and retrieval of foo's atoms, plus some commonly-used processing features or operations of foos. Commonly-used generally means cross-application or inter-application.

--top
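One way to read the foo-base summary as code: a hypothetical FooBase that stores and returns atoms unchanged and holds a registry of shared processing idioms (here a domain-neutral count, plus two attribute-oriented aggregates). All names are illustrative and not taken from any existing product.

```python
from collections import Counter
from typing import Any, Callable, Dict, List

Atom = Dict[str, Any]

class FooBase:
    """Hypothetical sketch: storage/retrieval of atoms plus shared processing idioms."""

    def __init__(self):
        self._atoms: List[Atom] = []
        self._idioms: Dict[str, Callable[[List[Atom]], Any]] = {
            "count": len,  # counting is useful regardless of the atoms' domain
        }

    def store(self, atom: Atom) -> int:
        self._atoms.append(dict(atom))   # keep a copy so callers cannot mutate stored atoms
        return len(self._atoms) - 1

    def retrieve(self, atom_id: int) -> Atom:
        return dict(self._atoms[atom_id])  # get out what you put in, in its original form

    def register_idiom(self, name: str, fn: Callable[[List[Atom]], Any]) -> None:
        """Add a processing idiom shared by every application using this base."""
        self._idioms[name] = fn

    def run(self, name: str, where: Callable[[Atom], bool] = lambda a: True) -> Any:
        """Apply a registered idiom to the atoms that satisfy the filter."""
        return self._idioms[name]([a for a in self._atoms if where(a)])

# An "attribute-base": atoms are attribute/value records with sum-style aggregates.
db = FooBase()
db.register_idiom("total_price", lambda atoms: sum(a["price"] for a in atoms))
db.register_idiom("by_category", lambda atoms: Counter(a["category"] for a in atoms))
for item in [{"category": "tool", "price": 5},
             {"category": "toy", "price": 3},
             {"category": "tool", "price": 7}]:
    db.store(item)

print(db.run("count"))                                                 # 3
print(db.run("total_price", where=lambda a: a["category"] == "tool"))  # 12
print(db.retrieve(1))                                                  # {'category': 'toy', 'price': 3}
```

A KnowledgeBase or GIS-base variant would register different idioms (e.g., a truth-probability or a radius search) on the same storage core, which is where the line between a "something-base" and an application starts to blur.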

[Huh? You don't try very hard to be clear. You just throw it up on the wall as-is... :-)]

If you have a question, just politely ask and I will do my best to politely answer. If you find it poorly-written, I assure you it wasn't my intention. With specific criticism I can apply specific fixes. -t
CategoryDatabase, CategoryRealData
DecemberZeroNine

(Last edited April 19, 2013)