of data is based on the following three principles:
- [duality principle] element consists of two components: identity and entity
- [inclusion principle] element has a parent element in a hierarchy
- [order principle] element has a number of greater elements in a partially ordered set
The duality principle splits one thing into two parts, identity
. Identity is a transient value passed by-value and observed in its original form without any indirection. It can be used for two purposes: as a representative of some entity, and as a data value. Entity is a persistent part of an element passed by-reference and accessible only indirectly so it can be viewed as a thing-in-itself.
Although any element has two sides or flavors they are considered one whole which exists in inseparable unity. It is informally analogous to complex numbers in mathematics which also have two constituents but are manipulated as one whole.
CoM uses a novel programming construct which generalizes conventional classes and is called concept
. Concept is defined as a couple of two classes – one identity class
and one entity class
. For example, if a bank account is identified by its account number and characterized by the current balance then it can be described by the following concept:
In CoM, concept fields are referred to as dimensions
(to emphasize their role). Concepts are used as types instead of classes when defining variables, fields or collections. One concept instance produces one identity and one entity. The identity is then used to access this entity and can be passed or stored as its representative. If identity class is empty then concept is equivalent to class. If entity class is empty then concept defines some data type in a domain.
It is important that identity class is different from primary keys because primary key is a role of some attributes which are still normal attributes while identity exists separately from its entity. It is analogous to conventional memory with an arbitrary address structure defined in identity class and arbitrary cell structure defined in entity class.
Concepts can be used to define a type of elements in a collection. For example, we could create a table with bank accounts as follows:
CREATE Table Accounts CONCEPT Account
Table Accounts will contain entities of concept Account which are identified by their account number. It is important that such a table is different from relational tables because concepts allow us to customize both columns and rows. Columns are defined in entity class while rows are defined in identity class. This makes the picture symmetric. In contrast, only columns can be defined in relational tables.
Inclusion relation is a generalization of inheritance. It is used to declare a parent concept. If a concept is included in some other concept then all its instances will have parents of this type. In contrast to inheritance, parent elements in CoM can be shared and one parent may have many child elements. For example, if a bank is supposed to have many accounts then concept Account has to be included in concept Bank:
CONCEPT Account IN Bank
Now any account will exist within some bank. Moreover, the main property is that any account is identified relative to its bank. Such an identity consisting of several segments is called complex identity
. Complex identities are analogous to conventional postal addresses where child elements define a position relative to their parent element.
Order principle is intended to bring semantics into the model and from data modeling perspective it is the most important and unique feature of CoM. This principle postulates that all elements are partially ordered where an element has a number of greater elements and a number of lesser elements. In other words, all elements exist within a partially ordered set rather than within a flat set.
To establish partial order among elements CoM uses the following rule: if element A reference element B then A < B (A is less than B). Thus references in CoM play a role of semantic construct rather than a simply navigational tool. The fact of referencing directly influences semantics of the model which depends on how elements are positioned with respect to other elements.
Concepts are ordered using the following rule: if concept A has a dimension (field) with type B then A < B (concept A is less than concept B). Thus dimension types of a concept define its greater concepts in the partially ordered set of concepts. For example, if concept Account has a dimension referencing the account owner then Account is a lesser concept while Person is a greater concept:
CONCEPT Person // Greater concept than Account
CONCEPT Account IN Bank // Lesser concept than Bank
Person owner; // Person is greater concept
A partially ordered set of concepts with inclusion relation is referred to as concept-oriented schema
. Lesser concepts in schema are positioned lower their greater concepts. In the above example, concept Account (lesser concept) will be drawn under concept Person (greater concept).
This bothers me a bit. The relationship between bank and account may not necessarily be hierarchical. An account may serve as the conduit of exchange between two banks, for example. Thus, they both reference each other. Banks reference accounts and accounts reference banks. And an account may have a person assigned to it as the primary account manager. References directions are often not "invariants" (and one cannot know up-front even if they were). One has to be careful when hard-wiring a hierarchy into a design. It may byte you on the butt down the road. Related: LimitsOfHierarchies.
Partially ordered structure of concepts is used to define two operations of projection
which are applied to a set of elements and return either their greater elements (for projection) or lesser elements (for de-projection). More formally, operation of projection '->' (right arrow) returns a set of greater elements along the specified dimension. For example, given a set of bank accounts in collection Accounts we can get their owners by projecting along this dimension up to collection Persons:
Accounts -> owner -> Persons
Projection can be applied consecutively and intermediate collections could be restricted. For example, if persons have addresses then we could get all addresses for some subset of accounts as follows:
(Accounts | balance > 100)
-> owner -> (Persons | age < 40)
-> address -> (Addresses | city == 'Moscow')
Operation of de-projection '<-' (left arrow) returns a set of lesser elements along the specified dimension. For example, given a set of addresses in collection Addresses we can get all persons living there by de-projecting them down to collection Persons:
Addresses <- address <- Persons
The persons can be further de-projected to bank accounts and we can use additional constraints:
(Addresses | city == 'Moscow')
<- address <- (Persons | age < 40)
<- owner <- (Accounts | balance > 100)
This concept-oriented query will return all accounts with high balance for young owners living in Moscow.