Data Manipulation

DataManipulation is the Manipulation of Data. ;-)

That's obvious, of course.

The ability to perform DataManipulation presupposes that you have Data to Manipulate, meaning you must have a DataBase.

If you wish to delegate and/or automate the task of DataManipulation, you must have the ability to express, through either imperative or declarative statements, the precise manipulations you wish to perform. Any such expression is necessarily part of a language. A language designed specifically for the domain of DataManipulation is called a DataManipulationLanguage?. The most popular of such languages is the StructuredQueryLanguage (SQL) which is used for many RelationalDatabases, but there are many others.

(DataManipulationLanguage? (DML) is generally considered a subset of StructuredQueryLanguage (SQL). The "other part" of SQL would be DataDefinitionLanguage? (DDL) -- including CREATE, DROP, GRANT, REVOKE and ALTER statments.) - agreed, and that's not even counting the parts for concurrency management and transaction support.

There is some confusion inherent in common terminology. Conceptually speaking, querying something should not involve manipulating it (and though Heisenburg would call that WishfulThinking in general, it's actually doable for Data). A query to a database is a request for useful information regarding which you desire a response. A manipulation over a database is the performance of some alteration to it - the addition or deletion of data items (datums?). A perhaps more appropriate term would be 'Database Manipulation Language', but CestLaVie.

The manipulation portion of SQL consists of its 'Insert', 'Update', 'Merge', 'Truncate', and 'Delete' operators. Only SQL's 'Select' operator is used for querying. (Other portions of SQL are for concurrency control, security, data definition, etc.)


On the Purpose of Data Manipulation

(From a discussion in DataSpace)

There are also limits in purpose for DataManipulation: I don't believe I'm missing anything, there. Reflection and Projection are conceptual duals (Projection casts an internal world of facts into an external one; Reflection collects an external world of facts into an internal one), as are Inference and Maintenance (Inference 'learns' and Maintenance 'forgets', both to increase the value of the database), and as are open-world and closed-world (open-world can add or removes information; closed-world cannot), so these provably cover all possible purposes for DataManipulation.


"I want PowerfulAdHocDataProcessingTools."

I have lots of data that isn't held hostage by a RelationalDatabase. I have data in CVS, for instance.

Do you mean ConcurrentVersionsSystem (CVS) or CommaSeparatedValues (CSV)? Not that it matters much. I'd question what you mean by "not held hostage by a database" [note: above statement since changed to 'RelationalDatabase'], since even a CSV or XSL file is (at least by many DatabaseDefinitions, including DatabaseIsRepresenterOfFacts) a DataBase. It just isn't a DBMS. And the basic reasons for DataManipulation upon it do not change.

ConcurrentVersionsSystem (CVS) doesn't respond well to SQL queries. But it does contain lots of data I find quite valuable. "Who changed this file last month?" could be useful information. And "Who calls this method?" could also be useful information. Attempting to get this data with SQL queries could be... interesting.

Advanced query support in any RefactoringBrowser or IntegratedDevelopmentEnvironment would be very useful indeed... you'd almost expect it to be state-of-the-art by now.


See also: WhatIsData, DataBase, DataSpace, RelationalDatabase, CollectionOrientedVerbs

EditText of this page (last edited July 9, 2010) or FindPage with title or text search