Are Types Tied To Syntax

(I would put this discussion under TopOnTypes rather than create a new topic, but that one is stuck in a vandal loop or EditWar it appears.)

In WhatAreTypes, it is strongly suggested that "types" are tied to "syntax". This struck me as a very odd definition and has led to LaynesLaw battles over the definition of "syntax".

As a thought experiment, suppose an alien race preferred to program directly into AbstractSyntaxTrees (AST's) out of preference. (Humans can too, but it never really caught on so far... though it is often alledged that LispLanguage comes close)

We all agree, I hope, that any language that has "types", including static compile-time-checked types can be represented as an AST.

How does the syntax-centric definition indicate which AST's have "types" and which don't?

Or, are types "lost" as soon as we turn it into a tree?

Questions that may arise are:
                      A = B + C * D

A -----> + ------ B | * -------> C | -------> D (AsciiArt here needs work. See also AbstractSyntaxTree for another example.)

(This text is here to work around a formatting bug)

See "Tree Example 2" below for another example that involves type conversion.

From the little I've read so far in the "formal type theory" references that Costin has provided, I get the impression that the syntax being referenced here is the syntax of type definitions. (I.e. not the syntax of the language that uses the types -- although often the former is a subset of the latter.)

"What is the difference between any tree and AST's?" An AST has to be composed of the valid lexical tokens of the language in question, and the tree has to represent a valid structure according to the rules that are the syntax of the language.

"Is any "executable" tree an AST?" What's an executable tree? A tree can be considered as input to an algorithm; processing a tree in this way could be considered "executing" it. If the tree is "executable" according to a given processor, and the valid input for that processor can be described by a syntax, then I suppose you could say that the tree is an AST for that syntax.

Is the definition of AST also tied to definition of "syntax"? Obviously, hence the name.

You can certainly represent a tree as a (set of) table(s). Consider a syntax as a set of rules for composing lexical tokens into an AST. Aside from structuring your tables so that they can represent the given AST, you'd also probably want to define constraints that reflect the rules of the syntax, disallowing invalid constructs. Typical representations of syntax rules will probably be a lot more clear and succinct than an equivalent expression of the rules using typical representations of database constraints, I think.

-- DanMuller

Suggestions: Clarify which syntax is meant here, the syntax that describes the types of a type system, or the syntax of a language that uses types. Related: Clarify what is meant by an "executable" tree. Since you're talking about types, I assumed the former, but but "executable trees" makes me think of the latter. Thanks. -- DanM

I've clarified "executable" as best I can above. I don't think I can give a 100% precise definition because any information (data) is potentially executable. Whether it has useful results or not is defined by the user. For example, one can take a spreadsheet file 'foo.xls', rename it to 'foo.exe' and type that at the DOS command line. The OS will try to run it, probably to the point where it causes a run-time error. (Some OS's can figure out the generating app and launch it from there, but I am not assuming that feature here.) But, we cannot define 'executable' as something that does not cause errors, because even programs with flaws often still have use and almost every complex program probably has at least one flaw, however subtle or unlikely to be encountered. Thus, in an informal sense, an 'executable tree' is a tree that contains instructions of some kind (and perhaps related operands) that were put there with the 'intent' of being executed. Please don't ask me to define intent. --top


Tree Example 2

   myNum = 7;
   myString = "foo";
   ...
   A = myNum + myString;

A ------- + -------> myNum {num} | --------> myString {string}

Suppose this is the syntax tree that would be generated in a dynamic language. The portion in curly braces are the internal type flags often used by interpreters to keep track of types. (Each operand "node" has a "value" and "type" attribute.) Now suppose we lost the source code forever and ever and just have the syntax tree (similar to P-code or MS-CLR). Does this syntax tree have "types"?

The interpreter would generally take such p-code and the plus operator would notice that at least one operand is a string, and thus convert the result to a string.

An interpreter does not have to use type flags. Such languages tend to have operators that explicitly determine types, such that "+" assumes only numbers and "." or "&" assumes strings and converts any numbers to strings, or stores them as strings from the beginning. "+" would dynamicly interpret/parse the strings as numbers (and abort if it can't).


PageAnchor: spreadsheet_scenario

Machine Optional

Suppose we had a spreadsheet with the following cells:

 Attribute....Type.....Bob...Cindy...Mike
 ----------------------------------------
 Age..........Number....43.....34....51
 Shoe-Size....Number....11......7....12
 Fav-Color....Text......Red...Blue...Green
 Hire-Date....Date.....12/02..8/98...4/01
 Etc...

Most would consider this sheet to define "types" even though it may be for human consumption only. No machine may ever use the "Type" column. This is to illustrate that a formal type processing system ("type system") is not a requirement to have "types" themselves. A formal type system is not a prerequiste to have "types".

[The sheet doesn't define types, it identifies the types to which attribute values belong. I see nothing there that defines the constraints under which we determine whether a sequence of symbols is a Number, or a Text, or a Date, nor anything that describes the operations we can perform on values belonging to those types. That must exist somewhere, if only in your head. Otherwise, it would be meaningless to distinguish one type from another. If we formalise how we describe what distinguishes one type from another, we call it a "type system." Therefore, a type system exists whether you contemplate it or not.]

For the sake of argument, would it change anything if the sheet was supplied with BackusNaurForm descriptions of the referenced type names?

[BNF of the type names??? You mean something like "<Number> ::= 'N' 'u' 'm' 'b' 'e' 'r'"? It's a start. It would allow us to automate distinguishing one type name from another, but it does not describe the nature of these types, or how they interact with values of other types, or how new types can be formed. Can we perform addition on values of Date and Text? How about values of Date and Number? Can we divide a Date by a Number? Can a Text contain spaces? Can a Number have decimal points? Can we combine these types to form new types, and what are the rules for doing so? These are the beginning of the issues you need to consider in designing a type system. You probably already have some mental rules (i.e., a mental type system) -- now familiar to the point of being implicit -- which you've formed from your past experience with non-computing values of these types, and from experience with software implementations of type systems.]

Sorry, I meant "types per name"; the grammer of items on a row with a given name. Just the syntax, not any kind of (formal) processing. Can they still be called "types" in your book without having behavior defined?

[Sure, but the behaviour must be, at least, implicit. If there is a syntactic mechanism for distinguishing kinds of values, then we have at least the start of a type system. If there is no syntactic mechanism for distinguishing kinds of values, but we still regard them as distinct, then there must be an implicit type system, i.e., the behaviour and properties exist in the mind of a human observer. What if there is no mechanism at all for distinguishing types beyond an arbitrary naming? E.g:]

 Attribute....Type.....Bob...Cindy...Mike
 -----------------------------------------
 Norfle.......Gadoink..3%z...foo.....^^^
 Spoont.......Piffle...nN....7uP.....12
 Fazoom.......Thwock...50&...Blue....%24
 Hoggleboof...Zorg.....*%&...92/98...thoon
 Etc...
[Gadoink, Piffle, Thwock, and Zorg only represent types if somewhere there is an associated type system. Otherwise, they are arbitrary and meaningless labels.]


(from above)

Perhaps you are confusing the syntax used to represent language-expressions with the language-expressions so represented...

I was trying to convert it to a different format, graphs, in order to get away from syntax issues.

- which exist at two entirely different layers of abstraction. I think you're confusing the two, which makes me think that any direct answer I give you will just confuse you because you don't understand the more foundational concepts it is based upon.

Or, perhaps you are mistaking personal internal subjective notions for external reality. It is a common fault in software engineers. (I know it is rude to externalize my personal judgments of you, but you started the practice and I feel I have a right to return in kind.) Your question, to avoid being extremely non sequitur, should NOT have been about graph conversions of programs, but rather about graph or multi-dimensional representations of programs, which are two entirely different things!

What is the difference and what is an example of the difference? I suspect there is a miscommunication here.

To provide some answers: A languages source possessing a syntax in 2D or non-linear space (like BefungeLanguage) does not affect or determine the types of the language-expressions, which are still based upon the properties of said expressions within the language being expressed.

Agreed. The "types" may come about from running the program rather than be directly in the program (pre-run). But if one can x-ray the run-time or compiler engine, types will usually be represented as a kind of side-flag(s). That is how one can objectively observe types being created. More on this below.

Indeed, if the graph syntax allows for manifest type-annotations, they're still manifest - just as manifest type-annotations are when you see them in text. Therefore, if you want to highlight them with a yellow pen, you can still do so... at least if you can find a nice way of organizing a representation of the graph upon a 2D plane so you can print it upon paper. OTOH, if types are mostly implicit in the graph syntax, you'll have just as much difficulty in highlighting them as you would have highlighting implicit types in any other syntax. This would all be quite obvious once you actually are familiar with and literate in the higher-dimension syntax rather than in your current position of trying to wrap your head around the concept.

What I am trying to do here is provide a concrete common representation (graph), and then find a way to identify types with that representation. By using a common format/representation, we can get away from things like semi-colons and quotes (syntax). If we need to represent the run-time or compiler engine as a graph also, not just the program, then so be it. (You don't seem to get what I am trying to achieve, and instead focus on the specific suggestions/examples meant to help communicate, and criticize me based on the fact that the examples are not all-encompassing.) If on the other hand types are ONLY in the mind, then a common, tight definition is impossible. Mathematicians invented a common notation to help them describe and communicate their stuff: Math. Similarly, I am proposing we use graphs as the common notation to aide in communication. If you have another suggested representation that is not a specific existing language (like C++ or Pascal), please bring it up. (Maybe that's what Reynolds did, but it is a difficult notation and not very wiki-able.)

What I propose is that in at least one of the graphs of the various steps (program, compile, execution, etc.) one will SEE side-flags of some sort in most things that people agree have/use "types". This is hopefully the concrete external objective representation, the Higgs Boson tracks of "types".

You cannot escape syntax by moving to a graph-based notation or representation for a language. Do you not see the inherent contradiction there? A graph-based notation or representation IS a syntax. What you CAN do, however, is avoid and simplify various other obtuse or unnatural syntax, especially when it comes to naming of things like variables and types or declaring data with complex associations. (In a graph-syntax, you can often avoid names entirely because nodes can represent named things and certain edges can indicate their placement.) Also, a graph-based syntax can allow for more interesting object and source-browsers, and a better programmer UI. These things are positives, but they aren't magic that resolves the need for or existence of syntax.

Also, moving to a graph-based notation does not create or eliminate any language-features if you're still expressing the same language... using graph-based notation will not give you closures or multiple dispatch or module-support or concurrency-support where they did not exist before. It will NOT eliminate the need for other languages that provide these features that the graph-based language fails to provide. The QuestForThePerfectLanguage might very well lead to a graph-based primary syntax for a language, but the graph/graphical/visual syntax is not, itself, an answer to many real language-design problems (though it might be a answer to ONE real problem, which is suitable and intuitive syntax and representation).

As to the idea of allowing programmers to easily 'see' the annotated program-graph including type annotations ('side-flags'), that's a laudable goal. It just doesn't support "flag theory". The flags there are reflecting the types, not defining them.

Again, the usage of the type flags is usually language or app-dependent, and thus should not be part of the definition -- except perhaps requiring different treatment at least in one spot, otherwise they are merely documentation at best. I could agree to that. In our definition, this would ensure that the type indicators are "used", but does not dictate how. Deal?

Your theory won't survive on politics and 'Deals', or at least I won't help it do so. So, no deal. You speak of type flags as though they necessarily exist, and I'm not willing to give you even that much.


MetaDiscussion? moved to VaguesDependingOnVaguesDiscussion? (@TheAdjunct)


See also: WhyTypeSyntax

AugustZeroSeven

EditText of this page (last edited August 28, 2007) or FindPage with title or text search