IBUKI data

Version 0.002 2015-07-31

Authors:
Richard Weyhrauch (IBUKI)
Note:
The information in this document (and its companion documents) on IBUKI Data is the property of IBUKI but which has been released as an open source specification.

Abstract

This document was written by Richard Weyhrauch to accompany his theoretical work on AI agents. This document describes IBUKI data which is available as part of all IBUKI systems.

Table of contents

  1. Introduction
  2. Base Data
  3. Native Data
  4. Subtypes of Data


1 Introdction

It may seem odd that IBUKI has gone through the trouble of defining IBUKI data explicitly since every programming language does exactly that and logicians do this when they axiomatize arithmetic or set theory. Our desire to develop a thinking system that can reason about its own reasoning capability has played a major role in our thinking. The data structures generally used by logicians (eg sets, natural numbers snd sometimes strings (Godel) or trees ...) are too restrictive and lack practical utility. No real usable system would encode numbers as sets or use Godel numbering to describe well-formed formulas. The data structures of computer science likewise are not practical for our purposes. They are both too diverse and too ad hoc and defined intentionally (their semantics is frequently tied to a particuliar processor or operating system ...). IBUKI has chosen a middle course. At the data level IBUKI has selected a collection of data types and has described their semantics in a way that allows us to build a system that 1) can think about them and 2) their properties are the ones used by mathematicians and logicians (e.g., IBUKI [integer]s are closed under addition. This property is fundemental to the arithmetic of logicians but fails for most programming languages.). On top of these IBUKI has defined a particular type system (in the sense of what type on a thing something is) called IBML, a particular computation system, SEUS, and a particular logical system, FOL contexts, that can be fully implemented using just this data.

The uniformity of data across IBUKI systems guarentees that IBUKI systems can be built that can reason about themselves using 'traditional' axioms.

We also know that, as a practical matter, builders of computer systems will want to use programming languages and other systems in conjunction with any IBUKI system. For this reason we have introduced 'native-data' as a general escape hatch for this purpose. This mechanism allows a developer (or for that matter a theoritician) to integrate an IBUKI system with whatever tools or techniques are convenient. The cost (dare I say penalty) for this is that reasoning about such objects will require additional work on the part of the developer. Perhaps a small price to pay for the flexability. More details as well as some discussion of its adaquacy in the section on native data.

Native Data

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;; [data]
;;
;; [base-data]
;;   
;;  [tag] 
;;    TAG?
;;
;;  [character]
;;    CHARACTER?
;;      chr2num num2chr
;;
;;  [string]
;;    STRING?
;;      str2chars (string)
;;      chars2str (chr-list)
;;
;;  [number]
;;    NUMBER?
;;      Zero
;;      ZERO?
;;      chars2num
;;      num2chars       
;;
;;  [symbol]
;;    SYMBOL?     
;;    sym-mak 
;;      str2sym sym2str       
;;    sym-get-name
;;
;;  [pair]
;;    PAIR?
;;      pair-mak (car cdr)
;;      pair-get-car
;;      pair-get-cdr
;;
;; [native-data]
;;   NATIVE-DATA?
;;   native-data-get-anno
;;
;;
;; subtypes of dats
;;
;; [number]s are defined in terms of [natnum]s  BUT
;; as [data] they include all [rational]s
;; operations and functions on [number]s coerse type
;; eg 0/0 ==> 0 ...
;;  [number]
;;    [rational]
;;      RATIONAL?
;;        RAT-ZERO?
;;        RAT-POSITIVE? RAT-NEGATIVE?
;;        RAT-NATNUM?
;;      rat-mak (num den)
;;      rat-get-num
;;      rat-get-den
;;    [integer] < [rational]
;;      INTEGER?
;;        INT-ZERO?
;;        INT-POSITIVE? INT-NEGATIVE?
;;        INT-NATNUM?
;;      int-mak (sign natnum)
;;      int-get-nstnum
;;      int-get-sign
;;    [natnum] < [integer]
;;      Zero
;;      NATNUM?
;;        NAT-ZERO?
;;        NAT-NATNUM?
;;        chars2nat nat2chars
;;      successor
;;
;;  [list]
;;    LIST?
;;      EmptyList
;;      EMPTYLIST?
;;    list-add-1st (list DATA)
;;    list-get-1st
;;    list-get-rst
;;
;;  [sexp]
;;    SEXP? == (type-union [atomic-data] [pair])
;;
;;  [tagged-sexp] < [sexp]
;;    `(,[tag] ,[anno] ,@[sexp])
;;
;;  [finite-set] < [tagged-sexp]
;;    `(*finite-set* ,[anno] ,@[list])
;;
;;  [aset] < [finite-sexp]
;;    `(*finite-set* ,[anno] ,@[alist])
;;
;;  [anno] < [sexp]
;;    (type-union [anno-name] '(,[anno-name] @,[alist]) 
;;  [anno-name] < [symbol]
;;    '(*one-of* ,[anno-IDK] ,[symbol])
;;  [anno-IDK] 
;;    '(*exa* |symbol| @)
;;
;;  (DISJOINT [base-data] [native-data])
;;
;;  [data] == (type-union [base-data| [native-data])
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; [base-data]
;;
;; [base-data] == 
;;   (type-make-disjoint-union
;;     [tag]
;;     [character]
;;     [string] 
;;     [number]
;;     [symbol]
;;     [pair]
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;


[data]

There are 6 types of [base-data]: [tag]s, [character]s, [integer]s, [string]s, [symbol]s and [pair]s. IBUKI also supports access to [native-data]. These types are mutually disjoint. The expressions defined below provide a uniform description of the basic data available to every IBUKI System.

[tag]s

In normal programming practice how a programmer uses a particular data type is usually in the head of the programmer (or maybe in the documentation) but is usually not descernable by looking at the data. A programmer may use integers to mean shoe sizes in one part of their code and to represent the number of pairs of shoes ordered in another. The idea of a 'tag' is to make explicit distinctions between the uses of the data. [tag]s are used to do this.

IBUKI tags are written as a word that begin and end with the 'star' character #\*. Some examples are

*tree*
*finite-set*
IBML tags are not IBML symbols.

Sequences of characters used in writing and printing IBUKI data is case sensitive, so '*finite-set*' and '*FINITE-SET*' are different tags.

[tag] is a [base-type] and is disjoint from other types of [base-data].

[integer]s

An IBUKI [integer] can have any number of digits and IBUKI integer arithmetic is exact.

[integer]s are read and written as any number of digits or an [integer] optionally preceeded with a '+' (plus) or '-' (minus) sign.

-749321
--6
0
293
128328287193289487593485992843579238749579459

[character]s

#\a #\greek-small-letter-alpha nFor convience IBML uses
  • the characters themselves to name printable US-ASCII characters (eg #\{)
  • short forms of <character-name>s for some frequently used sets of characters
  • short forms of <character-name>s for those characters which have been definitions as HTML entities

The latter two groups overlap. IBUKI has defined many groups of characters. The details are documented in IBML Characters.

[string]s

IBUKI strings are writen as sequences of US-ASCII characters surrounded by quotation marks (ie, the character #\").

"Hello World"
Writing strings containing the quote symbol itself is accomplished by putting a backslash character, #\\, before the quote symbol.
"He said, \"Hello World\"."
IBUKI uses #\\ as an escape character that is not actually part of the string. Note the #\" characters 'surrounding' the string are also not part of the string, but only part of the way we write strings to facilitate reading and printing.

The ability to mention 'string's containing non-US-ASCII characters is intentionally left out of this document. (see ' IBML big strings.

[symbol]s IBUKI symbols are written as sequences of characters, e.g.,
Hello-World
1+ 
[INT]
+INF
Each symbol has a name. The name of an IBUKI symbol can be any IBUKI string that is made up of printable US-ASCII characters that doea not contain the character #\:. and does not look like a [tag]. When writing an IBUKI symbol whose name contains characters that would make it look like more than one 'word' we use the Common Lisp convention of surrounding its name with the vertical bar character, #\|. Thus, we write the symbol whose name is "New York" as |New York|. Note IBUKI data is by definition case sensitive so the symbol written |new| is like the word 'new' but not the word 'NEW', ... IBUKI documents usually write symbols that contain lower case letters surrounded by the #\| characters so as to leave no ambiguity.

This is a version of the syntax of symbols of Common Lisp symbols without package designation, with a more structured use of the #\| character and omits symbols whose name would be look like a [tag] (i.e., start and end with #\*) Ths definition means that neither |:true| or |*tag*| names an IBUKI symbol (although both would name symbols in common lisp).

[pair]s

[pair]s are written ( <car> . <car> ).

The use of 'car' and 'cdr' for the components a pair is a reference to their traditional use in LISP.

[list]s

If d1, ... , dn are are expressions for IBUKI data then we write

(d1 ...  dn)

for the list containing d1, ... , dn.

We say that each <di> is an element of this list. The i>Empty-List is a list with no elements. [list]s are implemented (like in LISP0 as special kinds of [pair]s. As [pair]s, examples of [list]s are themselves [base-data] and therefore they can be nested.

An sexpression, whose type we write [sexp] is either an example of some [base-data_ or a [list]. This definition is of course redundent. Examples of sexpressions are

1
[base-data]
(1 2 3)
(*finite-set* @ 1 2 3)
(|ie,| |a| |finite| |set| |is| |a| |tagged sexpression| |;| |see| |below|)
(1/2 {3/4 5/6} 7/8)
("Hello World" "Hello Mars" "Meet Fred")
("An odd list of numbers!" 1.2 3/2 98302198301820831) 
(42 $True "maybe" %False)
(1.1 {1.2 1.3} 1.4 (1.5 1.6) 1.7 1.8 1.9)

1.2 Native Data

Other data

IBUKI also parses some other expressions for convenience.

Subtypes of [symbol]s

The text above (including the title of this section) contains words of the form '[<characters>>]'. These are all IBUKI [symbol]s but they are used as the names of types. Thus [tag], [symbol], ... should be thought of as the names of types of things. Types are the subjevt of the IBUKI Type Theory, IBML, and details can be found in .

Examples of [type-name]s are

1
[base-data]
[1-3]

[set]s

[Appendix]

Practical Issues

Mutable Data

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;
;;   There are two kinds of data [immutable data] and [mutable data]
;;   The behavior of both is well defined but their theories are different
;;   [immutable data] can be formalized as extential types of data
;;   [mutable data] is formalized relative to an environment and is
;;     prima facie more complicated to formalize.  
;;
;;   [immutable data] has definitions compatible with the 
;;     definitions of IBML subtypes of [data]  AND
;;     the properties of types and data in the corresponding FOL contexts
;;   [mutable data] is used in SEUS environments for doing computations
;;     and are compatible with the 
;;        IBML types for [SEUS data]  AND
;;        the formalization of SEUS as an FOL contexts
;;
;;   The theory of [mutable data] can be replaced by a theory of
;;   [immutable data] but real computations
;;      1) use [mutatable data] all the time  AND
;;      2) computations on real computers do I/O  AND
;;         thus de facto do not behave like functions.
;;   A satisfactory computation theory must account for this
;;
;;   In the code below ALL data is actuably mutable
;;   functions that can change the content of their args
;;     have a #\! as the last character of their name.
;;
;;   Operations on [immutable data]
;;     either only look at a datum or make a 'new' one
;;     (EQUAL A B) - (EQ A B)
;;       -get-
;;       -rem-
;;     (EQUAL A B) - (EQ A B)
;;
;;   Operations on [mutable data]
;;     destructively change data leaving args EQ to before
;;     (AND (EQ A B) (EQUAL A B))
;;       -set-
;;       -del-
;;     (EQ A B) but not (usually) necessarily (EQUAL A B)
;;      
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

[tag]s

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; [tag]
;;
;; (|type| [tag] |subtype] [base-data]
;;  (|defined-by|
;;   (|recognizer| #'TAG?)
;;  )
;;  (|where|
;;   (|in-context| Data (CONSEQUENCE (TAG? |*tag*|)))
;;   (|in-context| Meta (CONSEQUENCE (WELLORDER TAG-LE TAG?)))
;;   (|in-context| Meta (CONSEQUENCE
;;    (FORALL c p s
;;     (IF (AND (CONTEXT? c) (PREDSYM2? c p) (SORTSYM? c s))
;;       (IFF 
;;         (WELLORDER p s)
;;         (AND
;;           (TOTAL-ORDER p s)
;;           (FORALL (s1 (SUBSORT s1 s)) 
;;              (EXIST (e1 (EXAMPLE e1 s1)) 
;;                (FORALL (e2 (EXAMPLE e2 s1)) 
;;                  (|in-context| c (CONSEQUENCE (wff-mak p e1 e2))) ) ) ) ) )
;; )
;;
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

Fractions

-749321/34
-1/2
22/1
293/69
37/128328287193289487593485992843579238749579459
Fractions are written as / with no spaces so 1/2 is a fraction but 1 /2 is not. Arithmetic functions work on fractions whose numerater and denominater can have any of digits and represents an exact rational mumber.