A high-level description of this document, which is meant to be used as a reference for the development of madz. This section briefly covers many of the main points of the entire document.
Important concepts required for understanding the vision and design of madz. These are presented so that each relies only on previously introduced concepts, but they assume the ability to look up well-known concepts, specifically including:
Some examples and descriptions of what madz should provide. The vision is split into categories according to what madz will provide.
The core point of madz is to manage software projects. This ranges from checking out the source code, exploring (aka ‘reading’) the source code, organizing the source code, adding to the source code, editing the source code, committing the source code, building the source code, testing the source code, deploying the source code, debugging the source code, reproducing errors with the source code, and documenting the source code, to importing external source code, exporting existing source code, and any other source code manipulation.
It also provides features for managing libraries, SDKs and APIs, and custom tool chains used by the developer.
A high-level discussion of feature design at multiple levels of madz. The design is split into abstraction layers, each of which represents a set of abstractions used by the components and systems of madz at that layer, as well as the implementation of features at that layer.
Some of the abstraction layers described here will in reality be separate libraries originally designed for madz.
Massive dangerzone, abbreviated madz, is a system for managing data, and data about that data.
More specifically, it is an attempt to fix many common problems in software engineering by re-imagining how we manage data. It is a framework for building software systems. Its initial primary usage is as a tool chain for managing complex code bases.
The guiding purpose of madz is, simply, to: Fix Software Engineering. A number of core principles unpack from that; they are listed here in no particular order.
There are a couple of core realizations which madz takes for granted. Some of these are architecture related, others are high level concepts, and some aren’t easily categorized.
These are the core technologies required for madz. They will start as python libraries within madz, and eventually be separated into madz module libraries.
The primary purpose of madz is to manage modules. That is data, and data about that data. Modules can have many different properties, some of which will be covered below as concepts. Different sets of properties are used by madz to provide different features for modules.
Modules can go by many names depending on the properties they have. Note that the names are generally not exclusive (although some are subsets of others); hence modules form an ontology (as opposed to a taxonomy).
Markers within the module system which keep track of objects and actions, and how they relate to each other.
Specifically, the entity system for the modules keeps track of the interesting objects related to each module, and other objects register themselves as interesting in other ways. Each registration contains a description object which allows the objects to be organized, typed, and compared.
Verbs are also added to the system, either as compound actions which invoke other verbs or as singular verbs. These verbs can provide different capabilities and information: for example, descriptions of the types of objects they require, or alternate functions which can pretend to perform the action when given mocked-up object types. Others may outright describe the action they perform.
Verbs always take a world; every noun is part of a world. Many nouns provide functionality for creating mocked versions of themselves in a ‘fake’ world. Worlds use contexts.
This system is used by the AI to perform actions across modules, and by the user interface for generating interactive consoles and completing input.
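A minimal Python sketch of such a registry, assuming a description object that carries a kind (used for typing) and a set of tags (used for organizing); frozen dataclasses provide equality for comparing descriptions. `Description` and `EntityRegistry` are hypothetical names, not the madz API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Description:
    """Hypothetical description object attached to every registration."""
    kind: str                      # used to type the object
    tags: frozenset = frozenset()  # used to organize the object
    # frozen=True provides __eq__ and __hash__, so descriptions can be compared


class EntityRegistry:
    """Hypothetical registry of the interesting objects related to each module."""

    def __init__(self):
        self._entries = []  # list of (module name, object, description)

    def register(self, module, obj, description):
        self._entries.append((module, obj, description))

    def of_kind(self, kind):
        """All registered objects whose description has the given kind."""
        return [obj for _, obj, d in self._entries if d.kind == kind]

    def tagged(self, tag):
        """All registered objects whose description carries the given tag."""
        return [obj for _, obj, d in self._entries if tag in d.tags]

    def for_module(self, module):
        """All objects registered against one module."""
        return [obj for m, obj, _ in self._entries if m == module]
```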
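To make the relationship between verbs and worlds concrete, here is a small Python sketch under stated assumptions: a world is a named collection of nouns flagged as real or fake, a verb lists the nouns it requires and may carry a `pretend` function that is used in fake worlds, and a compound verb simply invokes its sub-verbs. `World`, `Verb`, and `compound` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class World:
    """Hypothetical world: every noun belongs to exactly one world."""
    name: str
    fake: bool = False
    nouns: dict = field(default_factory=dict)

    def add(self, key, noun):
        self.nouns[key] = noun
        return noun


@dataclass
class Verb:
    """Hypothetical verb: always invoked with the world it acts in."""
    name: str
    requires: tuple                                      # noun keys it needs
    run: Callable[[World], object]                       # the real action
    pretend: Optional[Callable[[World], object]] = None  # used in fake worlds

    def __call__(self, world):
        missing = [k for k in self.requires if k not in world.nouns]
        if missing:
            raise LookupError(f"{self.name} needs {missing}")
        if world.fake and self.pretend is not None:
            return self.pretend(world)
        return self.run(world)


def compound(name, *verbs):
    """A compound verb invokes other verbs; each sub-verb decides whether to pretend."""
    requires = tuple(k for v in verbs for k in v.requires)
    return Verb(name, requires, lambda world: [v(world) for v in verbs])
```

Because the world is always passed in, the same compound verb can be dry-run against mocked nouns before it is run for real.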
Every module has a description. This description includes information about the purpose of the module, how to use it, and how it should affect other parts of the system. The description is what drives the majority of the other features of the module system.
Each description has an arbitrary number of sub-parts, depending on the features available for parsing it. Each description component has a list of the dependencies it comes from. Descriptions are acquired in various stages.
It’s important to note that the types of information generated and the formats containing that information are independent of one another: one file could contain identity, forge, and code descriptions. Description information can also be inherited from libraries (calculated at reduce time).
Perhaps the most important part of a module is its identity. However, identities exist for many other things. For example, both forges (and their components) and code symbols have identities.
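The independence of information domains from file formats, together with reduce-time inheritance, can be sketched in Python as follows. This assumes that a parsed file yields one part per domain, that each part records the sources it came from, and that a module's own values override inherited library values at reduce time; the override rule, the file names in the usage, and all function names are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DescriptionPart:
    """Hypothetical description component in one information domain."""
    domain: str      # e.g. "identity", "forge", "code"
    data: tuple      # simplified payload: sorted (key, value) pairs
    sources: tuple   # the files or libraries this part came from


def parse_file(path, content):
    """Hypothetical parser: a single file may yield parts in several domains."""
    return [DescriptionPart(domain, tuple(sorted(values.items())), (path,))
            for domain, values in content.items()]


def reduce_parts(own, inherited):
    """Merge parts per domain; the module's own values override inherited ones."""
    merged = {}
    for part in list(inherited) + list(own):
        data, sources = merged.get(part.domain, ({}, ()))
        merged[part.domain] = ({**data, **dict(part.data)}, sources + part.sources)
    return merged
```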
An identity provides a couple of key features:
When used for code symbols, for example, the identity system is used to find the symbol in question, and other partial matches.
When used for forges, for example, the identity system is used for searching, versioning, and compatibility.
Finally, when used for modules, for example, the identity system provides searching and matching.
A projection is an operation performed on information (like a description or source code) which produces another view, or format, of the source. Typically projections are one way, although they don’t need to be. For our purposes, a projection is an operation which takes information in an in-memory machine format (for example, a parse tree or the generated semantic tree) and produces another format meant for storage or reading.
An artifact is a projection of some of the description and source code of a plugin from which the description and source are likely not recoverable and which, more specifically, is designed to be executable. That is, an artifact is an executable projection.
A forge is a collection of code responsible for generating artifacts for code modules. A forge may be responsible for multiple code modules and multiple types of artifacts.
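As an illustration, assuming a toy semantic tree made of `(operator, left, right)` tuples with integer leaves, the Python functions below are three projections of the same in-memory tree: two produce text formats, and the third (evaluation) is one way, since the tree cannot be recovered from the number it returns. The tree shape and function names are assumptions made for the example.

```python
def project_infix(node):
    """Projection to a human-readable, fully parenthesized infix string."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({project_infix(left)} {op} {project_infix(right)})"


def project_sexpr(node):
    """Projection to a Lisp-style prefix format meant for storage."""
    if isinstance(node, int):
        return str(node)
    op, left, right = node
    return f"({op} {project_sexpr(left)} {project_sexpr(right)})"


def project_value(node):
    """One-way projection: the structure cannot be recovered from the result."""
    if isinstance(node, int):
        return node
    op, left, right = node
    a, b = project_value(left), project_value(right)
    return a + b if op == "+" else a * b
```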
Forges contain a lot of parts, specifically:
Generally, the forge is designed so that some of its parts provide new actions for the plugins whose forge identities match a forge range. These actions are often limited to the target being attempted.
Reports represent another entity component system, specifically a way of describing the result of some action. Reports are split into trees of the results of the action, and they provide human-readable data and, potentially, internal madz references to other reports or available features.
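A rough Python sketch of that arrangement, assuming a forge range is a forge name plus a half-open version interval and that a forge only offers actions for targets it supports. `ForgeIdentity`, `ForgeRange`, `Forge`, and the `build`/`load` action names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForgeIdentity:
    """Hypothetical forge identity declared by a plugin."""
    name: str
    version: tuple  # e.g. (1, 4)


@dataclass(frozen=True)
class ForgeRange:
    """Hypothetical range: one forge name and a half-open version interval."""
    name: str
    minimum: tuple
    maximum: tuple  # exclusive upper bound

    def contains(self, ident):
        return ident.name == self.name and self.minimum <= ident.version < self.maximum


class Forge:
    def __init__(self, forge_range, targets):
        self.range = forge_range
        self.targets = set(targets)

    def actions_for(self, plugin_forge, target):
        """Actions offered to a matching plugin, limited to the attempted target."""
        if not self.range.contains(plugin_forge) or target not in self.targets:
            return {}
        return {
            "build": lambda src: f"compiled {src} for {target}",
            "load": lambda src: f"loaded {src} on {target}",
        }
```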
They are meant to provide a dynamic way for people to inspect the results of madz actions, and view the current state.
Madz is not a program, and it is not an application, at least not in the traditional sense. Madz is a library for managing data, with many high level objects cooperating to deliver powerful features.
Generally, madz will have a toplevel set of bindings which represent the entire context of the current madz execution. These are global variables (actually thread local, to allow for concurrent execution) which provide the top level objects of madz that orchestrate its features.
These toplevel bindings may be manipulated to provide different environments as required.
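One way to get thread-local toplevel bindings that can be overridden for a different environment is a per-thread stack of binding frames, sketched below in Python with `threading.local` and `contextlib.contextmanager`. The `bind` and `current` helpers and the binding names used in the usage are assumptions, not the actual list of top level objects.

```python
import threading
from contextlib import contextmanager

# Each thread sees its own stack of binding frames.
_bindings = threading.local()


def current(name):
    """Look up a top level object for the current thread, innermost frame first."""
    for frame in reversed(getattr(_bindings, "stack", [])):
        if name in frame:
            return frame[name]
    raise LookupError(name)


@contextmanager
def bind(**objects):
    """Temporarily override top level objects for the current thread only."""
    if not hasattr(_bindings, "stack"):
        _bindings.stack = []
    _bindings.stack.append(objects)
    try:
        yield
    finally:
        _bindings.stack.pop()
```

Nested `bind` blocks shadow outer ones and are undone on exit, while other threads never see them.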
The list of top level objects:
Description loading is a lazy, resumable, multistage process to find descriptions of modules.
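A minimal Python sketch of a lazy, resumable, multistage loader, assuming descriptions are `key=value` text and three illustrative stages (read, split into lines, parse). Progress is stored per module, so loading can stop after any single step and resume later, and `load` runs only the stages one module actually needs. The stage functions and `DescriptionLoader` are hypothetical.

```python
def read_text(loader, path):
    """Stage 1 (hypothetical): fetch the raw text for a module."""
    return loader.sources[path]


def split_lines(loader, text):
    """Stage 2 (hypothetical): drop blank lines."""
    return [line for line in text.splitlines() if line.strip()]


def parse_pairs(loader, lines):
    """Stage 3 (hypothetical): turn key=value lines into a dict."""
    return dict(line.split("=", 1) for line in lines)


STAGES = (read_text, split_lines, parse_pairs)


class DescriptionLoader:
    def __init__(self, sources):
        self.sources = sources   # path -> raw description text
        self.progress = {}       # path -> number of stages completed
        self.results = {}        # path -> output of the latest completed stage

    def _advance(self, path):
        done = self.progress.get(path, 0)
        previous = self.results.get(path, path)  # stage 1 starts from the path
        self.results[path] = STAGES[done](self, previous)
        self.progress[path] = done + 1

    def load(self, path):
        """Lazily run only the remaining stages for one module."""
        while self.progress.get(path, 0) < len(STAGES):
            self._advance(path)
        return self.results[path]

    def step(self):
        """Advance one unfinished module by one stage; False when all are done."""
        for path in self.sources:
            if self.progress.get(path, 0) < len(STAGES):
                self._advance(path)
                return True
        return False
```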
Software engineers love modularity, but we rarely see it in our support tools. How modular is your compiler? Your programming languages? Your build system? These are the tools used every day to build software, but to add, remove, or change features in them is ridiculously difficult.
Hence a core principle is to make every aspect modular. Modular abstraction capabilities, modular implementation features, modular optimization techniques, even a modular framework of modularity.
The core process of software engineering is the generation and implementation of abstractions. However, there are few (if any) ways to represent abstraction across multiple domains, capturing multiple, potentially unlimited, facets in arbitrary ways.
Still fewer are the ways to translate these abstractions to different equivalent domains and to project them onto arbitrary limited information domains. We use design patterns to reduce the mental overhead, but they don’t fix the root problem, and they cause information loss.
Hence a core principle is to enable powerful abstraction description and manipulation. Design patterns should be first-class objects.
This is in opposition to monolithic, opinionated, incremental, or platform specific abstraction systems.
The other side of the core process of software engineering is implementation. The implementation of abstraction(s) is a process of choosing between a myriad of competing concerns. Two implementations of the same concept are unlikely to be exactly the same. Yet there are few effective ways of managing variant implementations of the same concept at anything but the most trivial levels.
Hence a core principle is to enable multiple implementations for any abstraction. Reimplementation should be done side by side, and should be separable from the project with the original implementation.
This is in opposition to concrete implementation designs.
How many projects use more than a couple of programming languages? Or even more than one? Why is it so hard to set up a new programming language in your current project? If part A of the project requires features that language X excels at, and part B requires features that language Y excels at, why choose language Z, which is merely good for both A and B?
This applies to drivers (and the libraries they provide, e.g. DirectX vs OpenGL), operating systems, runtimes, virtual machines, and communication protocols.
Hence a core principle is to allow any platform to be used for any implementation, though this may limit which platforms the final result is usable on and reduce performance.
This is in opposition to platform-limited tool chains (which all existing tool chains are; madz will also provide “madz free” platforms, allowing it to be used for arbitrary projects).
Software projects should be considered in their entirety when optimizing. Optimizations should be possible across multiple layers of abstraction, considering all possible implementations, and manipulations based on their platforms. Engineers should worry about writing locally optimal code and directing high-level machinery to glue it together optimally, where optimal is defined by the concerns of the engineers for each portion of the software.
Hence a core principle is to optimize across all available data.
This is in opposition to limited optimizations done by many toolchains.
The identity of something should never be in flux. It should be known exactly what you are asking for, even if what you are asking for is a partial identity. Too many toolchains have ill-defined ways of managing the identity of code in the face of duplicate version numbers, code forks, and abstraction changes.
Hence a core principle is to provide capabilities for useful universal identity.
We use ontologies to represent a number of objects within our system because they provide a way for something to be of many different types, which allows us to focus only on the relevant classification information and allows other information to exist separately from what our current section of the system can understand.
One of our core insights is that data should have usable metadata, that is, data about the data. Our core goal is to build a system which, when given some data, can understand it by using the data about the data.
This comes back to bootstrapping: the pre-bootstrap system can understand specific types of data, which it can use to understand other data, and so on, arbitrarily, until the goal is reached.
By using AI techniques like planning, we can find solutions to the problems facing the system. For example, we can use a knowledge base about our current information and a planner to find the correct piece of data or action to take.
Another core insight is having the ability to parse some data, learn about the representation of that data, and project it back out. Performing this cycle is a common operation within madz.
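A small Python illustration of that cycle, assuming a `key = value` text format: parsing records both the data and how it was written (the spacing around `=`), so projecting the model back out reproduces untouched lines exactly, and new entries use a default style. The format and the functions are illustrative only.

```python
import re

# One "key = value" line; group 2 captures how the separator was written.
LINE = re.compile(r"^(\w+)(\s*=\s*)(.*)$")


def parse(text):
    """Parse into a model (the data) plus the learned representation (the style).

    Lines that do not match (e.g. comments) are ignored in this sketch.
    """
    model, style = {}, {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            key, sep, value = match.groups()
            model[key] = value
            style[key] = sep
    return model, style


def project(model, style, default_sep=" = "):
    """Project the model back to text, reusing each key's learned separator."""
    return "\n".join(f"{k}{style.get(k, default_sep)}{v}" for k, v in model.items())
```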
A library for making entity component systems. Supports various patterns for manipulating entity component systems, including madz wrapper syntax extensions. Also provides interfaces for adding optimizations to entity system construction.
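For readers unfamiliar with the pattern, here is a minimal entity component system in Python: entities are plain integer ids, components are data objects stored per component type, and queries return the entities that have every requested component. Because an entity can hold any mix of components, the same entity can be both a code module and a plugin, which is the ontological behaviour madz relies on. This is a generic sketch; `EntityStore`, `Source`, and `Plugin` are hypothetical names, and it includes none of the wrapper syntax extensions or optimization interfaces.

```python
from dataclasses import dataclass
from itertools import count


class EntityStore:
    """Minimal ECS: entities are integer ids, components are stored per type."""

    def __init__(self):
        self._ids = count(1)
        self._components = {}  # component type -> {entity id -> component}

    def create(self, *components):
        entity = next(self._ids)
        for component in components:
            self.add(entity, component)
        return entity

    def add(self, entity, component):
        self._components.setdefault(type(component), {})[entity] = component

    def get(self, entity, ctype):
        return self._components.get(ctype, {}).get(entity)

    def query(self, *ctypes):
        """Yield (entity, component, ...) for entities having every given type."""
        if not ctypes:
            return
        stores = [self._components.get(t, {}) for t in ctypes]
        for entity in sorted(set(stores[0]).intersection(*stores[1:])):
            yield (entity, *(store[entity] for store in stores))


@dataclass
class Source:   # hypothetical component: the entity has source code
    path: str


@dataclass
class Plugin:   # hypothetical component: the entity can be loaded at runtime
    reloadable: bool
```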
Uses EntityComponentSystem to provide a semantic node library: primarily translations between different semantic domains and support for tree/graph traversals of semantic nodes, among other utilities supporting end-goal features like parsing and generation. Also includes serialization/deserialization.
A library for making parsers using semantic nodes. It will support the full range of context-free grammars via Earley and SGLR based parsers, and will include generalized techniques for optimizing context-free rules (separate from string-based parsing).
A library for generating well-formatted text (and other outputs) using semantic nodes. It is based broadly on the idea of hierarchical planning rules (but limited to context-free rules), some of which can generate more nodes, while others actually generate strings.
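A minimal Python sketch of a semantic node with pre-order traversal and a serialization/deserialization round trip. JSON is used here purely as a stand-in format, and `Node`, `serialize`, and `deserialize` are assumed names; the library itself is described as being built on the entity component system, which this sketch does not model.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical semantic node: a kind, an optional value, and children."""
    kind: str
    value: object = None
    children: list = field(default_factory=list)

    def walk(self):
        """Pre-order traversal of the node tree."""
        yield self
        for child in self.children:
            yield from child.walk()

    def to_dict(self):
        return {"kind": self.kind, "value": self.value,
                "children": [c.to_dict() for c in self.children]}

    @classmethod
    def from_dict(cls, d):
        return cls(d["kind"], d["value"], [cls.from_dict(c) for c in d["children"]])


def serialize(node):
    return json.dumps(node.to_dict())


def deserialize(text):
    return Node.from_dict(json.loads(text))
```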
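To show the core of the Earley technique, here is a compact Python recognizer (it answers yes or no and builds no parse tree). It assumes a grammar given as a dict from nonterminal to tuples of symbols, treats any symbol that is not a key as a terminal, and does not handle empty (epsilon) productions, which a full Earley implementation must. SGLR is not shown.

```python
from collections import namedtuple

# An Earley item: rule lhs -> rhs, with a dot position and the origin column.
Item = namedtuple("Item", "lhs rhs dot start")


def earley_recognize(grammar, start, tokens):
    """Return True if `tokens` can be derived from the nonterminal `start`.

    grammar maps each nonterminal to a list of right-hand sides (tuples).
    Symbols that are not keys of `grammar` are terminals.
    Assumes no empty (epsilon) right-hand sides.
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add(Item(start, rhs, 0, 0))

    for i in range(len(tokens) + 1):
        agenda = list(chart[i])
        while agenda:
            item = agenda.pop()
            new_items = []
            if item.dot < len(item.rhs):
                symbol = item.rhs[item.dot]
                if symbol in grammar:
                    # Predict: start every rule for the expected nonterminal here.
                    new_items = [Item(symbol, rhs, 0, i) for rhs in grammar[symbol]]
                elif i < len(tokens) and tokens[i] == symbol:
                    # Scan: the next token matches, so the item moves to column i + 1.
                    chart[i + 1].add(item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item in the origin column waiting on lhs.
                new_items = [p._replace(dot=p.dot + 1) for p in chart[item.start]
                             if p.dot < len(p.rhs) and p.rhs[p.dot] == item.lhs]
            for new in new_items:
                if new not in chart[i]:
                    chart[i].add(new)
                    agenda.append(new)

    return any(it.lhs == start and it.dot == len(it.rhs) and it.start == 0
               for it in chart[len(tokens)])
```

Left-recursive rules such as `E -> E + T` work without rewriting, which is one reason Earley parsing suits general context-free grammars.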
A library for generating languages, i.e. both the parsers and generators and semantic node domain for a language. As a high level library, may also eventually include additional sub libraries, like configs, alternate (non-graphical) outputs, etc.
A library of basic logic routines for use with planning, namely declarative-language-like solutions, including a description of solvers.
A library providing ways to specify general planning problems (i.e. actions, objects, predicates, heuristics). And modular solvers for them.
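A minimal Python sketch of such a problem specification, assuming STRIPS-style actions (preconditions, add effects, and delete effects over sets of predicates) and a breadth-first solver passed in as a plain function, so that other solvers (for example, a heuristic A* search) could be swapped in. The action and predicate names in the usage are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str
    pre: frozenset      # predicates that must hold before the action
    add: frozenset      # predicates made true by the action
    delete: frozenset   # predicates made false by the action


def breadth_first(initial, goal, actions):
    """One interchangeable solver: shortest plan by breadth-first search."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [a.name]))
    return None  # no plan reaches the goal


def solve(problem, solver=breadth_first):
    """Solvers are modular: any function with the same signature can be used."""
    return solver(*problem)
```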
It will eventually include additions (potentially as separate libraries) such as:
A library for providing the identity of objects. It supports various patterns for manipulating identity systems, including madz wrapper syntax extensions, and can be mixed with entity components.
It focuses on providing a way of uniquely identifying objects, supporting techniques involving cryptography, unique identifiers, and human-readable information, both provided and generated.
The most common set of properties held by a module (‘module’ is often assumed to refer to a code module unless another name is used). A code module contains source code in a language suitable for compiling and integration with a binary.
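A hedged Python sketch combining the cryptographic and human-readable techniques: identifying facets are serialized canonically (JSON with sorted keys) and hashed with SHA-256, and a short human-facing form is generated from the name plus a digest prefix. The choice of SHA-256, the facet names, and the `name@digest` format are assumptions for illustration.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    human_name: str   # human-readable, not guaranteed to be unique
    digest: str       # cryptographic, derived from all identifying facets

    @property
    def short(self):
        """Generated human-facing form: the name, '@', and 8 hex digits."""
        return f"{self.human_name}@{self.digest[:8]}"


def make_identity(name, facets):
    """Derive an identity from a name plus arbitrary identifying facets."""
    # Sorting keys makes the digest independent of facet order.
    canonical = json.dumps({"name": name, **facets}, sort_keys=True)
    return Identity(name, hashlib.sha256(canonical.encode()).hexdigest())
```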
A module containing assets which aren’t code (or more specifically, not meant for inclusion with the binary). Asset modules often have their assets collected by another process and packed up separately from the binary.
A code module meant to be dynamically added at runtime, often dynamically updated as well. Note that a plugin can still be added at build time like a normal Code Module, but this may prevent some of its plugin features (like reloading) from being used correctly.
A module which provides organizational information, documentation, and other project level information for a collection of other modules.
A module which describes entry point(s) for the project.
A code module which provides a variety of code modules via its code. Sort of like a Lisp macro, but more expansive. Generally used for things like templates.
A code module which adapts one module to provide another. Generally used to adapt between different variations or versions of existing modules.
A code module which adapts some other codebase to fit within a module.
A code module which adds features via indirect inclusion (i.e. inversion of control, dependency injection, reflection).
A module which contains only a description. (i.e. not an asset or a code module).
A concept module specifically for the description of plugin modules.
A plugin module which is dynamically loaded over a network, interprocess, or similar connection (i.e. the plugin is actually running in another process or on another machine).
A bridge module which wraps a live service (like an API).
The description of the identity of the plugin. Namely for versioning, namespacing, organizing, and searching. But probably most importantly for uniqueness and security. The identity description includes signature descriptions, and all of the identity domain descriptions can be used for uniqueness. Some example components within the identity description domain:
A description of the other modules this one relies on. Different modules are relied on in different ways, and those relationships may describe how to build other modules based off of the relationship.
This is a description of code, often including the code within this module, but could describe code in other modules. Often relies on identity to provide unique locations of some types within it.
Namely this module is responsible for describing the way code modules interact, for example:
This is a description of how to build, wrap, load, and execute a code module. Relies on the code description and dependency description to do some operations (but not others). Namely contains:
A singular identity is a collection of information describing a unique entity. Some of this information may be indexed, or otherwise used to provide features, but an identity is singular.
An identity range describes a potential range of identities and is used to find a single one. Often there are resolution methods attached to the range which can pick the best match from a list of plugins.
Every artifact has a target, which is an identity range describing what it needs to be usable on. The matching identity is the target it is actually run on. The target includes information like:
Nearly every piece of code which runs in madz is tied to an action, some driving force which caused it to be executed. Reports are hence organized primarily by the action that they were responsible for. This is the entity used in reports: the action that caused the code to be run.
State Reports express the current state of the madz system, including the latest action reports in each area. For example, every time a plugin is built, its latest action report is used as the current state of that plugin.
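A small Python sketch of singular identities, an identity range, and a pluggable resolution method, assuming identities consist of a namespace, a name, and a version tuple, and that a range is a half-open version interval. The class names and the default `max` resolution (prefer the newest match) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SingularIdentity:
    """Hypothetical identity of exactly one plugin."""
    namespace: str
    name: str
    version: tuple


@dataclass(frozen=True)
class IdentityRange:
    """Hypothetical range: fixed namespace and name, half-open version interval."""
    namespace: str
    name: str
    min_version: tuple = (0,)
    max_version: Optional[tuple] = None  # exclusive; None means unbounded

    def matches(self, ident):
        return (ident.namespace == self.namespace and ident.name == self.name
                and ident.version >= self.min_version
                and (self.max_version is None or ident.version < self.max_version))

    def resolve(self, candidates, prefer=max):
        """Pick a single identity; `prefer` is the pluggable resolution method."""
        matching = [c for c in candidates if self.matches(c)]
        return prefer(matching, key=lambda c: c.version) if matching else None
```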
A subset of state reports, these are collections of the latest action reports for a specific object.
Reports have the capability to be dynamic, to be updated when information changes, to provide the most up to date state of the system. And also to provide links to objects, and their actions.
Perhaps the most commonly used part of reports are the logs they contain, which contain text, and objects, about what happened. Report logs contain tags, and other features, for organizing them.
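A minimal Python sketch of action reports organized as trees with tagged logs, plus a state report that keeps only the latest report per object and action. `Report` and `StateReport` are hypothetical names, and dynamic updating and links to objects are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical action report: one node in a tree of results."""
    action: str                                   # the action that caused the code to run
    subject: str                                  # the object the action was performed on
    logs: list = field(default_factory=list)      # (tags, text) pairs
    children: list = field(default_factory=list)  # sub-results of the action

    def log(self, text, *tags):
        self.logs.append((frozenset(tags), text))

    def child(self, action, subject):
        sub = Report(action, subject)
        self.children.append(sub)
        return sub

    def find(self, tag):
        """Collect log text carrying `tag` across the whole report tree."""
        found = [text for tags, text in self.logs if tag in tags]
        for sub in self.children:
            found.extend(sub.find(tag))
        return found


class StateReport:
    """Hypothetical state report: the latest action report per object and action."""

    def __init__(self):
        self.latest = {}  # (subject, action) -> Report

    def record(self, report):
        self.latest[(report.subject, report.action)] = report

    def for_object(self, subject):
        """An object report: the latest report of each action on one object."""
        return {a: r for (s, a), r in self.latest.items() if s == subject}
```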
Entity component systems represent the live data stores for many of the more dynamic entity types, or more precisely, for ontological (as opposed to taxonomical) object systems. These are primarily modules (and the bootstrapped modules), although reports and files are also set up as entity component systems.
These systems are designed to provide objects which are ontological in nature, and provide support structures for those objects to act like any object required.
Detect new module descriptions.