At its core, FAXE implements the architectural ideas of dataflow computing from the 1970s and 80s.
The central idea was to replace the classic von Neumann architecture with something more powerful. In a von Neumann architecture, the processor follows explicit control flow, executing instructions one after another. In a dataflow processor, by contrast, an instruction is ready to execute as soon as all its inputs (typically referred to as “tokens”) are available, rather than when the control flow gets to it. This design promised efficient parallel execution in hardware when many “tokens” were ready to execute. -- The remarkable utility of dataflow computing
Nowadays, with the need to process large amounts of (flowing) data and with the rise of machine learning frameworks, dataflow computing has regained popularity.
For example, machine learning frameworks like TensorFlow represent model training and inference as dataflow graphs, and the state transitions of actors (e.g., simulators used in reinforcement learning training) can be represented as dataflow edges, too.
Other research has extended the original dataflow graph abstraction for streaming computations. Instead of evaluating the dataflow graph once, with all inputs set at the beginning and all outputs produced at the end of evaluation, a streaming dataflow system continuously executes the dataflow in response to new inputs. This requires incremental processing and a stateful dataflow. In this setting, new inputs from a stream of input data combine with existing computation state inside the dataflow graph (e.g., an accumulator for a streaming sum). -- The remarkable utility of dataflow computing
This is exactly where FAXE picks up the dataflow computing idea.
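To make the idea of a stateful, streaming dataflow more concrete, here is a minimal Python sketch of the "accumulator for a streaming sum" mentioned in the quote above. It is an illustration only and not FAXE code: the function name `streaming_sum` and the output field `sum` are assumptions made for this example.

```python
from typing import Iterable, Iterator


def streaming_sum(points: Iterable[dict], field: str = "value") -> Iterator[dict]:
    """Emit one output per input, carrying the running total as state."""
    total = 0.0  # computation state kept inside the (hypothetical) node
    for point in points:
        total += point[field]
        # Each new input combines with the existing state to produce an output.
        yield {"ts": point["ts"], "sum": total}


inputs = [
    {"ts": 1629812164152, "value": 2.0},
    {"ts": 1629812165152, "value": 3.0},
    {"ts": 1629812166152, "value": 1.5},
]
outputs = list(streaming_sum(inputs))
```

The generator never needs the whole input up front: it produces a result for every incoming item, which is the difference between a one-shot and a streaming evaluation of the graph.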
To get started, we look at some of these concepts in Faxe.
The components that make up the computing graph are called nodes in FAXE.
There are many built-in nodes for a variety of tasks, such as:
- getting data from PLCs or Modbus devices
- reading data from and writing data to different databases
- publishing and consuming data via MQTT or other message brokers like RabbitMQ
- windowing
- statistics
- manipulating fields in data that flows through the computing graph
- ...
Besides these nodes, FAXE users can also write custom nodes in Python, which can be used in dataflows just like the built-in ones.
FAXE is implemented in Erlang/OTP, which makes it possible for each node in a flow to run in its own separate process. These processes share nothing and communicate with each other only through message passing. This allows for massive parallelism (concurrency) within a flow and also between all the flows running in a FAXE instance, matching exactly the architecture of the dataflow programming paradigm.
A FAXE instance can easily have thousands of processes running in parallel, which results in high throughput across many data streams.
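As a rough analogy for the share-nothing, message-passing model described above, the following Python sketch runs two "nodes" as threads that communicate only through queues. This is not how FAXE is implemented (FAXE uses Erlang processes, which are far more lightweight than operating system threads); the helper names `node` and `run_pipeline` are made up for this illustration.

```python
import queue
import threading

STOP = object()  # sentinel that tells a node to shut down


def node(fn, inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Run fn on every message from inbox and forward the result to outbox."""
    def loop():
        while True:
            msg = inbox.get()
            if msg is STOP:
                outbox.put(STOP)  # propagate shutdown downstream
                return
            outbox.put(fn(msg))

    worker = threading.Thread(target=loop)
    worker.start()
    return worker


def run_pipeline(points):
    """Send points through two chained nodes and collect the results."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    workers = [
        node(lambda p: {**p, "value": p["value"] * 2}, q_in, q_mid),
        node(lambda p: {**p, "value": p["value"] + 1}, q_mid, q_out),
    ]
    for p in points:
        q_in.put(p)
    q_in.put(STOP)

    results = []
    while (msg := q_out.get()) is not STOP:
        results.append(msg)
    for worker in workers:
        worker.join()
    return results


results = run_pipeline([{"ts": 1, "value": 1}, {"ts": 2, "value": 2}])
```

Each node only ever sees the messages in its own inbox and builds a new dictionary instead of modifying the one it received, so no state is shared between the two stages.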
The smallest data item in FAXE is called a data-point.
Since FAXE is mainly used for time series processing, these data-points always carry a Unix timestamp in a field called ts.
So a data-point holds data for a specific point in time, and a series of such data-points forms the unbounded stream of data we call time series data.
What does this data look like?
We can think of the aforementioned data-point as a JSON object (though internally it is not exactly JSON).
{"ts" : 1629812164152, "value" : 2.33}
The timestamp is always present, and the field holding it is always called ts.
Next to the timestamp, a data-point can have any number of other fields. A field can be of a basic data type such as int, string, or float, or it can be an object itself, possibly deeply nested: basically everything that is allowed in the JSON format.
"value2":"a string",
"value2":"a string"
In order to deal with these data structures, we can use a basic form of JSON-Path.
For example, to reference the field value2 in the second example, we would use the following path:
The path for the field value3_1 in the second example:
In the third example we have a JSON array; we can reference the field value2 like so (array indexes are 0-based):
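Separately from the examples above, the following Python sketch illustrates how a basic path with dot-separated keys and 0-based array indexes can be resolved against a nested data-point. It is not FAXE's implementation: the sample field names (`nested`, `inner`, `items`, `name`) are hypothetical, and the `[n]` index syntax is an assumption made for this sketch.

```python
import re
from typing import Any

# Hypothetical sample data-point; field names chosen for illustration only.
point = {
    "ts": 1629812164152,
    "value": 2.33,
    "nested": {"inner": "a string"},
    "items": [{"name": "first"}, {"name": "second"}],
}

# A token is either a key (no dots or brackets) or an index like [0].
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def get_path(data: Any, path: str) -> Any:
    """Follow dot-separated keys and [n] array indexes (0-based) into data."""
    current = data
    for key, index in _TOKEN.findall(path):
        current = current[int(index)] if index else current[key]
    return current
```

With this helper, `get_path(point, "nested.inner")` walks into the nested object, and `get_path(point, "items[1].name")` selects the second array element before reading its field.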
Dots in fieldnames
If you have to deal with dots in field names, there is a syntax for this: in FAXE flow scripts you can use a star (*) in place of the dot:
{
  "data": {
    "stats.speed": 22.3,
    "stats.freq": 440.0,
    "stats.cnt": 12
  }
}
The path for the field stats.freq can be reached using a star character:
Note: You should not use such a notation in your own data. Flows normally take in and put out JSON data, and dots in field names go against the rules of JSON-Path, where the '.' character is the child operator. The star would normally be the wildcard in JSON-Path, but since FAXE does not explicitly support wildcards, it is used for literal dots in field names.
It is recommended to use the star notation only when dealing with external data that already has dots in field names. You should rename such fields immediately, before any other processing in a flow.
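The renaming step recommended above can be pictured with a small Python sketch. It only illustrates the transformation itself and is not a FAXE node; the function name `rename_dotted_keys` and the choice of an underscore as replacement character are assumptions made for this example.

```python
from typing import Any


def rename_dotted_keys(data: Any, replacement: str = "_") -> Any:
    """Return a copy of data with '.' in every field name replaced."""
    if isinstance(data, dict):
        return {
            key.replace(".", replacement): rename_dotted_keys(value, replacement)
            for key, value in data.items()
        }
    if isinstance(data, list):
        return [rename_dotted_keys(item, replacement) for item in data]
    return data  # basic values (numbers, strings, ...) are left unchanged


incoming = {"data": {"stats.speed": 22.3, "stats.freq": 440.0, "stats.cnt": 12}}
cleaned = rename_dotted_keys(incoming)
```

After renaming, every field can be reached with an ordinary dot-separated path, so the star notation is no longer needed further down the flow.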
There is a second type of data item called data-batch, which is simply a list of data-points.
The list of data-points that make up a data-batch is ordered by the points' timestamps.
In JSON notation a data-batch will look like this:
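As a separate illustration of the ordering property, and not of FAXE's actual batch format, the following Python sketch models a data-batch as a plain list of data-points sorted by ts. The helper name `make_batch` is hypothetical, and ascending order (oldest point first) is an assumption of this sketch.

```python
from typing import Dict, List

DataPoint = Dict[str, object]


def make_batch(points: List[DataPoint]) -> List[DataPoint]:
    """Return the points ordered by their 'ts' field, oldest first (assumed)."""
    return sorted(points, key=lambda p: p["ts"])


batch = make_batch([
    {"ts": 1629812166152, "value": 2.1},
    {"ts": 1629812164152, "value": 2.33},
    {"ts": 1629812165152, "value": 2.4},
])
```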
Strings and References
Some programming languages let you use two interchangeable string notations, 'a string' or "also a string". In DFS, these two notations have completely different meanings.
In DFS, single quotes are used for strings
'faxe is canned beer'
or for text
FROM table
ORDER BY timestamp
Double quotes are used for references. They appear only in lambda expressions, where they retrieve the value of the specified field from the current data-point.
lambda: "data.value" > 3
Returns whether the value at data.value is greater than 3.
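For readers more familiar with general-purpose languages, here is a rough Python counterpart of that lambda expression. It is only meant to show what a reference does conceptually; the helper names `ref` and `is_above_three` are invented for this sketch and are not part of DFS.

```python
def ref(point: dict, path: str):
    """Look up a dot-separated field path, like a double-quoted reference."""
    value = point
    for key in path.split("."):
        value = value[key]
    return value


# Rough Python counterpart of: lambda: "data.value" > 3
def is_above_three(point: dict) -> bool:
    return ref(point, "data.value") > 3
```

The string passed to `ref` plays the role of the double-quoted reference: it is not compared as text, but replaced by the field's value from the current data-point before the comparison runs.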
Lambda expressions
See lambda_expressions.
REST API
FAXE can be managed via its REST API.
How nodes are connected
See node_connection.