Tag Archives: Set Theory

ADM – Pt 3 Large Sets of Objects

Application Data Manager (ADM) is an open source solution for processing large amounts of real-time data. In this segment I describe the ADM process for incrementally allocating memory to and sequencing large sets of objects.

Unconstrained Array of Objects

In the previous segment I introduced the concept of an unconstrained array of numeric aliases called a Sequence. In this segment I extend the unconstrained array concept to a set of objects of generic class M.

Unconstrained Array Declarations

A set is a number of things of the same kind that belong or are used together. In Java we can make things be of the same kind by requiring they be objects of the same class. Since we have no a priori knowledge of what that class is, we use the generic class T (type) and enclose T in angle brackets. Making this class be an extension of Sequence provides the means to add and recycle elements; moreover, implementing Iterable<T> means we can iterate over the order of appearance.


class UnconstrainedArray<M> extends Sequence implements Iterable<T> {


private T[] smallE;
private T[][] mediumE;
private T[][][] largeE;

}

Case Small – The first 256 elements

Since the sequence and the unconstrained array must be in one-to-one correspondence, every allocation of 256 elements to the sequence requires a corresponding allocation of 256 instances of T. However, all 256 instances of T are initially null.


this.smallE = (T[])new Object[256];

An append method obtains an alias (i.e. index) from the sequence and then assigns a new instance of T at that index. The allocated memory is only as large as it needs to be and no larger.

Case Medium – The next increment of 256 elements

For objects the transition from small to medium is a bit simpler than for sequences, taking only three lines of code:


this.mediumE = (T[][]) new Object[256][];
this.mediumE[0] = this.smallE; //note the continued use of smallE !!!
this.mediumE[1] = (T[]) new Object[256];

Case Large – The 256th increment of 256 elements

When the capacity of mediumE is reached, largeE is allocated with the following code


this.largeE = (T[][][]) new Object[256][256][];
this.largeE[0] = this.mediumE; //note the continued use of mediumE !!!
this.largeE[1] = (T[][]) new Object[256][];
this.largeE[1][0] = (T[]) new Object[256];

Summary

  • The unconstrained array of objects T is a set
  • The capacity of the set grows in increments of 256 elements to a maximum capacity of 16,777,216 elements.
  • Addition and recycling are managed using the underlying Sequence.
  • Each element of the underlying Sequence (an alias / index) is in one-to-one correspondence with an element of set T
  • The underlying sequence enumerates the set of T.
  • Every element of set T is accessible via its alias.
  • Iteration over the order of appearance is supported.

In the next segment I will introduce unconstrained arrays of numeric data.
Copyright © 2014 Color My Data, All Rights Reserved

Previous Continue

ADM – Pt 2 Unconstrained Arrays

Application Data Manager (ADM) is an open source solution for processing large amounts of real-time data. In this segment I describe the ADM process for memory allocation.

Unconstrained Arrays

In the segment on Sequences the length of the sequence had reached maximum capacity and we needed to append a new element. One solution would be to allocate more memory and then copy the old sequence to a subset of the new sequence. In a real time system this can have significant performance consequences and may lead to memory leaks. Another strategy would be to create a linked list. This minimizes memory allocation but can be much less efficient than direct access.
The solution chosen for ADM is an unconstrained array. As defined here, an unconstrained array combines the benefits of a linked list with direct access as it increases capacity in regular increments to meet the demands of real-time data acquisition.

Unconstrained Array Declarations:


private byte[] small;
private byte[][] medium;
private byte[][][] large;

Unconstrained Array Initial State – Case Small

A byte can enumerate up to 256 elements. Initially medium and large are both null. The array small is dimensioned 256 x 1 byte (blue) and assigned the values 0 to 255 (note: sign extension can be overridden by a policy that widens bytes unsigned to int). Next, let us illustrate what happens when we exceed the capacity of small.

Unconstrained-2d

First Increment of 256 elements – Case Medium

On adding the 257th element the 256 x 1 byte array small needs to be widened to the 256 x 2 byte array medium[0] (green). Another 256 x 2 byte array medium[1] is allocated to increase the capacity by another 256 elements and initialized with the values 256 to 511. The arrays medium[2] to medium[255] can remain null until more increments of 256 elements are needed. When medium[255] is allocated, the unconstrained array has a maximum capacity of 65536 elements.

Increment 257 of 256 elements – Case Large

Up until this time the array large has been null. On adding element 65537 the 256 arrays of 256 x 2 byte elements of medium (green) must be widened to 256 arrays of 256 x 3 byte elements (yellow) as shown in the following illustration.
Unconstrained-3d
The array large[0] allocates 256 arrays large[0][0] to large[0][255] and populates these with the widened data from medium.  The array large[1]  allocates 256 arrays large[1][0] to large[1][255] but only allocates memory to the 256 x 3 byte array large[1][0], initializing it with the 256 values 65536 to 65791.

As required capacity increases,  additional arrays of 256×3 elements are allocated to large[m][n]. When both m and n are 255, the maximum capacity of 16777216 elements has been reached. 

Disjoint Subsets

If additional capacity is required, the problem must be restructured using disjoint subsets so that each disjoint subset has less than 16777216 elements. Disjoint subsets are also recommended for capacities much lower than the 16777216 maximum.

Unconstrained Arrays of Objects.

Until now we have only looked at sequences. Unconstrained arrays can also be used for numeric row data and sets of objects. These will be the topic for future segments in this blog.

Copyright © 2014 Color My Data, All Rights Reserved

previous next

ADM – Pt 1 Sequences

Application Data Manager (ADM) is an open source solution for processing large amounts of data in real time. In this segment, I describe the ADM process for automated data recycling, Sequences.

Set Theory

ADM is rooted in set theory.

A set is a number of things of the same kind that belong or are used together.

  • The rows of a table or view are a set.
  • The columns of a table or view are a set.
  • The tables, views, reports and scripts of a database are sets.
  • The databases of the application data model (ADM) are a set.

Enumerating Members of a Set

To enumerate a set is to specify one element after another.
One way to do this is to rearrange the order of the elements of the set.
To minimize overhead we will NOT use this approach.

Instead, let us create a set S of non-negative integer aliases for each element of the generic set of many elements M and require that the elements of M and S be in one-to-one correspondence with one another as defined below.

Definitions

Enumerand:
an enumerated element of set M
Alias:
a non-negative integer member of set S in an immutable, one-to-one correspondence with an enumerand
Sequence:
a one-to-one mapping of S onto itself in some order.

To change enumerand access order we alter the sequence S and do nothing to the set M itself.

For every set M there is at least one sequence called the order of appearance.
The primary usage for the order of appearance is element recycling.

Recycling Process

The following table illustrates the recycling process.

The column step enumerates the set of recycling process actions.

The column sequence illustrates a newly allocated sequence for a set of up to eight elements. Note that the numeric aliases have been initialized with the values 0 to 7. They are show in green because they have never been used.

The column capacity is the maximum number of elements that can be accessed without additional memory allocation.

The column hi-water is the maximum number of elements ever activated.

The column unused is the difference between hi-water and capacity and represents the number of elements that have never been used and is shown in green

The column length is the number of active elements and is shown in yellow.

The column recycled is the difference between hi-water and length and is shown in red.

The column available is the number of elements available for set expansion and is the difference between capacity and length.

Sequence

Step 2: to append an element, simply increment the length and use the alias. Yellow signals that the alias is active.

Step 3: after appending five more elements; six elements are in use (yellow) and there is availability for two more elements (green).

Step 4: element 3 is no longer needed and has been marked for recycling.

Step 5: a left cyclic permutation (x<-3<-4<-5<-x) and decrement in length inactivates and recycles element 3 (red). Note that the hi-water mark does not change as elements are recycled.

Step 6: element 1 is no longer needed and has been marked for recycling.

Step 7: a left cyclic permutation (x<-1<-2<-4<-5<-3<-x) and decrement in length inactivates and recycles element 1 (red).

Step 8: a new element is appended reactivating element 3. In set M the object aliased by element 3 is removed and replaced.

Step 9: a new element is appended reactivating element 1. In set M the object aliased by element 1 is removed and replaced.

Step 10: two new elements are appended and the sequence capacity is reached.

In my next blog I will describe the unconstrained arrays that allow sequences to grow incrementally up to a maximum capacity of 16,777,216 elements.

Copyright © 2014 Color My Data, All Rights Reserved