Michael Primeaux

Parallel and Distributed Systems


Information Modeling

When defining an information model, should one favor an abstract or concrete design? The short answer is neither.

Generally speaking, the life of any distributed system directly relates to its level of entropy, and that entropy in turn directly relates to the system’s computational complexity. When designing applications to solve business problems, information storage and retrieval are among the most important foundational design points. If the information schema is designed without efficiency, scalability, and flexibility in mind, then not only will the system perform poorly, it will also fail to meet the requirements imposed by future business demands.

At the ends of the spectrum, an information schema is categorized as either concrete or abstract, though there are certainly varying degrees in between. Each design has advantages and disadvantages. A concrete design is easier to conceptualize and requires less complex algorithms. In most cases, however, a concrete design is far less flexible than an abstract one. A properly designed abstract information schema remains invariant in the face of change that would otherwise impose a relatively large amount of downtime on an equivalent concrete model.
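To make the contrast tangible, here is a minimal Python sketch. The names `ConcretePerson` and `AbstractEntity`, and the attributes used, are hypothetical and chosen only for illustration; a real abstract schema would carry far more structure than an open dictionary.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class ConcretePerson:
    """Concrete model: every attribute is a fixed, named field."""
    name: str
    height_cm: float
    weight_kg: float


@dataclass
class AbstractEntity:
    """Abstract model: an entity is a kind plus an open set of attributes."""
    kind: str
    attributes: Dict[str, Any] = field(default_factory=dict)

    def set(self, name: str, value: Any) -> None:
        self.attributes[name] = value

    def get(self, name: str, default: Any = None) -> Any:
        return self.attributes.get(name, default)


# Concrete: easy to read, but a new attribute means changing the class
# (and, by extension, whatever storage mirrors it).
concrete = ConcretePerson(name="Ada", height_cm=170.0, weight_kg=60.0)

# Abstract: the definition stays the same when requirements change.
abstract = AbstractEntity(kind="person")
abstract.set("name", "Ada")
abstract.set("height_cm", 170.0)
abstract.set("eye_color", "brown")  # new requirement, no definition change
```

The trade-off described above shows up directly: the concrete class is simpler to reason about, while the abstract one absorbs change at the cost of pushing validation and meaning into code that interprets the attributes.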

Arguably, however, many abstract systems are indeed designed with concreteness in mind. By this I mean the system is designed to contain representations of concrete constructs. Why? It’s how humans think in everyday life. If I ask you to describe yourself, you’re statistically more likely to forgo attributes such as “person” or “human” and instead provide attributes such as height and weight. The former attributes are more abstract (relatively speaking) than the latter. However, both describe a concrete construct: you.

I tend to favor abstract information models but then again, I spend the majority of my time in abstract problem domains. However, regardless of which classification the solution domain requires, the following rules and considerations have proven effective in designing an efficient information model:

  1. Effectively understand the information.
  2. Describe it unambiguously.
  3. Enforce structure and style guidelines.
  4. Allow for efficient storage and retrieval of information.
  5. Keep network communication to a minimum. Don’t over-engineer.

The last item (5) comes with a caveat. Judging what counts as over-engineering is highly subjective, takes time to get right, and depends directly on how completely you understand the information (item 1). If item 1 suffers, so does item 5.
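As one possible illustration of rules 2 and 3 in the list above, the following Python sketch describes attributes explicitly and rejects records that do not conform. `AttributeDef`, `Schema`, and `person_schema` are hypothetical names invented for this example, not part of any particular library.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class AttributeDef:
    """Unambiguous description of one attribute: name, type, and unit."""
    name: str
    value_type: type
    unit: str = ""


class Schema:
    """Enforces structure: only defined attributes of the right type pass."""

    def __init__(self, *defs: AttributeDef) -> None:
        self._defs: Dict[str, AttributeDef] = {d.name: d for d in defs}

    def validate(self, record: Dict[str, Any]) -> None:
        for name, value in record.items():
            definition = self._defs.get(name)
            if definition is None:
                raise KeyError(f"undefined attribute: {name}")
            if not isinstance(value, definition.value_type):
                raise TypeError(
                    f"{name} must be {definition.value_type.__name__}, "
                    f"got {type(value).__name__}"
                )


# Assumed example schema; the attribute names and units are illustrative.
person_schema = Schema(
    AttributeDef("name", str),
    AttributeDef("height_cm", float, unit="cm"),
)

person_schema.validate({"name": "Ada", "height_cm": 170.0})  # passes silently
```

Keeping the definitions this small is deliberate: the check covers names and types only, which is often enough to catch ambiguity early without sliding into over-engineering.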

Usually, the questions are quick. It’s the answers that take the time.