Most XML tutorials explicitly mention that empty elements are allowed in XML, yet surprisingly few of them tell you what they are for. At first glance, empty elements seem too trivial a concept to discuss. But now that many vendors support XML-based integration with their products, you may be in for a surprise or two.
Empty elements
As most of you will be aware, it is syntactically valid to produce XML elements that have no content, such as:
<value></value>
Or the (equivalent) shorthand notation:
<value/>
XML parsers will treat these fragments the same way, and the difference is considered merely syntactical.
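To see this equivalence in practice, here is a minimal sketch using Python's standard xml.etree.ElementTree module (the choice of parser is an assumption for illustration; other parsers should behave alike). It parses both notations and compares what the application gets to see:

```python
import xml.etree.ElementTree as ET

# Parse the long form and the shorthand form of the same empty element.
long_form = ET.fromstring("<value></value>")
short_form = ET.fromstring("<value/>")

# Both yield an element without text and without children.
for elem in (long_form, short_form):
    print(elem.tag, repr(elem.text), len(elem))

# The parsed tree does not record which notation the input used, so both
# elements serialize to the same output.
print(ET.tostring(long_form, encoding="unicode"))
print(ET.tostring(short_form, encoding="unicode"))
```

Once parsed, the original notation is gone, which is one more reason not to attach any meaning to it.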
However, the XML specification states that, for interoperability, the shorthand notation SHOULD be used only for elements that are declared empty (i.e. elements that must be empty, as opposed to those that can be empty). It is safe to assume that this subtlety is lost on the best of us, so you should not draw conclusions from either notation.
You may wonder what empty elements are good for – especially the ones that must be empty. Why not just omit them?
One of the reasons is that empty elements are often used as ‘markers’ or ‘Booleans’ that need to be present for effect, but have no content. For example, in XHTML mark-up, the <br/> element inserts a line break, and the <hr/> element displays a horizontal rule (on visual display units anyway). For these applications, absence will obviously not have the same effect as emptiness.
More surprisingly though, empty is not always empty. When the XML is governed by a schema declaring that the element has default or fixed content, a schema-aware parser will actually supply that content when the element is empty.
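The effect can be mimicked by hand. The sketch below does not perform schema validation (ElementTree knows nothing about schemas); the schema_default parameter is a hypothetical stand-in for a default declared in the schema, and xsi:nil is ignored for brevity:

```python
import xml.etree.ElementTree as ET

def effective_content(elem, schema_default=None):
    """Return the content a schema-aware consumer would see for elem.

    schema_default is a hypothetical stand-in for a schema-declared default.
    """
    # Empty means: no child elements and no character data at all.
    # Whitespace-only content does not count as empty here.
    is_empty = len(elem) == 0 and not elem.text
    if is_empty and schema_default is not None:
        return schema_default
    return elem.text or ""

empty = ET.fromstring("<value/>")
filled = ET.fromstring("<value>7</value>")

print(effective_content(empty, schema_default="42"))   # the default takes over
print(effective_content(filled, schema_default="42"))  # explicit content wins
print(repr(effective_content(empty)))                  # no default: really empty
```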
For the purpose of data exchange, you will seldom encounter marker elements or default content in schema (the idea of implicit values is somewhat alarming anyway). So usually, empty elements will be exactly that: empty.
Unfortunately, not all content models support empty elements. For an element that is defined as a string, empty content is just fine. But “empty numbers” or “empty dates” are rejected by a validating parser, because an empty string is not a valid value for these types.
So it gets better (or worse, depending on your point of view). Enter nillable elements.
Nillable elements
Sometimes, empty content is the data. If we need to indicate that a value should be (re)set to some undefined value (like the ‘null’ in databases or many programming languages) we require a way to convey that the value is “explicitly empty” rather than unspecified (for whatever reasons).
Therefore, XML schema allows us to define “nillable” elements, which can contain nil. Nil elements are easily spotted in XML instances, as they contain an attribute indicating nil content, like so (namespace declaration has been omitted):
<value xsi:nil="true"></value>
or
<value xsi:nil="true"/>
Note that “nillability” short-circuits all limitations on the content model, i.e. it applies to all types (not just strings), simple or complex, even if the content model explicitly forbids empty content.
Also note that nil elements must be empty (as opposed to empty elements, which we now know could be anything, including nil). In other words, nil is empty, but empty is not always nil.
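On the consuming side, telling nil from merely empty comes down to inspecting that attribute. A minimal sketch with Python's xml.etree.ElementTree follows; the record, a, b and c element names and the is_nil helper are made up for illustration. It only reads the attribute and does not check whether the schema declares the element nillable:

```python
import xml.etree.ElementTree as ET

XSI_NS = "http://www.w3.org/2001/XMLSchema-instance"
# ElementTree addresses namespaced attributes as "{namespace-uri}localname".
NIL_ATTR = f"{{{XSI_NS}}}nil"

def is_nil(elem):
    """True if the element carries xsi:nil with a true value ("true" or "1")."""
    return elem.get(NIL_ATTR, "false").strip() in ("true", "1")

doc = ET.fromstring(
    f'<record xmlns:xsi="{XSI_NS}">'
    '<a xsi:nil="true"/>'
    '<b/>'
    '<c>42</c>'
    '</record>'
)

for child in doc:
    print(child.tag, "nil" if is_nil(child) else "not nil")
```

Here a is nil and empty, b is empty but not nil, and c is neither.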
Exercise for the reader: are the following valid?
<value xsi:nil="false">42</value>
and
<value xsi:nil="false"/>
Reality check
Now that you know what empty (including nil) content can be used for, you may think it is quite rare in application integration. After all, how often do we actually need to exchange the “empty value”?
In fact, empty content is rather common. And it may cause you a lot of trouble if you are not prepared for it. Here are a few reasons for the (often unexpected) occurrence of empty elements, in addition to the legitimate ones.
Design consequence
If the schema declares an element as a mandatory string (or nillable) type, but there is no value available in the integration layer, the only way to escape a run-time validation error is to create empty content. Those familiar with TIBCO Business Works (should) know that this is the default behaviour when you map a non-existing optional element to a required one. In this case it silently inserts an empty element.
When this happens a lot, it may be advisable to review the design, e.g. consider making these elements optional. If you have no control over this, you should be prepared to deal with empty content in the proper way (which depends on the application).
Implementation side-effects or carelessness
Even if elements are declared optional, empty content is often created “by accident”, in particular when not creating it takes additional effort. For example, consider an operation (a function, a subroutine, a method) that is designed to return a string value. If the return value is not explicitly checked, an empty element might be inserted whenever an empty string is returned.
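The “additional effort” is usually a single check. As a minimal sketch in Python's xml.etree.ElementTree (the add_optional helper and the order, reference, comment and coupon names are hypothetical), an optional element is only created when there is something to put in it:

```python
import xml.etree.ElementTree as ET

def add_optional(parent, tag, value):
    """Append <tag>value</tag> to parent only if value is a non-empty string."""
    if value is None or value == "":
        return None  # nothing to say, so create nothing
    child = ET.SubElement(parent, tag)
    child.text = value
    return child

order = ET.Element("order")
add_optional(order, "reference", "A-17")
add_optional(order, "comment", "")    # e.g. an operation that returned an empty string
add_optional(order, "coupon", None)   # e.g. no value available at all

print(ET.tostring(order, encoding="unicode"))
# prints <order><reference>A-17</reference></order>
```

Whether whitespace-only values should be skipped as well depends on the application; this sketch leaves them alone.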
Also, the tooling that is used to create the XML may add empty elements when optional content is “mapped” from one source to the other. Again, TIBCO Business Works has caught me off guard with an unfortunate mapping mode on more than one occasion.
Misconception
Some producers of XML create empty elements because they think they are supposed to, or because they believe that providing optional empty elements is in some way superior to omitting them. They might do this as “living proof” that the elements were considered in the process of creation (and not accidentally overlooked). Or they believe that by supplying empty elements they are doing you a favour; now you can easily spot them. The point is: there is not always a point.
What can we do?
It is my experience that due to implementation side-effects and misconception, empty elements are abundant in real-life XML-enabled applications. Those who create the XML are often unprepared to “fix” this because they are not violating any rules; after all, empty elements are allowed.
This is unfortunate, as it places an additional burden on the consumer. Most of the time, these elements have to be ignored – and as a result applications need to check not only for presence but also for non-emptiness (or non-nilness if that is even a word).
This is easily overlooked by developers and may cause you no end of problems. You will appreciate this when the unsuccessful conversion of an empty string to a number type causes your process to fault, or when the empty value, unknowingly propagated by you, wreaks havoc in an application downstream.
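A defensive consumer can wrap these checks in one place. The following is a minimal sketch with Python's xml.etree.ElementTree; the read_int helper and the item, quantity, weight and height names are hypothetical, and treating absent, nil and empty alike is an assumption that suits the “ignore it” case only:

```python
import xml.etree.ElementTree as ET

XSI_NIL = "{http://www.w3.org/2001/XMLSchema-instance}nil"

def read_int(parent, tag):
    """Return the integer in parent/tag, or None if it is absent, nil or empty."""
    elem = parent.find(tag)
    if elem is None:                      # element is absent
        return None
    if elem.get(XSI_NIL, "false").strip() in ("true", "1"):
        return None                       # element is explicitly nil
    text = (elem.text or "").strip()
    if not text:                          # element is present but empty
        return None
    return int(text)                      # still raises ValueError on garbage

doc = ET.fromstring(
    '<item xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    '<quantity>3</quantity><weight/><height xsi:nil="true"/>'
    '</item>'
)

for tag in ("quantity", "weight", "height", "depth"):
    print(tag, read_int(doc, tag))
```

Without the emptiness check, the weight element would end up in int(""), which raises a ValueError. If nil is meant to convey “reset this value”, the helper would need to report it separately instead of folding it into None.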
As always, be liberal in what you accept and conservative in what you produce. Use schema to validate output as well as input (a frightening number of applications fail to do this). Do not create empty content just because you can.
And educate those that do.