Revision 3, © 2010-2012 Kevin Seim
Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.
BeanIO is an open source Java framework for reading and writing Java beans or plain old Java objects (POJOs) from a flat file or stream. BeanIO is ideally suited for batch processing, and currently supports XML, CSV, delimited and fixed length file formats. BeanIO is licensed under the Apache 2.0 License.
BeanIO Release 1.2 includes the following enhancements and bug fixes:
Release 1.2 is functionally backwards compatible with 1.1. The only feature that may require mapping file changes is the added support for escaping property values using a backslash (e.g. \n is now recognized as a line feed). If necessary, you can disable this feature by setting the configuration setting org.beanio.propertyEscapingEnabled to false. For more information, see Section 7.0. Configuration.
Programmatically, significant changes were made to the org.beanio.config.ConfigurationLoader interface and its implementation. All other major interfaces are backwards compatible with release 1.1.
To get started with BeanIO, download the latest stable version from Google Code, extract the contents of the ZIP file, and add beanio.jar to your application's classpath.
BeanIO requires a version 1.5 JDK or higher. In order to process XML formatted streams, BeanIO also requires an XML parser based on the Streaming API for XML (StAX), as specified by JSR 173. JDK 1.6 and higher includes a StAX implementation and therefore does not require any additional libraries. JDK 1.5 users will need to include the following:
Alternatively, Maven users can declare the following dependencies in their application's POM. Note that the version numbers documented below are only examples and may have changed.
<!-- BeanIO dependency -->
<dependency>
  <groupId>org.beanio</groupId>
  <artifactId>beanio</artifactId>
  <version>1.1.0</version>
</dependency>

<!-- StAX dependencies for JDK 1.5 users -->
<dependency>
  <groupId>javax.xml</groupId>
  <artifactId>jsr173</artifactId>
  <version>1.0</version>
</dependency>
<dependency>
  <groupId>com.sun.xml.stream</groupId>
  <artifactId>sjsxp</artifactId>
  <version>1.0.1</version>
</dependency>
This section explores a simple example that uses BeanIO to read and write a flat file containing employee data. Let's suppose the file is in CSV format and has the following record layout:
Position | Field | Format |
---|---|---|
0 | First Name | Text |
1 | Last Name | Text |
2 | Job Title | Text |
3 | Salary | Number |
4 | Hire Date | Date (MMDDYYYY) |
A sample file is shown below.
Joe,Smith,Developer,75000,10012009
Jane,Doe,Architect,80000,01152008
Jon,Anderson,Manager,85000,03182007
Next, let's suppose we want to read records into the following Java bean for further processing. Remember that a Java bean must have a default no-argument constructor and public getters and setters for all exposed properties.
package org.beanio.example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
// getters and setters not shown...
}
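For completeness, the accessors omitted above might be filled in as follows. This is only a sketch: the accessor names follow the standard Java bean convention, the fields are made private here as a stylistic assumption, and the class is declared without a package or public modifier purely so the snippet compiles as a single file (the guide's version is public and lives in org.beanio.example).

```java
import java.util.Date;

// Sketch of the complete Employee bean; see the assumptions noted above.
class Employee {
    private String firstName;
    private String lastName;
    private String title;
    private int salary;
    private Date hireDate;

    // Default no-argument constructor required of a Java bean
    public Employee() { }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public int getSalary() { return salary; }
    public void setSalary(int salary) { this.salary = salary; }

    public Date getHireDate() { return hireDate; }
    public void setHireDate(Date hireDate) { this.hireDate = hireDate; }
}
```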
BeanIO uses an XML configuration file, called a mapping file, to define how bean objects are bound to records. Below is a mapping file, named mapping.xml, that could be used to read the sample employee file and unmarshall records into Employee objects. The same mapping file can be used to write, or marshall, Employee objects to a file or output stream.
<beanio xmlns="http://www.beanio.org/2011/01"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.beanio.org/2011/01 http://www.beanio.org/2011/01/mapping.xsd">

  <stream name="employeeFile" format="csv">
    <record name="employee" class="org.beanio.example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
  </stream>
</beanio>
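The format attribute on the hireDate field is a date pattern. Assuming it follows java.text.SimpleDateFormat conventions (MM for month, dd for day of month, yyyy for year), the conversion between the file text and a java.util.Date can be sketched with the JDK alone. The class and method names below are hypothetical and are not part of BeanIO.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper illustrating the "MMddyyyy" pattern used in the mapping file.
final class HireDatePatternSketch {
    private static final String PATTERN = "MMddyyyy";

    // Converts field text such as "10012009" into a Date.
    static Date parse(String text) {
        SimpleDateFormat format = new SimpleDateFormat(PATTERN);
        format.setLenient(false); // reject impossible dates such as month 13
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a " + PATTERN + " date: " + text, e);
        }
    }

    // Converts a Date back into field text using the same pattern.
    static String format(Date date) {
        return new SimpleDateFormat(PATTERN).format(date);
    }
}
```

A new SimpleDateFormat is created on each call because instances are not thread safe.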
To read the employee CSV file, a StreamFactory is used to load our mapping file and create a BeanReader, which is used to unmarshall Employee objects from the file employee.csv. (For the sake of brevity, proper exception handling is not shown.)
package org.beanio.example;

import org.beanio.*;
import java.io.*;

public class BeanReaderExample {
    public static void main(String[] args) throws Exception {
        // create a StreamFactory
        StreamFactory factory = StreamFactory.newInstance();
        // load the mapping file
        factory.load("mapping.xml");

        // use a StreamFactory to create a BeanReader
        BeanReader in = factory.createReader("employeeFile", new File("employee.csv"));
        Employee employee;
        while ((employee = (Employee) in.read()) != null) {
            // process the employee...
        }
        in.close();
    }
}
To write an employee CSV file, the same StreamFactory class is used to create a BeanWriter for marshalling Employee bean objects to the file employee.csv. In this example, the same mapping configuration file is used for both reading and writing an employee file.
package org.beanio.example;

import org.beanio.*;
import java.io.*;
import java.util.*;

public class BeanWriterExample {
    public static void main(String[] args) throws Exception {
        // create a StreamFactory
        StreamFactory factory = StreamFactory.newInstance();
        // load the mapping file
        factory.load("mapping.xml");

        Employee employee = new Employee();
        employee.setFirstName("Jennifer");
        employee.setLastName("Jones");
        employee.setTitle("Marketing");
        employee.setSalary(60000);
        employee.setHireDate(new Date());

        // use a StreamFactory to create a BeanWriter
        BeanWriter out = factory.createWriter("employeeFile", new File("employee.csv"));
        // write an Employee object directly to the BeanWriter
        out.write(employee);
        out.flush();
        out.close();
    }
}
Running BeanWriterExample produces the following CSV file.
Jennifer,Jones,Marketing,60000,01012011
The BeanReader interface, shown below, is used to read bean objects from an input stream. The method read() returns the bean object for the next record read from the input stream, or null when the end of the stream is reached. The method getRecordName() returns the name of the last record read from the input stream, as named in the mapping file. And getLineNumber() returns the line number of the last record from the input stream. The method setErrorHandler(...) can be used to register a custom error handler. If an error handler is not configured, read() simply throws the unhandled exception.
package org.beanio;

public interface BeanReader {
    public Object read() throws BeanReaderException;
    public String getRecordName();
    public int getLineNumber();
    public int skip(int count) throws BeanReaderException;
    public void close() throws BeanReaderIOException;
    public void setErrorHandler(BeanReaderErrorHandler errorHandler);
}
The BeanWriter interface, shown below, is used to write bean objects to an output stream. Calling the write(Object) method marshalls a bean object to the output stream. In some cases where multiple record types are not discernible by class type or record identifying fields, the write(String,Object) method can be used to explicitly name the record type to marshall.
package org.beanio;

public interface BeanWriter {
    public void write(Object bean) throws BeanWriterException;
    public void write(String recordName, Object bean) throws BeanWriterException;
    public void flush() throws BeanWriterIOException;
    public void close() throws BeanWriterIOException;
}
BeanIO uses XML configuration files, called mapping files, to bind a stream layout to Java objects. Multiple layouts can be configured in a single mapping file using stream elements, and each stream is assigned a unique name for referencing the layout. In addition to its name, every stream must declare its format using the format attribute. Supported stream formats include xml, csv, delimited, and fixedlength. Mapping files are further explored in the next section (4.0. The Mapping File).
<beanio xmlns="http://www.beanio.org/2011/01"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.beanio.org/2011/01 http://www.beanio.org/2011/01/mapping.xsd">

  <stream name="stream1" format="csv"... >
    <!-- record layout... -->
  </stream>

  <stream name="stream2" format="fixedlength"... >
    <!-- record layout... -->
  </stream>
</beanio>
A StreamFactory is used to load mapping files and create BeanReader and BeanWriter instances. The following code snippet shows how to instantiate a StreamFactory, load a mapping file and create a BeanReader and BeanWriter. The load(...) method loads mapping files from the file system (relative to the current working directory), while the method loadResource(...) loads mapping files from the classpath.
// create a StreamFactory
StreamFactory factory = StreamFactory.newInstance();
// load 'mapping-1.xml' from the current working directory
factory.load("mapping-1.xml");
// load 'mapping-2.xml' from the classpath
factory.loadResource("mapping-2.xml");

// create a BeanReader to read from 'in.txt'
Reader in = new BufferedReader(new FileReader("in.txt"));
BeanReader beanReader = factory.createReader("streamName", in);

// create a BeanWriter to write to 'out.txt'
Writer out = new BufferedWriter(new FileWriter("out.txt"));
BeanWriter beanWriter = factory.createWriter("streamName", out);
All BeanIO exceptions extend from BeanIOException, which extends from RuntimeException so that exceptions do not need to be explicitly caught unless desired. BeanReaderException and BeanWriterException extend from BeanIOException and may be thrown by a BeanReader or BeanWriter respectively.
A BeanReaderException is further broken down into the following subclasses thrown by the read() method.
Exception | Description |
---|---|
BeanReaderIOException | Thrown when the underlying input stream throws an IOException, or in a few other fatal error scenarios. |
MalformedRecordException | Thrown when the underlying input stream is malformed based on the configured stream format, and therefore the next record could not be accurately read from the stream. In many cases, further reads from the input stream will be unsuccessful. |
UnidentifiedRecordException | Thrown when a record does not match any record definition configured in the mapping file. If the stream layout does not strictly enforce record sequencing, further reads from the input stream are likely to be successful. |
UnexpectedRecordException | Thrown when a record is read out of order. Once record sequencing is violated, further reads from the input stream are likely to be unsuccessful. |
InvalidRecordException | Thrown when a record is matched, but the record is invalid for one of the following reasons: |
When a BeanReaderException is thrown, the current state of the input stream and any field or record level error messages can be accessed by calling exception.getContext(), which returns the following BeanReaderContext interface. Please refer to the API javadocs for method details.
package org.beanio; public interface BeanReaderContext { public int getRecordLineNumber(); public String getRecordText(); public String getRecordName(); public boolean hasRecordErrors(); public Collection<String> getRecordErrors(); public String getFieldText(String fieldName); public String getFieldText(String fieldName, int index); public boolean hasFieldErrors(); public Map<String, Collection<String>> getFieldErrors(); public Collection<String> getFieldErrors(String fieldName); }
If you need to handle an exception and continue processing, it may be simpler to register a BeanReaderErrorHandler using the beanReader.setErrorHandler() method. The BeanReaderErrorHandler interface is shown below. Any exception thrown by the error handler will be rethrown by the BeanReader.
package org.beanio; public interface BeanReaderErrorHandler { public void handleError(BeanReaderException ex) throws Exception; }
The following example shows how invalid records could be written to a reject file by registering an error handler that extends BeanReaderErrorHandlerSupport, a base class that implements BeanReaderErrorHandler. All other exceptions are left uncaught and will bubble up to the calling method.
BeanReader input;
BufferedWriter rejects;
try {
    input.setErrorHandler(new BeanReaderErrorHandlerSupport() {
        public void invalidRecord(InvalidRecordException ex) throws Exception {
            rejects.write(ex.getContext().getRecordText());
            rejects.newLine();
        }
    });

    Object record = null;
    while ((record = input.read()) != null) {
        // process a valid record
    }

    rejects.flush();
}
finally {
    input.close();
    rejects.close();
}
An XML mapping file is used to configure a stream layout.
A typical mapping file contains one or more stream definitions. A stream definition must have a name and format attribute configured. The name of the stream is used to reference the layout when creating a BeanReader or BeanWriter instance using a StreamFactory. And the format instructs BeanIO how to interpret the stream. Supported formats include xml, csv, delimited and fixedlength.
<beanio xmlns="http://www.beanio.org/2011/01"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.beanio.org/2011/01 http://www.beanio.org/2011/01/mapping.xsd">

  <stream name="stream1" format="csv"... >
    <!-- record layout... -->
  </stream>

  <stream name="stream2" format="fixedlength"... >
    <!-- record layout... -->
  </stream>
</beanio>
Internally, BeanIO uses a RecordReader to read records from an input stream, and a RecordWriter to write records to an output stream.
The RecordReader interface is shown below. A record reader is responsible for dividing a stream into records. The actual Java representation of a record is dependent on the format of the stream. Delimited record readers (including CSV) parse an input stream into String array records, where each value in the array is a delimited field. And fixed length record readers simply parse an input stream into String records.
package org.beanio.stream;

public interface RecordReader {
    public Object read() throws IOException, RecordIOException;
    public void close() throws IOException;
    public int getRecordLineNumber();
    public String getRecordText();
}
Similarly, the RecordWriter interface shown below is used to write records to an output stream. Once again, the Java representation of a record is dependent on the format of the stream. Delimited (and CSV) records use a String array, and fixed length records simply use a String.
package org.beanio.stream;

public interface RecordWriter {
    public void write(Object record) throws IOException;
    public void flush() throws IOException;
    public void close() throws IOException;
}
A new RecordReader is created for each BeanReader using the RecordReaderFactory interface shown below.
package org.beanio.stream;

public interface RecordReaderFactory {
    public RecordReader createReader(Reader in) throws IllegalArgumentException;
}
And likewise, a new RecordWriter is created for each BeanWriter using the RecordWriterFactory interface shown below.
package org.beanio.stream;

public interface RecordWriterFactory {
    public RecordWriter createWriter(Writer out) throws IllegalArgumentException;
}
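To make the reader-side contracts concrete, here is a minimal sketch of a record reader and its factory for a hypothetical pipe-delimited format. So that the snippet compiles without BeanIO on the classpath, it declares local stand-ins for the RecordReader and RecordReaderFactory interfaces listed above (with RecordIOException left out); a real implementation would implement the org.beanio.stream interfaces instead. All class names beginning with Pipe are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Local stand-ins for the org.beanio.stream interfaces (assumption: declared
// here only so the sketch is self-contained; RecordIOException is omitted).
interface RecordReader {
    Object read() throws IOException;
    void close() throws IOException;
    int getRecordLineNumber();
    String getRecordText();
}

interface RecordReaderFactory {
    RecordReader createReader(Reader in) throws IllegalArgumentException;
}

// Hypothetical reader: one record per line, fields separated by '|'.
final class PipeRecordReader implements RecordReader {
    private final BufferedReader in;
    private int lineNumber = 0;
    private String recordText;

    PipeRecordReader(Reader in) {
        this.in = (in instanceof BufferedReader) ? (BufferedReader) in : new BufferedReader(in);
    }

    // Returns the next record as a String array, or null at the end of the stream.
    public Object read() throws IOException {
        recordText = in.readLine();
        if (recordText == null) {
            return null;
        }
        lineNumber++;
        // a limit of -1 keeps trailing empty fields
        return recordText.split("\\|", -1);
    }

    public void close() throws IOException { in.close(); }
    public int getRecordLineNumber() { return lineNumber; }
    public String getRecordText() { return recordText; }
}

// A new reader is created for each input stream.
final class PipeRecordReaderFactory implements RecordReaderFactory {
    public RecordReader createReader(Reader in) {
        if (in == null) {
            throw new IllegalArgumentException("reader is required");
        }
        return new PipeRecordReader(in);
    }
}

// Small driver that reads every record from a string.
final class PipeRecordReaderDemo {
    static List<String[]> readAll(String text) {
        RecordReader reader = new PipeRecordReaderFactory().createReader(new StringReader(text));
        List<String[]> records = new ArrayList<String[]>();
        try {
            Object record;
            while ((record = reader.read()) != null) {
                records.add((String[]) record);
            }
            reader.close();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return records;
    }
}
```

As with the delimited readers described above, each record is represented as a String array with one element per field.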
BeanIO includes default record readers and writers for XML, CSV, delimited and fixed length stream formats. Default reader and writer settings can be overridden for any stream in the mapping file using a reader or writer element. Or if necessary, you can even replace the default record reader or writer by setting the class attribute to the fully qualified class name of the record reader or writer factory to use. (Note that custom record reader/writer implementations will not be supported for XML formatted streams due to the tight coupling with the parser, although it is not prevented.)
In the example mapping file below, the default record reader's delimiter is changed to an asterisk, while the record writer implementation is completely replaced using the factory class org.beanio.example.MyRecordWriterFactory.
<beanio>
  <stream name="employeeFile" format="delimited">
    <reader>
      <property name="delimiter" value="*" />
    </reader>
    <writer class="org.beanio.example.MyRecordWriterFactory" />
    <record name="employee" class="org.beanio.example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
  </stream>
</beanio>
The default record reader and writer for CSV formatted streams are based on RFC 4180 with one exception: multi-line records are disabled (but this can be overridden).
The following properties can be used to customize the default CSV record reader and record writer:
Property | Type | Description | Reader? | Writer? |
---|---|---|---|---|
delimiter | char | The field delimiter. Defaults to a comma. | Yes | Yes |
quote | char | The quotation mark character used to wrap fields containing a delimiter character, a quotation mark, or new lines. Defaults to the double quotation mark, ". | Yes | Yes |
escape | Character | The character used to escape a quotation mark in a quoted field. Defaults to the double quotation mark, ". | Yes | Yes |
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input stream begins with any of the configured values, the line is ignored. A backslash may be used to escape a comma and itself. All whitespace is preserved. Enabling comments requires the input reader passed to StreamFactory to support marking. Among others, BufferedReader and StringReader support marking. | Yes | No |
multilineEnabled | boolean | If set to true, quoted fields may contain new line characters. Defaults to false. | Yes | No |
whitespaceAllowed | boolean | If set to true, whitespace is ignored and allowed before and after quoted values. For example, the record Jennifer, "Jones" ,24 is allowed. Defaults to false. | Yes | No |
unquotedQuotesAllowed | boolean | If set to true, field text containing quotation marks does not need to be quoted unless the field text starts with a quotation mark. For example, the record Jennifer,She said "OK" is allowed. Defaults to false. | Yes | No |
recordTerminator | String | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | No | Yes |
alwaysQuote | boolean | If set to true, field text is always quoted. By default, a field is only quoted if it contains a delimiter, a quotation mark or new line characters. | No | Yes |
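The writer-side quoting rules in the table above can be sketched in a few lines of plain Java. This is an illustration of the rules as described (quote only when needed, escape embedded quotation marks with the escape character), not BeanIO's actual implementation; the class and method names are hypothetical.

```java
// Hypothetical helper illustrating the default CSV writer quoting rules.
final class CsvQuotingSketch {

    // Wraps the field in quotes when alwaysQuote is set, or when the field contains
    // the delimiter, the quote character, or a new line character.
    static String quoteIfNeeded(String field, char delimiter, char quote,
                                char escape, boolean alwaysQuote) {
        boolean needsQuotes = alwaysQuote
                || field.indexOf(delimiter) >= 0
                || field.indexOf(quote) >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        StringBuilder out = new StringBuilder(field.length() + 2);
        out.append(quote);
        for (int i = 0; i < field.length(); i++) {
            char c = field.charAt(i);
            if (c == quote) {
                out.append(escape); // escape each embedded quotation mark
            }
            out.append(c);
        }
        out.append(quote);
        return out.toString();
    }

    // Formats a record using the documented defaults: comma delimiter,
    // double quotation mark for both quote and escape, alwaysQuote disabled.
    static String formatRecord(String[] fields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                line.append(',');
            }
            line.append(quoteIfNeeded(fields[i], ',', '"', '"', false));
        }
        return line.toString();
    }
}
```

With the defaults, the fields Jennifer and She said "OK" are written as Jennifer,"She said ""OK""".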
The default delimited record reader and writer can be customized using the following properties:
Property | Type | Description | Reader? | Writer? |
---|---|---|---|---|
delimiter | char | The field delimiter. Defaults to the tab character. | Yes | Yes |
escape | Character | The escape character allowed to escape a delimiter or itself. By default, escaping is disabled. | Yes | Yes |
lineContinuationCharacter | Character | If this character is the last character before a new line or carriage return is read, the record will continue reading from the next line. By default, line continuation is disabled. | Yes | No |
recordTerminator | Character | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | Yes | Yes |
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input stream begins with any of the configured values, the line is ignored. A backslash may be used to escape a comma and itself. All whitespace is preserved. Enabling comments requires the input reader passed to StreamFactory to support marking. Among others, BufferedReader and StringReader support marking. | Yes | No |
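The interaction between the delimiter and escape properties can be illustrated with a small splitter. This is a sketch of the behaviour described in the table, not BeanIO's parser: the escape character is honoured only before a delimiter or another escape character, and (as an assumption of this sketch) is kept as literal text anywhere else. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that splits one delimited line into fields.
final class DelimitedSplitSketch {

    // Pass a null escape to disable escaping, which mirrors the documented default.
    static String[] split(String line, char delimiter, Character escape) {
        List<String> fields = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (escape != null && c == escape.charValue() && i + 1 < line.length()) {
                char next = line.charAt(i + 1);
                if (next == delimiter || next == escape.charValue()) {
                    current.append(next); // keep the escaped character, drop the escape
                    i++;
                    continue;
                }
            }
            if (c == delimiter) {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields.toArray(new String[fields.size()]);
    }
}
```

For example, with an asterisk delimiter and a backslash escape, the line a\*b*c yields the two fields a*b and c.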
The default fixed length reader and writer can be customized using the following properties:
Property | Type | Description | Reader? | Writer? |
---|---|---|---|---|
lineContinuationCharacter | Character | If this character is the last character before a new line or carriage return is read, the record will continue reading from the next line. By default, line continuation is disabled. | Yes | No |
recordTerminator | Character | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | Yes | Yes |
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input stream begins with any of the configured values, the line is ignored. A backslash may be used to escape a comma and itself. All whitespace is preserved. Enabling comments requires the input reader passed to StreamFactory to support marking. Among others, BufferedReader and StringReader support marking. | Yes | No |
The XML writer can be customized using the following properties:
Property | Type | Description | Reader? | Writer? |
---|---|---|---|---|
suppressHeader | boolean | If set to true, the XML header is suppressed in the marshalled document. Defaults to false. | No | Yes |
version | String | The XML header version. Defaults to 1.0. | No | Yes |
encoding | String | The XML header encoding. Defaults to utf-8. Note that this setting has no bearing on the actual encoding of the output stream. If set to "", an encoding attribute is not included in the header. | No | Yes |
namespaces | String | A space delimited list of XML prefixes and namespaces to declare on the root element of a marshalled document. The property value should be formatted as prefix1 namespace1 prefix2 namespace2... | No | Yes |
indentation | Integer | The number of spaces to indent each level of XML. By default, indentation is disabled using a value of -1. | No | Yes |
lineSeparator | String | The character(s) used to separate lines when indentation is enabled. By default, System.getProperty("line.separator") is used. | No | Yes |
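The layout of the namespaces property value (alternating prefixes and namespace URIs separated by spaces) can be illustrated by parsing it into a map. The helper below is hypothetical and is not how BeanIO reads the property; it only shows the expected shape of the value, and rejecting an odd number of tokens is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that reads "prefix1 namespace1 prefix2 namespace2..." into a map.
final class NamespacesPropertySketch {

    static Map<String, String> parse(String value) {
        Map<String, String> namespaces = new LinkedHashMap<String, String>();
        String trimmed = value.trim();
        if (trimmed.length() == 0) {
            return namespaces;
        }
        String[] tokens = trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("Expected prefix/namespace pairs: " + value);
        }
        // tokens alternate between a prefix and the namespace it is bound to
        for (int i = 0; i < tokens.length; i += 2) {
            namespaces.put(tokens[i], tokens[i + 1]);
        }
        return namespaces;
    }
}
```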
Each record type read from an input stream or written to an output stream must be mapped using a record element. A stream mapping must include at least one record. The record mapping is used to validate the record and bind field values to a bean object. A simple record configuration is shown below.
<beanio>
<stream name="stream1" format="csv">
<record name="record1" class="org.beanio.example.Record">
<field name="firstName" />
<field name="lastName" />
<field name="age" />
</record>
</stream>
</beanio>
In this example, a CSV formatted stream is mapped to a single record composed of three fields: first name, last name and age. When a record is read from a stream using a BeanReader, the class org.beanio.example.Record is instantiated and its firstName, lastName and age attributes are set using standard Java bean setter naming conventions (e.g. setFirstName(String)).
Similarly, when an org.beanio.example.Record bean object is written to an output stream using a BeanWriter, its firstName, lastName and age attributes are read using standard Java bean getter naming conventions (e.g. getFirstName()) and formatted.
BeanIO also supports Map based records by setting a record's class attribute to map, or to the fully qualified class name of any class assignable to java.util.Map. Note that if you plan to use Map based records, field types may need to be explicitly configured using the type attribute, or BeanIO will assume the field is of type java.lang.String. The type attribute is further explained in section 4.6. Field Type Conversion.
<beanio>
  <stream name="stream1" format="csv">
    <record name="record1" class="map">
      <field name="firstName" />
      <field name="lastName" />
      <field name="age" type="int"/>
    </record>
  </stream>
</beanio>
Oftentimes, a stream is made up of multiple record types. A typical batch file may include one header, one trailer, and zero to many detail records. BeanIO allows a record to be identified by one or more of its fields using expected literal values or regular expressions. By default, BeanIO will validate the order of all records in the input stream.
To see how a stream can be configured to handle multiple record types, let's modify our Employee file to include a header and trailer record as shown below. Each record now includes a record type field that identifies the type of record.
Header,01012011
Detail,Joe,Smith,Developer,75000,10012009
Detail,Jane,Doe,Architect,80000,01152008
Detail,Jon,Anderson,Manager,85000,03182007
Trailer,3
The mapping file can now be updated as follows:
<beanio>
  <stream name="employeeFile" format="csv">
    <record name="header" minOccurs="1" maxOccurs="1" class="org.beanio.example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" format="MMddyyyy" />
    </record>
    <record name="employee" minOccurs="0" maxOccurs="unbounded" class="org.beanio.example.Employee">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
    <record name="trailer" minOccurs="1" maxOccurs="1" class="org.beanio.example.Trailer">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="recordCount" />
    </record>
  </stream>
</beanio>
There are several new record and field attributes introduced in this mapping file, so we'll explain each new attribute in turn.
First, a field used to identify a record must be configured as a record identifier using rid="true". There is no limitation to the number of fields that can be used to identify a record, but all fields where rid="true" must be satisfied before a record is identified. If there is no field configured as a record identifier, by default the record will always match.
<record name="header" minOccurs="1" maxOccurs="1" class="org.beanio.example.Header">
<field name="recordType" rid="true" literal="Header" />
<field name="fileDate" />
</record>
Second, all record identifying fields must have a matching validation rule configured. In our example, the literal value Header in the record type field is used to identify the header record. Literal values must match exactly and can be configured using the literal field attribute. Alternatively, record identifying fields may use a regular expression to match field text using the regex field attribute.
<record name="header" minOccurs="1" maxOccurs="1" class="org.beanio.example.Header">
<field name="recordType" rid="true" literal="Header" />
<field name="fileDate" />
</record>
Third, each record defines the minimum and maximum number of times it may repeat using the attributes minOccurs and maxOccurs. Based on our configuration, exactly one header and trailer record is required, while the number of detail records is unbounded.
<record name="header" minOccurs="1" maxOccurs="1" class="org.beanio.example.Header">
<field name="recordType" rid="true" literal="Header" />
<field name="fileDate" />
</record>
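The identification rules described above (every field marked rid="true" must match its literal or regular expression, and a record with no identifying fields always matches) can be sketched as follows. The types here are hypothetical stand-ins rather than BeanIO classes, and treating a regular expression as a match against the entire field text is an assumption of this sketch.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of record identification for delimited records.
final class RecordIdentifierSketch {

    // One record identifying field: its position plus either a literal or a regex.
    static final class RidField {
        private final int position;
        private final String literal;
        private final Pattern regex;

        private RidField(int position, String literal, Pattern regex) {
            this.position = position;
            this.literal = literal;
            this.regex = regex;
        }

        static RidField literal(int position, String literal) {
            return new RidField(position, literal, null);
        }

        static RidField regex(int position, String regex) {
            return new RidField(position, null, Pattern.compile(regex));
        }

        boolean matches(String[] record) {
            if (position >= record.length) {
                return false; // the field is missing from this record
            }
            String text = record[position];
            // literals must match exactly; regexes must match the whole field text
            return literal != null ? literal.equals(text) : regex.matcher(text).matches();
        }
    }

    // A record is identified only if all of its identifying fields match.
    // With no identifying fields, the record always matches.
    static boolean identifies(List<RidField> ridFields, String[] record) {
        for (RidField field : ridFields) {
            if (!field.matches(record)) {
                return false;
            }
        }
        return true;
    }
}
```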
As explained in the previous section, a stream can support multiple record types. By default, a BeanReader will validate that each record read from a stream appears in the same order it was configured. In the previous example, if a detail record were to appear before the header record, the BeanReader would throw an UnexpectedRecordException when the detail record is read out of order.
Default record ordering can be overridden using the order record attribute, which can be assigned any positive integer value. Records that are assigned the same number may be read from the stream in any order. In our current example, if we want to allow header and detail records to appear in any order, while still requiring the trailer record at the end of the stream, the mapping file could be changed as follows.
<beanio>
  <stream name="employeeFile" format="csv">
    <record name="header" order="1" minOccurs="1" maxOccurs="1" class="org.beanio.example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" format="MMddyyyy" />
    </record>
    <record name="employee" order="1" minOccurs="0" maxOccurs="unbounded" class="org.beanio.example.Employee">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" format="MMddyyyy" />
    </record>
    <record name="trailer" order="2" minOccurs="1" maxOccurs="1" class="org.beanio.example.Trailer">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="recordCount" />
    </record>
  </stream>
</beanio>
Or if you don't care about record ordering at all, simply set the stream's ordered attribute to false as shown below.
<beanio>
<stream name="employeeFile" format="csv" ordered="false">
<!-- Record layouts... -->
</stream>
</beanio>
In some cases, a stream may be further divided into batches or groups of records. Continuing with our employee file, let's suppose employee detail records are batched by department, where each group of employees has a department header and a department trailer record. Thus an input file may look something like this:
Header,01012011
DeptHeader,Development
Detail,Joe,Smith,Developer,75000,10012009
Detail,Jane,Doe,Architect,80000,01152008
DeptTrailer,2
DeptHeader,Product Management
Detail,Jon,Anderson,Manager,85000,03182007
DeptTrailer,1
Trailer,2
BeanIO allows you to define groups of records using a group element to wrap the record types that belong to the group. Groups support the same order, minOccurs, and maxOccurs attributes, although their meaning is applied to the entire group. Once a record type is matched that belongs to a group, all other records in that group where minOccurs is 1 or greater must be read from the stream before the group may repeat or a different record can be read. Our mapping file would now look like this:
<beanio>
  <stream name="employeeFile" format="csv">
    <record name="header" minOccurs="1" maxOccurs="1" class="org.beanio.example.Header">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" format="MMddyyyy" />
    </record>
    <group name="departmentGroup" minOccurs="0" maxOccurs="unbounded">
      <record name="deptHeader" minOccurs="1" maxOccurs="1" class="org.beanio.example.DeptHeader">
        <field name="recordType" rid="true" literal="DeptHeader" />
        <field name="departmentName" />
      </record>
      <record name="employee" minOccurs="0" maxOccurs="unbounded" class="org.beanio.example.Employee">
        <field name="recordType" rid="true" literal="Detail" />
        <field name="firstName" />
        <field name="lastName" />
        <field name="title" />
        <field name="salary" />
        <field name="hireDate" format="MMddyyyy" />
      </record>
      <record name="deptTrailer" minOccurs="1" maxOccurs="1" class="org.beanio.example.DeptTrailer">
        <field name="recordType" rid="true" literal="DeptTrailer" />
        <field name="employeeCount" />
      </record>
    </group>
    <record name="trailer" minOccurs="1" maxOccurs="1" class="org.beanio.example.Trailer">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="departmentCount" />
    </record>
  </stream>
</beanio>
The stream definition itself is a record group with defaults minOccurs="0" and maxOccurs="1". If you want your BeanReader to throw an exception if the stream is empty, simply change minOccurs to 1, or if you want to allow the entire stream to repeat indefinitely, simply change maxOccurs to unbounded as shown below.
<beanio>
  <stream name="employeeFile" format="csv" minOccurs="1" maxOccurs="unbounded">
    <!-- Record layout... -->
  </stream>
</beanio>
Default getter and setter methods can be overridden using getter and setter attributes as shown below.
<beanio>
<stream name="stream1" format="csv">
<record name="record1" class="org.beanio.example.Record">
<field name="firstName" />
<field name="lastName" setter="setSurname" getter="getSurname"/>
<field name="age" />
</record>
</stream>
</beanio>
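The Record class itself is not shown in this guide. As a minimal sketch, a bean that fits the mapping above might look like the following, assuming the last name is exposed through getSurname and setSurname instead of the default getLastName and setLastName. The private field names are assumptions made for illustration only.

```java
// Hypothetical bean for the "record1" mapping above (not part of BeanIO).
final class Record {
    private String firstName;
    private String surname; // bound to the "lastName" field through the getter/setter attributes
    private int age;

    // Default accessors, found from the field name "firstName".
    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    // Non-default accessors, referenced by getter="getSurname" and setter="setSurname".
    public String getSurname() { return surname; }
    public void setSurname(String surname) { this.surname = surname; }

    // Default accessors, found from the field name "age".
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```

Without the getter and setter attributes, the lastName field would be expected to resolve to getLastName and setLastName, which this bean does not declare.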
Fields found in a stream that do not map to a bean property can be declared, but otherwise ignored, using the ignore field attribute. Note that configured validation rules are still applied to ignored fields.
<beanio>
<stream name="stream1" format="csv">
<record name="record1" class="org.beanio.example.Record">
<field name="firstName" />
<field name="lastName" />
<field name="age" />
<field name="filler" ignore="true" />
</record>
</stream>
</beanio>
By default, BeanIO expects fields to appear in a CSV, delimited or fixed length stream in the same order they are declared in the mapping file. If this is not the case, a position field attribute can be configured for each field. If a position is declared for one field, a position must be declared for all other fields in the same record. For delimited (and CSV) formatted streams, position should be set to the index of the first occurrence of the field in the record, beginning at 0. For fixed length formatted streams, position should be set to the index of the first character of the first occurrence of the field in the record, beginning at 0.
The following example shows how the position attribute can be used. Although the fields are declared in a different order, the record definition is identical to the previous example. When positions are explicitly configured for an input stream, there is no need to declare all fields in a record, unless desired for validation purposes.
<beanio>
  <stream name="stream1" format="csv">
    <record name="record1" class="org.beanio.example.Record">
      <field name="filler" position="3" ignore="true" />
      <field name="lastName" position="1" />
      <field name="age" position="2"/>
      <field name="firstName" position="0" />
    </record>
  </stream>
</beanio>
The property type of a field is determined by introspecting the bean object the field belongs to. If the bean class is of type java.util.Map, BeanIO will assume the field is of type java.lang.String, unless a field type is explicitly declared using a field's type attribute.
The type attribute may be set to any supported fully qualified class name or to one of the supported type aliases below. Type aliases are not case sensitive, and the same alias may be used for primitive types. For example, int and java.lang.Integer bean properties will use the same type handler registered for the type java.lang.Integer, or alias integer or int.
Class Name | Primitive | Alias(es) |
---|---|---|
java.lang.String | - | string |
java.lang.Boolean | boolean | boolean |
java.lang.Byte | byte | byte |
java.lang.Character | char | character char |
java.lang.Short | short | short |
java.lang.Integer | int | integer int |
java.lang.Long | long | long |
java.lang.Float | float | float |
java.lang.Double | double | double |
java.math.BigInteger | - | biginteger |
java.math.BigDecimal | - | bigdecimal decimal |
java.util.Date (1) | - | datetime date time |
(1) By default, the date alias is used for java.util.Date types that contain date information only, and the time alias is used for java.util.Date types that contain only time information. Only the datetime alias can be used to replace the default class type handler for java.util.Date.
Optionally, a format attribute can be used to pass a decimal format for java.lang.Number types, or a date format for java.util.Date types. In the example below, the hireDate field uses the SimpleDateFormat pattern "yyyy-MM-dd", and the salary field uses the DecimalFormat pattern "#,##0". The fileDate field declares no format, so it is converted by whichever type handler is registered for java.util.Date, while the hireDate field overrides that handler's pattern using the format attribute.
<beanio>
  <stream name="employeeFile" format="csv">
    <record name="header" minOccurs="1" maxOccurs="1" class="map">
      <field name="recordType" rid="true" literal="Header" />
      <field name="fileDate" type="java.util.Date" />
    </record>
    <record name="employee" minOccurs="0" maxOccurs="unbounded" class="map">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" type="int" format="#,##0" />
      <field name="hireDate" type="date" format="yyyy-MM-dd" />
    </record>
    <record name="trailer" minOccurs="1" maxOccurs="1" class="map">
      <field name="recordType" rid="true" literal="Trailer" />
      <field name="recordCount" type="int" />
    </record>
  </stream>
</beanio>
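To see what the two patterns in this example accept and produce, the following sketch exercises them directly with the JDK's java.text classes. It does not use BeanIO. The class and method names are hypothetical, and the US locale is an assumption made so that the grouping separator is a comma.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, JDK only: shows the behavior of the two format patterns.
final class FormatPatternSketch {

    private static DecimalFormat salaryFormat() {
        // Same pattern as the salary field; Locale.US is an assumption.
        return new DecimalFormat("#,##0", DecimalFormatSymbols.getInstance(Locale.US));
    }

    private static SimpleDateFormat hireDateFormat() {
        // Same pattern as the hireDate field; strict parsing rejects dates such as 2009-13-40.
        SimpleDateFormat format = new SimpleDateFormat("yyyy-MM-dd");
        format.setLenient(false);
        return format;
    }

    static int parseSalary(String text) {
        try {
            return salaryFormat().parse(text).intValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid salary: " + text, e);
        }
    }

    static String formatSalary(int salary) {
        return salaryFormat().format(salary);
    }

    static Date parseHireDate(String text) {
        try {
            return hireDateFormat().parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid hire date: " + text, e);
        }
    }

    static String formatHireDate(Date date) {
        return hireDateFormat().format(date);
    }
}
```

For example, parseSalary("75,000") yields 75000 and formatSalary(75000) yields "75,000". Note that SimpleDateFormat and DecimalFormat instances are not thread safe, which is why the sketch creates a new instance per call.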
Field type conversion is performed by a type handler. BeanIO includes type handlers for common Java types, or you can create your own type handler by implementing the org.beanio.types.TypeHandler interface shown below. When writing a custom type handler, make sure to handle null values and empty strings. Only one instance of your type handler is created, so if you plan to concurrently read or write multiple streams, make sure your type handler is also thread safe.
package org.beanio.types;

public interface TypeHandler {
    public Object parse(String text) throws TypeConversionException;
    public String format(Object value);
    public Class<?> getType();
}
The following example shows a custom type handler for the java.lang.Boolean class and boolean primitive based on "Y" or "N" indicators.
import org.beanio.types.TypeConversionException;
import org.beanio.types.TypeHandler;

public class YNTypeHandler implements TypeHandler {
    // Null, empty, or any text other than "Y" is parsed as false.
    public Object parse(String text) throws TypeConversionException {
        return "Y".equals(text);
    }
    // A null value is formatted as "N".
    public String format(Object value) {
        return value != null && ((Boolean) value).booleanValue() ? "Y" : "N";
    }
    public Class<?> getType() {
        return Boolean.class;
    }
}
A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.
Many default type handlers included with BeanIO support customization through the use of one or more property elements, where the name attribute is a bean property of the type handler, and the value attribute is the property value.
Type handlers can be declared globally (for all streams in the mapping file) or for a specific stream. Globally declared type handlers may optionally use a format attribute to narrow the type handler scope to a specific stream format.
In the example below, the first DateTypeHandler is declared globally for all stream formats. The second DateTypeHandler overrides the first for java.util.Date types in an XML formatted stream, and the YNTypeHandler is declared only for the 'employeeFile' stream. Stream specific type handlers override global type handlers when declared with the same name or for the same type.
<beanio>
  <typeHandler type="java.util.Date" class="org.beanio.types.DateTypeHandler">
    <property name="pattern" value="MMddyyyy" />
    <property name="lenient" value="true" />
  </typeHandler>
  <typeHandler type="java.util.Date" format="xml" class="org.beanio.types.DateTypeHandler">
    <property name="pattern" value="yyyy-MM-dd" />
  </typeHandler>
  <stream name="employeeFile" format="csv">
    <typeHandler name="ynHandler" class="org.beanio.example.YNTypeHandler" />
    <record name="employee" minOccurs="0" maxOccurs="unbounded" class="map">
      <field name="recordType" rid="true" literal="Detail" />
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" />
      <field name="exempt" typeHandler="ynHandler" />
    </record>
  </stream>
</beanio>
Collection and array field property types are also supported by BeanIO. For example, let's assume our Employee bean object contains a list of accounts.
package org.beanio.example;
import java.util.Date;
import java.util.List;
public class Employee {
    String firstName;
    String lastName;
    String title;
    int salary;
    Date hireDate;
    List<Integer> accounts;
    // getters and setters not shown...
}
And let's assume our input file now looks like this:
Joe,Smith,Developer,75000,10012009
Chris,Johnson,Sales,80000,05292006,100012,200034,200045
Jane,Doe,Architect,80000,01152008
Jon,Anderson,Manager,85000,03182007,333001
In this example, the accounts bean property can be defined in the mapping file using a collection field attribute. The collection attribute can be set to the fully qualified class name of a java.util.Collection subclass, or to one of the collection type aliases below.
Class | Alias | Default Implementation |
---|---|---|
java.util.Collection | collection | java.util.ArrayList |
java.util.List | list | java.util.ArrayList |
java.util.Set | set | java.util.HashSet |
(Java Array) | array | N/A |
Collection type fields can declare the number of occurrences of the field using minOccurs and maxOccurs field attributes. If not declared, minOccurs will default to 1, and maxOccurs will default to the minOccurs value or 1, whichever is greater. If the number of field occurrences is variable (i.e. maxOccurs is greater than minOccurs), the field must be the last field in the record.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
<field name="accounts" type="int" collection="list" minOccurs="0" maxOccurs="unbounded" />
</record>
</stream>
</beanio>
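The reason a variable-occurrence field must come last is easiest to see by reading a record by hand. The sketch below is not BeanIO code: it is a plain Java illustration, with hypothetical class and method names, that treats everything after the five fixed fields of the sample input as account numbers. It uses a naive comma split and therefore assumes no quoted or escaped values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the record layout above (JDK only).
final class AccountsFieldSketch {

    // firstName, lastName, title, salary and hireDate precede the repeating field.
    private static final int FIXED_FIELDS = 5;

    // Collects every field after the fixed fields as an account number.
    static List<Integer> parseAccounts(String line) {
        String[] fields = line.split(",");
        List<Integer> accounts = new ArrayList<Integer>();
        for (int i = FIXED_FIELDS; i < fields.length; i++) {
            accounts.add(Integer.valueOf(fields[i].trim()));
        }
        return accounts;
    }
}
```

For the second sample record, parseAccounts returns the three account numbers 100012, 200034 and 200045, and for the first it returns an empty list. If another field followed the accounts, there would be no way to tell where the list ends.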
Fixed length fields require a little more configuration than their delimited counterparts. Let's redefine our employee file example using the fixed length format below.
Position | Field | Format | Length |
---|---|---|---|
0 | First Name | Text | 10 |
10 | Last Name | Text | 10 |
20 | Job Title | Text | 10 |
30 | Salary | Number | 6 |
36 | Hire Date | Date (MMDDYYYY) | 8 |
A fixed length version of the employee file might look like the following:
Joe       Smith     Developer 07500010012009
Jane      Doe       Architect 08000001152008
Jon       Anderson  Manager   08500003182007
The length of a fixed length field must be configured using the length field attribute. By default, fixed length fields are left justified and padded with spaces, but these settings can be overridden using the padding and justify field attributes. Field padding can be set to any single character, and field justification can be set to left or right. Using these attributes, our mapping file can now be updated as follows:
<beanio>
  <stream name="employeeFile" format="fixedlength">
    <record name="employee" class="org.beanio.example.Employee">
      <field name="firstName" length="10" />
      <field name="lastName" length="10" />
      <field name="title" length="10" />
      <field name="salary" length="6" padding="0" justify="right" />
      <field name="hireDate" length="8" format="MMddyyyy" />
    </record>
  </stream>
</beanio>
The configured padding character is removed from the beginning of the field if right justified, or from the end of the field if left justified, until a character is found that does not match the padding character. If the entire field is padded, Number property types default to the padding character if it is a digit, and the padding character is ignored for Character types. To illustrate this, some examples are shown in the table below.
Justify | Type | Padding | Padded Text | Unpadded Text |
---|---|---|---|---|
left | String | " " | "George " | "George" |
left | String | " " | " " | "" |
left | Character | " " | "A" | "A" |
left | Character | " " | " " | " " |
right | Number | "0" | "00123" | "123" |
right | Number | "0" | "00000" | "0" |
right | Number | "9" | "00000" | "00000" |
right | Number | "9" | "99999" | "9" |
right | Number | "X" | "XXXXX" | "" |
The marshalling and unmarshalling behavior of null field values for a padded field is further controlled using the required attribute. If required is set to true, null field values are marshalled by filling the field with the padding character. If required is set to false, a null field value is marshalled as spaces for fixed length streams and an empty string for non-fixed length streams. Similarly, if required is set to false, spaces are unmarshalled to a null field value regardless of the padding character. To illustrate this, the following table shows the field text for a right justified zero padded 3 digit number.
Required | Field Value | Field Text (Fixed Length) | Field Text (Non-Fixed Length) |
---|---|---|---|
true | 0 | "000" | "000" |
true | null | "000" (1) | "000" (1) |
false | 0 | "000" | "000" |
false | null | "   " | "" |
(1) Applies to marshalling only. Unmarshalling "000" would produce a field value of 0.
As hinted at above, padding settings can be applied to any field for any stream type.
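The padding rules in the two tables above can be sketched in plain Java as follows. This is an illustration of the documented behavior, not BeanIO's implementation. The class, the Kind enum and both method names are hypothetical, and marshal covers only the right justified, zero padded, 3 digit number used in the second table.

```java
// Hypothetical illustration of the padding rules described above (JDK only).
final class PaddingSketch {

    enum Kind { STRING, CHARACTER, NUMBER }

    // Removes padding from field text, following the first table above.
    static String unpad(String text, char padding, boolean rightJustified, Kind kind) {
        if (text.isEmpty()) {
            return text;
        }
        int start = 0;
        int end = text.length();
        if (rightJustified) {
            // Right justified: padding is removed from the beginning of the field.
            while (start < end && text.charAt(start) == padding) {
                start++;
            }
        } else {
            // Left justified: padding is removed from the end of the field.
            while (end > start && text.charAt(end - 1) == padding) {
                end--;
            }
        }
        if (start < end) {
            return text.substring(start, end);
        }
        // The entire field consisted of the padding character.
        if (kind == Kind.NUMBER && Character.isDigit(padding)) {
            return String.valueOf(padding);
        }
        if (kind == Kind.CHARACTER) {
            return text; // padding is ignored for Character types
        }
        return "";
    }

    // Marshals a right justified, zero padded, 3 digit number (second table above).
    static String marshal(Integer value, boolean required, boolean fixedLength) {
        final int length = 3;
        if (value == null && !required) {
            return fixedLength ? "   " : "";
        }
        // A required null value is written as padding only.
        StringBuilder text = new StringBuilder(value == null ? "" : value.toString());
        while (text.length() < length) {
            text.insert(0, '0');
        }
        return text.toString();
    }
}
```

For example, unpad("00000", '0', true, Kind.NUMBER) yields "0", and marshal(null, false, true) yields three spaces. Values wider than the field length are outside the scope of this sketch.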
If a bean property does not map to a field in the input or output stream, the property value can still be set using a property element. Like a field, all properties must specify a name attribute, which by default, is used to get and set the property value from the bean object. Properties also require a value attribute for setting the textual representation of the property value. The value text is type converted using the same rules and attributes (type, typeHandler and format) used for field type conversion described above. Collection type properties are not supported.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="map">
<property name="recordType" value="employee" />
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
</record>
</stream>
</beanio>
Properties are particularly useful in two scenarios:
The bean class mapped to a record can be divided into nested bean objects using a bean element. First, let's suppose we store an address in our CSV employee file, so that the record layout might look like this:
Position | Field | Format |
---|---|---|
0 | First Name | Text |
1 | Last Name | Text |
2 | Job Title | Text |
3 | Salary | Number |
4 | Hire Date | Date (MMDDYYYY) |
5 | Street | Text |
6 | City | Text |
7 | State | Text |
8 | Zip | Text |
Second, let's suppose we want to store address information in a new Address bean object like the one below, and add an Address reference to our Employee class.
package org.beanio.example;
public class Address {
String street;
String city;
String state;
String zip;
// getters and setters not shown...
}
package org.beanio.example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
Address mailingAddress;
// getters and setters not shown...
}
With this information, we can now update our employee CSV mapping file to accommodate the nested Address object. A bean element must include name and class attributes. By default, the name attribute is used to determine the getter and setter on its parent bean or record. Optionally, getter or setter attributes can be used to override the default property name, similar to a field property. The class attribute must be set to the fully qualified class name of the bean object, or to map, or to the class name of any concrete java.util.Map implementation. If the bean class is of type java.util.Map, field values are stored in the Map under their configured field name.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
<bean name="mailingAddress" class="org.beanio.example.Address">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</bean>
</record>
</stream>
</beanio>
If needed, nested bean objects can be further divided into other bean objects. There is no limit to the number of nested levels that can be configured in a mapping file.
Similar to collection type fields, BeanIO also supports collection type beans. Continuing our previous example, let's suppose the employee CSV file may contain 1 or more addresses for each employee. Thus our Employee bean object might look like this:
package org.beanio.example;
import java.util.Date;
import java.util.List;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
List<Address> addressList;
// getters and setters not shown...
}
And our input file might look like this:
Joe,Smith,Developer,75000,10012009,123 State St,Chicago,IL,60614
Jane,Doe,Architect,80000,01152008,456 Main St,Chicago,IL,60611,111 Michigan Ave,Chicago,IL,60611
Jon,Anderson,Manager,85000,03182007,1212 North Ave,Chicago,IL,60614
In our mapping file, in order to designate a bean as a collection, simply set its collection attribute to the fully qualified class name of a java.util.Collection subclass, or to one of the collection type aliases below.
Class | Alias | Default Implementation |
---|---|---|
java.util.Collection | collection | java.util.ArrayList |
java.util.List | list | java.util.ArrayList |
java.util.Set | set | java.util.HashSet |
(Java Array) | array | N/A |
Just like a collection type field, collection type beans may declare the number of occurrences using minOccurs and maxOccurs bean attributes. If not declared, minOccurs will default to 1, and maxOccurs will default to the minOccurs value or 1, whichever is greater. If the number of occurrences is variable (i.e. maxOccurs is greater than minOccurs), the bean must be the last segment in the record.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
<bean name="addressList" collection="list" minOccurs="1" maxOccurs="unbounded"
class="org.beanio.example.Address">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</bean>
</record>
</stream>
</beanio>
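To make the repeating segment concrete, the sketch below splits one of the sample records by hand into groups of four address fields. It is a plain Java illustration with hypothetical class and method names, not BeanIO code, and it assumes a naive comma split with no quoted values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of a repeating bean segment (JDK only).
final class AddressListSketch {

    // firstName, lastName, title, salary and hireDate precede the addresses.
    private static final int FIXED_FIELDS = 5;
    // street, city, state and zip make up one address.
    private static final int ADDRESS_FIELDS = 4;

    // Returns each address in the record as a {street, city, state, zip} array.
    static List<String[]> parseAddresses(String line) {
        String[] fields = line.split(",");
        int remaining = fields.length - FIXED_FIELDS;
        // Mirrors minOccurs="1": at least one complete address is expected.
        if (remaining < ADDRESS_FIELDS || remaining % ADDRESS_FIELDS != 0) {
            throw new IllegalArgumentException("Expected one or more complete addresses: " + line);
        }
        List<String[]> addresses = new ArrayList<String[]>();
        for (int i = FIXED_FIELDS; i < fields.length; i += ADDRESS_FIELDS) {
            addresses.add(new String[] { fields[i], fields[i + 1], fields[i + 2], fields[i + 3] });
        }
        return addresses;
    }
}
```

For the second sample record this returns two addresses, the second of which starts with "111 Michigan Ave".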
When working with collection type beans, there are a few restrictions to keep in mind:
A BeanReader will throw an InvalidRecordException if a record or one of its fields fails a configured validation rule. There are two types of errors reported for an invalid record: record level errors and field level errors. If a record level error occurs, further processing of the record is aborted and an exception is immediately thrown. If a field level error is reported, the BeanReader will continue to process the record's other fields before throwing an exception.
When an InvalidRecordException is thrown, the exception will contain the reported record and field level errors. The following code shows how this information can be accessed using the BeanReaderContext.
BeanReader in; // reader created elsewhere (not shown)
try {
    Object record = in.read();
    if (record != null) {
        // process record...
    }
}
catch (InvalidRecordException ex) {
    BeanReaderContext context = ex.getContext();
    if (context.hasRecordErrors()) {
        for (String error : context.getRecordErrors()) {
            // handle record errors...
        }
    }
    if (context.hasFieldErrors()) {
        for (String field : context.getFieldErrors().keySet()) {
            for (String error : context.getFieldErrors(field)) {
                // handle field error...
            }
        }
    }
}
Alternatively, it may be simpler to register a BeanReaderErrorHandler for handling non-fatal exceptions. The example below shows how invalid records could be written to a reject file by extending BeanReaderErrorHandlerSupport.
BeanReader input;
BufferedWriter rejects;
try {
input.setErrorHandler(new BeanReaderErrorHandlerSupport() {
public void invalidRecord(InvalidRecordException ex) throws Exception {
rejects.write(ex.getContext().getRecordText());
rejects.newLine();
}
});
Object record = null;
while ((record = input.read()) != null) {
// process a valid record
}
rejects.flush();
}
finally {
input.close();
rejects.close();
}
Record and field level error messages can be customized and localized through the use of resource bundles. A resource bundle is configured at the stream level using the resourceBundle attribute as shown below.
<beanio>
<typeHandler type="java.util.Date" class="org.beanio.types.DateTypeHandler">
<property name="pattern" value="MMddyyyy" />
</typeHandler>
<stream name="employeeFile" format="csv" resourceBundle="org.beanio.example.messages" >
<record name="employee" class="map">
<field name="recordType" rid="true" literal="Detail" />
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" />
</record>
</stream>
</beanio>
Record level error messages are retrieved using the following prioritized list of keys. If a message is not configured under the name of the first key, the next key will be tried until a message is found, or a default message is used.
Similarly, field level error messages are retrieved using the following prioritized list of keys:
More descriptive or localized labels can also be configured for record and field names using the keys label.[record name] and label.[record name].[field name] respectively.
For example, the following resource bundle could be used to customize a few error messages for the employee file.
# 'employee' record label:
label.employee = Employee Record
# 'firstName' field label:
label.employee.firstName = First Name Field
# Unidentified record error message:
recorderror.unidentified = Unidentified record at line {0}
# Type conversion error message for the 'hireDate' field:
fielderror.employee.hireDate.type = Invalid date format
# Maximum field length error message for all fields:
fielderror.maxLength = Maximum field length exceeded for {3}
Error messages are formatted using a java.text.MessageFormat. Depending on the validation rule that was violated, different parameters are passed to the MessageFormat. Appendix B documents the parameters passed to the MessageFormat for each validation rule.
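The lookup and formatting steps described above can be sketched with the JDK alone. The code below is not BeanIO's implementation: the class and method names are hypothetical, the candidate keys are supplied by the caller because the prioritized key lists are not reproduced here, and the argument passed for {0} is only assumed to be a line number, based on the sample message text.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Properties;

// Hypothetical illustration of prioritized key lookup plus MessageFormat (JDK only).
final class ErrorMessageSketch {

    // Tries each candidate key in order; the first configured message wins,
    // otherwise the default pattern is used. The result is formatted with params.
    static String resolve(Properties bundle, String defaultPattern, Object[] params,
                          String... candidateKeys) {
        String pattern = defaultPattern;
        for (String key : candidateKeys) {
            String configured = bundle.getProperty(key);
            if (configured != null) {
                pattern = configured;
                break;
            }
        }
        // Locale.US is an assumption that keeps number formatting predictable.
        return new MessageFormat(pattern, Locale.US).format(params);
    }
}
```

With a bundle containing recorderror.unidentified from the example above and a single parameter of 12, resolve produces "Unidentified record at line 12". A placeholder whose index has no matching parameter, such as {3} with only one parameter, is left in the output unchanged by MessageFormat.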
The following record level validation rules may be configured on a record element.
Attribute | Argument Type | Description |
---|---|---|
minLength | Integer | Validates the record contains at least minLength fields for delimited and CSV formatted streams, or has at least minLength characters for fixed length formatted streams. |
maxLength | Integer | Validates the record contains at most maxLength fields for delimited and CSV formatted streams, or has at most maxLength characters for fixed length formatted streams. |
BeanIO supports several common field validation rules when reading an input stream. All field validation rules are validated against the field text before type conversion. When field trimming is enabled, trim="true", all validations are performed after the field's text has first been trimmed. Field validations are ignored when writing to an output stream.
The following table lists supported field attributes for validation.
Attribute | Argument Type | Description |
---|---|---|
required | Boolean | When set to true, validates the field is present and the field text is not the empty string. |
minLength | Integer | Validates the field text is at least N characters. |
maxLength | Integer | Validates the field text does not exceed N characters. |
literal | String | Validates the field text exactly matches the literal value. |
regex | String | Validates the field text matches the given regular expression pattern. |
minOccurs | Integer | Applies to collection type fields only. Validates the minimum occurrences of the field in the stream. If the field is present in the stream, minOccurs is satisfied, and the required setting determines whether a value is required. |
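As a rough illustration of how these rules apply to field text before type conversion, the following sketch checks a single value against the required, minLength, maxLength, literal and regex rules. It is not BeanIO code. The class and method names are hypothetical, and two behaviors are assumptions: an absent or empty field skips the remaining rules, and the regular expression must match the entire field text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration of the field validation rules above (JDK only).
final class FieldRuleSketch {

    // Returns the names of the violated rules; null rule arguments are not checked.
    static List<String> validate(String text, boolean trim, boolean required,
                                 Integer minLength, Integer maxLength,
                                 String literal, String regex) {
        List<String> errors = new ArrayList<String>();
        // With trim="true", all rules see the trimmed text.
        String value = (text != null && trim) ? text.trim() : text;
        if (value == null || value.isEmpty()) {
            if (required) {
                errors.add("required");
            }
            return errors; // ASSUMPTION: remaining rules are skipped for empty text
        }
        if (minLength != null && value.length() < minLength) {
            errors.add("minLength");
        }
        if (maxLength != null && value.length() > maxLength) {
            errors.add("maxLength");
        }
        if (literal != null && !literal.equals(value)) {
            errors.add("literal");
        }
        // ASSUMPTION: the pattern must match the whole value.
        if (regex != null && !Pattern.matches(regex, value)) {
            errors.add("regex");
        }
        return errors;
    }
}
```

Because every rule is evaluated rather than stopping at the first failure, the sketch reflects the field level behavior described earlier, where the remaining fields are still processed before an exception is thrown.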
When a common set of fields is used by multiple record types, configuration may be simplified using templates. A template is a reusable list of bean properties (fields, properties and child beans) that can be included by a record, bean or other template. The following example illustrates some of the ways a template can be used:
<beanio>
  <template name="address">
    <field name="street1" />
    <field name="street2" />
    <field name="city" />
    <field name="state" />
    <field name="zip" />
  </template>
  <template name="employee">
    <field name="firstName" />
    <field name="lastName" />
    <field name="title" />
    <field name="salary" />
    <field name="hireDate" format="MMddyyyy" />
    <bean name="mailingAddress" template="address" class="org.beanio.example.Address" />
  </template>
  <stream name="employeeFile" format="csv">
    <record name="employee" template="employee" class="org.beanio.example.Employee" />
  </stream>
  <stream name="addressFile" format="csv">
    <record name="address" class="org.beanio.example.Address">
      <field name="location" />
      <include template="address"/>
      <field name="attention" />
    </record>
  </stream>
</beanio>
Templates are essentially copied into their destination using the include element. For convenience, record and bean elements support a template attribute which includes the template before any other children.
The include element can optionally specify a positional offset for included fields using the offset attribute. The following example illustrates this behavior. Even when using templates, remember that position must be declared for all fields or none.
<beanio>
<template name="address">
<field name="street1" position="0" />
<field name="street2" position="1" />
<field name="city" position="2" />
<field name="state" position="3" />
<field name="zip" position="4" />
</template>
<stream name="addressFile" format="csv">
<record name="address" class="org.beanio.example.Address">
<field name="location" position="0" />
<include template="address" offset="1"/>
<field name="attention" position="6" />
</record>
</stream>
</beanio>
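The arithmetic behind the offset attribute is simple: each position declared in the template is shifted by the offset of the include. The sketch below shows this with a plain map. It is an illustration only, with hypothetical class and method names, and is not BeanIO code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of how an include offset shifts template positions (JDK only).
final class TemplateOffsetSketch {

    // Returns the effective position of every template field after applying the offset.
    static Map<String, Integer> include(Map<String, Integer> templatePositions, int offset) {
        Map<String, Integer> effective = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> entry : templatePositions.entrySet()) {
            effective.put(entry.getKey(), entry.getValue() + offset);
        }
        return effective;
    }
}
```

Applied to the address template above with an offset of 1, street1 moves from position 0 to 1 and zip from 4 to 5, which is why the location field can occupy position 0 and the attention field position 6.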
This section provides further details for using BeanIO to marshall and unmarshall Java objects to and from XML formatted streams. This section assumes you are already familiar with the mapping file concepts documented in previous sections.
BeanIO is similar to other OXM (Object to XML Mapping) libraries, except that it is also capable of marshalling and unmarshalling extremely large XML files by reading and writing Java beans one record at a time. BeanIO uses a streaming XML (StAX) parser to read and write XML, and will never hold more than the minimum amount of XML in memory needed to marshall or unmarshall a single bean object. That said, it is still possible to run out of memory (heap space) with poorly designed XML documents and/or misconfigured mapping files.
Before diving into the details, let's start with a basic example using the employee input file from Section 2.1 after it's been converted to XML (shown below).
<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
  </employee>
  <employee>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
    <title>Architect</title>
    <salary>80000</salary>
    <hireDate>2008-01-15</hireDate>
  </employee>
  <employee>
    <firstName>Jon</firstName>
    <lastName>Andersen</lastName>
    <title>Manager</title>
    <salary>85000</salary>
    <hireDate>2007-03-18</hireDate>
  </employee>
</employeeFile>
In this example, let's suppose we are unmarshalling the XML employee file into the same Employee bean object from Section 2.1 and repeated below.
package org.beanio.example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
// getters and setters not shown...
}
Our original mapping file from Section 2.1 can now be updated to parse XML instead of CSV with only two minor changes. First, the stream format is changed to xml. Second, the hire date field format is removed and replaced with type="date". With XML, the date format does not need to be explicitly declared because it conforms to the W3C XML Schema date syntax. (This will be further explained in Section 5.7.1).
<beanio xmlns="http://www.beanio.org/2011/01"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.beanio.org/2011/01 http://www.beanio.org/2011/01/mapping.xsd">
  <stream name="employeeFile" format="xml">
    <record name="employee" class="org.beanio.example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
</beanio>
That's it! No Java code changes are required, and as before, Employee bean objects will be unmarshalled from the XML input stream each time beanReader.read() is called.
And also as before, Employee objects can be marshalled to an XML output stream using beanWriter.write(Object). However, please note that when marshalling/writing XML, it is even more important to call beanWriter.close() so that the XML document can be properly completed.
Because BeanIO is built like a pull parser, it does not support XML validation against a DTD or XML schema. Where this functionality is needed, it is recommended to make two passes on the input document. The first pass can use a SAX parser or other means to validate the XML, and the second pass can use BeanIO to parse and process bean objects read from the document.
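As a sketch of the first pass, the JDK's javax.xml.validation API can check a document against an XML schema before BeanIO reads it. Nothing below is BeanIO code. The class name, the isValid method and the sample schema are hypothetical and written only to match the employee file used in this section.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical first-pass validator (JDK only); BeanIO would perform the second pass.
final class XmlPrecheckSketch {

    // Assumed schema for the employee file; not supplied by BeanIO.
    static final String EMPLOYEE_XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='employeeFile'><xs:complexType><xs:sequence>"
        + "<xs:element name='employee' minOccurs='0' maxOccurs='unbounded'>"
        + "<xs:complexType><xs:sequence>"
        + "<xs:element name='firstName' type='xs:string'/>"
        + "<xs:element name='lastName' type='xs:string'/>"
        + "<xs:element name='title' type='xs:string'/>"
        + "<xs:element name='salary' type='xs:int'/>"
        + "<xs:element name='hireDate' type='xs:date'/>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:sequence></xs:complexType></xs:element>"
        + "</xs:schema>";

    // Returns true if the XML text is well formed and valid against the schema text.
    static boolean isValid(String xsd, String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            // With no error handler set, the first validation error is thrown as a SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            return false;
        }
    }
}
```

In a real batch job the sources would typically wrap files or streams instead of strings, and the document would only be handed to a BeanReader once isValid returns true.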
Each BeanIO mapping element (stream, group, record, bean and field) is mapped to an XML element with the same local name. When the name of the stream, group, etc. does not match the XML element name, the xmlName attribute can be used. For example, if the root element of the previous example's employee file were renamed from "employeeFile" to "employees", and "title" were renamed "position", the mapping file could be updated as shown below.
<beanio>
  <stream name="employeeFile" format="xml" xmlName="employees">
    <record name="employee" class="org.beanio.example.Employee">
      <field name="firstName" />
      <field name="lastName" />
      <field name="title" xmlName="position" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
</beanio>
XML namespaces can be enabled by declaring namespaces using an xmlNamespace attribute on any mapping element (stream, group, record, bean or field). By default, all mapping elements inherit their namespace (or lack thereof) from their parent. When a namespace is declared, the local name and namespace must match when unmarshalling XML, and appropriate namespace declarations are included when marshalling bean objects. For example, let's suppose our employee file contains namespaces as shown below.
<?xml version="1.0"?>
<employeeFile xmlns="http://example.com/employeeFile" xmlns:n="http://example.com/name">
  <e:employee xmlns:e="http://example.com/employee">
    <n:firstName>Joe</n:firstName>
    <n:lastName>Smith</n:lastName>
    <e:title>Developer</e:title>
    <e:salary>75000</e:salary>
    <e:hireDate>2009-10-12</e:hireDate>
  </e:employee>
  . . .
</employeeFile>
To unmarshall the file using namespaces, and to marshall Employee bean objects in the same fashion as they appear above, the following mapping file can be used.
<beanio>
  <stream name="employeeFile" format="xml" xmlNamespace="http://example.com/employeeFile">
    <writer>
      <property name="namespaces" value="n http://example.com/name"/>
    </writer>
    <record name="employee" class="org.beanio.example.Employee"
        xmlNamespace="http://example.com/employee" xmlPrefix="e">
      <field name="firstName" xmlNamespace="http://example.com/name" />
      <field name="lastName" xmlNamespace="http://example.com/name" />
      <field name="title" />
      <field name="salary" />
      <field name="hireDate" type="date" />
    </record>
  </stream>
</beanio>
From this example, the following behavior can be observed:
BeanIO also supports a special wildcard namespace. If xmlNamespace is set to '*', any namespace is allowed when unmarshalling XML, and no namespace declaration will be made when marshalling XML.
The following table summarizes namespace configuration options and their effect on the configured element and on a child that inherits its parent's namespace.
Mapping Configuration | Marshalled Element And Child |
---|---|
[None] | <element> <child/> </element> |
xmlNamespace="*" | <element> <child/> </element> |
xmlNamespace="" | <element xmlns=""> <child/> </element> |
xmlNamespace="http://example.com" | <element xmlns="http://example.com"> <child/> </element> |
xmlNamespace="http://example.com" xmlPrefix="e" | <e:element xmlns:e="http://example.com"> <e:child/> </e:element> |
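The matching rule described in this section (local name and namespace must both match, with '*' accepting any namespace) can be illustrated with a short sketch built on the JDK's StAX API. This is not BeanIO's implementation; the class and method names (NamespaceMatchSketch, namespaceMatches, countMatches) are hypothetical and exist only for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of the BeanIO API.
final class NamespaceMatchSketch {

    /**
     * Returns true when the configured namespace accepts the actual one.
     * "*" models the wildcard; any other value must match exactly.
     * An element without a namespace is treated as namespace "".
     */
    static boolean namespaceMatches(String configured, String actual) {
        if ("*".equals(configured)) {
            return true;
        }
        String normalized = (actual == null) ? "" : actual;
        return configured.equals(normalized);
    }

    /** Counts elements whose local name and namespace both match. */
    static int countMatches(String xml, String localName, String configuredNamespace) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            int count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && localName.equals(reader.getLocalName())
                            && namespaceMatches(configuredNamespace, reader.getNamespaceURI())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employeeFile xmlns=\"http://example.com/employeeFile\""
                + " xmlns:n=\"http://example.com/name\">"
                + "<e:employee xmlns:e=\"http://example.com/employee\">"
                + "<n:firstName>Joe</n:firstName>"
                + "</e:employee></employeeFile>";
        // Exact namespace match, a mismatched namespace, and the wildcard.
        System.out.println(countMatches(xml, "firstName", "http://example.com/name"));
        System.out.println(countMatches(xml, "firstName", "http://example.com/other"));
        System.out.println(countMatches(xml, "firstName", "*"));
    }
}
```

Note that the prefix itself (n or e) plays no part in matching; only the namespace URI bound to it does.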
When unmarshalling multiple records from an XML document, the stream configuration is mapped to the root element in the XML formatted stream. This default behavior has been demonstrated in previous examples. If, on the other hand, an XML document contains only a single record, the document can be fully read or written by setting the stream configuration's xmlType attribute to none. This behavior is similar to other OXM libraries that marshall or unmarshall one bean object per XML document.
For example, if BeanIO was used to unmarshall a single employee record submitted via a web service, the XML document might look like the following. Notice there is no 'employeeFile' root element for containing multiple employee records.
<employee>
  <firstName>Joe</firstName>
  <lastName>Smith</lastName>
  <title>Developer</title>
  <salary>75000</salary>
  <hireDate>2009-10-12</hireDate>
</employee>
In this example, setting xmlType to none on the stream, as in the following mapping file, allows BeanIO to unmarshall and marshall a single employee record.
<beanio>
<stream name="employeeFile" format="xml" xmlType="none">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
</record>
</stream>
</beanio>
Like other mapping elements, groups are mapped to XML elements by default. Alternatively, if a group is used only for control purposes, the group's xmlType attribute can be set to none.
A record is always mapped to an XML element. As we've seen before, records are matched based on their group context and configured record identifying fields. XML records are further matched using their XML element name, as defined by xmlName, or if not present, name. Other than record identifying fields, fields and beans contained within the record are not used for matching.
For example, let's suppose our employee file differentiated managers using 'manager' tags.
<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
  </employee>
  <employee>
    <firstName>Jane</firstName>
    <lastName>Doe</lastName>
    <title>Architect</title>
    <salary>80000</salary>
    <hireDate>2008-01-15</hireDate>
  </employee>
  <manager>
    <firstName>Jon</firstName>
    <lastName>Andersen</lastName>
    <title>Manager</title>
    <salary>85000</salary>
    <hireDate>2007-03-18</hireDate>
  </manager>
</employeeFile>
To map managers to a new Manager bean we could use the following mapping configuration.
<beanio>
<stream name="employeeFile" format="xml">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
</record>
<record name="manager" class="org.beanio.example.Manager">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
</record>
</stream>
</beanio>
Remember that BeanIO will enforce record order in this example unless ordered is set to false on the stream, or unless each record is assigned the same order number.
A field is mapped to XML using the field definition's xmlType attribute, which defaults to element. The field XML type can be set to element, attribute, text, or none. The following table illustrates possible configurations, except for none which is not covered here.
Record Definition | Sample Record |
---|---|
<record name="person" class="map"> <field name="name" xmlType="element"/> </record> | <person> <name>John</name> </person> |
<record name="person" class="map"> <field name="name" xmlType="attribute"/> </record> | <person name="John"/> |
<record name="person" class="map"> <field name="name" xmlType="text"/> </record> | <person>John</person> |
Field type conversion works the same way for XML formatted streams as it does for other formats. However, several default type handlers are overridden specifically for XML formatted streams to conform with the W3C XML Schema built-in data types. The following table summarizes the overridden type handlers:
Class or Type Alias | XML Schema Data Type | Example |
---|---|---|
date | date | 2011-01-01 |
datetime or java.util.Date | dateTime | 2011-01-01T15:14:13 |
time | time | 15:14:13 |
boolean | boolean | true |
Like other type handlers, XML specific type handlers can be customized or completely replaced. Please consult BeanIO javadocs for customization details.
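To make the lexical forms in the table above concrete, the following sketch parses and re-formats the three example values with java.text.SimpleDateFormat. It is only an approximation under stated assumptions: the patterns are chosen here to match the simple examples in the table (no time zone suffix or fractional seconds, both of which XML Schema permits), and the class XmlSchemaDateSketch is hypothetical rather than one of BeanIO's type handlers.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration only; BeanIO's own XML type handlers may differ.
final class XmlSchemaDateSketch {

    // Assumed patterns covering only the simple forms shown in the table.
    static final String DATE_PATTERN = "yyyy-MM-dd";
    static final String DATETIME_PATTERN = "yyyy-MM-dd'T'HH:mm:ss";
    static final String TIME_PATTERN = "HH:mm:ss";

    /** Parses the text strictly with the pattern, then formats it back. */
    static String roundTrip(String pattern, String text) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setLenient(false); // reject values such as month 13
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            Date parsed = format.parse(text);
            return format.format(parsed);
        } catch (ParseException e) {
            throw new IllegalArgumentException(
                    "'" + text + "' does not match pattern " + pattern, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(DATE_PATTERN, "2011-01-01"));
        System.out.println(roundTrip(DATETIME_PATTERN, "2011-01-01T15:14:13"));
        System.out.println(roundTrip(TIME_PATTERN, "15:14:13"));
    }
}
```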
The nillable and minOccurs field attributes control how a null field value is marshalled. If minOccurs is 0, an element or attribute is not marshalled for the field. If an element type field has nillable set to true and minOccurs set to 1, the W3C XML Schema Instance attribute nil is set to true.
This behavior is illustrated in the following table.
Record Definition | Marshalled Record (Field Value is Null) |
---|---|
<record name="person" class="map"> <field name="name" xmlType="element" /> </record> | <person/> |
<record name="person" class="map"> <field name="name" xmlType="element" minOccurs="1" /> </record> | <person> <name/> </person> |
<record name="person" class="map"> <field name="name" xmlType="element" minOccurs="1" nillable="true"/> </record> | <person> <name xsi:nil="true"/> </person> |
<record name="person" class="map"> <field name="name" xmlType="attribute"/> </record> | <person/> |
<record name="person" class="map"> <field name="name" xmlType="attribute" minOccurs="1"/> </record> | <person name=""/> |
<record name="person" class="map"> <field name="name" xmlType="text"/> </record> | <person/> |
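The element rows of the table above amount to a small decision rule, which the following sketch models as plain string building. It is a toy model for illustration, not BeanIO code: the class NullFieldMarshallingSketch and its method are hypothetical, values are not XML-escaped, and the xsi namespace declaration is omitted just as it is in the table.

```java
// Hypothetical illustration of the null-handling rule for an element field.
final class NullFieldMarshallingSketch {

    /**
     * Builds the XML for a person record with a single "name" element field.
     *
     * @param name      the field value, possibly null
     * @param minOccurs 0 to omit the element for a null value, 1 to keep it
     * @param nillable  whether a kept null element carries xsi:nil="true"
     */
    static String marshalPerson(String name, int minOccurs, boolean nillable) {
        if (name != null) {
            return "<person><name>" + name + "</name></person>";
        }
        if (minOccurs == 0) {
            // Null value and the element is optional: nothing is written for it.
            return "<person/>";
        }
        if (nillable) {
            // Null value, element required, nil allowed.
            return "<person><name xsi:nil=\"true\"/></person>";
        }
        // Null value, element required, nil not allowed: an empty element.
        return "<person><name/></person>";
    }

    public static void main(String[] args) {
        System.out.println(marshalPerson("John", 0, false));
        System.out.println(marshalPerson(null, 0, false));
        System.out.println(marshalPerson(null, 1, false));
        System.out.println(marshalPerson(null, 1, true));
    }
}
```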
Nested bean definitions can be mapped to XML elements or used to group fields. This difference can be explored using the Address and Employee beans defined in Section 4.4 and repeated here.
package org.beanio.example;
public class Address {
String street;
String city;
String state;
String zip;
// getters and setters not shown...
}
package org.beanio.example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
Address mailingAddress;
// getters and setters not shown...
}
By default, a bean definition's xmlType is set to element, so it is not necessary to declare it in the mapping file below.
<beanio>
<stream name="employeeFile" format="xml">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
<bean name="mailingAddress" class="org.beanio.example.Address" xmlType="element">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</bean>
</record>
</stream>
</beanio>
This mapping configuration can be used to process the sample XML document below. When a bean is mapped to an XML element, nillable and minOccurs will control the marshalling behavior of null bean objects in the same fashion as a field (see Section 5.7.2).
<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
    <mailingAddress>
      <street>123 Main Street</street>
      <city>Chicago</city>
      <state>IL</state>
      <zip>12345</zip>
    </mailingAddress>
  </employee>
  . . .
</employeeFile>
Alternatively, if the bean definition's xmlType is set to none, the following XML document can be processed.
<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
    <street>123 Main Street</street>
    <city>Chicago</city>
    <state>IL</state>
    <zip>12345</zip>
  </employee>
  . . .
</employeeFile>
In some cases, an XML document may contain elements that do not map directly to a bean or field. In these situations, the xmlWrapper attribute may help. The xmlWrapper attribute is used to configure the local name of an XML element that wraps a record, bean or field. The wrapper element uses the same namespace as the element (record, bean or field) for which it was configured.
Extending the previous example, let's suppose the Employee bean object is modified to hold a list of addresses.
package org.beanio.example;
import java.util.Date;
import java.util.List;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
List<Address> addressList;
// getters and setters not shown...
}
And let's further suppose that each employee's list of addresses is enclosed in a new element called addresses.
<?xml version="1.0"?>
<employeeFile>
  <employee>
    <firstName>Joe</firstName>
    <lastName>Smith</lastName>
    <title>Developer</title>
    <salary>75000</salary>
    <hireDate>2009-10-12</hireDate>
    <addresses>
      <mailingAddress>
        <street>123 Main Street</street>
        <city>Chicago</city>
        <state>IL</state>
        <zip>12345</zip>
      </mailingAddress>
    </addresses>
  </employee>
  . . .
</employeeFile>
The mapping file can now be updated as follows:
<beanio>
<stream name="employeeFile" format="xml">
<record name="employee" class="org.beanio.example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
<bean name="mailingAddress" class="org.beanio.example.Address" collection="list"
minOccurs="0" maxOccurs="unbounded" xmlWrapper="addresses">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</bean>
</record>
</stream>
</beanio>
The following table illustrates different xmlWrapper effects based on the xmlType of a field, and the effect of minOccurs and nillable when marshalling null field values.
Record Definition | Non-Null Field Value | Null Field Value |
---|---|---|
<field name="field" xmlType="element" xmlWrapper="wrapper" minOccurs="0" /> | <wrapper> <field>value</field> </wrapper> | - |
<field name="field" xmlType="element" xmlWrapper="wrapper" minOccurs="1" nillable="true" /> | <wrapper> <field>value</field> </wrapper> | <wrapper> <field xsi:nil="true"/> </wrapper> |
<field name="field" xmlType="element" xmlWrapper="wrapper" minOccurs="1" /> | <wrapper> <field>value</field> </wrapper> | <wrapper> <field/> </wrapper> |
<field name="field" xmlType="attribute" xmlWrapper="wrapper" minOccurs="0" /> | <wrapper field="value"/> | - |
<field name="field" xmlType="attribute" xmlWrapper="wrapper" minOccurs="1" /> | <wrapper field="value"/> | <wrapper field=""/> |
<field name="field" xmlType="text" xmlWrapper="wrapper" minOccurs="1" /> | <wrapper>value</wrapper> | <wrapper /> |
<field name="field" xmlType="text" xmlWrapper="wrapper" minOccurs="1" nillable="true" /> | <wrapper>value</wrapper> | <wrapper xsi:nil="true"/> |
<field name="field" xmlType="text" xmlWrapper="wrapper" minOccurs="0" /> | <wrapper>value</wrapper> | - |
Wrapped collection fields (and beans) behave a little differently depending on whether the collection is null or empty.
Record Definition | Collection | Null Collection | Empty Collection |
---|---|---|---|
<field name="field" collection="list" minOccurs="0" xmlType="element" xmlWrapper="wrapper" /> | <wrapper> <field>value1</field> <field>value2</field> </wrapper> | - | <wrapper /> |
<field name="field" collection="list" minOccurs="0" xmlType="element" xmlWrapper="wrapper" nillable="true" /> | <wrapper> <field>value1</field> <field>value2</field> </wrapper> | - | <wrapper xsi:nil="true"/> |
<field name="field" collection="list" minOccurs="1" xmlType="element" xmlWrapper="wrapper" nillable="false" /> | <wrapper> <field>value1</field> <field>value2</field> </wrapper> | <wrapper> <field/> </wrapper> | <wrapper> <field/> </wrapper> |
As of release 1.2, BeanIO can be used to read and write flat files with Spring Batch (2.1.x), a batch processing framework by SpringSource.
The class org.beanio.spring.BeanIOFlatFileItemReader implements Spring Batch's ItemReader interface and can be used to read flat files using a BeanIO stream mapping file. The following Spring bean definition shows a BeanIO item reader configuration that loads a BeanIO mapping file called 'mapping.xml' from the classpath to read a file called 'in.txt'. The location of the mapping file is set using the streamMapping property, and the name of the stream layout is specified using the streamName property.
<bean id="itemReader" class="org.beanio.spring.BeanIOFlatFileItemReader">
  <property name="streamMapping" value="classpath:/mapping.xml" />
  <property name="streamName" value="stream" />
  <property name="resource" value="file:in.txt" />
</bean>
Similarly, the class org.beanio.spring.BeanIOFlatFileItemWriter implements Spring Batch's ItemWriter interface and can be used to write flat files using a BeanIO stream mapping file. The following Spring bean definition shows a BeanIO item writer configuration that loads a BeanIO mapping file called 'mapping.xml' from the classpath to write a file called 'out.txt'.
<bean id="itemWriter" class="org.beanio.spring.BeanIOFlatFileItemWriter">
  <property name="streamMapping" value="classpath:/mapping.xml" />
  <property name="streamName" value="stream" />
  <property name="resource" value="file:out.txt" />
</bean>
BeanIO item readers and writers are restartable, and support many of the same properties supported by the flat file item reader and writer included with Spring Batch. Please refer to their API documentation for details.
By default, a BeanIO item reader/writer creates its own stream factory, but in cases where this could cause one or more mapping files to be loaded multiple times, it may be preferable to create a shared stream factory instance. BeanIO's org.beanio.spring.BeanIOStreamFactory class can be used to create a shared stream factory that can be injected into BeanIO item readers and writers. The following Spring beans configuration file illustrates this:
<beans>
  <bean id="streamFactory" class="org.beanio.spring.BeanIOStreamFactory">
    <property name="streamMappings">
      <list>
        <value>classpath:/mapping1.xml</value>
        <value>file:/mapping2.xml</value>
      </list>
    </property>
  </bean>
  <bean id="itemReader" class="org.beanio.spring.BeanIOFlatFileItemReader">
    <property name="streamFactory" ref="streamFactory" />
    <property name="streamName" value="stream" />
    <property name="resource" value="file:in.txt" />
  </bean>
</beans>
In some cases, BeanIO behavior can be controlled by setting optional property values. Properties can be set using System properties or a property file. BeanIO will load configuration settings in the following order of priority:
The name and location of beanio.properties can be overridden using the System property org.beanio.configuration. In the following example, configuration settings will be loaded from the file named config/settings.properties, first relative to the application's working directory, and if not found, then from the root of the application's classpath.
java -Dorg.beanio.configuration=config/settings.properties org.beanio.example.Main
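The lookup described above (working directory first, then the classpath root) can be sketched with standard JDK calls. This is an assumption-laden illustration rather than BeanIO's actual loader: the class ConfigLocationSketch and its methods are hypothetical, and returning empty settings when nothing is found is a choice made for this sketch only.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical illustration only; not BeanIO's configuration loader.
final class ConfigLocationSketch {

    static final String LOCATION_PROPERTY = "org.beanio.configuration";
    static final String DEFAULT_LOCATION = "beanio.properties";

    /** Returns the overridden location if the System property is set. */
    static String configuredLocation() {
        return System.getProperty(LOCATION_PROPERTY, DEFAULT_LOCATION);
    }

    /**
     * Loads settings from the working directory if the file exists there,
     * otherwise from the root of the classpath. Returns empty settings when
     * neither exists (an assumption of this sketch).
     */
    static Properties load(String location) {
        Properties settings = new Properties();
        try {
            File file = new File(location);
            InputStream in = file.isFile()
                    ? new FileInputStream(file)
                    : ConfigLocationSketch.class.getClassLoader().getResourceAsStream(location);
            if (in == null) {
                return settings;
            }
            try {
                settings.load(in);
            } finally {
                in.close();
            }
        } catch (IOException e) {
            throw new IllegalStateException("Unable to read " + location, e);
        }
        return settings;
    }

    public static void main(String[] args) {
        String location = configuredLocation();
        System.out.println("Loading settings from: " + location);
        System.out.println(load(location));
    }
}
```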
The following configuration settings are supported by BeanIO:
Property | Description | Default |
---|---|---|
org.beanio.propertyEscapingEnabled | Whether property values (for typeHandler, reader and writer elements) support escape patterns for line feeds, carriage returns, tabs, etc. Set to true or false. | true |
org.beanio.marshalDefaultEnabled | Whether a configured field default is marshalled for null property values. May be disabled for backwards compatibility by setting the value to false. | true |
org.beanio.defaultDateFormat | Sets the default SimpleDateFormat pattern for date type fields in CSV, delimited and fixed length file formats. | DateFormat.getDateInstance() |
org.beanio.defaultDateTimeFormat | Sets the default SimpleDateFormat pattern for datetime type fields in CSV, delimited and fixed length file formats. | DateFormat.getDateTimeInstance() |
org.beanio.defaultTimeFormat | Sets the default SimpleDateFormat pattern for time type fields in CSV, delimited and fixed length file formats. | DateFormat.getTimeInstance() |
org.beanio.xml.defaultXmlType | Sets the default XML type for a field in an XML formatted stream. May be set to element or attribute. | element |
org.beanio.xml.xsiNamespacePrefix | Sets the default prefix for the namespace http://www.w3.org/2001/XMLSchema-instance. | xsi |
Appendix A is the complete reference for the BeanIO mapping file schema. The root element of a mapping file is beanio with namespace http://www.beanio.org/2011/01. The following notation is used to indicate the allowed number of child elements:
The beanio element is the root element for a BeanIO mapping configuration file.
Children: import*, typeHandler*, template*, stream*
The import element is used to import type handlers, templates and stream definitions from an external mapping file. Stream definitions declared in a mapping file being imported are not affected by global type handlers or templates declared in the file that imported it.
Attributes:
Attribute | Description | Required |
---|---|---|
resource | The name of the resource to import. The resource name must be qualified with 'classpath:' to load the resource from the classpath, or with 'file:' to load the file relative to the application's working directory. | Yes |
A typeHandler element is used to declare a custom field type handler that implements the org.beanio.types.TypeHandler interface. A type handler can be registered for a specific Java type, or registered for a Java type and stream format combination, or explicitly named.
Attributes:
Attribute | Description | Required |
---|---|---|
name | The type handler name. A field can always reference a type handler by name, even if the stream format does not match the configured type handler format attribute. When configured, the name of a globally declared type handler must be unique within a mapping and any imported mapping files. | One of name or type is required. |
type | The fully qualified classname or type alias to register the type handler for. If format is also set, the type handler will only be used by streams that match the configured format. | One of name or type is required. |
class | The fully qualified classname of the TypeHandler implementation. | Yes |
format | When used in conjunction with the type attribute, a type handler can be registered for a specific stream format. Set to xml, csv, delimited, or fixedlength. If not set, the type handler may be used by any stream format. | No |
Children: property*
A property element is used to customize other elements, such as a typeHandler, reader, or writer.
Attribute | Description | Required |
---|---|---|
name | The property name. | Yes |
value | The property value. When used to customize a typeHandler, reader, or writer, only default type handlers are used to convert property text to an object value. String and Character type property values can use the following escape sequences: \\ (Backslash), \n (Line Feed), \r (Carriage Return), \t (Tab), and \f (Form Feed). A backslash preceding any other character is ignored. | Yes |
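The escape sequences listed for property values can be sketched as a small unescape routine. This is an illustration of the stated rules only, not BeanIO's implementation: the class PropertyEscapeSketch is hypothetical, and dropping a lone trailing backslash is an assumption, since the text above does not cover that case.

```java
// Hypothetical illustration of the documented escape rules.
final class PropertyEscapeSketch {

    /** Replaces \\, \n, \r, \t and \f; a backslash before any other character is dropped. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        boolean escaped = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!escaped) {
                if (c == '\\') {
                    escaped = true; // remember the backslash, emit nothing yet
                } else {
                    out.append(c);
                }
                continue;
            }
            escaped = false;
            switch (c) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                case 'f': out.append('\f'); break;
                // Covers "\\" (a literal backslash) and any other character,
                // for which the preceding backslash is simply ignored.
                default: out.append(c);
            }
        }
        // Assumption: a trailing backslash with nothing after it is dropped.
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal "a\\tb" is the four characters a, \, t, b.
        System.out.println(unescape("a\\tb").equals("a\tb"));
        System.out.println(unescape("\\x"));
    }
}
```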
A property element, when used as a child of a record or bean element, can be used to set constant values on a record or bean object that do not map to a field in the input or output stream. The following additional attributes are accepted in this scenario:
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
getter | The getter method used to retrieve the property value from its parent bean class. By default, the getter method is determined through introspection using the property name. | No | * |
setter | The setter method used to set the property value on its parent bean class. By default, the setter method is determined through introspection using the property name. | No | * |
rid | Record identifier indicator for marshalling/writing only. Set to true if this property is used to identify the record mapping configuration used to marshall a bean object. More than one property or field can be used for identification. Defaults to false. | No | * |
type | The fully qualified class name or type alias of the property value. By default, BeanIO will derive the property type from the bean class. This attribute can be used to override the default or may be required if the bean class is of type Map. | No | * |
typeHandler | The name of the type handler to use for type conversion. By default, BeanIO will select a type handler based on type when set, or through introspection of the property's parent bean class. | No | * |
format | The decimal format pattern for Number type properties, or the simple date format pattern for Date type properties. The format value can be accessed by any custom type handler that implements ConfigurableTypeHandler. | No | * |
The template element is used to create reusable lists of bean properties.
Attributes:
Attribute | Description | Required |
---|---|---|
name | The name of the template. Template names must be unique within a mapping file and any imported mapping files. | Yes |
Children: ( field | property | bean | include )*
The include element is used to include a template in a record, bean, or another template.
Attributes:
Attribute | Description | Required |
---|---|---|
template | The name of the template to include. | Yes |
offset | The offset added to field positions included by the template. Defaults to 0. | No |
A stream element defines the record layout of an input or output stream.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the stream. | Yes | * |
format | The stream format. Either xml, csv, delimited or fixedlength. | Yes | * |
mode | By default, a stream mapping can be used for both reading input streams and writing output streams, called readwrite mode. Setting mode to read or write instead restricts usage to a BeanReader or a BeanWriter respectively, but relaxes some validations on the mapping configuration. When mode is set to read, a bean class does not require getter methods. When mode is set to write, a bean class may be abstract or an interface, and does not require setter methods. | No | * |
resourceBundle | The name of the resource bundle for customizing error messages. | No | * |
ordered | When set to false, records may appear in any order, and specifying a record order will cause a configuration error to be thrown. Defaults to true. | No | * |
minOccurs | The minimum number of times the record layout must be read from an input stream. Defaults to 0. | No | * |
maxOccurs | The maximum number of times the record layout can repeat when read from an input stream. Defaults to 1. | No | * |
xmlType | The XML node type mapped to the stream. If not specified or set to element, the stream is mapped to the root element of the XML document being marshalled or unmarshalled. If set to none, the XML input stream will be fully read and mapped to a group or record. | No | xml |
xmlName | The local name of the XML element mapped to the stream. Defaults to the stream name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to the stream. Defaults to '*' which will ignore namespaces while marshalling and unmarshalling. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
Children: reader?, writer?, typeHandler*, ( record | group )+
A reader element is used to customize or replace the default record reader factory for the stream.
Attributes:
Attribute | Description | Required |
---|---|---|
class | The fully qualified class name of the org.beanio.stream.RecordReaderFactory implementation to use for this stream. If not specified, one of the following default factories is used based on the stream format: csv - org.beanio.stream.csv.CsvReaderFactory; delimited - org.beanio.stream.delimited.DelimitedReaderFactory; fixedlength - org.beanio.stream.fixedlength.FixedLengthReaderFactory; xml - org.beanio.stream.xml.XmlReaderFactory. Overriding the record reader factory for XML is not supported. | No |
Children: property*
A writer element is used to customize or replace the default record writer factory for the stream.
Attributes:
Attribute | Description | Required |
---|---|---|
class | The fully qualified class name of the org.beanio.stream.RecordWriterFactory implementation to use for this stream. If not specified, one of the following default factories is used based on the stream format: csv - org.beanio.stream.csv.CsvWriterFactory; delimited - org.beanio.stream.delimited.DelimitedWriterFactory; fixedlength - org.beanio.stream.fixedlength.FixedLengthWriterFactory; xml - org.beanio.stream.xml.XmlWriterFactory. Overriding the record writer factory for XML is not supported. | No |
Children: property*
A group element is used to group records together for validating occurrences of the group as a whole.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the group. | Yes | * |
order | The order this group must appear within its parent group or stream. Unless the stream is unordered, order will default to the next sequential number following the order of the previous record/group at the same level. If this is the first record/group, order defaults to 1. | No | * |
minOccurs | The minimum number of occurrences of this group within its parent group or stream. Defaults to 1. | No | * |
maxOccurs | The maximum number of occurrences of this group within its parent group or stream. Defaults to unbounded. | No | * |
xmlType | The XML node type mapped to this group. If not specified or set to element, this group is mapped to an XML element. When set to none, this group is used only to define expected record sequencing. | No | xml |
xmlName | The local name of the XML element mapped to this group. Defaults to the group name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this group. Defaults to the namespace declared for the parent stream or group definition. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
Children: record*
A record is used to define a record mapping within a stream.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the record. | Yes | * |
order | The order this record must appear within its parent group or stream. Unless the stream is unordered, order will default to the next sequential number following the order of the previous record/group at the same level. If this is the first record/group, order defaults to 1. | No | * |
minOccurs | The minimum number of occurrences of this record within its parent group or stream. Defaults to 1. | No | * |
maxOccurs | The maximum number of occurrences of this record within its parent group or stream. Defaults to unbounded. | No | * |
minLength | If the stream format is delimited or csv, minLength is the minimum number of fields required by this record. Defaults to the number of fields defined for the record. If the stream format is fixedlength, minLength is the minimum number of characters required by this record. Defaults to the sum of all field lengths defined for the record. | No | csv, delimited, fixedlength |
maxLength | If the stream format is delimited or csv, maxLength is the maximum number of fields allowed by this record. Defaults to the number of fields defined for the record, or if no fields are declared, then unbounded. If the stream format is fixedlength, maxLength is the maximum number of characters allowed by this record. Defaults to the sum of all field lengths defined for the record, or if no fields are declared, then unbounded. | No | csv, delimited, fixedlength |
class | The fully qualified class name of the bean object mapped to this record. If not set, a BeanReader will fully validate the record, but no bean object will be returned and the reader will continue reading the next record. If set to map or any java.util.Map implementation, a Map object will be used with field names for keys and field values for values. | No | * |
template | The name of the template to include. Children of this record are added after all properties included from the template. | No | * |
xmlName | The local name of the XML element mapped to this record. Defaults to the record name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this record. Defaults to the namespace declared for this record's parent group or stream. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
Children: ( field | property | bean | include )*
A bean element is used to map fields and other bean objects to a parent bean or record.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the bean. Unless a getter and/or setter is defined, the bean name is used for getting and setting the bean from its parent bean object. | Yes | * |
getter | The getter method used to retrieve the bean property value from the bean class of its parent. By default, the getter method is determined through introspection using the bean name. | No | * |
setter | The setter method used to set the bean property value on the bean class of its parent. By default, the setter method is determined through introspection using the bean name. | No | * |
class | The fully qualified class name of the object mapped to this bean. If set to map or any java.util.Map implementation, a Map object will be used with field/bean names for keys and field/bean values for values. | Yes | * |
template | The name of the template to include. Children of this bean are added after all properties included from the template. | No | * |
collection | If the parent bean property type of this bean is a collection, collection is the fully qualified collection class name or collection type alias of its parent bean property type, and type becomes the property type of the collection item. May be set to array if the collection type is a Java array. BeanIO will not derive the collection type from its parent bean object, thus collection type beans must always be explicitly declared. There are a few restrictions specific to beans in any "flat" format (delimited, CSV or fixedlength): | No | * |
minOccurs | The minimum consecutive occurrences of this bean. For CSV, delimited and fixed length streams, minOccurs defaults to 1, and should only be overridden for collection type beans. For XML streams, minOccurs defaults to 1 if nillable is true, or 0 otherwise. minOccurs controls whether an element is marshalled for a null bean object or required during unmarshalling. During unmarshalling, if the configured minimum occurrences is not met, an InvalidRecordException is thrown. | No | * |
maxOccurs | The maximum consecutive occurrences of this bean. By default, maxOccurs is set to minOccurs or 1, whichever is greater. If overridden for a CSV, delimited or fixed length stream, the value can only exceed minOccurs if the bean appears at the end of a record. If there is no limit to the number of occurrences, the value may be set to unbounded. Maximum occurrences is not used for validation. When bounded, the size of a bean collection will not exceed the configured value, and additional occurrences are ignored. | No | * |
xmlType | The XML node type mapped to this bean. If not specified or set to element, this bean is mapped to an XML element. If set to none, fields and nested beans belonging to this bean are expected to be contained by this bean's parent record or bean. | No | xml |
xmlName | The local name of the XML element mapped to this bean. Defaults to the bean name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this bean. Defaults to the namespace declared for the parent record or bean definition. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
xmlWrapper | The local name of the XML element that wraps this bean element. The XML wrapper element uses the same namespace as the element it wraps. If a bean object is null when marshalled, minOccurs also controls whether the wrapper element is marshalled. By default, elements are not wrapped. | No | xml |
nillable | Set to true if the W3C Schema Instance attribute nil should be set to true when the marshalled bean object is null. Defaults to false. Nillable collection type beans that use a wrapper element will have the nil attribute set on the wrapping element when the bean collection value is null or empty. | No | xml |
Children: ( field | property | bean | include )*
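The occurrence rules described for minOccurs and maxOccurs can be sketched in plain Java. This is only an illustration under stated assumptions, not BeanIO's implementation: the class `OccurrenceSketch`, its methods, and the use of `IllegalStateException` as a stand-in for `InvalidRecordException` are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BeanIO.
class OccurrenceSketch {

    // Stand-in for the "unbounded" setting of maxOccurs.
    static final int UNBOUNDED = Integer.MAX_VALUE;

    // By default, maxOccurs is minOccurs or 1, whichever is greater.
    static int defaultMaxOccurs(int minOccurs) {
        return Math.max(minOccurs, 1);
    }

    // Collects the occurrences read from a record into a bean collection.
    static <T> List<T> collect(List<T> occurrences, int minOccurs, int maxOccurs) {
        if (occurrences.size() < minOccurs) {
            // BeanIO reports this case with an InvalidRecordException;
            // IllegalStateException is used here only as a stand-in.
            throw new IllegalStateException("expected at least " + minOccurs
                    + " occurrences, found " + occurrences.size());
        }
        // maxOccurs is not validated: extra occurrences are simply ignored.
        int size = Math.min(occurrences.size(), maxOccurs);
        return new ArrayList<T>(occurrences.subList(0, size));
    }

    public static void main(String[] args) {
        List<String> read = Arrays.asList("a", "b", "c");
        System.out.println(collect(read, 1, 2));         // the third occurrence is ignored
        System.out.println(collect(read, 0, UNBOUNDED)); // all occurrences are kept
        System.out.println(defaultMaxOccurs(0));         // 1
    }
}
```

The sketch treats the two limits asymmetrically on purpose: only the minimum can fail a record, while the maximum merely caps the size of the collection.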
A field element is used to define and map a field from a record to a bean property and vice versa.
Attributes:
Attribute | Description | Required | Formats |
---|---|---|---|
name | The name of the field. Unless a getter and/or setter is defined, the field name is used for the bean property name. | Yes | * |
getter | The getter method used to retrieve the field property value from the bean class of the record. By default, the getter method is determined through introspection using the field name. | No | * |
setter | The setter method used to set the field property value on the bean class of the record. By default, the setter method is determined through introspection using the field name. | No | * |
rid | Record identifier indicator. Set to true if this field is used to identify the record. More than one field can be used to identify a record. Defaults to false. | No | * |
position | For delimited and CSV formatted streams, position is the index of the field within the record, beginning at 0. For fixed length formatted streams, position is the index of the first character of the field within the record, beginning at 0. If the field is a collection, or the field is a property of a bean that is a collection, position should be set based on the first occurrence of the field in a record. A position must be specified for all fields in a record, or for none at all. If positions are not specified, BeanIO will automatically calculate field positions based on the order in which the fields are defined in the mapping file. | No | csv, delimited, fixedlength |
trim | Set to true to trim the field text before validation and type conversion. Defaults to false. | No | * |
required | Set to true if this field is required. If this field is required, and its field text is empty or the field is not present in the record, a BeanReader will throw an exception when reading the record. Defaults to false. | No | * |
minLength | The minimum length of the field text before type conversion. | No | * |
maxLength | The maximum length of the field text before type conversion. | No | * |
regex | The regular expression pattern the field text must match. | No | * |
literal | Sets the literal or constant value of this field. When reading an input stream, a BeanReader will throw an exception if the field text does not match the literal value. | No | * |
default | The default value of this field. When unmarshalling a stream, this value is set on the bean object when the field text is null or the empty string. When marshalling, the default value is used when the property value is null or ignore is set to true (unless disabled). The default value is converted to a Java object using the same type handler configured for this field. | No | * |
type | The fully qualified class name or type alias of the field value. By default, BeanIO will derive the field type from the bean class of the record. This attribute can be used to override the default or may be required if the bean class of the record is a Map. | No | * |
collection | If the bean property type of this field is a collection, collection is the fully qualified collection class name or collection type alias of the bean property type, and type becomes the property type of the collection item. May be set to array if the collection type is a Java array. BeanIO will not derive the collection type from the record bean object, thus collection type fields must always be explicitly declared. | No | * |
minOccurs | The minimum consecutive occurrences of this field in a record. For CSV, delimited and fixed length streams, minOccurs defaults to 1, and should only be overridden for collection type fields. For XML streams, minOccurs defaults to 1 if nillable is true, or 0 otherwise. minOccurs controls whether an element is marshalled for a null field object or required during unmarshalling. During unmarshalling, if the configured minimum occurrences is not met, an InvalidRecordException is thrown. | No | * |
maxOccurs | The maximum consecutive occurrences of this field in a record. By default, maxOccurs is set to minOccurs or 1, whichever is greater. If overridden for a non-XML stream format, the value can only exceed minOccurs if this is the last field in the record. The value may be set to unbounded if there is no limit to the number of occurrences of this field. Maximum occurrences is not used for validation. When bounded, the size of a collection will not exceed the configured value, and additional occurrences are ignored. | No | * |
format | The decimal format pattern for Number field properties, or the simple date format pattern for Date field properties. The format value can be accessed by any custom type handler that implements ConfigurableTypeHandler. | No | * |
typeHandler | The name of the type handler to use for type conversion. By default, BeanIO will select a type handler based on the field type when set, or through introspection of this field's parent record or bean class. | No | * |
ignore | Set to true if this field is not a property of the record bean class. Defaults to false. | No | * |
length | The padded length of this field measured in characters. Length is required for fixed length formatted streams, and can be set for fields in other stream formats (along with a padding character) to enable field padding. | Yes¹ | * |
padding | The character used to pad this field. For fixed length formatted streams, padding defaults to a space. For non-fixed length formatted streams, padding is disabled unless a padding character and length are specified. If padding is enabled, the required field attribute has some control over the marshalling and unmarshalling of null values. When unmarshalling a field consisting of all spaces in a fixed length stream, if required is false, the field is accepted regardless of the padding character. If required is true, a required field validation error is triggered. When marshalling a null field value, if required is false, the field text is formatted as spaces regardless of the configured padding character. In other stream formats that are not fixed length, null field values are unmarshalled and marshalled as empty strings when required is false. When required is true, unmarshalling an empty string will trigger a required field validation error, and marshalling a null value will fill the field text with the padding character up to the padded length of the field. | No | * |
justify | The justification of the field text within this padded field. Either left or right. Defaults to left. | No | * |
xmlType | The XML node type mapped to this field. The type can be set to element (default) to map this field to an XML element, attribute to map to an XML attribute, or text to map the field value to the enclosed text of the parent record or bean element. May also be set to none if a default value is provided. When set to text, xmlName and xmlNamespace have no effect. | No | xml |
xmlName | The local name of the XML element or attribute mapped to this field. Defaults to the field name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this field. Defaults to the namespace configured for the parent record or bean definition. | No | xml |
xmlPrefix | The namespace prefix assigned to the configured xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
xmlWrapper | The local name of the XML element that wraps this field. The XML wrapper element uses the same namespace configured for the field it wraps. If a field value is null when marshalled, minOccurs also controls whether the wrapper element is marshalled. By default, fields are not wrapped. | No | xml |
nillable | Set to true if the W3C Schema Instance attribute nil should be set to true when the marshalled field value is null. Defaults to false. Nillable collection type fields that use a wrapper element will have the nil attribute set on the wrapping element when the field collection value is null or empty. | No | xml |
¹ Only required for fixed length fields. If a literal value is supplied for a fixed length field, length will default to the length of the literal value.
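The interaction between padding, justify and required in a fixed length stream, as described in the table above, can be sketched as follows. This is an illustration only and not BeanIO code: the class `FixedLengthPaddingSketch` and its methods are hypothetical, `IllegalArgumentException` stands in for a required field validation error, and two behaviors are assumptions marked in the comments.

```java
import java.util.Arrays;

// Hypothetical helper, not part of BeanIO.
class FixedLengthPaddingSketch {

    // Builds a string of 'count' copies of 'c' (empty when count <= 0).
    static String repeat(char c, int count) {
        char[] chars = new char[Math.max(count, 0)];
        Arrays.fill(chars, c);
        return new String(chars);
    }

    // Formats a property value as fixed length field text.
    static String marshal(String value, int length, char padding,
                          boolean leftJustified, boolean required) {
        if (value == null) {
            // Not required: spaces, regardless of the padding character.
            // Required: filled with the padding character (assumption: the
            // table states this only for formats that are not fixed length).
            return repeat(required ? padding : ' ', length);
        }
        // Values longer than 'length' are returned unchanged in this sketch.
        String fill = repeat(padding, length - value.length());
        return leftJustified ? value + fill : fill + value;
    }

    // Parses fixed length field text back into a value.
    static String unmarshal(String text, char padding,
                            boolean leftJustified, boolean required) {
        if (text.equals(repeat(' ', text.length()))) {
            if (required) {
                // Stand-in for a required field validation error.
                throw new IllegalArgumentException("required field is blank");
            }
            return null; // assumption: an accepted blank field yields null
        }
        int start = 0;
        int end = text.length();
        if (leftJustified) {
            while (end > start && text.charAt(end - 1) == padding) end--;
        } else {
            while (start < end && text.charAt(start) == padding) start++;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + marshal("42", 5, '0', false, false) + "]");    // [00042]
        System.out.println("[" + marshal(null, 5, '0', false, false) + "]");    // [     ]
        System.out.println("[" + unmarshal("00042", '0', false, false) + "]");  // [42]
    }
}
```

Note that a blank field is checked before any padding is stripped, which is why a field of all spaces is accepted even when the padding character is something else, such as `0`.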
The following table shows the message parameters used to format an error message for each configurable validation rule.
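Type | Rule Name | Index | Value |
---|---|---|---|
Record Error | malformed | 0 | Line Number |
Record Error | unidentified | 0 | Line Number |
Record Error | unexpected | 0 | Line Number |
Record Error | unexpected | 1 | Record Label/Name |
Record Error | minLength | 0 | Line Number |
Record Error | minLength | 1 | Record Label/Name |
Record Error | minLength | 2 | Minimum Length |
Record Error | minLength | 3 | Maximum Length |
Record Error | maxLength | 0 | Line Number |
Record Error | maxLength | 1 | Record Label/Name |
Record Error | maxLength | 2 | Minimum Length |
Record Error | maxLength | 3 | Maximum Length |
Field Error | required | 0 | Line Number |
Field Error | required | 1 | Record Label/Name |
Field Error | required | 2 | Field Label/Name |
Field Error | required | 3 | Field Text |
Field Error | nillable | 0 | Line Number |
Field Error | nillable | 1 | Record Label/Name |
Field Error | nillable | 2 | Field Label/Name |
Field Error | nillable | 3 | Field Text |
Field Error | minLength | 0 | Line Number |
Field Error | minLength | 1 | Record Label/Name |
Field Error | minLength | 2 | Field Label/Name |
Field Error | minLength | 3 | Field Text |
Field Error | minLength | 4 | Minimum Length |
Field Error | minLength | 5 | Maximum Length |
Field Error | maxLength | 0 | Line Number |
Field Error | maxLength | 1 | Record Label/Name |
Field Error | maxLength | 2 | Field Label/Name |
Field Error | maxLength | 3 | Field Text |
Field Error | maxLength | 4 | Minimum Length |
Field Error | maxLength | 5 | Maximum Length |
Field Error | length | 0 | Line Number |
Field Error | length | 1 | Record Label/Name |
Field Error | length | 2 | Field Label/Name |
Field Error | length | 3 | Field Text |
Field Error | length | 4 | Fixed Length Field Length |
Field Error | regex | 0 | Line Number |
Field Error | regex | 1 | Record Label/Name |
Field Error | regex | 2 | Field Label/Name |
Field Error | regex | 3 | Field Text |
Field Error | regex | 4 | Regular Expression Pattern |
Field Error | type | 0 | Line Number |
Field Error | type | 1 | Record Label/Name |
Field Error | type | 2 | Field Label/Name |
Field Error | type | 3 | Field Text |
Field Error | type | 4 | TypeConversionException error message |
Field Error | literal | 0 | Line Number |
Field Error | literal | 1 | Record Label/Name |
Field Error | literal | 2 | Field Label/Name |
Field Error | literal | 3 | Field Text |
Field Error | literal | 4 | Literal value |
Field Error | minOccurs | 0 | Line Number |
Field Error | minOccurs | 1 | Record Label/Name |
Field Error | minOccurs | 2 | Field or Bean Label/Name |
Field Error | minOccurs | 3 | - |
Field Error | minOccurs | 4 | Minimum occurrences |
Field Error | minOccurs | 5 | Maximum occurrences |
Field Error | maxOccurs | 0 | Line Number |
Field Error | maxOccurs | 1 | Record Label/Name |
Field Error | maxOccurs | 2 | Field or Bean Label/Name |
Field Error | maxOccurs | 3 | - |
Field Error | maxOccurs | 4 | Minimum occurrences |
Field Error | maxOccurs | 5 | Maximum occurrences |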
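To show how the indexed parameters above line up with placeholders in a message, the following sketch formats a field-level minLength error with `java.text.MessageFormat`, which numbers its placeholders the same way. The class `ErrorMessageSketch`, the message text and the sample values are invented for illustration and are not taken from BeanIO.

```java
import java.text.MessageFormat;
import java.util.Locale;

// Hypothetical helper, not part of BeanIO.
class ErrorMessageSketch {

    // Substitutes {0}, {1}, ... in the pattern with the given parameters.
    // Locale.ROOT keeps number formatting independent of the default locale.
    static String format(String pattern, Object... params) {
        return new MessageFormat(pattern, Locale.ROOT).format(params);
    }

    public static void main(String[] args) {
        // Parameters for a field-level minLength rule, in table order:
        // 0 = line number, 1 = record name, 2 = field name,
        // 3 = field text, 4 = minimum length, 5 = maximum length
        String pattern = "Line {0}: field ''{2}'' of record ''{1}'' "
                + "must be at least {4} characters, found ''{3}''";
        System.out.println(format(pattern, 12, "employee", "lastName", "X", 2, 30));
    }
}
```

A message does not have to use every parameter: the pattern above skips index 5. A literal single quote in a `MessageFormat` pattern is written as two single quotes.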