Revision 2, © 2010-2013 Kevin Seim
Copies of this document may be made for your own use and for distribution to others, provided that you do not charge any fee for such copies and further provided that each copy contains this Copyright Notice, whether distributed in print or electronically.
BeanIO is an open source Java framework for reading and writing Java objects from a flat file, stream, or any String input. BeanIO is well suited for batch processing, and currently supports XML, CSV, delimited and fixed length file formats. BeanIO is licensed under the Apache 2.0 License.
BeanIO 2.1 includes the following significant enhancements:
Release 2.1 is "mostly" backwards compatible with prior 2.0.x releases. A few behavior changes are noted below that will hopefully improve your experience with BeanIO. Where noted, legacy behavior can be restored using beanio.properties configuration settings.
Release 2.0 is not backwards compatible with prior releases. Sorry. This section contains the steps you'll need to follow to update your code and mapping files.
The org.beanio.BeanReaderContext class was renamed RecordContext in order to support bean objects bound to multiple records.
The exception classes org.beanio.BeanReaderException and org.beanio.BeanWriterException are no longer abstract and may be thrown in a few rare (but fatal) scenarios. The exception classes org.beanio.BeanReaderIOException and org.beanio.BeanWriterIOException are now only thrown when the underlying input stream throws a java.io.IOException, or when a BeanReader/Writer method is invoked on a closed stream.
The org.beanio.stream.RecordReaderFactory and org.beanio.stream.RecordWriterFactory interfaces have been consolidated into the RecordParserFactory interface, which is also used to create RecordMarshaller and RecordUnmarshaller implementations for parsing individual records.
All type handlers and Spring related classes are unchanged or backwards compatible. Internal implementation classes have been moved to the org.beanio.internal package and their API may change in any release without further regard to backwards compatibility.
The mapping file namespace has changed to http://www.beanio.org/2012/03 for all elements.
Release 2.0 includes more lenient defaults for some mapping components. A new stream attribute called strict has been added to support some legacy behavior. If strict is set to true, the following behavior is enabled (which mimics prior releases):
The ordered attribute has been removed from a stream. Since release 2.0, all record and group components are unordered by default. The order attribute is still supported. If you want to continue validating record order, you can set order attributes on ordered groups and records, or set strict to true as described above to have BeanIO calculate a default order.
The reader and writer elements have been combined into a single parser element. Format specific property names have not changed. If you have overridden the default RecordReaderFactory or RecordWriterFactory, you will need to modify your class to implement RecordParserFactory instead.
The minOccurs attribute for a record now defaults to 0, instead of 1.
All bean elements should be renamed segment. A segment element supports all the functionality of a bean element (and more).
For XML formatted streams, the minOccurs attribute for a bean/segment, or a field bound to an XML element, will always default to 1. Prior to release 2.0, minOccurs defaulted to 0 if not nillable. (This is now consistent with XML Schema and hopefully simpler to remember.) The default minOccurs attribute for a field bound to an XML attribute remains 0.
The xmlWrapper attribute has been removed. XML wrappers can be replaced by segment components.
Mapping file changes are illustrated using an example in Appendix C.
If desired, BeanIO's default minOccurs value for a group, record or field can be overridden using property values. See Section 7.0 Configuration for details.
To get started with BeanIO, download the latest stable version from Google Code, extract the contents of the ZIP file, and add beanio.jar to your application's classpath.
BeanIO requires a version 1.5 JDK or higher. In order to process XML formatted streams, BeanIO also requires an XML parser based on the Streaming API for XML (StAX), as specified by JSR 173. JDK 1.6 and higher includes a StAX implementation and therefore does not require any additional libraries. JDK 1.5 users will need to include the following:
Alternatively, Maven users can declare the following dependencies in their application's POM. Note that the version numbers used below are only examples and may have changed.
<!-- BeanIO dependency --> <dependency> <groupId>org.beanio</groupId> <artifactId>beanio</artifactId> <version>2.1.0</version> </dependency> <!-- StAX dependencies for JDK 1.5 users --> <dependency> <groupId>javax.xml</groupId> <artifactId>jsr173</artifactId> <version>1.0</version> </dependency> <dependency> <groupId>com.sun.xml.stream</groupId> <artifactId>sjsxp</artifactId> <version>1.0.2</version> </dependency>
This section explores a simple example that uses BeanIO to read and write a flat file containing employee data. Let's suppose the file is in CSV format and has the following record layout:
Position | Field | Format |
---|---|---|
0 | First Name | Text |
1 | Last Name | Text |
2 | Job Title | Text |
3 | Salary | Number |
4 | Hire Date | Date (MMDDYYYY) |
A sample file is shown below.
Joe,Smith,Developer,75000,10012009 Jane,Doe,Architect,80000,01152008 Jon,Anderson,Manager,85000,03182007
Next, let's suppose we want to read records into the following Java bean for further processing. Remember that a Java bean must have a default no-argument constructor and public getters and setters for all exposed properties.
package example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
// getters and setters not shown...
}
BeanIO uses an XML configuration file, called a mapping file, to define how bean objects are bound to records. Below is a mapping file, named mapping.xml, that could be used to read the sample employee file and unmarshall records into Employee objects. The same mapping file can be used to write, or marshall, Employee objects to a file or output stream.
<beanio xmlns="http://www.beanio.org/2012/03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd"> <stream name="employeeFile" format="csv"> <record name="employee" class="example.Employee"> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" format="MMddyyyy" /> </record> </stream> </beanio>
To read the employee CSV file, a StreamFactory is used to load our mapping file and create a new BeanReader instance. The BeanReader is used to unmarshall Employee objects from the file employee.csv. (For the sake of brevity, proper exception handling is not shown.)
package example; import org.beanio.*; import java.io.*; public class BeanReaderExample { public static void main(String[] args) throws Exception { // create a StreamFactory StreamFactory factory = StreamFactory.newInstance(); // load the mapping file factory.load("mapping.xml"); // use a StreamFactory to create a BeanReader BeanReader in = factory.createReader("employeeFile", new File("employee.csv")); Employee employee; while ((employee = (Employee) in.read()) != null) { // process the employee... } in.close(); } }
To write an employee CSV file, the same StreamFactory class is used to create a BeanWriter for marshalling Employee bean objects to the file employee.csv. In this example, the same mapping configuration file is used for both reading and writing an employee file.
package example; import org.beanio.*; import java.io.*; import java.util.*; public class BeanWriterExample { public static void main(String[] args) throws Exception { // create a StreamFactory StreamFactory factory = StreamFactory.newInstance(); // load the mapping file factory.load("mapping.xml"); Employee employee = new Employee(); employee.setFirstName("Jennifer"); employee.setLastName("Jones"); employee.setTitle("Marketing") employee.setSalary(60000); employee.setHireDate(new Date()); // use a StreamFactory to create a BeanWriter BeanWriter out = factory.createWriter("employeeFile", new File("employee.csv")); // write an Employee object directly to the BeanWriter out.write(employee); out.flush(); out.close(); } }
Running BeanWriterExample produces the following CSV file.
Jennifer,Jones,Marketing,60000,01012011
The org.beanio.BeanReader interface, shown below, is used to read bean objects from an input stream. The read() method returns an unmarshalled bean object for the next record or group of records read from the input stream. When the end of the stream is reached, null is returned.
The method setErrorHandler(...) can be used to register a custom error handler. If an error handler is not configured, read() simply throws the unhandled exception.
The method getRecordName() returns the name of the record (or group) mapped to the most recent bean object read from the input stream, as declared in the mapping file. And getLineNumber() returns the line number of the first record mapped to the most recent bean object read from the input stream. Additional information is available about records read from the stream by calling getRecordCount and getRecordContext. Please consult the API documentation for further information.
Before discarding a BeanReader, close() should be invoked to close the underlying input stream.
package org.beanio; public interface BeanReader { public Object read() throws BeanReaderException; public int getLineNumber(); public String getRecordName(); public int getRecordCount(); public RecordContext getRecordContext(int index); public int skip(int count) throws BeanReaderException; public void close() throws BeanReaderIOException; public void setErrorHandler(BeanReaderErrorHandler errorHandler); }
The org.beanio.BeanWriter interface, shown below, is used to write bean objects to an output stream. Calling the write(Object) method marshals a bean object to the output stream. In some cases where multiple record types are not discernible by class type or record identifying fields, the write(String,Object) method can be used to explicitly name the record type to marshal.
Before discarding a BeanWriter, close() should be invoked to close the underlying output stream.
package org.beanio; public interface BeanWriter { public void write(Object bean) throws BeanWriterException; public void write(String recordName, Object bean) throws BeanWriterException; public void flush() throws BeanWriterIOException; public void close() throws BeanWriterIOException; }
The org.beanio.Unmarshaller interface, shown below, is used to unmarshal a bean object from a String record.
package org.beanio; public interface Unmarshaller { // For all stream formats public Object unmarshal(String record) throws BeanReaderException; // For CSV and delimited formatted streams public Object unmarshal(List<String> fields) throws BeanReaderException; public Object unmarshal(String[] fields) throws BeanReaderException; // For XML formatted streams public Object unmarshal(Node node) throws BeanReaderException; public String getRecordName(); public RecordContext getRecordContext(); }
The org.beanio.Marshaller interface, shown below, is used to marshal a bean object into a String record.
package org.beanio; public interface Marshaller { public Marshaller marshal(Object bean) throws BeanWriterException; public Marshaller marshal(String recordName, Object bean) throws BeanWriterException; // For all stream formats public String toString(); // For CSV and delimited formatted streams public String[] toArray() throws BeanWriterException; public List<String> toList() throws BeanWriterException; // For XML formatted streams public Document toDocument() throws BeanWriterException; }
Marshalling a single bean object to record text is now as simple as:
String recordText = marshaller.marshal(object).toString();
BeanIO uses XML configuration files, called mapping files, to bind a stream layout to bean objects. Multiple layouts can be configured in a single mapping file using stream elements. Each stream is assigned a unique name for referencing the layout. In addition to its name, every stream must declare its format using the format attribute. Supported stream formats include csv, delimited, fixedlength, and xml. Mapping files are fully explained in the next section (4.0. The Mapping File).
<beanio xmlns="http://www.beanio.org/2012/03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd"> <stream name="stream1" format="csv"... > <!-- record layout... --> </stream> <stream name="stream2" format="fixedlength"... > <!-- record layout... --> </stream> </beanio>
The org.beanio.StreamFactory class is used to load mapping files and create BeanReader, BeanWriter, Marshaller and Unmarshaller instances. The following code snippet shows how to instantiate a StreamFactory, load a mapping file and create the various BeanIO parsers. The load(...) method loads mapping files from the file system (relative to the current working directory), while the method loadResource(...) loads mapping files from the classpath.
// create a StreamFactory StreamFactory factory = StreamFactory.newInstance(); // load 'mapping-1.xml' from the current working directory factory.load("mapping-1.xml"); // load 'mapping-2.xml' from the classpath factory.loadResource("mapping-2.xml");' // create a BeanReader to read from 'in.txt' Reader in = new BufferedReader(new FileReader("in.txt")); BeanReader beanReader = factory.createBeanReader("streamName", in); // create a BeanWriter to write to 'out.txt' Writer out = new BufferedWriter(new FileWriter("out.txt")); BeanWriter beanWriter = factory.createBeanReader("streamName", out); // create an Unmarshaller to unmarshal bean objects from record text Unmarshaller unmarshaller = factory.createUnmarshaller("streamName"); // create a Marshaller to marshal bean objects to record text Marshaller marshaller = factory.createMarshaller("streamName");
All BeanIO exceptions extend from BeanIOException, which extends from RuntimeException so that exceptions do not need to be explicitly caught unless desired. BeanReaderException and BeanWriterException extend from BeanIOException and may be thrown by a BeanReader or BeanWriter respectively.
A BeanReaderException is further broken down into the following subclasses thrown by the read() method.
Exception | Description |
---|---|
BeanReaderIOException | Thrown when the underlying input stream throws an IOException. |
MalformedRecordException | Thrown when the underlying input stream is malformed based on the configured stream format, and therefore a record could not be accurately read from the stream. In many cases, further reads from the input stream will be unsuccessful. |
UnidentifiedRecordException | Thrown when a record does not match any record definition configured in the mapping file. If the stream layout does not strictly enforce record sequencing, further reads from the input stream are likely to be successful. |
UnexpectedRecordException | Thrown when a record is read out of order. Once record sequencing is violated, further reads from the input stream are likely to be unsuccessful. |
InvalidRecordException | Thrown when a record is matched, but the record is invalid for one of the following
reasons:
|
InvalidRecordGroupException | Extends from InvalidRecordException and is thrown when one or more records in a group (that are mapped to a single bean object) are invalid. This exception has no effect on the state of the BeanReader and further reads from the input stream can be safely performed. |
BeanReaderException | Thrown directly in a few rare unrecoverable scenarios. |
When a BeanReaderException is thrown, information about the failed record(s) can be accessed by calling exception.getRecordContext() to obtain a org.beanio.RecordContext. Please refer to the API javadocs for more information.
package org.beanio; public interface RecordContext { public int getLineNumber(); public String getRecordText(); public String getRecordName(); public boolean hasErrors(); public boolean hasRecordErrors(); public Collection<String> getRecordErrors(); public String getFieldText(String fieldName); public String getFieldText(String fieldName, int index); public boolean hasFieldErrors(); public Map<String, Collection<String>> getFieldErrors(); public Collection<String> getFieldErrors(String fieldName); }
If you need to handle an exception and continue processing, it may be simpler to register a BeanReaderErrorHandler using the beanReader.setErrorHandler() method. The BeanReaderErrorHandler interface is shown below. Any exception thrown by the error handler will be rethrown by the BeanReader.
package org.beanio; public interface BeanReaderErrorHandler { public void handleError(BeanReaderException ex) throws Exception; }
The following example shows how invalid records could be written to a reject file by registering an error handler extending BeanReaderErrorHandlerSupport, a subclass of BeanReaderErrorHandler. All other exceptions are left uncaught and will bubble up to the calling method.
BeanReader input; BufferedWriter rejects; try { input.setErrorHandler(new BeanReaderErrorHandlerSupport() { public void invalidRecord(InvalidRecordException ex) throws Exception { // if a bean object is mapped to a record group, // the exception may contain more than one record for (int i=0, j=ex.getRecordCount(); i<j; i++) { rejects.write(ex.getRecordContext(i).getRecordText()); rejects.newLine(); } } }); Object record = null; while ((record = input.read()) != null) { // process a valid record } rejects.flush(); } finally { input.close(); rejects.close(); }
This section covers the basic components used by BeanIO to map an input stream or String to Java objects. All examples are shown using a mapping file, but the concepts (and most attributes) are the same whether using the stream builder API, mapping file, Java annotations, or any combination thereof.
A typical mapping file contains one or more stream layouts. A stream must have a name and format attribute configured. The name of the stream is used to reference the layout when creating a parser using a StreamFactory. And the format instructs BeanIO how to interpret the stream. Supported formats include xml, csv, delimited and fixedlength.
<beanio xmlns="http://www.beanio.org/2012/03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd"> <stream name="stream1" format="csv"... > <!-- record layout... --> </stream> <stream name="stream2" format="fixedlength"... > <!-- record layout... --> </stream> </beanio>
BeanIO parses (and formats) a record from a stream or text using a record parser generated by a RecordParserFactory. BeanIO allows you to create and customize your own RecordParserFactory, but in most cases you can simply configure BeanIO's default record parser factory using a stream's parser element. The parser element allows you to set format specific properties on a RecordParserFactory. For example, the following stream layout changes the delimiter to a pipe for the delimited stream 's1':
<beanio xmlns="http://www.beanio.org/2012/03"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd">
<stream name="s1" format="delmiited">
<parser>
<property name="delimiter" value="|" />
</parser>
<!-- record layout... -->
</stream>
</beanio>
The next few sections list available parser properties for each stream format.
CSV formatted streams are parsed according to RFC 4180 with one exception: multi-line records are disabled (but this can be overridden).
The following properties can be used to customize default CSV parsers:
Property Name | Type | Description | Affects |
---|---|---|---|
delimiter | char | The field delimiter. Defaults to a comma. | * |
quote | char | The quotation mark character used to wrap fields containing a delimiter character, a quotation mark, or new lines. Defaults to the double quotation mark, ". | * |
escape | Character | The character used to escape a quotation mark in a quoted field. Defaults to the double quotation mark, ". | * |
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input
stream begins with any of the configured values, the line is ignored. A backslash may
be used to escape a comma and itself. All whitespace is preserved.
Enabling comments require the input reader passed to StreamFactory to support marking. Among others, Java's BufferedReader and StringReader support marking. |
BeanReader |
multilineEnabled | boolean | If set to true, quoted fields may contain new line characters. Defaults to false. | BeanReader |
whitespaceAllowed | boolean | If set to true, whitespace is ignored and allowed before and after
quoted values. For example, the following is allowed:
Jennifer, "Jones" ,24Defaults to false. |
BeanReader, Unmarshaller |
unquotedQuotesAllowed | boolean | If set to true, field text containing quotation marks do not need to
be quoted unless the field text starts with a quotation mark. For example, the
following is allowed:
Jennifer,She said "OK"Defaults to false. |
BeanReader, Unmarshaller |
recordTerminator | String | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | BeanWriter |
alwaysQuote | boolean | If set to true, field text is always quoted. By default, a field is only quoted if it contains a delimeter, a quotation mark or new line characters. | BeanWriter, Marshaller |
The default delimited parsers can be customized using the following properties:
Property Name | Type | Description | Affects |
---|---|---|---|
delimiter | char | The field delimiter. Defaults to the tab character. | * |
escape | Character | The escape character allowed to escape a delimiter or itself. By default, escaping is disabled. | * |
lineContinuationCharacter | Character | If this character is the last character before a new line or carriage return is read, the record will continue reading from the next line. By default, line continuation is disabled. | BeanReader |
recordTerminator | Character | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | BeanReader, BeanWriter |
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input
stream begins with any of the configured values, the line is ignored. A backslash may
be used to escape a comma and itself. All whitespace is preserved.
Enabling comments require the input reader passed to StreamFactory to support marking. Among others, Java's BufferedReader and StringReader support marking. |
BeanReader |
The default fixed length parsers can be customized using the following properties:
Property Name | Type | Description | Affects |
---|---|---|---|
lineContinuationCharacter | Character | If this character is the last character before a new line or carriage return is read, the record will continue reading from the next line. By default, line continuation is disabled. | BeanReader |
recordTerminator | Character | The character used to signify the end of a record. By default, any new line character (line feed (LF), carriage return (CR), or CRLF combination) is accepted when reading an input stream, and System.getProperty("line.separator") is used when writing to a stream. | BeanReader, BeanWriter |
comments | String[] | A comma separated list of values for identifying commented lines. If a line read from an input
stream begins with any of the configured values, the line is ignored. A backslash may
be used to escape a comma and itself. All whitespace is preserved.
Enabling comments require the input reader passed to StreamFactory to support marking. Among others, Java's BufferedReader and StringReader support marking. |
BeanReader |
The default XML parsers can be customized using the following properties:
Property Name | Type | Description | Affects |
---|---|---|---|
suppressHeader | boolean | If set to true, the XML header is suppressed in the marshalled document. Defaults to false. | BeanWriter, Marshaller |
version | String | The XML header version. Defaults to 1.0. | BeanWriter, Marshaller |
encoding | String | The XML header encoding. Defaults to utf-8. Note that this setting has no bearing on the actual encoding of the output stream. If set to "", an encoding attribute is not included in the header. | BeanWriter, Marshaller |
namespaces | String | A space delimited list of XML prefixes and namespaces to declare
on the root element of a marshalled document. The property value
should be formatted as
prefix1 namespace1 prefix2 namespace2... |
BeanWriter, Marshaller |
indentation | Integer | The number of spaces to indent each level of XML. By default, indentation is disabled using a value of -1. | BeanWriter, Marshaller |
lineSeparator | String | The character(s) used to separate lines when indentation is enabled. By default, System.getProperty("line.separator") is used. | BeanWriter, Marshaller |
Each record type read from an input stream or written to an output stream must be mapped using a record element. A stream mapping must include at least one record. A record mapping is used to validate the record and bind field values to a bean object. A simple record configuration is shown below.
<beanio>
<stream name="stream1" format="csv">
<record name="record1" class="example.Record">
<field name="firstName" />
<field name="lastName" />
<field name="age" />
</record>
</stream>
</beanio>
In this example, a CSV formatted stream is mapped to a single record composed of three fields: first name, last name and age. When a record is read from a stream using a BeanReader, the class example.Record is instantiated and its firstName, lastName and age attributes are set using standard Java bean setter naming conventions (e.g. setFirstName(String)).
Similarly, when a example.Record bean object is written to an output stream using a BeanWriter, its firstName, lastName and age attributes are retrieved from the bean object using standard Java bean getter naming conventions (e.g. getFirstName()).
BeanIO also supports Map based records by setting a record's class attribute to map, or to the fully qualified class name of any class assignable to java.util.Map. Note that if you plan to use Map based records, field types may need be explicitly configured using the type attribute, or BeanIO will assume the field is of type java.lang.String The type attribute is further explained in section 4.6. Field Type Conversion.
<beanio> <stream name="stream1" format="csv"> <record name="record1" class="map"> <field name="firstName" /> <field name="lastName" /> <field name="age" type="int"/> </record> </stream> </beanio>
Oftentimes, a stream is made up of multiple record types. A typical batch file may include one header, one trailer, and zero to many detail records. BeanIO allows a record to be identified by one or more of its fields using expected literal values or regular expressions. If desired, BeanIO can be used to validate the order of all records in the input stream.
To see how a stream can be configured to handle multiple record types, let's modify our Employee file to include a header and trailer record as shown below. Each record now includes a record type field that identifies the type of record.
Header,01012011 Detail,Joe,Smith,Developer,75000,10012009 Detail,Jane,Doe,Architect,80000,01152008 Detail,Jon,Anderson,Manager,85000,03182007 Trailer,3
The mapping file can now be updated as follows:
<beanio> <stream name="employeeFile" format="csv"> <record name="header" minOccurs="1" maxOccurs="1" class="example.Header"> <field name="recordType" rid="true" literal="Header" /> <field name="fileDate" format="MMddyyyy" /> </record> <record name="employee" minOccurs="0" maxOccurs="unbounded" class="example.Employee"> <field name="recordType" rid="true" literal="Detail" /> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" format="MMddyyyy" /> </record> <record name="trailer" minOccurs="1" maxOccurs="1" class="example.Trailer"> <field name="recordType" rid="true" literal="Trailer" /> <field name="recordCount" /> </record> </stream> </beanio>
There are several new record and field attributes introduced in this mapping file, so we'll explain each new attribute in turn.
First, a field used to identify a record must be configured as a record identifier using rid="true". There is no limitation to the number of fields that can be used to identify a record, but all fields where rid="true" must be satisfied before a record is identified. If there is no field configured as a record identifier, by default the record will always match.
<record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
<field name="recordType" rid="true" literal="Header" />
<field name="fileDate" />
</record>
Second, all record identifying fields must have a matching validation rule configured. In our example, the literal value Header in the record type field is used to identify the header record. Literal values must match exactly and can be configured using the literal field attribute. Alternatively, record identifying fields may use a regular expression to match field text using the regex field attribute.
<record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
<field name="recordType" rid="true" literal="Header" />
<field name="fileDate" />
</record>
Third, each record defines the minimum and maximum number of times it may repeat using the attributes minOccurs and maxOccurs. Based on our configuration, exactly one header and trailer record is expected, while the number of detail records is unbounded.
<record name="header" minOccurs="1" maxOccurs="1" class="example.Header">
<field name="recordType" rid="true" literal="Header" />
<field name="fileDate" />
</record>
If minOccurs and/or maxOccurs are not set, the minimum occurrences of a record defaults to 0 and maximum occurrences is unbounded.
Its also possible to identify delimited and fixed length records based on their length. The ridLength record attribute can be used to specify a range of lengths to identify the record.
As explained in the previous section, a stream can support multiple record types. By default, a BeanReader will read records in any order. But if desired, BeanIO can enforce record ordering using an order attribute on each record. The order attribute can be assigned any positive integer value greater than 0. Records that are assigned the same number may be read from the stream in any order. If order is set for one record, it must be set for all other records (and groups) that share the same parent.
In our previous example, if we want enforce that the header record is the first record in the file, the trailer is the last, and all detail records appear in the middle, the mapping file could be changed as follows. Using this configuration, if a detail record were to appear before the header record, the BeanReader will throw an UnexpectedRecordException when the detail record is read out of order.
<beanio> <stream name="employeeFile" format="csv"> <record name="header" order="1" minOccurs="1" maxOccurs="1" class="example.Header"> <field name="recordType" rid="true" literal="Header" /> <field name="fileDate" format="MMddyyyy" /> </record> <record name="employee" order="2" minOccurs="0" maxOccurs="unbounded" class="example.Employee"> <field name="recordType" rid="true" literal="Detail" /> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" format="MMddyyyy" /> </record> <record name="trailer" order="3" minOccurs="1" maxOccurs="1" class="example.Trailer"> <field name="recordType" rid="true" literal="Trailer" /> <field name="recordCount" /> </record> </stream> </beanio>
In some cases, a stream may be further divided into batches or groups of records. Continuing with our employee file, lets suppose employee detail records are batched by department, where each group of employees has a department header and a department trailer record. Thus an input file may look something like this:
Header,01012011 DeptHeader,Development Detail,Joe,Smith,Developer,75000,10012009 Detail,Jane,Doe,Architect,80000,01152008 DeptTrailer,2 DeptHeader,Product Management Detail,Jon,Anderson,Manager,85000,03182007 DeptTrailer,1 Trailer,2
BeanIO allows you to define groups of records using a group element to wrap the record types that belong to the group. Groups support the same order, minOccurs, and maxOccurs attributes, although there meaning is applied to the entire group. Once a record type is matched that belongs to a group, all other records in that group where minOccurs is greater that 1, must be read from the stream before the group may repeat or a different record can be read. Our mapping file would now look like this:
<beanio> <stream name="employeeFile" format="csv"> <record name="header" order="1" minOccurs="1" maxOccurs="1" class="example.Header"> <field name="recordType" rid="true" literal="Header" /> <field name="fileDate" format="MMddyyyy" /> </record> <group name="departmentGroup" order="2" minOccurs="0" maxOccurs"unbounded"> <record name="deptHeader" order="1" minOccurs="1" maxOccurs="1" class="example.DeptHeader"> <field name="recordType" rid="true" literal="DeptHeader" /> <field name="departmentName" /> </record> <record name="employee" order="2" minOccurs="0" maxOccurs="unbounded" class="example.Employee"> <field name="recordType" rid="true" literal="Detail" /> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" format="MMddyyyy" /> </record> <record name="deptTrailer" order="3" minOccurs="1" maxOccurs="1" class="example.DeptTrailer"> <field name="recordType" rid="true" literal="DeptTrailer" /> <field name="employeeCount" /> </record> </group> <record name="trailer" order="3" minOccurs="1" maxOccurs="1" class="example.Trailer"> <field name="recordType" rid="true" literal="Trailer" /> <field name="departmentCount" /> </record> </stream> </beanio>
The stream definition itself is a record group with defaults minOccurs="0" and maxOccurs="1". If you want your BeanReader to throw an exception if the stream is empty, simply change minOccurs to 1, or if you want to allow the entire stream to repeat indefinitely, simply change maxOccurs to unbounded as shown below.
<beanio> <stream name="employeeFile" format="csv" minOccurs="1" maxOccurs="unbounded"> <!-- Record layout... --> </stream> </beanio>
Default getter and setter methods can be overridden using getter and setter attributes as shown below. If a field is a constructor argument, setter can be set to '#N' where N is the position of the argument in the constructor starting at 1 (not shown).
<beanio>
<stream name="stream1" format="csv">
<record name="record1" class="example.Record">
<field name="firstName" />
<field name="lastName" setter="setSurname" getter="getSurname"/>
<field name="age" />
</record>
</stream>
</beanio>
Fields found in a stream that do not map to a bean property can be declared using the ignore field attribute. Note that any configured validation rules are still applied to ignored fields (not shown).
<beanio>
<stream name="stream1" format="csv">
<record name="record1" class="example.Record">
<field name="firstName" />
<field name="lastName" />
<field name="age" />
<field name="filler" ignore="true" />
</record>
</stream>
</beanio>
By default, BeanIO expects fields to appear in a CSV, delimited or fixed length stream in the same order they are declared in the mapping file. If this is not the case, a position field attribute can be configured for each field. If a position is declared for one field, a position must be declared for all other fields in the same record. For delimited (and CSV) formatted streams, position should be set to the index of the first occurrence of the field in the record, beginning at 0. For fixed length formatted streams, position should be set to the index of the first character of the first occurrence of the field in the record, beginning at 0. A negative position can be used to specify a field location relative to the end of the record. For example, a position of -2 indicates the second to last field in a delimited record.
The following example shows how the position attribute can be used. Although the fields are declared in a different order, the record definition is identical to the previous example. When positions are explicitly configured for an input stream, there is no need to declare all fields in a record, unless desired for validation purposes.
<beanio> <stream name="stream1" format="csv"> <record name="record1" class="example.Record"> <field name="filler" position="3" ignore="true" /> <field name="lastName" position="1" /> <field name="age" position="2"/> <field name="firstName" position="0" /> </record> </stream> </beanio>
The property type of a field is determined by introspecting the bean object the field belongs to. If the bean class is of type java.util.Map or java.util.Collection, BeanIO will assume the field is of type java.lang.String, unless a field type is explicitly declared using a field's type attribute.
The type attribute may be set to any fully qualified class name or to one of the supported type aliases below. Type aliases are not case sensitive, and the same alias may be used for primitive types. For example, int and java.lang.Integer bean properties will use the same type handler registered for the type java.lang.Integer, or alias integer or int.
Class Name | Primitive | Alias(es) |
---|---|---|
java.lang.String | - | string |
java.lang.Boolean | boolean | boolean |
java.lang.Byte | byte | byte |
java.lang.Character | char | character char |
java.lang.Short | short | short |
java.lang.Integer | int | integer int |
java.lang.Long | long | long |
java.lang.Float | float | float |
java.lang.Double | double | double |
java.math.BigInteger | - | biginteger |
java.math.BigDecimal | - | bigdecimal decimal |
java.util.Date1 | - |
datetime date time |
java.util.Calendar2 | - |
calendar calendar-datetime calendar-date calendar-time |
java.util.UUID | - | uuid |
java.net.URL | - | url |
java.lang.Enum3 | - | - |
1 By default, the date alias is used for java.util.Date types that contain date information only, and the time alias is used for java.util.Date types that contain only time information. Only the datetime alias can be used to replace the default type handler for the java.util.Date class.
2 By default, the calendar-date alias is used for java.util.Calendar types that contain date information only, and the calendar-time alias is used for java.util.Date types that contain only time information. Only the calendar-datetime and calendar aliases can be used to replace the default type handler for the java.util.Calendar class.
3 By default, enums are converted using Enum.valueOf(Class, String). If format="toString", the enum will be converted using values computed by calling toString() for each enum value. In either case, conversion is case sensitive. As with other types, a custom type handler can also be used for enums.
Optionally, a format attribute can be used to pass a decimal format for java.lang.Number types, and for passing a date format for java.util.Date types. In the example below, the hireDate field uses the SimpleDateFormat pattern "yyyy-MM-dd", and the salary field uses the DecimalFormat pattern "#,##0". For more information about supported patterns, please reference the API documentation for Java's java.text.DecimalFormat and java.text.SimpleDateFormat classes.
<beanio> <stream name="employeeFile" format="csv"> <record name="header" minOccurs="1" maxOccurs="1" class="map"> <field name="recordType" rid="true" literal="Header" /> <field name="fileDate" type="java.util.Date" /> </record> <record name="employee" minOccurs="0" maxOccurs="unbounded" class="map"> <field name="recordType" rid="true" literal="Detail" /> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" type="int" format="#,##0" /> <field name="hireDate" type="date" format="yyyy-MM-dd" /> </record> <record name="trailer" minOccurs="1" maxOccurs="1" class="map"> <field name="recordType" rid="true" literal="Trailer" /> <field name="recordCount" type="int" /> </record> </stream> </beanio>
Field type conversion is performed by a type handler. BeanIO includes type handlers for common Java types, or you can create your own type handler by implementing the org.beanio.types.TypeHandler interface shown below. When writing a custom type handler, make sure to handle null values and empty strings. Only one instance of your type handler is created, so if you plan to concurrently read or write multiple streams, make sure your type handler is also thread safe.
package org.beanio.types; public interface TypeHandler { public Object parse(String text) throws TypeConversionException; public String format(Object value); public Class<?> getType(); }
The following example shows a custom type handler for the java.lang.Boolean class and boolean primitive based on "Y" or "N" indicators.
import org.beanio.types.TypeHandler; public class YNTypeHandler implements TypeHandler { public Object parse(String text) throws TypeConversionException { return "Y".equals(text); } public String format(Object value) { return value != null && ((Boolean)value).booleanValue() ? "Y" : "N"; } public Class<?> getType() { return Boolean.class; } }
A type handler may be explicitly named using the name attribute, and/or registered for all fields of a particular type by setting the type attribute. The type attribute can be set to the fully qualified class name or type alias of the class supported by the type handler. To reference a named type handler, use the typeHandler field attribute when configuring the field.
Many default type handlers included with BeanIO support customization through the use of one or more property elements, where the name attribute is a bean property of the type handler, and the value attribute is the property value.
Type handlers can be declared globally (for all streams in the mapping file) or for a specific stream. Globally declared type handlers may optionally use a format attribute to narrow the type handler scope to a specific stream format.
In the example below, the first DateTypeHandler is declared globally for all stream formats. The second DateTypeHandler overrides the first for java.util.Date types in an XML formatted stream, and the YNTypeHandler is declared only for the 'employeeFile' stream. Stream specific type handlers override global type handlers when declared with the same name or for the same type.
<beanio> <typeHandler type="java.util.Date" class="org.beanio.types.DateTypeHandler"> <property name="pattern" value="MMddyyyy" /> <property name="lenient" value="true" /> </typeHandler> <typeHandler type="java.util.Date" format="xml" class="org.beanio.types.DateTypeHandler"> <property name="pattern" value="yyyy-MM-dd" /> </typeHandler> <stream name="employeeFile" format="csv"> <typeHandler name="ynHandler" class="example.YNTypeHandler" /> <record name="employee" minOccurs="0" maxOccurs="unbounded" class="map"> <field name="recordType" rid="true" literal="Detail" /> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" /> <field name="exempt" typeHandler="ynHandler" /> </record> </stream> </beanio>
Repeating fields are also supported by BeanIO. For example, lets assume our Employee bean object contains a list of account numbers.
package example; import java.util.Date; public class Employee { String firstName; String lastName; String title; int salary; Date hireDate; List<Integer> accounts; // getters and setters not shown... }
And lets assume our input file now looks like this:
Joe,Smith,Developer,75000,10012009 Chris,Johnson,Sales,80000,05292006,100012,200034,200045 Jane,Doe,Architect,80000,01152008 Jon,Anderson,Manager,85000,03182007,333001
In this example, the accounts bean property can be defined in the mapping file using a collection field attribute. The collection attribute can be set to the fully qualified class name of a java.util.Collection subclass, or to one of the collection type aliases below.
Class | Alias | Default Implementation |
---|---|---|
java.util.Collection | collection | java.util.ArrayList |
java.util.List | list | java.util.ArrayList |
java.util.Set | set | java.util.HashSet |
(Java Array) | array | N/A |
Repeating fields can declare the number of occurrences of the field using the minOccurs and maxOccurs field attributes. If not declared, minOccurs will default to 1, and maxOccurs will default to the minOccurs value or 1, whichever is greater.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
<field name="accounts" type="int" collection="list" minOccurs="0" maxOccurs="unbounded" />
</record>
</stream>
</beanio>
Flat file formats (CSV, delimited and fixed length) may only contain one field or segment of indeterminate length (i.e. where maxOccurs is greater than minOccurs). The position of components that follow are assumed to be relative to the end of the record.
If a field repeats a fixed number of times based on a preceding field in the same record, the occursRef attribute can be used to identify the name of the controlling field. If the controlling field is not bound to a separate property of its parent bean object, be sure to specify ignore="true". The following mapping file shows how to configure the accounts field occurrences to be dependent on the numberOfAccounts field. If desired, minOccurs and maxOccurs may still be specified to validate the referenced field occurrences value.
<beanio> <stream name="employeeFile" format="csv"> <record name="employee" class="example.Employee"> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" format="MMddyyyy" /> <field name="numberOfAccounts" ignore="true" /> <field name="accounts" type="int" collection="list" occursRef="numberOfAccounts" /> </record> </stream> </beanio>
Note that a repeating field can not be used for record identification.
Fixed length fields require a little extra configuration than their delimited counterparts. Let's redefine our employee file example using the fixed length format below.
Position | Field | Format | Length |
---|---|---|---|
0 | First Name | Text | 10 |
10 | Last Name | Text | 10 |
20 | Job Title | Text | 10 |
30 | Salary | Number | 6 |
36 | Hire Date | Date (MMDDYYYY) | 8 |
A fixed length version of the employee file might look like the following:
Joe Smith Developer 07500010012009 Jane Doe Architect 08000001152008 Jon Anderson Manager 08500003182007
The length of a fixed length field must be configured using the length field attribute. By default, fixed length fields are left justified and padded with spaces, but these settings can be overridden using the padding and justify field attributes. Field padding can be set to any single character, and field justification can be set to left or right. Using these attributes, our mapping file can now be updated as follows:
<beanio> <stream name="employeeFile" format="csv"> <record name="employee" class="example.Employee"> <field name="firstName" length="10" /> <field name="lastName" length="10" /> <field name="title" length="10" /> <field name="salary" length="6" padding="0" justify="right" /> <field name="hireDate" length="8" format="MMddyyyy" /> </record> </stream> </beanio>
The configured padding character is removed from the beginning of the field if right justified, or from the end of the field if left justified, until a character is found that does not match the padding character. If the entire field is padded, Number property types default to the padding character if it is a digit, and the padding character is ignored for Character types. To illustrate this, some examples are shown in the table below.
Justify | Type | Padding | Padded Text | Unpadded Text |
---|---|---|---|---|
left | String | " " | "George " | "George" |
" " | "" | |||
Character | " " | "A" | "A" | |
" " | " " | |||
right | Number | "0" | "00123" | "123" |
"00000" | "0" | |||
"9" | "00000" | "00000" | ||
"99999" | "9" | |||
"X" | "XXXXX" | "" |
The marshalling and unmarshalling behavior of null field values for a padded field is further controlled using the required attribute. If required is set to true, null field values are marshalled by filling the field with the padding character. If required is set to false, a null field value is marshalled as spaces for fixed length streams and an empty string for non-fixed length streams. Similarly, if required is set to false, spaces are unmarshalled to a null field value regardless of the padding character. To illustrate this, the following table shows the field text for a right justified zero padded 3 digit number.
Required | Field Value | Field Text (Fixed Length) |
Field Text (CSV, Delimited, XML) |
---|---|---|---|
true | 0 | "000" | "000" |
null | "000"1 | "000"1 | |
false | 0 | "000" | "000" |
null | " " | "" |
1 Applies to marshalling only. Unmarshalling "000" would produce a field value of 0.
As hinted to above, padding settings can be applied to any field for any stream type.
If a bean property does not map to a field in the stream, a constant property value can still be set using a property element. Like a field, all properties must specify a name attribute, which by default, is used to get and set the property value from the bean object. Properties also require a value attribute for setting the textual representation of the property value. The value text is type converted using the same rules and attributes (type, typeHandler and format) used for field type conversion described above. Collection type properties are not supported.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="map">
<property name="recordType" value="employee" />
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
</record>
</stream>
</beanio>
Constant properties may be useful in two scenarios:
A segment is a group of fields within a record. Segments are most often used to bind a group of fields to a nested bean object or collection of bean objects, and are configured in a mapping file using a segment element.
Prior to release 2.x, the bean element performed this task. A segment supports all the functionality of a bean element, but unlike the original bean element, a segment is not required to be bound to a bean object. This allows repeating segments to be fully validated during unmarshalling, without necessarily binding the fields to a bean object. An "unbound" segment also allows an arbitrary number of XML fields to be wrapped by other XML nodes without creating bean objects that mirror the same hierarchy.
As mentioned, a record can be divided into nested bean objects using a segment element. First, let's suppose we store an address in our CSV employee file, so that the record layout might look like this:
Position | Field | Format |
---|---|---|
0 | First Name | Text |
1 | Last Name | Text |
2 | Job Title | Text |
3 | Salary | Number |
4 | Hire Date | Date (MMDDYYYY) |
5 | Street | Text |
6 | City | Text |
7 | State | Text |
8 | Zip | Text |
Second, lets suppose we want to store address information in a new Address bean object like the one below, and add an Address reference to our Employee class.
package example;
public class Address {
String street;
String city;
String state;
String zip;
// getters and setters not shown...
}
package example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
Address mailingAddress;
// getters and setters not shown...
}
With this information, we can now update our employee CSV mapping file to accomodate the nested Address object. A segment must include a name attribute, and may optionally provide a class attribute to bind its children to a bean object. If class is set, the attribute must be set to the fully qualified class name of the bean object, or to map, or to the class name of any concrete java.util.Map implementation. If the bean class is of type java.util.Map, field values are stored in the Map using the configured field names for keys. By default, the name attribute is used to determine the getter and setter on its parent bean or record. Alternatively, getter or setter attributes can be used to override the default property name similar to a field property.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
<segment name="mailingAddress" class="example.Address">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</segment>
</record>
</stream>
</beanio>
If class is not set, fields will be automatically bound to the segment's parent bean object, which would be the Employee object in the example above.
If needed, segments can be further divided into other segments. There is no limit to the number of nested levels that can be configured in a mapping file.
Similar to repeating fields, BeanIO supports repeating segments, which may be bound to a collection of bean objects. Continuing our previous example, let's suppose the employee CSV file may contain 1 or more addresses for each employee. Thus our Employee bean object might look like this:
package example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
List<Address> addressList;
// getters and setters not shown...
}
And our input file might look like this:
Joe,Smith,Developer,75000,10012009,123 State St,Chicago,IL,60614 Jane,Doe,Architect,80000,01152008,456 Main St,Chicago,IL,60611,111 Michigan Ave,Chicago,IL,60611 Jon,Anderson,Manager,85000,03182007,1212 North Ave,Chicago,IL,60614
In our mapping file, in order to bind a segment to a collection, simply set it's collection attribute to the fully qualified class name of a java.util.Collection or java.util.Map subclass, or to one of the collection type aliases below.
Class | Alias | Default Implementation |
---|---|---|
java.util.Collection | collection | java.util.ArrayList |
java.util.List | list | java.util.ArrayList |
java.util.Set | set | java.util.HashSet |
java.util.Map | map | java.util.LinkedHashMap |
(Java Array) | array | N/A |
Repeating segments can declare the number of occurrences using the minOccurs and maxOccurs attributes. If not declared, minOccurs will default to 1, and maxOccurs will default to the minOccurs value or 1, whichever is greater.
Just like repeating fields, if the number of occurrences of a segment is dependent on a preceding field in the same record, the occursRef attribute can be set to the name of the field that controls the number of occurrences.
Flat file formats (CSV, delimited and fixed length) may only contain one field or segment of indeterminate length (i.e. where maxOccurs is greater than minOccurs). The position of components that follow are assumed to be relative to the end of the record.
<beanio>
<stream name="employeeFile" format="csv">
<record name="employee" class="example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" format="MMddyyyy" />
<segment name="addressList" collection="list" minOccurs="1" maxOccurs="unbounded" class="example.Address">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</segment>
</record>
</stream>
</beanio>
When working with repeating segments, there are a few restrictions to keep in mind:
As noted above, a segment can also be bound to a java.util.Map which provides support for "inline" maps. For example, given the following CSV file of users,
id1,firstName1,lastName1,id2,firstName2,lastName2 jsmith,Joe,Smith,jdoe,Jane,Doe
The following mapping file could be used to create a Map of User objects by ID. The key attribute is used to set the name of a descendant field to use for the Map key.
<beanio> <stream name="employeeFile" format="csv"> <record name="employee" target="userMap"> <segment name="userMap" class="example.User" collection="map" key="id" minOccurs="1" maxOccurs="unbounded"> <field name="id" /> <field name="firstName" /> <field name="lastName" /> </segment> </record> </stream> </beanio>
If a Map of last names by ID is needed instead, simply replace the class attribute with value and specify the name of the descendant field to use for the Map value. In this case, first name is effectively ignored.
<beanio> <stream name="employeeFile" format="csv"> <record name="employee" target="userMap"> <segment name="userMap" collection="map" key="id" value="lastName" minOccurs="1" maxOccurs="unbounded"> <field name="id" /> <field name="firstName" /> <field name="lastName" /> </segment> </record> </stream> </beanio>
A BeanReader will throw an InvalidRecordException if a record or one of its fields fails a configured validation rule. There are two types of errors reported for an invalid record: record level errors and field level errors. If a record level error occurs, further processing of the record is aborted and an excception is immediately thrown. If a field level error is reported, the BeanReader will continue to process the record's other fields before throwing an exception.
When an InvalidRecordException is thrown, the exception will contain the reported record and field level errors. The following code shows how this information can be accessed using the RecordContext.
BeanReader in; try { Object record = in.read(); if (record != null) { // process record... } } catch (InvalidRecordException ex) { RecordContext context = ex.getRecordContext(); if (context.hasRecordErrors()) { for (String error : context.getRecordErrors()) { // handle record errors... } } if (context.hasFieldErrors()) { for (String field : context.getFieldErrors().keySet()) { for (String error : context.getFieldErrors(field)) { // handle field error... } } } } }
Alternatively, it may be simpler to register a BeanReaderErrorHandler for handling non-fatal exceptions. The example below shows how invalid records could be written to a reject file by extending BeanReaderErrorHandlerSupport. (Note that the example assumes the mapping file does not bind a record group to a bean object.)
BeanReader input;
BufferedWriter rejects;
try {
input.setErrorHandler(new BeanReaderErrorHandlerSupport() {
public void invalidRecord(InvalidRecordException ex) throws Exception {
rejects.write(ex.getRecordContext().getRecordText());
rejects.newLine();
}
});
Object record = null;
while ((record = input.read()) != null) {
// process a valid record
}
rejects.flush();
}
finally {
input.close();
rejects.close();
}
Record and field level error messages can be customized and localized through the use of resource bundles. A resource bundle is configured at the stream level using the resourceBundle attribute as shown below.
<beanio>
<typeHandler type="java.util.Date" class="org.beanio.types.DateTypeHandler">
<property name="pattern" value="MMddyyyy" />
</typeHandler>
<stream name="employeeFile" format="csv" resourceBundle="example.messages" >
<record name="employee" class="map">
<field name="recordType" rid="true" literal="Detail" />
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" />
</record>
</stream>
</beanio>
Record level error messages are retrieved using the following prioritized list of keys. If a message is not configured under the name of the first key, the next key will be tried until a message is found, or a default message is used.
Similarly, field level error messages are retrieved using the following priortized list of keys:
More descriptive or localized labels can be configured for record and field names using the keys label.[record name] and label.[record name].[field name] respectively.
For example, the following resource bundle could be used to customize a few error messages for the employee file.
# 'employee' record label: label.employee = Employee Record # 'firstName' field label: label.employee.firstName = First Name Field # Unidentified record error message: recorderror.unidentified = Unidentified record at line {0} # Type conversion error message for the 'hireDate' field: fielderror.employee.hireDate.type = Invalid date format # Maximum field length error message for all fields: fielderror.maxLength = Maximum field length exceeded for {3}
Error messages are formatted using a java.text.MessageFormat. Depending on the validation rule that was violated, different parameters are passed to the MessageFormat. Appendix B documents the parameters passed to the MessageFormat for each validation rule.
The following record level validation rules may be configured on a record element.
Attribute | Argument Type | Description |
---|---|---|
minLength | Integer | Validates the record contains at least minLength fields for delimited and CSV formatted streams, or has at least minLength characters for fixed length formatted streams. |
maxLength | Integer | Validates the record contains at most maxLength fields for delimited and CSV formatted streams, or has at most maxLength characters for fixed length formatted streams. |
BeanIO supports several common field validation rules when reading an input stream. All field validation rules are validated against the field text before type conversion. When field trimming is enabled, trim="true", all validations are performed after the field's text has first been trimmed. Field validations are ignored when writing to an output stream.
The following table lists supported field attributes for validation.
Attribute | Argument Type | Description |
---|---|---|
required | Boolean | When set to true, validates the field is present and the field text is not the empty string. |
minLength | Integer | Validates the field text is at least N characters. |
maxLength | Integer | Validates the field text does not exceed N characters. |
literal | String | Validates the field text exactly matches the literal value. |
regex | String | Validates the field text matches the given regular expression pattern. |
minOccurs | String | Validates the minimum occurrences of the field in a stream. If the field is present in the stream, minOccurs is satisfied, and the required setting determines whether a value is required. |
When a common set of fields is used by multiple record types, configuration may be simplified using templates. A template is a reusable list of components (segments, fields, and properties/constants) that can be included by a record, segment or other template. The following example illustrates some of the ways a template can be used:
<beanio> <template name="address"> <field name="street1" /> <field name="street2" /> <field name="city" /> <field name="state" /> <field name="zip" /> </template> <template name="employee"> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" format="MMddyyyy" /> <segment name="mailingAddress" template="address" class="example.Address" /> </template> <stream name="employeeFile" format="csv"> <record name="employee" template="employee" class="example.Employee" /> </stream> <stream name="addressFile" format="csv"> <record name="address" class="example.Address"> <field name="location" /> <include template="address"/> <field name="attention" /> </record> </stream> </beanio>
Templates are essentially copied into their destination using the include element. For convenience, record and segment elements support a template attribute which includes the template before any other children.
The include element can optionally specify a positional offset for included fields using the offset attribute. The following example illustrates this behavior. Even when using templates, remember that position must be declared for all fields or none.
<beanio>
<template name="address">
<field name="street1" position="0" />
<field name="street2" position="1" />
<field name="city" position="2" />
<field name="state" position="3" />
<field name="zip" position="4" />
</template>
<stream name="addressFile" format="csv">
<record name="address" class="example.Address">
<field name="location" position="0" />
<include template="address" offset="1"/>
<field name="attention" position="6" />
</record>
</stream>
</beanio>
Since release 2.0, BeanIO supports the binding of multiple consecutive records to a single bean object. This can be achieved by assigning a bean class to a stream or group containing the record configurations bound to the bean.
Let's suppose we are reading a CSV input file of orders that contains an order, followed by the customer that placed the order, followed by a detailed list of items that make up the order. A sample input file might look like this:
Order,101,2012-02-01,5.00 Customer,John,Smith Item,Apple,2,2.00 Item,Orange,1,1.00 Order,102,2012-02-01,3.00 Customer,Jane,Johnson Item,Ham,1,3.00
Let's then suppose we want to read the following Order class from the stream, which contains a reference to Customer and Item classes. (For brevity, getters and setters are not shown.)
package example; import java.util.Date; public class Order { String id; Date date; BigDecimal amount; Customer customer; List<Item> items; } public class Customer { String firstName; String lastName; } public class Item { String name; int quantity; BigDecimal amount; }
Now to read and write Order objects from our example stream, the following mapping file can be used:
<beanio> <stream name="orders" format="csv"> <group name="order" class="example.Order" minOccurs="0" maxOccurs="unbounded"> <record name="orderRecord" order="1" minOccurs="1"> <field name="recordType" rid="true" literal="Order" ignore="true" /> <field name="id" /> <field name="date" format="yyyy-MM-dd" /> <field name="amount" /> </record> <record name="customer" class="example.Customer" order="2" minOccurs="1" maxOccurs="1"> <field name="recordType" rid="true" literal="Customer" ignore="true" /> <field name="firstName" /> <field name="lastName" /> </record> <record name="items" class="example.Item" collection="list" order="3" minOccurs="1" maxOccurs="unbounded"> <field name="recordType" rid="true" literal="Item" ignore="true" /> <field name="name" /> <field name="quantity" /> <field name="amount" /> </record> </group> </stream> </beanio>
By configuring a class on a group component, BeanIO will automatically marshal or unmarshal all of the group's descendants in a single call to read or write from the stream. Also note that by not configuring a class on a record, in this case our "orderRecord", the fields are instead set on the bean class assigned to it's parent group. Finally, repeating records can be aggregated into a collection using a collection attribute at the record level, as used for the "items" record. If necessary, getter and setter attributes can be configured on a record component as well.
If any record included in a group bound to a bean object is invalid, an InvalidRecordException is thrown, but only after reading all the other records in the group. In such cases, the InvalidRecordException will contain RecordContext objects for every record in the group read from the stream. If multiple records in the group are invalid, only one InvalidRecordException is thrown.
If a malformed or unidentified record is read from the stream while unmarsahalling a record group, an exception is immediately thrown, and the BeanReader will most likely not be able to recover. For this reason, when unmarshalling untrusted sources, it is recommended that you read the stream twice, using the first pass to validate the integrity of the file including syntax, record identification, record ordering, possible header/trailer counts, etc. For example, the following mapping file might be used to validate our orders file.
<beanio> <stream name="orders-validation" format="csv"> <group name="order" minOccurs="0" maxOccurs="unbounded"> <record name="orderRecord" order="1" minOccurs="1"> <field name="recordType" rid="true" literal="Order" ignore="true" /> </record> <record name="customer" order="2" minOccurs="1"> <field name="recordType" rid="true" literal="Customer" ignore="true" /> </record> <record name="items" order="3" minOccurs="1" maxOccurs="unbounded"> <field name="recordType" rid="true" literal="Item" ignore="true" /> </record> </group> </stream> </beanio>
In this case, we are validating syntax, record ordering and record identification for the entire file in a single call to beanReader.read(), while leaving other record and field level validations for unmarshalling, which can be caught and handled without worrying whether the BeanReader will be able to recover.
This section provides further details for using BeanIO to marshall and unmarshall Java objects to and from XML formatted streams. This section assumes you are already familiar with the mapping file concepts documented in previous sections.
BeanIO is similar to other OXM (Object to XML Mapping) libraries, except that it is also capable of marshalling and unmarshalling extremely large XML files by reading and writing Java beans one record at a time. BeanIO uses a streaming XML (StAX) parser to read and write XML, and will never hold more than the minimum amount of XML in memory needed to marshall or unmarshall a single bean object. That said, it is still possible to run out of memory (heap space) with poorly designed XML documents and/or misconfigured mapping files.
Before diving into the details, let's start with a basic example using the employee input file from Section 2.1 after it's been converted to XML (shown below).
<?xml version="1.0"?> <employeeFile> <employee> <firstName>Joe</firstName> <lastName>Smith</lastName> <title>Developer</title> <salary>75000</salary> <hireDate>2009-10-12</hireDate> </employee> <employee> <firstName>Jane</firstName> <lastName>Doe</lastName> <title>Architect</title> <salary>80000</salary> <hireDate>2008-01-15</hireDate> </employee> <employee> <firstName>Jon</firstName> <lastName>Andersen</lastName> <title>Manager</title> <salary>85000</salary> <hireDate>2007-03-18</hireDate> </employee> </employeeFile>
In this example, let's suppose we are unmarshalling the XML employee file into the same Employee bean object from Section 2.1 and repeated below.
package example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
// getters and setters not shown...
}
Our original mapping file from Section 2.1 can now be updated to parse XML instead of CSV with only two minor changes. First, the stream format is changed to xml. And second, the hire date field format is removed and replaced with type="date". With XML, the date format does not need to be explicity declared because it conforms to the W3C XML Schema date syntax. (This will be further explained in Section 5.7.1).
<beanio xmlns="http://www.beanio.org/2012/03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd"> <stream name="employeeFile" format="xml"> <record name="employee" class="example.Employee"> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" type="date" /> </record> </stream> </beanio>
That's it! No Java code changes are required, and as before, Employee bean objects will be unmarshalled from the XML input stream each time beanReader.read() is called.
And also as before, Employee objects can be marshalled to an XML output stream using beanWriter.write(Object). However, please note that when marshalling/writing XML, it is even more important to call beanWriter.close() so that the XML document can be properly completed.
Because BeanIO is built like a pull parser, it does not support XML validation against a DTD or XML schema. Where this functionality is needed, it is recommended to make two passes on the input document. The first pass can use a SAX parser or other means to validate the XML, and the second pass can use BeanIO to parse and process bean objects read from the document.
Each BeanIO mapping component (stream, group, record, segment and field), is mapped to an XML element with the same local name. If the name of the stream, group, etc. does not match the XML element name, the xmlName attribute can be used. For example, if the name of the root element in the previous example's employee file is changed from "employeeFile" to "employees", and "title" was renamed "position", the mapping file could be updated as shown below.
<beanio> <stream name="employeeFile" format="xml" xmlName="employees"> <record name="employee" class="example.Employee"> <field name="firstName" /> <field name="lastName" /> <field name="title" xmlName="position" /> <field name="salary" /> <field name="hireDate" type="date" /> </record> </stream> </beanio>
XML namespaces can be enabled through the use of the xmlNamespace attribute on any mapping component (stream, group, record, segment or field). By default, all mapping elements inherit their namespace (or lack thereof) from their parent. When a namespace is declared, the local name and namespace must match when unmarshalling XML, and appropriate namespace declarations are included when marshalling bean objects. For example, let's suppose our employee file contains namespaces as shown below.
<?xml version="1.0"?> <employeeFile xmlns="http://example.com/employeeFile" xmlns:n="http://example.com/name"> <e:employee xmlns:e="http://example.com/employee"> <n:firstName>Joe</n:firstName> <n:lastName>Smith</n:lastName> <e:title>Developer</e:title> <e:salary>75000</e:salary> <e:hireDate>2009-10-12</e:hireDate> </e:employee> . . . </employeeFile>
To unmarshall the file using namespaces, and to marshall Employee bean objects in the same fashion as they appear above, the following mapping file can be used.
<beanio> <stream name="employeeFile" format="xml" xmlNamespace="http://example.com/employeeFile"> <parser> <property name="namespaces" value="n http://example.com/name"/> </parser> <record name="employee" class="example.Employee" xmlNamespace="http://example.com/employee" xmlPrefix="e"> <field name="firstName" xmlNamespace="http://example.com/name" /> <field name="lastName" xmlNamespace="http://example.com/name" /> <field name="title" /> <field name="salary" /> <field name="hireDate" type="date" /> </record> </stream> </beanio>
From this example, the following behavior can be observed:
BeanIO also supports a special wildcard namespace. If xmlNamespace is set to '*', any namespace is allowed when unmarshalling XML, and no namespace declaration will be made when marshalling XML.
The following table summarizes namespace configuration options and their effect on the configured element and a child that inherits it's parent namespace.
Mapping Configuration | Marshalled Element And Child |
---|---|
[None] |
<element> <child/> </element> |
xmlNamespace="*" |
<element> <child/> </element> |
xmlNamespace="" |
<element xmlns=""> <child/> </element> |
xmlNamespace="http://example.com" |
<element xmlns="http://example.com"> <child/> </element> |
xmlNamespace="http://example.com" xmlPrefix="e" |
<e:element xmlns="http://example.com"> <e:child/> </e:element> |
When unmarshalling multiple records from an XML document, the stream configuration is mapped to the root element in the XML formatted stream. This default behavior has been demonstrated in previous examples. If on the other hand, an XML document contains only a single record, the document can be fully read or written by setting the stream configuration's xmlType attribute to none. This behavior is similar to other OXM libraries that marshall or unmarshall one bean object per XML document.
For example, if BeanIO was used to unmarshall a single employee record submitted via a web service, the XML document might look like the following. Notice there is no 'employeeFile' root element for containing multiple employee records.
<employee> <firstName>Joe</firstName> <lastName>Smith</lastName> <title>Developer</title> <salary>75000</salary> <hireDate>2009-10-12</hireDate> </employee>
In this example, the following highlighted changes can be made to our mapping file to allow BeanIO to unmarshall/marshall a single employee record.
<beanio>
<stream name="employeeFile" format="xml" xmlType="none">
<record name="employee" class="example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
</record>
</stream>
</beanio>
Like other mapping elements, groups are also mapped to XML elements by default. Or if a group is used only for control purposes, the group's xmlType attribute can be set to none.
A record is always mapped to an XML element. As we've seen before, records are matched based on their group context and configured record identifying fields. XML records are further matched using their XML element name, as defined by xmlName, or if not present, name. Other than configured record identifying fields, segment and field names declared within the record are not used to identify records.
For example, let's suppose our employee file differentiated managers using 'manager' tags.
<?xml version="1.0"?> <employeeFile> <employee> <firstName>Joe</firstName> <lastName>Smith</lastName> <title>Developer</title> <salary>75000</salary> <hireDate>2009-10-12</hireDate> </employee> <employee> <firstName>Jane</firstName> <lastName>Doe</lastName> <title>Architect</title> <salary>80000</salary> <hireDate>2008-01-15</hireDate> </employee> <manager> <firstName>Jon</firstName> <lastName>Andersen</lastName> <title>Manager</title> <salary>85000</salary> <hireDate>2007-03-18</hireDate> </manager> </employeeFile>
To bind managers to a new Manager bean we could use the following mapping configuration.
<beanio>
<stream name="employeeFile" format="xml">
<record name="employee" class="example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
</record>
<record name="manager" class="example.Manager">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
</record>
</stream>
</beanio>
A field is mapped to XML using the field's xmlType attribute, which defaults to element. The field XML type can be set to element, attribute, text, or none. The following table illustrates possible configurations, except for none which is not covered here.
Record Definition | Sample Record |
---|---|
<record name="person" class="map">
<field name="name" xmlType="element"/>
</person>
|
<person> <name>John</name> </person> |
<record name="person" class="map">
<field name="name" xmlType="attribute"/>
</person>
|
<person name="John"/> |
<record name="person" class="map">
<field name="name" xmlType="text"/>
</person>
|
<person>John</person> |
Field type conversion works the same way for XML formatted streams as it does for other formats. However, several default type handlers are overridden specifically for XML formatted streams to conform with W3C XML Schema built-in data types according to this specification. The following table summarizes overriden type handlers:
Class or Type Alias | XML Schema Data Type | Example |
---|---|---|
date calendar-date |
date | 2011-01-01 |
datetime java.util.Date calendar calendar-datetime java.util.Calendar |
dateTime | 2011-01-01T15:14:13 |
time calendar-time |
time | 15:14:13 |
boolean | boolean | true |
Like other type handlers, XML specific type handlers can be customized or completely replaced. Please consult BeanIO javadocs for customization details.
The nillable and minOccurs field attributes control how a null field value is marshalled. If minOccurs is 0, an element or attribute is not marshalled for the field. If an element type field has nillable set to true and minOccurs set to 1, the W3C XML Schema Instance attribute nil is set to true.
This behavior is illustrated in the following table.
Field Type | Record Definition | Marshalled Record (Field Value is Null) |
---|---|---|
element |
<record name="person" class="map"> <field name="name" /> </person> |
<person> <name/> </person> |
<record name="person" class="map">
<field name="name" minOccurs="0" />
</person>
|
<person/> |
|
<record name="person" class="map">
<field name="name" nillable="true"/>
</person>
|
<person> <name xsi:nil="true"/> </person> |
|
attribute |
<record name="person" class="map">
<field name="name" xmlType="attribute"/>
</person>
|
<person/> |
<record name="person" class="map">
<field name="name" xmlType="attribute" minOccurs="1"/>
</person>
|
<person name=""/> |
|
text |
<record name="person" class="map">
<field name="name" xmlType="text"/>
</person>
|
<person/> |
A segment can be used to bind a group of fields to a nested bean object, or to wrap a field or group of fields under an XML element.
Segments can be used to bind a group of fields to a bean object. The xmlType assigned to the segment determines the format of the XML. Possible values are element (default) and none. The difference can be explored using the Address and Employee beans defined in Section 4.4 and repeated here.
package example;
public class Address {
String street;
String city;
String state;
String zip;
// getters and setters not shown...
}
package example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
Address mailingAddress;
// getters and setters not shown...
}
By default, a segment's xmlType is set to element, so it is not necessary to declare it in the mapping file below.
<beanio>
<stream name="employeeFile" format="xml">
<record name="employee" class="example.Employee">
<field name="firstName" />
<field name="lastName" />
<field name="title" />
<field name="salary" />
<field name="hireDate" type="date" />
<segment name="mailingAddress" class="example.Address" xmlType="element">
<field name="street" />
<field name="city" />
<field name="state" />
<field name="zip" />
</segment>
</record>
</stream>
</beanio>
This mapping configuration can be used to process the sample XML document below. When a segment is mapped to an XML element, nillable and minOccurs will control the marshalling behavior of null bean objects in the same fashion as a field (see Section 5.7.2).
<?xml version="1.0"?> <employeeFile> <employee> <firstName>Joe</firstName> <lastName>Smith</lastName> <title>Developer</title> <salary>75000</salary> <hireDate>2009-10-12</hireDate> <mailingAddress> <street>123 Main Street</street> <city>Chicago</city> <state>IL</state> <zip>12345</zip> </mailingAddress> </employee> . . . </employeeFile>
Alternatively, if the segment's xmlType is set to none, the following XML document can be processed.
<?xml version="1.0"?> <employeeFile> <employee> <firstName>Joe</firstName> <lastName>Smith</lastName> <title>Developer</title> <salary>75000</salary> <hireDate>2009-10-12</hireDate> <street>123 Main Street</street> <city>Chicago</city> <state>IL</state> <zip>12345</zip> </employee> . . . </employeeFile>
In some cases, an XML document may contain extraneous elements that do not map directly to a bean object or property value. In these cases, a segment (without a class attribute) can be used to wrap a field or group of fields.
Extending the previous example, let's suppose the Employee bean object is modified to hold a list of addresses.
package example;
import java.util.Date;
public class Employee {
String firstName;
String lastName;
String title;
int salary;
Date hireDate;
List<Address> addressList;
// getters and setters not shown...
}
And let's further suppose that each employee's list of addresses is enclosed in a new element called addresses.
<?xml version="1.0"?> <employeeFile> <employee> <firstName>Joe</firstName> <lastName>Smith</lastName> <title>Developer</title> <salary>75000</salary> <hireDate>2009-10-12</hireDate> <addresses> <mailingAddress> <street>123 Main Street</street> <city>Chicago</city> <state>IL</state> <zip>12345</zip> </mailingAddress> </addresses> </employee> . . . </employeeFile>
The mapping file can now be updated as follows:
<beanio> <stream name="employeeFile" format="xml"> <record name="employee" class="example.Employee"> <field name="firstName" /> <field name="lastName" /> <field name="title" /> <field name="salary" /> <field name="hireDate" type="date" /> <segment name="addresses"> <segment name="mailingAddress" class="example.Address" collection="list" minOccurs="0" maxOccurs="unbounded"> <field name="street" /> <field name="city" /> <field name="state" /> <field name="zip" /> </segment> </segment> </record> </stream> </beanio>
The following table illustrates various effects using a segment based on the xmlType of a field, and the effect of minOccurs and nillable when marshalling null field values.
Field Mapping | Non-Null Field Value | Null Field Value |
---|---|---|
<segment name="wrapper"> <field name="field" /> </segment> |
<wrapper> <field>value</field> </wrapper> |
<wrapper> <field/> </wrapper> |
<segment name="wrapper" minOccurs="0">
<field name="field" />
</segment>
|
- |
|
<segment name="wrapper" nillable="true">
<field name="field" />
</segment>
|
<wrapper xsi:nil="true"/> |
|
<segment name="wrapper">
<field name="field" nillable="true" />
</segment>
|
<wrapper> <field xsi:nil="true"/> </wrapper> |
|
<segment name="wrapper">
<field name="field" minOccurs="0"/>
</segment>
|
<wrapper/> |
|
<segment name="wrapper"> <field name="field" xmlType="attribute" /> </segment> |
<wrapper field="value"/> |
<wrapper/> |
<segment name="wrapper">
<field name="field" xmlType="attribute" minOccurs="1" />
</segment>
|
<wrapper field=""/> |
|
<segment name="wrapper" minOccurs="0">
<field name="field" xmlType="attribute" minOccurs="1" />
</segment>
|
- |
|
<segment name="wrapper"> <field name="field" xmlType="text" /> </segment> |
<wrapper>value</wrapper> |
<wrapper/> |
<segment name="wrapper" nillable="true">
<field name="field" xmlType="text" />
</segment>
|
<wrapper xsi:nil="true"/> |
|
<segment name="wrapper" minOccurs="0">
<field name="field" xmlType="text"/>
</segment>
|
- |
Similarly, a segment can be used to wrap a repeating field as illustrated below.
Field Mapping | Collection | Null or Empty Collection |
---|---|---|
<segment name="wrapper">
<field name="field" collection="list"
minOccurs="0" maxOccurs="10" />
</segment name="wrapper">
|
<wrapper> <field>value1</field> <field>value2</field> </wrapper> |
<wrapper /> |
<segment name="wrapper">
<field name="field" collection="list"
minOccurs="1" maxOccurs="10" />
</wrapper>
|
<wrapper> <field/> </wrapper> |
|
<segment name="wrapper" minOccurs="0">
<field name="field" collection="list"
minOccurs="1" maxOccurs="10" />
</wrapper>
|
- |
|
<segment name="wrapper" nillable="true">
<field name="field" collection="list"
minOccurs="1" maxOccurs="10" />
</wrapper>
|
<wrapper xsi:nil="true"/> |
Since release 2.1, BeanIO includes support for Java annotations and a stream builder API.
The stream builder API can be used to programatically create a stream mapping without the need for a mapping file.
StreamFactory factory = StreamFactory.newInstance(); // create a new StreamBuilder and define its layout StreamBuilder builder = new StreamBuilder("employeeFile") .format("delimited") .parser(new DelimitedParserBuilder(',')) .addRecord(new RecordBuilder("employee") .type(Employee.class) .minOccurs(1) .addField(new FieldBuilder("type").rid().literal("EMP").ignore()) .addField(new FieldBuilder("recordType").rid( .addField(new FieldBuilder("firstName")) .addField(new FieldBuilder("lastName")) .addField(new FieldBuilder("title")) .addField(new FieldBuilder("salary")) .addField(new FieldBuilder("hireDate").format("MMddyyyy"))); // pass the StreamBuilder to the factory factory.define(builder); BeanReader in = factory.createReader("employeeFile", new File("employee.csv")); // etc...
Like in a mapping file, components are assumed to be ordered as they are added to their parent, unless at (for fields) or order (for records and groups) is explicitly set. For more information, refer to the Javadocs for the org.beanio.builder package.
Java classes can also be annotated to augment the use of the stream builder API or mapping file. Classes may be annotated with @Record, and class attributes or getter/setter methods may be annotated with @Field. Any component annotated with @Record or @Segment may be further annotated using @Fields to include fields not bound to a Java property.
Continuing our previous example, the example.Employee class could be annotated like so:
@Record(minOccurs=1} @Fields({ @Field(name="type", at=0, rid=true, literal="EMP") }) public class Employee { @Field(at=1) private String firstName; @Field(at=2) private String lastName; @Field(at=3) private String title; @Field(at=4) private String salary, @Field(at=5, format="MMddyyyy") private Date hireDate; // getters and setters... }
Using an annotated Employee class, the stream builder example can be greatly simplified:
StreamFactory factory = StreamFactory.newInstance(); StreamBuilder builder = new StreamBuilder("employeeFile") .format("delimited") .parser(new DelimitedParserBuilder(',')) .addRecord(Employee.class); factory.define(builder);
As can a mapping file:
<beanio> <stream name="employeeFile" format="delimited"> <parser> <property name="delimiter" value="," /> </parser> <record name="employee" class="example.Employee" /> </stream> </beanio>
When using annotations, it is strongly recommended to explicitly set the position (using at) for all fields and segments. BeanIO does not guarrantee the order in which annotated components are added to a layout.
Annotation settings are generally named according to their mapping file counterparts and follow the same convention as well. Refer to Appendix A for a complete explanation of all settings.
Where used, annotated records can not be overridden by mapping file components. Configuration settings other than class and descendent components will be ignored.
As of release 1.2, BeanIO can be used to read and write flat files with Spring Batch (2.1.x), a batch processing framework by SpringSource.
The class org.beanio.spring.BeanIOFlatFileItemReader implements Spring Batch's ItemReader interface and can be used to read flat files using a BeanIO stream mapping file. The following Spring bean definition shows a BeanIO item reader configuration that loads a BeanIO mapping file called 'mapping.xml' from the classpath to read a file called 'in.txt'. The location of the mapping file is set using the streamMapping property, and the name of the stream layout is specified using the streamName property.
<bean id="itemReader" class="org.beanio.spring.BeanIOFlatFileItemReader"> <property name="streamMapping" value="classpath:/mapping.xml" /> <property name="streamName" value="stream" /> <property name="resource" value="file:in.txt" /> </bean>
Similarly, the class org.beanio.spring.BeanIOFlatFileItemWriter implements Spring Batch's ItemWriter interface and can be used to write flat files using a BeanIO stream mapping file. The following Spring bean definition shows a BeanIO item writer configuration that loads a BeanIO mapping file called 'mapping.xml' from the classpath to write a file called 'out.txt'.
<bean id="itemWriter" class="org.beanio.spring.BeanIOFlatFileItemWriter"> <property name="streamMapping" value="classpath:/mapping.xml" /> <property name="streamName" value="stream" /> <property name="resource" value="file:out.txt" /> </bean>
BeanIO item readers and writers are restartable, and support many of the same properties supported by the flat file item reader and writer included with Spring Batch. Please refer to their API documentation for details.
By default, a BeanIO item reader/writer creates its own stream factory, but in cases where this could cause one or more mapping files to be loaded multiple times, it may be preferable to create a shared stream factory instance. BeanIO's org.beanio.spring.BeanIOStreamFactory class can be used to create a shared stream factory that can be injected into BeanIO item readers and writers. The following Spring beans configuration file illustrates this:
<beans> <bean id="streamFactory" class="org.beanio.spring.BeanIOStreamFactory"> <property name="streamMappings"> <list> <value>classpath:/mapping1.xml</value> <value>file:/mapping2.xml</value> </list> </property> </bean> <bean id="itemReader" class="org.beanio.spring.BeanIOFlatFileItemReader"> <property name="streamFactory" ref="streamFactory" /> <property name="streamName" value="stream" /> <property name="resource" value="file:in.txt" /> </bean> </beans>
In some cases, BeanIO behavior can be controlled by setting optional property values. Properties can be set using System properties or a property file. BeanIO will load configuration setting in the following order of priority:
The name and location of beanio.properties can be overridden using the System property org.beanio.configuration. In the following example, configuration settings will be loaded from the file named config/settings.properties, first relative to the application's working directory, and if not found, then from the root of the application's classpath.
java -Dorg.beanio.configuration=config/settings.properties example.Main
The following configuration settings are supported by BeanIO:
Property | Description | Default |
---|---|---|
org.beanio.allowProtectedAccess | Whether private and protected class variables and constructors can be accessed (i.e. make accessible using the reflection API). | true |
org.beanio.lazyIfEmpty | Whether objects are lazily instantiated if String properties are empty (and not just null). | true |
org.beanio.errorIfNullPrimitive | Whether null field values will cause an exception if bound to a primitive property. | false |
org.beanio.useDefaultIfMissing | Whether default values apply to fields missing from the stream. | true |
org.beanio.propertyEscapingEnabled | Whether property values (for typeHandler, reader and writer elements) support escape patterns for line feeds, carriage returns, tabs, etc. Set to true or false. | true |
org.beanio.nullEscapingEnabled | Whether the null character can be escaped using \0 when property escaping is enabled. Set to true or false. | true |
org.beanio.marshalDefaultEnabled | Whether a configured field default is marshalled for null property values. May be disabled for backwards compatibility by setting the value to false. | true |
org.beanio.defaultTypeHandlerLocale | Sets the default type handler locale. | Locale.getDefault() |
org.beanio.defaultDateFormat | Sets the default SimpleDateFormat pattern for date and calendar-date type fields in CSV, delimited and fixed length file formats. | DateFormat. getDateInstance() |
org.beanio.defaultDateTimeFormat | Sets the default SimpleDateFormat pattern for datetime, calendar-datetime and calendar type fields in CSV, delimited and fixed length file formats.. | DateFormat. getDateTimeInstance() |
org.beanio.defaultTimeFormat | Sets the default SimpleDateFormat pattern for time and calendar-time type fields in CSV, delimited and fixed length file formats.. | DateFormat. getTimeInstance() |
org.beanio.group.minOccurs | Sets the default minOccurs for a group. | 0 |
org.beanio.record.minOccurs | Sets the default minOccurs for a record. | 0 |
org.beanio.field.minOccurs.[format] | Sets the default minOccurs for a field by stream format. | 1 |
org.beanio.propertyAccessorFactory | Sets the method of property invocation to use. Defaults to reflection, but may be set to asm to use ASM bytecode generation for setting bean properties. (In my limited benchmarking, this can improve marshalling and unmarshalling performance by up to 20% in a 1.5 JVM. Modern JVM's (1.6+) will likely see little to no performance improvement. I strongly recommend conducting your own benchmarking tests before changing this setting.) | reflection |
org.beanio.xml.defaultXmlType | Sets the default XML type for a field in an XML formatted stream. May be set to element or attribute. | element |
org.beanio.xml.xsiNamespacePrefix | Sets the default prefix for the namespace http://www.w3.org/2001/XMLSchema-instance. | xsi |
org.beanio.xml.sorted | Whether XML fields are sorted by position (if assigned). | true |
Appendix A is the complete reference for the BeanIO 2.x XML mapping file schema. The root element of a mapping file is beanio with namespace http://www.beanio.org/2012/03. The following notation is used to indicate the allowed number of child elements:
Where noted, some attributes can be configured using a range notation. A range is expressed using the following syntax, where N and M are integer values:
N | Upper and lower boundaries are set to N. |
N-M | Lower boundery is set to N. Upper boundary is set to M. |
N+ | Lower boundary is set to N. No upper boundary. |
The beanio element is the root element for a BeanIO mapping file.
Children: property*, import*, typeHandler*, template*, stream*
The import element is used to import type handlers, templates and streams from an external mapping file. Streams declared in a mapping file being imported are not affected by global type handlers or templates declared in the file that imported it.
Attributes:
Attribute | Description | Required |
---|---|---|
resource | The name of the resource to import.
The resource name must be qualified with 'classpath:' to load the resource from the classpath, or with 'file:' to load the file relative to the application's working directory. |
Yes |
A typeHandler element is used to declare a custom field type handler that implements the org.beanio.types.TypeHandler interface. A type handler can be registered for a specific Java type, or registered for a Java type and stream format combination, or explicitly named.
Attributes:
Attribute | Description | Required |
---|---|---|
name | The type handler name. A field can always reference a type
handler by name, even if the stream format does not match the
configured type handler format attribute.
When configured, the name of a globally declared type handler must be unique within a mapping and any imported mapping files. |
One of name or type is required. |
type | The fully qualified classname or type alias to register the type handler for. If format is also set, the type handler will only be used by streams that match the configured format. | One of name or type is required. |
class | The fully qualified classname of the TypeHandler implementation. | Yes |
format | When used in conjunction with the type attribute, a type handler can be registered for a specific stream format. Set to xml, csv, delimited, or fixedlength. If not set, the type handler may be used by any stream format. | No |
Children: property*
A property element has several uses.
Attribute | Description | Required |
---|---|---|
name | The property name. | Yes |
value | The property value.
When used to customize a typeHandler or parser, default type handlers only are used to convert property text to an object value. String and Character type property values can use the following escape sequences: \\ (Backslash), \n (Line Feed), \r (Carriage Return), \t (Tab), \0 (Null) and \f (Form Feed). A backslash preceding any other character is ignored. |
Yes |
A property element, when used as child of a record or segment element, can be used to set constant values on a record or bean object that do not map to a field in the input or output stream. The following additional attributes are accepted in this scenario:
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
getter | The getter method used to retrieve the property value from its parent bean class. By default, the getter method is determined through introspection using the property name. | No | * |
setter | The setter method used to set the property value on its parent bean class. By default, the setter method is determined through introspection using the property name. | No | * |
rid | Record identifier indicator for marshalling/writing only. Set to true if this property is used to identify the record mapping configuration used to marshall a bean object. More than one property or field can be used for identification. Defaults to false. | No | * |
type | The fully qualified class name or type alias of the property value. By default, BeanIO will derive the property type from the bean class. This attribute can be used to override the default or may be required if the bean class is of type Map. | No | * |
typeHandler | The name of the type handler to use for type conversion. By default, BeanIO will select a type handler based on type when set, or through introspection of the property's parent bean class. | No | * |
format | The decimal format pattern for Number type properties, or the simple
date format pattern for Date type properties.
The format value can accessed by any custom type handler that implements ConfigurableTypeHandler. |
No | * |
The template element is used to create reusable lists of bean properties.
Note that templates are "expanded" at the time they are included. This means an imported template that relies on property substitution will use property values from the mapping file that included it and not the mapping file where the template was declared.
Attributes:
Attribute | Description | Required |
---|---|---|
name | The name of the template. Template names must be unique within a mapping file and any imported mapping files. | Yes |
Children: ( field | property | segment | include )*
The include element is used to include a template in a record, segment, or another template.
Attributes:
Attribute | Description | Required |
---|---|---|
template | The name of the template to include. | Yes |
offset | The offset added to field positions included by the template. Defaults to 0. | No |
A stream element defines the record layout of an input or output stream.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the stream. | Yes | * |
format | The stream format. Either xml, csv, delimited or fixedlength | Yes | * |
mode | By default, a stream mapping can be used for both reading input streams and writing
output streams, called readwrite mode. Setting mode to read or
write instead, respectively restricts usage to a BeanReader or a
BeanWriter only, but relaxes some validations on the mapping configuration.
When mode is set read, a bean class does not require getter methods. When mode is set write, a bean class may be abstract or an interface, and does not require setter methods. |
No | * |
resourceBundle | The name of the resource bundle for customizing error messages. | No | * |
strict | When set to true, BeanIO will calculate and enforce record ordering
based on the order records are declared. The record order attribute can
still be used to override a particular section of the stream.
When set to true, BeanIO will also calculate and enforce record lengths based on configured fields and their occurrences. The record minLength and maxLength attributes can still be used to override BeanIO defaults. Defaults to fales. |
No | * |
minOccurs | The minimum number of times the record layout must be read from an input stream. Defaults to 0. | No | * |
maxOccurs | The maximum number of times the record layout can repeat when read from an input stream. Defaults to 1. | No | * |
occurs | An alternative to specifying both minOccurs and maxOccurs that uses range notation. | No | * |
ignoreUnidentified Records | If set to true, BeanIO will skip records that cannot be identified, otherwise an UnidentifiedRecordException is thrown. This feature is not recommended for use with record groups, since a record sequencing error could cause large portions of a stream to go unprocessed without any exception. | No | * |
xmlType | The XML node type mapped to the stream. If not specified or set to element, the stream is mapped to the root element of the XML document being marshalled or unmarshalled. If set to none, the XML input stream will be fully read and mapped to a child group or record. | No | xml |
xmlName | The local name of the XML element mapped to the stream. Defaults to the stream name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to the stream. Defaults to '*' which will ignore namespaces while marshalling and unmarshalling. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
Children: parser?, typeHandler*, ( record | group )+
A parser element is used to customize or replace the default record parser factory for a stream.
Attributes:
Attribute | Description | Required |
---|---|---|
class | The fully qualified class name of the
org.beanio.stream.RecordParserFactory implementation
to use for this stream. If not specified, one of the following default factories is
used based on the stream format: csv - org.beanio.stream.csv.CsvRecordParserFactory delimited - org.beanio.stream.delimited.DelimitedRecordParserFactory fixedlength - org.beanio.stream.fixedlength.FixedLengthRecordParserFactory xml - org.beanio.stream.xml.XmlRecordParserFactory Overriding the record parser factory for XML is not supported (but also not prevented). |
No |
Children: property*
A group element is used to group records together for validating occurrences of the group as a whole.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the group. | Yes | * |
class | The fully qualified class name of the bean object mapped to this group. A class
may be bound to a group when its marshalled form spans multiple consecutive records.
During umarshalling, if any record in the group fails validation, an InvalidRecordGroupException is thrown. |
No | * |
value |
The name of a child component (typically a record) to return in lieu of an assigned class.
There can be only one iteration of the named value. For example, if a repeating segment bound to a collection contains a repeating field (also bound to a collection), the segment can be targeted, but the field cannot. |
No | * |
collection | The collection type for repeating groups bound to a parent bean object (configured on
a group). The value may be set to any fully qualified class name
assignable to java.util.Collection,
or to one of the collection type aliases: list, set or array.
A collection type can only be set if class is also set.
BeanIO will not derive the collection type from it's parent bean object. |
No | * |
getter | The getter method used to get the bean object bound to this group from it's parent. By default, the getter method is determined through introspection using the group name. Ignored if class is not set. | No | * |
setter | The setter method used to set the bean object bound to this group on the bean object of it's parent. By default, the setter method is determined through introspection using the group name. Ignored if class is not set. | No | * |
order | The order this group must appear within its parent group or stream.
If strict is set to true at the stream level, order will default to the order assigned to its preceding sibling plus one (i.e. the record or group that shares the same parent), or 1 if this group is the first child in its parent group or stream. If strict is false, defaults to 1. If order is explicitly set for one group, it must be set for all other siblings that share the same parent. |
No | * |
minOccurs | The minimum number of occurences of this group within its parent group or stream. Defaults to 1. | No | * |
maxOccurs | The maximum number of occurences of this group within its parent group or stream. Defaults to unbounded. | No | * |
occurs | An alternative to specifying both minOccurs and maxOccurs that uses range notation. | No | * |
xmlType | The XML node type mapped to this group. If not specified or set to element, this group is mapped to an XML element. When set to none, this group is used only to define expected record sequencing. | No | xml |
xmlName | The local name of the XML element mapped to this group. Defaults to the group name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this group. Defaults to the namespace declared for the parent stream or group definition. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace is used (i.e. xmlns="..."). | No | xml |
Children: record*
A record is used to define a record mapping within a stream.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the record. | Yes | * |
class | The fully qualified class name of the bean object mapped to this record.
If set to map or any java.util.Map implementation, a Map object will be used with field names for keys and field values for values. If set to list, set, collection, or any java.util.Collection implementation, child fields are added to the declared collection, including null values for missing or null fields. If neither class or target is set, a BeanReader will fully validate the record, but no bean object will be returned and the reader will continue reading the next record. |
No | * |
value |
The name of a child segment or field to return in lieu of an assigned class.
There can be only one iteration of a named value. For example, if a repeating segment bound to a collection contains a repeating field (also bound to a collection), the segment can be targeted, but the field cannot. If neither class or value is set, a BeanReader will fully validate the record, but no bean object will be returned and the reader will continue reading the next record. |
No | * |
getter | The getter method used to get the bean object bound to this record from it's parent. By default, the getter method is determined through introspection using the record name. Ignored if class is not set. | No | * |
setter | The setter method used to set the bean object bound to this record on the bean object of it's parent. By default, the setter method is determined through introspection using the record name. Ignored if class is not set. | No | * |
collection | The collection type for repeating records bound to a parent bean object (configured on
a group). The value may be set to any fully qualified class name
assignable to java.util.Collection or java.util.Map,
or to one of the collection type aliases: list, set, map or array.
A collection type can only be set if class or target is also set.
BeanIO will not derive the collection type from it's parent bean object. |
No | * |
key | The name of a descendant field to use for the Map key when collection is assignable to a java.util.Map. | No | * |
order | The order this record must appear within its parent group or stream.
If strict is set to true at the stream level, order will default to the order assigned to its preceding sibling plus one (i.e. the record or group that shares the same parent), or 1 if this record is the first child in its parent group or stream. If strict is false, defaults to 1. If order is explicitly set for one record, it must be set for all other siblings that share the same parent. |
No | * |
minOccurs | The minimum number of occurences of this record within its parent group or stream. Defaults to 0. | No | * |
maxOccurs | The maximum number of occurrences of this record within its parent group or stream. Defaults to unbounded. | No | * |
occurs | An alternative to specifying both minOccurs and maxOccurs that uses range notation. | No | * |
lazy | If set to true, the class or collection bound to this record will only be instantiated if at least one child attribute is not null or the empty String. Defaults to false. [Only applies if this record is bound to an attribute of a parent group.] | No | * |
template | The name of the template to include. The template is added to the record layout before any child of this record. | No | * |
ridLength | The expected length of this record for identifying it. The value uses
range notation.
If the stream format is delimited or csv, record length is measured by number of fields. If the stream format is fixedlength, record length is measured in characters. |
No | csv, delimited, fixedlength |
minLength | If the stream format is delimited or csv, minLength is the minimum number
of fields required by this record. If strict is true, defaults to the number of fields defined
for the record, otherwise 0.
If the stream format is fixedlength, minLength is the minimum number of characters required by this record. If strict is true, defaults to the sum of all field lengths definied for the record, otherwise 0. |
No | csv, delimited, fixedlength |
maxLength | If the stream format is delimited or csv, maxLength is the maximum number
of fields allowed by this record. If strict is true, defaults to the number of fields defined
for the record, or if no fields are declared or strict is false, then unbounded.
If the stream format is fixedlength, maxLength is the maximum number of characters allowed by this record. If strict is true, defaults to the sum of all field lengths defined for the record, or if no fields are declared or strict is false, then unbounded. |
No | csv, delimited, fixedlength |
xmlName | The local name of the XML element mapped to this record. Defaults to the record name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this record. Defaults to the namespace declared for this record's parent group or stream. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace is used (i.e. xmlns="..."). | No | xml |
Children: ( field | property | segment | include )*
A segment is used to bind groups of fields to a nested bean object, or to validate repeating groups of fields, or in an XML formatted stream, to wrap one or more fields in an element.
Attributes:
Attribute | Description | Required | Format(s) |
---|---|---|---|
name | The name of the segment. If the segment is bound to a bean object, the segment name is used for the name of the bean property unless a getter or setter is set. | Yes | * |
class | The fully qualified class name of the bean object bound to this segment. If set to map or any java.util.Map implementation, a Map object will be used with field/segment names for keys and field/segment values for values. | No | * |
value |
The name of a child segment or field to return in lieu of an assigned class. If set, all other descendants are not bound to the parent bean property. | No | * |
getter | The getter method used to get the bean object bound to this segment from it's parent. By default, the getter method is determined through introspection using the segment name. Ignored if class is not set. | No | * |
setter | The setter method used to set the bean object bound to this segment on the bean object of it's parent. By default, the setter method is determined through introspection using the segment name. Ignored if class is not set. | No | * |
collection | The collection type for repeating segments bound to a parent bean object.
The value may be set to any fully qualified class name
assignable to java.util.Collection or java.util.Map,
or to one of the collection type aliases: list, set, map
or array.
A collection type can only be set if class or target is also set.
BeanIO will not derive the collection type from it's parent bean object. There are a few restrictions specific to repeating segments in any "flat" format (delimited, CSV or fixedlength):
|
No | * |
key | The name of a descendant field to use for the Map key when collection is assignable to a java.util.Map. | No | * |
minOccurs | The minimum consecutive occurrences of this segment. Defaults to 1.
If minOccurs is 0, a null bean object bound to this segment will not be marshalled (unless a subsequent field is marshalled for CSV, delimited and fixed legnth stream formats) During unmarshalling, if the configured minimum occurrences is not met, an InvalidRecordException is thrown. |
No | * |
maxOccurs | The maximum consecutive occurrences of this segment. By default,
maxOccurs is set to minOccurs or 1, whichever is greater.
If there is no limit to the number of occurrences, the value may
be set to unbounded.
If set for a CSV, delimited or fixed length stream, the value can only exceed minOccurs if the segment appears at the end of a record. Maximum occurrences is not used for validation. If bounded, the size of a bound collection will not exceed the configured value, and additional occurrences are ignored. |
No | * |
occurs | An alternative to specifying both minOccurs and maxOccurs that uses range notation. | No | * |
occursRef | The name of a preceding field in the same record that controls the number of occurrences of this segment. If the controlling field is not bound to a separate property (i.e. ignore="true"), its automatically set based on the size of the segment collection during marshalling. | No | csv, delimited, fixedlength |
lazy | If set to true, the class or collection bound to this segment will only be instantiated if at least one child attribute is not null or the empty String. Defaults to false. This functionality differs from minOccurs in that the fields may still exist in the input stream. | No | * |
template | The name of the template to include. The template is added to the layout before any child of this segment. | No | * |
xmlType | The XML node type mapped to this segment. If not specified or set to element, this bean is mapped to an XML element. If set to none, children of this segment are expected to be contained by this segment's parent. | No | xml |
xmlName | The local name of the XML element mapped to this segment. Defaults to the segment name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this segmnet. Defaults to the namespace declared for the parent record or segmnet. | No | xml |
xmlPrefix | The namespace prefix assigned to the declared xmlNamespace for marshalling XML. If not specified, the default namespace is used (i.e. xmlns="..."). | No | xml |
nillable | Set to true if the W3C Schema Instance attribute nil should be set to true when marshalling a null bean object. Defaults to false. During unmarshalling, a nillable element will cause an InvalidRecordException if nillable is false.. | No | xml |
Children: ( field | property | segment | include )*
A field element is used to bind a field belonging to a record or segment to a bean property.
Attributes:
Attribute | Description | Required | Formats |
---|---|---|---|
name | The name of field. Unless a getter and/or setter is defined, the field name is used for the bean property name. | Yes | * |
getter | The getter method used to retrieve the property value for this field from its parent bean class. By default, the getter method is determined through introspection using the field name. | No | * |
setter | The setter method used to set the property value of this field on its parent bean class.
By default, the setter method is determined through introspection using
the field name.
If the field is a constructor argument, setter may be set to '#N', where N is the position of the argumnet in the constructor beginning at 1. |
No | * |
rid | Record identifier indicator. Set to true if this field is used to identify a record.
More than one field can be used to identify a record. Defaults to false.
Record identifying fields must have regex or literal configured to match a record, unless the field is a named XML element or attribute. |
No | * |
at position |
For delimited and CSV formatted streams, position (or at) is the index of the field
within the record, beginning at 0. And for fixed length formatted streams, position
is the index of the first character of the field within the record, beginning at 0.
Negative numbers can be used to indicate the position is relative to the end of the record. For example, the position -2 indicates the second to last field in a delimited record. If the field repeats, or the field belongs to a segment that repeats, position should be set based on the first occurrence of the field in a record. A position must be specified for all fields in a record, or for none at all. If positions are not specified, BeanIO will automatically calculate field positions based on the order in which the fields are defined in the mapping file. Position, if defined, is also used in XML formatted streams for ordering fields within their parent record or segment. This is typically not needed when using a mapping file, but can be useful when using annotations. If a position is configured for a parent segment (with annotations), the positions declared for fields added to the segment are assumed to be relative to their parent. |
No | * |
until | The maximum position of the field in the record. Only applies to fields that repeat where the number of occurrences is indeterminate (i.e. maxOccurs is greater than minOccurs). until must always be specified relative to the end of the record, and is therefore always a negative number. | No | csv, delimited, fixedlength |
trim | Set to true to trim the field text before validation and type conversion. Defaults to false. | No | * |
lazy | Set to true to convert empty field text to null before type conversion. For repeating fields bound to a collection, the collection will not be created if all field values are null or the empty String. Defaults to false. | No | * |
required | Set to true if this field is required. If a field is required and its field text is empty, a BeanReader will throw an InvalidRecordException when reading the record. Defaults to false. | No | * |
minLength | The minimum length of the field text before type conversion. Minimum length is only validated if the field length is greater than 0. Defaults to 0. | No | * |
maxLength | The maximum length of the field text before type conversion. Defaults to unbounded. | No | * |
regex | The regular expression pattern the field text must match. | No | * |
literal | Sets the literal or constant value of this field. When unmarshalled, an InvalidRecordException is thrown if the field text does not exactly match the literal value. | No | * |
default | The default value of this field.
When unmarshalling a stream, this value is set on the bean object when the field text is null or the empty string. And when marshalling, the default value is used when the property value is null or ignore is set to true (unless disabled). A default value is converted to a Java object using the same type handler configured for the field. |
No | * |
type | The fully qualified class name or type alias of the field value. By default, BeanIO will derive the field type from its parent bean class. This attribute can be used to override the default, or may be needed when its parent class is of type java.util.Map. | No | * |
collection | If a repeating field is bound to a collection object, collection
is the fully qualified class name of the java.util.Collection implementation,
or a collection type alias. When a collection is configured, the type attribute
is used to declare the property type of an item stored in the collection.
May be set to array if the collection type is a Java array.
Repeating fields bound to a property value must have collection configured. BeanIO will not derive the collection type from a field's parent bean class. |
No | * |
minOccurs | The minimum consecutive occurrences of this field in a record. Defaults to 1, with one
exception: a field in an XML formatted stream bound to an attribute defaults to 0.
minOccurs controls whether a field is marshalled for a null field value, and whether the field must be present during unmarshalling. If minOccurs is 1 or greater and the field is not present during unmarshalling, an InvalidRecordException is thrown. |
No | * |
maxOccurs | The maximum consecutive occurrences of this field in a record. By default,
maxOccurs is set to minOccurs or 1, whichever is greater. If overridden
for a non-XML stream format, the value can only exceed minOccurs if this is the last field
in the record. The value may be set to unbounded if there is no limit to the
number of occurrences of this field.
Maximum occurrences is not used for validation. When bounded, the size of a collection will not exceed the configured value, and additional occurrences are ignored. |
No | * |
occurs | An alternative to specifying both minOccurs and maxOccurs that uses range notation. | No | * |
occursRef | The name of a preceding field in the same record that controls the number of occurrences of this field. If the controlling field is not bound to a separate property (i.e. ignore="true"), its automatically set based on the size of the field collection during marshalling. | No | csv, delimited, fixedlength |
format | The decimal format pattern for java.lang.Number field values, or the simple
date format pattern for java.util.Date field properties.
The format value can also be accessed by any custom type handler that implements org.beanio.types.ConfigurableTypeHandler. |
No | * |
typeHandler | The name of the type handler to use for type conversion. By default, BeanIO will select a type handler based on the field type when set, or through introspection of this field's parent bean class. | No | * |
ignore | Set to true if this field is not a property of it's parent bean class. Defaults to false. Note that any configured validation rule on an ignored field is still performed. | No | * |
length | The padded length of this field measured in characters. Length is required for fixed
length formatted streams, and can be set for fields in other stream formats (along with
a padding character) to enable field padding.
The length of the last field in a fixed length record may be set to unbounded to disable padding and allow a single variable length field at the end of the otherwise fixed length record. |
Yes1 | * |
padding | The character used to pad this field. For fixed length formatted streams,
padding defaults to a space. For non-fixed length formatted streams,
padding is disabled unless a padding character and length are specified.
If padding is enabled, the required field attribute has some control over the marshalling and unmarshalling of null values. When unmarshalling a field consisting of all spaces in a fixed length stream, if required is false, the field is accepted regardless of the padding character. If required is true, a required field validation error is triggered. And when marshalling a null field value, if required is false, the field text is formatted as spaces regardless of the configured padding character. In other stream formats that are not fixed length, null field values are unmarshalled and marshalled as empty strings when required is false. When required is true, unmarshalling an empty string will trigger a required field validation error, and marshalling a null value will fill the field text with the padding character up to the padded length of the field. |
No | * |
keepPadding | Set to true if field padding should not be removed when unmarshalling a fixed length field. Defaults to false. | No | fixedlength |
lenientPadding | Set to true to disable enforcement of the padded field length when unmarshalling a fixed length field. Defaults to false. | No | fixedlength |
justify align |
The justification (i.e. alignment) of the field text within its padding. Either left or right. Defaults to left. | No | * |
xmlType | The XML node type mapped to this field. The type can be set to element (default)
to map this field to an XML element, attribute to map to an XML attribute, or text
to map the field value to the enclosed text of it's parent record or segment.
When set to text, xmlName and xmlNamespace have no effect. |
No | xml |
xmlName | The local name of the XML element or attribute mapped to this field. Defaults to the field name. | No | xml |
xmlNamespace | The namespace of the XML element mapped to this field. Defaults to the namespace configured for it's immediate parent record or segment. | No | xml |
xmlPrefix | The namespace prefix assigned to the configured xmlNamespace for marshalling XML. If not specified, the default namespace (i.e. xmlns="...") is used. | No | xml |
nillable | Set to true if the W3C Schema Instance attribute nil should be set to true when the marshalled field value is null. Defaults to false. Unmarshalling a non-nillalbe field where nil="true" will cause an InvalidRecordException. | No | xml |
1Only required for fixed length fields. If a literal value is supplied for a fixed length field, length will default to the length of the literal value.
The following table shows the message parameters used to format an error message for each configurable validation rule.
Type | Rule Name | Index | Value |
---|---|---|---|
Record Error | malformed | 0 | Line Number |
unidentified | 0 | Line Number | |
unexpected | 0 | Line Number | |
1 | Record Label/Name | ||
minLength | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Minimum Length | ||
3 | Maximum Length | ||
maxLength | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Minimum Length | ||
3 | Maximum Length | ||
Field Error | required | 0 | Line Number |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
nillable | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
minLength | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
4 | Minimum Length | ||
5 | Maximum Length | ||
maxLength | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
4 | Minimum Length | ||
5 | Maximum Length | ||
length | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
4 | Fixed Length Field Length | ||
regex | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
4 | Regular Expression Pattern | ||
type | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
4 | TypeConversionException error message. | ||
literal | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field Label/Name | ||
3 | Field Text | ||
4 | Literal value | ||
minOccurs | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field or Bean Label/Name | ||
3 | - | ||
4 | Minimum occurrences | ||
5 | Maximum occurences | ||
maxOccurs | 0 | Line Number | |
1 | Record Label/Name | ||
2 | Field or Bean Label/Name | ||
3 | - | ||
4 | Minimum occurrences | ||
5 | Maximum occurences |
This appendix illustrates typical changes required to update an 1.x mapping file to 2.x.
Given the following 1.x mapping file:
<beanio xmlns="http://www.beanio.org/2011/01" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.beanio.org/2011/01 http://www.beanio.org/2011/01/mapping.xsd"> <stream name="employees" format="delimited"> <reader> <property name="delimiter" value="," /> </reader> <writer> <property name="delimiter" value="," /> </writer> <record name="header" class="example.Header" maxOccurs="1"> <field name="recordType" rid="true" literal="H" /> <field name="fileDate" format="yyyy-MM-dd" /> </record> <record name="employee" class="example.Employee" minOccurs="0" minLength="6" maxLength="7"> <field name="recordType" rid="true" literal="D" /> <field name="firstName" /> <field name="lastName" /> <bean name="address" class="example.Address" > <field name="city" /> <field name="state" /> <field name="zip" /> </bean> <field name="phoneNumber" /> </record> <record name="trailer" class="example.Trailer" maxOccurs="1"> <field name="recordType" rid="true" literal="T" /> <field name="recordCount" /> </record> </stream> <stream name="contacts" format="xml" ordered="false"> <record name="person" class="example.Person" minOccurs="0"> <field name="firstName" /> <field name="lastName" minOccurs="1" /> <field name="phone" collection="list" minOccurs="0" maxOccurs="5" xmlWrapper="phoneList" /> </record> <record name="company" class="example.Person" minOccurs="0"> <field name="companyName" minOccurs="1" /> <field name="phone" /> </record> </stream> </beanio>
The following 2.x mapping file can be created:
<!-- Namespace updated --> <beanio xmlns="http://www.beanio.org/2012/03" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.beanio.org/2012/03 http://www.beanio.org/2012/03/mapping.xsd"> <!-- Use 'strict' to have BeanIO calculate and enforce default record lengths and ordering --> <stream name="employees" format="delimited" strict="true"> <!-- Combine 'reader' and 'writer' elements into 'parser' --> <parser> <property name="delimiter" value="," /> </parser> <!-- 'minOccurs' defaults to 0 if not specified --> <record name="header" class="example.Header" minOccurs="1" maxOccurs="1"> <field name="recordType" rid="true" literal="H" /> <field name="fileDate" format="yyyy-MM-dd" /> </record> <!-- 'minLength/maxLength' not needed, see 'phoneNumber' field below for explanation --> <record name="employee" class="example.Employee" minOccurs="0"minLength="6" maxLength="7"> <field name="recordType" rid="true" literal="D" /> <field name="firstName" /> <field name="lastName" /> <!-- Change 'bean' elements to 'segment' elements --> <segment name="address" class="example.Address" > <field name="city" /> <field name="state" /> <field name="zip" /> </segment> <!-- Use 'minOccurs' to denote optional fields at the end of a record. When used with -- 'strict', there is no need to set 'minLength' and 'maxLength' on the record, unless -- you are not mapping every field --> <field name="phoneNumber" minOccurs="0" /> </record> <record name="trailer" class="example.Trailer" minOccurs="1" maxOccurs="1"> <field name="recordType" rid="true" literal="T" /> <field name="recordCount" /> </record> </stream> <!-- Records are not ordered by default. --> <stream name="contacts" format="xml"ordered="false"> <!-- minOccurs defaults to 0 --> <record name="person" class="example.Person"minOccurs="0"> <!-- Optional XML elements must set minOccurs to 0 --> <field name="firstName" minOccurs="0"/> <field name="lastName" minOccurs="1" /> <!-- Use a 'segment' instead of an 'xmlWrapper' --> <segment name="phoneList" minOccurs="0"> <field name="phone" collection="list" minOccurs="0" maxOccurs="5"xmlWrapper="phoneList"/> </segment> </record> <record name="company" class="example.Person" minOccurs="0"> <field name="companyName"minOccurs="1"/> <!-- Optional XML elements must set minOccurs to 0 --> <field name="phone" minOccurs="0"/> </record> </stream> </beanio>