Skip to content

Read flat files with the ADO.NET IDataReader interface.

License

Notifications You must be signed in to change notification settings

freein/FlatFiles

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

FlatFiles

Work with flat files using fluent type mapping, with support for ADO.NET.

Download using NuGet: FlatFiles

Overview

A lot of us still need to work with flat files (e.g. CSV or fixed-length) whether because we're interfacing with older systems or because we're running one-time migration scripts. It's a pain that there's nothing built into .NET for treating flat files like a database table.

FlatFiles makes it easy to read and write flat files in many different ways. It supports type mappers for directly reading and writing with data objects, using a style similar to Entity Framework Code First. You can also go back and forth between files and DataTables. It also supports the ability to expose a file using the IDataReader interface, for working with the low-level ADO.NET classes. If you really want to, you can read and write values with raw object arrays. FlatFiles also supports a large number of options for customizing how files are interpreted, to support the most common types of flat files.

FlatFiles also supports reading Excel files.

Type Mappers

Using the type mappers, you can directly read file contents into your classes:

var mapper = SeparatedValueTypeMapper.Define<Customer>();
mapper.Property(c => c.CustomerId).ColumnName("customer_id");
mapper.Property(c => c.Name).ColumnName("name");
mapper.Property(c => c.Created).ColumnName("created").InputFormat("yyyyMMdd");
mapper.Property(c => c.AverageSales).ColumnName("avg_sales");
var customers = mapper.Read(@"C:\path\to\file.csv");

Writing to a file is just as easily:

mapper.Property(c => c.Created).OutputFormat("yyyyMMdd");
mapper.Property(c => c.AverageSales).OutputFormat("N2");
mapper.Write(@"C:\path\to\file2.csv", customers);

Note that the mapper assumes the order Property is called the first time for a particular property matches the order the columns appear in the file. Additional references to the property have no impact on the expected order.

Schemas

Type mapping internally defines a schema, which defines the name, order and type of each column in the flat file. In order to use the other classes in FlatFiles, you must define the schema explicitly. For instance, this is how we would define a CSV file schema:

SeparatedValueSchema schema = new SeparatedValueSchema();
schema.AddColumn(new Int64Column("customer_id"))
      .AddColumn(new StringColumn("name"))
      .AddColumn(new DateTimeColumn("created") { InputFormat = "yyyyMMdd", OutputFormat = "yyyyMMdd" })
      .AddColumn(new DoubleColumn("avg_sales") { OutputFormat = "N2" });

Or, if the schema is for a fixed-length file:

FixedLengthSchema schema = new FixedLengthSchema();
schema.AddColumn(new Int64Column("customer_id"), 10)
  .AddColumn(new StringColumn("name"), 255)
  .AddColumn(new DateTimeColumn("created") { InputFormat = "yyyyMMdd", OutputFormat = "yyyyMMdd" }, 8)
  .AddColumn(new DoubleColumn("avg_sales") { OutputFormat = "N2" }, 10);

The FixedLengthSchema class is the same as the SeparatedValueSchema class, except it associates a Window to each column. A Window records the Width of the column in the file. It also allows you to specify the Alignment (left or right) in cases where the value doesn't fill the entire width of the column (the default is left aligned). The FillCharacter property can be used to say what character is used as padding.

Some fixed-length files may have columns that are not used. The fixed-length schema doesn't provide a way to specify a starting index for a column. Simply define "ignored" columns for gaps in the input file.

Schemas can be defined for Excel files, too, using the ExcelSchema. The code is very similar to defining a schema for a separated value file.

The type mappers provide a GetSchema method to allow you to define schemas using a fluent syntax.

Handling Nulls

By default, FlatFiles will treat blank or empty strings as null. If nulls are represented differently in your file, you can pass a custom INullHandler to the schema. If it is a fixed value, you can use the ConstantNullHandler class.

DateTimeColumn dtColumn = new DateTimeColumn("created") { NullHandler = ConstantNullHandler.For("NULL") };

Or, if you are using Type Mappers, you can simply use the NullValue or NullHandler methods.

mapper.Property(c => c.Created).ColumnName("created").NullValue("NULL");

You can implement the INullHandler interface if you need to support something more complex.

SeparatedValueReader

If you are working with delimited files, such as comma-separated or tab-separated files, you will want to use the SeparatedValueReader class. The constructor accepts a combination of a file name (or stream), a SeparatedValueSchema object and/or a SeparatedValueOptions object.

When parsing separated files, you can surround fields with double or single quotes. This way you can include the separator string within the field. Of course, this won't work if the separator is the quote character itself!

The SeparatedValueOptions class supports a Separator property for specifying the string that separates your fields. A comma (,) is the default separator.

The RecordSeparator property specifies what character sequence is used to separate records. By default, this is Environment.NewLine (\r\n). This is useful if you are working on files from other systems, such as Linux (\n) or Macintosh (\r).

The IsFirstRecordSchema property tells the reader to treat the first record in the file as the schema. Since the types of the fields cannot be determined from a file, they are assumed to be strings. If you provide the schema to the constructor, it will be used instead, and the first record will simply be skipped. By default, this property is set to false.

SeparateValueWriter

If you want to build a delimited file, you can use the SeparatedValueWriter class. It accepts the same schema and options arguments. If the SeparatedValueOptions's IsFirstRecordSchema property is set to true, the schema will be written to the file upon writing the first record.

FixedLengthReader

If you are working with files whose fields are a fixed-length you will want to use the FixedLengthReader class. The constructor accepts a combination of a file name (or stream), a FixedLengthSchema object and/or a FixedLengthOptions object.

The FixedLengthOptions class supports a FillCharacter property to specify which character is used as a fill character in the columns. A space ( ) is the default.

It also supports a RecordSeparator property for specifying what value indicates the end of a record. By default, this is Environment.NewLine (\r\n). This is useful if you are working on files from other systems, such as Linux (\n) or Macintosh (\r).

FixedLengthWriter

If you want to build a fixed-length file, you can use the FixedLengthWriter class. It accepts the same schema and options arguments used to read files. If you want to control the alignment of the columns, you can specify the FixedAlignment for each column when defining the schema. This will control whether padding is put to the right or the left of the value.

ExcelReader

If you are working with an Excel file, you will want to use the ExcelReader class. The constructor accepts a combination of a file name, an ExcelSchema object and an ExcelOptions object.

The ExcelOptions class is used to specify the name of the worksheet to read the data from. It also includes fields for specifying the row/column range. By default, the reader will try to read the entire worksheet.

The IsFirstRecordSchema property tells the reader to treat the first record in the file as the schema. If you provide a schema, the reader will simply skip the first row. Although, it probably makes more sense to simply skip the header in the case of Excel (using the StartingRow option). Otherwise, Excel will try to interpret the type of the column by looking at the values in the worksheet. If you don't provide a schema and set the IsFirstRecordSchema property to false, ExcelReader will simply return the row values in whatever way Excel interprets them.

If your schema's column types do not match what Excel interprets the column type, ExcelReader will do its best to convert to the desired value.

DataTables

If you are using DataTables, you can read and write to a DataTable using the ReadFlatFile and WriteFlatFile extension methods. Just pass the corresponding reader or writer object.

DataTable customerTable = new DataTable("Customer");
using (IReader reader = new SeparatedValueReader(@"C:\path\to\file.csv", schema))
{
    customerTable.ReadFlatFile(reader);
}

FlatFileReader

For low-level file reading, you can use the FlatFileReader class. It provides an IDataReader interface to the records in the file, making it compatible with other ADO.NET interfaces.

// The DataRead Approach
using (FlatFileReader reader = new FlatFileReader(new SeparatedValueReader(@"C:\path\to\file.csv", schema))
{
    List<Customer> customers = new List<Customer>();
    while (reader.Read())
    {
        Customer customer = new Customer();
        customer.CustomerId = reader.GetInt32(0);
        customer.Name = reader.GetString(1);
        customer.Created = reader.GetDateTime(2);
        customer.AverageSales = reader.GetDouble(3);
        customers.Add(customer);
    }
    return customers;
}

License

This is free and unencumbered software released into the public domain.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

In jurisdictions that recognize copyright laws, the author or authors of this software dedicate any and all copyright interest in the software to the public domain. We make this dedication for the benefit of the public at large and to the detriment of our heirs and successors. We intend this dedication to be an overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For more information, please refer to http://unlicense.org

About

Read flat files with the ADO.NET IDataReader interface.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • C# 99.9%
  • Batchfile 0.1%