Example using IRS MeF XML Files (In demo Directory)

This example demonstrates loading a sample IRS Modernized e-File tax return using a Joost STX transformation. The data is a single, complex XML file.

The U.S. Internal Revenue Service (IRS) made a significant commitment to XML and specifies its use in its Modernized e-File (MeF) system. In MeF, each tax return is an XML document with a deep hierarchical structure that closely reflects the particular form of the underlying tax code.

XML, XML Schema, and stylesheets all play a role in the MeF data representation and business workflow. The actual XML data is extracted from a ZIP file attached to a MIME “transmission file” message. For more information about MeF, see Modernized e-File (Overview) on the IRS website.

The sample XML document, RET990EZ_2006.xml, is about 350 KB in size and contains two main elements:

  • ReturnHeader
  • ReturnData

The <ReturnHeader> element contains general details about the tax return such as the taxpayer’s name, the tax year of the return, and the preparer. The <ReturnData> element contains multiple sections with specific details about the tax return and associated schedules.

The following is an abridged sample of the XML file.

  <?xml version="1.0" encoding="UTF-8"?>
  <Return returnVersion="2006v2.0"
     xmlns="http://www.irs.gov/efile"
     xmlns:efile="http://www.irs.gov/efile"
     xsi:schemaLocation="http://www.irs.gov/efile"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
     <ReturnHeader binaryAttachmentCount="1">
       <ReturnId>AAAAAAAAAAAAAAAAAAAA</ReturnId>
       <Timestamp>1999-05-30T12:01:01+05:01</Timestamp>
       <ReturnType>990EZ</ReturnType>
       <TaxPeriodBeginDate>2005-01-01</TaxPeriodBeginDate>
       <TaxPeriodEndDate>2005-12-31</TaxPeriodEndDate>
       <Filer>
         <EIN>011248772</EIN>
         ... more data ...
       </Filer>
       <Preparer>
         <Name>Percy Polar</Name>
         ... more data ...
       </Preparer>
       <TaxYear>2005</TaxYear>
     </ReturnHeader>
     ... more data ...
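
Because the sample declares http://www.irs.gov/efile as its default namespace, every element lookup has to be namespace-qualified. The following Python sketch is not part of the demo (which uses a Joost STX transformation); it only illustrates reading a few <ReturnHeader> fields with the standard library. The function name read_header, the SAMPLE fragment, and the choice of fields are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the xmlns attribute of the sample <Return> element.
NS = {"efile": "http://www.irs.gov/efile"}

# Hypothetical, heavily trimmed stand-in for RET990EZ_2006.xml.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<Return returnVersion="2006v2.0" xmlns="http://www.irs.gov/efile">
  <ReturnHeader binaryAttachmentCount="1">
    <ReturnId>AAAAAAAAAAAAAAAAAAAA</ReturnId>
    <ReturnType>990EZ</ReturnType>
    <Preparer><Name>Percy Polar</Name></Preparer>
    <TaxYear>2005</TaxYear>
  </ReturnHeader>
</Return>
"""


def read_header(xml_bytes):
    """Return selected ReturnHeader fields as a dict (illustrative helper)."""
    root = ET.fromstring(xml_bytes)
    header = root.find("efile:ReturnHeader", NS)
    if header is None:
        raise ValueError("no ReturnHeader element found")
    return {
        "ReturnId": header.findtext("efile:ReturnId", namespaces=NS),
        "ReturnType": header.findtext("efile:ReturnType", namespaces=NS),
        "TaxYear": header.findtext("efile:TaxYear", namespaces=NS),
        "PreparerName": header.findtext("efile:Preparer/efile:Name", namespaces=NS),
    }


if __name__ == "__main__":
    print(read_header(SAMPLE))
```

An unqualified lookup such as root.find("ReturnHeader") would return None for this document, which is a common stumbling block with MeF files.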

The goal is to import all the data into a HAWQ database. First, convert the XML document into delimited text with newlines “escaped”, so that each tax return occupies a single line with two columns: the ReturnId, followed by a final column that holds the entire MeF tax return. For example:

  AAAAAAAAAAAAAAAAAAAA|<Return returnVersion="2006v2.0"...
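
To make the target row format concrete, here is a minimal Python sketch of the same conversion. It is not the Joost STX transformation that the demo uses, only a stand-in that produces a comparable row. The function name to_load_row, the backslash escaping convention, and the assumption that the input is UTF-8 text are all choices made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

EFILE_NS = {"efile": "http://www.irs.gov/efile"}

# Matches a leading XML declaration so the row can start at <Return ...>.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def to_load_row(xml_text, delimiter="|"):
    """Build 'ReturnId|<escaped document>' for one return (illustrative)."""
    # Parse only to locate ReturnId; the document text itself is kept as-is.
    root = ET.fromstring(xml_text.encode("utf-8"))
    return_id = root.findtext(
        "efile:ReturnHeader/efile:ReturnId", namespaces=EFILE_NS
    )
    if return_id is None:
        raise ValueError("no ReturnId element found")

    body = XML_DECL.sub("", xml_text).strip()
    # Escape backslashes first, then the delimiter, then line breaks,
    # so the whole document fits on one line of delimited text.
    escaped = (
        body.replace("\\", "\\\\")
        .replace(delimiter, "\\" + delimiter)
        .replace("\r\n", "\n")
        .replace("\r", "\n")
        .replace("\n", "\\n")
    )
    return return_id.strip() + delimiter + escaped


if __name__ == "__main__":
    sample = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<Return returnVersion="2006v2.0" xmlns="http://www.irs.gov/efile">\n'
        "<ReturnHeader>\n"
        "<ReturnId>AAAAAAAAAAAAAAAAAAAA</ReturnId>\n"
        "</ReturnHeader>\n"
        "</Return>\n"
    )
    print(to_load_row(sample))
```

Escaping the delimiter as well as the newlines keeps any literal | inside the return from being read as a column boundary. Whatever escape character is chosen here has to match the one declared when the text is loaded.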

Load the data into HAWQ.