Notice ID: 5000123185
High-Level Summary/Future State
“This RFI is associated with a pilot IRS Enterprise Digitalization and Case Management Office (EDCMO) program. The IRS EDCMO seeks information on technologies capable of extracting machine-readable data from existing low-resolution digital images that include both structured and unstructured data (our initial focus will be forms/structured data, but there is also handwritten and typed information in unstructured formats). We are primarily interested in solutions that:
- Extract machine-readable data out of low-resolution (120 DPI and below) digital images, with high levels of accuracy and speed, and low levels of manual correction/activity (i.e., government personnel can use the resulting information with limited manual input or effort).
- Demonstrate flexibility and adaptability to extract machine-readable data from different forms with different structures (or the same form with different structures based on different versions/years), improve accuracy and speed based on previous images, and provide a search capability across different images and data sources.
- Interface with IRS systems and comply with IRS cybersecurity, hardware, and software requirements, etc. Interfaces and schemas included in potential solutions would need to be approved for use by the IRS Chief Information Officer.”
“We are anticipating solutions that leverage Optical Character Recognition (OCR) or Intelligent Character Recognition (ICR), to include aspects of machine learning and neural networks. Because the initial users of this solution will be working to identify trends across multiple sets of information, we also anticipate a solution with a robust and configurable search capability.”
“The IRS has many existing digital images that are at a level of resolution (120 dots per inch (DPI) and lower) that creates difficulty in extracting machine-readable data. These low-resolution images reside within multiple systems and are occasionally available at higher resolution for a shorter period. Ultimately, some of these images are saved at lower resolution due to the nature of the legacy systems within which they are stored. As much of the information stored within these systems is sensitive, the IRS will use less sensitive or publicly available forms/data to confirm the efficacy of a proposed solution before deciding whether to pursue it. Because the images are contained within legacy systems, interoperability with those systems and other IRS requirements (e.g., cybersecurity) will also be a primary determinant of whether a use case will be scaled.”
“The IRS is committed to creating an environment where IRS data is available, accessible, and usable in a format that enables data-driven decision-making at all levels of the IRS organization. These efforts will support improvements to taxpayer service, enhance the fairness of our compliance efforts, address federal guidelines (e.g., Office of Management and Budget (OMB) M-19-21, NARA 2022 mandate), and reduce teleworking challenges that have emerged as a result of the COVID-19 pandemic…”