Show HN: Tile.run – Extract structured data from any document via API https://ift.tt/Se68Pb0

Hey HN,

Today, we’re launching tile.run, an API that extracts structured data from unstructured documents (PDF, images, text) with support for custom schemas.

The Problem:
Extracting data out of unstructured documents is surprisingly hard. We built tile.run while solving this for our product Kili (automation for invoicing/reconciliation). We found that getting to accuracy that is reliable enough for automation is challenging. Dense documents (e.g., lots of tables or line items) are even harder, and these are the most valuable to automate. After talking to other teams and developers, we found many of them were after similar solutions.

Key Features:
- Multiple formats: PDF, JPEG, PNG, TIFF, plain text
- Custom schema support with nested objects/arrays
- Specialized in dense documents with tables
- Self-serve API - start extracting in minutes

Technical Details:
- REST API with simple JSON responses
- Robust error handling and validation

Coming Soon:
- Improved accuracy
- More file formats
- Self-hosting options
- Zero data retention mode

Links:
- Landing page: https://tile.run
- Documentation: https://tile.run/docs

I appreciate there have been a bunch of launches in this area recently, so wanted to address that head on as well:
- Clearly this problem is very valuable to solve but requires significant effort
- There are many ways to approach the same problem. For example, tile.run targets technical teams whereas other teams are solving this for business teams or specific functions (e.g. ETL).

We're excited to hear your feedback on the product.

https://www.tile.run/ November 6, 2024 at 11:44PM
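To make the "custom schema with nested objects/arrays" idea concrete, here is a minimal Python sketch of what such a schema for an invoice might look like, plus a small checker for an extraction result. Everything in it is an assumption for illustration: `INVOICE_SCHEMA`, its field names, and the `validate` helper are hypothetical and are not taken from the tile.run documentation, which should be consulted for the real request and response format.

```python
from typing import Any, Dict, List

# Hypothetical schema, for illustration only (not tile.run's documented format).
# It nests an object ("vendor") and an array of objects ("line_items").
INVOICE_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "vendor": {
            "type": "object",
            "properties": {"name": {"type": "string"}},
        },
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "quantity": {"type": "number"},
                    "unit_price": {"type": "number"},
                },
            },
        },
    },
}


def validate(value: Any, schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Recursively check an extracted value against the schema.

    Returns a list of human-readable problems; an empty list means the
    value has the expected shape.
    """
    kind = schema["type"]
    errors: List[str] = []
    if kind == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for key, sub_schema in schema["properties"].items():
            if key not in value:
                errors.append(f"{path}.{key}: missing")
            else:
                errors.extend(validate(value[key], sub_schema, f"{path}.{key}"))
    elif kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        for index, item in enumerate(value):
            errors.extend(validate(item, schema["items"], f"{path}[{index}]"))
    elif kind == "string":
        if not isinstance(value, str):
            errors.append(f"{path}: expected string")
    elif kind == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{path}: expected number")
    return errors
```

A check like this is one way to decide whether an extracted document is safe to pass on to automation or should be routed to manual review; dense documents with many line items are where per-item paths such as `$.line_items[0].quantity` help locate a problem quickly.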


Show HN: I built a Slack app to automate timesheets and project tracking https://ift.tt/gNVHstq

https://ift.tt/yYh9SL2 November 7, 2024 at 04:49AM