You can extract text from files in IBM watsonx.ai programmatically by using the Python library.
You can run a document text extraction job to extract text from a file that is stored in IBM Cloud Object Storage by using the ibm-watsonx-ai Python SDK and retrieve the results in a JSON file.
Setting up a source document to extract text from, setting up an output file to collect the extracted results, and running a text extraction job to generate those results involves the following high-level steps:
Upload the source document to IBM Cloud Object Storage, along with a JSON file to be populated with the extracted data.
Initialize a text extraction manager object by using the TextExtractions class.
from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

# client: an authenticated ibm_watsonx_ai APIClient; project_id: the ID of your watsonx.ai project
extraction = TextExtractions(api_client=client,
                             project_id=project_id)
Set the properties of the text extraction process. In this example, English-language text is detected by using optical character recognition (OCR), and any tables present in the document are processed.
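A minimal sketch of these properties as a Python mapping follows: one entry enables OCR for a list of language codes, and another switches table processing on or off. The helper `build_extraction_steps`, the default keys `"ocr"` and `"tables_processing"`, and the `languages_list` and `enabled` fields are assumptions for illustration, not confirmed SDK names; check the ibm-watsonx-ai documentation for the version you use.

```python
from typing import Any, Dict, Iterable

# Hypothetical default keys. They are assumed to mirror the step names that the
# ibm-watsonx-ai SDK expects; pass the SDK's own constants instead if they differ.
DEFAULT_OCR_KEY = "ocr"
DEFAULT_TABLES_KEY = "tables_processing"


def build_extraction_steps(
    languages: Iterable[str],
    process_tables: bool = True,
    ocr_key: str = DEFAULT_OCR_KEY,
    tables_key: str = DEFAULT_TABLES_KEY,
) -> Dict[str, Dict[str, Any]]:
    """Return a steps mapping: OCR for the given languages plus a table-processing flag.

    This is a hypothetical helper; the field names are assumptions.
    """
    language_list = list(languages)
    if not language_list:
        raise ValueError("at least one OCR language code is required")
    return {
        ocr_key: {"languages_list": language_list},
        tables_key: {"enabled": process_tables},
    }


# English OCR plus table processing, matching the example described above.
steps = build_extraction_steps(["en"], process_tables=True)
print(steps)
```

Keeping the keys as parameters means the same helper can be pointed at whatever constants your SDK version defines (for example, `TextExtractionsMetaNames.OCR` and `TextExtractionsMetaNames.TABLE_PROCESSING`, assuming those names exist in your installed version) without changing the structure of the mapping.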