You can extract text from files in IBM watsonx.ai programmatically by using the ibm-watsonx-ai Python library.
A document text extraction job reads a file that is stored in IBM Cloud Object Storage, extracts its text, and writes the results to a JSON file that you can then retrieve.
At a high level, you set up a source document and an output file for the extracted results, and then run a text extraction job to generate the results. The steps are as follows:
Upload the source document to IBM Cloud Object Storage, along with a JSON file that will be populated with the extracted data.
Initialize a text extraction manager object by using the TextExtractions class.
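from ibm_watsonx_ai.foundation_models.extractions import TextExtractions
# client is an authenticated API client and project_id identifies your project;
# both must be defined before this call.
extraction = TextExtractions(api_client=client,
                             project_id=project_id)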
Set the properties that control the text extraction process. In this example, English-language text is detected by using optical character recognition (OCR), and any tables that are present in the documents are processed.
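The property settings described in this step can be sketched as a plain dictionary. This is a minimal illustration, not the SDK's confirmed interface: the helper build_extraction_steps is hypothetical, and the key names ("ocr", "languages_list", "tables_processing", "enabled") are assumptions about what the extraction job expects. The SDK may expose named constants for these keys, so check the ibm-watsonx-ai reference before relying on the literal strings.

```python
from typing import Any, Dict, List


def build_extraction_steps(languages: List[str], process_tables: bool) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper that assembles the extraction properties.

    The key names below are assumptions, not confirmed by this document.
    """
    if not languages:
        raise ValueError("at least one OCR language code is required")
    return {
        # Assumed key: languages that OCR should detect, as language codes.
        "ocr": {"languages_list": list(languages)},
        # Assumed key: whether tables in the document are processed.
        "tables_processing": {"enabled": process_tables},
    }


# English-language OCR with table processing turned on, as in this example.
steps = build_extraction_steps(["en"], process_tables=True)
print(steps)
```

Keeping the properties in one dictionary makes it easy to review the settings before they are passed to the extraction job.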