Python for Spark scripts (SPSS Modeler) | IBM Cloud Pak for Data as a Service

Last updated: Feb 11, 2025

SPSS Modeler supports Python scripts for Apache Spark.

Note:

Python nodes depend on the Spark environment.
Python scripts must use the Spark API because data is presented in the form of a Spark DataFrame.
When installing Python, make sure all users have permission to access the Python installation.
If you want to use the Machine Learning Library (MLlib), you must install a version of Python that includes NumPy.
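
Before relying on MLlib, it can help to confirm that NumPy is actually importable in the Python installation that the node uses. The following is a minimal sketch, not an official IBM utility; the helper name is_module_available is hypothetical and was chosen for illustration:

```python
import importlib.util


def is_module_available(name):
    """Return True if a top-level module called `name` can be imported."""
    # find_spec returns None when no module with that name is found.
    return importlib.util.find_spec(name) is not None


if is_module_available("numpy"):
    import numpy
    print("NumPy version:", numpy.__version__)
else:
    print("NumPy is not installed in this Python environment.")
```

The check locates the module without importing it first, so the script reports a clear message instead of failing with an ImportError when NumPy is missing.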

Tips

You can run the following Python scripts from an Extension Output node:

To view information about the distribution of Python included with SPSS Modeler:
```
import sys
# A bare `sys.version` expression prints nothing in a script, so print it explicitly.
print(sys.version)
```

To list all installed Python packages:

```
import subprocess
import sys

subprocess.check_call([sys.executable, '-m', 'pip', 'list'])
```

To install Python packages in an air-gapped environment, use the --index-url option, which tells pip to install packages from a specified Python repository instead of the default index. The repository must be compliant with PEP 503. For more information, including a list of all options, see https://pip.pypa.io/en/stable/cli/pip_install/.
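
As with the scripts above, the installation can be started through pip as a subprocess. The following is a minimal sketch under stated assumptions: the package name example-package, the repository URL https://pypi.example.internal/simple, and the helper build_pip_install_command are all hypothetical placeholders that you would replace with your own values.

```python
import subprocess
import sys


def build_pip_install_command(package, index_url):
    """Return the pip command that installs `package` from `index_url`."""
    # sys.executable ensures that pip runs under the same interpreter as this script.
    return [sys.executable, "-m", "pip", "install", "--index-url", index_url, package]


# Hypothetical values: replace them with your package and internal repository.
command = build_pip_install_command(
    "example-package", "https://pypi.example.internal/simple"
)
print(" ".join(command))

# Uncomment to run the installation from the Extension Output node:
# subprocess.check_call(command)
```

The sketch prints the command rather than running it, so you can review the arguments first. The check_call line is left commented out because it changes the Python installation.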
