Using Data Pools in Managed Services
In this hands-on tutorial, you'll build a complete text analysis service that processes documents using our Data Pools feature. You'll learn how to upload datasets, create a service that reads from Data Pools, test it locally, deploy it, and consume it via the SDK.
What You'll Build
By the end of this tutorial, you'll have created:
- A text analysis service that counts words in documents stored in Data Pools
- A working local development environment
- A deployed service on the platform
- A Python client that consumes your service
The full code of this tutorial is available in the planqk-examples repository.
Prerequisites
- Node.js 18+ installed on your system
- Python 3.9+ installed
- A platform account with a personal access token
Note: Replace <your-token>, <your-consumer-key>, <your-consumer-secret>, and other placeholder values with your actual credentials throughout this tutorial.
Step 1: Set Up Your Development Environment
1.1 Install and Configure the PLANQK CLI
First, let's install the PLANQK CLI and verify that it works. If you already have the CLI installed, run the install command anyway to make sure you have the latest version:
bash
# Install the current CLI
npm install -g @planqk/planqk-cli
# Verify installation
planqk --version
You should see a version number. If you get an error, ensure Node.js 18+ is installed.
1.2 Install uv Package Manager
We'll use uv, a fast Python package manager, for managing our Python dependencies:
bash
# Install uv (if not already installed)
# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows:
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Verify uv installation
uv --version
1.3 Authenticate with PLANQK
Get your personal access token from the PLANQK platform (Profile → Access Tokens) and authenticate:
bash
planqk login -t <your-personal-access-token>
You should see a success message confirming you're logged in.
1.4 Create Your Service Project
Let's create a new service project for our text analyzer:
bash
planqk init --name text-analyzer
cd text-analyzer
This creates a project structure with:
- src/program.py - Your main service logic
- input/ - Local test data directory
- planqk.json - Service configuration
- Other configuration files
1.5 Set Up Python Environment
Now initialize a Python environment within our service project:
bash
# Create the virtual environment and install the project's dependencies with uv
uv sync -U
# Activate the environment (optional, uv will handle this automatically)
source .venv/bin/activate # On Windows: .venv\Scripts\activate.[ps1|bat]
Step 2: Prepare Sample Data
2.1 Create Sample Text Files
Let's create some sample documents to analyze. We'll keep them in input/documents/, so the same files can be uploaded to the Data Pool and reused for local testing:
bash
# Create the directory for the sample documents (used for upload and local testing)
mkdir -p input/documents
Create the sample documents for uploading to the Data Pool in input/documents/.
Create input/documents/document1.txt:
bash
cat > input/documents/document1.txt << 'EOF'
Quantum computing is a revolutionary technology that harnesses the principles of quantum mechanics.
It promises to solve complex problems that are intractable for classical computers.
Quantum algorithms like Shor's algorithm and Grover's algorithm demonstrate significant speedups.
EOF
Create input/documents/document2.txt:
bash
cat > input/documents/document2.txt << 'EOF'
Machine learning and artificial intelligence are transforming industries worldwide.
Deep learning models can process vast amounts of data to identify patterns.
Natural language processing enables computers to understand human language.
EOF
Create input/documents/summary.json with metadata:
bash
cat > input/documents/summary.json << 'EOF'
{
  "collection": "Sample Documents",
  "total_files": 2,
  "description": "Demo text files for analysis",
  "created": "2025-08-04"
}
EOF
2.2 Upload Data to a Data Pool
Now upload the files from input/documents/ to a PLANQK Data Pool:
bash
planqk datapool upload -f ./input/documents/document1.txt -f ./input/documents/document2.txt -f ./input/documents/summary.json
The CLI will prompt you to create a new Data Pool. Choose "Yes" and give it a name like text-analysis-demo. Save the Data Pool ID that's returned - you'll need it later.
Step 3: Implement the Text Analysis Service
The full code of the text analysis service is available in the planqk-examples repository.
3.1 Update the Service Logic
Replace the contents of src/program.py with our text analyzer:
python
from typing import Dict, List

from planqk.commons.datapool import DataPool
from pydantic import BaseModel


class AnalysisRequest(BaseModel):
    files_to_analyze: List[str]
    min_word_length: int = 3


class AnalysisResult(BaseModel):
    total_files: int
    word_counts: Dict[str, int]
    total_words: int
    summary: str


def run(data: AnalysisRequest, documents: DataPool) -> AnalysisResult:
    """Analyze text files from a Data Pool and return word statistics."""
    word_counts = {}
    files_processed = 0

    for filename in data.files_to_analyze:
        try:
            # Read the text file from the Data Pool
            with documents.open(filename, 'r') as f:
                content = f.read()

            # Simple word counting
            words = content.lower().split()
            for word in words:
                # Clean word and filter by length
                clean_word = ''.join(char for char in word if char.isalnum())
                if len(clean_word) >= data.min_word_length:
                    word_counts[clean_word] = word_counts.get(clean_word, 0) + 1

            files_processed += 1
        except FileNotFoundError:
            print(f"Warning: File {filename} not found in Data Pool")
            continue

    total_words = sum(word_counts.values())

    # Find the most common words
    top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:5]
    summary = f"Analyzed {files_processed} files. Top words: {dict(top_words)}"

    return AnalysisResult(
        total_files=files_processed,
        word_counts=word_counts,
        total_words=total_words,
        summary=summary
    )
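To get a feel for what min_word_length and the character filtering do before running the full service, here is a small standalone snippet (plain Python, no platform dependencies) that applies the same cleaning and counting logic as run() to a single sentence:
python
# Same cleaning/filtering as in run(), applied to one sentence
text = "Quantum computing is a revolutionary technology!"
min_word_length = 4

counts = {}
for word in text.lower().split():
    clean = "".join(ch for ch in word if ch.isalnum())  # strip punctuation
    if len(clean) >= min_word_length:                   # drop short words
        counts[clean] = counts.get(clean, 0) + 1

print(counts)
# {'quantum': 1, 'computing': 1, 'revolutionary': 1, 'technology': 1}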
3.2 Make Your Initial Commit to Track Your Changes [Optional]
To track your changes, initialize a Git repository and commit your code:
bash
git init
git add .
git commit -m "Initial commit: Implement text analysis service"
Step 4: Test Locally
4.1 Set Up Local Test Environment
Create test input in input/data.json:
bash
cat > input/data.json << 'EOF'
{
  "files_to_analyze": ["document1.txt", "document2.txt"],
  "min_word_length": 4
}
EOF
4.2 Update Local Test Runner
Replace src/__main__.py to test with our Data Pool:
python
import json
import os

from planqk.commons.constants import OUTPUT_DIRECTORY_ENV
from planqk.commons.datapool import DataPool
from planqk.commons.json import any_to_json
from planqk.commons.logging import init_logging

from .program import AnalysisRequest, run

init_logging()

# Set up output directory for local testing
directory = "./out"
os.makedirs(directory, exist_ok=True)
os.environ[OUTPUT_DIRECTORY_ENV] = directory

# Load test data
with open("./input/data.json") as file:
    data = AnalysisRequest.model_validate(json.load(file))

# Simulate DataPool injection using local directory
result = run(data, documents=DataPool("./input/documents"))

print("Analysis Results:")
print(any_to_json(result))
4.3 Run Local Test
Test your service locally:
bash
python -m src
You should see output showing the word analysis results from your sample documents.
Step 5: Deploy Your Service
5.1 Generate OpenAPI Specification
Generate the OpenAPI specification that describes your service's API:
bash
planqk openapi
5.2 Deploy Your Service to the Platform
You have two options for deployment: using the CLI or the web UI.
5.2.1 Deploy via CLI
To deploy your service using the CLI, run:
bash
planqk up
5.2.2 Deploy via Web UI
Alternatively, you can deploy via the PLANQK web interface. To do so, first compress your service files into a ZIP archive:
bash
planqk compress
- Go to the PLANQK platform web interface and navigate to services: https://platform.planqk.de/services
- Click on Create Service
- Select your ZIP file at Source > File
- Configure the service:
  - Set service name: "Text Analyzer with Data Pools"
  - Add a Data Pool parameter named documents
- Publish the service
Save your service ID - you'll need it for the next steps.
Step 6: Test Your Deployed Service
6.1 Create a Request Body
Create a file called service-request.json with the Data Pool reference:
bash
cat > service-request.json << 'EOF'
{
  "data": {
    "files_to_analyze": ["document1.txt", "document2.txt"],
    "min_word_length": 3
  },
  "documents": {
    "id": "<your-datapool-id>",
    "ref": "DATAPOOL"
  }
}
EOF
Replace <your-datapool-id> with the Data Pool ID from Step 2.2.
6.2 Test the Execution Using the UI
Currently, executing Jobs with Data Pools as input is not available. Therefore, you need to publish your service first and invoke it via an Application. Follow these steps:
- Go to the services page in the platform app: https://platform.planqk.de/services and navigate to your service.
- Click on Publish Service and Publish internally.
- Go to the Applications page: https://platform.planqk.de/applications and create a new Application (or reuse an existing one).
- Navigate to the Application you want to use.
- Click on Subscribe Internally and select your new service.
- After subscribing, you can test your service by clicking on Try it out.
- Open the POST element in the OpenAPI specification.
- Click again on Try it out and paste the content of service-request.json into the request body.
- Click the Execute button under the body to run the service.
- Navigate to the Application again and click on Activity Logs for the subscription of your service.
- Select the latest execution and click on Show Logs.
- You should see the execution logs, including analysis results similar to those from the local execution.
Step 7: Build a Python Client
The full code of the client is available in the planqk-examples repository.
7.1 Set Up Client Environment
Create a separate directory for your client:
bash
cd ..
mkdir text-analyzer-client
cd text-analyzer-client
# Set up Python environment
uv init && uv sync -U
source .venv/bin/activate # On Windows: .venv\Scripts\activate.ps1
uv add planqk-service-sdk python-dotenv
7.2 Configure Client Credentials
Create a .env file with your application credentials (get these from your application's settings page):
You can get the CONSUMER_KEY and CONSUMER_SECRET from the application you created in the previous steps; the DATAPOOL_ID is the Data Pool ID you saved in Step 2.2. The SERVICE_ENDPOINT can be found and copied from the subscription of your service inside the application details.
bash
cat > .env << 'EOF'
SERVICE_ENDPOINT=<your-service-endpoint>
CONSUMER_KEY=<your-consumer-key>
CONSUMER_SECRET=<your-consumer-secret>
DATAPOOL_ID=<your-datapool-id>
EOF
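Optionally, you can verify that the .env file is picked up correctly before writing the full client. The following throwaway script (check_env.py is just a hypothetical name, not part of the tutorial code) is a minimal sanity check using python-dotenv:
python
# check_env.py - optional sanity check (not part of the tutorial code)
import os

from dotenv import load_dotenv

load_dotenv()

required = ["SERVICE_ENDPOINT", "CONSUMER_KEY", "CONSUMER_SECRET", "DATAPOOL_ID"]
missing = [name for name in required if not os.getenv(name)]

if missing:
    print(f"Missing variables: {missing}")
else:
    print("All credentials found.")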
7.3 Create the Client Script
Create analyze_client.py:
python
import os

from dotenv import load_dotenv
from planqk.service.client import PlanqkServiceClient
from planqk.service.datapool import DataPoolReference

# Load environment variables
load_dotenv()

# Initialize the client
client = PlanqkServiceClient(
    service_endpoint=os.getenv("SERVICE_ENDPOINT"),
    consumer_key=os.getenv("CONSUMER_KEY"),
    consumer_secret=os.getenv("CONSUMER_SECRET")
)


def analyze_documents(files_to_analyze, min_word_length=3):
    """Run text analysis on documents in the Data Pool."""
    # Create Data Pool reference
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))

    # Prepare request
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }

    print("Starting analysis...")

    # Execute the service
    execution = client.run(request=request_body)
    print(f"Execution started with ID: {execution.id}")
    print("Waiting for completion...")

    # Wait for completion
    execution.wait_for_final_state(timeout=300)

    if execution.status == "SUCCEEDED":
        result = execution.result()
        print("\n=== Analysis Results ===")
        print(f"Status: {execution.status}")
        print(f"Files processed: {result.total_files}")
        print(f"Total words found: {result.total_words}")
        print(f"Summary: {result.summary}")

        # Show top 10 most common words
        word_counts = result.word_counts
        top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:10]
        print("\nTop 10 most common words:")
        for word, count in top_words:
            print(f"  {word}: {count}")
    else:
        print(f"Execution failed with status: {execution.status}")
        logs = execution.logs()
        print("Error logs:")
        for log in logs[-5:]:  # Show last 5 log entries
            print(f"  {log}")


if __name__ == "__main__":
    # Analyze our sample documents
    analyze_documents(
        files_to_analyze=["document1.txt", "document2.txt"],
        min_word_length=4
    )
7.4 Run the Client
bash
python analyze_client.py
You should see the text analysis results from your deployed service!
Step 8: Advanced Usage
8.1 Add More Documents
Upload additional documents to your Data Pool:
bash
cd ../text-analyzer
# Create a new document
cat > input/documents/document3.txt << 'EOF'
Cloud computing provides scalable infrastructure for modern applications.
Microservices architecture enables independent deployment and scaling.
Container orchestration platforms manage distributed systems efficiently.
EOF
# Upload to existing Data Pool
planqk datapool upload -f ./input/documents/document3.txt --datapool-id <your-datapool-id>
8.2 Analyze New Documents
Update your client to analyze the new document:
python
# In analyze_client.py, change the files list:
analyze_documents(
files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
min_word_length=5
)
8.3 Monitor Execution Progress
Add progress monitoring to your client:
python
import time


def analyze_with_monitoring(files_to_analyze, min_word_length=3):
    """Run analysis with real-time status monitoring."""
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))

    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }

    execution = client.run(request=request_body)
    print(f"Started execution: {execution.id}")

    # Monitor progress
    while not execution.has_finished:
        print(f"Status: {execution.status}")
        time.sleep(2)  # Check every 2 seconds

    print(f"Final status: {execution.status}")

    if execution.status == "SUCCEEDED":
        return execution.result()
    else:
        print("Execution failed")
        return None
Then update the main block to use this function:
python
if __name__ == "__main__":
# In analyze_client.py, change the files list:
resutl = analyze_with_monitoring(
files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
min_word_length=5
)
print(result) if result else print("No results returned.")
And run it again:
bash
python analyze_client.py
You should see real-time status updates as your service processes the documents.
What You've Accomplished
🎉 Congratulations! You've successfully:
- ✅ Set up the PLANQK CLI and authenticated
- ✅ Created sample data and uploaded it to a Data Pool
- ✅ Built a text analysis service that reads from Data Pools
- ✅ Tested your service locally with simulated Data Pools
- ✅ Deployed your service to the PLANQK platform
- ✅ Created a Python client that consumes your service
- ✅ Learned how to monitor executions and handle results
Key Concepts Learned
- Data Pools: Managed file collections that can be mounted into services
- Local Testing: Simulating Data Pools with local directories
- Service Parameters: How Data Pool parameters are injected into your service (see the sketch after this list)
- SDK Integration: Using DataPoolReference to pass Data Pool IDs to services
- Error Handling: Managing file not found errors and execution failures
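Putting the first three concepts together, the essential pattern is small: declare the Data Pool as a typed parameter of run(), and construct a DataPool from a local directory yourself when testing. A minimal sketch, using the same planqk.commons imports as above (the Params model and the character-count logic are purely illustrative):
python
from planqk.commons.datapool import DataPool
from pydantic import BaseModel


class Params(BaseModel):
    filename: str  # illustrative input model


def run(data: Params, documents: DataPool) -> dict:
    # On the platform, 'documents' is injected from the Data Pool parameter;
    # locally, you pass a DataPool pointing at a directory on disk.
    with documents.open(data.filename, "r") as f:
        return {"characters": len(f.read())}


if __name__ == "__main__":
    # Local test against the sample documents directory
    print(run(Params(filename="document1.txt"), DataPool("./input/documents")))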
Next Steps
- Try uploading larger datasets (remember the 500 MB per file limit)
- Experiment with different analysis algorithms (see the sketch after this list)
- Build services that write results back to output Data Pools
- Explore the workflow orchestration features for multi-step data processing
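For the algorithm experiments mentioned above, one easy starting point is to replace the manual counting loop in run() with collections.Counter from the standard library. A sketch of that swap (the cleaning and length filtering stay the same; count_words is a helper introduced here for illustration):
python
from collections import Counter


def count_words(content: str, min_word_length: int) -> Counter:
    """Counter-based variant of the counting loop in run()."""
    cleaned = (
        "".join(ch for ch in word if ch.isalnum())
        for word in content.lower().split()
    )
    return Counter(word for word in cleaned if len(word) >= min_word_length)


# Counter.most_common(5) replaces the manual sorted(...)[:5] step, e.g.:
# count_words("Quantum computing and quantum algorithms", 4).most_common(5)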