Using Data Pools in Managed Services
In this hands-on tutorial, you'll build a complete text analysis service that processes documents using our Data Pools feature. You'll learn how to upload datasets, create a service that reads from Data Pools, test it locally, deploy it, and consume it via the SDK.
What You'll Build
By the end of this tutorial, you'll have created:
- A text analysis service that counts words in documents stored in Data Pools
- A working local development environment
- A deployed service on the platform
- A Python client that consumes your service
The full code of this tutorial is available in the planqk-examples repository.
Prerequisites
- Node.js 18+ installed on your system
- Python 3.9+ installed
- A platform account with a personal access token
Note: Replace <your-token>, <your-consumer-key>, <your-consumer-secret>, and other placeholder values with your actual credentials throughout this tutorial.
Step 1: Set Up Your Development Environment
1.1 Install and Configure the PLANQK CLI
First, let's install the PLANQK CLI and verify that it works. If you already have the CLI installed, run the install command anyway to make sure you have the latest version:
bash
# Install the current CLI
npm install -g @planqk/planqk-cli
# Verify installation
planqk --version
You should see a version number. If you get an error, ensure Node.js 18+ is installed.
1.2 Install uv Package Manager
We'll use uv, a fast Python package manager, for managing our Python dependencies:
bash
# Install uv (if not already installed)
# On macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows:
# powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Verify uv installation
uv --version
1.3 Authenticate with PLANQK
Get your personal access token from the PLANQK platform (Profile → Access Tokens) and authenticate:
bash
planqk login -t <your-personal-access-token>
You should see a success message confirming you're logged in.
1.4 Create Your Service Project
Let's create a new service project for our text analyzer:
bash
planqk init --name text-analyzer
cd text-analyzer
This creates a project structure with:
- src/program.py - Your main service logic
- input/ - Local test data directory
- planqk.json - Service configuration
- Other configuration files
1.5 Set Up Python Environment
Now initialize a Python environment within our service project:
bash
# Create the virtual environment and install the project's dependencies with uv
uv sync -U
# Activate the environment (optional, uv will handle this automatically)
source .venv/bin/activate # On Windows: .venv\Scripts\activate.[ps1|bat]
Step 2: Prepare Sample Data
2.1 Create Sample Text Files
Let's create some sample documents to analyze. We'll keep them in input/documents/, so the same files can be uploaded to the Data Pool and reused for local testing:
bash
# Create the directory for the sample documents (used for upload and local testing)
mkdir -p input/documents
Create the sample documents for uploading to the Data Pool in input/documents/.
Create input/documents/document1.txt:
bash
cat > input/documents/document1.txt << 'EOF'
Quantum computing is a revolutionary technology that harnesses the principles of quantum mechanics.
It promises to solve complex problems that are intractable for classical computers.
Quantum algorithms like Shor's algorithm and Grover's algorithm demonstrate significant speedups.
EOF
Create input/documents/document2.txt:
bash
cat > input/documents/document2.txt << 'EOF'
Machine learning and artificial intelligence are transforming industries worldwide.
Deep learning models can process vast amounts of data to identify patterns.
Natural language processing enables computers to understand human language.
EOF
Create input/documents/summary.json with metadata:
bash
cat > input/documents/summary.json << 'EOF'
{
  "collection": "Sample Documents",
  "total_files": 2,
  "description": "Demo text files for analysis",
  "created": "2025-08-04"
}
EOF
2.2 Upload Data to a Data Pool
Now upload the files from input/documents/ to a PLANQK Data Pool:
bash
planqk datapool upload -f ./input/documents/document1.txt -f ./input/documents/document2.txt -f ./input/documents/summary.json
The CLI will prompt you to create a new Data Pool. Choose "Yes" and give it a name like text-analysis-demo. Save the Data Pool ID that's returned - you'll need it later.
Step 3: Implement the Text Analysis Service
The full code of the text analysis service is available in the planqk-examples repository.
3.1 Update the Service Logic
Replace the contents of src/program.py with our text analyzer:
python
from typing import Dict, List

from planqk.commons.datapool import DataPool
from pydantic import BaseModel


class AnalysisRequest(BaseModel):
    files_to_analyze: List[str]
    min_word_length: int = 3


class AnalysisResult(BaseModel):
    total_files: int
    word_counts: Dict[str, int]
    total_words: int
    summary: str


def run(data: AnalysisRequest, documents: DataPool) -> AnalysisResult:
    """Analyze text files from a Data Pool and return word statistics."""
    word_counts = {}
    files_processed = 0

    for filename in data.files_to_analyze:
        try:
            # Read the text file from the Data Pool
            with documents.open(filename, 'r') as f:
                content = f.read()

            # Simple word counting
            words = content.lower().split()
            for word in words:
                # Clean word and filter by length
                clean_word = ''.join(char for char in word if char.isalnum())
                if len(clean_word) >= data.min_word_length:
                    word_counts[clean_word] = word_counts.get(clean_word, 0) + 1

            files_processed += 1
        except FileNotFoundError:
            print(f"Warning: File {filename} not found in Data Pool")
            continue

    total_words = sum(word_counts.values())

    # Find the most common words
    top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:5]
    summary = f"Analyzed {files_processed} files. Top words: {dict(top_words)}"

    return AnalysisResult(
        total_files=files_processed,
        word_counts=word_counts,
        total_words=total_words,
        summary=summary
    )
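To get a feel for what min_word_length and the character filtering do before running the full service, here is a small standalone snippet (plain Python, no platform dependencies) that applies the same cleaning and counting logic as run() to a single sentence:
python
# Same cleaning/filtering as in run(), applied to one sentence
text = "Quantum computing is a revolutionary technology!"
min_word_length = 4

counts = {}
for word in text.lower().split():
    clean = "".join(ch for ch in word if ch.isalnum())  # strip punctuation
    if len(clean) >= min_word_length:                   # drop short words
        counts[clean] = counts.get(clean, 0) + 1

print(counts)
# {'quantum': 1, 'computing': 1, 'revolutionary': 1, 'technology': 1}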
3.2 Make Your Initial Commit to Track Your Changes [Optional]
To track your changes, initialize a Git repository and commit your code:
bash
git init
git add .
git commit -m "Initial commit: Implement text analysis service"
Step 4: Test Locally
4.1 Set Up Local Test Environment
Create test input in input/data.json:
bash
cat > input/data.json << 'EOF'
{
  "files_to_analyze": ["document1.txt", "document2.txt"],
  "min_word_length": 4
}
EOF
4.2 Update Local Test Runner
Replace src/__main__.py to test with our Data Pool:
python
import json
import os

from planqk.commons.constants import OUTPUT_DIRECTORY_ENV
from planqk.commons.datapool import DataPool
from planqk.commons.json import any_to_json
from planqk.commons.logging import init_logging

from .program import AnalysisRequest, run

init_logging()

# Set up output directory for local testing
directory = "./out"
os.makedirs(directory, exist_ok=True)
os.environ[OUTPUT_DIRECTORY_ENV] = directory

# Load test data
with open("./input/data.json") as file:
    data = AnalysisRequest.model_validate(json.load(file))

# Simulate DataPool injection using local directory
result = run(data, documents=DataPool("./input/documents"))

print("Analysis Results:")
print(any_to_json(result))
4.3 Run Local Test
Test your service locally:
bash
python -m src
You should see output showing the word analysis results from your sample documents.
Step 5: Deploy Your Service
5.1 Generate OpenAPI Specification
Generate the OpenAPI specification that describes your service's API:
bash
planqk openapi
5.2 Deploy Your Service to the Platform
You have two options for deployment: using the CLI or the web UI.
5.2.1 Deploy via CLI
To deploy your service using the CLI, run:
bash
planqk up
5.2.2 Deploy via Web UI
Alternatively, you can deploy via the PLANQK web interface. To do so, first compress your service files into a ZIP archive:
bash
planqk compress
- Go to the PLANQK platform web interface and navigate to services: https://platform.planqk.de/services
- Click on Create Service
- Select your ZIP file at Source > File
- Configure the service:
  - Set service name: "Text Analyzer with Data Pools"
  - Add a Data Pool parameter named documents
- Publish the service
Save your service ID - you'll need it for the next steps.
Step 6: Test Your Deployed Service
6.1 Create a Request Body
Create a file called service-request.json with the Data Pool reference:
bash
cat > service-request.json << 'EOF'
{
  "data": {
    "files_to_analyze": ["document1.txt", "document2.txt"],
    "min_word_length": 3
  },
  "documents": {
    "id": "<your-datapool-id>",
    "ref": "DATAPOOL"
  }
}
EOF
Replace <your-datapool-id> with the Data Pool ID from Step 2.2.
6.2 Test the Execution Using the UI
Currently, executing Jobs with Data Pools as input is not available. Therefore, you need to publish your service first and invoke it via an Application. Follow these steps:
- Go to the services page in the platform app: https://platform.planqk.de/services and navigate to your service.
- Click on Publish Service and Publish internally.
- Go to the Applications page: https://platform.planqk.de/applications and create a new Application (or reuse an existing one).
- Navigate to the Application you want to use.
- Click on Subscribe Internally and select your new service.
- After subscribing, you can test your service by clicking on Try it out.
- Open the POST element in the OpenAPI specification.
- Click again on Try it out and paste the content of service-request.json into the request body.
- Click the Execute button under the body to run the service.
- Navigate to the Application again and click on Activity Logs for the subscription of your service.
- Select the latest execution and click on Show Logs.
- You should see the execution logs, including analysis results similar to those from the local execution.
Step 7: Build a Python Client
The full code of the client is available in the planqk-examples repository.
7.1 Set Up Client Environment
Create a separate directory for your client:
bash
cd ..
mkdir text-analyzer-client
cd text-analyzer-client
# Set up Python environment
uv init && uv sync -U
source .venv/bin/activate # On Windows: .venv\Scripts\activate.ps1
uv add planqk-service-sdk python-dotenv
7.2 Configure Client Credentials
Create a .env file with your application credentials (get these from your application's settings page):
You can get the CONSUMER_KEY and CONSUMER_SECRET from the application you created in the previous steps; the DATAPOOL_ID is the Data Pool ID you saved in Step 2.2. The SERVICE_ENDPOINT can be found and copied from the subscription of your service inside the application details.
bash
cat > .env << 'EOF'
SERVICE_ENDPOINT=<your-service-endpoint>
CONSUMER_KEY=<your-consumer-key>
CONSUMER_SECRET=<your-consumer-secret>
DATAPOOL_ID=<your-datapool-id>
EOF
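Optionally, you can verify that the .env file is picked up correctly before writing the full client. The following throwaway script (check_env.py is just a hypothetical name, not part of the tutorial code) is a minimal sanity check using python-dotenv:
python
# check_env.py - optional sanity check (not part of the tutorial code)
import os

from dotenv import load_dotenv

load_dotenv()

required = ["SERVICE_ENDPOINT", "CONSUMER_KEY", "CONSUMER_SECRET", "DATAPOOL_ID"]
missing = [name for name in required if not os.getenv(name)]

if missing:
    print(f"Missing variables: {missing}")
else:
    print("All credentials found.")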
7.3 Create the Client Script
Create analyze_client.py:
python
import os

from dotenv import load_dotenv
from planqk.service.client import PlanqkServiceClient
from planqk.service.datapool import DataPoolReference

# Load environment variables
load_dotenv()

# Initialize the client
client = PlanqkServiceClient(
    service_endpoint=os.getenv("SERVICE_ENDPOINT"),
    consumer_key=os.getenv("CONSUMER_KEY"),
    consumer_secret=os.getenv("CONSUMER_SECRET")
)


def analyze_documents(files_to_analyze, min_word_length=3):
    """Run text analysis on documents in the Data Pool."""
    # Create Data Pool reference
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))

    # Prepare request
    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }

    print("Starting analysis...")

    # Execute the service
    execution = client.run(request=request_body)
    print(f"Execution started with ID: {execution.id}")
    print("Waiting for completion...")

    # Wait for completion
    execution.wait_for_final_state(timeout=300)

    if execution.status == "SUCCEEDED":
        result = execution.result()
        print("\n=== Analysis Results ===")
        print(f"Status: {execution.status}")
        print(f"Files processed: {result.total_files}")
        print(f"Total words found: {result.total_words}")
        print(f"Summary: {result.summary}")

        # Show top 10 most common words
        word_counts = result.word_counts
        top_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)[:10]
        print("\nTop 10 most common words:")
        for word, count in top_words:
            print(f"  {word}: {count}")
    else:
        print(f"Execution failed with status: {execution.status}")
        logs = execution.logs()
        print("Error logs:")
        for log in logs[-5:]:  # Show last 5 log entries
            print(f"  {log}")


if __name__ == "__main__":
    # Analyze our sample documents
    analyze_documents(
        files_to_analyze=["document1.txt", "document2.txt"],
        min_word_length=4
    )
7.4 Run the Client
bash
python analyze_client.py
You should see the text analysis results from your deployed service!
Step 8: Advanced Usage
8.1 Add More Documents
Upload additional documents to your Data Pool:
bash
cd ../text-analyzer
# Create a new document
cat > input/documents/document3.txt << 'EOF'
Cloud computing provides scalable infrastructure for modern applications.
Microservices architecture enables independent deployment and scaling.
Container orchestration platforms manage distributed systems efficiently.
EOF
# Upload to existing Data Pool
planqk datapool upload -f ./input/documents/document3.txt --datapool-id <your-datapool-id>
8.2 Analyze New Documents
Update your client to analyze the new document:
python
# In analyze_client.py, change the files list:
analyze_documents(
files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
min_word_length=5
)
8.3 Monitor Execution Progress
Add progress monitoring to your client:
python
import time


def analyze_with_monitoring(files_to_analyze, min_word_length=3):
    """Run analysis with real-time status monitoring."""
    documents_ref = DataPoolReference(id=os.getenv("DATAPOOL_ID"))

    request_body = {
        "data": {
            "files_to_analyze": files_to_analyze,
            "min_word_length": min_word_length
        },
        "documents": documents_ref
    }

    execution = client.run(request=request_body)
    print(f"Started execution: {execution.id}")

    # Monitor progress
    while not execution.has_finished:
        print(f"Status: {execution.status}")
        time.sleep(2)  # Check every 2 seconds

    print(f"Final status: {execution.status}")

    if execution.status == "SUCCEEDED":
        return execution.result()
    else:
        print("Execution failed")
        return None
Then update the main block to use this function:
python
if __name__ == "__main__":
# In analyze_client.py, change the files list:
resutl = analyze_with_monitoring(
files_to_analyze=["document1.txt", "document2.txt", "document3.txt"],
min_word_length=5
)
print(result) if result else print("No results returned.")
And run it again:
bash
python analyze_client.py
You should see real-time status updates as your service processes the documents.
What You've Accomplished
🎉 Congratulations! You've successfully:
- ✅ Set up the PLANQK CLI and authenticated
- ✅ Created sample data and uploaded it to a Data Pool
- ✅ Built a text analysis service that reads from Data Pools
- ✅ Tested your service locally with simulated Data Pools
- ✅ Deployed your service to the PLANQK platform
- ✅ Created a Python client that consumes your service
- ✅ Learned how to monitor executions and handle results
Key Concepts Learned
- Data Pools: Managed file collections that can be mounted into services
- Local Testing: Simulating Data Pools with local directories
- Service Parameters: How Data Pool parameters are injected into your service (see the sketch after this list)
- SDK Integration: Using DataPoolReference to pass Data Pool IDs to services
- Error Handling: Managing file not found errors and execution failures
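Putting the first three concepts together, the essential pattern is small: declare the Data Pool as a typed parameter of run(), and construct a DataPool from a local directory yourself when testing. A minimal sketch, using the same planqk.commons imports as above (the Params model and the character-count logic are purely illustrative):
python
from planqk.commons.datapool import DataPool
from pydantic import BaseModel


class Params(BaseModel):
    filename: str  # illustrative input model


def run(data: Params, documents: DataPool) -> dict:
    # On the platform, 'documents' is injected from the Data Pool parameter;
    # locally, you pass a DataPool pointing at a directory on disk.
    with documents.open(data.filename, "r") as f:
        return {"characters": len(f.read())}


if __name__ == "__main__":
    # Local test against the sample documents directory
    print(run(Params(filename="document1.txt"), DataPool("./input/documents")))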
Next Steps
- Try uploading larger datasets (remember the 500 MB per file limit)
- Experiment with different analysis algorithms (see the sketch after this list)
- Build services that write results back to output Data Pools
- Explore the workflow orchestration features for multi-step data processing
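For the algorithm experiments mentioned above, one easy starting point is to replace the manual counting loop in run() with collections.Counter from the standard library. A sketch of that swap (the cleaning and length filtering stay the same; count_words is a helper introduced here for illustration):
python
from collections import Counter


def count_words(content: str, min_word_length: int) -> Counter:
    """Counter-based variant of the counting loop in run()."""
    cleaned = (
        "".join(ch for ch in word if ch.isalnum())
        for word in content.lower().split()
    )
    return Counter(word for word in cleaned if len(word) >= min_word_length)


# Counter.most_common(5) replaces the manual sorted(...)[:5] step, e.g.:
# count_words("Quantum computing and quantum algorithms", 4).most_common(5)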