Process file line-by-line with script

Process files line-by-line using zo to transform or extract data from each line.

Notes on zo

zo is a command-line program that gives Zo access to Zo itself, with all the same prompts, tools, and power. It's located at /usr/local/bin/zo.

➜  ~ zo -h
usage: zo [-h] [--output-format OUTPUT_FORMAT] input

Call Zo AI with optional structured output

positional arguments:
  input                 Input text to send to Zo

options:
  -h, --help            show this help message and exit
  --output-format OUTPUT_FORMAT
                        JSON schema for structured output (optional)

Examples:
  zo "hello"
  zo "what's 2+2?" --output-format '{"type": "object", "properties": {"answer": {"type": "number"}}, "required": ["answer"]}'

output_format uses OpenAI's structured output JSON schema format.
Keep schemas flat (max 1-2 levels) for best results.
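
The "flat" guideline above can be checked mechanically before a schema is passed to --output-format. The following is a minimal sketch, not part of zo: the helper name schema_depth and the way depth is counted (one level per nested object or array) are assumptions made for illustration.

```python
from typing import Any, Dict

def schema_depth(schema: Dict[str, Any]) -> int:
    """Count nesting levels: each object or array adds one level (assumed convention)."""
    if schema.get("type") == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    if schema.get("type") == "array":
        return 1 + schema_depth(schema.get("items", {}))
    return 0  # scalars (string, number, boolean, ...) add no depth

# Flat: one object holding a scalar, as in the help text example.
flat = {
    "type": "object",
    "properties": {"answer": {"type": "number"}},
    "required": ["answer"],
}

# Nested: objects inside objects, deeper than the recommended 1-2 levels.
nested = {
    "type": "object",
    "properties": {
        "user": {
            "type": "object",
            "properties": {
                "address": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                }
            },
        }
    },
}

print(schema_depth(flat), schema_depth(nested))
```

A schema that reports more than 2 here is a candidate for flattening, for example by replacing user.address.city with a single top-level user_city string.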

Protocol

Step 1: Ask clarifying questions

Do NOT assume. Ask the user:

What should be extracted/transformed from each line?
What format should the output be?
What should happen if a line fails: halt everything or continue?

Step 2: Create the processing script

Create a script in the conversation workspace.

Requirements:

Flat schemas: Keep output_format max 1-2 levels deep
Never swallow errors: Use check=True, no try/except hiding failures
Sequential processing: Each line waits for the previous (70s timeout per line)
Include context necessary to the task in every call: Each zo call is independent
Incremental writes: Flush after each line so partial results are saved if the script crashes

Use this template and customize the process_line function based on user requirements:

#!/usr/bin/env python3
import subprocess
import json
import sys
from pathlib import Path
from typing import List, Dict, Any, Optional

def parse_file_lines(filepath: Path) -> List[str]:
    """Split file into lines, keeping newlines. Must satisfy: ''.join(lines) == original."""
    # newline='' disables newline translation, so \r\n endings survive the round trip
    with open(filepath, 'r', encoding='utf-8', newline='') as f:
        content = f.read()
    lines = content.splitlines(keepends=True)
    return lines

def call_zo(prompt: str, output_format: Optional[Dict[str, Any]] = None) -> Any:
    """Call zo with the given prompt. Errors bubble up for visibility."""
    cmd = ["zo", prompt]
    if output_format:
        cmd.extend(["--output-format", json.dumps(output_format)])

    result = subprocess.run(
        cmd, capture_output=True, text=True, check=True, timeout=70
    )
    response = json.loads(result.stdout)
    return response["output"]

def process_line(line: str, instruction: str) -> str:
    """Process a single line. Customize based on user requirements."""
    # Keep output_format flat (max 1-2 levels)
    output_format = {
        "type": "object",
        "properties": {
            "result": {"type": "string", "description": "the processed result"}
        },
        "required": ["result"]
    }

    # Include all context in each call (calls are independent)
    prompt = f"{instruction}\n\nInput: {line.strip()}"
    output = call_zo(prompt, output_format)
    return output["result"]

def process_file(input_path: Path, output_path: Path, instruction: str) -> None:
    """Process each line sequentially, writing results incrementally."""
    lines = parse_file_lines(input_path)

    if not lines:
        print("No lines to process")
        return

    with open(output_path, 'w') as f:
        for i, line in enumerate(lines, 1):
            print(f"Processing line {i}/{len(lines)}...", file=sys.stderr)
            result = process_line(line, instruction)
            f.write(result + '\n')
            f.flush()  # Write immediately for crash recovery

    print(f"\nComplete! Output written to {output_path}")

def main():
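    if len(sys.argv) < 3:
        # Usage errors go to stderr; sys.argv[0] keeps the name in sync with how the script was invoked
        print(f"Usage: {sys.argv[0]} <input_file> <output_file> [instruction]", file=sys.stderr)
        sys.exit(1)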

    input_path = Path(sys.argv[1])
    output_path = Path(sys.argv[2])
    instruction = sys.argv[3] if len(sys.argv) > 3 else "process this line"

    if not input_path.exists():
        print(f"Error: Input file {input_path} does not exist", file=sys.stderr)
        sys.exit(1)

    process_file(input_path, output_path, instruction)

if __name__ == "__main__":
    main()
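
The template above halts on the first failed line. If the user chose in Step 1 to continue past failures, one possible variation is sketched below. The function name process_all and the halt_on_error flag are hypothetical, not part of the template. Failures are still printed and collected rather than hidden, and an empty placeholder keeps output lines aligned with input lines. In the real script the write and flush would stay inside the loop so results are still saved incrementally.

```python
import sys
from typing import Callable, List, Tuple

def process_all(
    lines: List[str],
    process: Callable[[str], str],
    halt_on_error: bool = True,
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Process lines in order; return (results, failures).

    failures holds (1-based line number, error description) pairs.
    """
    results: List[str] = []
    failures: List[Tuple[int, str]] = []
    for i, line in enumerate(lines, 1):
        try:
            results.append(process(line))
        except Exception as exc:
            if halt_on_error:
                raise  # default behaviour: let the error bubble up
            # Continue mode: report loudly and record, never discard silently
            print(f"Line {i} failed: {exc!r}", file=sys.stderr)
            failures.append((i, repr(exc)))
            results.append("")  # placeholder keeps output aligned with input
    return results, failures
```

The failures list can then be reported to the user under "Any errors encountered" once the batch finishes.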

Step 3: Test with the first line and get confirmation

Before running the batch script, test by calling zo directly on the first line:

Read the first line from the file
Construct a test prompt matching what the script will use: {instruction}\n\nInput: {first_line_content}
Call zo via Bash with the same output format the script will use
Show the user the test result
Ask the user: "Does this result look correct? Should I proceed with processing all lines?"
Wait for user confirmation before continuing

If the user is not satisfied, ask what needs to be adjusted and refine the approach. You may need to revise the instruction, output format, or processing logic.

Example test command:

zo $'Extract the email address.\n\nInput: John Doe <john@example.com>' \
  --output-format '{"type": "object", "properties": {"result": {"type": "string"}}, "required": ["result"]}'
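
Building the test command by hand invites quoting mistakes, so it may help to generate it. The sketch below builds the same prompt the template's process_line would send and shell-quotes each argument. The helper name build_test_command is an assumption for illustration. It only uses the zo arguments shown in the help text above.

```python
import json
import shlex
from typing import Any, Dict

def build_test_command(
    first_line: str, instruction: str, output_format: Dict[str, Any]
) -> str:
    """Return a shell command string that calls zo on one line of input."""
    # Same prompt shape as process_line in the template
    prompt = f"{instruction}\n\nInput: {first_line.strip()}"
    return " ".join([
        "zo",
        shlex.quote(prompt),  # real newlines are preserved inside the quotes
        "--output-format",
        shlex.quote(json.dumps(output_format)),
    ])

schema = {
    "type": "object",
    "properties": {"result": {"type": "string"}},
    "required": ["result"],
}
cmd = build_test_command(
    "John Doe <john@example.com>\n", "Extract the email address.", schema
)
print(cmd)
```

The printed command can be pasted into Bash as-is. Because the prompt is quoted with shlex.quote, characters such as <, > and quotes inside the input line are passed to zo literally.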

Step 4: Create the output file

You must use Create file to create an empty output file. This allows the user to open it in their editor and watch results stream in. You must never use touch to create the output file.

Step 5: Run the full batch

Execute the script via Bash to process all lines:

python3 process_script.py input.txt output.txt "instruction"

The script writes results incrementally to the output file.

Output

Inform the user when processing is complete. Show:

Total lines processed
Location of output file
Any errors encountered
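
The counts for this summary can be taken from the files themselves. The sketch below compares input and output line counts. The function name summarize is hypothetical, and it assumes each result occupies exactly one output line, which does not hold if a result contains embedded newlines.

```python
from pathlib import Path
from typing import Any, Dict

def summarize(input_path: Path, output_path: Path) -> Dict[str, Any]:
    """Compare line counts of the input and output files."""
    expected = len(input_path.read_text(encoding="utf-8").splitlines())
    written = len(output_path.read_text(encoding="utf-8").splitlines())
    return {
        "expected": expected,            # lines in the input file
        "written": written,              # lines in the output file
        "complete": expected == written, # False suggests a halt partway through
        "output_file": str(output_path),
    }
```

If complete is False, the number of written lines indicates which input line the batch stopped on.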
