
Enhance API, CLI, and web UI with new features and improvements#1

Open
codefitz wants to merge 39 commits into codefitz:main from sasakocic:main

Conversation

@codefitz (Owner) commented Apr 6, 2026

This pull request introduces a comprehensive continuous integration and testing setup for the TransDocs project, along with significant improvements to documentation and project structure. The main changes include adding a CI/CD workflow, a Dockerfile for testing, a detailed changelog, expanded and clarified README instructions, and test configuration files.

Key changes:

CI/CD and Testing Infrastructure

  • Added .github/workflows/ci.yml to automate testing, linting, Docker-based tests, and security scanning on push and pull requests to main and develop branches. The workflow runs tests across multiple Python versions, checks code formatting and types, builds and tests a Docker image, and scans for security vulnerabilities.
  • Added Dockerfile for building a test environment, installing dependencies, and running tests automatically when the container is executed.
  • Added pytest.ini to configure pytest discovery and output options.

Documentation and Project Structure

  • Created/updated CHANGELOG.md to document all notable changes, new features, fixes, and improvements across releases.
  • Significantly expanded README.md:
    • Updated feature list, clarified requirements and installation steps, and provided a project structure overview.
    • Documented both CLI and web app usage, new backend support, argument changes, and example commands.
    • Added sections on testing and troubleshooting, and clarified licensing.

Source and Test Initialization

  • Added src/__init__.py to define package exports and metadata, making the core functions importable as a package.
  • Added tests/__init__.py to mark the tests directory as a package.

User and others added 30 commits March 21, 2026 13:11

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 80e723f9b8


```python
if not args.model or not args.model.strip():
    parser.error("Model name (-m/--model) is required")

api_url = build_chat_endpoint(args.api_url, args.backend)
```

**P1** Pass API base URL into processing path

main() converts the user-provided URL to a concrete endpoint before calling process_document, but call_chat_api() builds endpoint candidates again, causing malformed fallback URLs like .../v1/chat/completions/api/chat and .../api/chat/v1/chat/completions. In CLI usage this breaks the intended backend fallback (for example, -b openai_compatible -u http://localhost:11434 never tries http://localhost:11434/api/chat), so valid deployments fail when the first endpoint is unavailable.

Useful? React with 👍 / 👎.
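The double-resolution bug above can be sketched in isolation. This is not the project's code: `endpoint_candidates` and the suffix constants are illustrative stand-ins. The point is that a candidate builder should treat an already-concrete endpoint as final instead of appending more path segments to it.

```python
# Illustrative sketch, not src/transdoc.py itself: a candidate builder that
# is safe to call on an already-resolved endpoint. Without the early return,
# resolving twice yields ".../v1/chat/completions/api/chat"-style URLs.
OLLAMA_SUFFIX = "/api/chat"
OPENAI_SUFFIX = "/v1/chat/completions"

def endpoint_candidates(url: str, backend: str) -> list:
    url = url.rstrip("/")
    # An already-concrete chat endpoint is final; do not append suffixes again.
    if url.endswith(OLLAMA_SUFFIX) or url.endswith(OPENAI_SUFFIX):
        return [url]
    primary = OPENAI_SUFFIX if backend == "openai_compatible" else OLLAMA_SUFFIX
    fallback = OLLAMA_SUFFIX if primary == OPENAI_SUFFIX else OPENAI_SUFFIX
    return [url + primary, url + fallback]
```

With this shape, `-b openai_compatible -u http://localhost:11434` still falls back to `http://localhost:11434/api/chat`, and feeding a pre-resolved endpoint back into the builder is a no-op.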

Comment on lines +326 to +327
```python
if base_url.endswith(("/api/chat", "/api/generate")):
    add(base_url)
```

**P1** Stop routing chat payloads to /api/generate

The endpoint builder accepts URLs ending in /api/generate as primary Ollama chat targets, but call_chat_api() always sends a chat-completions payload (messages) rather than a generate payload (prompt). This makes legacy /api/generate configurations fail consistently at runtime, turning a previously supported Ollama URL shape into translation errors.

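For reference, the two Ollama endpoints take different payload shapes: `/api/chat` expects `messages`, while the legacy `/api/generate` expects a single `prompt`. A hypothetical dispatcher (`payload_for` is not the project's code) would adapt the chat transcript for the legacy endpoint rather than sending a chat payload there:

```python
# Hypothetical helper: choose the payload shape by endpoint. /api/chat takes
# "messages"; the legacy /api/generate takes a single "prompt" string.
def payload_for(endpoint: str, model: str, messages: list) -> dict:
    if endpoint.endswith("/api/generate"):
        # Flatten the chat transcript into one prompt for the legacy endpoint.
        prompt = "\n".join(m["content"] for m in messages)
        return {"model": model, "prompt": prompt, "stream": False}
    return {"model": model, "messages": messages, "stream": False}
```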


Copilot AI left a comment


Pull request overview

This PR restructures TransDocs into a package under src/, expands document and backend capabilities (DOCX and PDF; Ollama and OpenAI-compatible backends), and adds CI/testing plus updated documentation.

Changes:

  • Added CI workflow, Docker-based test runner, pytest configuration, and initial unit tests.
  • Migrated/expanded core translation logic into src/transdoc.py (PDF support, proofreading mode, multi-backend endpoint handling).
  • Reworked Flask web app + new templates/upload.html with model querying and async translation jobs + progress polling.

Reviewed changes

Copilot reviewed 13 out of 15 changed files in this pull request and generated 16 comments.

Summary per file:

| File | Description |
| --- | --- |
| `upload.html` | Removed legacy root HTML template (replaced by `templates/upload.html`). |
| `transdoc.py` | Removed legacy root script (replaced by `src/transdoc.py`). |
| `transdoc_app.py` | Major Flask web app rewrite: backend selection, model query endpoint, async jobs/progress, PDF support. |
| `templates/upload.html` | New modern UI with model query and progress polling. |
| `src/transdoc.py` | New core module: DOCX/PDF processing, translation/proofread modes, backend endpoint builders. |
| `src/__init__.py` | Package exports and metadata. |
| `tests/test_transdoc.py` | Unit tests for language detection, API calls, paragraph processing, CLI parsing. |
| `tests/__init__.py` | Marks tests as a package. |
| `requirements.txt` | Added runtime and dev/test dependencies. |
| `README.md` | Expanded usage docs for new CLI/web behaviors and PDF/backends. |
| `pytest.ini` | Pytest discovery and output options. |
| `Dockerfile` | Containerized test environment. |
| `CHANGELOG.md` | Added release notes. |
| `.gitignore` | Simplified ignore list. |
| `.github/workflows/ci.yml` | Added CI pipeline for tests, lint, and security scan. |


Comment on lines +435 to +439
```python
@app.route("/downloads/<filename>")
def download_file(filename):
    return send_file(
        os.path.join(app.config["OUTPUT_FOLDER"], filename), as_attachment=True
    )
```

Copilot AI Apr 6, 2026


The download endpoint builds a filesystem path from the untrusted filename route parameter and passes it directly to send_file(), which allows path traversal (e.g., ../...) and unintended file disclosure. Use send_from_directory (or Werkzeug safe_join) and reject filenames that resolve outside OUTPUT_FOLDER (optionally also enforce secure_filename / a known job-id mapping).

Copilot uses AI. Check for mistakes.
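A minimal sketch of the suggested fix, assuming only the route and `OUTPUT_FOLDER` config quoted above; the abort-on-empty check is an addition of this sketch, not the reviewer's exact suggestion:

```python
from flask import Flask, abort, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["OUTPUT_FOLDER"] = "outputs"

@app.route("/downloads/<filename>")
def download_file(filename):
    # secure_filename strips path separators ("../../etc/passwd" -> "etc_passwd");
    # send_from_directory additionally 404s anything resolving outside OUTPUT_FOLDER.
    safe_name = secure_filename(filename)
    if not safe_name:
        abort(400)
    return send_from_directory(
        app.config["OUTPUT_FOLDER"], safe_name, as_attachment=True
    )
```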
```python
app = Flask(__name__, template_folder="templates")  # Templates in templates/ subfolder
app.secret_key = "your_secret_key"
```

Copilot AI Apr 6, 2026


app.secret_key is hard-coded, which makes session cookies forgeable if this code is deployed/shared. Load the secret key from an environment variable (and fail fast if missing in non-debug) instead of committing a static value.

Suggested change:

```python
_is_debug_mode = (
    os.getenv("FLASK_DEBUG", "").lower() in {"1", "true", "yes", "on"}
    or os.getenv("FLASK_ENV", "").lower() == "development"
)
app.secret_key = os.getenv("FLASK_SECRET_KEY")
if not app.secret_key:
    if _is_debug_mode:
        app.secret_key = "dev-secret-key"
    else:
        raise RuntimeError(
            "FLASK_SECRET_KEY environment variable must be set when debug mode is disabled"
        )
```

```python
input_filepath = os.path.join(app.config["UPLOAD_FOLDER"], filename)
input_base_name = os.path.splitext(filename)[0]
output_filename = f"translated_{input_base_name}.docx"
output_filepath = os.path.join(app.config["OUTPUT_FOLDER"], output_filename)
```

Copilot AI Apr 6, 2026


The synchronous / POST handler allows .pdf uploads but always forces a .docx output filename. This diverges from /start_translation (which returns .pdf for PDF input) and may confuse users / produce unexpected output types. Consider mirroring the async behavior: choose output extension based on input, and/or make the output format explicit in the UI and server-side.

Suggested change:

```python
input_extension = os.path.splitext(filename)[1].lower()
output_extension = input_extension if input_extension in {".docx", ".pdf"} else ".docx"
output_filename = f"translated_{input_base_name}{output_extension}"
```

Comment on lines +30 to +32
```python
TRANSLATION_JOBS = {}
TRANSLATION_JOBS_LOCK = threading.Lock()
```

Copilot AI Apr 6, 2026


TRANSLATION_JOBS is an unbounded in-memory dictionary with no TTL/cleanup. On a long-running server (or under repeated use) this can grow indefinitely and increase memory usage. Consider expiring completed/error jobs after some duration and/or limiting the number of retained job records.

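One way to bound the registry, sketched under stated assumptions: job records are dicts carrying a `"status"` field and a `"finished_at"` timestamp, and `prune_jobs` would be called whenever a new job is registered. Apart from the two globals quoted above, these names are not from the PR.

```python
import threading
import time

TRANSLATION_JOBS = {}
TRANSLATION_JOBS_LOCK = threading.Lock()
JOB_TTL_SECONDS = 3600  # illustrative retention window for finished jobs

def prune_jobs(now=None):
    """Drop completed/error jobs older than JOB_TTL_SECONDS."""
    now = time.time() if now is None else now
    with TRANSLATION_JOBS_LOCK:
        expired = [
            job_id
            for job_id, job in TRANSLATION_JOBS.items()
            if job.get("status") in {"done", "error"}
            and now - job.get("finished_at", now) > JOB_TTL_SECONDS
        ]
        for job_id in expired:
            del TRANSLATION_JOBS[job_id]
```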
Comment on lines +33 to +35
```python
log_color = self.COLORS.get(record.levelname, self.RESET)
record.levelname = f"{log_color}{record.levelname}{self.RESET}"
return super().format(record)
```

Copilot AI Apr 6, 2026


ColorFormatter.format() mutates record.levelname in-place. Because the same LogRecord is formatted by multiple handlers, this can leak ANSI escape codes into the file logs and/or double-colorize in some configurations. Preserve the original value (restore it after formatting) or add a separate record.levelname_colored attribute instead of overwriting levelname.

Suggested change:

```python
original_levelname = record.levelname
log_color = self.COLORS.get(original_levelname, self.RESET)
record.levelname = f"{log_color}{original_levelname}{self.RESET}"
try:
    return super().format(record)
finally:
    record.levelname = original_levelname
```

Comment on lines +212 to +246
```python
@app.route("/start_translation", methods=["POST"])
def start_translation():
    file = request.files.get("input_file")
    target_lang = request.form.get("target_lang", "").strip()
    src_lang = request.form.get("src_lang", "").strip() or None
    api_token = request.form.get("api_token", "").strip() or None
    model = request.form.get("model", "").strip()
    backend = request.form.get("backend", "ollama").strip() or "ollama"
    api_url = request.form.get("api_url", "http://localhost:11434").strip()

    if backend not in {"ollama", "openai_compatible"}:
        backend = "ollama"

    if not model:
        return jsonify({"success": False, "error": "Please select a valid model"}), 400
    if not target_lang:
        return jsonify({"success": False, "error": "Target language is required"}), 400
    if not file or not allowed_file(file.filename):
        return (
            jsonify(
                {"success": False, "error": "Please upload a valid .docx or .pdf file"}
            ),
            400,
        )

    filename = secure_filename(file.filename)
    job_id = uuid.uuid4().hex
    input_filename = f"{job_id}_(unknown)"
    input_base_name = os.path.splitext(filename)[0]
    input_extension = os.path.splitext(filename)[1].lower()
    output_extension = ".pdf" if input_extension == ".pdf" else ".docx"
    output_filename = f"translated_{job_id}_{input_base_name}{output_extension}"
    input_filepath = os.path.join(app.config["UPLOAD_FOLDER"], input_filename)
    output_filepath = os.path.join(app.config["OUTPUT_FOLDER"], output_filename)
    file.save(input_filepath)
```

Copilot AI Apr 6, 2026


There is no upload size limit configured, so a client can POST arbitrarily large files and exhaust disk/memory/worker time. Consider setting app.config['MAX_CONTENT_LENGTH'] and/or validating the uploaded file size before saving to disk.

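A hedged sketch of the mitigation, assuming nothing about the app beyond Flask itself; the 32 MiB figure and the `/upload` route are illustrative, not the project's choices:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
# Requests whose body exceeds this limit raise RequestEntityTooLarge
# (HTTP 413) when the body is read; 32 MiB here is an illustrative cap.
app.config["MAX_CONTENT_LENGTH"] = 32 * 1024 * 1024

@app.errorhandler(413)
def too_large(err):
    return jsonify({"success": False, "error": "Uploaded file is too large"}), 413

@app.route("/upload", methods=["POST"])
def upload():
    # Touching the parsed body (form/files) enforces the length limit
    # before anything is written to disk.
    _ = request.files.get("input_file")
    return jsonify({"success": True})
```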
```python
src_lang = detect_source_language(doc)
if not src_lang:
    logger.error("Source language detection failed. Exiting.")
    return
```

Copilot AI Apr 6, 2026


When source-language detection fails, process_document() logs an error and returns without raising. For CLI usage this results in a successful exit code and can mislead users (no output produced). Consider raising an exception (or returning a status) so callers can fail fast and report an error.

Suggested change:

```python
raise RuntimeError("Source language detection failed.")
```

```python
and proofreading capabilities.
"""

__version__ = "1.2.0"
```

Copilot AI Apr 6, 2026


__version__ is set to 1.2.0 but the changelog includes newer releases up to 1.4.0. Align package metadata with the current release version (or derive it from a single source of truth) to avoid confusing users and downstream tooling.

Suggested change:

```python
__version__ = "1.4.0"
```

Comment on lines 127 to 140
1. **Translate a Document with Automatic Source Language Detection**

   ```bash
   python src/transdoc.py -i input.docx -o output.docx -t en
   ```

   This command translates `input.docx` to English, saving the result as `output.docx`. The script detects the source language automatically. No API token is required for a local Ollama instance without authentication.

2. **Translate a Document with Specified Source Language**

   ```bash
   python src/transdoc.py -i input.docx -o output.docx -t en -s fr
   ```



Copilot AI Apr 6, 2026


The first CLI examples omit the required -m/--model argument, but src/transdoc.py enforces it via parser.error("Model name ... is required"). Update the examples to include -m (or adjust the CLI to provide a safe default) so the README commands work as written.

Comment on lines +47 to +70
```python
@patch("transdoc.Document")
def test_detect_source_language_de(self, mock_doc_class):
    """Test source language detection for German text."""
    # Mock document with German paragraphs - each paragraph needs a runs attribute
    mock_run1 = Mock()
    mock_run1.text = "Dies ist ein Testdokument auf Deutsch."

    mock_para1 = Mock()
    mock_para1.runs = [mock_run1]

    mock_run2 = Mock()
    mock_run2.text = "Es enthält mehrere Sätze und Wörter."

    mock_para2 = Mock()
    mock_para2.runs = [mock_run2]

    mock_doc = Mock()
    mock_doc.paragraphs = [mock_para1, mock_para2]
    mock_doc_class.return_value = mock_doc

    from transdoc import detect_source_language

    result = detect_source_language(mock_doc)
    self.assertEqual(result, "de")
```

Copilot AI Apr 6, 2026


This test also relies on the real langdetect.detect() result for a short sample while using the default min_words=50, which can make outcomes flaky across environments. Patch transdoc.detect to return a deterministic value (or provide a longer sample / lower min_words) so CI is stable.

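A sketch of the suggested stabilization: a helper that builds the mocked document, with the detection call pinned via `unittest.mock.patch` instead of relying on real langdetect output for a short sample. `build_doc` is a name invented here, and the commented-out lines assume `transdoc` imports `detect` at module level, as the review suggests:

```python
from unittest.mock import Mock

def build_doc(texts):
    """Build a python-docx-shaped mock: one paragraph with one run per text."""
    paras = []
    for text in texts:
        run = Mock()
        run.text = text
        para = Mock()
        para.runs = [run]
        paras.append(para)
    doc = Mock()
    doc.paragraphs = paras
    return doc

# In the real test, the langdetect call would then be pinned, e.g.:
# with patch("transdoc.detect", return_value="de"):
#     assert detect_source_language(build_doc(["Kurzer Text."])) == "de"
```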

3 participants