Enhance API, CLI, and web UI with new features and improvements #1
codefitz wants to merge 39 commits into codefitz:main
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 80e723f9b8
```python
if not args.model or not args.model.strip():
    parser.error("Model name (-m/--model) is required")

api_url = build_chat_endpoint(args.api_url, args.backend)
```
Pass API base URL into processing path
main() converts the user-provided URL to a concrete endpoint before calling process_document, but call_chat_api() builds endpoint candidates again, causing malformed fallback URLs like .../v1/chat/completions/api/chat and .../api/chat/v1/chat/completions. In CLI usage this breaks the intended backend fallback (for example, -b openai_compatible -u http://localhost:11434 never tries http://localhost:11434/api/chat), so valid deployments fail when the first endpoint is unavailable.
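One way to avoid this double resolution is to build the ordered candidate list in a single helper shared by the CLI and `call_chat_api()`, so a URL is never expanded twice. A minimal sketch, where the helper name and candidate ordering are assumptions, not the PR's actual API:

```python
def chat_endpoint_candidates(base_url: str, backend: str) -> list:
    """Return the ordered chat endpoints to try for a user-supplied base URL."""
    base = base_url.rstrip("/")
    # A URL that is already a concrete chat endpoint is used verbatim,
    # never re-expanded (this is what prevents .../api/chat/v1/chat/completions).
    if base.endswith(("/v1/chat/completions", "/api/chat")):
        return [base]
    openai_style = f"{base}/v1/chat/completions"
    ollama_style = f"{base}/api/chat"
    # Preferred backend first, the other shape as the fallback.
    if backend == "openai_compatible":
        return [openai_style, ollama_style]
    return [ollama_style, openai_style]
```

With this shape, `main()` would pass the raw base URL through and let the request loop walk the candidates, so `-b openai_compatible -u http://localhost:11434` still falls back to `http://localhost:11434/api/chat`.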
```python
if base_url.endswith(("/api/chat", "/api/generate")):
    add(base_url)
```
Stop routing chat payloads to /api/generate
The endpoint builder accepts URLs ending in /api/generate as primary Ollama chat targets, but call_chat_api() always sends a chat-completions payload (messages) rather than a generate payload (prompt). This makes legacy /api/generate configurations fail consistently at runtime, turning a previously supported Ollama URL shape into translation errors.
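For reference, the two Ollama payload shapes differ as follows. A hedged sketch (field names follow the public Ollama API; the helper name is hypothetical):

```python
def build_payload(endpoint: str, model: str, text: str) -> dict:
    # /api/generate expects a flat "prompt" string, while /api/chat and
    # OpenAI-style chat-completions endpoints expect a "messages" list.
    # Sending one shape to the other endpoint fails at runtime, which is
    # the mismatch this review comment describes.
    if endpoint.endswith("/api/generate"):
        return {"model": model, "prompt": text, "stream": False}
    return {
        "model": model,
        "messages": [{"role": "user", "content": text}],
        "stream": False,
    }
```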
Pull request overview
This PR restructures TransDocs into a package under src/, expands document/back-end capabilities (DOCX+PDF, Ollama+OpenAI-compatible), and adds CI/testing + updated documentation.
Changes:
- Added CI workflow, Docker-based test runner, pytest configuration, and initial unit tests.
- Migrated/expanded core translation logic into src/transdoc.py (PDF support, proofreading mode, multi-backend endpoint handling).
- Reworked the Flask web app and added a new templates/upload.html with model querying, async translation jobs, and progress polling.
Reviewed changes
Copilot reviewed 13 out of 15 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| upload.html | Removed legacy root HTML template (replaced by templates/upload.html). |
| transdoc.py | Removed legacy root script (replaced by src/transdoc.py). |
| transdoc_app.py | Major Flask web app rewrite: backend selection, model query endpoint, async jobs/progress, PDF support. |
| templates/upload.html | New modern UI with model query + progress polling. |
| src/transdoc.py | New core module: DOCX/PDF processing, translation/proofread modes, backend endpoint builders. |
| src/__init__.py | Package exports + metadata. |
| tests/test_transdoc.py | Added unit tests for language detection, API calls, paragraph processing, CLI parsing. |
| tests/__init__.py | Marks tests as a package. |
| requirements.txt | Added runtime + dev/test dependencies. |
| README.md | Expanded usage/docs for new CLI/web behaviors and PDF/backends. |
| pytest.ini | Pytest discovery/options. |
| Dockerfile | Containerized test environment. |
| CHANGELOG.md | Added release notes. |
| .gitignore | Simplified ignore list. |
| .github/workflows/ci.yml | Added CI pipeline for tests/lint/security scan. |
```python
@app.route("/downloads/<filename>")
def download_file(filename):
    return send_file(
        os.path.join(app.config["OUTPUT_FOLDER"], filename), as_attachment=True
    )
```
The download endpoint builds a filesystem path from the untrusted filename route parameter and passes it directly to send_file(), which allows path traversal (e.g., ../...) and unintended file disclosure. Use send_from_directory (or Werkzeug safe_join) and reject filenames that resolve outside OUTPUT_FOLDER (optionally also enforce secure_filename / a known job-id mapping).
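A sketch of the safer pattern with `send_from_directory` (the route and folder name follow the snippet above; the extra `secure_filename` pre-check is a belt-and-braces assumption, not required by Flask):

```python
import os

from flask import Flask, abort, send_from_directory
from werkzeug.utils import secure_filename

app = Flask(__name__)
app.config["OUTPUT_FOLDER"] = os.path.abspath("downloads")


@app.route("/downloads/<filename>")
def download_file(filename):
    # Reject names that normalize differently (e.g. "..", path separators).
    if not filename or filename != secure_filename(filename):
        abort(404)
    # send_from_directory returns 404 for any path escaping OUTPUT_FOLDER.
    return send_from_directory(
        app.config["OUTPUT_FOLDER"], filename, as_attachment=True
    )
```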
```python
app = Flask(__name__, template_folder="templates")  # Templates in templates/ subfolder
app.secret_key = "your_secret_key"
```
app.secret_key is hard-coded, which makes session cookies forgeable if this code is deployed/shared. Load the secret key from an environment variable (and fail fast if missing in non-debug) instead of committing a static value.
Suggested change:

```python
_is_debug_mode = (
    os.getenv("FLASK_DEBUG", "").lower() in {"1", "true", "yes", "on"}
    or os.getenv("FLASK_ENV", "").lower() == "development"
)
app.secret_key = os.getenv("FLASK_SECRET_KEY")
if not app.secret_key:
    if _is_debug_mode:
        app.secret_key = "dev-secret-key"
    else:
        raise RuntimeError(
            "FLASK_SECRET_KEY environment variable must be set when debug mode is disabled"
        )
```
```python
input_filepath = os.path.join(app.config["UPLOAD_FOLDER"], filename)
input_base_name = os.path.splitext(filename)[0]
output_filename = f"translated_{input_base_name}.docx"
```
The synchronous / POST handler allows .pdf uploads but always forces a .docx output filename. This diverges from /start_translation (which returns .pdf for PDF input) and may confuse users / produce unexpected output types. Consider mirroring the async behavior: choose output extension based on input, and/or make the output format explicit in the UI and server-side.
Suggested change:

```python
input_extension = os.path.splitext(filename)[1].lower()
output_extension = input_extension if input_extension in {".docx", ".pdf"} else ".docx"
output_filename = f"translated_{input_base_name}{output_extension}"
```
```python
TRANSLATION_JOBS = {}
TRANSLATION_JOBS_LOCK = threading.Lock()
```
TRANSLATION_JOBS is an unbounded in-memory dictionary with no TTL/cleanup. On a long-running server (or under repeated use) this can grow indefinitely and increase memory usage. Consider expiring completed/error jobs after some duration and/or limiting the number of retained job records.
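One lightweight mitigation is a TTL-based prune invoked from the progress-polling endpoint or a background timer. A sketch under assumed job-record fields (`status` and `finished_at` are illustrative; the PR's actual record shape may differ):

```python
import threading
import time

TRANSLATION_JOBS = {}
TRANSLATION_JOBS_LOCK = threading.Lock()
JOB_TTL_SECONDS = 3600  # illustrative retention window


def prune_finished_jobs(now=None):
    """Drop completed/error jobs older than the TTL; return how many were removed."""
    now = time.time() if now is None else now
    with TRANSLATION_JOBS_LOCK:
        stale = [
            job_id
            for job_id, job in TRANSLATION_JOBS.items()
            if job.get("status") in {"done", "error"}
            and now - job.get("finished_at", now) > JOB_TTL_SECONDS
        ]
        for job_id in stale:
            del TRANSLATION_JOBS[job_id]
    return len(stale)
```

Running jobs are never touched, so in-flight progress polling keeps working while completed records age out.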
```python
log_color = self.COLORS.get(record.levelname, self.RESET)
record.levelname = f"{log_color}{record.levelname}{self.RESET}"
return super().format(record)
```
ColorFormatter.format() mutates record.levelname in-place. Because the same LogRecord is formatted by multiple handlers, this can leak ANSI escape codes into the file logs and/or double-colorize in some configurations. Preserve the original value (restore it after formatting) or add a separate record.levelname_colored attribute instead of overwriting levelname.
Suggested change:

```python
original_levelname = record.levelname
log_color = self.COLORS.get(original_levelname, self.RESET)
record.levelname = f"{log_color}{original_levelname}{self.RESET}"
try:
    return super().format(record)
finally:
    record.levelname = original_levelname
```
```python
@app.route("/start_translation", methods=["POST"])
def start_translation():
    file = request.files.get("input_file")
    target_lang = request.form.get("target_lang", "").strip()
    src_lang = request.form.get("src_lang", "").strip() or None
    api_token = request.form.get("api_token", "").strip() or None
    model = request.form.get("model", "").strip()
    backend = request.form.get("backend", "ollama").strip() or "ollama"
    api_url = request.form.get("api_url", "http://localhost:11434").strip()

    if backend not in {"ollama", "openai_compatible"}:
        backend = "ollama"

    if not model:
        return jsonify({"success": False, "error": "Please select a valid model"}), 400
    if not target_lang:
        return jsonify({"success": False, "error": "Target language is required"}), 400
    if not file or not allowed_file(file.filename):
        return (
            jsonify(
                {"success": False, "error": "Please upload a valid .docx or .pdf file"}
            ),
            400,
        )

    filename = secure_filename(file.filename)
    job_id = uuid.uuid4().hex
    input_filename = f"{job_id}_{filename}"
    input_base_name = os.path.splitext(filename)[0]
    input_extension = os.path.splitext(filename)[1].lower()
    output_extension = ".pdf" if input_extension == ".pdf" else ".docx"
    output_filename = f"translated_{job_id}_{input_base_name}{output_extension}"
    input_filepath = os.path.join(app.config["UPLOAD_FOLDER"], input_filename)
    output_filepath = os.path.join(app.config["OUTPUT_FOLDER"], output_filename)
    file.save(input_filepath)
```
There is no upload size limit configured, so a client can POST arbitrarily large files and exhaust disk/memory/worker time. Consider setting app.config['MAX_CONTENT_LENGTH'] and/or validating the uploaded file size before saving to disk.
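Flask can enforce this globally via `MAX_CONTENT_LENGTH`, which rejects oversized request bodies with HTTP 413 before the handler saves anything. A minimal sketch (the 32 MiB cap is an arbitrary illustrative value, and the route is simplified from the app's actual handler):

```python
from flask import Flask, request

app = Flask(__name__)
# Reject request bodies larger than 32 MiB with a 413 response.
app.config["MAX_CONTENT_LENGTH"] = 32 * 1024 * 1024


@app.route("/upload", methods=["POST"])
def upload():
    # Accessing request.files triggers form parsing, where the limit is enforced.
    file = request.files.get("input_file")
    if not file:
        return "no file", 400
    return "ok"
```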
```python
src_lang = detect_source_language(doc)
if not src_lang:
    logger.error("Source language detection failed. Exiting.")
    return
```
When source-language detection fails, process_document() logs an error and returns without raising. For CLI usage this results in a successful exit code and can mislead users (no output produced). Consider raising an exception (or returning a status) so callers can fail fast and report an error.
Suggested change:

```python
raise RuntimeError("Source language detection failed.")
```
```python
and proofreading capabilities.
"""

__version__ = "1.2.0"
```
__version__ is set to 1.2.0 but the changelog includes newer releases up to 1.4.0. Align package metadata with the current release version (or derive it from a single source of truth) to avoid confusing users and downstream tooling.
Suggested change:

```python
__version__ = "1.4.0"
```
1. **Translate a Document with Automatic Source Language Detection**

   ```bash
   python src/transdoc.py -i input.docx -o output.docx -t en
   ```

   This command translates `input.docx` to English, saving the result as `output.docx`. The script will detect the source language automatically. No API token is required for local Ollama without authentication.

2. **Translate a Document with Specified Source Language**

   ```bash
   python src/transdoc.py -i input.docx -o output.docx -t en -s fr
   ```
The first CLI examples omit the required -m/--model argument, but src/transdoc.py enforces it via parser.error("Model name ... is required"). Update the examples to include -m (or adjust the CLI to provide a safe default) so the README commands work as written.
```python
@patch("transdoc.Document")
def test_detect_source_language_de(self, mock_doc_class):
    """Test source language detection for German text."""
    # Mock document with German paragraphs - each paragraph needs runs attribute
    mock_run1 = Mock()
    mock_run1.text = "Dies ist ein Testdokument auf Deutsch."

    mock_para1 = Mock()
    mock_para1.runs = [mock_run1]

    mock_run2 = Mock()
    mock_run2.text = "Es enthält mehrere Sätze und Wörter."

    mock_para2 = Mock()
    mock_para2.runs = [mock_run2]

    mock_doc = Mock()
    mock_doc.paragraphs = [mock_para1, mock_para2]
    mock_doc_class.return_value = mock_doc

    from transdoc import detect_source_language

    result = detect_source_language(mock_doc)
    self.assertEqual(result, "de")
```
This test also relies on the real langdetect.detect() result for a short sample while using the default min_words=50, which can make outcomes flaky across environments. Patch transdoc.detect to return a deterministic value (or provide a longer sample / lower min_words) so CI is stable.
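A deterministic variant patches the langdetect call rather than relying on its heuristics for short samples. The sketch below stubs a minimal `transdoc` module so the pattern runs standalone; the real suite would import the project's actual module, and the stub's `detect_source_language` is a simplification of the PR's function:

```python
import sys
import types
import unittest
from unittest.mock import Mock, patch

# Stand-in for src/transdoc.py so this sketch is self-contained.
transdoc = types.ModuleType("transdoc")


def detect_source_language(doc, min_words=1):
    # Looks up transdoc.detect at call time so @patch takes effect.
    text = " ".join(run.text for para in doc.paragraphs for run in para.runs)
    return transdoc.detect(text) if text else None


transdoc.detect = lambda text: "en"  # stands in for langdetect.detect
transdoc.detect_source_language = detect_source_language
sys.modules["transdoc"] = transdoc


class TestDetectDeterministic(unittest.TestCase):
    @patch("transdoc.detect", return_value="de")
    def test_detect_source_language_de(self, mock_detect):
        run = Mock()
        run.text = "Dies ist ein Testdokument auf Deutsch."
        para = Mock()
        para.runs = [run]
        doc = Mock()
        doc.paragraphs = [para]
        # The assertion no longer depends on langdetect's behavior.
        self.assertEqual(transdoc.detect_source_language(doc), "de")
        mock_detect.assert_called_once()
```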
This pull request introduces a comprehensive continuous integration and testing setup for the TransDocs project, along with significant improvements to documentation and project structure. The main changes include adding a CI/CD workflow, a Dockerfile for testing, a detailed changelog, expanded and clarified README instructions, and test configuration files.
Key changes:
CI/CD and Testing Infrastructure
- Added .github/workflows/ci.yml to automate testing, linting, Docker-based tests, and security scanning on pushes and pull requests to the main and develop branches. The workflow runs tests across multiple Python versions, checks code formatting and types, builds and tests a Docker image, and scans for security vulnerabilities.
- Added a Dockerfile for building a test environment, installing dependencies, and running tests automatically when the container is executed.
- Added pytest.ini to configure pytest discovery and output options.

Documentation and Project Structure
- Added CHANGELOG.md to document all notable changes, new features, fixes, and improvements across releases.
- Expanded and clarified README.md instructions.

Source and Test Initialization
- Added src/__init__.py to define package exports and metadata, making the core functions importable as a package.
- Added tests/__init__.py to mark the tests directory as a package.