Import a Word or PDF

Turn an existing file into an editable Markdown controlled document. The import converts the file to Markdown, lets you review and edit the result, and only then commits it as a new content document.

Supported input formats are:

  • .docx - high-fidelity structured extraction.
  • .doc (legacy Word) - normalised to .docx automatically before conversion.
  • .pdf - extracted with AI.
  • .html and .txt - converted locally.
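
The format handling listed above can be sketched as a simple lookup by file extension. This is a minimal illustration only: the strategy labels and the `conversion_plan` function are hypothetical names, not the product's actual internals.

```python
from pathlib import PurePath

# Hypothetical strategy labels; the real pipeline's internals are not documented here.
STRATEGIES = {
    ".docx": "structured-extraction",
    ".doc": "normalise-to-docx",
    ".pdf": "ai-extraction",
    ".html": "local-conversion",
    ".txt": "local-conversion",
}


def conversion_plan(filename: str) -> list[str]:
    """Return the ordered conversion stages for a file, based on its extension."""
    ext = PurePath(filename).suffix.lower()
    if ext not in STRATEGIES:
        raise ValueError(f"unsupported format: {ext or '(none)'}")
    if ext == ".doc":
        # Legacy Word is normalised first, then follows the .docx path.
        return [STRATEGIES[".doc"], STRATEGIES[".docx"]]
    return [STRATEGIES[ext]]
```

For example, `conversion_plan("Policy.DOC")` yields the normalisation stage followed by the structured extraction stage, while an unsupported extension raises `ValueError`.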

The original uploaded file is retained as provenance on the new version, and stays downloadable later from the Versions tab.

Who this is for

Administrators (Quality Admin, HR Admin, Corporate Admin). Importing is admin-only.

The two-step flow: convert, review, commit

Import is deliberately split into a convert step and a commit step, with your review in between, so you always see and approve the converted text before anything is saved as a controlled document:

  1. Convert. You drop a file; Better Comply uploads it, converts it to Markdown, and extracts embedded images. This step produces a draft only. Nothing is saved as a controlled document yet, and no audit event is written.
  2. Review. You edit the converted Markdown side by side with a live preview, and fill in the title, category, and scope.
  3. Commit. When you save, the server writes the document, its first version, the Markdown body, and the extracted images, and records the creation in the audit trail.

Screenshot pending: Import review step with the original on the left, the Markdown editor in the middle, and the preview on the right

Steps

  1. Open the import page from Controlled Documents.
  2. Drag a .docx, .doc, .pdf, .html, or .txt file onto the drop zone, or click to choose a file. Files are limited to 50 MB. A legacy .doc is normalised to .docx automatically before conversion.
  3. Select Convert. A progress indicator shows the upload and conversion. You can cancel while it runs.
  4. In the review step:
    • Edit the converted Markdown in the editor. The conversion is a starting point, not a finished document - read it carefully against the original.
    • Set the title, category, and scope (and department if the scope is departmental).
    • Read any conversion warnings (see below).
  5. Select Save to commit, or Discard to throw the draft away and start over.

After a successful commit you are taken to the new document's detail page, in Draft at version 1.0. Move it through the lifecycle from there. See Lifecycle and approval.
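
The checks implied by steps 2 and 4 above can be sketched as follows. This is an assumption-laden illustration, not the product's validation code: `MAX_UPLOAD_BYTES`, `check_file_size`, `missing_fields`, and the scope value `"departmental"` are hypothetical, and the exact byte count behind "50 MB" is a guess.

```python
# Assumption: "50 MB" is read as 50 * 1024 * 1024 bytes; the product may count it differently.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024


def check_file_size(size_bytes: int) -> bool:
    """Return True when a non-empty file is within the upload limit."""
    return 0 < size_bytes <= MAX_UPLOAD_BYTES


def missing_fields(form: dict) -> list[str]:
    """Return the names of required review-form fields that are still empty."""
    required = ["title", "category", "scope"]
    # Hypothetical scope value: a departmental scope also needs a department.
    if form.get("scope") == "departmental":
        required.append("department")
    return [name for name in required if not str(form.get(name) or "").strip()]
```

A form with a departmental scope and no department reports `department` as missing; the same form with any other scope reports nothing.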

Image extraction

Embedded images in the source file are extracted and stored alongside the document, and referenced from the converted Markdown. The image records are kept as append-only provenance: they cannot be edited or individually deleted, and they are removed only if the whole document is deleted.

Some images cannot be extracted automatically (for example, certain page-level or vector graphics). When that happens, you will see a conversion warning telling you which content was detected but not extracted, so you can add it back by hand.

Conversion warnings

The review step surfaces warnings when the conversion is imperfect, for example:

  • Images that were detected but could not be extracted.
  • An empty or truncated conversion result.

Treat warnings as a prompt to compare the Markdown against the original before committing. The original file stays available in the review step so you can check it.

Screenshot pending: Import review step showing a conversion warnings panel
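
One way to picture how the two example warnings might be derived from a conversion result is sketched below. The `ConversionResult` shape, its field names, and the warning wording are all hypothetical; the guide does not describe the real data model.

```python
from dataclasses import dataclass


@dataclass
class ConversionResult:
    """Hypothetical summary of one conversion run."""
    markdown: str
    images_detected: int
    images_extracted: int
    truncated: bool = False


def collect_warnings(result: ConversionResult) -> list[str]:
    """Build the warnings a reviewer should see before committing."""
    warnings = []
    missing = result.images_detected - result.images_extracted
    if missing > 0:
        warnings.append(f"{missing} image(s) detected but not extracted")
    if not result.markdown.strip():
        warnings.append("conversion result is empty")
    elif result.truncated:
        warnings.append("conversion result may be truncated")
    return warnings
```

An empty list means the conversion raised no warnings, which still does not replace reading the Markdown against the original.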

Your work survives a refresh

The review step persists what you are editing. If you refresh the page or close the tab mid-edit, your converted Markdown and the form fields are restored when you come back, and the original-file preview is re-opened. The conversion itself is never re-run silently, so you do not lose your edits or pay for a second conversion.

Abandoned drafts (files you uploaded but never committed) are cleaned up automatically by a scheduled job, not when you close the tab.
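
As a rough sketch of what such a scheduled cleanup could look like, the function below picks uncommitted drafts by age. The retention window is not stated in this guide, so the 7-day value is purely a placeholder, and `select_abandoned` and its input shape are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the retention window is not documented; 7 days is a placeholder.
RETENTION = timedelta(days=7)


def select_abandoned(drafts: dict[str, datetime], now: datetime) -> list[str]:
    """Return IDs of uncommitted drafts whose last activity is older than RETENTION."""
    cutoff = now - RETENTION
    return sorted(draft_id for draft_id, last_seen in drafts.items() if last_seen < cutoff)
```

Because the selection depends only on the stored timestamps, closing or refreshing a tab has no effect on which drafts are picked up.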

What happens behind the scenes

The commit step is the only part of the import that saves anything as a controlled document, and it runs on the server.

The commit is server-side and audited

When you save, the backend writes the document, the version, the Markdown blob, and the image assets, then records a create_controlled_document audit event. If that audit record cannot be written, every write is reverted: the rows, the Markdown blob, and the staged images are all removed. An imported document is never left in the system without a matching audit entry. See the Compliance area.
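
The all-or-nothing behaviour described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the backend's implementation: `Store`, `commit_import`, `AuditWriteError`, and the audit callbacks are hypothetical names, and a real system would more likely rely on database transactions plus blob cleanup. Only the event name `create_controlled_document` comes from this guide.

```python
class AuditWriteError(Exception):
    """Raised when the audit record cannot be written."""


class Store:
    """In-memory stand-in for the database rows, Markdown blobs, and image assets."""

    def __init__(self):
        self.rows = {}
        self.blobs = {}
        self.images = {}
        self.audit = []


def commit_import(store, doc_id, title, markdown, images, write_audit):
    """Write everything, then audit; undo every write if the audit step fails."""
    store.rows[doc_id] = {"title": title, "version": "1.0", "status": "Draft"}
    store.blobs[doc_id] = markdown
    store.images[doc_id] = list(images)
    try:
        write_audit(store, {"event": "create_controlled_document", "doc_id": doc_id})
    except AuditWriteError:
        # Compensate: remove the rows, the Markdown blob, and the staged images.
        store.rows.pop(doc_id, None)
        store.blobs.pop(doc_id, None)
        store.images.pop(doc_id, None)
        raise
    return doc_id


def ok_audit(store, event):
    """Audit writer that succeeds."""
    store.audit.append(event)


def failing_audit(store, event):
    """Audit writer that simulates an unavailable audit store."""
    raise AuditWriteError("audit store unavailable")
```

Calling `commit_import` with `ok_audit` leaves the document and one audit event in the store; calling it with `failing_audit` re-raises the error and leaves the store empty, which mirrors the guarantee that no imported document exists without an audit entry.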