awslabs/toolkit-md

Toolkit for Markdown

CLI tools for maintaining Markdown content like documentation and tutorials.

Features

  • Content Validation - Check Markdown files for lint issues, broken links, missing images, and spelling errors
  • AI-Powered Content Review - Automatically review and improve your Markdown content using Amazon Bedrock
  • Multi-Language Translation - Translate content between 8+ supported languages
  • Intelligent Q&A - Ask questions about your content and get AI-powered answers
  • Style Guide Enforcement - Maintain consistency with custom style guides
  • Rate Limiting - Built-in rate limiting for API calls
  • Context-Aware Processing - Smart content processing with configurable context strategies
  • Model Context Protocol Server - Integrate with tools like Cursor, Cline and Q Developer with the built-in MCP server

Installation

npm install -g @aws/toolkit-md
# or
yarn global add @aws/toolkit-md

Quick Start

Review Content

Analyze and improve all Markdown files in a directory using AI:

toolkit-md review ./docs

Write the changes directly back to the source files:

toolkit-md review --write ./docs

Provide some additional instructions:

toolkit-md review --write ./docs --instructions 'Add more detailed explanations to the introduction page'

Include images from Markdown files in the review (WARNING: can result in significantly higher token usage):

toolkit-md review --write ./docs --include-images

Review only changes from a git diff:

git diff main > changes.diff
toolkit-md review --diff-file changes.diff --summary-file review-summary.md ./docs

Customize the context window for diff-based review:

toolkit-md review --diff-file changes.diff --summary-file review-summary.md --diff-context 5 ./docs

Translate Content

Translate all Markdown content in a directory to French:

toolkit-md translate ./docs --to fr

Ask Questions

Ask questions about content across an entire documentation directory:

toolkit-md ask ./docs --question "What are the main topics covered?"

Map Content Structure

Print a tree map of all Markdown files in a directory:

toolkit-md map ./docs

Include image references in the tree output:

toolkit-md map ./docs --images

Check Content

Run non-AI validation checks on Markdown files including linting, broken link detection, missing image detection, and spell checking:

toolkit-md check ./docs

Skip external link validation for offline or faster checks:

toolkit-md check ./docs --skip-external-links

Ignore specific remark-lint rules:

toolkit-md check ./docs --ignore-rule maximum-line-length --ignore-rule no-html

Only report errors (skip warnings):

toolkit-md check ./docs --min-severity error

Run only specific check categories:

toolkit-md check ./docs --category lint --category link

Enable spell checking (disabled by default):

toolkit-md check ./docs --category lint --category link --category image --category spell

Ignore specific words during spell checking using a words file (one word per line):

toolkit-md check ./docs --ignore-words-file .spelling-ignore

Skip spell checking inside specific directives:

toolkit-md check ./docs --skip-directive video --skip-directive tabs

Configuration

Toolkit for Markdown supports configuration through:

  1. Command line arguments
  2. Environment variables
  3. .toolkit-mdrc file

Configuration Options:

| Config Path | CLI Flag | Environment Variable | Description | Default |
| --- | --- | --- | --- | --- |
| contentDir | --content-dir | TKMD_CONTENT_DIR | Directory relative to the cwd where content is hosted | undefined |
| language | --language | TKMD_LANGUAGE | Source language code | "en" |
| defaultLanguage | --default-language | TKMD_DEFAULT_LANGUAGE | Language for files without explicit markers | "en" |
| ai.model | --model | TKMD_AI_MODEL | Amazon Bedrock model ID | "global.anthropic.claude-sonnet-4-5-20250929-v1:0" |
| ai.maxTokens | --max-tokens | TKMD_AI_MAX_TOKENS | Maximum output tokens | 4096 |
| ai.write | --write | TKMD_AI_WRITE | Write changes directly to files | false |
| ai.rate.requests | --request-rate | TKMD_AI_REQUEST_RATE_LIMIT | Max requests per minute (0 = unlimited) | 0 |
| ai.rate.tokens | --token-rate | TKMD_AI_TOKEN_RATE_LIMIT | Max tokens per minute (0 = unlimited) | 0 |
| ai.contextStrategy | --context-strategy | TKMD_AI_CONTEXT_STRATEGY | Context inclusion: "siblings", "nothing", "everything" | "nothing" |
| ai.exemplars | --exemplar | TKMD_AI_EXEMPLAR_* | Path to directory of content to use as an example to follow, can be specified multiple times | [] |
| ai.styleGuides | --style-guide | TKMD_AI_STYLE_GUIDE_* | Path to style guide file, can be specified multiple times | [] |
| ai.includeImages | --include-images | TKMD_AI_INCLUDE_IMAGES | Include images from markdown files in AI review | false |
| ai.maxImages | --max-images | TKMD_AI_MAX_IMAGES | Maximum number of images to include per file | 5 |
| ai.maxImageSize | --max-image-size | TKMD_AI_MAX_IMAGE_SIZE | Maximum image file size in bytes | 3145728 (3MB) |
| ai.review.instructions | --instructions | TKMD_AI_REVIEW_INSTRUCTIONS | Additional instructions for the model | undefined |
| ai.review.summaryFile | --summary-file | TKMD_AI_REVIEW_SUMMARY_PATH | Write a summary of the review changes to the provided file path in Markdown format | "" |
| ai.review.diffFile | --diff-file | TKMD_AI_REVIEW_DIFF_FILE | Path to unified diff file for filtering review suggestions | undefined |
| ai.review.diffContext | --diff-context | TKMD_AI_REVIEW_DIFF_CONTEXT | Number of context lines around changed lines to include (symmetric) | 3 |
| ai.review.runChecks | --review-check | TKMD_AI_REVIEW_CHECK | Run content checks and include results in the review prompt | true |
| ai.translation.force | --force | TKMD_AI_FORCE_TRANSLATION | Force translation even if source unchanged | false |
| ai.translation.check | --check | TKMD_AI_CHECK_TRANSLATION | Only check if translation needed | false |
| ai.translation.directory | --translation-dir | TKMD_AI_TRANSLATION_DIRECTORY | Directory where translated content is stored, if not specified defaults to source directory | undefined |
| ai.translation.skipFileSuffix | --skip-file-suffix | TKMD_AI_TRANSLATION_SKIP_FILE_SUFFIX | Omit the language code suffix for translated files ('example.fr.md' becomes 'example.md') | false |
| check.minSeverity | --min-severity | TKMD_CHECK_MIN_SEVERITY | Minimum severity level to report (error, warning) | "warning" |
| check.categories | --category | TKMD_CHECK_CATEGORY_* | Check categories to run (lint, link, image, spell), can be specified multiple times | ["lint", "link", "image"] |
| check.links.timeout | --link-timeout | TKMD_CHECK_LINK_TIMEOUT | Timeout in milliseconds for HTTP link and image checks | 5000 |
| check.links.skipExternal | --skip-external-links | TKMD_CHECK_SKIP_EXTERNAL_LINKS | Skip validation of external HTTP/HTTPS links and images | false |
| check.links.ignorePatterns | --ignore-link-pattern | TKMD_CHECK_LINK_IGNORE_PATTERN_* | Regex patterns for URLs to ignore during link checking, can be specified multiple times | [] |
| check.lint.ignoreRules | --ignore-rule | TKMD_CHECK_LINT_IGNORE_RULE_* | remark-lint rule names to ignore (without the remark-lint- prefix), can be specified multiple times | [] |
| check.spell.ignoreWordsFile | --ignore-words-file | TKMD_CHECK_SPELL_IGNORE_WORDS_FILE | Path to a file containing words to ignore during spell checking, one word per line | undefined |
| check.spell.skipDirectives | --skip-directive | TKMD_CHECK_SPELL_SKIP_DIRECTIVE_* | Directive names whose content should be skipped during spell checking, can be specified multiple times | [] |
| staticPrefix | --static-prefix | TKMD_STATIC_PREFIX | URL prefix indicating a link points to a file in the static directory | undefined |
| staticDir | --static-dir | TKMD_STATIC_DIR | Directory relative to the cwd where static assets are stored, used with staticPrefix | undefined |

Note: For array values (exemplars, styleGuides, ignoreRules, ignorePatterns, skipDirectives), the environment variable referenced above is treated as a prefix: TKMD_AI_EXEMPLAR_FIRST, TKMD_AI_EXEMPLAR_SECOND, etc.
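
As a rough illustration of the prefix convention in the note above, the following Python sketch gathers every variable that starts with a given prefix into a list. It is not the tool's implementation: the helper name collect_prefixed is hypothetical, and sorting by variable name is an assumption made here for determinism, since the ordering the tool applies is not stated.

```python
def collect_prefixed(env, prefix):
    """Gather every value whose variable name starts with `prefix`."""
    matches = {name: value for name, value in env.items() if name.startswith(prefix)}
    # Assumption: values are ordered by variable name. The tool's actual
    # ordering is not described in this README.
    return [matches[name] for name in sorted(matches)]


env = {
    "TKMD_AI_EXEMPLAR_FIRST": "./examples/good-example1",
    "TKMD_AI_EXEMPLAR_SECOND": "./examples/good-example2",
    "TKMD_AI_MODEL": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
}
exemplars = collect_prefixed(env, "TKMD_AI_EXEMPLAR_")
print(exemplars)
```

Non-array settings such as TKMD_AI_MODEL are untouched by the prefix scan because their names do not start with the array prefix.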

Configuration File Format

Create a .toolkit-mdrc file in JSON format:

{
  "contentDir": ".",
  "language": "en",
  "defaultLanguage": "en",
  "ai": {
    "model": "global.anthropic.claude-sonnet-4-5-20250929-v1:0",
    "maxTokens": 4096,
    "write": false,
    "rate": {
      "requests": 10,
      "tokens": 10000
    },
    "contextStrategy": "siblings",
    "exemplars": ["./examples/good-example1", "./examples/good-example2"],
    "styleGuides": ["./guides/style-guide.md", "./guides/aws-terminology.md"],
    "includeImages": true,
    "maxImages": 5,
    "maxImageSize": 3145728,
    "translation": {
      "force": false,
      "check": false
    },
    "review": {
      "runChecks": true
    }
  },
  "check": {
    "minSeverity": "warning",
    "categories": ["lint", "link", "image"],
    "links": {
      "timeout": 5000,
      "skipExternal": false,
      "ignorePatterns": ["^https://example\\.com/.*"]
    },
    "lint": {
      "ignoreRules": ["maximum-line-length"]
    },
    "spell": {
      "ignoreWordsFile": ".spelling-ignore",
      "skipDirectives": ["video", "tabs"]
    }
  },
  "staticPrefix": "/static/",
  "staticDir": "./static"
}
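
The ignorePatterns entry above is a regular expression stored inside a JSON string, so the backslash is doubled in the file. The Python sketch below shows what the pattern looks like once the JSON is decoded and how a URL might be tested against it. The helper is_ignored is hypothetical, and using a search-style match is an assumption; the README only says that the patterns are regexes for URLs to ignore.

```python
import json
import re

# The same pattern as in the configuration file, embedded as raw JSON text.
raw = r'{"check": {"links": {"ignorePatterns": ["^https://example\\.com/.*"]}}}'
patterns = json.loads(raw)["check"]["links"]["ignorePatterns"]


def is_ignored(url, patterns):
    """True if any ignore pattern matches the URL (assumed: re.search semantics)."""
    return any(re.search(pattern, url) for pattern in patterns)


print(patterns[0])  # the decoded pattern has a single backslash before the dot
print(is_ignored("https://example.com/page", patterns))
print(is_ignored("https://other.example.org/page", patterns))
```

Because the dot is escaped, the pattern matches example.com literally and does not match a host such as exampleXcom.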

Changing the working directory

Adding a .toolkit-mdrc file to a project is a useful way to set default configuration values for that project. The most reliable way to make sure the correct configuration file is picked up is to run toolkit-md from the project directory itself.

An alternative approach is to use the --cwd flag to change what the tool considers the current directory:

toolkit-md review --cwd ~/projects/my-project ./docs

This will:

  1. Try to load a configuration file from ~/projects/my-project/.toolkit-mdrc
  2. Review the content in ~/projects/my-project/docs

Configuring a content directory

It's not unusual for Markdown content to be stored in a sub-directory of an overall project.

For example:

docs/                       ← **Markdown documentation**
├── guide/
│   ├── getting-started.md
│   ├── installation.md
│   └── configuration.md
└── api/
    ├── authentication.md
    └── endpoints.md
src/                        ← **Other source code**

The --content-dir parameter can be used to inform toolkit-md that a sub-directory is what it should consider the "root" for Markdown content.

toolkit-md review --content-dir ./docs ./guide

This will review the content in ./docs/guide.

Taking this approach has several benefits:

  1. It shortens the parameter used to target subsets of content for review
  2. It allows toolkit-md to generate more succinct, targeted directory and file structure listings that can be provided to LLMs. Instead of mapping the entire project repository, it can focus on the specific directory that contains Markdown. This saves tokens and reduces the context sent to the model.
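
The way --cwd, --content-dir and the target path combine can be sketched in Python as follows. This is only an illustration of the behavior described in this section and the previous one, not the tool's code; the function names and the /home/user/projects/my-project path are hypothetical.

```python
from pathlib import PurePosixPath


def config_path(cwd):
    """Location where the configuration file is looked up."""
    return PurePosixPath(cwd) / ".toolkit-mdrc"


def resolve_target(cwd, target, content_dir=None):
    """Join the working directory, the optional content directory and the target."""
    base = PurePosixPath(cwd)
    if content_dir is not None:
        base = base / content_dir
    return base / target


cwd = "/home/user/projects/my-project"  # hypothetical project location
print(config_path(cwd))
print(resolve_target(cwd, "./docs"))
print(resolve_target(cwd, "./guide", content_dir="./docs"))
```

The last call corresponds to toolkit-md review --content-dir ./docs ./guide run from the project directory.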

Style Guides

Style guides describe how content should ideally be written, usually as natural language in Markdown format. The --style-guide parameter is used to provide a path to the style guide:

--style-guide ./style-guide.md

Multiple --style-guide parameters can be provided to use different style guide files.

--style-guide ./style-guide.md --style-guide ./other-style-guide.md

The paths can be either:

  1. Individual file: A single Markdown file
  2. Directory: A directory which will be recursively loaded using the same logic as the main content loader

Using with Translations

Style guides can be used to model language-specific content guidance that works effectively with the translation command. When the translate command is given a style guide directory, the style guides for both the source and target languages are loaded into context.

For example, you could create a directory called i18n:

docs/
[...]
i18n/
├── style-guide.fr.md
└── style-guide.es.md
style-guide.md

When running this command:

toolkit-md translate ./docs --to fr --style-guide style-guide.md --style-guide i18n

The style guides that are loaded would be:

  • style-guide.md: It is explicitly provided
  • style-guide.fr.md: It is discovered from the i18n directory based on the target language
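
A minimal Python sketch of this selection follows. It assumes the language marker is the second-to-last dot-separated part of the file name (as in style-guide.fr.md) and that unmarked files fall back to the default language; both helper names are hypothetical and the tool's real discovery logic may differ.

```python
def language_of(filename, default_language="en"):
    """Read the language marker from names like 'style-guide.fr.md'."""
    parts = filename.split(".")
    # 'name.<lang>.md' carries an explicit marker; 'name.md' does not.
    # Simplification: a name such as 'v1.2.md' would be misread here.
    if len(parts) >= 3 and parts[-1] == "md":
        return parts[-2]
    return default_language


def select_style_guides(filenames, source_language, target_language):
    """Keep the style guides written for the source or target language."""
    wanted = {source_language, target_language}
    return [name for name in filenames if language_of(name) in wanted]


i18n = ["style-guide.fr.md", "style-guide.es.md"]
print(select_style_guides(i18n, "en", "fr"))
```

With a source language of en and a target of fr, only the French guide is picked from the i18n directory; the Spanish guide is left out.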

Context Strategy

The contextStrategy setting controls how much surrounding content is included when processing files. This affects the AI's understanding of the document structure and relationships.

Strategy Options:

"nothing" (default)

Processes each file in isolation with no additional context:

docs/
├── guide/
│   ├── getting-started.md
│   ├── installation.md      ← **processing this file only**
│   └── configuration.md
└── api/
    ├── authentication.md
    └── endpoints.md

"siblings"

Includes the current file and its sibling files in the same directory:

docs/
├── guide/
│   ├── getting-started.md   ← included as context
│   ├── installation.md      ← **processing this file**
│   └── configuration.md     ← included as context
└── api/
    ├── authentication.md
    └── endpoints.md

"everything"

Includes all files in the entire directory tree as context:

docs/
├── guide/
│   ├── getting-started.md   ← included as context
│   ├── installation.md      ← **processing this file**
│   └── configuration.md     ← included as context
└── api/
    ├── authentication.md    ← included as context
    └── endpoints.md         ← included as context

Recommendation: Use "siblings" for most cases as it provides good context while keeping token usage reasonable. Use "everything" for small documentation sets where full context is valuable, and "nothing" for independent files or when minimizing token usage.
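
The three strategies can be summarized with a short Python sketch. It works on plain path strings and is only meant to mirror the diagrams above; select_context is a hypothetical name, not part of the tool.

```python
from pathlib import PurePosixPath


def select_context(files, current, strategy="nothing"):
    """Return the files supplied as context alongside `current`."""
    others = [f for f in files if f != current]
    if strategy == "nothing":
        return []
    if strategy == "siblings":
        parent = PurePosixPath(current).parent
        return [f for f in others if PurePosixPath(f).parent == parent]
    if strategy == "everything":
        return others
    raise ValueError(f"unknown context strategy: {strategy}")


docs = [
    "docs/guide/getting-started.md",
    "docs/guide/installation.md",
    "docs/guide/configuration.md",
    "docs/api/authentication.md",
    "docs/api/endpoints.md",
]
current = "docs/guide/installation.md"
for strategy in ("nothing", "siblings", "everything"):
    print(strategy, select_context(docs, current, strategy))
```

The amount of context grows from zero files to two to four in this tree, which is why token usage rises with each strategy.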

Image Processing

The review command can extract and include images referenced in markdown files for AI analysis. This enables comprehensive content reviews that consider both textual and visual elements.

Image Path Resolution:

  • Relative paths (e.g., ./images/diagram.png, ../assets/photo.jpg) are resolved from the markdown file's directory
  • Absolute paths (e.g., /images/diagram.png) are resolved against the staticDir directory. If staticPrefix is configured and the path starts with it, the prefix is stripped before resolution
  • Remote URLs (e.g., https://example.com/image.png) are excluded from processing
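
The resolution rules above can be sketched in Python as shown below. This is an illustration under stated assumptions, not the tool's implementation: resolve_image is a hypothetical helper, and returning None for an absolute path when no static directory is configured is a choice made for this sketch.

```python
import posixpath


def resolve_image(ref, markdown_file, static_dir=None, static_prefix=None):
    """Return a local path for an image reference, or None if it is skipped."""
    if ref.startswith(("http://", "https://")):
        return None  # remote URLs are excluded from processing
    if ref.startswith("/"):
        if static_dir is None:
            return None  # assumption: nothing to resolve against
        if static_prefix and ref.startswith(static_prefix):
            ref = ref[len(static_prefix):]  # strip the configured prefix
        return posixpath.normpath(posixpath.join(static_dir, ref.lstrip("/")))
    # Relative references are resolved from the Markdown file's directory.
    return posixpath.normpath(posixpath.join(posixpath.dirname(markdown_file), ref))


page = "docs/guide/installation.md"
print(resolve_image("./images/diagram.png", page))
print(resolve_image("../assets/photo.jpg", page))
print(resolve_image("/static/images/diagram.png", page, "./static", "/static/"))
print(resolve_image("https://example.com/image.png", page))
```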

Image Limits:

  • Maximum images per file: configurable via ai.maxImages (default: 5)
  • Maximum image size: configurable via ai.maxImageSize (default: 3MB)
  • Supported formats: PNG, JPEG, GIF, WebP

Example:

toolkit-md review ./docs --include-images --max-images 10 --static-dir ./assets

Images that cannot be loaded (missing files, unsupported formats, or exceeding size limits) will generate warnings but won't stop the review process.
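
A possible shape for the limit checks is sketched below in Python. It assumes the format is inferred from the file extension and that images beyond the per-file maximum are simply skipped; neither detail is stated above, and filter_images is a hypothetical helper.

```python
import posixpath

# Assumption: formats are recognized by extension.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def filter_images(images, max_images=5, max_image_size=3145728):
    """Split (path, size_in_bytes) pairs into accepted paths and skipped ones."""
    accepted, skipped = [], []
    for path, size in images:
        extension = posixpath.splitext(path)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            skipped.append((path, "unsupported format"))
        elif size > max_image_size:
            skipped.append((path, "too large"))
        elif len(accepted) >= max_images:
            skipped.append((path, "image limit reached"))
        else:
            accepted.append(path)
    return accepted, skipped


images = [("a.png", 100), ("b.bmp", 100), ("c.jpg", 4000000), ("d.webp", 200), ("e.gif", 300)]
accepted, skipped = filter_images(images, max_images=2)
print(accepted)
print(skipped)
```

Skipped images are reported rather than raised as errors, which matches the statement that problem images do not stop the review.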

Amazon Bedrock Setup

Toolkit for Markdown uses Amazon Bedrock for AI processing. Ensure the following are available:

  • AWS credentials configured
  • Access to Bedrock models in the appropriate AWS account

Supported Models

The following Amazon Bedrock model IDs are supported:

  • anthropic.claude-opus-4-5
  • anthropic.claude-opus-4-1
  • anthropic.claude-opus-4
  • anthropic.claude-sonnet-4-5
  • anthropic.claude-haiku-4-5
  • anthropic.claude-sonnet-4
  • anthropic.claude-3-7-sonnet

Model IDs are matched using a prefix, so versioned variants (e.g. anthropic.claude-sonnet-4-5-20250929-v1:0) and cross-region inference profiles (e.g. global.anthropic.claude-sonnet-4-5-20250929-v1:0) are also supported.

Claude 4.6 models (Opus 4.6 and Sonnet 4.6) are not supported due to their removal of prefill support, which this tool relies on for multi-iteration generation.
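
One subtlety of prefix matching is that a plain starts-with test against the list above would also accept a string such as anthropic.claude-sonnet-4-6, because anthropic.claude-sonnet-4 is a prefix of it. How the tool separates these cases is not described here. The Python sketch below assumes that a supported prefix must be followed by nothing or by a date-style version suffix; the function name and that rule are assumptions for illustration only.

```python
import re

SUPPORTED_PREFIXES = [
    "anthropic.claude-opus-4-5",
    "anthropic.claude-opus-4-1",
    "anthropic.claude-opus-4",
    "anthropic.claude-sonnet-4-5",
    "anthropic.claude-haiku-4-5",
    "anthropic.claude-sonnet-4",
    "anthropic.claude-3-7-sonnet",
]


def is_supported_model(model_id):
    """Prefix match that tolerates inference-profile prefixes such as 'global.'."""
    start = model_id.find("anthropic.")
    if start == -1:
        return False
    base = model_id[start:]
    for prefix in SUPPORTED_PREFIXES:
        if base.startswith(prefix):
            rest = base[len(prefix):]
            # Assumption: only an exact match or a "-YYYYMMDD..." suffix counts,
            # so "...-sonnet-4-6" is not treated as a variant of "...-sonnet-4".
            if rest == "" or re.match(r"-\d{8}", rest):
                return True
    return False


print(is_supported_model("global.anthropic.claude-sonnet-4-5-20250929-v1:0"))
print(is_supported_model("anthropic.claude-sonnet-4-6"))
```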

Commands

review

Analyzes Markdown content using AI to identify areas for improvement including grammar, clarity, structure, and adherence to style guides. The AI reviews each file individually or with contextual awareness of related files, providing suggestions or directly applying changes. Supports processing entire directory trees of Markdown files while respecting language markers and file organization.

Diff-Based Review:

When a diff file is provided via --diff-file, the review command filters AI suggestions to only include changes that overlap with lines modified in the diff. This is useful for reviewing only recent changes in version-controlled content. The --diff-context option (default: 3) controls how many lines before and after each change are included in the review scope.

Requirements when using --diff-file:

  • Must provide --summary-file to capture review results
  • Cannot use --write flag (diff-based review is read-only for safety)
  • Diff output is not displayed to console (only written to summary file)
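
To make the scoping rule concrete, here is a Python sketch that extracts the added line numbers from a unified diff and tests whether a suggestion falls within the context window. It is a simplified illustration, not the tool's parser: the function names are hypothetical, only added lines are counted as changes, and added lines whose content begins with "++ " would be misread as file headers.

```python
import re

HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def changed_lines(diff_text):
    """Map each file in a unified diff to the new-side line numbers it adds."""
    changed, current, line_no = {}, None, 0
    for line in diff_text.splitlines():
        header = HUNK_HEADER.match(line)
        if line.startswith("+++ "):
            path = line[4:]
            current = path[2:] if path.startswith("b/") else path
            changed.setdefault(current, set())
        elif line.startswith("--- "):
            continue  # old-side file header
        elif header:
            line_no = int(header.group(1))
        elif current is not None and line.startswith("+"):
            changed[current].add(line_no)
            line_no += 1
        elif current is not None and line.startswith(" "):
            line_no += 1
        # "-" lines exist only on the old side, so the counter stays put.
    return changed


def in_scope(start, end, changed, context=3):
    """True if lines start..end lie within `context` lines of any change."""
    return any(start - context <= n <= end + context for n in changed)


diff_text = "\n".join([
    "diff --git a/docs/guide/installation.md b/docs/guide/installation.md",
    "--- a/docs/guide/installation.md",
    "+++ b/docs/guide/installation.md",
    "@@ -10,3 +10,4 @@",
    " context line",
    "-old line",
    "+new line",
    "+another new line",
    " trailing context",
])
changed = changed_lines(diff_text)
lines = changed["docs/guide/installation.md"]
print(sorted(lines))
print(in_scope(16, 18, lines, context=3), in_scope(16, 18, lines, context=5))
```

A suggestion covering lines 16 to 18 is outside the default window of 3 lines but inside a window of 5, which is the effect of raising --diff-context.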

Example:

toolkit-md review ./docs --write --style-guide ./guides/style.md --context-strategy siblings

Diff-Based Review Example:

git diff main > changes.diff
toolkit-md review --diff-file changes.diff --summary-file review-summary.md --diff-context 5 ./docs

Options:

  • --write
  • --language
  • --default-language
  • --model
  • --max-tokens
  • --content-dir
  • --request-rate
  • --token-rate
  • --context-strategy
  • --exemplar
  • --style-guide
  • --summary-file
  • --instructions
  • --diff-file
  • --diff-context
  • --review-check
  • --include-images
  • --max-images
  • --max-image-size

translate

Translates Markdown content from one language to another while preserving formatting, frontmatter, and document structure. Includes intelligent change detection to avoid retranslating unchanged content and supports both checking for translation needs and forcing retranslation. Maintains translation metadata to track source content changes over time.
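
The README does not describe the metadata format used for change detection. Purely as an illustration, the Python sketch below assumes a SHA-256 hash of the source text is recorded with each translation and compared on the next run; it also shows the file naming implied by --skip-file-suffix. All function names here are hypothetical.

```python
import hashlib
import posixpath


def translated_name(source_name, target_language, skip_file_suffix=False):
    """'example.md' becomes 'example.fr.md' unless the suffix is skipped."""
    if skip_file_suffix:
        return source_name
    stem, extension = posixpath.splitext(source_name)
    return f"{stem}.{target_language}{extension}"


def source_hash(text):
    """Fingerprint of the source content (assumed: SHA-256 of the UTF-8 text)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_translation(source_text, recorded_hash, force=False):
    """Translate when forced or when the source changed since the last run."""
    return force or recorded_hash != source_hash(source_text)


original = "# Installation\n\nRun the installer.\n"
recorded = source_hash(original)
print(translated_name("example.md", "fr"))
print(needs_translation(original, recorded))
print(needs_translation(original + "New step.\n", recorded))
```

In this model, --force corresponds to force=True and --check to reporting the result of needs_translation without translating.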

Example:

toolkit-md translate ./docs --to fr --write --exemplar ./examples/french-docs

Options:

  • --to (required)
  • --write
  • --language
  • --default-language
  • --model
  • --max-tokens
  • --content-dir
  • --request-rate
  • --token-rate
  • --context-strategy
  • --exemplar
  • --style-guide
  • --force
  • --check
  • --skip-file-suffix
  • --translation-dir

ask

Enables interactive querying of Markdown content using natural language questions. The AI analyzes the entire content tree to provide comprehensive answers about topics, structure, or specific information contained within the documentation. Useful for content discovery, summarization, and understanding complex documentation sets.

Example:

toolkit-md ask ./docs --question "What are the installation requirements and setup steps?"

Options:

  • --question (required)
  • --language
  • --default-language
  • --model
  • --max-tokens
  • --content-dir
  • --request-rate
  • --token-rate
  • --exemplar
  • --style-guide

map

Prints a hierarchical tree map of Markdown content showing the structure of files and directories with their titles. This command does not use AI and requires no AWS credentials. Optionally includes image references found in each file.

Example:

toolkit-md map ./docs

With image references:

toolkit-md map ./docs --images

Options:

  • --images
  • --language
  • --default-language
  • --content-dir
  • --cwd

check

Validates Markdown content without AI by running linting checks (via remark-lint), verifying that local link targets exist, and confirming that referenced images are present. Spell checking of prose content (via cspell) is also available but disabled by default and can be enabled by adding spell to the --category option. Remote links and images are validated with HTTP HEAD requests. This command requires no AWS credentials and is suitable for CI pipelines. Exits with code 1 if any errors are found.

Example:

toolkit-md check ./docs

Skip external link and image validation:

toolkit-md check ./docs --skip-external-links

Ignore specific remark-lint rules:

toolkit-md check ./docs --ignore-rule maximum-line-length --ignore-rule no-html

Options:

  • --link-timeout
  • --skip-external-links
  • --ignore-rule
  • --ignore-words-file
  • --skip-directive
  • --min-severity
  • --category
  • --static-prefix
  • --static-dir
  • --language
  • --default-language
  • --content-dir
  • --cwd

mcp

Starts an MCP server that exposes tool features to MCP clients. See below for further information.

Example:

toolkit-md mcp

Options:

  • --language
  • --default-language

Model Context Protocol server

The built-in MCP server allows tools like Cursor, Cline and Q Developer to use the same Markdown file discovery, parsing and style guides as the CLI commands. This is useful when working in an IDE to write new content or review existing content using agentic AI tools.

Some example prompts:

"Review the markdown content in this project for issues and best practices"

"Translate the markdown content section on Amazon DynamoDB to French"

Here is an example of configuring the MCP server in Q Developer:

{
  "mcpServers": {
    "toolkit-md": {
      "command": "toolkit-md",
      "timeout": 10000,
      "args": ["mcp"]
    }
  }
}

Alternatively you can use npx:

{
  "mcpServers": {
    "toolkit-md": {
      "command": "npx",
      "timeout": 10000,
      "args": ["@aws/toolkit-md", "mcp"]
    }
  }
}

For safety purposes the MCP server will refuse to load files from outside the current working directory that it is running in. You can override the current working directory using the --cwd flag to broaden the files accessible.
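
A containment check of this kind can be sketched in Python as follows. It is a minimal illustration that normalizes paths as text and does not follow symlinks; is_allowed and the /home/user/project path are hypothetical, and the server's actual check may be stricter.

```python
import posixpath


def is_allowed(cwd, requested):
    """True if `requested` resolves to a location inside `cwd`."""
    root = posixpath.normpath(cwd)
    resolved = posixpath.normpath(posixpath.join(root, requested))
    # The common path of the two equals the root only when `resolved` is inside it.
    return posixpath.commonpath([root, resolved]) == root


cwd = "/home/user/project"  # hypothetical working directory
print(is_allowed(cwd, "docs/guide/installation.md"))
print(is_allowed(cwd, "../other-project/notes.md"))
print(is_allowed(cwd, "/etc/passwd"))
```

Pointing --cwd at a parent directory widens the root, which is why it broadens the set of files the server will load.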

The following MCP tools are provided:

| Tool | Description |
| --- | --- |
| content_summary_information | Response provides high level information for the model to locate Markdown content in a given project for orientation. |
| content_best_practices | Response contains style guide and exemplar content as configured for the specified project. If the targetLanguage is provided it will also load style guides for that language and provide them in the response. |
| content_review_guidance | Response contains guidance for the model to systematically review Markdown content for a given project for general issues and best practices. |
| content_translation_guidance | Response contains guidance for the model to translate Markdown for a given project to another language. It helps the model locate both source content as well as existing translated content to use for context. |
| run_checks | Runs lint, link, image, and spell checks on specified Markdown content files relative to the content directory. Supports filtering by severity and category. |

Development

# Install dependencies
yarn install

# Build the project
yarn build

# Lint project
yarn biome check

# Run tests
yarn test

# Execute the tool
yarn start review ./docs

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
