toak
it's no joke
Overview
toak is an intentionally simple yet powerful tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts using the Llama 3 tokenizer.
$ cd your-git-repo
$ npx toak
Philosophy
- Human-first technologies for a better future.
- If you don't like the name...good.
Features
Data Processing
- Reads tracked files from git repository
- Removes comments, imports, and unnecessary whitespace
- Redacts sensitive information (API keys, tokens, JWT, hashes)
- Counts tokens using llama3-tokenizer-js
- Supports nested .toak-ignore files
Token Cleaning
- Removes single-line and multi-line comments
- Strips console.log statements
- Removes import statements
- Cleans up whitespace and empty lines
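The cleaning steps listed above can be pictured as an ordered list of regex replacements applied one after another. The following is a minimal sketch under that assumption; `cleaningPatterns` and `cleanCode` are hypothetical names, and the patterns are illustrative rather than the ones toak actually ships with.

```typescript
// Hypothetical sketch: illustrative patterns, not toak's actual rules.
// Each entry is applied in order; only single-line statements are handled.
const cleaningPatterns: { regex: RegExp; replacement: string }[] = [
  { regex: /\/\*[\s\S]*?\*\//g, replacement: "" },                    // multi-line comments
  { regex: /^[ \t]*\/\/.*$/gm, replacement: "" },                      // whole-line // comments
  { regex: /^[ \t]*console\.log\(.*\);?[ \t]*$/gm, replacement: "" },  // console.log statements
  { regex: /^[ \t]*import\b.*$/gm, replacement: "" },                  // single-line imports
  { regex: /[ \t]+$/gm, replacement: "" },                             // trailing whitespace
  { regex: /\n{2,}/g, replacement: "\n" },                             // runs of empty lines
];

// Run every pattern over the source text and trim the result.
function cleanCode(source: string): string {
  const cleaned = cleaningPatterns.reduce(
    (text, { regex, replacement }) => text.replace(regex, replacement),
    source,
  );
  return cleaned.trim();
}
```

Order matters in a pipeline like this: comments and statements are removed first so that the whitespace passes can collapse the empty lines they leave behind.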
Security Features
- Redacts API keys and secrets
- Masks JWT tokens
- Hides authorization tokens
- Redacts Base64 encoded strings
- Masks cryptographic hashes
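Redaction of this kind is typically done by pattern matching. The sketch below assumes a small table of regexes with replacement strings; `secretPatterns` and `redactSecrets` are hypothetical names, the patterns are examples only, and Base64 redaction is left out for brevity. It is not a substitute for reviewing the generated output before sharing it.

```typescript
// Hypothetical sketch: example patterns only, toak's real rules may differ.
const secretPatterns: { regex: RegExp; replacement: string }[] = [
  // JWT: three base64url segments, the first two starting with "eyJ"
  {
    regex: /\beyJ[A-Za-z0-9_-]+\.eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g,
    replacement: "[REDACTED_JWT]",
  },
  // Bearer authorization tokens
  {
    regex: /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g,
    replacement: "Bearer [REDACTED]",
  },
  // Quoted values assigned to common secret names; the name and quotes are kept
  {
    regex: /\b(api[_-]?key|secret|token)(\s*[:=]\s*)(["'])[^"']+\3/gi,
    replacement: "$1$2$3[REDACTED]$3",
  },
  // Hex strings of 40 to 64 characters (SHA-1 and SHA-256 sized digests)
  {
    regex: /\b[a-f0-9]{40,64}\b/gi,
    replacement: "[REDACTED_HASH]",
  },
];

// Apply every redaction pattern in order and return the masked text.
function redactSecrets(text: string): string {
  return secretPatterns.reduce(
    (acc, { regex, replacement }) => acc.replace(regex, replacement),
    text,
  );
}
```

The JWT pattern runs before the Bearer pattern so that a header such as `Authorization: Bearer eyJ...` keeps the more specific JWT marker.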
Requirements
- Node.js (>=14.0.0)
- Git repository
- Bun runtime (for development)
Installation
npm install toak
Usage
CLI
npx toak
Programmatic Usage
import { MarkdownGenerator } from 'toak';

const generator = new MarkdownGenerator({
  dir: './project',
  outputFilePath: './output.md',
  verbose: true
});

const result = await generator.createMarkdownDocument();
Configuration
MarkdownGenerator Options
interface MarkdownGeneratorOptions {
  dir?: string;                               // Project directory (default: '.')
  outputFilePath?: string;                    // Output file path (default: './prompt.md')
  fileTypeExclusions?: Set<string>;           // File types to exclude
  fileExclusions?: string[];                  // File patterns to exclude
  customPatterns?: Record<string, any>;       // Custom cleaning patterns
  customSecretPatterns?: Record<string, any>; // Custom redaction patterns
  verbose?: boolean;                          // Enable verbose logging (default: true)
}
Ignore File Configuration
Create a .toak-ignore file in any directory to specify exclusions. The tool supports nested ignore files that affect their directory and subdirectories.
Example .toak-ignore:
# Ignore specific files
secrets.json
config.private.ts
# Ignore directories
build/
temp/
# Glob patterns
**/*.test.ts
**/._*
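To make the nesting rule concrete, here is a rough sketch of how an ignore file's patterns could be scoped to the directory that contains it. Everything in it is an assumption made for illustration: `parseIgnoreFile`, `globToRegExp`, `isIgnored`, and the `IgnoreRule` shape are hypothetical, and the glob support covers only `*`, `**/`, and a trailing `/`. The real tool may rely on a glob library instead.

```typescript
// Hypothetical sketch of nested ignore-file scoping, not toak's implementation.
interface IgnoreRule {
  dir: string;        // directory holding the ignore file, "" for the repo root
  patterns: string[]; // patterns read from that file
}

// Drop comments and blank lines from the content of an ignore file.
function parseIgnoreFile(content: string): string[] {
  return content
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

// Translate a small glob subset (`*`, `**/`, trailing `/`) into a RegExp.
function globToRegExp(pattern: string): RegExp {
  const dirOnly = pattern.endsWith("/");
  const body = dirOnly ? pattern.slice(0, -1) : pattern;
  const source = body
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
    .replace(/\*\*\//g, "\u0000")          // mark "**/" before handling "*"
    .replace(/\*/g, "[^/]*")               // "*" stays within one path segment
    .replace(/\u0000/g, "(?:.*/)?");       // "**/" spans any number of segments
  const tail = dirOnly ? "/.*" : "(?:/.*)?";
  return new RegExp("^(?:.*/)?" + source + tail + "$");
}

// A file is ignored if any rule whose directory contains it has a matching pattern.
function isIgnored(filePath: string, rules: IgnoreRule[]): boolean {
  for (const rule of rules) {
    const prefix = rule.dir === "" ? "" : rule.dir + "/";
    if (!filePath.startsWith(prefix)) continue;
    const relative = filePath.slice(prefix.length);
    if (rule.patterns.some((p) => globToRegExp(p).test(relative))) return true;
  }
  return false;
}
```

With a root rule containing `build/` and a rule for `packages/api` containing `**/*.test.ts`, test files under `packages/api` are ignored while test files in sibling packages are not.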
Default Exclusions
The tool automatically excludes common file types and patterns:
File Types:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .svg, .webp, etc.
- Fonts: .ttf, .woff, .woff2, .eot, .otf
- Binaries: .exe, .dll, .so, .dylib, .bin
- Archives: .zip, .tar, .gz, .rar, .7z
- Media: .mp3, .mp4, .avi, .mov, .wav
- Data: .db, .sqlite, .sqlite3
- Config: .lock, .yaml, .yml, .toml, .conf
File Patterns:
- Configuration files: .*rc, tsconfig.json, package-lock.json
- Version control: .git*, .hg*, .svn*
- Environment files: .env*
- Build outputs: build/, dist/, out/
- Dependencies: node_modules/
- Documentation: docs/, README*, CHANGELOG*
- IDE settings: .idea/, .vscode/
- Test files: test/, spec/, tests/
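Since the `fileTypeExclusions` option is typed as a `Set<string>`, a file-type check can be as simple as a set lookup on the file extension. The sketch below assumes exactly that; `sampleTypeExclusions` and `isExcludedByType` are hypothetical names, and the set holds only a few of the extensions listed above.

```typescript
import { extname } from "node:path";

// Hypothetical, abbreviated set built from extensions named in the list above.
const sampleTypeExclusions = new Set<string>([
  ".jpg", ".png", ".woff2", ".zip", ".mp4", ".sqlite", ".lock", ".yaml",
]);

// True when the file's extension (compared case-insensitively) is excluded.
function isExcludedByType(filePath: string): boolean {
  return sampleTypeExclusions.has(extname(filePath).toLowerCase());
}
```

Note that `extname` returns an empty string for dotfiles such as `.env`, so names like that would need to be caught by the file patterns rather than the file types.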
Development
This project uses Bun for development. To contribute:
Setup
git clone <repository>
cd toak
bun install
Scripts
# Build the project
bun run build
# Run tests
bun test
# Lint code
bun run lint
# Fix linting issues
bun run lint:fix
# Format code
bun run format
# Fix all (format + lint)
bun run fix
# Development mode
bun run dev
# Publish development version
bun run deploy:dev
Project Structure
src/
├── index.ts # Main exports
├── TokenCleaner.ts # Code cleaning and redaction
├── MarkdownGenerator.ts # Markdown generation logic
├── cli.ts # CLI implementation
├── fileExclusions.ts # File exclusion patterns
└── fileTypeExclusions.ts # File type exclusions
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
Guidelines
- Write TypeScript code following the project's style
- Include appropriate error handling
- Add documentation for new features
- Include tests for new functionality
- Update the README for significant changes
Note
This tool requires a git repository to function properly as it uses git ls-files to identify tracked files.
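For readers curious what listing tracked files looks like from Node, here is a minimal sketch that shells out to `git ls-files`. The helper names `parseLsFiles` and `listTrackedFiles` are hypothetical, and this is not necessarily how toak invokes git internally.

```typescript
import { execFileSync } from "node:child_process";

// Split raw `git ls-files` output into a list of non-empty paths.
function parseLsFiles(output: string): string[] {
  return output
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
}

// Run `git ls-files` inside repoDir; throws if git fails or repoDir is not a repository.
function listTrackedFiles(repoDir: string): string[] {
  const output = execFileSync("git", ["ls-files"], {
    cwd: repoDir,
    encoding: "utf8",
  });
  return parseLsFiles(output);
}
```

Because only tracked files are listed, anything covered by .gitignore or not yet added to the index never reaches the later cleaning and redaction steps.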
License
GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
© 2024 Geoff Seemueller