Geoff Seemueller 4a0c75f3a2 Update tsconfig.json and bump package version

Removed stray character from tsconfig.json. Bumped package version from 1.0.9 to 1.0.10 to reflect minor changes.

2024-11-21 13:25:07 -05:00

Name               Last commit                                              Date
.github/workflows  code formatting + documentation for MarkdownGenerator    2024-11-07 11:43:00 -05:00
src                convert project to typescript                            2024-11-21 13:23:45 -05:00
.gitignore         add todo file feature                                    2024-11-07 11:20:52 -05:00
.prettierignore    add eslint                                               2024-11-07 11:37:56 -05:00
.prettierrc        add eslint                                               2024-11-07 11:37:56 -05:00
build.ts           convert project to typescript                            2024-11-21 13:23:45 -05:00
bun.lockb          convert project to typescript                            2024-11-21 13:23:45 -05:00
eslint.config.js   code formatting + documentation for MarkdownGenerator    2024-11-07 11:43:00 -05:00
package.json       Update tsconfig.json and bump package version            2024-11-21 13:25:07 -05:00
pnpm-lock.yaml     Integrate file exclusion with micromatch, refactor code  2024-11-21 13:02:22 -05:00
README.md          code formatting + documentation for MarkdownGenerator    2024-11-07 11:43:00 -05:00
tsconfig.json      Update tsconfig.json and bump package version            2024-11-21 13:25:07 -05:00
README.md

code-tokenizer-md

Created to push creative limits.

Process git repository files into markdown with token counting and sensitive data redaction.

Overview

code-tokenizer-md is a Node.js tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts.

graph TD
   Start[Start] -->|Read| Git[Git Files]
   Git -->|Clean| TC[TokenCleaner]
   TC -->|Redact| Clean[Clean Code]
   Clean -->|Generate| MD[Markdown]
   MD -->|Count| Results[Token Counts]
   style Start fill:#000000,stroke:#FFFFFF,stroke-width:4px,color:#ffffff
   style Git fill:#222222,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style TC fill:#333333,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style Clean fill:#444444,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style MD fill:#555555,stroke:#FFFFFF,stroke-width:2px,color:#ffffff
   style Results fill:#666666,stroke:#FFFFFF,stroke-width:2px,color:#ffffff

Features

Data Processing

Reads files from git repository
Removes comments and unnecessary whitespace
Redacts sensitive information (API keys, tokens, etc.)
Counts tokens using llama3-tokenizer-js
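
The cleaning and redaction steps above can be sketched roughly as follows. This is a minimal illustration, not the actual TokenCleaner implementation: the pattern lists and the function names `cleanCode` and `redactSecrets` are assumptions made for the example, and the real tool may match different comment styles and secret formats.

```typescript
// Hypothetical comment patterns; the real TokenCleaner may use others.
const commentPatterns: RegExp[] = [
  /\/\*[\s\S]*?\*\//g, // block comments: /* ... */
  /^\s*\/\/.*$/gm, // whole-line // comments only, so "http://" inside code is untouched
];

// Hypothetical secret pattern: a quoted value assigned to a sensitive-looking name.
const secretPattern =
  /(api[_-]?key|token|secret|password)(\s*[:=]\s*)(["'])[^"']*\3/gi;

// Remove comments, trailing whitespace, and empty lines.
function cleanCode(code: string): string {
  let out = code;
  for (const pattern of commentPatterns) {
    out = out.replace(pattern, '');
  }
  return out
    .split('\n')
    .map((line) => line.trimEnd())
    .filter((line) => line.length > 0)
    .join('\n');
}

// Replace the quoted value with a placeholder, keeping the name and quotes.
function redactSecrets(code: string): string {
  return code.replace(secretPattern, '$1$2$3[REDACTED]$3');
}

const source = '// header\nconst a = 1; /* note */\n\nconst apiKey = "abc123";\n';
console.log(redactSecrets(cleanCode(source)));
```

Cleaning runs before redaction here so that secrets mentioned only in comments are dropped entirely instead of being masked.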

Analysis Types

Token counting per file
Total token usage
File content analysis
Sensitive data detection
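
As a sketch of per-file and total token counting, the example below aggregates counts over a set of files. The `TokenCounter` type, `roughCount`, and `countTokensPerFile` are hypothetical names for illustration. `roughCount` is only a whitespace-based stand-in so the example runs without dependencies; it does not produce Llama 3 token counts, and a real tokenizer would be passed in its place.

```typescript
// A token counter is any function from text to a token count.
type TokenCounter = (text: string) => number;

// Stand-in counter (assumption): whitespace-separated chunks, NOT real Llama 3 tokens.
const roughCount: TokenCounter = (text) =>
  text.split(/\s+/).filter((chunk) => chunk.length > 0).length;

interface TokenReport {
  perFile: Record<string, number>;
  total: number;
}

// Count tokens for each file and accumulate a grand total.
function countTokensPerFile(
  files: Record<string, string>,
  count: TokenCounter = roughCount,
): TokenReport {
  const perFile: Record<string, number> = {};
  let total = 0;
  for (const [path, content] of Object.entries(files)) {
    const n = count(content);
    perFile[path] = n;
    total += n;
  }
  return { perFile, total };
}

const report = countTokensPerFile({
  'src/index.js': 'export * from "./cli.js";',
  'src/cli.js': 'console.log("hello");',
});
console.log(report);
```

Passing the counter as a parameter keeps the aggregation logic independent of whichever tokenizer is actually used.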

Data Presentation

Markdown formatted output
Code block formatting
Token count summaries
File organization hierarchy
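
To make the output shape concrete, here is a rough sketch of how one section per file, a fenced code block, and a token summary could be assembled. The heading text, the `Tokens:` line, and the function names are assumptions for illustration; the document that MarkdownGenerator actually produces may be laid out differently.

```typescript
interface FileSection {
  path: string;
  code: string;
  tokens: number;
}

// Built with repeat() so this example can itself sit inside a fenced block.
const FENCE = '`'.repeat(3);

// One markdown section per file: heading, token count, fenced code.
function renderFileSection(section: FileSection): string {
  return [
    `## ${section.path}`,
    '',
    `Tokens: ${section.tokens}`,
    '',
    FENCE,
    section.code,
    FENCE,
    '',
  ].join('\n');
}

// Whole document: title, all file sections, then the total token count.
function renderDocument(sections: FileSection[]): string {
  const total = sections.reduce((sum, s) => sum + s.tokens, 0);
  const body = sections.map(renderFileSection).join('\n');
  return `# Project Files\n\n${body}\nTotal tokens: ${total}\n`;
}

const doc = renderDocument([
  { path: 'src/a.js', code: 'const a = 1;', tokens: 4 },
  { path: 'src/b.js', code: 'const b = 2;', tokens: 3 },
]);
console.log(doc);
```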

Requirements

Node.js (>=14.0.0)
Git repository
npm or npx

Installation

npm install -g code-tokenizer-md

Usage

Quick Start

npx code-tokenizer-md

Programmatic Usage

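import { MarkdownGenerator } from 'code-tokenizer-md';

// dir is the git repository to process; outputFilePath is where the markdown is written.
const generator = new MarkdownGenerator({
  dir: './project',
  outputFilePath: './output.md',
});

// Top-level await requires an ES module context.
const result = await generator.createMarkdownDocument();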

Project Structure

src/
├── index.js              # Main exports
├── TokenCleaner.js       # Code cleaning and redaction
├── MarkdownGenerator.js  # Markdown generation logic
└── cli.js                # CLI implementation

Dependencies

{
  "dependencies": {
    "llama3-tokenizer-js": "^1.0.0"
  },
  "peerDependencies": {
    "node": ">=14.0.0"
  }
}

Extending

Adding Custom Patterns

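import { MarkdownGenerator } from 'code-tokenizer-md';

const generator = new MarkdownGenerator({
  // Extra cleaning rule: strip TODO markers from the output
  customPatterns: [{ regex: /TODO:/g, replacement: '' }],
  // Extra redaction rule: mask a project-specific secret
  customSecretPatterns: [{ regex: /mySecret/g, replacement: '[REDACTED]' }],
});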
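
For intuition about what a `{ regex, replacement }` rule does, the sketch below applies such rules in order to a piece of text. The `PatternRule` interface and the `applyRules` helper are hypothetical; how MarkdownGenerator combines custom rules with its built-in ones is not documented here, so treat the ordering as an assumption.

```typescript
// Shape of a rule as used in the options above.
interface PatternRule {
  regex: RegExp;
  replacement: string;
}

// Apply each rule in sequence; later rules see the output of earlier ones.
function applyRules(text: string, rules: PatternRule[]): string {
  return rules.reduce(
    (acc, rule) => acc.replace(rule.regex, rule.replacement),
    text,
  );
}

const customPatterns: PatternRule[] = [{ regex: /TODO:/g, replacement: '' }];
const customSecretPatterns: PatternRule[] = [
  { regex: /mySecret/g, replacement: '[REDACTED]' },
];

// Assumed order: cleaning rules first, then secret redaction.
function processWithCustomRules(text: string): string {
  return applyRules(applyRules(text, customPatterns), customSecretPatterns);
}

console.log(processWithCustomRules('// TODO: rotate\nconst k = "mySecret";'));
```

Note the `g` flag on each regex: without it, `String.prototype.replace` only replaces the first match.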

Contributing

Fork the repository
Create a feature branch
Commit your changes
Push to the branch
Open a Pull Request

Contribution Guidelines

Follow Node.js best practices
Include appropriate error handling
Add documentation for new features
Include tests for new functionality (the project does not yet have a test suite)
Update the README for significant changes

License

Note

This tool requires a git repository to function properly.
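
One way to check this requirement before running the tool is sketched below, using the standard `git rev-parse --is-inside-work-tree` and `git ls-files -z` commands through Node's `child_process` module. The helper names are hypothetical, and whether code-tokenizer-md itself uses these commands is an assumption.

```typescript
import { execFileSync } from 'node:child_process';

// Returns true when `dir` is inside a git work tree, false otherwise
// (including when git is not installed or `dir` does not exist).
function isInsideGitWorkTree(dir: string): boolean {
  try {
    const out = execFileSync('git', ['rev-parse', '--is-inside-work-tree'], {
      cwd: dir,
      encoding: 'utf8',
      stdio: ['ignore', 'pipe', 'ignore'],
    });
    return out.trim() === 'true';
  } catch {
    return false;
  }
}

// Lists tracked files under `dir`, relative to `dir`.
// `-z` separates paths with NUL bytes, which is safe for unusual file names.
function listTrackedFiles(dir: string): string[] {
  const out = execFileSync('git', ['ls-files', '-z'], {
    cwd: dir,
    encoding: 'utf8',
  });
  return out.split('\0').filter((path) => path.length > 0);
}

if (isInsideGitWorkTree('.')) {
  console.log(`${listTrackedFiles('.').length} tracked files`);
} else {
  console.log('Not a git repository; run `git init` first.');
}
```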