Building a Code Editor: The Journey Begins (Again)

Another Side Project, Another Adventure

For all the wrong reasons, I've decided to build my own code editor. Yes, I know what you're thinking—there are already dozens of excellent editors out there. VSCode exists. Sublime Text exists. Vim has been around since the dawn of time. But here I am, a fullstack developer with a track record of abandoned startup ideas, diving headfirst into yet another side project.

Will I finish this one? Probably not. Will I move on to another shiny idea in a few weeks? History suggests yes. But for now, while the motivation burns bright, I'm riding this wave of enthusiasm. Let's play pretend and imagine this one actually ships.

The Tech Stack: Because We Love Making Things Complicated

As any self-respecting fullstack developer would do, I'm taking the scenic route. Instead of just writing a native application like a normal person, I'm going full web tech:

  • C++ for the core (because performance matters when you're probably not going to finish anyway)

  • WebAssembly (WASM baby!)

  • npm module (so other developers can use my abandoned project)

Yes, I'm writing performance-critical code in C++ and compiling it to WASM to run in the browser. Is this over-engineering? Absolutely. Is it fun? Also absolutely.

Syntax Highlighting: The Fun Part

The most visually rewarding part of any code editor is syntax highlighting. There's something deeply satisfying about watching plain text transform into a colorful masterpiece of keywords, strings, and comments.

But before we can paint our code rainbow, we need to understand what we're painting. Enter: the lexer.

What's a Lexer Anyway?

In layman's terms, a lexer (or lexical analyzer) is like a really pedantic reader that goes through your code character by character, groups the characters into meaningful chunks called tokens, and labels each one.

Think of it like a grammar teacher marking up your essay, except instead of circling run-on sentences, it's identifying:

  • Keywords: The reserved words of the language (int, float, if, while, class)

  • Identifiers: Variable names, function names, stuff you made up (myVariable, calculateTotal)

  • Operators: The symbols that do things (=, +, -, /, ==)

  • Strings: Text in quotes ("Hello, World!")

  • Numbers: Numeric literals (42, 3.14, 0xFF)

  • Comments: The bits of sanity we leave for our future selves (// TODO: refactor this mess)

  • Punctuation: Brackets, semicolons, commas—the skeleton of code structure

The Tokenization Process

Here's what happens when the lexer reads this simple C++ code:

```cpp
int main() {
    return 0;
}
```

The lexer breaks it down into tokens:

  1. int → Keyword

  2. main → Identifier

  3. ( → Operator/Punctuation

  4. ) → Operator/Punctuation

  5. { → Operator/Punctuation

  6. return → Keyword

  7. 0 → Number

  8. ; → Operator/Punctuation

  9. } → Operator/Punctuation

Each token gets tagged with its type, value, and position in the source code. This structured data is gold for a syntax highlighter.

My Implementation: C++ Lexer → WASM → npm

I've written a C++ lexer that:

  1. Reads C++ source code character by character

  2. Identifies tokens using pattern matching and keyword lookup

  3. Handles edge cases like raw string literals, hex numbers, multi-line comments

  4. Outputs JSON with token information (type, value, position)
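Point 3 is where a C++ lexer stops being a toy. The sketch below shows one way those three edge cases might be scanned. It is not the repository's code, and scan_hex, scan_block_comment, and scan_raw_string are hypothetical names. Each helper takes the index where a candidate token starts. It returns the index just past the end of that token, or the start index unchanged if the text does not match.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>

// Each scanner receives the index where a candidate token starts and returns
// the index one past its end, or `start` unchanged if the text does not match.

// Hex literal such as 0xFF or 0X1a. Digit separators (0xFF'FF) and integer
// suffixes (0xFFu) are not handled in this sketch.
std::size_t scan_hex(std::string_view s, std::size_t start) {
    if (start + 2 > s.size() || s[start] != '0' ||
        (s[start + 1] != 'x' && s[start + 1] != 'X'))
        return start;
    std::size_t i = start + 2;
    while (i < s.size() && std::isxdigit(static_cast<unsigned char>(s[i]))) ++i;
    return i == start + 2 ? start : i;  // a bare "0x" is not a hex literal
}

// Block comment that may span many lines. An unterminated comment runs to the
// end of the input, which suits an editor highlighting half-typed code.
std::size_t scan_block_comment(std::string_view s, std::size_t start) {
    if (s.substr(start, 2) != "/*") return start;
    std::size_t close = s.find("*/", start + 2);
    return close == std::string_view::npos ? s.size() : close + 2;
}

// Raw string literal R"delim(...)delim". Quotes and backslashes in the body are
// literal; only ')' + delim + '"' closes it. Encoding prefixes (u8R, LR, ...)
// and delimiter validation (max 16 chars, no spaces, parentheses or
// backslashes) are left out of this sketch.
std::size_t scan_raw_string(std::string_view s, std::size_t start) {
    if (s.substr(start, 2) != "R\"") return start;
    std::size_t open = s.find('(', start + 2);
    if (open == std::string_view::npos) return start;
    std::string closing = ")" + std::string(s.substr(start + 2, open - start - 2)) + "\"";
    std::size_t end = s.find(closing, open + 1);
    return end == std::string_view::npos ? s.size() : end + closing.size();
}
```

Returning an end index instead of a finished token lets the main loop try the specialized scanners first. When a scanner reports no match, the loop can fall back to the simple cases. For example, in R"x(a)"b)x" the inner )" does not end the literal, because only )x" does.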

The beauty of compiling this to WebAssembly is that I get near-native performance for lexical analysis while keeping my editor web-based. The lexer does the heavy lifting in WASM, and JavaScript handles the UI and rendering.
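On the C++ side, the WASM boundary can be as thin as one exported function that takes source text and returns the JSON from point 4. The sketch below assumes a C-style export named lex_to_json and a JSON shape of type, value, and byte offset. Both are hypothetical, not the repository's API. The tokenize function here is a placeholder that only separates words from punctuation, so the example stays self-contained. EMSCRIPTEN_KEEPALIVE, from emscripten.h, keeps the function from being removed as dead code. The #ifdef lets the same file compile natively for testing.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdio>
#include <string>
#include <string_view>
#include <vector>

#ifdef __EMSCRIPTEN__
#include <emscripten.h>
#define LEXER_EXPORT extern "C" EMSCRIPTEN_KEEPALIVE
#else
#define LEXER_EXPORT extern "C"
#endif

struct Token {
    std::string type;
    std::string value;
    std::size_t offset;  // byte offset into the source
};

// Placeholder lexer (hypothetical): words and single-character punctuation,
// whitespace skipped. Swap in a real lexer; only the plumbing matters here.
std::vector<Token> tokenize(std::string_view src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        std::size_t start = i;
        if (std::isalnum(c) || c == '_') {
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back({"Word", std::string(src.substr(start, i - start)), start});
        } else {
            ++i;
            out.push_back({"Punctuation", std::string(1, src[start]), start});
        }
    }
    return out;
}

// Escape text for use inside a JSON string literal.
std::string json_escape(std::string_view s) {
    std::string out;
    for (char ch : s) {
        switch (ch) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (static_cast<unsigned char>(ch) < 0x20) {  // other control chars
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned char>(ch));
                    out += buf;
                } else {
                    out += ch;
                }
        }
    }
    return out;
}

std::string tokens_to_json(const std::vector<Token>& tokens) {
    std::string json = "[";
    for (std::size_t k = 0; k < tokens.size(); ++k) {
        if (k) json += ',';
        json += "{\"type\":\"" + json_escape(tokens[k].type) +
                "\",\"value\":\"" + json_escape(tokens[k].value) +
                "\",\"offset\":" + std::to_string(tokens[k].offset) + "}";
    }
    return json + "]";
}

// Entry point for JavaScript. The JSON lives in a static buffer so the returned
// pointer stays valid after the call; the next call overwrites it, so the caller
// should copy the string out before lexing again.
LEXER_EXPORT const char* lex_to_json(const char* source) {
    static std::string result;
    result = tokens_to_json(tokenize(source));
    return result.c_str();
}
```

On the JavaScript side, Emscripten's ccall or cwrap with a 'string' return type copies the result into a JS string. Depending on the build flags, those helpers may need to be listed in EXPORTED_RUNTIME_METHODS. Passing one JSON string per call keeps the boundary simple. The trade-off is the cost of serializing and parsing every token.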

The Architecture

[C++ Lexer] → [Compile to WASM] → [npm package]
     ↓
[JavaScript/TypeScript Editor]
     ↓
[Syntax Highlighted Code]

Why This Will Probably Fail (But That's Okay)

Let's be honest about the challenges:

  1. Scope creep is real - "Just a simple editor" becomes "full IDE with AI autocomplete"

  2. The existing solutions are really good - VSCode didn't become dominant by accident

  3. My attention span - Already thinking about that blockchain idea...

  4. The complexity curve - Syntax highlighting is just the beginning. LSP integration, extensions, debugging, git integration... it never ends

But here's the thing: even if this project joins the graveyard of my abandoned ideas, I'm learning. I'm diving deep into compilers, lexical analysis, WebAssembly, and the guts of how code editors work.

The Journey Continues (For Now)

Next up on this adventure:

  • Compiling the C++ lexer to WASM with Emscripten

  • Creating an npm package that wraps the WASM module

  • Building a React component that uses the lexer

  • Implementing syntax highlighting themes

  • Adding language support beyond C++

  • Getting distracted by another project

Will I finish this? Ask me in three weeks. Will I learn something? Absolutely.

And isn't that the real reason we take on these ridiculous side projects? The journey, the learning, the joy of creation—even if our creations end up as digital dust in an abandoned GitHub repository.

Current status: Highly motivated
Expected completion: TBD (Translation: probably never)
Next startup idea ETA: 2-3 weeks

Source code: https://github.com/loarsaw/syntax-highlighter

Stay tuned for updates, or don't—I probably won't be posting them anyway.
