Category Archives: Programming

Scintilla in an MFC app

I'm working on a side project of my own - a Windows desktop application to apply video effects. It's written in Visual Studio (started in VS2022, now VS2026). I wanted to include a text editor to edit the 'script' files that drive the program, and I wanted one that could handle basic syntax coloring. So I looked around and found Scintilla.

Scintilla is a very nice, open source, text editor component. It's been around for a long time and has been used in many well-known applications (it's the editor behind the popular Notepad++ app, for example). But there are surprisingly few examples on the web of how to make it do all the various things it can do. There's the Scintilla documentation, but I didn't find it to be very useful when getting started. Once I started seeing how things worked, however, it did come in handy.

You can find the source to a few large, open source applications that use it (ex. Notepad++, SciTE). But those kinds of applications aren't always good references if you're just getting started, because their usage is probably very complex and has become encrusted with their own patterns and mechanisms. You have to figure out enough of their workings to know which parts apply to what you need and which are just things that their app needs.

Eventually I was able to get it to do what I wanted, mostly. But it took me a full week of digging on the web to get there. And so, I'm going to share what I came up with, in case anyone else finds themselves in my situation.

  1. Download and build Scintilla (the editor) and Lexilla (the library that contains the lexers to do syntax highlighting for all of the various languages). There are .vcxproj files in the repos that work fine with modern Visual Studio releases. Build both projects and put the DLLs in your project's root folder.
  2. You'll need to declare a function pointer for the CreateLexer function from Lexilla. So add this in a header file somewhere. I put these at the bottom of my CWinApp-derived class header file.
    typedef void* (FAR WINAPI* CreateLexerPtr)(const char* name);
    extern CreateLexerPtr gCreateLexer;
  3. Add this global variable to a CPP file. I put it in my CWinApp-derived class CPP file. This is how you will talk to Lexilla.
    CreateLexerPtr gCreateLexer;
  4. Load the Scintilla and Lexilla DLLs. I did this in the InitInstance() method of my CWinApp-derived class.
    m_hModScintilla = LoadLibrary("Scintilla.dll");
    if (m_hModScintilla == NULL)
    {
       TRACE("Scintilla load error : %d\n", GetLastError());
       AfxMessageBox("The Scintilla DLL could not be loaded.", MB_OK | MB_ICONERROR);
    }
    else
    {
       m_hModLexilla = LoadLibrary("Lexilla.dll");
       if (m_hModLexilla == NULL)
       {
          TRACE("Lexilla load error: %d\n", GetLastError());
          AfxMessageBox("The Lexilla DLL could not be loaded.", MB_OK | MB_ICONERROR);
       }
       else
       {
          gCreateLexer = reinterpret_cast<CreateLexerPtr>(GetProcAddress(m_hModLexilla, "CreateLexer"));
          if (gCreateLexer == NULL)
          {
             TRACE("Lexilla CreateLexer load error: %d\n", GetLastError());
             AfxMessageBox("The CreateLexer function from Lexilla could not be loaded.", MB_OK | MB_ICONERROR);
          }
       }
    }
  5. FreeLibrary on each, when you're done. I did this in my app's dtor.

Now your application is ready to create a Scintilla window and give it a lexer to do syntax highlighting.

So here is my fairly simple MFC CDialog-based class that creates a Scintilla editor, sets it up for JSON lexing, sets some styles, does basic auto-indent, deals with the document's 'dirty' state, and shows how to read and set the document text. It can use either Scintilla's SendMessage API or the direct-call API (which end up being essentially identical once you define a couple of helper functions). In my application, it is used as a modeless dialog, and is based on a simple resizable CDialog base class, but neither of those things adds any real complexity to the class. So, you should be able to take the Scintilla-related bits from this without much trouble.
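To illustrate why the two call paths end up looking the same, here is a rough, portable sketch of the kind of helper I mean. The function-pointer shape mirrors SciFnDirect from Scintilla.h, but the EditorCall struct and its member names are made up for this example and are not taken from my dialog class. In a real MFC app the direct function and pointer would come from sending SCI_GETDIRECTFUNCTION and SCI_GETDIRECTPOINTER to the Scintilla window, and the fallback would forward to SendMessage.

```cpp
#include <cstdint>

// Same shape as Scintilla's SciFnDirect: an opaque editor pointer,
// a message number, and the two message parameters.
using EditorFn = std::intptr_t (*)(std::intptr_t target, unsigned int msg,
                                   std::uintptr_t wParam, std::intptr_t lParam);

// Hypothetical helper (names are illustrative only).
struct EditorCall {
    // Direct path. On Windows these would be the results of
    // SCI_GETDIRECTFUNCTION and SCI_GETDIRECTPOINTER.
    EditorFn      directFn  = nullptr;
    std::intptr_t directPtr = 0;

    // Message path. In an MFC app this would be a small function that
    // forwards to ::SendMessage with the editor's window handle.
    EditorFn      messageFn     = nullptr;
    std::intptr_t messageTarget = 0;

    // Callers use one syntax; the helper picks whichever path is available.
    std::intptr_t operator()(unsigned int msg,
                             std::uintptr_t wParam = 0,
                             std::intptr_t lParam = 0) const {
        if (directFn)  return directFn(directPtr, msg, wParam, lParam);
        if (messageFn) return messageFn(messageTarget, msg, wParam, lParam);
        return 0; // nothing wired up yet
    }
};
```

With something like this in place, the rest of the dialog code can call the editor the same way regardless of which API is behind it.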

I am far from a Scintilla expert, but if you have any questions, feel free to ask.

The Personal Touch

I am working on a programming-related post. So, I'm writing some comments in the code, explaining some things. Copilot, seeing that I'm using a personal tone, gives some suggestions about how I could sign my comment.

Of course, it doesn't know who I am, or what year it is, or anything at all about reality. But it does know how to spit out the stuff it was trained on:

Stateless Actor

The latest version of Visual Studio (VS 2026) has Microsoft's AI assistant "Copilot" built-in. So, I've been trying to find a use for it.

Here's how that's going.

I have a class (CParameter) that can hold one of several types of data: integers, floating point numbers, text strings, booleans, etc. This is used to pass parameters from a script (written in JSON) into an arbitrary set of "actors" who will perform various operations on still images. And then there's a class that holds a collection of these parameters (CParameterMap). So these are utility classes. They don't interact with the user or the screen; they simply exist to shuffle data from one place to another, transforming it from one type to another (ex. text to number) as necessary. And they are fairly simple. But since they are at the heart of the application I'm writing, I thought I would see if Copilot had any suggestions.

Classes like these show up in a lot of applications. "Data from outside sources needs to be collected and presented to the code that uses it in a way the code expects" is a common situation. I would expect Copilot to have seen many variations of this.
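For anyone who hasn't run into this kind of class, here is a minimal sketch of the general shape, assuming C++17 and the standard library. It is not my actual CParameter or CParameterMap code: the Param and ParamMap names, the choice of std::variant, and the use of std::string keys are all stand-ins for illustration.

```cpp
#include <map>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for a CParameter-like value: holds exactly one of
// several types and converts on request.
class Param {
public:
    using Value = std::variant<long long, double, std::string, bool>;

    Param() = default;                        // defaults to integer 0
    Param(Value v) : value_(std::move(v)) {}

    // Convert whatever is stored to a floating point number.
    double asDouble() const {
        if (auto p = std::get_if<long long>(&value_)) return static_cast<double>(*p);
        if (auto p = std::get_if<double>(&value_))    return *p;
        if (auto p = std::get_if<bool>(&value_))      return *p ? 1.0 : 0.0;
        // Text to number; std::stod throws std::invalid_argument on bad text.
        return std::stod(std::get<std::string>(value_));
    }

private:
    Value value_{};
};

// Hypothetical stand-in for a CParameterMap-like collection, keyed by name.
using ParamMap = std::map<std::string, Param>;
```

A script value such as "2.5" stored as text under the key "scale" would come back from asDouble() as the number 2.5, which is the whole job of a class like this.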

OK. Open the CParameter / CParameterMap file, open the Copilot chat pane, type "analyze this for performance".

Copilot chugs away for a few minutes, then comes back with 20 paragraphs of suggestions. Uh oh.

First paragraph:

Hot spots and costs

  • Key type and container
  • std::map uses ordered tree lookups: O(log N). Every lookup constructs a temporary CString CListCtrl::GetEmptyText from const char* and then does node traversal — cost includes allocations/comparisons and pointer chasing.

Yes, maps have overhead. String keys are not ideal. I know.

"Every lookup constructs a temporary CString CListCtrl::GetEmptyText" ?

What?

CListCtrl is a UI component that shows collections of things, in various ways: strings, icons, grids, lists. The original Windows Explorer was built around it (maybe it still is). It's the classic example of what the control does: Here's your list of files in a grid; here they are in a list; here they are as icons of a different size; change the columns shown, change the size. Same data, just shown in different ways. It's very useful and infinitely flexible, though notoriously complicated to use.

But specifically, CListCtrl::GetEmptyText is a function of CListCtrl that gives you a copy of what is known as the 'zero-state' message. It's the "There's nothing to display here" or "Your search returned no results" text. It's what gets shown when the list of things you would otherwise show is empty.

I don't use a CListCtrl, or anything like it, anywhere in my application. And a big complex UI component like CListCtrl has nothing to do with what my parameter collection code does.

But eight of the 20 paragraphs Copilot returned mention CListCtrl::GetEmptyText ! And I don't know why.

So, even though I know LLMs have no concept of their own internal state, and so they can't actually answer the question, I ask it anyway:

"Why are you talking about CListCtrl ?"

Here's the response:

Sorry — that was a mistake. There is no CListCtrl in the ParamMap.h you opened; any mention of CListCtrl was a context mix-up on my part.
Focus now: your CParamMap/CParameter analysis and optimizations. I can:

  • Add non-copy lookup helpers (findParam, readRef) to avoid CString CListCtrl::GetEmptyText copies, or
  • Convert m_paramMap to std::unordered_map with a CString CListCtrl::GetEmptyText hash/equal for faster lookups, or
  • Run a targeted profiler pass to show hot call sites.

Which would you like me to do next?

Perfection.

Have NAND, will compute.

What if you found a simple integer overflow bug in an obscure image decoding algorithm, and used it to ... build an entire scriptable computer architecture that executes as part of the image decoder itself?

That way, your target doesn't have to do anything but receive and render your trojan image - which it will do automatically in a messaging app like iMessage - and just like that, your code is running on their computer!

JBIG2 doesn't have scripting capabilities, but when combined with a vulnerability, it does have the ability to emulate circuits of arbitrary logic gates operating on arbitrary memory [ie. it is designed to be able to apply basic logical operations on image components as it composes them -cleek]. So why not just use that to build your own computer architecture and script that!? That's exactly what this exploit does. Using over 70,000 segment commands defining logical bit operations, they define a small computer architecture with features such as registers and a full 64-bit adder and comparator which they use to search memory and perform arithmetic operations. It's not as fast as Javascript, but it's fundamentally computationally equivalent.

The bootstrapping operations for the sandbox escape exploit are written to run on this logic circuit and the whole thing runs in this weird, emulated environment created out of a single decompression pass through a JBIG2 stream. It's pretty incredible, and at the same time, pretty terrifying.

It's downright diabolical.

The Frink is Good, the Unit is Evil

Behold, Frink! A programming language devoted to the correct handling of units of measurement!


//How do you measure, oh measure a year?
> 525600 minutes -> years
0.99933688171353600106

> 525600 minutes -> siderealyears
0.9992981356527034257

> 525600 minutes -> gaussianyears
0.99926355644744010579

> 525600 minutes -> calendaryear
1
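The arithmetic behind those answers is just a ratio: 525600 minutes is exactly 365 days, and each "year" unit is a slightly different number of days. Here is a small C++ sketch of the same calculation. The year lengths are commonly tabulated values that I am assuming here, not values read out of Frink's unit definitions, so expect agreement to only a handful of decimal places.

```cpp
// Year lengths in days. ASSUMPTION: commonly tabulated values, not taken
// from Frink's own data file.
constexpr double kTropicalYearDays = 365.242198781;
constexpr double kSiderealYearDays = 365.256363004;
constexpr double kCalendarYearDays = 365.0;

// Convert a span of minutes into years of the given length.
constexpr double minutesToYears(double minutes, double yearLengthDays) {
    constexpr double kMinutesPerDay = 60.0 * 24.0;
    return minutes / (kMinutesPerDay * yearLengthDays);
}
```

Calling minutesToYears(525600.0, kCalendarYearDays) gives exactly 1, while the tropical and sidereal variants land a little under 1, in line with the Frink session above.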

2018 Early voting analysis

You know you're in a programming town when this gets run on your local TV station's website:

Full methodology:

Latitude and longitude coordinates for 2018 early voting locations were obtained from the State Board of Elections and Ethics Enforcement's lookup tool by using a Python script.

Coordinates were not available for 2014 through this tool, so the bulk of these locations were generated using the U.S. Census geocoder tool. Addresses that could not be matched were manually researched and recorded using Google's geocoder tool.

In 30 North Carolina counties, there were no changes in early voting locations between 2014 and 2018, so these counties were omitted from the analysis. This left 580 sites for the two midterm elections. Voters in these counties were also omitted from this analysis, leaving 6,433,969 active and inactive voters, both of which are eligible to cast ballots, according to state elections officials.

While some early voting locations may have been relocated due to the impact of hurricanes Florence or Michael, this analysis considered only the original early voting locations approved by local elections board and the state board.

Latitude and longitude coordinates were then matched to active and inactive registered voters on addresses, city, county and ZIP using MySQL database software. The query failed to match the addresses of 145,645 voters, a 97.7 percent match rate.

We used the free application programming interface (API) from the Open Route Service to generate isochrones – polygons for geographic information systems used to determine driving distances radiating outward from a point source. Isochrones were generated programatically using a Python script.

Open Route Service limits queries through its API to 10 shapefiles at a time. The service also limits total API queries to 2,500 a day.

Due to these limitations, the Python script runs queries for each site four times to produce a geojson feature collection with shapefiles at half-mile intervals from 0.5 to 20, with each polygon describing a driving distance range.

For example, a point that appears in the isochrone with a mile value of 5, but not in an isochrone with a mile value of 4.5, is within 4.5 and 5 miles from the early voting location.

Voter registration data, in CSV format, are loaded into the database, and a separate Python script was used to import the isochrone geojsons using ogr2ogr and its pygdaltools wrapper with a Python script.

SQL queries can then generate mile values for each isochrone intersecting each voter, by county. By deduplicating the table based on the voter and keeping the smallest value, we can find the closest site and distance for each voter in 2014 and 2018.

We then used database software to calculate the change in distance from the closest voting location in 2014 and the closest early voting location in 2018 for every active and inactive voter.

Because the driving distances were limited to 20 miles from each voting location, 62,325 voters could not be matched with either a 2014 or 2018 isochrone because they were outside the 20-mile range. This amounts to less than 1 percent of the registered voters in the study for which the difference in driving distance could not be calculated.

...
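The "keep the smallest value" step is the heart of that method, so here is a rough C++ sketch of it. This is my own illustration, not the station's code (they did it in SQL and Python); the function and constant names are made up. It also shows why each site needs four queries: 40 half-mile isochrones at 10 per request.

```cpp
#include <algorithm>
#include <optional>
#include <utility>
#include <vector>

// 0.5 to 20 miles in half-mile steps is 40 isochrones per site; at 10 per
// API request that is 4 requests per site.
constexpr int kIsochronesPerSite = 40;
constexpr int kMaxPerQuery       = 10;
constexpr int kQueriesPerSite    = (kIsochronesPerSite + kMaxPerQuery - 1) / kMaxPerQuery;

// Given the mile values of every isochrone that contains a voter's address,
// the smallest one bounds the driving distance from above, and the next
// smaller isochrone (one step down) bounds it from below.
// Returns {lower, upper} in miles, or nothing if no isochrone contains the
// address (the voter is beyond the largest ring).
std::optional<std::pair<double, double>>
distanceBand(const std::vector<double>& containingMiles, double stepMiles = 0.5) {
    if (containingMiles.empty()) return std::nullopt;
    const double upper = *std::min_element(containingMiles.begin(), containingMiles.end());
    return std::make_pair(upper - stepMiles, upper);
}
```

A voter who falls inside the 5, 5.5 and 6 mile isochrones but not the 4.5 mile one gets the band 4.5 to 5 miles, which matches the example in the methodology.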

Floating Video Killer

If you're annoyed by the floating, screaming video player that haunts CBSNews.com pages, here's a Greasemonkey script I wrote to kill it:

// ==UserScript==
// @name         CBSNews Floating Video Killer
// @include      https://www.cbsnews.com/*
// @grant        none
// @noframes
// ==/UserScript==

setInterval(function handler() {
  
  // front page
  var frameDiv = document.querySelector(".embed__content--draggable");
  
  // article
  if (frameDiv == null) frameDiv = document.querySelector(".media-block");
  
  if (frameDiv != null)
  {
    // are we looking at the floating frame, or the header player?
    var floated = false;
    var classList = frameDiv.className.split(/\s+/);
    for (var i = 0; i < classList.length; i++) {
      if (classList[i] === 'floating') {
        floated = true;
        break;
      }
    }
    
    // don't touch the header player's button
    // (or you'll never be able to play it at all!)
    if (floated) {
      var element = document.querySelector(".player-overlay__button");
      if (element) {
        element.click();
      }
    }
  }
  
}, 1000);

(really just posting this here so I can share it between my own computers)

eFail!

This is kind of hilarious...

That brings us back to last week, and the release of Efail. The hack is simple and brilliant: It uses the fact that your email client thinks it’s a web browser. An attacker sending mail can steal the content of secret messages you may have sent or received. It works like this: An email client running OpenPGP (the current standard of PGP) or S/MIME decrypts messages when it receives them, and since the clients are also web browsers, they fetch things from the web for displaying them to you in the email you open at the same time. So what if you happened to open an email, which decrypts whatever message it may have inside, even a hidden one, while the same email also tells your email client to fetch an image off the web whose name is now the entire contents of a message it just decrypted? It would just do it, invisibly, sending the now easily readable message anywhere on the net without you ever knowing it happened. Sure, an image named “Meet me at the park on Sunday at 3 a.m. and we’ll make plans from there come alone.jpg” would never load on your screen, but you’ll have invisibly asked for it, and that ask will now be recorded in whatever computer out there the person who sent the mail wanted it recorded on. And that mail could have just as easily said it was from your spouse or boss as God or Santa Claus.

Email is fundamentally broken.

Alas. I like email.

Source: Email Hackers Are Winning - The Atlantic