Sunday, January 4, 2026

From the Ground Up: Building an actual website - Part 1: The Idea and Authentication

Photo by groble on Freeimages.com
I've been studying industrial-scale websites for a long time now. I've been reading about them, working on them, and trying to understand them that whole time. But something has always been missing. It feels like I understand parts of them, but not the whole thing at once. Like the blind men and the elephant. Maybe understanding real industrial-scale websites is too much for one person, but I still believe I can do better. I've started to believe the only way to learn them the way I want to is to try building one myself.

The Idea

When I worked at a large midwestern chemical information company I adopted a chemical lookup program. The group I worked in needed quick lookup of commonly used chemicals for the documentation we were writing. The program I adopted stored the chemical list in a simple spreadsheet and provided a simple UI to search it.

When I adopted it I expanded it into a full three-tier application with a web UI frontend, a compiled middle tier, and a fully normalized database backend. It was ridiculously overengineered for the demands placed on it (I don't think anybody but me ever used it), but it was a good way to test concepts like database normalization, three-tier development, web calls, and so on.

I wanted to continue with that, but since I no longer work for that company and was contemplating web development and deployment, I'm leery about taking their list of synonyms and search terms and publishing it for the world to access. I don't think this application has any commercial value whatsoever, but it's probably better to be safe than sorry.

So what if the users came up with their own substances and search terms? What if a small drug development company wanted a way to store and manage access to a list of chemicals, with each development team uploading their own drug targets, abbreviations, and properties? So that's the idea I want to develop. MyOrg has been born!

Let's start from the very beginning (a very good place to start)

All the security blogs and speakers say that security, authentication, and authorization should be thought about from the start of a project and throughout its whole lifetime. Since this is an area about which I know very little, I decided to start there.

At another previous employer I had been tasked with implementing OAuth2 authorization with Github. Since I knew something about that, and Github is widely used and known in the developer community, I decided to do Github-based authentication. My goal for authentication was that as much work as possible be done by the backend. Some frontend code is necessary to kick off the process and for the user to authorize the Github app to allow access by MyOrg, but the backend should do the token exchange and manage the results of authentication. Also, based on a blog post I read a year or so back, I didn't want to send Github tokens back and forth between the backend and frontend, so after the initial authentication with Github I decided to have the backend generate a certificate-signed JWT and use that for all subsequent interactions.

Github says their Github Apps are preferred to OAuth apps because of their finer-grained control of permissions and allowed activities, so that's the route I took. I created two versions of the app, one for local development and one for production. Since I'm only using them for authentication, I allowed them to request only minimal access to users' accounts. I made note of their client IDs and secrets. For the local app I set the callback URL to the localhost URL and port the app will be running at (https://localhost:7055/auth/callback). I'll describe the production URL later.

I created a new C# ASP.NET application with the minimal API and wrote an Auth endpoint and service for it. Then I wrote the following methods to handle a user's initial request:


public static class AuthEndpoints
{
    public static void MapAuthEndpoints(this IEndpointRouteBuilder app)
    {
        var group = app.MapGroup("/auth");

        group.MapGet("/login", ([FromQuery] string origin, [FromServices] IAuthService auth) =>
        {
            var url = auth.GetLoginRedirectUrl(origin);
            return Results.Redirect(url);
        });
        ...
    }
}

public class AuthService : IAuthService
{
    private const string State = "abc123";
    ...
    
    public string GetLoginRedirectUrl(string origin)
    {
        string enhancedState;
        if (string.IsNullOrEmpty(origin))
        {
            enhancedState = State;
        }
        else
        {
            var originState = Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes(origin));
            enhancedState = $"{State}+{originState}";
        }

        var q = HttpUtility.ParseQueryString(string.Empty);
        q.Add("client_id", _github.ClientId);
        q.Add("redirect_uri", _github.RedirectUri);
        q.Add("state", enhancedState);
        q.Add("allow_signup", "false");

        var theUrl = $"{_github.AuthUrl}/authorize?{q}";
        _logger.LogDebug("Redirect URL: {URL}", theUrl);
        return theUrl;
    }
}
I made all these variables available to the backend as environment variables. In development I put the non-sensitive values (client ID, redirect URI, auth URL, etc.) in the environment via appsettings.Development.json and the sensitive client secret in dotnet's user-secrets. Then I used dotnet's configuration process to parse them into an object that gets injected into the services that need them.
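
For illustration, here's a minimal sketch of that binding. The GithubOptions class and its property names are my reconstruction for this post, not necessarily the exact shape of the real one:

// GithubOptions.cs: the object the configuration gets parsed into
public class GithubOptions
{
    public const string Github = "GithubOptions";

    public string ClientId { get; set; } = string.Empty;
    public string ClientSecret { get; set; } = string.Empty;
    public string RedirectUri { get; set; } = string.Empty;
    public string AuthUrl { get; set; } = string.Empty;
}

// Program.cs: bind the section so services can inject IOptions<GithubOptions>
builder.Services.Configure<GithubOptions>(
    builder.Configuration.GetSection(GithubOptions.Github));

The sensitive value never touches appsettings; it goes in via dotnet user-secrets set "GithubOptions:ClientSecret" "<value>".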

For frontend reasons I needed to know the origin URL the login request was made from, so I separate that out as soon as the request is made. To survive the roundtrip to Github and back I concatenate that with a State variable (hardcoded to "abc123" currently) and send it as a query variable.

The callback and token exchange process took me a while to work out. The basic process is fairly easy:


    public static void MapAuthEndpoints(this IEndpointRouteBuilder app)
    {
        var group = app.MapGroup("/auth");

        ...

        group.MapGet("/callback", async (
            [FromQuery] string code,
            [FromQuery] string state,
            [FromServices] IAuthService auth,
            HttpContext ctx) =>
        {
            var pair = state.Split("+");
            var stateString = pair[0];
            var originString = pair.Length > 1 ? pair[1] : Convert.ToBase64String(System.Text.Encoding.UTF8.GetBytes("http://nowhere.com/nothing"));
            var origin = System.Text.Encoding.UTF8.GetString(Convert.FromBase64String(originString));
            var user = await auth.HandleCallbackAsync(code, stateString);
            return Redirect($"http://localhost:5173/dashboard?token={user}");
        }
    }
...
    public async Task<User> HandleCallbackAsync(string code, string state)
    {
        if (state != State)
        {
            throw new UnauthorizedAccessException("State mismatch");
        }

        var data = new Dictionary<string, string>
        {
            ["client_id"] = _github.ClientId,
            ["client_secret"] = _github.ClientSecret,
            ["code"] = code,
            ["redirect_uri"] = _github.RedirectUri,
        };

        using var response = await _http.PostAsync($"{_github.AuthUrl}/access_token", new FormUrlEncodedContent(data));
        response.EnsureSuccessStatusCode();

        var responseBody = await response.Content.ReadAsStringAsync();
        var queryParams = HttpUtility.ParseQueryString(responseBody);
        var accessToken = queryParams["access_token"];

        if (string.IsNullOrEmpty(accessToken))
        {
            throw new UnauthorizedAccessException("Failed to retrieve Github token");
        }

        using var userRequest = new HttpRequestMessage(HttpMethod.Get, $"{_github.ApiUrl}/user");
        userRequest.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);
        foreach (var (key, value) in _github.RequiredHeaders)
        {
            userRequest.Headers.Add(key, value);
        }

        using var userResponse = await _http.SendAsync(userRequest);
        userResponse.EnsureSuccessStatusCode();

        var user = await userResponse.Content.ReadFromJsonAsync<User>()
            ?? throw new InvalidOperationException("Failed to parse Github user.");
        var authenticatedUser = GenerateUserWithJwt(user);
        return authenticatedUser;
    }
I check to make sure the state matches the value expected, and then construct the response Github is expecting for the token exchange. When the result comes back, if the access token is not null, I immediately query Github for the user record of the person making the request, parsing that into a User record I wrote.

This is the process to create a signed JWT with a local certificate that I borrowed from still another previous client:


    private User GenerateUserWithJwt(User user)
    {
        var cert = GetCertificateFromStore(_jwt.Thumbprint)
            ?? throw new InvalidOperationException("Signing certificate not found.");
        var key = new X509SecurityKey(cert);
        var creds = new SigningCredentials(key, SecurityAlgorithms.RsaSha256Signature);

        var claims = new[]
        {
            new Claim("LE-User-Name", user.Name ?? string.Empty),
            new Claim("LE-User-Login", user.Login ?? string.Empty),
            new Claim("LE-Company", user.Company ?? string.Empty),
        };

        var token = new JwtSecurityToken(
            _jwt.Issuer,
            _jwt.Audience,
            claims,
            expires: DateTime.UtcNow.AddDays(1),
            signingCredentials: creds
        );

        return new User
        {
            Login = user.Login ?? string.Empty,
            Name = user.Name ?? string.Empty,
            Url = user.Url,
            Company = user.Company ?? string.Empty,
            OrganizationsUrl = user.OrganizationsUrl,
            SiteAdmin = user.SiteAdmin,
            Jwt = new JwtSecurityTokenHandler().WriteToken(token),
        };
    }

    private static X509Certificate2? GetCertificateFromStore(string thumbprint, StoreName storeName = StoreName.My)
    {
        using var certStore = new X509Store(storeName, StoreLocation.LocalMachine);
        certStore.Open(OpenFlags.ReadOnly);
        var certs = certStore.Certificates.Find(X509FindType.FindByThumbprint, thumbprint, false);

        if (certs.Count == 0)
        {
            // this is for local testing. I'm guessing there is a better way to do this?
            using var userStore = new X509Store(storeName, StoreLocation.CurrentUser);
            userStore.Open(OpenFlags.ReadOnly);
            certs = userStore.Certificates.Find(X509FindType.FindByThumbprint, thumbprint, false);
        }

        return certs.Count == 0 ? null : certs[0];
    }
I'm pretty sure this wouldn't work in a cloud environment, nor on a non-Windows setup, but I decided I could come back to it eventually. This worked locally on my work PC and was enough to get me going.

To kick off the authorization call from a frontend I just make a call to the backend:


const url = new URL('http://localhost:5164/auth/login')
const urlString  = url.toString()

const doLogin = () => {
  window.location.href = urlString
}
The Redirect return in the backend along with Vue routing took care of loading the correct page when login was successful.

I didn't like this for a couple of reasons. Ironically, the part I thought would be hardest to fix was actually the easiest, and the part I thought would be pretty straightforward was the hardest to get past.

The first is the use of a local certificate store for signing the JWT. Fortunately, moving to Azure and setting some values in the startup process and a KeyVault pretty much took care of it. I'll describe this in more detail shortly.

Second was the hardcoded Redirect in the backend to handle the successful login path. I didn't think the backend should know that much about the structure of the frontend application; it should be able to return a more general value, like a plain JSON document with the credentials. Unfortunately, that just wasn't possible. It was the cause of my biggest, most passionate argument with ChatGPT to date. However, I lost that battle. Because this flow goes through the browser, the backend has to return something HTML-like for the browser to accept it. Any textual data, like JSON, will simply be displayed in the browser, which is not at all what I want. So what I finally worked out was returning a simple HTML document with a call to a presumed function in the calling webpage, which I further assumed would have opened a popup window to perform the login process.


    group.MapGet("/callback", async (
        {
            ...
            var userJson = JsonSerializer.Serialize(user);
            return Results.Content($@"
                <html>
                    <body>
                        <script>
                            window.opener.postMessage({userJson}, '{origin}')
                            console.log('postMessage sent!')
                        </script>
                    </body>
                </html>
            ", "text/html");
        });

const frontendOrigin = window.location.origin
const url = new URL(`${baseUrl}/auth/login?origin=${encodeURIComponent(frontendOrigin)}`)
const urlString  = url.toString()

const doLogin = () => {
  const width = 600, height = 700
  const left = (screen.width - width) / 2
  const top = (screen.height - height) / 2

  const handleMessage = (event: MessageEvent) => {
    if (event.origin !== baseUrl) {
      return
    }

    if (!event.data || typeof event.data !== 'object') {
      return
    }

    const user = event.data as User

    if (!user) {
      return
    }

    try {
      auth.init(JSON.stringify({ ...user }))
    } catch (err) {
      console.error('[auth] init failed', err)
    } finally {
      clearTimeout(to)
      window.removeEventListener('message', handleMessage)
      try { popup?.close() } catch {}
      router.push('/dashboard')
    }
  }

  window.addEventListener('message', handleMessage)

  const popup = window.open(
    urlString,
    '_blank',
    `width=${width},height=${height},top=${top},left=${left}`
  )

  if (!popup) {
    window.removeEventListener('message', handleMessage)
    console.warn('[auth] popup blocked')
    return
  }

  const timeoutMs = 2 * 60 * 1000
  const to = setTimeout(() => {
    console.warn('[auth] auth message timeout; removing listener')
    window.removeEventListener('message', handleMessage)
    try { popup.close() } catch {}
  }, timeoutMs)
}
With a valid JWT in hand, it's pretty easy to require one to query your endpoints. All it takes is some settings in Program.cs and a method call on the endpoints to be secured. There can also be custom requirements on the JWT, such as the presence of a user claim.

// Program.cs
builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme).AddJwtBearer(options =>
{
    var jwtOptions = builder.Configuration.GetSection(JwtTokenOptions.Jwt).Get<JwtTokenOptions>() ?? throw new InvalidOperationException("Jwt options not found.");
    var cert = SecretsService.LoadCertificate(jwtOptions);
    options.TokenValidationParameters = new TokenValidationParameters
    {
        IssuerSigningKey = new X509SecurityKey(cert),
        ValidateIssuer = true,
        ValidateIssuerSigningKey = true,
        ValidateAudience = true,
        ValidateLifetime = true,
        ValidAudience = builder.Configuration["JwtTokenOptions:Audience"],
        ValidIssuer = builder.Configuration["JwtTokenOptions:Issuer"],
    };
});
builder.Services.AddAuthorizationBuilder()
    .AddPolicy("IsUser", policy => policy.RequireClaim("LE-User-Login").Build());
...
app.UseAuthorization();

// SearchEndpoints.cs
public static class SearchEndpoints
{
    public static void MapSearchEndpoints(this IEndpointRouteBuilder app)
    {
        app.MapGet("/search", async (
                [FromServices] ISearchService searchService,
                [FromQuery(Name = "st")] string[] searchTerms,
                CancellationToken cancellationToken) =>
                await searchService.Search(searchTerms, cancellationToken)
            )
            .RequireAuthorization();
    }
}
So this worked locally. The next challenge is to deploy it and get it to work in the cloud.

Ship it!

I decided to deploy the app to Azure since I know less about it and it's what my current client is using. I've worked in Azure before and was certified in it at one point, but again, what I know is mostly theoretical, so I decided to make it more concrete by deploying there.

Since Microsoft's acquisition of Github, that seems to be where they're devoting most of their recent development effort. Again, however, because Azure DevOps is what I'm less familiar with, I decided to implement this fully there, right down to the source code repository and the task board to manage my own work.

Microsoft provides starting templates for building and deploying common types of projects, so I just took and adapted one of them. I added it to my repository in Azure DevOps, adjusted the values to suit my project, committed it, and then it was available to pull down locally.

From the descriptions on Azure it sounded like an app service was what I needed, so that's what I wrote the yml file to deploy to. Interestingly, the app service needs to exist before you can deploy to it. So I went to Azure, created a resource group to hold all the artifacts I would need, and created the app service. You also need a Service Connection between the app service and Azure DevOps, so I followed the steps in ADO to do that. That was enough to get the code and the app out to Azure, but I still needed to fix the configuration and certificate generation problem.
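
For reference, the pipeline I ended up with was shaped roughly like this (a trimmed sketch; the service connection and app names are placeholders, not my actual values):

trigger:
  - main

pool:
  vmImage: ubuntu-latest

steps:
  # Build and publish the ASP.NET project into a zip artifact
  - task: DotNetCoreCLI@2
    inputs:
      command: publish
      publishWebProjects: true
      arguments: '--configuration Release --output $(Build.ArtifactStagingDirectory)'

  # Push the artifact to the app service through the service connection
  - task: AzureWebApp@1
    inputs:
      azureSubscription: 'my-service-connection'  # placeholder
      appName: 'myorg-backend'                    # placeholder
      package: '$(Build.ArtifactStagingDirectory)/**/*.zip'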

Adding non-sensitive configuration values to the app was a simple matter of adding them to the runtime environment. That can be done in Azure in the app service page. Select Settings, Environment variables, and add the keys and values you need there. Use a double underscore to mimic sections in environment variable settings in ASP.NET, e.g., GithubOptions:ClientId becomes GithubOptions__ClientId. Now when the app host starts those values are available to the app.
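
As a concrete (made-up) pairing, these two are equivalent as far as the configuration binder is concerned:

// appsettings.Development.json
{
  "GithubOptions": {
    "ClientId": "Iv1.placeholder"
  }
}

// App service environment variable
GithubOptions__ClientId=Iv1.placeholder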

I created a KeyVault to hold the certificate and the Github ClientSecret. Creating the vault was easy, but I quickly ran into a frustrating aspect of Azure: I couldn't add values to the vault I had just created! I have come to learn that I, as a developer, am not Microsoft's primary customer. My near-ultimate boss, the CTO of my organization, is, so all of Microsoft's products are geared to him or her, not to me. Even though it makes perfect sense to me to be able to add stuff to a resource I just created, that's probably not how most tech organizations work. The overworked and always-busy owner of the Azure resources may respond to my request to create a KeyVault, but will probably not be the one who adds values to it. Minimal access to the Nth degree. So, immediately after creating the KeyVault, I had to turn around and look up how to give myself permission to add values to it. That was an easy Google/ChatGPT search, and then I was able to add the Github ClientSecret to it. I decided to ask KeyVault to generate a signing certificate rather than trying to generate it myself and upload it, but that route is possible too.
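
If I remember right, the permission grant amounts to a role assignment on the vault. Something like this, assuming the vault uses Azure RBAC rather than access policies (all the names here are placeholders):

az role assignment create \
  --role "Key Vault Administrator" \
  --assignee "me@example.com" \
  --scope "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<vault>"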

The code changes needed to take advantage of the KeyVault were fairly simple. I added the KeyVault name and SecretName (just the name of the certificate, not really a secret) to configuration. Then I modified Program.cs to access the KeyVault:


if (!builder.Environment.IsDevelopment())
{
    var vaultName = builder.Configuration["JwtTokenOptions:KeyVaultName"];
    var kvUri = new Uri($"https://{vaultName}.vault.azure.net/");
    builder.Configuration.AddAzureKeyVault(kvUri, new DefaultAzureCredential());
}
With that, the Github ClientSecret just appeared in the app's configuration. Loading the certificate required a code change:

public static class SecretsService
{
    public static X509Certificate2 LoadCertificate(JwtTokenOptions options)
    {
        var client = new SecretClient(
            new Uri($"https://{options.KeyVaultName}.vault.azure.net/"),
            new DefaultAzureCredential());
        var secret = client.GetSecret(options.KeyVaultSecretName);
        var pfxBytes = Convert.FromBase64String(secret.Value.Value);
        return X509CertificateLoader.LoadPkcs12(pfxBytes, null);
    }
}
I think there was something else in there about giving the app service permission to access the KeyVault, but that was a simple permission and configuration change in Azure. But after that the deployed version of the app worked too!
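
My recollection is that the gist was giving the app service a managed identity and granting that identity read access to the vault, roughly like this (placeholders again). DefaultAzureCredential in the code above then picks up the managed identity automatically:

az webapp identity assign --name myorg-backend --resource-group myorg-rg
az role assignment create \
  --role "Key Vault Secrets User" \
  --assignee-object-id <principal-id-from-previous-command> \
  --assignee-principal-type ServicePrincipal \
  --scope "/subscriptions/<sub-id>/resourceGroups/myorg-rg/providers/Microsoft.KeyVault/vaults/<vault>"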

So now I have an app that allows authentication to Github, generates a signed JWT, and requires that JWT to query endpoints on the backend. Now I can get on with the fun stuff.

Source code

The working repository for this project is in Azure DevOps, and is private by default. I haven't found a way to change that after creation, so I am mirroring the source code files to Github.

Tuesday, October 14, 2025

Going Native - Swift

"Your second-hand bookseller is second to none in the worth of the treasures he dispenses."

Leigh Hunt

Coming home to roost

This is the completion of a series on calling native code from high-level languages. Here is a description of the native library I'm calling in this series.

Apple has used several languages for its operating system and devices, most notably Objective-C and Swift. But I read a few years ago that Swift had found some adoption in data analysis and Big Data applications because of its expressiveness and streaming features. Swift has been released as open source, so there are implementations for Linux and Windows in addition to MacOS. I did an Advent of Code in Swift one year, and enjoyed it. To wrap up this project of calling native code from high-level languages I decided to give Swift a try.

Getting Started

The interface for calling native code from Swift has changed recently. The mechanism is the Swift Package Manager, but the changes have meant some older references are out of date. One example that gave me hope, even though it didn't work, was this blog post: Wrapping C Libraries in Swift.

The example that got me going was directly from the Swift Documentation on the Swift Package Manager, particularly using system libraries to call native code.

Since Swift is an Apple-original language, I wasn't sure how it would translate to Windows. I was fairly confident in its applicability to Linux, though, so that's where I started. That meant writing a command line application instead of an app: those are Mac-only.


$ mkdir SwiftRMatrix
$ cd SwiftRMatrix
$ swift package init --type executable
$ tree .
.
├── Package.swift
└── Sources
    └── SwiftRMatrix
        └── SwiftRMatrix.swift

2 directories, 2 files

These commands set up a group of files and directories, the most important of which are Package.swift and Sources/SwiftRMatrix/SwiftRMatrix.swift. The latter is the entrypoint to the application, and the former is the directions for how to build the project. This is all that is needed to run "Hello, world!": you can do swift run at this point and see the message printed to the console.
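
For example (build output elided):

$ swift run
Hello, world!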

Linking to native code is a matter of writing new modules and setting up dependencies among the modules in the project.


$ mkdir Sources/CRashunal
$ touch Sources/CRashunal/rashunal.h
$ touch Sources/CRashunal/module.modulemap

rashunal.h:


#import <rashunal.h>

module.modulemap:


module CRashunal [system] {
    umbrella header "rashunal.h"
    link "rashunal"
}

rashunal.h, which is distinct from the rashunal.h I wrote for the Rashunal project, is simply a transitive import to the native code, bringing all the declarations in the original rashunal.h into the Swift project. module.modulemap emphasizes this by saying that rashunal.h is an umbrella header, and that the code will link the rashunal library. At this point, CRashunal (the Swift project) can be imported into Swift code and used.

Package.swift:


// swift-tools-version: 6.2
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

let package = Package(
    name: "SwiftRMatrix",
    dependencies: [],
    targets: [
        // Targets are the basic building blocks of a package, defining a module or a test suite.
        // Targets can depend on other targets in this package and products from dependencies.
        .systemLibrary(
            name: "CRashunal"
        ),
        .executableTarget(
            name: "SwiftRMatrix",
            dependencies: ["CRashunal"],
            path: "Sources/SwiftRMatrix"
        ),
    ]
)

SwiftRMatrix.swift:


// The Swift Programming Language
// https://docs.swift.org/swift-book
import CRashunal
import Foundation

@main
struct SwiftRMatrix {
    static func main() throws {
        let r: UnsafeMutablePointer<CRashunal.Rashunal> = n_Rashunal(numericCast(1), numericCast(2))
        print("{\(r.pointee.numerator),\(r.pointee.denominator)}")
    }
}

I like that Swift distinguishes between mutable and immutable pointers (UnsafeMutablePointer and UnsafePointer), and uses generics to indicate what the pointer points to. Swift also has an OpaquePointer for when the fields of a struct are not imported, like an RMatrix. I'll come back to that later. The pointee property for accessing the fields of the struct is an additional bonus.

ChatGPT pointed me to memory safety early on, so I learned quickly how to access the standard library on the different platforms. Swift recognizes C-like compiler directives, so accessing it was a simple matter of importing the right native libraries. For Windows, it's a part of the platform, so no special import is needed.


#if os(Linux)
import Glibc
#elseif os(Windows)

#elseif os(macOS)
import Darwin
#else
#error("Unsupported platform")
#endif
...
let r: UnsafeMutablePointer<CRashunal.Rashunal> = n_Rashunal(numericCast(1), numericCast(2))
print("{\(r.pointee.numerator),\(r.pointee.denominator)}")
free(r)

And that's it, for code. The devil, of course, is in the compiling and linking.

A chain is only as strong as its weakest link

Swift Package Manager uses several sources to find libraries, but none of them seemed to match my particular use case. The closest was to make use of pkg-config. The more I read about it, the more it seemed to be an industry standard, and that Rashunal and RMatrix would benefit from taking advantage of it. So I broke the rule I established earlier and decided to enhance the libraries.

Fortunately, it wasn't too painful. Telling Rashunal to write to pkg-config was only a few lines added to rashunal/CMakeLists.txt:


+set(PACKAGE_NAME rashunal)
+set(PACKAGE_VERSION 0.0.1)
+set(PACKAGE_DESC "Rational arithmetic library")
+set(PKGCONFIG_INSTALL_DIR "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
+
+configure_file(
+  ${CMAKE_CURRENT_SOURCE_DIR}/rashunal.pc.in
+  ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}.pc
+  @ONLY
+)
+
 add_library(rashunal SHARED src/rashunal.c src/rashunal_util.c)
...
+install(
+  FILES ${CMAKE_CURRENT_BINARY_DIR}/rashunalConfig.cmake
+  DESTINATION lib/cmake/rashunal
+)
+
+install(
+  FILES ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}.pc
+  DESTINATION ${PKGCONFIG_INSTALL_DIR}
 )

The first block is toward the top of CMakeLists.txt, and the second is toward the bottom.

The configure_file directive needs a template for the pc file that will be written. The template has placeholders set off by '@' that will be filled in during the build process.

rashunal.pc.in:


prefix=@CMAKE_INSTALL_PREFIX@
exec_prefix=${prefix}
libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@

Name: @PACKAGE_NAME@
Description: @PACKAGE_DESC@
Version: @PACKAGE_VERSION@
Libs: -L${libdir} -l@PACKAGE_NAME@
Cflags: -I${includedir}

During installation the newly-written rashunal.pc file will be written to a platform-standard location on disk.

After making those changes, building, compiling, and installing, pkg-config was able to tell me something about the Rashunal library:


$ rm -rf build
$ mkdir build
$ cd build
$ cmake ..
$ make && sudo cmake --install .
$ ls /usr/local/lib/pkgconfig
rashunal.pc
$ cat /usr/local/lib/pkgconfig/rashunal.pc
prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: rashunal
Description: Rational arithmetic library
Version: 0.0.1
Libs: -L${libdir} -lrashunal
Cflags: -I${includedir}
$ pkg-config --cflags rashunal
-I/usr/local/include
$ pkg-config --libs rashunal
-L/usr/local/lib -lrashunal

Notice the new command to install the project: apparently this is the more modern, approved way to do it nowadays. The bash output means that the declarations of the Rashunal library can be found at /usr/local/include and the binaries at /usr/local/lib.

Now the Swift Package Manager can be told just to consult pkg-config for the header and binary location of any system libraries it's attempting to build. It's not necessary, but the examples I saw recommended adding some suggestions for how to install Rashunal if it's not present. I haven't looked into what it takes to package a library for apt or brew, but I'm pretty sure this is how they are consumed:

Package.swift:


.systemLibrary(
    name: "CRashunal",
    pkgConfig: "rashunal",
    providers: [
        .apt(["rashunal"]),
        .brew(["rashunal"]),
    ],
)

Then the Swift project could be built and run:


$ swift build
$ swift run SwiftRMatrix
{1,2}

And rinse and repeat for RMatrix. There is nothing new in building the RMatrix pkg-config files or linking to it from Swift, except for the dependency on Rashunal in the template for RMatrix:

rmatrix.pc.in


prefix=@CMAKE_INSTALL_PREFIX@
exec_prefix=${prefix}
libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@

Name: @PACKAGE_NAME@
Description: @PACKAGE_DESC@
Version: @PACKAGE_VERSION@
Requires: rashunal
Libs: -L${libdir} -l@PACKAGE_NAME@
Cflags: -I${includedir}

I started to look into removing that hardcoded dependency and getting it from the link libraries in CMakeLists.txt, but that quickly started to grow big and nasty, so I abandoned it. ChatGPT assured me that was common, especially for small projects.

Crossing the operating system ocean

Trying to do this on MacOS, I ran into my old nemesis SIP. Fortunately, the solution here was similar to the solution I followed there. The Swift command at /usr/bin/swift was protected by SIP, but the executable generated by the swift build command wasn't:


% swift build -Xlinker -rpath -Xlinker /usr/local/lib
% .build/debug/SwiftRMatrix
{1,2}

What is astonishing is that, after one more testy exchange with ChatGPT, I also got it to work on Windows. I still don't understand how this differs from Linux and MacOS, or how it changed things on Windows, but I had to make an additional change to Rashunal's CMakeLists.txt and to the cmake command used to build RMatrix:

rashunal/CMakeLists.txt


if (WIN32)
  set_target_properties(rashunal PROPERTIES
    ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin"
    RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin"
  )
endif()

>cmake .. -G "NMake Makefiles" ^
More? -DCMAKE_BUILD_TYPE=Release ^
More? -DCMAKE_INSTALL_PREFIX=C:/Users/john.todd/local/rmatrix ^
More? -DCMAKE_PREFIX_PATH=C:/Users/john.todd/local/rashunal ^
More? -DCMAKE_C_FLAGS_RELEASE="/MD /O2 /DNDEBUG"
>nmake
>nmake install

Then the Swift application could be built and run from the command line, albeit with a few additional linker switches. This also needs to be done from a PowerShell or Command Prompt window with Admin rights because, even though it only changes the local project directory, it seems to write to a protected directory.


> swift build `
>>   -Xcc -IC:/Users/john.todd/local/rashunal/include `
>>   -Xcc -IC:/Users/john.todd/local/rmatrix/include `
>>   -Xlinker /LIBPATH:C:/Users/john.todd/local/rashunal/lib `
>>   -Xlinker /LIBPATH:C:/Users/john.todd/local/rmatrix/lib `
>>   -Xlinker /DEFAULTLIB:rashunal.lib `
>>   -Xlinker /DEFAULTLIB:rmatrix.lib `
>>   -Xlinker /DEFAULTLIB:ucrt.lib
> ./.build/debug/SwiftRMatrix.exe
{1,2}

Cleaning up the guano

My last task was to abstract the native calls away from the main application. To do this I wrote a Models module that wrapped the native Rashunal, RMatrix, and Gauss Factorization structs.

Sources/Model/Model.swift


public class Rashunal: CustomStringConvertible {
    var _rashunal: UnsafePointer<CRashunal.Rashunal>

    public init(_ numerator: Int, _ denominator: Int = 1) {
        _rashunal = UnsafePointer(n_Rashunal(numericCast(numerator), numericCast(denominator)))
    }

    public init(_ data: [Int]) {
        _rashunal = UnsafePointer(n_Rashunal(numericCast(data[0]), data.count > 1 ? numericCast(data[1]) : 1))
    }

    public var numerator: Int { Int(_rashunal.pointee.numerator) }

    public var denominator: Int { Int(_rashunal.pointee.denominator) }

    public var description: String {
        return "{\(numerator),\(denominator)}"
    }

    deinit {
        free(UnsafeMutablePointer(mutating: _rashunal))
    }
}

What gets returned from the native n_Rashunal call is a Swift UnsafeMutablePointer. I wanted them to be immutable wherever possible, so I cast it to an UnsafePointer in both the constructors. Swift makes property definition and string representations easy and natural. The deinit method calls the native standard library's free method to release the native memory allocated by Rashunal. This makes cleanup and memory hygiene easy.

Sources/Model/Model.swift


public class RMatrix: CustomStringConvertible {
    var _rmatrix: OpaquePointer

    private init(_ rmatrix: OpaquePointer) {
        _rmatrix = rmatrix
    }

    public init(_ data: [[[Int]]]) {
        let height = data.count
        let width = data.first!.count

        let rashunals = data.flatMap {
            row in row.map {
                cell in n_Rashunal(numericCast(cell[0]), cell.count > 1 ? numericCast(cell[1]) : 1)
            }
        }
        let ptrArray = UnsafeMutablePointer<UnsafeMutablePointer<CRashunal.Rashunal>?>.allocate(capacity: rashunals.count)
        for i in 0..<rashunals.count {
            ptrArray[i] = rashunals[i]
        }
        defer {
            for i in 0..<rashunals.count {
                free(ptrArray[i])
            }
            ptrArray.deallocate()
        }
        _rmatrix = new_RMatrix(numericCast(height), numericCast(width), ptrArray)
    }

    public var height: Int { Int(RMatrix_height(_rmatrix)) }

    public var width: Int { Int(RMatrix_width(_rmatrix)) }

    public var description: String {
        return (1...height).map { i in
            "[ " + (1...width).map { j in
                let cellPtr: UnsafePointer<CRashunal.Rashunal> = RMatrix_get(_rmatrix, i, j)
                let rep = "{\(cellPtr.pointee.numerator),\(cellPtr.pointee.denominator)}"
                free(UnsafeMutablePointer(mutating: cellPtr))
                return rep
            }.joined(separator: " ") + " ]"
        }.joined(separator: "\n")
    }

    deinit {
        free_RMatrix(_rmatrix)
    }
}

Unsurprisingly, RMatrix was the hardest of these to get right. The private constructor is used in the factor method as a convenience method to initialize a Swift RMatrix. The other constructor is used to initialize a matrix from the familiar 3D array of Ints. I get the height and width from the first two dimensions of the input array, then use the n_Rashunal method to construct a list of native Rashunal structs as UnsafeMutablePointer<CRashunal.Rashunal>s. As before, new_RMatrix expects an array of pointers to structs, but the rashunals array is in managed memory, not native memory. So I allocate and fill an array of pointers to the Rashunal structs in native memory. ChatGPT suggested I add the defer block in case new_RMatrix abends for any reason. Because the RMatrix struct is declared but not defined in rmatrix.h, what is automatically returned is an OpaquePointer, which is just fine with me.

Properties defer to the encapsulated _rmatrix pointer, and the string description method makes full use of Swift's stream processing capabilities. deinit calls the RMatrix library's free_RMatrix method.

After all that, factoring a matrix and the GaussFactorization struct are pretty routine.

Sources/Model/Model.swift


public struct GaussFactorization {
    public var PInverse: RMatrix
    public var Lower: RMatrix
    public var Diagonal: RMatrix
    public var Upper: RMatrix

    public init(PInverse: RMatrix, Lower: RMatrix, Diagonal: RMatrix, Upper: RMatrix) {
        self.PInverse = PInverse
        self.Lower = Lower
        self.Diagonal = Diagonal
        self.Upper = Upper
    }
}

public class RMatrix: CustomStringConvertible {
...
    public func factor() -> GaussFactorization {
        let gf = RMatrix_gelim(_rmatrix)!
        let sgf = GaussFactorization(
            PInverse: RMatrix(gf.pointee.pi),
            Lower: RMatrix(gf.pointee.l),
            Diagonal: RMatrix(gf.pointee.d),
            Upper: RMatrix(gf.pointee.u)
        )
        free(gf)
        return sgf
    }
}

Calling the native method RMatrix_gelim returns a newly-allocated struct pointing to four newly-allocated matrices. The matrices are passed to the RMatrix constructor, so that the class takes responsibility for managing their memory. The native struct itself is freed by the RMatrix factor method before returning the Swift struct.

The driver class has no import of native code, and all the allocations look just like Swift objects.


import ArgumentParser
import Foundation
import Model

enum SwiftRMatrixError: Error {
    case runtimeError(String)
}

@main
struct SwiftRMatrix: ParsableCommand {
    @Option(help: "Specify the input file")
    public var inputFile: String

    public func run() throws {
        let url = URL(fileURLWithPath: inputFile)
        var inputText = ""
        do {
            inputText = try String(contentsOf: url, encoding: .utf8)
        } catch {
            throw SwiftRMatrixError.runtimeError("Error reading file [\(inputFile)]")
        }
        let data = inputText
            .split(whereSeparator: \.isNewline)
            .map { $0.trimmingCharacters(in: .whitespaces) }
            .map { line in
                line.split(whereSeparator: { $0.isWhitespace })
                    .map { token in token.split(separator: "/").map { Int($0)! } }
            }
        let m = Model.RMatrix(data)
        print("Input matrix:")
        print(m)

        let factor = m.factor()
        print("Factors into:")
        print("PInverse:")
        print(factor.PInverse)

        print("Lower:")
        print(factor.Lower)

        print("Diagonal:")
        print(factor.Diagonal)

        print("Upper:")
        print(factor.Upper)
    }
}

$ swift run SwiftRMatrix --input-file /home/john/workspace/rmatrix/driver/example.txt
[1/1] Planning build
Building for debugging...
[11/11] Linking SwiftRMatrix
Build of product 'SwiftRMatrix' complete! (1.17s)
Input matrix:
[ {-2,1} {1,3} {-3,4} ]
[ {6,1} {-1,1} {8,1} ]
[ {8,1} {3,2} {-7,1} ]
Factors into:
PInverse:
[ {1,1} {0,1} {0,1} ]
[ {0,1} {0,1} {1,1} ]
[ {0,1} {1,1} {0,1} ]
Lower:
[ {1,1} {0,1} {0,1} ]
[ {-3,1} {1,1} {0,1} ]
[ {-4,1} {0,1} {1,1} ]
Diagonal:
[ {-2,1} {0,1} {0,1} ]
[ {0,1} {17,6} {0,1} ]
[ {0,1} {0,1} {23,4} ]
Upper:
[ {1,1} {-1,6} {3,8} ]
[ {0,1} {1,1} {-60,17} ]
[ {0,1} {0,1} {1,1} ]

Reflection

Wow, that turned out a lot better than I expected. I thought this would be possible on Linux and MacOS. To be able to get it to work on Windows too was a pleasant surprise. I really like the Swift language: it is expressive and concise and makes really good use of streaming approaches. I hope I get to use it to make money sometime.

Code repository

https://github.com/proftodd/GoingNative/tree/main/SwiftRMatrix

Monday, October 6, 2025

Going Native - Python

Photo by rolve on Freeimages.com

"If you're not stubborn, you'll give up on experiments too soon. And if you're not flexible, you'll pound your head against the wall and you won't see a different solution to a problem you're trying to solve."

Jeff Bezos

This is the continuation of a series on calling native code from high-level languages. Here is a description of the native library I'm calling in this series.

When I got to Python I thought things would get easier. After all, Python was written to be a quick and easy wrapper around C. Alas, no. In a pattern that was becoming familiar, it was fairly easy to wrap the native code and call it from Python, but getting it to find the native libraries at runtime was another difficult challenge.

There are two traditional ways to call native code from Python. The first is `ctypes`, and the other is writing a compiled extension module with `Cython`. ctypes is generally easier for quick calls into a native library, while Cython is better for truly getting the advantages of native code (primarily optimization of execution speed). There are some other approaches, but these are the ones I tried for this post.

Calling the native libraries via ctypes

ctypes is part of the Python standard library, so no steps were necessary to import it into the project.

Loading the libraries was a simple matter of calling a ctypes function and mapping the argument and return types.


import ctypes

class RASHUNAL(ctypes.Structure):
    _fields_ = [("numerator", ctypes.c_int), ("denominator", ctypes.c_int)]

class RMATRIX(ctypes.Structure):
    pass

class GAUSS_FACTORIZATION(ctypes.Structure):
    _fields_ = [
        ("P_INVERSE", ctypes.POINTER(RMATRIX)),
        ("LOWER", ctypes.POINTER(RMATRIX)),
        ("DIAGONAL", ctypes.POINTER(RMATRIX)),
        ("UPPER", ctypes.POINTER(RMATRIX))
    ]

_rashunal_lib = ctypes.CDLL('librashunal.so')
_rashunal_lib.n_Rashunal.argtypes = (ctypes.c_int, ctypes.c_int)
_rashunal_lib.n_Rashunal.restype = ctypes.POINTER(RASHUNAL)

_rmatrix_lib = ctypes.CDLL('librmatrix.so')
_rmatrix_lib.new_RMatrix.argtypes = (ctypes.c_size_t, ctypes.c_size_t, ctypes.POINTER(ctypes.POINTER(RASHUNAL)))
_rmatrix_lib.new_RMatrix.restype = ctypes.POINTER(RMATRIX)

_rmatrix_lib.free_RMatrix.argtypes = (ctypes.POINTER(RMATRIX),)

_rmatrix_lib.RMatrix_gelim.argtypes = (ctypes.POINTER(RMATRIX),)
_rmatrix_lib.RMatrix_gelim.restype = ctypes.POINTER(GAUSS_FACTORIZATION)

_rmatrix_lib.RMatrix_height.argtypes = (ctypes.POINTER(RMATRIX),)
_rmatrix_lib.RMatrix_width.argtypes = (ctypes.POINTER(RMATRIX),)

_rmatrix_lib.RMatrix_get.argtypes = (ctypes.POINTER(RMATRIX), ctypes.c_size_t, ctypes.c_size_t)
_rmatrix_lib.RMatrix_get.restype = ctypes.POINTER(RASHUNAL)

Custom types are declared as subclasses of the ctypes.Structure class. The RMatrix struct is declared but not given a body in the RMatrix library, so I modeled that as a Python class that also extends ctypes.Structure but has no body. Pointer types are modeled as `ctypes.POINTER` objects with an argument of the type or struct the pointer is for.

Note that if a function has a single argument, the field is still argtypes (plural). Also, the argument is a Python tuple, so if it only has one element then it needs a trailing comma. That took me a while to figure out!
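
The gotcha is plain Python, nothing ctypes-specific: a parenthesized expression without a trailing comma is not a tuple.

>>> type((1))
<class 'int'>
>>> type((1,))
<class 'tuple'>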

Once the functions are declared, they are called just like regular Python functions.


def allocate_c_rmatrix(m):
    height = m.height
    width = m.width
    element_count = height * width

    c_rashunal_pointers = (ctypes.POINTER(RASHUNAL) * element_count)()
    for i in range(element_count):
        pel = m.data[i]
        r = _rashunal_lib.n_Rashunal(pel.numerator, pel.denominator)
        c_rashunal_pointers[i] = ctypes.cast(r, ctypes.POINTER(RASHUNAL))
    c_rmatrix = _rmatrix_lib.new_RMatrix(height, width, c_rashunal_pointers)
    for i in range(element_count):
        cel = c_rashunal_pointers[i]
        _std_lib.free(cel)
    # c_rashunal_pointers itself is a ctypes array owned by Python, so it must
    # not be passed to free(); the garbage collector releases it
    return c_rmatrix

def allocate_python_rmatrix(m):
    height = _rmatrix_lib.RMatrix_height(m)
    width = _rmatrix_lib.RMatrix_width(m)
    p_rashunals = []
    for i in range(1, height + 1):
        for j in range(1, width + 1):
            c_rashunal = _rmatrix_lib.RMatrix_get(m, i, j)
            p_rashunals.append(RMatrix.PRashunal((c_rashunal.contents.numerator, c_rashunal.contents.denominator)))
            _std_lib.free(ctypes.cast(c_rashunal, ctypes.c_void_p))
    return RMatrix.PRMatrix(height, width, p_rashunals)

def factor(m):
    crm = allocate_c_rmatrix(m)
    gf = _rmatrix_lib.RMatrix_gelim(crm)

    p_inverse = allocate_python_rmatrix(gf.contents.P_INVERSE)
    lower = allocate_python_rmatrix(gf.contents.LOWER)
    diagonal = allocate_python_rmatrix(gf.contents.DIAGONAL)
    upper = allocate_python_rmatrix(gf.contents.UPPER)

    _rmatrix_lib.free_RMatrix(gf.contents.P_INVERSE)
    _rmatrix_lib.free_RMatrix(gf.contents.LOWER)
    _rmatrix_lib.free_RMatrix(gf.contents.DIAGONAL)
    _rmatrix_lib.free_RMatrix(gf.contents.UPPER)
    _std_lib.free(ctypes.cast(gf, ctypes.c_void_p))

    return RMatrix.PGaussFactorization(p_inverse, lower, diagonal, upper)

ctypes and objects obtained from it have some utility methods that come in handy. Arrays are declared by multiplying the pointer type by the array length and calling the resulting type as a constructor: c_rashunal_pointers = (ctypes.POINTER(RASHUNAL) * element_count)(). Pointers can be cast (c_rashunal_pointers[i] = ctypes.cast(r, ctypes.POINTER(RASHUNAL))) and dereferenced (upper = allocate_python_rmatrix(gf.contents.UPPER)).

As in other languages, the structs allocated by the native library and returned to the caller have to be disposed of properly to prevent memory leaks.

So that seems pretty straightforward. I've written this as if it were to be run on a Linux machine. Trying to move to other platforms introduced the complexity.

Making it cross-platform

I started in my Ubuntu WSL shell this time, so note the names of the files in the ctypes.CDLL calls. Very Linux specific. The first task was to make that cross-platform.


def load_library(lib_name):
    if sys.platform.startswith("win"):
        filename = f"{lib_name}.dll"
    elif sys.platform.startswith("darwin"):
        filename = f"lib{lib_name}.dylib"
    else:
        filename = f"lib{lib_name}.so"

    try:
        return ctypes.CDLL(filename)
    except OSError as e:
        raise OSError(f"Could not load library '{filename}'") from e

_rashunal_lib = load_library('rashunal')
_rashunal_lib.n_Rashunal.argtypes = (ctypes.c_int, ctypes.c_int)
_rashunal_lib.n_Rashunal.restype = ctypes.POINTER(RASHUNAL)

Also, I needed a different approach to load the standard libraries. As discussed in the C# post, the standard libraries have different names on the three operating systems, so a simple root name based approach wouldn't work:


def load_standard_library():
    if sys.platform.startswith("win"):
        return ctypes.CDLL('ucrtbase.dll')
    elif sys.platform.startswith("darwin"):
        return ctypes.CDLL('libSystem.dylib')
    else:
        return ctypes.CDLL('libc.so.6')

_std_lib = load_standard_library()
_std_lib.free.argtypes = (ctypes.c_void_p,)
_std_lib.malloc.argtypes = (ctypes.c_size_t,)

Not too bad. That worked fine on Linux, and also on Mac OS if I put /usr/local/lib in DYLD_LIBRARY_PATH. However, Windows was the standout this time. Turns out since Windows 10 "ctypes.CDLL and the system loader sometimes ignore PATH for dependent DLLs due to 'SafeDllSearchMode' and other loader rules." Thanks, Microsoft.

You can add to the Python interpreter's search path by making os.add_dll_directory calls. To keep the code flexible, I went back to the environment variable trick to add the required locations.


def get_dll_dirs_from_env(env_var="RMATRIX_LIB_DIRS"):
    val = os.environ.get(env_var, "")
    if not val:
        return []
    return val.split(os.pathsep)

def load_library(lib_name):
    if sys.platform.startswith("win"):
        dll_dirs = get_dll_dirs_from_env()
        for d in dll_dirs:
            if not os.path.isdir(d):
                continue
            os.add_dll_directory(d)
        filename = f"{lib_name}.dll"
    elif sys.platform.startswith("darwin"):
        filename = f"lib{lib_name}.dylib"
    else:
        filename = f"lib{lib_name}.so"

    try:
        return ctypes.CDLL(filename)
    except OSError as e:
        raise OSError(f"Could not load library '{filename}'") from e

> $env:RMATRIX_LIB_DIRS="C:\Users\john.todd\local\rashunal\lib;C:\Users\john.todd\local\rmatrix\lib"
> python main.py /Users/john.todd/source/repos/rmatrix/driver/example.txt
using data from file /Users/john.todd/source/repos/rmatrix/driver/example.txt
Input matrix:
[ {-2,1} {1,3} {-3,4} ]
[ {6,1} {-1,1} {8,1} ]
[ {8,1} {3,2} {-7,1} ]

PInverse:
[ {1,1} {0,1} {0,1} ]
[ {0,1} {0,1} {1,1} ]
[ {0,1} {1,1} {0,1} ]

Lower:
[ {1,1} {0,1} {0,1} ]
[ {-3,1} {1,1} {0,1} ]
[ {-4,1} {0,1} {1,1} ]

Diagonal:
[ {-2,1} {0,1} {0,1} ]
[ {0,1} {17,6} {0,1} ]
[ {0,1} {0,1} {23,4} ]

Upper:
[ {1,1} {-1,6} {3,8} ]
[ {0,1} {1,1} {-60,17} ]
[ {0,1} {0,1} {1,1} ]

And voila, works on all three platforms.

Calling the native libraries via Cython

Cython is a weird dialect? sublanguage? independent language? It looks most like Python, but includes some elements of C. Hence the name, a combination of C and Python. The Cython documentation and examples in the tutorials mainly discussed wrapping C standard library functions or an implementation of queues. I couldn't find a good example of wrapping a custom library, or two custom libraries with dependencies on each other like my model libraries. So once again, ChatGPT and I plunged in.

For this experiment I worked in my Ubuntu WSL shell. I wound up with two Python modules that can be separately compiled and packaged and installed via pip.

The easier one: packaging Rashunal

Cython requires a declarations file (pxd) and an implementation file (pyx). The convention seems to be to name the declarations file as the name of the library with a 'c' prepended. The pyx file can be named just the name of the library.


# crashunal.pxd
cdef extern from "rashunal.h":
    ctypedef struct Rashunal:
        int numerator
        int denominator
    
    Rashunal *n_Rashunal(int numerator, int denominator)

# rashunal.pyx
from libc.stdlib cimport free
cimport crashunal

cdef class Rashunal:
    cdef crashunal.Rashunal *_c_rashunal

    def __cinit__(self, numerator, denominator):
        self._c_rashunal = crashunal.n_Rashunal(numerator, denominator)
        if self._c_rashunal is NULL:
            raise MemoryError()
    
    def __dealloc__(self):
        if self._c_rashunal is not NULL:
            free(self._c_rashunal)
            self._c_rashunal = NULL
    
    def __str__(self):
        return f"{{{self._c_rashunal.numerator},{self._c_rashunal.denominator}}}"
    
    @property
    def numerator(self):
        return self._c_rashunal.numerator
    
    @property
    def denominator(self):
        return self._c_rashunal.denominator

In Cython, things that begin with a "c" are related to the native library and the C code. So "cimport" means "import something from the C library", "cdef" means "declare this as something that will be used by the C code", and "ctypedef" means "this is a type that will be coming from C". Things without the "c" prefix are meant to be used by the Python code. (There is also a "cp" prefix, meaning something can be used by both the C and Python code. I'm not sure how that would be useful.)
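
For completeness, here is a minimal illustration of the "cp" prefix that is not from my project: a cpdef function gets a fast C entry point that other Cython code can call directly, plus a Python wrapper so it remains importable from Python.

# illustrative only, not part of the Rashunal wrapper
cpdef int c_gcd(int a, int b):
    while b != 0:
        a, b = b, a % b
    return a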

crashunal.pxd declares the Rashunal struct and the n_Rashunal method. It says their definitions can be obtained from the rashunal.h header file, wherever that may be. (I'll come back to that later.)

rashunal.pyx declares an ordinary Python class, Rashunal that wraps a crashunal.Rashunal struct and holds a reference to it. Rashunal's constructor accepts a numerator and a denominator, passing them to the native n_Rashunal method, and holding on to the struct that is returned. It also declares a __dealloc__ method that frees the struct when the object goes out of scope, and a couple of convenience properties for easy access to the fields of the struct.

Cython modules are built using a setup.py file:


import os
from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    Extension(
        "rashunal._rashunal",
        ["rashunal/rashunal.pyx"],
        libraries=["rashunal"],
        include_dirs=[os.environ.get("RASHUNAL_INCLUDE", "/usr/local/include")],
        library_dirs=[os.environ.get("RASHUNAL_LIB", "/usr/local/lib")]
    )
]

setup(
    name="rashunal",
    version="0.1.0",
    packages=["rashunal"],
    ext_modules=cythonize(
        extensions,
        language_level="3",
        include_path=["rashunal"]
    ),
)

extensions is the list of all extensions to be built. More than one can be built by a single setup.py file, and I did that for a while with Rashunal and RMatrix, but backed off to one at a time in order to make the process and packages more granular. The extension is named rashunal._rashunal to reflect the package layout and parallel the directory structure. The underscore is to hide the C library and prevent import confusion when bringing it into a client. Most of the flags here are related to finding the C libraries: libraries is the list of libraries to link to, include_dirs is where to find their header files (if they're not part of the project), and library_dirs is where to find their compiled binaries. If you're building at the command line these can be supplemented by flags, but for reasons I'll discuss later I had to complete them with environment variables and default values.

The setup method describes how to actually build the extensions. It needs the name(s) of the package(s) to build and the list of extensions to include. The include_path here is where to find the pxd and pyx files.


# __init__.py
from ._rashunal import Rashunal

__init__.py is required, but can be empty. I added this import to both obscure the C library and simplify the import. If __init__.py were empty the build would work and the code could be imported, but it would look pretty ugly: from rashunal._rashunal import Rashunal, or something like that.

Here's the directory setup:


$ tree .
.
├── rashunal
│   ├── __init__.py
│   ├── crashunal.pxd
│   └── rashunal.pyx
└── setup.py

1 directory, 4 files

Cython and its related tools are not part of the Python standard library, so they have to be installed.


$ pip install Cython setuptools
$ python setup.py build_ext -i

This works, and the output can be imported into client code and be used. I wanted to take the further step and make this into a pip package, however. That required a couple more files.


# pyproject.toml
[build-system]
requires = ["setuptools>=61.0", "wheel", "Cython"]
build-backend = "setuptools.build_meta"

[project]
name = "rashunal"
version = "0.1.0"
description = "Python bindings for the Rashunal C library"
authors = [{ name = "John Todd" }]
readme = "README.md"
requires-python = ">=3.8"

# MANIFEST.in
include rashunal/*.pxd
include rashunal/*.pyx
$ tree .
.
├── MANIFEST.in
├── README.md
├── pyproject.toml
├── rashunal
│   ├── __init__.py
│   ├── crashunal.pxd
│   └── rashunal.pyx
└── setup.py

1 directory, 7 files

pyproject.toml gives instructions on how the wheel file is to be built and a description of the project, including any dependencies or runtime requirements. MANIFEST.in says that the pxd and pyx file should be included in the wheel. The build tool will need those in order to compile the Cython code later on.

Now the package can be built at the command line, but include_dirs and library_dirs cannot be added at this point. This is why I had to include environment variables in setup.py to find the C header and library files. I also didn't want this experimental project permanently installed in my Python environment, so I created a virtual environment to test them.

The build tool also has to be installed before it can be used.


$ python3 -m pip install build
$ python3 -m build
$ python3 -m venv venv-test
$ source venv-test/bin/activate
(venv-test) $ pip install --upgrade pip wheel
(venv-test) $ pip install dist/rashunal-0.1.0-cp310-cp310-linux_x86_64.whl
(venv-test) $ cd ~
(venv-test) $ python
>>> from rashunal import Rashunal
>>> r = Rashunal(1, 2)
>>> print(r)
{1,2}

Note that when starting the Python REPL and importing the code I had to be in a different directory than the project directory so the interpreter didn't confuse the installed pip wheel with the source code.

The harder one: packaging RMatrix

Things got really hairy when I tried to package RMatrix because of its dependency on Rashunal. I imagined that Rashunal and RMatrix would be packaged separately, since a library of rational numbers could theoretically be used for other purposes than matrices and linear algebra.

The __init__.py, pxd and pyx files were fairly straightforward and comparable to Rashunal's:


# __init__.py
from ._rmatrix import RMatrix

# crmatrix.pxd
cimport crashunal

cdef extern from "rmatrix.h":
    ctypedef struct RMatrix:
        pass

    # declared before the functions that refer to it
    ctypedef struct Gauss_Factorization:
        const RMatrix *pi
        const RMatrix *l
        const RMatrix *d
        const RMatrix *u

    RMatrix *new_RMatrix(size_t height, size_t width, crashunal.Rashunal **data)
    void free_RMatrix(RMatrix *m)
    size_t RMatrix_height(const RMatrix *m)
    size_t RMatrix_width(const RMatrix *m)
    Gauss_Factorization *RMatrix_gelim(const RMatrix *m)
    crashunal.Rashunal *RMatrix_get(const RMatrix *m, size_t row, size_t col)

# rmatrix.pyx
from libc.stdlib cimport malloc, free
cimport crashunal
cimport crmatrix

cdef class RMatrix:
    cdef crmatrix.RMatrix *_c_rmatrix

    def __cinit__(self, data):
        cdef Py_ssize_t height = len(data)
        cdef Py_ssize_t width = len(data[0])
        cdef Py_ssize_t el_count = height * width
        cdef crashunal.Rashunal **arr = <crashunal.Rashunal **>malloc(el_count * sizeof(crashunal.Rashunal*))
        if arr is NULL:
            raise MemoryError()
        # zero the array so the cleanup loop below can tell which
        # elements were actually allocated
        for i in range(el_count):
            arr[i] = NULL

        try:
            for i in range(el_count):
                el = data[i // width][i % width]
                num = el[0]
                den = el[1] if len(el) == 2 else 1
                arr[i] = crashunal.n_Rashunal(num, den)
                if arr[i] is NULL:
                    raise MemoryError()
            self._c_rmatrix = crmatrix.new_RMatrix(height, width, arr)
            if self._c_rmatrix is NULL:
                raise MemoryError()
        finally:
            # new_RMatrix copies the elements it needs, so the
            # temporary array is always freed here
            for i in range(el_count):
                if arr[i] is not NULL:
                    crashunal.free(arr[i])
            crashunal.free(arr)

    def __dealloc__(self):
        if self._c_rmatrix is not NULL:
            crmatrix.free_RMatrix(self._c_rmatrix)
            self._c_rmatrix = NULL

    @property
    def height(self):
        return crmatrix.RMatrix_height(self._c_rmatrix)

    @property
    def width(self):
        return crmatrix.RMatrix_width(self._c_rmatrix)

    def factor(self):
        cdef crmatrix.Gauss_Factorization *f
        f = crmatrix.RMatrix_gelim(self._c_rmatrix)
        try:
            result = (
                _crmatrix_to_2d_array(f.pi),
                _crmatrix_to_2d_array(f.l),
                _crmatrix_to_2d_array(f.d),
                _crmatrix_to_2d_array(f.u)
            )
        finally:
            # the casts drop the const qualifier to avoid C compiler warnings
            if f.pi != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.pi)
            if f.l  != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.l)
            if f.d  != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.d)
            if f.u  != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.u)
            crashunal.free(f)
        return result

cdef _crmatrix_to_2d_array(const crmatrix.RMatrix *crm):
    cdef size_t height = crmatrix.RMatrix_height(crm)
    cdef size_t width = crmatrix.RMatrix_width(crm)
    cdef const crashunal.Rashunal *el
    result = []
    for i in range(height):
        row = []
        for j in range(width):
            el = crmatrix.RMatrix_get(crm, i + 1, j + 1)
            row.append((el.numerator, el.denominator))
            crashunal.free(<void *>el)
        result.append(row)
    return result

The type definitions mirror what is in the native libraries. For the implementation I backed off to passing the RMatrix constructor a 3D list of integers rather than a custom object, for maximum flexibility when packaged for pip. By now the allocation and deallocation code should be understandable, even if the syntax varies from implementation to implementation. The pointer casts when deallocating memory are necessary to avoid C compiler warnings about discarding const qualifiers.


# setup.py
import os
import sys
from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    Extension(
        "rmatrix._rmatrix",
        ["rmatrix/rmatrix.pyx"],
        include_dirs=[os.environ.get("RMATRIX_INCLUDE", "/usr/local/include")],
        libraries=["rmatrix"],
        library_dirs=[os.environ.get("RMATRIX_LIB", "/usr/local/lib")]
    )
]

setup(
    name="rmatrix",
    version="0.1.0",
    packages=["rmatrix"],
    install_requires=["rashunal>=0.1.0"],
    ext_modules=cythonize(
        extensions,
        language_level="3",
        include_path=["rmatrix"]
    )
)

setup.py is very similar to Rashunal's. Notice the install_requires value passed to setup. Depending on Rashunal would ordinarily require an include_path reference to Rashunal's pxd and pyx files, but if these were separate projects neither I nor ChatGPT could come up with a way to include them here. Fortunately, we did discover a way to make it work in virtual environments.

pyproject.toml and MANIFEST.in were pretty much identical to Rashunal's. The toml file did include a field saying it depends on Rashunal.


# pyproject.toml
[build-system]
requires = ["setuptools>=61.0", "wheel", "Cython"]
build-backend = "setuptools.build_meta"

[project]
name = "rmatrix"
version = "0.1.0"
description = "Python bindings for the RMatrix C library"
authors = [{ name = "John Todd" }]
readme = "README.md"
requires-python = ">=3.8"
dependencies = ["rashunal>=0.1.0"]

# MANIFEST.in
include rmatrix/*.pxd
include rmatrix/*.pyx

Much thrashing ensued as I tried to get RMatrix to compile, mainly around locating the Rashunal library. As I outlined above, the compiler needed to find Rashunal's pxd and pyx files. Assuming these would be packaged separately, I didn't want to refer to the source code, even though it was right next to the rmatrix code in my project directory. Instead, I eventually noticed that the wheel file contained them and that they were extracted when it was installed in my virtual environment. The build process normally works in its own fresh, isolated virtual environment, and there was no way to install Rashunal into that environment before building RMatrix. I could, however, reuse the test virtual environment that already had Rashunal installed, by turning build isolation off.


$ cd rmatrix
$ source ~/workspace/venv-test/bin/activate
(venv-test) $ python -m build --no-isolation
(venv-test) $ pip install ~/workspace/GoingNative/cython_rmatrix/rmatrix/dist/rmatrix-0.1.0-cp310-cp310-linux_x86_64.whl
(venv-test) $ cd ~
(venv-test) $ python
>>> from rmatrix import RMatrix
>>> crm = RMatrix([[[1], [2], [3,2]], [[4,3], [5], [6]]])
>>> (p_inverse, lower, diagonal, upper) = crm.factor()
>>> print(lower)

Not sure if that's an acceptable way to do it, but at this point I was just happy it worked. Once again, I needed to change to a different directory when starting the REPL to avoid confusing the installed wheel with the source code.

So there it is, in Linux at least. Some other possibilities ChatGPT mentioned that I didn't look into are:

  • Better packaging of the native libraries using pkg-config. This could probably be done in the CMake code.
  • Packaging the generated C file along with or instead of the pxd and pyx files for downstream compiling.
  • Packaging the binaries themselves within the wheel so they just work.

I briefly looked into doing this on Windows and MacOS, but ran into insurmountable difficulties. I won't go into the details, but the gist is that virtual environments on Windows and MacOS don't inherit the relevant settings from the shell they are invoked from, so there is no way to point at the native headers or binaries to get everything to compile. Both platforms require modifying setup.py or pyproject.toml to hard-code the paths. So if you're trying to write a cross-platform Python library that relies on native libraries, good luck. I can't help you.

Reflection

Wow, that was a journey.

The ctypes approach was definitely simpler, and I got it to work on all three platforms. The Cython approach was much more complicated. I'm not sure how to measure or assess the claims that it is more performant than the ctypes approach. It seems to be better for packaging up the C libraries in a format suitable to Python. Once the pip packages are available clients can use them in a way that Python developers are intimately familiar with. But boy, was it a bear to get working. Still, I feel a sense of accomplishment getting it done, and I do think I learned more about compiling and linking tools, even if I don't fully understand all the syntax and tools.

Code repositories

https://github.com/proftodd/GoingNative/tree/main/python_rmatrix
https://github.com/proftodd/GoingNative/tree/main/cython_rmatrix

Monday, September 29, 2025

Going Native - C#

"I belong to the warrior in whom the old ways have joined the new."

Inscription on the sword wielded by Captain Nathan Algren, The Last Samurai

From the JVM to the CLR

This is the third part in a series on calling native code from high-level languages. I've been interested in making useful code locked away in native libraries more widely available, and took this opportunity to finally look into how it's done.

Here is a description of the native library I'm calling in this series.

After struggling through getting the FFM to work, I wasn't sure what to expect from .NET. Nevertheless, C# is the next language I'm most familiar with, so I went ahead and plunged in.


The approach I followed is Explicit PInvoke, outlined on the Microsoft Learn website. That provides a good background and an outline of the process and its alternatives. In reality it was so easy that I got by just with conversations with ChatGPT.

The Basics

I started by declaring structs that mirrored the (public) structs in the native libraries:


[StructLayout(LayoutKind.Sequential)]
private struct Rashunal
{
    public int numerator;
    public int denominator;
}

[StructLayout(LayoutKind.Sequential)]
private struct GaussFactorization
{
    public IntPtr PInverse;
    public IntPtr Lower;
    public IntPtr Diagonal;
    public IntPtr Upper;
}

The attributes indicate that the structs are laid out in memory with each field directly following the previous one. IntPtr is .NET's general-purpose type for a pointer to some memory location. You'll see it again!

Then the native functions are declared in a simple fashion that matches C#'s variable types, with attributes that declare which library to find them in and what the native method is named. The methods (and the class) are declared partial because their implementations are supplied by generated marshaling code rather than written by hand. By convention the C# function and the native function have the same name, but that's not required.


[LibraryImport("rashunal", EntryPoint = "n_Rashunal")]
private static partial IntPtr n_Rashunal(int numerator, int denominator);

[LibraryImport("rmatrix", EntryPoint = "new_RMatrix")]
private static partial IntPtr new_RMatrix(int height, int width, IntPtr data);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_height")]
private static partial int RMatrix_height(IntPtr m);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_width")]
private static partial int RMatrix_width(IntPtr m);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_get")]
private static partial IntPtr RMatrix_get(IntPtr m, int row, int col);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_gelim")]
private static partial IntPtr RMatrix_gelim(IntPtr m);

Then the native methods can be called alongside normal C# code. I'll go in reverse of the actual process of factoring a matrix using the native code.


public static CsGaussFactorization Factor(Model.CsRMatrix m)
{
    var nativeMPtr = AllocateNativeRMatrix(m);
    var fPtr = RMatrix_gelim(nativeMPtr);
    var f = Marshal.PtrToStructure<GaussFactorization>(fPtr);
    var csF = new CsGaussFactorization
    {
        PInverse = AllocateManagedRMatrix(f.PInverse),
        Lower = AllocateManagedRMatrix(f.Lower),
        Diagonal = AllocateManagedRMatrix(f.Diagonal),
        Upper = AllocateManagedRMatrix(f.Upper),
    };
    NativeStdLib.Free(nativeMPtr);
    NativeStdLib.Free(fPtr);
    return csF;
}

First I call a method to allocate a native matrix (below), and then I call RMatrix_gelim on it, which returns a pointer to a native struct. Since the struct is part of the public native interface it can be unmarshaled into a C# object with the Marshal.PtrToStructure call. Then the native matrix pointers are used to construct managed matrices through the AllocateManagedRMatrix calls (also below). Finally, since the native matrix pointer and the factorization pointer are allocated by the native code, they have to be freed by a call to the native free method. Also see below.


private static IntPtr AllocRashunal(int num, int den)
{
    IntPtr ptr = NativeStdLib.Malloc((UIntPtr)Marshal.SizeOf<Rashunal>());
    var value = new Rashunal { numerator = num, denominator = den };
    Marshal.StructureToPtr(value, ptr, false);
    return ptr;
}

private static IntPtr AllocateNativeRMatrix(Model.CsRMatrix m)
{
    int elementCount = m.Height * m.Width;
    IntPtr elementArray = NativeStdLib.Malloc((UIntPtr)(IntPtr.Size * elementCount));
    unsafe
    {
        var pArray = (IntPtr*)elementArray;
        for (int i = 0; i < elementCount; ++i)
        {
            var element = m.Data[i];
            var elementPtr = AllocRashunal(element.Numerator, element.Denominator);
            pArray[i] = elementPtr;
        }
        var rMatrixPtr = new_RMatrix(m.Height, m.Width, elementArray);
        for (int i = 0; i < elementCount; ++i)
        {
            NativeStdLib.Free(pArray[i]);
        }
        NativeStdLib.Free(elementArray);
        return rMatrixPtr;
    }
}

Allocating a native RMatrix required native memory allocations, both for the individual Rashunals and for the array of Rashunal pointers. In a pattern that seems familiar now, I wrapped those calls in a NativeStdLib class that I promise to get to very soon. Allocating a Rashunal involves declaring a managed Rashunal struct, allocating a native block for it, and marshaling the struct into that block. The unsafe block is needed to treat the memory allocated for the pointer array as an actual array, instead of a block of unstructured memory. To get this to compile I had to add <AllowUnsafeBlocks>true</AllowUnsafeBlocks> to a PropertyGroup in the project file, as shown below. Finally, I have to free both the individual allocated native Rashunals and the array of pointers to them, since new_RMatrix makes copies of them all.
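
For reference, the project-file change is just this, a minimal sketch with everything unrelated omitted:


<PropertyGroup>
  <!-- required for the unsafe pointer-array code above -->
  <AllowUnsafeBlocks>true</AllowUnsafeBlocks>
</PropertyGroup>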


private static Model.CsRMatrix AllocateManagedRMatrix(IntPtr m)
{
    int height = RMatrix_height(m);
    int width = RMatrix_width(m);
    var data = new CsRashunal[height * width];
    for (int i = 1; i <= height; ++i)
    {
        for (int j = 1; j <= width; ++j)
        {
            var rPtr = RMatrix_get(m, i, j);
            var r = Marshal.PtrToStructure<Rashunal>(rPtr);
            data[(i - 1) * width + (j - 1)] = new CsRashunal { Numerator = r.numerator, Denominator = r.denominator };
            NativeStdLib.Free(rPtr);
        }
    }
    return new Model.CsRMatrix { Height = height, Width = width, Data = data, };
}

After all that, allocating a managed RMatrix is not very interesting. The native RMatrix_get method returns a newly-allocated copy of the Rashunal at a position in the RMatrix, so it has to be freed the same way as before.

Ok, finally, as promised, here is the interface to loading the native standard library methods:


using System.Reflection;
using System.Runtime.InteropServices;

namespace CsRMatrix.Engine;

public static partial class NativeStdLib
{
    static NativeStdLib()
    {
        NativeLibrary.SetDllImportResolver(typeof(NativeStdLib).Assembly, ResolveLib);
    }

    private static IntPtr ResolveLib(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (libraryName == "c")
        {
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
                return NativeLibrary.Load("ucrtbase.dll", assembly, searchPath);
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
                return NativeLibrary.Load("libc.so.6", assembly, searchPath);
            if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
                return NativeLibrary.Load("libSystem.dylib", assembly, searchPath);
        }
        return IntPtr.Zero;
    }

    [LibraryImport("c", EntryPoint = "free")]
    internal static partial void Free(IntPtr ptr);

    [LibraryImport("c", EntryPoint = "malloc")]
    internal static partial IntPtr Malloc(UIntPtr size);
}

The platform-specific switching and filenames are pretty ugly, but neither ChatGPT nor I could find a way around it. At least it's confined to a single method in a single class in the project.

ChatGPT really wanted there to be library-specific ways to free Rashunals and factorizations. Then those methods could be declared and called the same way as the new_* methods. But I remained stubborn and said I didn't want to change the source code of the libraries. I was willing to recompile them as needed, but not to change the source code or the CMake files. Eventually, we found this way of handling the standard native library calls.
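
Had I relented, the cleanup functions would have been declared and called like any other native method. A hypothetical sketch (neither entry point exists in the actual libraries):


// hypothetical library-specific cleanup functions (not in the real libraries)
[LibraryImport("rashunal", EntryPoint = "free_Rashunal")]
private static partial void free_Rashunal(IntPtr r);

[LibraryImport("rmatrix", EntryPoint = "free_Gauss_Factorization")]
private static partial void free_Gauss_Factorization(IntPtr f);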

Getting the name of the file on Windows and getting this to compile and work was a little challenging. The C# code and the native code need to match exactly in the operating system (obviously), architecture (64-bit vs. 32-bit), and configuration (Debug vs. Release). It took a few more details than what I went through when compiling the JNI code.

Compiling on Windows

Windows is very careful about freeing memory: memory can only be freed by the same C runtime that allocated it. Practically, that meant I needed to make sure I was allocating and freeing memory from the same runtime with the same C runtime model. That meant compiling with the multi-threaded DLL (/MD) compiler flag instead of the default multi-threaded (/MT) one. I also needed to use the right filename to link the libraries to; ChatGPT and I thought it was msvcrt initially. So I modified the steps to compile the library and checked its headers, imports, and dependencies. This again is in an x64 Native Tools Command Prompt for VS 2022.


>cmake .. -G "NMake Makefiles" ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DCMAKE_INSTALL_PREFIX=C:/Users/john.todd/local/rashunal ^
  -DCMAKE_C_FLAGS_RELEASE="/MD /O2 /DNDEBUG"
>nmake
>nmake install
>cd /Users/john.todd/local/rashunal/bin
>dumpbin /headers rashunal.dll | findstr machine
            8664 machine (x64)

>dumpbin /imports rashunal.dll | findstr free
                          18 free

>dumpbin /dependents rashunal.dll

I didn't see msvcrt.dll, but did see VCRUNTIME140.DLL instead. ChatGPT said, "Ah, that's okay, that's actually better. msvcrt is the old way, ucrt (Universal CRT) is the new way." Then linking to "ucrtbase" in the NativeStdLib utility class (as shown above) worked.

Like with JNI, I had to add the Rashunal and RMatrix library directories to the PATH, and then it worked!


> $env:PATH += ";C:\Users\john.todd\local\rashunal\bin;C:\Users\john.todd\local\rmatrix\bin"
> dotnet run C:\Users\john.todd\source\repos\rmatrix\driver\example.txt
Using launch settings from C:\Users\john.todd\source\repos\GoingNative\CsRMatrix\CsRMatrix\Properties\launchSettings.json...
Reading matrix from C:/Users/john.todd/source/repos/rmatrix/driver/example.txt
Starting Matrix:
[ {-2/1} {1/3} {-3/4} ]
[ {6/1} {-1/1} {8/1} ]
[ {8/1} {3/2} {-7/1} ]


PInverse:
[ {1/1} {0/1} {0/1} ]
[ {0/1} {0/1} {1/1} ]
[ {0/1} {1/1} {0/1} ]


Lower:
[ {1/1} {0/1} {0/1} ]
[ {-3/1} {1/1} {0/1} ]
[ {-4/1} {0/1} {1/1} ]


Diagonal:
[ {-2/1} {0/1} {0/1} ]
[ {0/1} {17/6} {0/1} ]
[ {0/1} {0/1} {23/4} ]


Upper:
[ {1/1} {-1/6} {3/8} ]
[ {0/1} {1/1} {-60/17} ]
[ {0/1} {0/1} {1/1} ]

What's even more exciting is that when I committed this to Github and pulled it down on Linux and MacOS, it also just worked (for MacOS, after adding the install directories to DYLD_LIBRARY_PATH, similarly to what I had to do with JNI).
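
On MacOS that meant something along these lines; the install prefixes are placeholders for wherever the libraries actually live:


$ export DYLD_LIBRARY_PATH="$HOME/local/rashunal/lib:$HOME/local/rmatrix/lib:$DYLD_LIBRARY_PATH"
$ dotnet run ~/workspace/rmatrix/driver/example.txt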

Optimization

Remembering to free pointers allocated by native code isn't so bad. I had to do it in Java with the FFM and when writing the libraries in the first place. But ChatGPT suggested an optimization to have the CLR do it automatically. After reassuring it many times that the new_*, RMatrix_get, and RMatrix_gelim native methods return pointers to newly-allocated copies of the relevant entities and not pointers to the entities themselves, it said this was the perfect application of the SafeHandle pattern. Who can pass that up?

First I wrote some wrapper classes for the pointers returned from the native code:


internal abstract class NativeHandle : SafeHandle
{
    protected NativeHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    protected NativeHandle(IntPtr existing, bool ownsHandle)
        : base(IntPtr.Zero, ownsHandle)
        => SetHandle(existing);

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        NativeStdLib.Free(handle);
        return true;
    }
}

internal sealed class RashunalHandle : NativeHandle
{
    internal RashunalHandle() : base() { }

    internal RashunalHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

internal sealed class RMatrixHandle : NativeHandle
{
    internal RMatrixHandle() : base() { }

    internal RMatrixHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

internal sealed class GaussFactorizationHandle : NativeHandle
{
    internal GaussFactorizationHandle() : base() { }

    internal GaussFactorizationHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

Then I had most of the native and managed code use the handles as parameters and return values instead of the pointers returned by the native code:


[DllImport("rashunal", EntryPoint = "n_Rashunal")]
private static extern RashunalHandle n_Rashunal(int numerator, int denominator);

[DllImport("rmatrix", EntryPoint = "new_RMatrix")]
private static extern RMatrixHandle new_RMatrix(int height, int width, IntPtr data);

[DllImport("rmatrix", EntryPoint = "RMatrix_height")]
private static extern int RMatrix_height(RMatrixHandle m);

[DllImport("rmatrix", EntryPoint = "RMatrix_width")]
private static extern int RMatrix_width(RMatrixHandle m);

[DllImport("rmatrix", EntryPoint = "RMatrix_get")]
private static extern RashunalHandle RMatrix_get(RMatrixHandle m, int row, int col);

[DllImport("rmatrix", EntryPoint = "RMatrix_gelim")]
private static extern GaussFactorizationHandle RMatrix_gelim(RMatrixHandle m);

private static Model.CsRMatrix AllocateManagedRMatrix(RMatrixHandle m)
{
    int height = RMatrix_height(m);
    int width = RMatrix_width(m);
    var data = new CsRashunal[height * width];
    for (int i = 1; i <= height; ++i)
    {
        for (int j = 1; j <= width; ++j)
        {
            using var rPtr = RMatrix_get(m, i, j);
            var r = Marshal.PtrToStructure<Rashunal>(rPtr.DangerousGetHandle());
            data[(i - 1) * width + (j - 1)] = new CsRashunal { Numerator = r.numerator, Denominator = r.denominator };
        }
    }
    return new Model.CsRMatrix { Height = height, Width = width, Data = data, };
}

Note the switch from LibraryImport to DllImport on the native method declarations. LibraryImport is newer and preferred, but for some reason it couldn't do the automatic marshaling of pointers into handles here the way DllImport can.

Now there's no need to explicitly free the pointers returned from RMatrix_get, n_Rashunal, new_RMatrix, and RMatrix_gelim. There are still some places where I have to remember to free memory, such as when the array of Rashunal pointers is allocated in AllocateNativeRMatrix. There are also some calls to ptr.DangerousGetHandle() when I need to marshal a pointer into a struct. I tried to get rid of those, but apparently they are unavoidable.

I didn't like the repeated boilerplate code in the concrete subclasses of NativeHandle. I wanted to just use NativeHandle as a generic, i.e. NativeHandle<T>, but that didn't work. ChatGPT said I needed a concrete, non-generic class for the marshaler to construct, and that the structs I declared in the adapter wouldn't do it. That's also why the parameterless constructors are needed, for the marshaling code, even though they don't do anything but defer to the base class. So be it.
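
For the record, here's roughly the shape of the design I wanted. A sketch only: the interop marshaler cannot construct generic SafeHandle subclasses, so this fails as a return type for the imported methods:


// What I wanted: one generic handle instead of three concrete subclasses.
// The runtime marshaler cannot marshal generic types, so this doesn't work
// as a DllImport return type.
internal sealed class NativeHandle<T> : SafeHandle
{
    public NativeHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        NativeStdLib.Free(handle);
        return true;
    }
}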

Reflection

After struggling so much with FFM, I was pleasantly surprised by how easy it was to work with C# and its method of calling native code. Interspersing the native calls with the managed code was pretty fun and easy, especially after refactoring to use handles to automatically dispose of allocated memory. It was a little tricky figuring out when I still had to marshal pointers into structs or vice versa, but the compiler and ChatGPT helped me figure it out pretty quickly.

So far, if given the choice of how to call my native libraries, C# and the CLR is definitely how I would do it.

Code repository

https://github.com/proftodd/GoingNative/tree/main/CsRMatrix

Wednesday, September 17, 2025

Going Native - Foreign Function & Memory API (FFM)

Be not the first by whom the new are tried, nor yet the last to lay the old aside.

Alexander Pope

When I started doing research for my post on JNI, I heard about some newfangled thing called the Foreign Function and Memory API (FFM). Apparently it does all the same things as JNI, but purely in Java code, so you have all the conveniences of modern Java development without the hassles of compiling and linking two different languages and getting them to play nicely together. So after finishing my experiments with JNI, I was excited to give it a try.

For a refresher on the native matrix library, see the section The native code in the introduction to this series.

The concepts in the FFM have been kicking around for several Java versions, going back at least to Java 17. It's essentially finalized in Java 24, although native-accessing code still gives warnings when run without specific flags (--enable-native-access=ALL-UNNAMED).

There are several blog posts about using FFM, but they all seem to copy the same examples from the official Java website. Thus I was truly on my own this time.

An aside about AI programming aids

Well, not completely on my own. I made extensive use of AI programming aids during this project, particularly a couple of installations of ChatGPT. I have been slow to get on the AI train, and I am still highly skeptical of many of the claims that are made about it. But I freely admit that I could not have completed this project or the JNI project without its help. There is just so much detailed, obscure, and esoteric knowledge about compiling, linking, tool flags, and platform idiosyncrasies that no person can know it all. While my Google searching skills are decent, I don't believe I could have found the answers I needed within the bounds of my patience in order to bring this to a conclusion. While ChatGPT is not perfect (it is limited by published APIs and documentation and can get confused about the requirements of different software versions), it was definitely a big help to me!

The Arena

The basic idea of FFM is that you take over the management of native memory in Java code instead of native code. This starts with an Arena, which can be opened and disposed of like any other resource in a try-with-resources block. Also within the Java code you can lay out the memory of structs you'll be using.

GroupLayout RASHUNAL_LAYOUT = MemoryLayout.structLayout(
    JAVA_INT.withName("numerator"),
    JAVA_INT.withName("denominator")
);

GroupLayout GAUSS_FACTORIZATION_LAYOUT = MemoryLayout.structLayout(
    ADDRESS.withName("PI"),
    ADDRESS.withName("L"),
    ADDRESS.withName("D"),
    ADDRESS.withName("U")
);

try (Arena arena = Arena.ofConfined()) {
...    
}

MemoryLayout is an interface with static methods to lay out primitives, structs, arrays, and other entities. The Arena object is then used to allocate blocks of native memory using a layout as a map.


int[][][] data = ...; // the input matrix, elided
int height = data.length;
int width = data[0].length;
int elementCount = height * width;

long elementSize = RASHUNAL_LAYOUT.byteSize();
long elementAlign = RASHUNAL_LAYOUT.byteAlignment();
long totalBytes = elementSize * (long)elementCount;
MemorySegment elems = arena.allocate(totalBytes, elementAlign);
long numOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("numerator"));
long denOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("denominator"));
for (int i = 0; i < elementCount; ++i) {
    int row = i / width;
    int col = i % width;
    int[] element = data[row][col];
    int numerator = element[0];
    int denominator = element.length == 1 ? 1 : element[1];
    MemorySegment elementSlice = elems.asSlice(i * elementSize, elementSize);
    elementSlice.set(JAVA_INT, numOffset, numerator);
    elementSlice.set(JAVA_INT, denOffset, denominator);
}

Before, with JNI, this was all done in C. Now it's all being done in Java code. It's a lot of steps, and it gets pretty far down into the weeds, but there are advantages to doing it all in Java. Pick your poison.

Native methods are brought into Java code as method handles. These are retrieved by requesting downcall handles (for calls from Java into native methods) from a Linker object. To build the downcall handle you need the full signature of the native method, with the return value first.


Linker linker = Linker.nativeLinker();
SymbolLookup lookup = OpenNativeLib("rmatrix", arena); // I'll come back to this later
MemorySegment newRMatrixLocation = lookup.find("new_RMatrix").orElseThrow();
MethodHandle new_RMatrix_handle = linker.downcallHandle(newRMatrixLocation, FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG, ADDRESS));

After getting a Linker object, the native library needs to be opened and brought into the JVM. OpenNativeLib is a static method I wrote on the utility class this code is coming from, and I'll come back to its details later.

linker.downcallHandle accepts a MemorySegment, a FunctionDescriptor, and a variable-length list of Linker.Options. It returns a MethodHandle that can be used to call into native methods.

The SymbolLookup returned by OpenNativeLib is used to search the native library for methods and constants. It's a simple name lookup, and returns an Optional with whatever it finds.

The FunctionDescriptor is fairly self-explanatory: it's the signature of a native method, expressed with constants from java.lang.foreign.ValueLayout (return value first, followed by the arguments). ADDRESS is a general value for a C pointer. new_RMatrix accepts longs representing the height and width of the matrix to be constructed and a pointer to an array of Rashunals, and returns a pointer to the newly-allocated RMatrix.

Once the handle for new_RMatrix is in hand, it can be called to allocate a new RMatrix:


new_RMatrix_handle.invoke((long) height, (long) width, elems);
// compiles, but blows up when run

Not so fast! elems represents an array of Rashunal structs laid out in sequence in native memory. But what new_RMatrix expects is a pointer to an array of Rashunal pointers, not the array of Rashunals themselves. So that array of pointers also needs to be constructed:


MemorySegment ptrArray = arena.allocate(ADDRESS.byteSize() * elementCount, ADDRESS.byteAlignment());
for (int i = 0; i < elementCount; ++i) {
    MemorySegment elementAddr = elems.asSlice(i * elementSize, elementSize);
    ptrArray.setAtIndex(ADDRESS, i, elementAddr);
}
MemorySegment nativeRMatrix = (MemorySegment) new_RMatrix_handle.invoke((long) height, (long) width, ptrArray);

In a similar way, I got handles to RMatrix_gelim to factor the input matrix and RMatrix_height, RMatrix_width, and RMatrix_get to get information about the four matrices in the factorization. There was one wrinkle when getting information about structs returned by pointer from these methods:


MemorySegment factorZero = (MemorySegment) RMatrix_gelim_handle.invoke(rmatrixPtr);
MemorySegment factor = factorZero.reinterpret(GAUSS_FACTORIZATION_LAYOUT.byteSize(), arena, null);
long piOffset = GAUSS_FACTORIZATION_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("PI"));
...
MemorySegment piPtr = factor.get(ADDRESS, piOffset);
...

When a native method returns a pointer to a struct, the handle returns a zero-length memory segment that has no information about the struct pointed to by that memory. It needs to be reinterpreted as the struct itself using the MemoryLayout that corresponds to the struct. Then the struct can be interpreted using offsets in the reverse of the process used to set data.

Then I worked on the code to translate them back to Java objects:


long numeratorOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("numerator"));
long denominatorOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("denominator"));
long height = (long) RMatrix_height_handle.invoke(mPtr);
long width = (long) RMatrix_width_handle.invoke(mPtr);
JRashunal[] data = new JRashunal[Math.toIntExact(height * width)];
for (long i = 1; i <= height; ++i) {
    for (long j = 1; j <= width; ++j) {
        MemorySegment elementZero = (MemorySegment) RMatrix_get_handle.invoke(mPtr, i, j);
        MemorySegment element = elementZero.reinterpret(RASHUNAL_LAYOUT.byteSize(), arena, null);
        int numerator = element.get(JAVA_INT, numeratorOffset);
        int denominator = element.get(JAVA_INT, denominatorOffset);
        data[Math.toIntExact((i - 1) * width + (j - 1))] = new JRashunal(numerator, denominator);
    }
}
JRashunalMatrix jrm = new JRashunalMatrix(Math.toIntExact(height), Math.toIntExact(width), data);

The offsets are the memory offset within the struct of the field of interest, in this case, the numerator and denominator of the Rashunal struct.

In this way I was able to complete a round trip from Java objects to native code and back.

Missing Link

So how do you load the native code? I thought it would be as simple as the guides say.


var lookup = SymbolLookup.libraryLookup("rmatrix", arena);

Unfortunately, that's not the way it turned out. Many ChatGPT questions and answers followed, but apparently there is a big difference between SymbolLookup.libraryLookup and


System.loadLibrary("jnirmatrix");

which is how I loaded the native library compiled from the JNI header. That used C tools to find rmatrix and rashunal, which are well-understood and have stood the test of time.

According to ChatGPT, System.loadLibrary does a lot of additional work on behalf of the programmer, including formatting library names correctly, looking for code in platform-specific locations, and handling symlinks. FFM deliberately dials back on that, so SymbolLookup.libraryLookup only calls Java code to load libraries. The Javadoc for SymbolLookup.libraryLookup says it defers to dlopen on POSIX systems and LoadLibrary on Windows systems. Those search the path and some environment variables for libraries, but do none of the name enhancements (turning rmatrix into librmatrix.so, librmatrix.dylib, or rmatrix.dll) that System.loadLibrary does. This made a bad first impression, but system-specific code turned out to be the way to do it in .NET too, so it's not too bad. /usr/local/lib is on the search path in Linux, but I installed the libraries in a nonstandard location on Windows, so I had to add those directories to PATH.


String osSpecificLibrary;
String osName = System.getProperty("os.name");
if (osName.contains("Linux")) {
    osSpecificLibrary = "lib" + library + ".so";
} else if (osName.contains("Mac OS")) {
    osSpecificLibrary = "lib" + library + ".dylib";
} else if (osName.contains("Windows")) {
    osSpecificLibrary = library + ".dll";
} else {
    throw new IllegalStateException("Unsupported OS: " + osName);
}
return SymbolLookup.libraryLookup(osSpecificLibrary, arena);

> $env:PATH += ";C:/Users/john.todd/local/rmatrix/bin;C:/Users/john.todd/local/rashunal/bin"
> ./gradlew ...

Trying to get this to work on a Mac was an odyssey on its own. Modern versions of MacOS (since OS X El Capitan) have something called System Integrity Protection (SIP), which the developers in Cupertino have wisely put into place to protect us all from ourselves. The Google AI answer for "what is sip macos" says it "Prevents unauthorized code execution: SIP prevents malicious software from running unauthorized code on your Mac", which I guess includes loading dependent libraries from the JVM.

I could load RMatrix using an absolute path to the dylib, but I couldn't load Rashunal from there because RMatrix uses rpaths (run-time library search paths) to refer to the libraries it depends on. rpaths can be supplied in other situations (like the JNI application) through DYLD_LIBRARY_PATH or DYLD_FALLBACK_LIBRARY_PATH, but SIP prevents that from working in certain contexts, such as the JVM (when invoked in a particular way). After many big detours into rewriting rpaths to loader_paths or absolute paths and granting the JVM entitlements that allow loading paths from DYLD_LIBRARY_PATH, I finally discovered that java and /usr/bin/java on my Mac are not the same as /Library/Java/JavaVirtualMachines/jdk-24.jdk/Contents/Home/bin/java. Specifically, the first two carry the SIP restrictions, but the last one doesn't, and it just works with the osSpecificLibrary defined above. Having already spent a lot of time trying to discover how to bypass SIP, I wasn't going to look any further into how to get the /usr/bin/java shim to work. So the following command worked from the command line on Mac. Gradle could probably be convinced to do it too, but it didn't by default and I wasn't interested in investigating any further.


$ /Library/Java/JavaVirtualMachines/jdk-24.jdk/Contents/Home/bin/java \
  -cp app/build/classes/java/main \
  --enable-native-access=ALL-UNNAMED \
  org.jtodd.ffm.ffmrmatrix.App \
  /Users/john/workspace/rmatrix/driver/example.txt
Input matrix:
[ {-2} {1/3} {-3/4} ]
[ {6} {-1} {8} ]
[ {8} {3/2} {-7} ]


PInverse:
[ {1} {0} {0} ]
[ {0} {0} {1} ]
[ {0} {1} {0} ]


Lower:
[ {1} {0} {0} ]
[ {-3} {1} {0} ]
[ {-4} {0} {1} ]


Diagonal:
[ {-2} {0} {0} ]
[ {0} {17/6} {0} ]
[ {0} {0} {23/4} ]


Upper:
[ {1} {-1/6} {3/8} ]
[ {0} {1} {-60/17} ]
[ {0} {0} {1} ]

Cleaning up

Like Java's good old garbage collector, the Arena will clean up any memory directly allocated in it, like the Rashunal array or the pointer array in the code segments above. But memory allocated by the native code is opaque to the Java code, and will leak if it's not cleaned up. To do that, you need handles to any library-specific cleanup code or to the stdlib free method. FFM has a special Linker method to look up the language's standard libraries, and note the special-purpose FunctionDescriptor.ofVoid method for describing native methods that return void:


MemorySegment freeRMatrixLocation = lookup.find("free_RMatrix").orElseThrow();
MethodHandle freeRMatrixHandle = linker.downcallHandle(freeRMatrixLocation, FunctionDescriptor.ofVoid(ADDRESS));

var clib = linker.defaultLookup();
MemorySegment freeLocation = clib.find("free").orElseThrow();
MethodHandle freeHandle = linker.downcallHandle(freeLocation, FunctionDescriptor.ofVoid(ADDRESS));

freeRMatrixHandle.invoke(rmatrixPtr);
freeHandle.invoke(rashunalElement);

I briefly looked at using Valgrind to verify that I wasn't leaking anything further. Apparently the JVM itself spawns a lot of (presumably false) alarms, so I grepped the output for any mentions of librmatrix or librashunal and didn't find any. Hopefully this approach doesn't leak too badly.
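
Reconstructed from memory, the check looked something like this; the exact flags and paths are approximate:


$ valgrind --leak-check=full \
    java --enable-native-access=ALL-UNNAMED \
    -cp app/build/classes/java/main \
    org.jtodd.ffm.ffmrmatrix.App example.txt 2>&1 \
    | grep -E 'librmatrix|librashunal'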

Reflection

My first impression of FFM was pretty bad. I had to do a lot more investigating and ChatGPT querying to get this to work on all my platforms than I did with JNI. I'm not sure if any further improvements to Java, FFM, or the operating systems will take away some of the pain. Maybe just time, experience, and more bloggers will make this easier for future developers.

It is nice being able to write all your marshaling and unmarshaling code in a single language, rather than having to write both Java and C code to do it. Nevertheless, an FFM developer still needs to keep C concepts in mind, particularly freeing natively-allocated memory and linking to the libraries. But that seems to be the common thread when connecting to native code.

Code repository

https://github.com/proftodd/GoingNative/tree/main/ffm_rmatrix