Tuesday, October 14, 2025

Going Native - Swift

"Your second-hand bookseller is second to none in the worth of the treasures he dispenses."

Leigh Hunt

Coming home to roost

This is the completion of a series on calling native code from high-level languages. Here is a description of the native library I'm calling in this series.

Apple has used several languages for its operating systems and devices, most notably Objective-C and Swift. But I read a few years ago that Swift had found some adoption in data analysis and Big Data applications because of its expressiveness and streaming features. Swift has been released as open source, so there are implementations for Linux and Windows in addition to MacOS. I did an Advent of Code in Swift one year and enjoyed it. To wrap up this project of calling native code from high-level languages, I decided to give Swift a try.

Getting Started

The interface for calling native code from Swift has changed recently. The mechanism is the Swift Package Manager, but the changes mean some older references are out of date. One example that gave me hope, even though it didn't work, was this blog post: Wrapping C Libraries in Swift.

The example that got me going was directly from the Swift Documentation on the Swift Package Manager, particularly using system libraries to call native code.

Since Swift is an Apple-original language, I wasn't sure how well it would translate to Windows. I was fairly confident in its applicability to Linux, though, so that's where I started. That meant writing a command-line application instead of an app: apps are Mac-only.


$ mkdir SwiftRMatrix
$ cd SwiftRMatrix
$ swift package init --type executable
$ tree .
.
├── Package.swift
└── Sources
    └── SwiftRMatrix
        └── SwiftRMatrix.swift

2 directories, 2 files

These commands set up a group of files and directories, the most important of which are Package.swift and Sources/SwiftRMatrix/SwiftRMatrix.swift. The latter is the entry point to the application, and the former holds the directions for how to build the project. This is all that is needed to run "Hello, world!": you can do swift run at this point and see the message printed to the console.

Linking to native code is a matter of writing new modules and setting up dependencies among the modules in the project.


$ mkdir Sources/CRashunal
$ touch Sources/CRashunal/rashunal.h
$ touch Sources/CRashunal/module.modulemap

rashunal.h:


#import <rashunal.h>

module.modulemap:


module CRashunal [system] {
    umbrella header "rashunal.h"
    link "rashunal"
}

rashunal.h, which is distinct from the rashunal.h I wrote for the Rashunal project, is simply a transitive import to the native code, bringing all the declarations in the original rashunal.h into the Swift project. module.modulemap emphasizes this by saying that rashunal.h is an umbrella header, and that the code will link the rashunal library. At this point, CRashunal (the Swift project) can be imported into Swift code and used.

Package.swift:


// swift-tools-version: 6.2
// The swift-tools-version declares the minimum version of Swift required to build this package.

import PackageDescription

let package = Package(
    name: "SwiftRMatrix",
    dependencies: [],
    targets: [
        // Targets are the basic building blocks of a package, defining a module or a test suite.
        // Targets can depend on other targets in this package and products from dependencies.
        .systemLibrary(
            name: "CRashunal"
        ),
        .executableTarget(
            name: "SwiftRMatrix",
            dependencies: ["CRashunal"],
            path: "Sources/SwiftRMatrix"
        ),
    ]
)

SwiftRMatrix.swift:


// The Swift Programming Language
// https://docs.swift.org/swift-book
import CRashunal
import Foundation

@main
struct SwiftRMatrix {
    static func main() throws {
        let r: UnsafeMutablePointer<CRashunal.Rashunal> = n_Rashunal(numericCast(1), numericCast(2))
        print("{\(r.pointee.numerator),\(r.pointee.denominator)}")
    }
}

I like that Swift distinguishes between mutable and immutable pointers (UnsafeMutablePointer and UnsafePointer), and uses generics to indicate what the pointer points to. Swift also has an OpaquePointer for when the fields of a struct are not imported, like an RMatrix. I'll come back to that later. The pointee property for accessing the fields of the struct is an additional bonus.

ChatGPT pointed me to memory safety early on, so I learned quickly how to access the standard library on the different platforms. Swift recognizes C-like compiler directives, so accessing it was a simple matter of importing the right native libraries. For Windows, it's a part of the platform, so no special import is needed.


#if os(Linux)
import Glibc
#elseif os(Windows)
// the C runtime is available without a special import
#elseif os(macOS)
import Darwin
#else
#error("Unsupported platform")
#endif
...
let r: UnsafeMutablePointer<CRashunal.Rashunal> = n_Rashunal(numericCast(1), numericCast(2))
print("{\(r.pointee.numerator),\(r.pointee.denominator)}")
free(r)

And that's it, for code. The devil, of course, is in the compiling and linking.

A chain is only as strong as its weakest link

Swift Package Manager uses several sources to find libraries, but none of them seemed to match my particular use case. The closest fit was pkg-config. The more I read about it, the more it seemed to be an industry standard, and that Rashunal and RMatrix would benefit from taking advantage of it. So I broke the rule I established earlier and decided to enhance the libraries.

Fortunately, it wasn't too painful. Telling Rashunal to write to pkg-config was only a few lines added to rashunal/CMakeLists.txt:


+set(PACKAGE_NAME rashunal)
+set(PACKAGE_VERSION 0.0.1)
+set(PACKAGE_DESC "Rational arithmetic library")
+set(PKGCONFIG_INSTALL_DIR "${CMAKE_INSTALL_LIBDIR}/pkgconfig")
+
+configure_file(
+  ${CMAKE_CURRENT_SOURCE_DIR}/rashunal.pc.in
+  ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}.pc
+  @ONLY
+)
+
 add_library(rashunal SHARED src/rashunal.c src/rashunal_util.c)
...
+install(
+  FILES ${CMAKE_CURRENT_BINARY_DIR}/rashunalConfig.cmake
+  DESTINATION lib/cmake/rashunal
+)
+
+install(
+  FILES ${CMAKE_CURRENT_BINARY_DIR}/${PACKAGE_NAME}.pc
+  DESTINATION ${PKGCONFIG_INSTALL_DIR}
 )

The first block is toward the top of CMakeLists.txt, and the second is toward the bottom.

The configure_file directive needs a template for the pc file that will be written. The template has placeholders set off by '@' that will be filled in during the build process.

rashunal.pc.in:


prefix=@CMAKE_INSTALL_PREFIX@
exec_prefix=${prefix}
libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@

Name: @PACKAGE_NAME@
Description: @PACKAGE_DESC@
Version: @PACKAGE_VERSION@
Libs: -L${libdir} -l@PACKAGE_NAME@
Cflags: -I${includedir}

During installation the newly-generated rashunal.pc file is copied to a platform-standard location on disk.

After making those changes, building, compiling, and installing, pkg-config was able to tell me something about the Rashunal library:


$ rm -rf build
$ mkdir build
$ cd build
$ cmake ..
$ make && sudo cmake --install .
$ ls /usr/local/lib/pkgconfig
rashunal.pc
$ cat /usr/local/lib/pkgconfig/rashunal.pc
prefix=/usr/local
exec_prefix=${prefix}
libdir=${exec_prefix}/lib
includedir=${prefix}/include

Name: rashunal
Description: Rational arithmetic library
Version: 0.0.1
Libs: -L${libdir} -lrashunal
Cflags: -I${includedir}
$ pkg-config --cflags rashunal
-I/usr/local/include
$ pkg-config --libs rashunal
-L/usr/local/lib -lrashunal

Notice the new command to install the project: apparently this is the modern, recommended way to do it. The bash output means that the declarations of the Rashunal library can be found at /usr/local/include and the binaries at /usr/local/lib.

Now the Swift Package Manager can be told just to consult pkg-config for the header and binary location of any system libraries it's attempting to build. It's not necessary, but the examples I saw recommended adding some suggestions for how to install Rashunal if it's not present. I haven't looked into what it takes to package a library for apt or brew, but I'm pretty sure this is how they are consumed:

Package.swift:


.systemLibrary(
    name: "CRashunal",
    pkgConfig: "rashunal",
    providers: [
        .apt(["rashunal"]),
        .brew(["rashunal"]),
    ],
)

Then the Swift project could be built and run:


$ swift build
$ swift run SwiftRMatrix
{1,2}

And rinse and repeat for RMatrix. There is nothing new in building the RMatrix pkg-config files or linking to it from Swift, except for the dependency on Rashunal in the template for RMatrix:

rmatrix.pc.in


prefix=@CMAKE_INSTALL_PREFIX@
exec_prefix=${prefix}
libdir=${exec_prefix}/@CMAKE_INSTALL_LIBDIR@
includedir=${prefix}/@CMAKE_INSTALL_INCLUDEDIR@

Name: @PACKAGE_NAME@
Description: @PACKAGE_DESC@
Version: @PACKAGE_VERSION@
Requires: rashunal
Libs: -L${libdir} -l@PACKAGE_NAME@
Cflags: -I${includedir}

I started to look into removing that hardcoded dependency and getting it from the link libraries in CMakeLists.txt, but that quickly started to grow big and nasty, so I abandoned it. ChatGPT assured me that was common, especially for small projects.

Crossing the operating system ocean

Trying to do this on MacOS, I ran into my old nemesis SIP. Fortunately, the solution here was similar to the solution I followed there. The Swift command at /usr/bin/swift was protected by SIP, but the executable generated by the swift build command wasn't:


% swift build -Xlinker -rpath -Xlinker /usr/local/lib
% .build/debug/SwiftRMatrix
{1,2}

What is astonishing is that, after one more testy exchange with ChatGPT, I also got it to work on Windows. I still don't understand what the difference from Linux and MacOS was, or why this change fixed things on Windows, but I had to make an additional change to Rashunal's CMakeLists.txt and to the cmake command that builds RMatrix:

rashunal/CMakeLists.txt


if (WIN32)
  set_target_properties(rashunal PROPERTIES
    ARCHIVE_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin"
    RUNTIME_OUTPUT_DIRECTORY "${CMAKE_BINARY_DIR}/bin"
  )
endif()

>cmake .. -G "NMake Makefiles" ^
More? -DCMAKE_BUILD_TYPE=Release ^
More? -DCMAKE_INSTALL_PREFIX=C:/Users/john.todd/local/rmatrix ^
More? -DCMAKE_PREFIX_PATH=C:/Users/john.todd/local/rashunal ^
More? -DCMAKE_C_FLAGS_RELEASE="/MD /O2 /DNDEBUG"
>nmake
>nmake install

Then the Swift application could be built and run from the command line, albeit with a few additional linker switches. This also needs to be done from a PowerShell or DOS window with Admin rights because, even though the build only changes the local project directory, it seems to write to a protected directory.


> swift build `
>>   -Xcc -IC:/Users/john.todd/local/rashunal/include `
>>   -Xcc -IC:/Users/john.todd/local/rmatrix/include `
>>   -Xlinker /LIBPATH:C:/Users/john.todd/local/rashunal/lib `
>>   -Xlinker /LIBPATH:C:/Users/john.todd/local/rmatrix/lib `
>>   -Xlinker /DEFAULTLIB:rashunal.lib `
>>   -Xlinker /DEFAULTLIB:rmatrix.lib `
>>   -Xlinker /DEFAULTLIB:ucrt.lib
> ./.build/debug/SwiftRMatrix.exe
{1,2}

Cleaning up the guano

My last task was to abstract the native calls away from the main application. To do this I wrote a Models module that wrapped the native Rashunal, RMatrix, and Gauss Factorization structs.

Sources/Model/Model.swift


public class Rashunal: CustomStringConvertible {
    var _rashunal: UnsafePointer<CRashunal.Rashunal>

    public init(_ numerator: Int, _ denominator: Int = 1) {
        _rashunal = UnsafePointer(n_Rashunal(numericCast(numerator), numericCast(denominator)))
    }

    public init(_ data: [Int]) {
        _rashunal = UnsafePointer(n_Rashunal(numericCast(data[0]), data.count > 1 ? numericCast(data[1]) : 1))
    }

    public var numerator: Int { Int(_rashunal.pointee.numerator) }

    public var denominator: Int { Int(_rashunal.pointee.denominator) }

    public var description: String {
        return "{\(numerator),\(denominator)}"
    }

    deinit {
        free(UnsafeMutablePointer(mutating: _rashunal))
    }
}

What gets returned from the native n_Rashunal call is a Swift UnsafeMutablePointer. I wanted the pointers to be immutable wherever possible, so I cast it to an UnsafePointer in both constructors. Swift makes property definitions and string representations easy and natural. The deinit method calls the native standard library's free method to release the native memory allocated by Rashunal. This makes cleanup and memory hygiene easy.

Sources/Model/Model.swift


public class RMatrix: CustomStringConvertible {
    var _rmatrix: OpaquePointer

    private init(_ rmatrix: OpaquePointer) {
        _rmatrix = rmatrix
    }

    public init(_ data: [[[Int]]]) {
        let height = data.count
        let width = data.first!.count

        let rashunals = data.flatMap {
            row in row.map {
                cell in n_Rashunal(numericCast(cell[0]), cell.count > 1 ? numericCast(cell[1]) : 1)
            }
        }
        let ptrArray = UnsafeMutablePointer<UnsafeMutablePointer<CRashunal.Rashunal>?>.allocate(capacity: rashunals.count)
        defer {
            for i in 0..<rashunals.count {
                free(ptrArray[i])
            }
            ptrArray.deallocate()
        }
        for i in 0..<rashunals.count {
            ptrArray[i] = rashunals[i]
        }
        _rmatrix = new_RMatrix(height, width, ptrArray)
    }

    public var height: Int { Int(RMatrix_height(_rmatrix)) }

    public var width: Int { Int(RMatrix_width(_rmatrix)) }

    public var description: String {
        return (1...height).map { i in
            "[ " + (1...width).map { j in
                let cellPtr: UnsafePointer<CRashunal.Rashunal> = UnsafePointer(RMatrix_get(_rmatrix, i, j))
                let rep = "{\(cellPtr.pointee.numerator),\(cellPtr.pointee.denominator)}"
                free(UnsafeMutablePointer(mutating: cellPtr))
                return rep
            }.joined(separator: " ") + " ]"
        }.joined(separator: "\n")
    }

    deinit {
        free_RMatrix(_rmatrix)
    }
}

Unsurprisingly, RMatrix was the hardest of these to get right. The private constructor is used in the factor method as a convenience to initialize a Swift RMatrix from a native pointer. The other constructor initializes a matrix from the familiar 3D array of Ints. I get the height and width from the first two dimensions of the input array, then use the n_Rashunal method to construct a list of native Rashunal structs as UnsafeMutablePointer<CRashunal.Rashunal>s. As before, new_RMatrix expects an array of pointers to structs, but the rashunals array is in managed memory, not native memory. So I allocate and fill an array of pointers to the Rashunal structs in native memory. ChatGPT suggested I add the defer block in case new_RMatrix abends for any reason. Because the RMatrix struct is declared but not defined in rmatrix.h, what is automatically returned is an OpaquePointer, which is just fine with me.

Properties defer to the encapsulated _rmatrix pointer, and the string description method makes full use of Swift's stream processing capabilities. deinit calls the RMatrix library's free_RMatrix method.

After all that, factoring a matrix and the GaussFactorization struct are pretty routine.

Sources/Model/Model.swift


public struct GaussFactorization {
    public var PInverse: RMatrix
    public var Lower: RMatrix
    public var Diagonal: RMatrix
    public var Upper: RMatrix

    public init(PInverse: RMatrix, Lower: RMatrix, Diagonal: RMatrix, Upper: RMatrix) {
        self.PInverse = PInverse
        self.Lower = Lower
        self.Diagonal = Diagonal
        self.Upper = Upper
    }
}

public class RMatrix: CustomStringConvertible {
...
    public func factor() -> GaussFactorization {
        let gf = RMatrix_gelim(_rmatrix)!
        let sgf = GaussFactorization(
            PInverse: RMatrix(gf.pointee.pi),
            Lower: RMatrix(gf.pointee.l),
            Diagonal: RMatrix(gf.pointee.d),
            Upper: RMatrix(gf.pointee.u)
        )
        free(gf)
        return sgf
    }
}

Calling the native method RMatrix_gelim returns a newly-allocated struct pointing to four newly-allocated matrices. The matrices are passed to the RMatrix constructor, so that the class takes responsibility for managing their memory. The native struct itself is freed by the RMatrix factor method before returning the Swift struct.

The driver class has no import of native code, and all the allocations look just like Swift objects.


import ArgumentParser
import Foundation
import Model

enum SwiftRMatrixError: Error {
    case runtimeError(String)
}

@main
struct SwiftRMatrix: ParsableCommand {
    @Option(help: "Specify the input file")
    public var inputFile: String

    public func run() throws {
        let url = URL(fileURLWithPath: inputFile)
        var inputText = ""
        do {
            inputText = try String(contentsOf: url, encoding: .utf8)
        } catch {
            throw SwiftRMatrixError.runtimeError("Error reading file [\(inputFile)]")
        }
        let data = inputText
            .split(whereSeparator: \.isNewline)
            .map { $0.trimmingCharacters(in: .whitespaces) }
            .map { line in
                line.split(whereSeparator: { $0.isWhitespace })
                    .map { token in token.split(separator: "/").map { Int($0)! } }
            }
        let m = Model.RMatrix(data)
        print("Input matrix:")
        print(m)

        let factor = m.factor()
        print("Factors into:")
        print("PInverse:")
        print(factor.PInverse)

        print("Lower:")
        print(factor.Lower)

        print("Diagonal:")
        print(factor.Diagonal)

        print("Upper:")
        print(factor.Upper)
    }
}

Building and running the driver:

$ swift run SwiftRMatrix --input-file /home/john/workspace/rmatrix/driver/example.txt
[1/1] Planning build
Building for debugging...
[11/11] Linking SwiftRMatrix
Build of product 'SwiftRMatrix' complete! (1.17s)
Input matrix:
[ {-2,1} {1,3} {-3,4} ]
[ {6,1} {-1,1} {8,1} ]
[ {8,1} {3,2} {-7,1} ]
Factors into:
PInverse:
[ {1,1} {0,1} {0,1} ]
[ {0,1} {0,1} {1,1} ]
[ {0,1} {1,1} {0,1} ]
Lower:
[ {1,1} {0,1} {0,1} ]
[ {-3,1} {1,1} {0,1} ]
[ {-4,1} {0,1} {1,1} ]
Diagonal:
[ {-2,1} {0,1} {0,1} ]
[ {0,1} {17,6} {0,1} ]
[ {0,1} {0,1} {23,4} ]
Upper:
[ {1,1} {-1,6} {3,8} ]
[ {0,1} {1,1} {-60,17} ]
[ {0,1} {0,1} {1,1} ]

Reflection

Wow, that turned out a lot better than I expected. I thought this would be possible on Linux and MacOS. To be able to get it to work on Windows too was a pleasant surprise. I really like the Swift language: it is expressive and concise and makes really good use of streaming approaches. I hope I get to use it to make money sometime.

Code repository

https://github.com/proftodd/GoingNative/tree/main/SwiftRMatrix

Monday, October 6, 2025

Going Native - Python

Photo by rolve on Freeimages.com

"If you're not stubborn, you'll give up on experiments too soon. And if you're not flexible, you'll pound your head against the wall and you won't see a different solution to a problem you're trying to solve."

Jeff Bezos

This is the continuation of a series on calling native code from high-level languages. Here is a description of the native library I'm calling in this series.

When I got to Python I thought things would get easier. After all, Python was written to be a quick and easy wrapper around C. Alas, no. In a pattern that was becoming familiar, wrapping the native code and calling it from Python was fairly easy, but getting it to find the native libraries at runtime was another challenge.

There are two traditional ways to call native code from Python. The first is `ctypes`; the other is writing a Python extension in C, which `Cython` makes easier. ctypes is better for quickly calling into a native library, while Cython is better for capturing the real advantages of native code (primarily execution speed). There are other approaches, but these are the ones I tried for this post.

Calling the native libraries via ctypes

ctypes is part of the Python standard library, so no installation was necessary to use it in the project.

Loading the libraries was a simple matter of calling a ctypes function and mapping the argument and return types.


import ctypes

class RASHUNAL(ctypes.Structure):
    _fields_ = [("numerator", ctypes.c_int), ("denominator", ctypes.c_int)]

class RMATRIX(ctypes.Structure):
    pass

class GAUSS_FACTORIZATION(ctypes.Structure):
    _fields_ = [
        ("P_INVERSE", ctypes.POINTER(RMATRIX)),
        ("LOWER", ctypes.POINTER(RMATRIX)),
        ("DIAGONAL", ctypes.POINTER(RMATRIX)),
        ("UPPER", ctypes.POINTER(RMATRIX))
    ]

_rashunal_lib = ctypes.CDLL('librashunal.so')
_rashunal_lib.n_Rashunal.argtypes = (ctypes.c_int, ctypes.c_int)
_rashunal_lib.n_Rashunal.restype = ctypes.POINTER(RASHUNAL)

_rmatrix_lib = ctypes.CDLL('librmatrix.so')
_rmatrix_lib.new_RMatrix.argtypes = (ctypes.c_size_t, ctypes.c_size_t, ctypes.POINTER(ctypes.POINTER(RASHUNAL)))
_rmatrix_lib.new_RMatrix.restype = ctypes.POINTER(RMATRIX)

_rmatrix_lib.free_RMatrix.argtypes = (ctypes.POINTER(RMATRIX),)

_rmatrix_lib.RMatrix_gelim.argtypes = (ctypes.POINTER(RMATRIX),)
_rmatrix_lib.RMatrix_gelim.restype = ctypes.POINTER(GAUSS_FACTORIZATION)

_rmatrix_lib.RMatrix_height.argtypes = (ctypes.POINTER(RMATRIX),)
_rmatrix_lib.RMatrix_width.argtypes = (ctypes.POINTER(RMATRIX),)

_rmatrix_lib.RMatrix_get.argtypes = (ctypes.POINTER(RMATRIX), ctypes.c_size_t, ctypes.c_size_t)
_rmatrix_lib.RMatrix_get.restype = ctypes.POINTER(RASHUNAL)

Custom types are declared as subclasses of `ctypes.Structure`. The RMatrix struct is declared but not given a body in the RMatrix library, so I modeled it as a Python class that also extends ctypes.Structure but has no body. Pointer types are modeled by calling `ctypes.POINTER` with the type or struct the pointer refers to.

Note that if a function has a single argument, the field is still argtypes (plural). Also, the argument is a Python tuple, so if it only has one element then it needs a trailing comma. That took me a while to figure out!
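A tiny standalone check makes the pitfall concrete (no native library involved):

```python
import ctypes

# (ctypes.c_void_p) is just a parenthesized expression; the trailing comma
# is what makes a one-element tuple, as in free_RMatrix.argtypes above.
not_a_tuple = (ctypes.c_void_p)
one_tuple = (ctypes.c_void_p,)

print(not_a_tuple is ctypes.c_void_p)                # prints "True"
print(isinstance(one_tuple, tuple), len(one_tuple))  # prints "True 1"
```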

Once the functions are declared, they are called just like regular Python functions.


def allocate_c_rmatrix(m):
    height = m.height
    width = m.width
    element_count = height * width

    c_rashunal_pointers = (ctypes.POINTER(RASHUNAL) * element_count)()
    for i in range(element_count):
        pel = m.data[i]
        r = _rashunal_lib.n_Rashunal(pel.numerator, pel.denominator)
        c_rashunal_pointers[i] = ctypes.cast(r, ctypes.POINTER(RASHUNAL))
    c_rmatrix = _rmatrix_lib.new_RMatrix(height, width, c_rashunal_pointers)
    for i in range(element_count):
        cel = c_rashunal_pointers[i]
        _std_lib.free(cel)
    # c_rashunal_pointers itself is a ctypes array in Python-managed memory;
    # the garbage collector releases it, not the C free
    return c_rmatrix

def allocate_python_rmatrix(m):
    height = _rmatrix_lib.RMatrix_height(m)
    width = _rmatrix_lib.RMatrix_width(m)
    p_rashunals = []
    for i in range(1, height + 1):
        for j in range(1, width + 1):
            c_rashunal = _rmatrix_lib.RMatrix_get(m, i, j)
            p_rashunals.append(RMatrix.PRashunal((c_rashunal.contents.numerator, c_rashunal.contents.denominator)))
            _std_lib.free(ctypes.cast(c_rashunal, ctypes.c_void_p))
    return RMatrix.PRMatrix(height, width, p_rashunals)

def factor(m):
    crm = allocate_c_rmatrix(m)
    gf = _rmatrix_lib.RMatrix_gelim(crm)

    p_inverse = allocate_python_rmatrix(gf.contents.P_INVERSE)
    lower = allocate_python_rmatrix(gf.contents.LOWER)
    diagonal = allocate_python_rmatrix(gf.contents.DIAGONAL)
    upper = allocate_python_rmatrix(gf.contents.UPPER)

    _rmatrix_lib.free_RMatrix(gf.contents.P_INVERSE)
    _rmatrix_lib.free_RMatrix(gf.contents.LOWER)
    _rmatrix_lib.free_RMatrix(gf.contents.DIAGONAL)
    _rmatrix_lib.free_RMatrix(gf.contents.UPPER)
    _std_lib.free(ctypes.cast(gf, ctypes.c_void_p))

    return RMatrix.PGaussFactorization(p_inverse, lower, diagonal, upper)

ctypes and objects obtained from it have some utility methods that come in handy. Arrays are declared by calling the pointer type times the length of the array as a function: c_rashunal_pointers = (ctypes.POINTER(RASHUNAL) * element_count)(). Pointers can be cast (c_rashunal_pointers[i] = ctypes.cast(r, ctypes.POINTER(RASHUNAL))), and dereferenced (upper = allocate_python_rmatrix(gf.contents.UPPER)).
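Those utilities can be exercised in a self-contained sketch, reusing the RASHUNAL structure from above but no native library:

```python
import ctypes

class RASHUNAL(ctypes.Structure):
    _fields_ = [("numerator", ctypes.c_int), ("denominator", ctypes.c_int)]

# Multiplying a type by a length produces an array type; calling it
# instantiates the array, with every element starting as a NULL pointer.
arr = (ctypes.POINTER(RASHUNAL) * 3)()

r = RASHUNAL(1, 2)
arr[0] = ctypes.cast(ctypes.pointer(r), ctypes.POINTER(RASHUNAL))

# .contents dereferences a pointer; NULL pointers are falsy.
print(arr[0].contents.numerator, arr[0].contents.denominator)  # prints "1 2"
print(bool(arr[1]))                                            # prints "False"
```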

As in other languages, the structs allocated by the native library and returned to the caller have to be disposed of properly to prevent memory leaks.
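Here is a minimal sketch of that obligation, assuming a Unix-like system where libc can be loaded by name (the Windows path goes through ucrtbase.dll instead, as below):

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c") or None)
libc.malloc.argtypes = (ctypes.c_size_t,)
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = (ctypes.c_void_p,)

# Native memory is invisible to Python's garbage collector...
p = libc.malloc(64)
print(p is not None)  # prints "True"

# ...so every block a native library hands back must be freed explicitly.
libc.free(p)
```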

So that seems pretty straightforward. I've written this as if it were to be run on a Linux machine. Trying to move to other platforms introduced the complexity.

Making it cross-platform

I started in my Ubuntu WSL shell this time, so note the names of the files in the ctypes.CDLL calls. Very Linux specific. The first task was to make that cross-platform.


def load_library(lib_name):
    if sys.platform.startswith("win"):
        filename = f"{lib_name}.dll"
    elif sys.platform.startswith("darwin"):
        filename = f"lib{lib_name}.dylib"
    else:
        filename = f"lib{lib_name}.so"

    try:
        return ctypes.CDLL(filename)
    except OSError as e:
        raise OSError(f"Could not load library '{filename}'") from e

_rashunal_lib = load_library('rashunal')
_rashunal_lib.n_Rashunal.argtypes = (ctypes.c_int, ctypes.c_int)
_rashunal_lib.n_Rashunal.restype = ctypes.POINTER(RASHUNAL)

Also, I needed a different approach to load the standard libraries. As discussed in the C# post, the standard libraries have different names on the three operating systems, so a simple root-name-based approach wouldn't work:


def load_standard_library():
    if sys.platform.startswith("win"):
        return ctypes.CDLL('ucrtbase.dll')
    elif sys.platform.startswith("darwin"):
        return ctypes.CDLL('libSystem.dylib')
    else:
        return ctypes.CDLL('libc.so.6')

_std_lib = load_standard_library()
_std_lib.free.argtypes = (ctypes.c_void_p,)
_std_lib.malloc.argtypes = (ctypes.c_size_t,)

Not too bad. That worked fine on Linux, and also on Mac OS if I put /usr/local/lib in DYLD_LIBRARY_PATH. However, Windows was the standout this time. Turns out since Windows 10 "ctypes.CDLL and the system loader sometimes ignore PATH for dependent DLLs due to 'SafeDllSearchMode' and other loader rules." Thanks, Microsoft.

You can add to the Python interpreter's search path by making os.add_dll_directory calls. To keep the code flexible, I went back to the environment variable trick to add the required locations.


def get_dll_dirs_from_env(env_var="RMATRIX_LIB_DIRS"):
    val = os.environ.get(env_var, "")
    if not val:
        return []
    return val.split(os.pathsep)

def load_library(lib_name):
    if sys.platform.startswith("win"):
        dll_dirs = get_dll_dirs_from_env()
        for d in dll_dirs:
            if not os.path.isdir(d):
                continue
            os.add_dll_directory(d)
        filename = f"{lib_name}.dll"
    elif sys.platform.startswith("darwin"):
        filename = f"lib{lib_name}.dylib"
    else:
        filename = f"lib{lib_name}.so"

    try:
        return ctypes.CDLL(filename)
    except OSError as e:
        raise OSError(f"Could not load library '{filename}'") from e

Then, after setting the environment variable in PowerShell:

> $env:RMATRIX_LIB_DIRS="C:\Users\john.todd\local\rashunal\lib;C:\Users\john.todd\local\rmatrix\lib"
> python main.py /Users/john.todd/source/repos/rmatrix/driver/example.txt
using data from file /Users/john.todd/source/repos/rmatrix/driver/example.txt
Input matrix:
[ {-2,1} {1,3} {-3,4} ]
[ {6,1} {-1,1} {8,1} ]
[ {8,1} {3,2} {-7,1} ]

PInverse:
[ {1,1} {0,1} {0,1} ]
[ {0,1} {0,1} {1,1} ]
[ {0,1} {1,1} {0,1} ]

Lower:
[ {1,1} {0,1} {0,1} ]
[ {-3,1} {1,1} {0,1} ]
[ {-4,1} {0,1} {1,1} ]

Diagonal:
[ {-2,1} {0,1} {0,1} ]
[ {0,1} {17,6} {0,1} ]
[ {0,1} {0,1} {23,4} ]

Upper:
[ {1,1} {-1,6} {3,8} ]
[ {0,1} {1,1} {-60,17} ]
[ {0,1} {0,1} {1,1} ]

And voila, works on all three platforms.

Calling the native libraries via Cython

Cython is a weird dialect? sublanguage? independent language? It looks most like Python, but includes some elements of C. Hence the name, a combination of C and Python. The Cython documentation and tutorial examples mainly discussed wrapping C standard library functions or an implementation of a queue. I couldn't find a good example of wrapping a custom library, or two custom libraries with dependencies on each other like my model libraries. So once again, ChatGPT and I plunged in.

For this experiment I worked in my Ubuntu WSL shell. I wound up with two Python modules that can be separately compiled and packaged and installed via pip.

The easier one: packaging Rashunal

Cython requires a declarations file (pxd) and an implementation file (pyx). The convention seems to be to name the declarations file as the name of the library with a 'c' prepended. The pyx file can be named just the name of the library.


# crashunal.pxd
cdef extern from "rashunal.h":
    ctypedef struct Rashunal:
        int numerator
        int denominator
    
    Rashunal *n_Rashunal(int numerator, int denominator)

# rashunal.pyx
from libc.stdlib cimport free
cimport crashunal

cdef class Rashunal:
    cdef crashunal.Rashunal *_c_rashunal

    def __cinit__(self, numerator, denominator):
        self._c_rashunal = crashunal.n_Rashunal(numerator, denominator)
        if self._c_rashunal is NULL:
            raise MemoryError()
    
    def __dealloc__(self):
        if self._c_rashunal is not NULL:
            free(self._c_rashunal)
            self._c_rashunal = NULL
    
    def __str__(self):
        return f"{{{self._c_rashunal.numerator},{self._c_rashunal.denominator}}}"
    
    @property
    def numerator(self):
        return self._c_rashunal.numerator
    
    @property
    def denominator(self):
        return self._c_rashunal.denominator

In Cython, things that begin with a "c" are related to the native library and the C code. So "cimport" means "import something from the C library", "cdef" means "declare this as something that will be used by the C code", and "ctypedef" means "this is a type that will be coming from C". Things without the "c" prefix are meant to be used by the Python code. (There is also a "cp" prefix, meaning something can be used by both the C and Python code. I'm not sure how that would be useful.)

crashunal.pxd declares the Rashunal struct and the n_Rashunal method. It says their definitions can be obtained from the rashunal.h header file, wherever that may be. (I'll come back to that later.)

rashunal.pyx declares a Python-facing class, Rashunal, that wraps a crashunal.Rashunal struct and holds a reference to it. Rashunal's constructor accepts a numerator and a denominator, passes them to the native n_Rashunal method, and holds on to the struct that is returned. It also declares a __dealloc__ method that frees the struct when the object goes out of scope, and a couple of convenience properties for easy access to the fields of the struct.

Cython modules are built using a setup.py file:


import os
from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    Extension(
        "rashunal._rashunal",
        ["rashunal/rashunal.pyx"],
        libraries=["rashunal"],
        include_dirs=[os.environ.get("RASHUNAL_INCLUDE", "/usr/local/include")],
        library_dirs=[os.environ.get("RASHUNAL_LIB", "/usr/local/lib")]
    )
]

setup(
    name="rashunal",
    version="0.1.0",
    packages=["rashunal"],
    ext_modules=cythonize(
        extensions,
        language_level="3",
        include_path=["rashunal"]
    ),
)

extensions is the list of all extensions to be built. A single setup.py can build more than one, and I did that for a while with Rashunal and RMatrix, but I backed off to one at a time to keep the process and packages more granular. The extension is named rashunal._rashunal to parallel the directory structure. The underscore hides the C binding and prevents import confusion when bringing it into a client. Most of the flags here are related to finding the C libraries: libraries is the list of libraries to link against, include_dirs is where to find their header files (if they're not part of the project), and library_dirs is where to find their compiled binaries. If you're building at the command line these can be supplemented by flags, but for reasons I'll discuss later I had to fill them in with environment variables and default values.

The setup function describes how to actually build the extensions. It needs the name(s) of the package(s) to build and the list of extensions to include. The include_path here is where to find the pxd and pyx files.


# __init__.py
from ._rashunal import Rashunal

__init__.py is required, but can be empty. I added this import to both hide the C extension module and simplify the import. If __init__.py were empty the build would still work and the code could still be imported, but the import would be uglier: from rashunal._rashunal import Rashunal.
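The effect of that one-line __init__.py can be seen with a throwaway pure-Python package. (demo and _impl here are made-up names for illustration; the real package uses rashunal and _rashunal.)

```python
import os
import sys
import tempfile

# Build a throwaway package on disk: demo/__init__.py re-exports Thing
# from the "private" submodule demo/_impl.py.
pkg_root = tempfile.mkdtemp()
os.makedirs(os.path.join(pkg_root, "demo"))
with open(os.path.join(pkg_root, "demo", "_impl.py"), "w") as f:
    f.write("class Thing:\n    pass\n")
with open(os.path.join(pkg_root, "demo", "__init__.py"), "w") as f:
    f.write("from ._impl import Thing\n")

sys.path.insert(0, pkg_root)
from demo import Thing  # the clean import; demo._impl stays out of sight

assert Thing.__module__ == "demo._impl"
```

The class still lives in the private submodule, but clients never have to spell that out.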

Here's the directory setup:


$ tree .
.
├── rashunal
│   ├── __init__.py
│   ├── crashunal.pxd
│   └── rashunal.pyx
└── setup.py

1 directory, 4 files

Cython and its related tools are not part of the Python standard library, so they have to be installed.


$ pip install Cython setuptools
$ python setup.py build_ext -i

This works, and the output can be imported into client code and be used. I wanted to take the further step and make this into a pip package, however. That required a couple more files.


# pyproject.toml
[build-system]
requires = ["setuptools>=61.0", "wheel", "Cython"]
build-backend = "setuptools.build_meta"

[project]
name = "rashunal"
version = "0.1.0"
description = "Python bindings for the Rashunal C library"
authors = [{ name = "John Todd" }]
readme = "README.md"
requires-python = ">=3.8"

# MANIFEST.in
include rashunal/*.pxd
include rashunal/*.pyx
$ tree .
.
├── MANIFEST.in
├── README.md
├── pyproject.toml
├── rashunal
│   ├── __init__.py
│   ├── crashunal.pxd
│   └── rashunal.pyx
└── setup.py

1 directory, 7 files

pyproject.toml gives instructions on how the wheel file is to be built and a description of the project, including any dependencies or runtime requirements. MANIFEST.in says that the pxd and pyx files should be included in the wheel. The build tool will need those in order to compile the Cython code later on.

Now the package can be built at the command line, but include_dirs and library_dirs cannot be passed as flags at this point. This is why I had to fall back on environment variables in setup.py to find the C header and library files. I also didn't want this experimental project permanently installed in my Python environment, so I created a virtual environment to test it.

The build tool also has to be installed before it can be used.


$ python3 -m pip install build
$ python3 -m build
$ python3 -m venv venv-test
$ source venv-test/bin/activate
(venv-test) $ pip install --upgrade pip wheel
(venv-test) $ pip install dist/rashunal-0.1.0-cp310-cp310-linux_x86_64.whl
(venv-test) $ cd ~
(venv-test) $ python
>>> from rashunal import Rashunal
>>> r = Rashunal(1, 2)
>>> print(r)
{1,2}

Note that when starting the Python REPL and importing the code, I had to be in a different directory than the project directory so the interpreter didn't confuse the installed pip wheel with the source code.

The harder one: packaging RMatrix

Things got really hairy when I tried to package RMatrix because of its dependency on Rashunal. I imagined that Rashunal and RMatrix would be packaged separately, since a library of rational numbers could theoretically be used for other purposes than matrices and linear algebra.

The __init__.py, pxd and pyx files were fairly straightforward and comparable to Rashunal's:


# __init__.py
from ._rmatrix import RMatrix

# crmatrix.pxd
cimport crashunal

cdef extern from "rmatrix.h":
    ctypedef struct RMatrix:
        pass
    
    RMatrix *new_RMatrix(size_t height, size_t width, crashunal.Rashunal **data)
    void free_RMatrix(RMatrix *m)
    size_t RMatrix_height(const RMatrix *m)
    size_t RMatrix_width(const RMatrix *m)
    Gauss_Factorization *RMatrix_gelim(const RMatrix *m)
    crashunal.Rashunal *RMatrix_get(const RMatrix *m, size_t row, size_t col)

    ctypedef struct Gauss_Factorization:
        const RMatrix *pi
        const RMatrix *l
        const RMatrix *d
        const RMatrix *u

# rmatrix.pyx
from libc.stdlib cimport malloc, free
cimport crashunal
cimport crmatrix

cdef class RMatrix:
    cdef crmatrix.RMatrix *_c_rmatrix

    def __cinit__(self, data):
        cdef height = len(data)
        cdef width = len(data[0])
        cdef el_count = height * width
        cdef crashunal.Rashunal **arr = <crashunal.Rashunal **>malloc(el_count * sizeof(crashunal.Rashunal*))
        if arr is NULL:
            raise MemoryError()

        try:
            for i in range(el_count):
                el = data[i // width][i % width]
                num = el[0]
                den = el[1] if len(el) == 2 else 1
                arr[i] = crashunal.n_Rashunal(num, den)
                if arr[i] is NULL:
                    raise MemoryError()
            self._c_rmatrix = crmatrix.new_RMatrix(height, width, arr)
            if self._c_rmatrix is NULL:
                raise MemoryError()
        finally:
            for i in range(el_count):
                if arr[i] is not NULL:
                    crashunal.free(arr[i])
            crashunal.free(arr)
    
    def __dealloc__(self):
        if self._c_rmatrix is not NULL:
            crmatrix.free_RMatrix(self._c_rmatrix)
            self._c_rmatrix = NULL

    @property
    def height(self):
        return crmatrix.RMatrix_height(self._c_rmatrix)

    @property
    def width(self):
        return crmatrix.RMatrix_width(self._c_rmatrix)
    
    def factor(self):
        cdef crmatrix.Gauss_Factorization *f
        f = crmatrix.RMatrix_gelim(self._c_rmatrix)
        try:
            result = (
                _crmatrix_to_2d_array(f.pi),
                _crmatrix_to_2d_array(f.l),
                _crmatrix_to_2d_array(f.d),
                _crmatrix_to_2d_array(f.u)
            )
        finally:
            if f.pi != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.pi)
            if f.l  != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.l)
            if f.d  != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.d)
            if f.u  != NULL: crmatrix.free_RMatrix(<crmatrix.RMatrix *>f.u)
            crashunal.free(f)
        return result

cdef _crmatrix_to_2d_array(const crmatrix.RMatrix *crm):
    cdef height = crmatrix.RMatrix_height(crm)
    cdef width = crmatrix.RMatrix_width(crm)
    cdef result = []
    cdef const crashunal.Rashunal *el
    for i in range(height):
        row = []
        for j in range(width):
            el = crmatrix.RMatrix_get(crm, i + 1, j + 1)
            row.append((el.numerator, el.denominator))
            crashunal.free(el)
        result.append(row)
    return result

The type definitions mirror what is in the native libraries. For the implementation I backed off to passing the RMatrix constructor a 3D array of integers rather than a custom object, for maximum flexibility when packaged for pip. By now the allocation and deallocation code should be familiar, even if the syntax varies from implementation to implementation. The pointer casts when deallocating memory are necessary to avoid C compiler warnings about discarding const qualifiers.
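To make that input convention concrete, here is the constructor's normalization logic in plain Python (flatten_elements is just an illustrative helper, not part of the package): each element is a one- or two-item list, a missing denominator defaults to 1, and the 2D structure is flattened row-major.

```python
def flatten_elements(data):
    """Flatten a [[num] or [num, den]] grid into row-major (num, den) pairs."""
    height, width = len(data), len(data[0])
    flat = []
    for i in range(height * width):
        el = data[i // width][i % width]  # same index math as in __cinit__
        num = el[0]
        den = el[1] if len(el) == 2 else 1  # denominator defaults to 1
        flat.append((num, den))
    return flat

print(flatten_elements([[[1], [2], [3, 2]], [[4, 3], [5], [6]]]))
# [(1, 1), (2, 1), (3, 2), (4, 3), (5, 1), (6, 1)]
```

The Cython version does the same walk, except that each pair becomes a native Rashunal allocation.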


# setup.py
import os
import sys
from setuptools import setup, Extension
from Cython.Build import cythonize

extensions = [
    Extension(
        "rmatrix._rmatrix",
        ["rmatrix/rmatrix.pyx"],
        include_dirs=[os.environ.get("RMATRIX_INCLUDE", "/usr/local/include")],
        libraries=["rmatrix"],
        library_dirs=[os.environ.get("RMATRIX_LIB", "/usr/local/lib")]
    )
]

setup(
    name="rmatrix",
    version="0.1.0",
    packages=["rmatrix"],
    install_requires=["rashunal>=0.1.0"],
    ext_modules=cythonize(
        extensions,
        language_level="3",
        include_path=["rmatrix"]
    )
)

setup.py is very similar to Rashunal's. Notice the install_requires argument to setup. Building against Rashunal would ordinarily require an include_path reference to Rashunal's pxd and pyx files, but with the two as separate projects neither I nor ChatGPT could come up with a way to include them here. Fortunately, we did discover a way to do it with virtual environments.

pyproject.toml and MANIFEST.in were pretty much identical to Rashunal's. The toml file did include a field saying it depends on Rashunal.


# pyproject.toml
[build-system]
requires = ["setuptools>=61.0", "wheel", "Cython"]
build-backend = "setuptools.build_meta"

[project]
name = "rmatrix"
version = "0.1.0"
description = "Python bindings for the RMatrix C library"
authors = [{ name = "John Todd" }]
readme = "README.md"
requires-python = ">=3.8"
dependencies = ["rashunal>=0.1.0"]

# MANIFEST.in
include rmatrix/*.pxd
include rmatrix/*.pyx

Much thrashing ensued as I tried to get RMatrix to compile, mainly around locating the Rashunal library. As I outlined above, the compiler needed to find Rashunal's pxd and pyx files. Assuming these would be packaged separately, I didn't want to refer to the source code, even though it was right next to the rmatrix code in my project directory. Instead, I eventually noticed that the wheel file contained them and that they were extracted when it was installed in my virtual environment. The build process normally works in its own fresh virtual environment, but there was no way to install Rashunal into that environment before building RMatrix. I could, however, reuse the test virtual environment that already had Rashunal installed.


$ cd rmatrix
$ source ~/workspace/venv-test/bin/activate
(venv-test) $ python -m build --no-isolation
(venv-test) $ pip install ~/workspace/GoingNative/cython_rmatrix/rmatrix/dist/rmatrix-0.1.0-cp310-cp310-linux_x86_64.whl
(venv-test) $ cd ~
(venv-test) $ python
>>> from rmatrix import RMatrix
>>> crm = RMatrix([[[1], [2], [3,2]], [[4,3], [5], [6]]])
>>> (p_inverse, lower, diagonal, upper) = crm.factor()
>>> print(lower)

Not sure if that's an acceptable way to do it, but at this point I was just happy it worked. Once again, I needed to change to a different directory when starting the REPL to avoid confusing the installed wheel with the source code.

So there it is, in Linux at least. Some other possibilities ChatGPT mentioned that I didn't look into are:

  • Better packaging of the native libraries using pkg-config. This could probably be done in the CMake code.
  • Packing the generated C file along with or instead of the pxd and pyx files for downstream compiling.
  • Packing the binaries themselves within the wheel so they just work.

I briefly looked into doing this on Windows and MacOS, but ran into insurmountable difficulties. I won't go into the details, but the gist is that virtual environments on Windows and MacOS don't inherit settings from the shell they are invoked from. So there is no way to point to the native headers or binaries to get everything to compile. Both require modifying the source code in setup.py or pyproject.toml in order to set the paths. So if you're trying to write a cross-platform Python library that relies on native libraries, good luck. I can't help you.

Reflection

Wow, that was a journey.

The ctypes approach was definitely simpler, and I got it to work on all three platforms. The Cython approach was much more complicated. I'm not sure how to measure or assess the claims that it is more performant than the ctypes approach. It seems to be better for packaging up the C libraries in a format suitable to Python. Once the pip packages are available clients can use them in a way that Python developers are intimately familiar with. But boy, was it a bear to get working. Still, I feel a sense of accomplishment getting it done, and I do think I learned more about compiling and linking tools, even if I don't fully understand all the syntax and tools.

Code repositories

https://github.com/proftodd/GoingNative/tree/main/python_rmatrix
https://github.com/proftodd/GoingNative/tree/main/cython_rmatrix

Monday, September 29, 2025

Going Native - C#

"I belong to the warrior in whom the old ways have joined the new."

Inscription on the sword wielded by Captain Nathan Algren, The Last Samurai

From the JVM to the CLR

This is the third part in a series on calling native code from high-level languages. I've been interested in making useful code locked away in native libraries more widely available, and took this opportunity to finally look into how it's done.

Here is a description of the native library I'm calling in this series.

After struggling through getting the FFM to work, I wasn't sure what to expect from .NET. Nevertheless, that's the next language I'm most familiar with, so I went ahead and plunged in.

The approach I followed is Explicit PInvoke, outlined on the Microsoft Learn website. That provides a good background and an outline of the process and the alternatives. In reality it was so easy that I got by just with conversations with ChatGPT.

The Basics

I started by declaring structs that mirrored the (public) structs in the native libraries:


[StructLayout(LayoutKind.Sequential)]
private struct Rashunal
{
    public int numerator;
    public int denominator;
}
[StructLayout(LayoutKind.Sequential)]
private struct GaussFactorization
{
    public IntPtr PInverse;
    public IntPtr Lower;
    public IntPtr Diagonal;
    public IntPtr Upper;
}

The attributes indicate that the structs are laid out in memory with each field directly following the previous one. IntPtr is the .NET type that represents a pointer-sized handle to some memory location. You'll see it again!
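As a point of comparison with the ctypes approach from earlier in this series, the same sequential layout can be declared in Python. (This is a side sketch, not part of the C# project.)

```python
import ctypes

# Sequential layout: the ctypes counterpart of
# [StructLayout(LayoutKind.Sequential)] on the C# struct above.
class Rashunal(ctypes.Structure):
    _fields_ = [("numerator", ctypes.c_int),
                ("denominator", ctypes.c_int)]

r = Rashunal(3, 4)
assert ctypes.sizeof(Rashunal) == 8  # two 4-byte ints, one after the other
assert (r.numerator, r.denominator) == (3, 4)
```

In both worlds the point is the same: the managed struct's field order and sizes must match the C struct byte for byte.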

Then the native functions are declared in a simple fashion that matches C#'s variable types, with attributes that declare what library to find it in and what the native method is. The methods (and the class) are declared partial because the implementation is provided by the native code. According to convention the C# function and the native function have the same name, but that's not required.


[LibraryImport("rashunal", EntryPoint = "n_Rashunal")]
private static partial IntPtr n_Rashunal(int numerator, int denominator);

[LibraryImport("rmatrix", EntryPoint = "new_RMatrix")]
private static partial IntPtr new_RMatrix(int height, int width, IntPtr data);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_height")]
private static partial int RMatrix_height(IntPtr m);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_width")]
private static partial int RMatrix_width(IntPtr m);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_get")]
private static partial IntPtr RMatrix_get(IntPtr m, int row, int col);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_gelim")]
private static partial IntPtr RMatrix_gelim(IntPtr m);

Then the native methods can be called alongside normal C# code. I'll go in reverse of the actual process of factoring a matrix using the native code.


public static CsGaussFactorization Factor(Model.CsRMatrix m)
{
    var nativeMPtr = AllocateNativeRMatrix(m);
    var fPtr = RMatrix_gelim(nativeMPtr);
    var f = Marshal.PtrToStructure<GaussFactorization>(fPtr);
    var csF = new CsGaussFactorization
    {
        PInverse = AllocateManagedRMatrix(f.PInverse),
        Lower = AllocateManagedRMatrix(f.Lower),
        Diagonal = AllocateManagedRMatrix(f.Diagonal),
        Upper = AllocateManagedRMatrix(f.Upper),
    };
    NativeStdLib.Free(nativeMPtr);
    NativeStdLib.Free(fPtr);
    return csF;
}

First I call a method to allocate a native matrix (below), and then I call RMatrix_gelim on it, which returns a pointer to a native struct. Since the struct is part of the public native interface it can be unmarshaled into a C# object with the Marshal.PtrToStructure call. Then the native matrix pointers are used to construct managed matrices through the AllocateManagedRMatrix calls (also below). Finally, since the native matrix pointer and the factorization pointer are allocated by the native code, they have to be freed by a call to the native free method. Also see below.


private static IntPtr AllocRashunal(int num, int den)
{
    IntPtr ptr = NativeStdLib.Malloc((UIntPtr)Marshal.SizeOf<Rashunal>());
    var value = new Rashunal { numerator = num, denominator = den };
    Marshal.StructureToPtr(value, ptr, false);
    return ptr;
}

private static IntPtr AllocateNativeRMatrix(Model.CsRMatrix m)
{
    int elementCount = m.Height * m.Width;
    IntPtr elementArray = NativeStdLib.Malloc((UIntPtr)(IntPtr.Size * elementCount));
    unsafe
    {
        var pArray = (IntPtr*)elementArray;
        for (int i = 0; i < elementCount; ++i)
        {
            var element = m.Data[i];
            var elementPtr = AllocRashunal(element.Numerator, element.Denominator);
            pArray[i] = elementPtr;
        }
        var rMatrixPtr = new_RMatrix(m.Height, m.Width, elementArray);
        for (int i = 0; i < elementCount; ++i)
        {
            NativeStdLib.Free(pArray[i]);
        }
        NativeStdLib.Free(elementArray);
        return rMatrixPtr;
    }
}

Allocating a native RMatrix required native memory allocations, both for the individual Rashunals and for an array of Rashunal pointers. In a pattern that seems familiar now, I wrapped those calls in a NativeStdLib class that I promise to get to very soon. Allocating a Rashunal involves declaring a managed Rashunal struct, allocating native memory for it, and marshaling the struct to the pointer in native memory. The unsafe block is needed to treat the block of memory allocated for the pointer array as an actual array, instead of a block of unstructured memory. To get this to compile I had to add <AllowUnsafeBlocks>true</AllowUnsafeBlocks> to the PropertyGroup in the project file. Finally, I have to free both the individual allocated native Rashunals and the array of pointers to them, since new_RMatrix makes copies of them all.


private static Model.CsRMatrix AllocateManagedRMatrix(IntPtr m)
{
    int height = RMatrix_height(m);
    int width = RMatrix_width(m);
    var data = new CsRashunal[height * width];
    for (int i = 1; i <= height; ++i)
    {
        for (int j = 1; j <= width; ++j)
        {
            var rPtr = RMatrix_get(m, i, j);
            var r = Marshal.PtrToStructure<Rashunal>(rPtr);
            data[(i - 1) * width + (j - 1)] = new CsRashunal { Numerator = r.numerator, Denominator = r.denominator };
            NativeStdLib.Free(rPtr);
        }
    }
    return new Model.CsRMatrix { Height = height, Width = width, Data = data, };
}

After all that, allocating a managed RMatrix is not very interesting. The native RMatrix_get method returns a newly-allocated copy of the Rashunal at a position in the RMatrix, so it has to be freed the same way as before.

Ok, finally, as promised, here is the interface to loading the native standard library methods:


using System.Reflection;
using System.Runtime.InteropServices;

namespace CsRMatrix.Engine;

public static partial class NativeStdLib
{
    static NativeStdLib()
    {
        NativeLibrary.SetDllImportResolver(typeof(NativeStdLib).Assembly, ResolveLib);
    }

    private static IntPtr ResolveLib(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (libraryName == "c")
        {
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
                return NativeLibrary.Load("ucrtbase.dll", assembly, searchPath);
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
                return NativeLibrary.Load("libc.so.6", assembly, searchPath);
            if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
                return NativeLibrary.Load("libSystem.dylib", assembly, searchPath);
        }
        return IntPtr.Zero;
    }

    [LibraryImport("c", EntryPoint = "free")]
    internal static partial void Free(IntPtr ptr);

    [LibraryImport("c", EntryPoint = "malloc")]
    internal static partial IntPtr Malloc(UIntPtr size);
}

The platform-specific switching and filenames are pretty ugly, but neither ChatGPT nor I could find a way around it. At least it's confined to a single method in a single class in the project.

ChatGPT really wanted there to be library-specific ways to free Rashunals and factorizations. Then those methods could be declared and called the same way as the new_* methods. But I remained stubborn and said I didn't want to change the source code of the libraries. I was willing to recompile them as needed, but not to change the source code or the CMake files. Eventually, we found this way of handling the standard native library calls.

Getting the name of the file on Windows and getting this to compile and work was a little challenging. The C# code and the native code need to match exactly in the operating system (obviously), architecture (64-bit vs. 32-bit), and configuration (Debug vs. Release). It took a few more details than what I went through when compiling the JNI code.

Compiling on Windows

Windows is very careful about which runtime can free memory: memory can only be freed by the same C runtime that allocated it. Practically, that meant I needed to make sure I was allocating and freeing memory from the same runtime with the same C runtime model. That meant compiling with the multi-threaded DLL (/MD) compiler flag instead of the default multi-threaded (/MT) flag. I also needed to use the right filename to link the libraries to; ChatGPT and I thought it was msvcrt initially. So I modified the steps to compile the library and checked its headers, imports, and dependencies. This again is in an x64 Native Tools Command Prompt for VS 2022.


>cmake .. -G "NMake Makefiles" ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DCMAKE_INSTALL_PREFIX=C:/Users/john.todd/local/rashunal ^
  -DCMAKE_C_FLAGS_RELEASE="/MD /O2 /DNDEBUG"
>nmake
>nmake install
>cd /Users/john.todd/local/rashunal/bin
>dumpbin /headers rashunal.dll | findstr machine
            8664 machine (x64)

>dumpbin /imports rashunal.dll | findstr free
                          18 free

>dumpbin /dependents rashunal.dll

I didn't see msvcrt.dll, but did see VCRUNTIME140.DLL instead. ChatGPT said, "Ah, that's okay, that's actually better. msvcrt is the old way, ucrt (Universal CRT) is the new way." Then linking to "ucrtbase" in the NativeStdLib utility class (as shown above) worked.

Like with JNI, I had to add the Rashunal and RMatrix libraries to the PATH, and then it worked!


> $env:PATH += ";C:\Users\john.todd\local\rashunal\bin;C:\Users\john.todd\local\rmatrix\bin"
> dotnet run C:\Users\john.todd\source\repos\rmatrix\driver\example.txt
Using launch settings from C:\Users\john.todd\source\repos\GoingNative\CsRMatrix\CsRMatrix\Properties\launchSettings.json...
Reading matrix from C:/Users/john.todd/source/repos/rmatrix/driver/example.txt
Starting Matrix:
[ {-2/1} {1/3} {-3/4} ]
[ {6/1} {-1/1} {8/1} ]
[ {8/1} {3/2} {-7/1} ]


PInverse:
[ {1/1} {0/1} {0/1} ]
[ {0/1} {0/1} {1/1} ]
[ {0/1} {1/1} {0/1} ]


Lower:
[ {1/1} {0/1} {0/1} ]
[ {-3/1} {1/1} {0/1} ]
[ {-4/1} {0/1} {1/1} ]


Diagonal:
[ {-2/1} {0/1} {0/1} ]
[ {0/1} {17/6} {0/1} ]
[ {0/1} {0/1} {23/4} ]


Upper:
[ {1/1} {-1/6} {3/8} ]
[ {0/1} {1/1} {-60/17} ]
[ {0/1} {0/1} {1/1} ]

What's even more exciting is that when I committed this to GitHub and pulled it down on Linux and MacOS, it also just worked (for MacOS after adding the install directories to DYLD_LIBRARY_PATH, similarly to what I had to do with JNI).

Optimization

Remembering to free pointers allocated by native code isn't so bad. I had to do it in Java with the FFM and when writing the libraries in the first place. But ChatGPT suggested an optimization to have the CLR do it automatically. After reassuring it many times that the new_*, RMatrix_get, and RMatrix_gelim native methods returned pointers to newly-allocated copies of the relevant entities and not pointers to the entities themselves, it said this was the perfect application of the SafeHandle pattern. Who can pass that up?

First I wrote some wrapper classes for the pointers returned from the native code:


internal abstract class NativeHandle : SafeHandle
{
    protected NativeHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    protected NativeHandle(IntPtr existing, bool ownsHandle)
        : base(IntPtr.Zero, ownsHandle)
        => SetHandle(existing);

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        NativeStdLib.Free(handle);
        return true;
    }
}

internal sealed class RashunalHandle : NativeHandle
{
    internal RashunalHandle() : base() { }

    internal RashunalHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

internal sealed class RMatrixHandle : NativeHandle
{
    internal RMatrixHandle() : base() { }

    internal RMatrixHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

internal sealed class GaussFactorizationHandle : NativeHandle
{
    internal GaussFactorizationHandle() : base() { }

    internal GaussFactorizationHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

Then I had most of the native and managed code use the handles as parameters and return values instead of the pointers returned by the native code:


[DllImport("rashunal", EntryPoint = "n_Rashunal")]
private static extern RashunalHandle n_Rashunal(int numerator, int denominator);

[DllImport("rmatrix", EntryPoint = "new_RMatrix")]
private static extern RMatrixHandle new_RMatrix(int height, int width, IntPtr data);

[DllImport("rmatrix", EntryPoint = "RMatrix_height")]
private static extern int RMatrix_height(RMatrixHandle m);

[DllImport("rmatrix", EntryPoint = "RMatrix_width")]
private static extern int RMatrix_width(RMatrixHandle m);

[DllImport("rmatrix", EntryPoint = "RMatrix_get")]
private static extern RashunalHandle RMatrix_get(RMatrixHandle m, int row, int col);

[DllImport("rmatrix", EntryPoint = "RMatrix_gelim")]
private static extern GaussFactorizationHandle RMatrix_gelim(RMatrixHandle m);

private static Model.CsRMatrix AllocateManagedRMatrix(RMatrixHandle m)
{
    int height = RMatrix_height(m);
    int width = RMatrix_width(m);
    var data = new CsRashunal[height * width];
    for (int i = 1; i <= height; ++i)
    {
        for (int j = 1; j <= width; ++j)
        {
            using var rPtr = RMatrix_get(m, i, j);
            var r = Marshal.PtrToStructure<Rashunal>(rPtr.DangerousGetHandle());
            data[(i - 1) * width + (j - 1)] = new CsRashunal { Numerator = r.numerator, Denominator = r.denominator };
        }
    }
    return new Model.CsRMatrix { Height = height, Width = width, Data = data, };
}

Note the switch from LibraryImport to DllImport on the native method declarations. LibraryImport is newer and preferred, but for some reason it can't do the automatic marshaling of pointers into handles like DllImport can.

Now there's no need to explicitly free the pointers returned from RMatrix_get, n_Rashunal, new_RMatrix, and RMatrix_gelim. There are still some places where I have to remember to free memory, such as the array of Rashunal pointers allocated in AllocateNativeRMatrix. There are also some calls to ptr.DangerousGetHandle() when I need to marshal a pointer to a struct. I tried to get rid of those, but apparently they are unavoidable.

I didn't like the repeated boilerplate code in the concrete subclasses of NativeHandle. I wanted to just use NativeHandle as a generic, i.e. NativeHandle<T>, but that didn't work. ChatGPT said I needed a concrete class to marshal the native struct into, and that the structs I declared in the adapter wouldn't do it. That's also why the parameterless constructors are needed, for the marshaling code, even though they don't do anything but defer to the base class. So be it.

Reflection

After struggling so much with FFM, I was pleasantly surprised by how easy it was to work with C# and its method of calling native code. Interspersing the native calls with the managed code was pretty fun and easy, especially after refactoring to use handles to automatically dispose of allocated memory. It was a little tricky figuring out when I still had to marshal pointers into structs or vice versa, but the compiler and ChatGPT helped me figure it out pretty quickly.

So far, if given the choice of how to call my native libraries, C# and the CLR is definitely how I would do it.

Code repository

https://github.com/proftodd/GoingNative/tree/main/CsRMatrix

Wednesday, September 17, 2025

Going Native - Foreign Function & Memory API (FFM)

Be not the first by whom the new are tried, nor yet the last to lay the old aside.

Alexander Pope

When I started doing research for my post on JNI, I heard about some newfangled thing called the Foreign Function and Memory API (FFM). Apparently it does all the same things as JNI, but purely in Java code, so you have all the conveniences of modern Java development without all the hassles of compiling and linking two different languages and getting them to play nicely together. After finishing my experiments with JNI, therefore, I was excited to give it a try.

For a refresher on the native matrix library, see the section The native code in the introduction to this series.

The concepts in the FFM have been kicking around for several Java versions, going back at least to Java 17. It was finalized in Java 22, although the native-accessing code is still restricted and produces warnings when run without specific flags (--enable-native-access=ALL-UNNAMED).

There are several blog posts about using FFM, but they all seem to copy the same examples on the official Java website. Thus I was truly on my own this time.

An aside about AI programming aids

Well, not completely on my own. I made extensive use of AI programming aids during this project, particularly a couple of instances of ChatGPT. I have been slow to get on the AI train, and I am still highly skeptical of many of the claims that are made about it. But I freely admit that I could not have completed this project or the JNI project without its help. There is just so much detailed, obscure, and esoteric knowledge about compiling, linking, tool flags, and platform idiosyncrasies that no person can know it all. While my Google searching skills are decent, I don't believe I could have found the answers I needed within the bounds of my patience in order to bring this to a conclusion. While ChatGPT is not perfect (it is limited by published APIs and documentation and can get confused about the requirements of different software versions), it was definitely a big help to me!

The Arena

The basic idea of FFM is that you take over the management of native memory in Java code instead of native code. This starts with an Arena, which can be opened and disposed of in a try-with-resources block like any other resource. Also within the Java code you can lay out the memory of the structs you'll be using.

GroupLayout RASHUNAL_LAYOUT = MemoryLayout.structLayout(
    JAVA_INT.withName("numerator"),
    JAVA_INT.withName("denominator")
);

GroupLayout GAUSS_FACTORIZATION_LAYOUT = MemoryLayout.structLayout(
    ADDRESS.withName("PI"),
    ADDRESS.withName("L"),
    ADDRESS.withName("D"),
    ADDRESS.withName("U")
);

try (Arena arena = Arena.ofConfined()) {
...    
}

MemoryLayout is an interface with static methods to lay out primitives, structs, arrays, and other entities. The Arena object is then used to allocate blocks of native memory using a layout as a map.


int[][][] data = { { {1}, {2}, {3, 2} }, { {4, 3}, {5}, {6} } }; // example input
int height = data.length;
int width = data[0].length;
int elementCount = height * width;

long elementSize = RASHUNAL_LAYOUT.byteSize();
long elementAlign = RASHUNAL_LAYOUT.byteAlignment();
long totalBytes = elementSize * (long)elementCount;
MemorySegment elems = arena.allocate(totalBytes, elementAlign);
long numOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("numerator"));
long denOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("denominator"));
for (int i = 0; i < elementCount; ++i) {
    int row = i / width;
    int col = i % width;
    int[] element = data[row][col];
    int numerator = element[0];
    int denominator = element.length == 1 ? 1 : element[1];
    MemorySegment elementSlice = elems.asSlice(i * elementSize, elementSize);
    elementSlice.set(JAVA_INT, numOffset, numerator);
    elementSlice.set(JAVA_INT, denOffset, denominator);
}

With JNI this was all done in C; now it's all done in Java. It's a lot of steps, and it gets pretty far down into the weeds, but there are advantages to doing it all in Java. Pick your poison.

Native methods are retrieved into the Java code as method handles. They are retrieved by making downcalls (from Java to native methods) on a Linker object. To make the downcall you need the full signature of the native method, with the return value of the call first.


Linker linker = Linker.nativeLinker();
SymbolLookup lookup = OpenNativeLib("rmatrix", arena); // I'll come back to this later
MemorySegment newRMatrixLocation = lookup.find("new_RMatrix").orElseThrow();
MethodHandle new_RMatrix_handle = linker.downcallHandle(newRMatrixLocation, FunctionDescriptor.of(ADDRESS, JAVA_LONG, JAVA_LONG, ADDRESS));

After getting a Linker object, the native library needs to be opened and brought into the JVM. OpenNativeLib is a static method I wrote on the utility class this code is coming from, and I'll come back to its details later.

linker.downcallHandle accepts a MemorySegment, a FunctionDescriptor, and a variable-length list of Linker.Options. It returns a MethodHandle that can be used to call into native methods.

The SymbolLookup returned by OpenNativeLib is used to search the native library for functions and constants. It's a simple name lookup, and it returns an Optional with whatever it finds.

The FunctionDescriptor is fairly self-explanatory: it's the signature of the native method, built from constants in java.lang.foreign.ValueLayout, with the return value first, followed by the arguments. ADDRESS is the general value for a C pointer. new_RMatrix accepts longs representing the height and width of the matrix to be constructed and a pointer to an array of Rashunals, and returns a pointer to the newly-allocated RMatrix.
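The find-then-downcall pattern can be exercised without any custom library at all: the native linker exposes the C standard library through defaultLookup. Here's a minimal, self-contained sketch (assuming Java 22 or later, where FFM is final) that calls strlen:

```java
import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;

import static java.lang.foreign.ValueLayout.ADDRESS;
import static java.lang.foreign.ValueLayout.JAVA_LONG;

public class StrlenDemo {
    // size_t strlen(const char *s): JAVA_LONG return value, ADDRESS argument
    static long cStrlen(String s) throws Throwable {
        Linker linker = Linker.nativeLinker();
        MemorySegment addr = linker.defaultLookup().find("strlen").orElseThrow();
        MethodHandle strlen = linker.downcallHandle(
                addr, FunctionDescriptor.of(JAVA_LONG, ADDRESS));
        try (Arena arena = Arena.ofConfined()) {
            // allocateFrom copies the string into native memory with a trailing NUL
            MemorySegment cString = arena.allocateFrom(s);
            return (long) strlen.invoke(cString);
        }
    }

    public static void main(String[] args) throws Throwable {
        System.out.println(cStrlen("Rashunal")); // prints 8
    }
}
```

On recent JDKs you'll want --enable-native-access=ALL-UNNAMED to silence the restricted-method warnings, just like in the Mac command later in this post.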

Once the handle for new_RMatrix is in hand, it can be called to allocate a new RMatrix:


new_RMatrix_handle.invoke((long) height, (long) width, elems);
// compiles, but blows up when run

Not so fast! elems represents an array of Rashunal structs laid out in sequence in native memory. But what new_RMatrix expects is a pointer to an array of Rashunal pointers, not the array of Rashunals themselves. So that array of pointers also needs to be constructed:


MemorySegment ptrArray = arena.allocate(ADDRESS.byteSize() * elementCount, ADDRESS.byteAlignment());
for (int i = 0; i < elementCount; ++i) {
    MemorySegment elementAddr = elems.asSlice(i * elementSize, elementSize);
    ptrArray.setAtIndex(ADDRESS, i, elementAddr);
}
MemorySegment nativeRMatrix = (MemorySegment) new_RMatrix_handle.invoke((long) height, (long) width, ptrArray);

In a similar way, I got handles to RMatrix_gelim to factor the input matrix and RMatrix_height, RMatrix_width, and RMatrix_get to get information about the four matrices in the factorization. There was one wrinkle when getting information about structs returned by pointer from these methods:


MemorySegment factorZero = (MemorySegment) RMatrix_gelim_handle.invoke(rmatrixPtr);
MemorySegment factor = factorZero.reinterpret(GAUSS_FACTORIZATION_LAYOUT.byteSize(), arena, null);
long piOffset = GAUSS_FACTORIZATION_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("PI"));
...
MemorySegment piPtr = factor.get(ADDRESS, piOffset);
...

When a native method returns a pointer to a struct, the handle returns a zero-length memory segment that has no information about the struct pointed to by that memory. It needs to be reinterpreted as the struct itself using the MemoryLayout that corresponds to the struct. Then the struct can be interpreted using offsets in the reverse of the process used to set data.

Then I worked on the code to translate them back to Java objects:


long numeratorOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("numerator"));
long denominatorOffset = RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement("denominator"));
long height = (long) RMatrix_height_handle.invoke(mPtr);
long width = (long) RMatrix_width_handle.invoke(mPtr);
JRashunal[] data = new JRashunal[Math.toIntExact(height * width)];
for (long i = 1; i <= height; ++i) {
    for (long j = 1; j <= width; ++j) {
        MemorySegment elementZero = (MemorySegment) RMatrix_get_handle.invoke(mPtr, i, j);
        MemorySegment element = elementZero.reinterpret(RASHUNAL_LAYOUT.byteSize(), arena, null);
        int numerator = element.get(JAVA_INT, numeratorOffset);
        int denominator = element.get(JAVA_INT, denominatorOffset);
        data[Math.toIntExact((i - 1) * width + (j - 1))] = new JRashunal(numerator, denominator);
    }
}
JRashunalMatrix jrm = new JRashunalMatrix(Math.toIntExact(height), Math.toIntExact(width), data);

The offsets are the memory offset within the struct of the field of interest, in this case, the numerator and denominator of the Rashunal struct.
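Those offsets can be sanity-checked in isolation. Under RASHUNAL_LAYOUT as defined above (two 4-byte JAVA_INTs), the numerator sits at offset 0 and the denominator at offset 4; a small sketch:

```java
import java.lang.foreign.GroupLayout;
import java.lang.foreign.MemoryLayout;

import static java.lang.foreign.ValueLayout.JAVA_INT;

public class LayoutOffsets {
    static final GroupLayout RASHUNAL_LAYOUT = MemoryLayout.structLayout(
            JAVA_INT.withName("numerator"),
            JAVA_INT.withName("denominator"));

    static long offsetOf(String field) {
        return RASHUNAL_LAYOUT.byteOffset(MemoryLayout.PathElement.groupElement(field));
    }

    public static void main(String[] args) {
        System.out.println(offsetOf("numerator"));      // 0
        System.out.println(offsetOf("denominator"));    // 4
        System.out.println(RASHUNAL_LAYOUT.byteSize()); // 8: two ints, no padding
    }
}
```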

In this way I was able to complete a round trip from Java objects to native code and back.

Missing Link

So how do you load the native code? I thought it would be as simple as the guides say.


var lookup = SymbolLookup.libraryLookup("rmatrix", arena);

Unfortunately, that's not the way it turned out. Many ChatGPT questions and answers followed, but apparently there is a big difference between SymbolLookup.libraryLookup and


System.loadLibrary("jnirmatrix");

which is how I loaded the native library compiled from the JNI header. That used the platform's standard dynamic-loading machinery to find rmatrix and rashunal, machinery that is well understood and has stood the test of time.

According to ChatGPT, System.loadLibrary does a lot of additional work on behalf of the programmer, including formatting library names correctly, looking for code in platform-specific locations, and handling symlinks. FFM deliberately dials that back: the Javadoc for SymbolLookup.libraryLookup says it defers to dlopen on POSIX systems and LoadLibrary on Windows. Those search the path and some environment variables for libraries, but do none of the name decoration (librmatrix.so, librmatrix.dylib, or rmatrix.dll) that System.loadLibrary does. This made a bad first impression, but system-specific code turns out to be the way to do it in .NET too, so it's not too bad. /usr/local/lib is on the search path in Linux, but I installed the libraries in a nonstandard location on Windows, so I had to add their directories to PATH.


String osSpecificLibrary;
String osName = System.getProperty("os.name");
if (osName.contains("Linux")) {
    osSpecificLibrary = "lib" + library + ".so";
} else if (osName.contains("Mac OS")) {
    osSpecificLibrary = "lib" + library + ".dylib";
} else if (osName.contains("Windows")) {
    osSpecificLibrary = library + ".dll";
} else {
    throw new IllegalStateException("Unsupported OS: " + osName);
}
return SymbolLookup.libraryLookup(osSpecificLibrary, arena);

> $env:PATH += ";C:/Users/john.todd/local/rmatrix/bin;C:/Users/john.todd/local/rashunal/bin"
> ./gradlew ...

Trying to get this to work on a Mac was an odyssey on its own. Modern versions of MacOS (since OS X El Capitan) have something called System Integrity Protection (SIP), which the developers in Cupertino have wisely put into place to protect us all from ourselves. The Google AI answer for "what is sip macos" says it "Prevents unauthorized code execution: SIP prevents malicious software from running unauthorized code on your Mac", which I guess includes loading dependent libraries from the JVM.

I could load RMatrix using an absolute path to the dylib, but I couldn't load Rashunal from there because RMatrix uses rpaths (run-time search paths) to refer to the libraries it depends on. Search paths can be supplied in other situations (like the JNI application) through DYLD_LIBRARY_PATH or DYLD_FALLBACK_LIBRARY_PATH, but SIP strips those variables in certain contexts, such as a JVM invoked through a protected binary. After many big detours into rewriting rpaths to loader_paths or absolute paths and granting the JVM entitlements to honor DYLD_LIBRARY_PATH, I finally discovered that java and /usr/bin/java on my Mac are not the same as /Library/Java/JavaVirtualMachines/jdk-24.jdk/Contents/Home/bin/java. Specifically, the first two carry the SIP restrictions, but the last one doesn't, and it just works with the osSpecificLibrary defined above. Having already spent a lot of time trying to work around SIP, I wasn't going to dig any further into how to make the /usr/bin/java shim cooperate. So the following command worked from the command line on the Mac. Gradle could probably be convinced to do it too, but it didn't by default and I wasn't interested in investigating further.


$ /Library/Java/JavaVirtualMachines/jdk-24.jdk/Contents/Home/bin/java \
  -cp app/build/classes/java/main \
  --enable-native-access=ALL-UNNAMED \
  org.jtodd.ffm.ffmrmatrix.App \
  /Users/john/workspace/rmatrix/driver/example.txt
Input matrix:
[ {-2} {1/3} {-3/4} ]
[ {6} {-1} {8} ]
[ {8} {3/2} {-7} ]


PInverse:
[ {1} {0} {0} ]
[ {0} {0} {1} ]
[ {0} {1} {0} ]


Lower:
[ {1} {0} {0} ]
[ {-3} {1} {0} ]
[ {-4} {0} {1} ]


Diagonal:
[ {-2} {0} {0} ]
[ {0} {17/6} {0} ]
[ {0} {0} {23/4} ]


Upper:
[ {1} {-1/6} {3/8} ]
[ {0} {1} {-60/17} ]
[ {0} {0} {1} ]

Cleaning up

Like Java's good old garbage collector, the Arena will clean up any memory directly allocated in it, like the Rashunal array or the pointer array in the code segments above. But memory allocated in the native code is opaque to the Java side, and will leak if it's not cleaned up. To do that, you need handles to any library-specific cleanup functions or to the C standard library's free function. FFM has a special Linker method, defaultLookup, to reach the language standard libraries, and note the special-purpose FunctionDescriptor.ofVoid method to describe native methods that return void:


MemorySegment freeRMatrixLocation = lookup.find("free_RMatrix").orElseThrow();
MethodHandle freeRMatrixHandle = linker.downcallHandle(freeRMatrixLocation, FunctionDescriptor.ofVoid(ADDRESS));

var clib = linker.defaultLookup();
MemorySegment freeLocation = clib.find("free").orElseThrow();
MethodHandle freeHandle = linker.downcallHandle(freeLocation, FunctionDescriptor.ofVoid(ADDRESS));

freeRMatrixHandle.invoke(rmatrixPtr);
freeHandle.invoke(rashunalElement);

I briefly looked at using Valgrind to verify that I wasn't leaking anything further. Apparently the JVM itself spawns a lot of false (?) alarms. I grepped the output for any mentions of librmatrix or librashunal and didn't find any, so hopefully this approach doesn't leak too badly.

Reflection

My first impression of FFM was pretty bad. I had to do a lot more investigating and ChatGPT querying to get this to work on all my platforms than I did with JNI. I'm not sure if any further improvements to Java, FFM, or the operating systems will take away some of the pain. Maybe just time, experience, and more bloggers will make this easier for future developers.

It is nice being able to write all your marshaling and unmarshaling code in a single language, rather than having to write both Java and C code to do it. Nevertheless, an FFM developer still needs to keep C concepts in mind, particularly freeing natively-allocated memory and linking to the libraries. But that seems to be the common thread when connecting to native code.

Code repository

https://github.com/proftodd/GoingNative/tree/main/ffm_rmatrix

Monday, September 8, 2025

Going Native - Java Native Interface (JNI)

Why do humans like old things?

Dr. Noonien Soong, Star Trek the Next Generation

Although I haven't actually done it very much, I've always been fascinated by the idea of calling into old code from modern applications. Who knows what value is locked away in those old libraries? Graphics, matrix calculations, statistics, quantum mechanical calculations, etc. I want to be able to do it all!

In reality, old code is probably dusty, unmaintained, and harder to use than modern code. I'm still interested in being able to access it.

As discussed in the introduction to this series, I wrote a small library to do matrix calculations on rational numbers so that I could focus on the calculations without worrying about data loss or errors due to rounding. This post is about calling into it from Java via the Java Native Interface (JNI).

Starting point

To get a basic education I started with Baeldung's Tutorial on JNI. This gave a few basic examples of how to write the Java code, compile it, generate the C header file, write and compile the implementation of that, and call it all together. In particular, the Using Objects and Calling Java Methods From Native Code section was a good introduction to generating Java objects from the native side of the world.

I did most of this work in an Ubuntu image in a WSL on a Windows machine. I used Java 11 SDK for the Java compilation steps.

Calling RMatrix from Java, creating Java objects from the results

I wrote some simple Java classes as counterparts of the C structs. Then I wrote a simple driver to create a small matrix, call the Gauss factorization method, and display the U matrix. To call the native method I decided to pass a three-dimensional integer array. The first two dimensions represent the height and width of the matrix. The third dimension is either a one- or two-element array representing the numerator and denominator of a rational number, with a one-element array implying a denominator of 1. This matches well with the behavior of String.split("/").
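For illustration, turning a token like "3/2" into that encoding really is just a split; parse here is a hypothetical helper for this post, not part of the project's code:

```java
import java.util.Arrays;

public class RationalParser {
    // "3/2" -> {3, 2}; "5" -> {5} (a one-element array implies a denominator of 1)
    static int[] parse(String token) {
        return Arrays.stream(token.split("/"))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("3/2"))); // [3, 2]
        System.out.println(Arrays.toString(parse("5")));   // [5]
    }
}
```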


package org.jtodd.jni;

public class JRashunal {
    private int numerator;
    private int denominator;

    public JRashunal(int numerator, int denominator) {
        this.numerator = numerator;
        this.denominator = denominator;
    }

    @Override
    public String toString() {
        if (denominator == 1) {
            return String.format("{%d}", numerator);
        } else {
            return String.format("{%d/%d}", numerator, denominator);
        }
    }
}

package org.jtodd.jni;

public class JRashunalMatrix {
    private int height;
    private int width;
    private JRashunal[] data;

    public JRashunalMatrix(int height, int width, JRashunal[] data) {
        this.height = height;
        this.width = width;
        this.data = data;
    }

    @Override
    public String toString() {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < height; ++i) {
            builder.append("[ ");
            for (int j = 0; j < width; ++j) {
                builder.append(data[i * width + j]);
                builder.append(" ");
            }
            builder.append("]\n");
        }
        return builder.toString();
    }
}

package org.jtodd.jni;

public class RMatrixJNI {
    
    static {
        System.loadLibrary("jnirmatrix");
    }

    public static void main(String[] args) {
        RMatrixJNI app = new RMatrixJNI();
        int data[][][] = {
            { { 1    }, { 2 }, { 3, 2 }, },
            { { 4, 3 }, { 5 }, { 6    }, },
        };
        JRashunalMatrix u = app.factor(data);
        System.out.println(u);
    }

    private native JRashunalMatrix factor(int data[][][]);
}

The Java code is compiled and the C header file is generated in the same step:


$ javac -cp build -h build -d build RMatrixJNI.java JRashunal.java JRashunalMatrix.java

Other blogs, including Baeldung's, show you what a JNI header file looks like, so I won't copy it all here. The most important line is the declaration of the method defined in the Java class:


JNIEXPORT jobject JNICALL Java_org_jtodd_jni_RMatrixJNI_factor
  (JNIEnv *, jobject, jobjectArray);

This is the method that you have to implement in the code you write. Include the header file generated by the Java compiler, as well as the Rashunal and RMatrix libraries. I named the C file the same as the header file generated by the compiler.


#include "rashunal.h"
#include "rmatrix.h"
#include "org_jtodd_jni_RMatrixJNI.h"

JNIEXPORT jobject JNICALL Java_org_jtodd_jni_RMatrixJNI_factor (JNIEnv *env, jobject thisObject, jobjectArray jdata)
{
  ...
}

After I got this far, Baeldung couldn't help me much anymore. I turned to the full list of functions defined in the JNI specification. This let me get the dimensions of the Java array and allocate the array of C Rashunals:


    long height = (long)(*env)->GetArrayLength(env, jdata);
    jarray first_row = (*env)->GetObjectArrayElement(env, jdata, 0);
    long width = (long)(*env)->GetArrayLength(env, first_row);

    size_t total = height * width;
    Rashunal **data = malloc(sizeof(Rashunal *) * total);

It took some fiddling, but then I figured out how to get data from the elements of the 2D array, create C Rashunals, create the C RMatrix, and factor it:


    for (size_t i = 0; i < total; ++i) {
        size_t row_index = i / width;
        size_t col_index = i % width;
        jarray row = (*env)->GetObjectArrayElement(env, jdata, row_index);
        jarray jel = (*env)->GetObjectArrayElement(env, row, col_index);
        long el_count = (long)(*env)->GetArrayLength(env, jel);
        jint *el = (*env)->GetIntArrayElements(env, jel, NULL);
        int numerator = (int)el[0];
        int denominator = el_count == 1 ? 1 : (int)el[1];
        (*env)->ReleaseIntArrayElements(env, jel, el, JNI_ABORT); /* release without copy-back */
        data[i] = n_Rashunal(numerator, denominator);
    }
    RMatrix *m = new_RMatrix(height, width, data);
    Gauss_Factorization *f = RMatrix_gelim(m);

    const RMatrix *u = f->u;
    size_t u_height = RMatrix_height(u);
    size_t u_width = RMatrix_width(u);

The really tricky part was finding the Java class and constructor definitions from within the native code. The JNI uses something called descriptors to refer to primitives and objects:

  • The descriptors for primitives are single letters: I for integer, Z for boolean, etc.
  • The descriptor for a class is the fully-qualified class name, preceded by an L and trailed by a semicolon: Lorg/jtodd/jni/JRashunal;.
  • The descriptor for an array is the primitive/class descriptor preceded by an opening bracket: [I, [Lorg/jtodd/jni/JRashunal;. Multidimensional arrays add an opening bracket for each dimension of the array.
  • The descriptor for a method is the argument descriptors in parentheses, followed by the descriptor of the return value.
    • You can see an example of this in the header file generated by the Java compiler for the Java class: it accepts a three-dimensional array of integers and returns a JRashunalMatrix, so the signature is ([[[I)Lorg/jtodd/jni/JRashunalMatrix;.
  • If a method has multiple arguments, the descriptors are concatenated with no delimiter. This caused me a lot of grief because I couldn't find any documentation about it. ChatGPT finally gave me the clue to this. It also told me a handy tool to find the method signature of a compiled class: javap -s -p fully.qualified.ClassName.
So in our native code we first need to find the class descriptions, then we need to find the constructors for those classes. The documentation for GetMethodID says the name of a constructor is "<init>", and the return type is void (V):

    jclass j_rashunal_class = (*env)->FindClass(env, "org/jtodd/jni/JRashunal");
    jclass j_rmatrix_class = (*env)->FindClass(env, "org/jtodd/jni/JRashunalMatrix");
    jmethodID j_rashunal_constructor = (*env)->GetMethodID(env, j_rashunal_class, "<init>", "(II)V");
    jmethodID j_rmatrix_constructor = (*env)->GetMethodID(env, j_rmatrix_class, "<init>", "(II[Lorg/jtodd/jni/JRashunal;)V");

(Those II's look like the Roman numeral 2!)
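Besides javap, there's a way to sanity-check descriptor strings from Java itself: java.lang.invoke.MethodType uses the same descriptor format, so it can generate them from Java types:

```java
import java.lang.invoke.MethodType;

public class DescriptorDemo {
    public static void main(String[] args) {
        // The JRashunal(int, int) constructor signature: "(II)V"
        System.out.println(MethodType.methodType(void.class, int.class, int.class)
                .toMethodDescriptorString());
        // A method taking int[][][]; Object stands in here for the JRashunalMatrix return type
        System.out.println(MethodType.methodType(Object.class, int[][][].class)
                .toMethodDescriptorString()); // ([[[I)Ljava/lang/Object;
    }
}
```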

That was the hard part. Although the syntax is ugly, allocating and populating an array of JRashunals and creating a JRashunalMatrix was pretty straightforward:


    jobjectArray j_rashunal_data = (*env)->NewObjectArray(env, u_height * u_width, j_rashunal_class, NULL);
    for (size_t i = 0; i < total; ++i) {
        const Rashunal *r = RMatrix_get(u, i / width + 1, i % width + 1);
        jobject j_rashunal = (*env)->NewObject(env, j_rashunal_class, j_rashunal_constructor, r->numerator, r->denominator);
        (*env)->SetObjectArrayElement(env, j_rashunal_data, i, j_rashunal);
        free((Rashunal *)r);
    }
    jobject j_rmatrix = (*env)->NewObject(env, j_rmatrix_class, j_rmatrix_constructor, RMatrix_height(u), RMatrix_width(u), j_rashunal_data);

Compiling, linking, and running

Up to now I've assumed you understand the basics of C syntax, compiling, linking, and running. I won't assume that for the rest of this because it got pretty tricky and took me a while to figure it out.

I've laid out my project like this:


$ tree .
.
├── JRashunal.java
├── JRashunalMatrix.java
├── RMatrixJNI.java
├── build
│   ├── all generated and compiled code
└── org_jtodd_jni_RMatrixJNI.c

4 directories, 31 files

I set `JAVA_HOME` to the root of the Java 11 SDK I'm using. To compile the C file:


$ echo $JAVA_HOME
/usr/lib/jvm/java-11-openjdk-amd64
$ cc -c -fPIC \
  -Ibuild \
  -I${JAVA_HOME}/include \
  -I${JAVA_HOME}/include/linux \
  org_jtodd_jni_RMatrixJNI.c \
  -o build/org_jtodd_jni_RMatrixJNI.o

Adjust the includes to find the JNI header files for your platform. If you installed Rashunal and RMatrix to a recognized location (/usr/local/include for me) the compiler should find them on its own. If not, add includes to them as well.


$ cc -shared -fPIC -o build/libjnirmatrix.so build/org_jtodd_jni_RMatrixJNI.o -L/usr/local/lib -lrashunal -lrmatrix -lc

To create the shared library you have to link in the Rashunal and RMatrix libraries, hence the additional link location and link switches.


$ LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH java -cp build \
  -Djava.library.path=/home/john/workspace/JavaJNI/build org.jtodd.jni.RMatrixJNI
[ {1} {2} {3/2} ]
[ {0} {1} {12/7} ]

This is the tricky one. Since we created a shared library (libjnirmatrix.so), we need to give the runtime the path to the linked Rashunal and RMatrix libraries. This isn't done through the java.library.path variable; that only tells the JVM where to find the JNI library itself. You need the system-specific load path to tell the system runtime (not the JVM runtime) where to find the libraries it was linked against. On Linux that's the LD_LIBRARY_PATH variable (on macOS, DYLD_LIBRARY_PATH). Thanks again, ChatGPT!

Whew, that's a lot of work! And who wants to type those absurdly long CLI commands?

Doing it in a modern build system: Gradle

I'm continuing in my Ubuntu WSL shell with Java 11 and Gradle 8.3.

$ mkdir jrmatrix
$ gradle init
# I chose a basic project with Groovy as the DSL and the new APIs

Apparently Gradle's Java and native plugins don't play nicely in the same project, so the first thing I did was separate the project into app (Java) and native (C) subprojects. All of the automatically-generated files could stay the way they were. I just needed to make a small change to settings.gradle:


rootProject.name = 'jrmatrix'

include('app', 'native')

Then I made folders for each subproject.

In the app subproject I created the typical Java folder structure:


$ tree .
.
├── build.gradle
└── src
    └── main
        └── java
            └── org
                └── jtodd
                    └── jni
                        ├── JRashunal.java
                        ├── JRashunalMatrix.java
                        ├── RMatrixJNI.java
                        └── jrmatrix
                            └── App.java

26 directories, 11 files

The build.gradle file in app needed switches to tell the Java compiler to generate the JNI header file and where to put it. This is no longer a separate command (javah), but is an additional switch on the javac command. In addition, I wanted the run task to depend on the native compilation and linking steps I'll describe later in the native subproject.


plugins {
    id 'application'
    id 'java'
}

tasks.named("compileJava") {
    def headerDir = file("${buildDir}/generated/jni")
    options.compilerArgs += ["-h", headerDir.absolutePath]
    outputs.dir(headerDir)
}

application {
    mainClass = 'org.jtodd.jni.jrmatrix.App'
    applicationDefaultJvmArgs = [
        "-Djava.library.path=" + project(":native").layout.buildDirectory.dir("libs/shared").get().asFile.absolutePath
    ]
}

tasks.named("run") {
    dependsOn project(":native").tasks.named("linkJni")
}

Most of the fun was in the native project. I set it up with a similar folder structure to Java projects:


$ tree .
.
├── build.gradle
└── src
    └── main
        └── c
            └── org_jtodd_jni_RMatrixJNI.c

6 directories, 4 files

Gradle has a cpp-library plugin, but it seems tailored to C++, not C. There is also a c-library plugin, but it isn't bundled with Gradle 8.3, so I decided to skip it. That left the bare-bones c plugin. Much of the code will look similar to what I did earlier at the command line.

Like I did there, I had to write separate steps to compile and link the implementation of the JNI header file with the Rashunal and RMatrix libraries. After a couple of refactorings to pull out common definitions of the includes and not hardcoding the names of C source files I wound up with this:


apply plugin: 'c'

def jniHeaders = { project(":app").layout.buildDirectory.dir("generated/jni").get().asFile }
def jvmHome = { file(System.getenv("JAVA_HOME")) }
def outputDir = { file("$buildDir/libs/shared") }

def osSettings = {
    def os = org.gradle.nativeplatform.platform.internal.DefaultNativePlatform.currentOperatingSystem
    def baseInclude = new File(jvmHome(), "/include")
    def includeOS
    def libName
    if (os.isLinux()) {
        includeOS = new File(jvmHome(), "/include/linux")
        libName = "libjnirmatrix.so"
    } else if (os.isMacOsX()) {
        includeOS = new File(jvmHome(), "/include/darwin")
        libName = "libjnirmatrix.dylib"
    } else if (os.isWindows()) {
        includeOS = new File(jvmHome(), "/include/win32")
        libName = "jnirmatrix.dll"
    } else if (os.isFreeBSD()) {
        includeOS = new File(jvmHome(), "/include/freebsd")
        libName = "libjnirmatrix.so"
    } else {
        throw new GradleException("Unsupported OS: $os")
    }
    [baseInclude, includeOS, libName]
}

def sourceDir = file("src/main/c")
def cSources = fileTree(dir: sourceDir, include: "**/*.c")
def objectFiles = cSources.files.collect { file ->
    new File(outputDir(), file.name.replaceAll(/\.c$/, ".o")).absolutePath
}

tasks.register('compileJni', Exec) {
    dependsOn project(":app").tasks.named("compileJava")
    outputs.dir outputDir()
    doFirst { outputDir().mkdirs() }

    def (baseInclude, includeOS, _) = osSettings()

    def compileArgs = cSources.files.collect { file ->
        [
            '-c',
            '-fPIC',
            '-I', jniHeaders().absolutePath,
            '-I', baseInclude.absolutePath,
            '-I', includeOS.absolutePath,
            file.absolutePath,
            '-o', new File(outputDir(), file.name.replaceAll(/\.c$/, ".o")).absolutePath
        ]
    }.flatten()

    commandLine 'gcc', *compileArgs
}

tasks.register('linkJni', Exec) {
    dependsOn tasks.named("compileJni")
    outputs.dir outputDir()
    doFirst { outputDir().mkdirs() }

    def (baseInclude, includeOS, libName) = osSettings()

    commandLine 'gcc',
        '-shared',
        '-fPIC',
        '-o', new File(outputDir(), libName).absolutePath,
        *objectFiles,
        '-I', jniHeaders().absolutePath,
        '-I', baseInclude.absolutePath,
        '-I', includeOS.absolutePath,
        '-L', '/usr/local/lib',
        '-l', 'rashunal',
        '-l', 'rmatrix',
        '-Wl,-rpath,/usr/local/lib'
}

tasks.named('build') {
    dependsOn tasks.named('compileJni')
    dependsOn tasks.named('linkJni')
}

Gradle subprojects have references to each other, so this library can get references to app's output directory to reference the JNI header file. The compileJni task is set to depend on app's compileJava task, and native's build task is set to depend on the compileJni and linkJni tasks defined in this file.

This worked if I explicitly called app's compileJava task and native's build task, but it failed after a clean task. It turned out Java's compile task wouldn't detect the deletion of the JNI header file as a change that required rebuilding, so I added the build directory as an output to the task (outputs.dir(headerDir)). Thus deleting that file (or cleaning the project) caused recompilation and rebuilding.

The nice thing is that this runs with a single command now (`./gradlew run`). Much nicer than entering all the command line commands by hand!

Reflection

As expected, this works but is very fragile. In particular, calling Java code from native code depends on exact knowledge of the class and method signatures. If those change on the Java side, the project will compile and start just fine, then blow up spectacularly at runtime with unclear error messages.

I was surprised by Gradle's basic tooling for C projects. I thought there would be more help than paralleling the command line so closely. I'll have to look into the `c-library` plugin to see if it offers any more help. I'm also surprised by how few blogs and Stack Overflow posts I found about this: apparently this isn't something very many people do (or live to tell the tale!).

Update

Turns out it is possible to compile C code with the cpp-library plugin, and it is a little more user-friendly than the bare bones C plugin.

I needed a common way to refer to the operating system name, so I put a library function in the root build.gradle file:


ext {
    // Normalize OS name into what Gradle's native plugin actually uses
    normalizedOsName = {
        def os = org.gradle.internal.os.OperatingSystem.current()
        if (os.isWindows()) {
            return "windows"
        } else if (os.isLinux()) {
            return "linux"
        } else if (os.isMacOsX()) {
            return "macos"
        } else if (os.isUnix()) {
            return "unix"
        } else {
            throw new GradleException("Unsupported OS: $os")
        }
    }
}

Then I can refer to it in app/build.gradle:


def osName = rootProject.ext.normalizedOsName()
def buildType = (project.findProperty("nativeBuildType") ?: "debug")

application {
    mainClass = 'org.jtodd.jni.jrmatrix.App'

    applicationDefaultJvmArgs = [
        "-Djava.library.path=${project(":native").layout.buildDirectory.dir("lib/main/${buildType}/${osName}").get().asFile.absolutePath}"
    ]
}

tasks.named("run") {
    dependsOn(":native:assemble")
}

The buildType and osName variables were required because the native plugin puts the library in locations that depend on them.

native/build.gradle was completely rewritten:


plugins {
    id 'cpp-library'
}

library {
    linkage.set([Linkage.SHARED])
    targetMachines = [
        machines.windows.x86_64,
        machines.macOS.x86_64,
        machines.linux.x86_64,
    ]
    baseName = "jnirmatrix"

    binaries.configureEach {
        def compileTask = compileTask.get()
        compileTask.dependsOn(project(":app").tasks.named("compileJava"))

        compileTask.source.from fileTree(dir: "src/main/c", include: "**/*.c")

        def jvmHome = System.getenv("JAVA_HOME")
        compileTask.includes.from(file("$jvmHome/include"))
        compileTask.includes.from(project(":app").layout.buildDirectory.dir("generated/sources/headers/java/main"))

        def os = org.gradle.internal.os.OperatingSystem.current()
        if (os.isWindows()) {
            compileTask.includes.from("$jvmHome/include/win32")
            compileTask.includes.from(file("C:/headers/rashunal/include"))
            compileTask.includes.from(file("C:/headers/rmatrix/include"))
            compileTask.compilerArgs.add("/TC")
        } else if (os.isLinux()) {
            compileTask.includes.from(file("$jvmHome/include/linux"))
            compileTask.compilerArgs.addAll(["-x", "c", "-fPIC", "-std=c11"])
        } else if (os.isMacOsX()) {
            compileTask.includes.from(file("$jvmHome/include/darwin"))
            compileTask.compilerArgs.addAll(["-x", "c", "-fPIC", "-std=c11"])
        } else if (os.isUnix()) {
            compileTask.includes.from(file("$jvmHome/include/freebsd"))
            compileTask.compilerArgs.addAll(["-x", "c", "-fPIC", "-std=c11"])
        } else {
            throw new GradleException("Unsupported OS for JNI build: $os")
        }

        def linkTask = linkTask.get()
        if (toolChain instanceof GccCompatibleToolChain) {
            linkTask.linkerArgs.addAll([
                "-L/usr/local/lib",
                "-lrashunal",
                "-lrmatrix",
                "-Wl,-rpath,/usr/local/lib"
            ])
        } else if (toolChain instanceof VisualCpp) {
            linkTask.linkerArgs.addAll([
                "C:/libs/rashunal.lib",
                "C:/libs/rmatrix.lib"
            ])
        }
    }
}

def osName = rootProject.ext.normalizedOsName().capitalize()
def buildType = (project.findProperty("nativeBuildType") ?: "debug").capitalize()
def targetTaskName = "link${buildType}${osName}"

tasks.named("assemble") {
    dependsOn tasks.named(targetTaskName)
}

The plugin is cpp-library, not cpp-application, because it is building a shared library, not an application. That might have been the problem I had before.

I set the linkage to shared (not static), and the machines I'm targeting. Then I set the base name of the shared library.

The binaries configuration adds a dependency on the app project's compileJava task (which generates the JNI headers) and collects the C source files.

JAVA_HOME is queried and the header directories common to all platforms are added. Additional headers and compiler flags are then added per operating system, and linker arguments are set based on the toolchain.

Finally, the build type and operating system name select which link task the assemble task depends on. They also determine the location of the shared library (build/lib/main/[debug|release]/[linux|macos|windows]).
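
The base name jnirmatrix also determines the filename the JVM looks for on java.library.path, and the platform prefix and extension differ. A quick way to see the expected filename on the current platform (a small illustrative snippet, not from the build itself):

```java
public class LibName {
    public static void main(String[] args) {
        // System.mapLibraryName applies the platform's conventional
        // prefix and extension to a base library name:
        //   Linux   -> libjnirmatrix.so
        //   macOS   -> libjnirmatrix.dylib
        //   Windows -> jnirmatrix.dll
        System.out.println(System.mapLibraryName("jnirmatrix"));
    }
}
```

This is the name System.loadLibrary("jnirmatrix") resolves against the directories on java.library.path, which is why the run task's -Djava.library.path must point at the plugin's output directory.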

Details on Windows compilation

Compiling and linking was especially complicated on Windows. Specifically, the JNI implementation and the target libraries had to match exactly in CPU architecture (32-bit vs. 64-bit) and release configuration (Release vs. Debug). It took a while and a lot of back and forth with ChatGPT to figure it out.
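
Before fighting the linker, it helps to confirm which architecture the JVM itself was built for, since a 32-bit JVM refuses to load a 64-bit DLL (and vice versa) with an UnsatisfiedLinkError. A quick check, as an illustrative snippet:

```java
public class JvmArch {
    public static void main(String[] args) {
        // os.arch reports the JVM's architecture (e.g. "amd64"),
        // sun.arch.data.model its pointer width in bits (e.g. "64").
        System.out.println("os.arch    = " + System.getProperty("os.arch"));
        System.out.println("data model = " + System.getProperty("sun.arch.data.model", "unknown"));
    }
}
```

Whatever this prints is what the native libraries, and the JNI layer that wraps them, have to match.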

  1. Open a 64-bit-specific Visual Studio developer prompt.
    • Building for 64-bit is not the default on Windows, and NMake doesn't let you choose the architecture when it's invoked, hence the dedicated prompt.
    • In the Windows Search bar start typing "x64" and choose "x64 Native Tools Command Prompt for VS 2022".
  2. Make a build directory in the native project. To distinguish it from any ordinary development directory I called it build-release. Change directories into it.
  3. CMake the project. On Windows NMake is the closest analogue to GNU make, and it comes preinstalled with Visual Studio.

    >cmake .. -G "NMake Makefiles" ^
      -DCMAKE_BUILD_TYPE=Release ^
      -DCMAKE_INSTALL_PREFIX=C:/Users/john.todd/local/rashunal ^
      -DCMAKE_C_FLAGS_RELEASE="/MD /O2 /DNDEBUG"
    >nmake
    >nmake install

  4. To verify the architecture, use dumpbin to check the headers of the created DLL.

    >cd /Users/john.todd/local/rashunal/bin
    >dumpbin /headers rashunal.dll | findstr machine
                8664 machine (x64)

  5. Finally, add the full paths to the DLLs to PATH, specify a Release native build, and call the Java class. (The enable-native-access switch isn't required, but it does suppress some warnings.)

    > $env:PATH += ";C:\Users\john.todd\local\rashunal\bin;C:\Users\john.todd\local\rmatrix\bin"
    > ./gradlew clean
    > ./gradlew build -PnativeBuildType=Release
    > ./gradlew run --args="C:/Users/john.todd/source/repos/rmatrix/driver/example.txt"

After all that it finally worked on Windows, joining Linux and MacOS. Not quite as smooth an experience as the others, but it completes the big three operating systems.

https://github.com/proftodd/GoingNative/tree/main/jrmatrix