Monday, September 29, 2025

Going Native - C#

"I belong to the warrior in whom the old ways have joined the new."

Inscription on the sword wielded by Captain Nathan Algren, The Last Samurai

From the JVM to the CLR

After struggling to get the FFM working, I wasn't sure what to expect from .NET. Nevertheless, C# is the language I'm next most familiar with, so I went ahead and plunged in.

Here is a description of the native library.

I started by declaring structs that mirrored the (public) structs in the native libraries:


[StructLayout(LayoutKind.Sequential)]
private struct Rashunal
{
    public int numerator;
    public int denominator;
}
[StructLayout(LayoutKind.Sequential)]
private struct GaussFactorization
{
    public IntPtr PInverse;
    public IntPtr Lower;
    public IntPtr Diagonal;
    public IntPtr Upper;
}

The attributes indicate that the fields are laid out in memory in declaration order, each one directly following the previous one, the same way the C compiler lays out the native structs. IntPtr is the .NET type for a pointer-sized value, used here as a pointer to some memory location. You'll see it again!
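One way to check that claim is to ask the marshaler what it thinks the layout is. The sketch below repeats the two structs at top level (in the post they're nested and private) and prints their unmanaged sizes and one field offset each; `LayoutReport` is a hypothetical helper name, not part of the project.

```csharp
using System;
using System.Runtime.InteropServices;

// Copies of the post's structs, declared at top level for a standalone example.
[StructLayout(LayoutKind.Sequential)]
struct Rashunal
{
    public int numerator;
    public int denominator;
}

[StructLayout(LayoutKind.Sequential)]
struct GaussFactorization
{
    public IntPtr PInverse;
    public IntPtr Lower;
    public IntPtr Diagonal;
    public IntPtr Upper;
}

static class LayoutReport
{
    // Prints the unmanaged size of each struct and the byte offset of its last field,
    // as computed by the interop marshaler.
    public static void Print()
    {
        Console.WriteLine($"Rashunal size: {Marshal.SizeOf<Rashunal>()}");
        Console.WriteLine($"  denominator offset: {Marshal.OffsetOf<Rashunal>(nameof(Rashunal.denominator))}");
        Console.WriteLine($"GaussFactorization size: {Marshal.SizeOf<GaussFactorization>()}");
        Console.WriteLine($"  Upper offset: {Marshal.OffsetOf<GaussFactorization>(nameof(GaussFactorization.Upper))}");
    }
}
```

In a 64-bit process this should report a size of 8 for Rashunal with denominator at offset 4, and a size of 32 for GaussFactorization with Upper at offset 24, which is what a C compiler produces for a struct of two ints and a struct of four pointers.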

Next, the native functions are declared with C# types that match the native signatures, with attributes that say which library to find each one in and what its native name (entry point) is. The methods (and the enclosing class) are declared partial because the LibraryImport source generator supplies their implementations at compile time: stubs that marshal the arguments and call into the native code. By convention the C# method and the native function have the same name, but that's not required; the EntryPoint property is what ties them together.


[LibraryImport("rashunal", EntryPoint = "n_Rashunal")]
private static partial IntPtr n_Rashunal(int numerator, int denominator);

[LibraryImport("rmatrix", EntryPoint = "new_RMatrix")]
private static partial IntPtr new_RMatrix(int height, int width, IntPtr data);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_height")]
private static partial int RMatrix_height(IntPtr m);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_width")]
private static partial int RMatrix_width(IntPtr m);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_get")]
private static partial IntPtr RMatrix_get(IntPtr m, int row, int col);

[LibraryImport("rmatrix", EntryPoint = "RMatrix_gelim")]
private static partial IntPtr RMatrix_gelim(IntPtr m);

Then the native methods can be called alongside normal C# code. I'll go in reverse of the actual process of factoring a matrix using the native code.


public static CsGaussFactorization Factor(Model.CsRMatrix m)
{
    var nativeMPtr = AllocateNativeRMatrix(m);
    var fPtr = RMatrix_gelim(nativeMPtr);
    var f = Marshal.PtrToStructure<GaussFactorization>(fPtr);
    var csF = new CsGaussFactorization
    {
        PInverse = AllocateManagedRMatrix(f.PInverse),
        Lower = AllocateManagedRMatrix(f.Lower),
        Diagonal = AllocateManagedRMatrix(f.Diagonal),
        Upper = AllocateManagedRMatrix(f.Upper),
    };
    NativeStdLib.Free(nativeMPtr);
    NativeStdLib.Free(fPtr);
    return csF;
}

First I call a method to allocate a native matrix (below), and then I call RMatrix_gelim on it, which returns a pointer to a native struct. Since the struct is part of the public native interface it can be unmarshaled into a C# object with the Marshal.PtrToStructure call. Then the native matrix pointers are used to construct managed matrices through the AllocateManagedRMatrix calls (also below). Finally, since the native matrix pointer and the factorization pointer are allocated by the native code, they have to be freed by a call to the native free method. Also see below.
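The unmarshaling step can be tried without the native library at all. In this sketch, Marshal.AllocHGlobal stands in for the native allocation that RMatrix_gelim would make (an assumption made so the example is self-contained), StructureToPtr plays the role of the native code filling in the struct, and PtrToStructure<T> copies it back into a managed value. `UnmarshalDemo` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct GaussFactorization
{
    public IntPtr PInverse;
    public IntPtr Lower;
    public IntPtr Diagonal;
    public IntPtr Upper;
}

static class UnmarshalDemo
{
    // Writes a GaussFactorization into unmanaged memory, then reads it back with
    // Marshal.PtrToStructure<T>, the same call Factor uses on RMatrix_gelim's result.
    public static GaussFactorization RoundTrip(GaussFactorization original)
    {
        IntPtr ptr = Marshal.AllocHGlobal(Marshal.SizeOf<GaussFactorization>());
        try
        {
            // Stand-in for the native side populating the struct.
            Marshal.StructureToPtr(original, ptr, fDeleteOld: false);
            // Copies the bytes at ptr into a new managed struct.
            return Marshal.PtrToStructure<GaussFactorization>(ptr);
        }
        finally
        {
            // Memory goes back to the allocator that produced it.
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

PtrToStructure<T> makes a copy, so the managed struct stays valid after the unmanaged block is freed. The IntPtr fields inside it are copied as plain values, though, so they still point into native memory.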


private static IntPtr AllocRashunal(int num, int den)
{
    IntPtr ptr = NativeStdLib.Malloc((UIntPtr)Marshal.SizeOf<Rashunal>());
    var value = new Rashunal { numerator = num, denominator = den };
    Marshal.StructureToPtr(value, ptr, false);
    return ptr;
}

private static IntPtr AllocateNativeRMatrix(Model.CsRMatrix m)
{
    int elementCount = m.Height * m.Width;
    IntPtr elementArray = NativeStdLib.Malloc((UIntPtr)(IntPtr.Size * elementCount));
    unsafe
    {
        var pArray = (IntPtr*)elementArray;
        for (int i = 0; i < elementCount; ++i)
        {
            var element = m.Data[i];
            var elementPtr = AllocRashunal(element.Numerator, element.Denominator);
            pArray[i] = elementPtr;
        }
        var rMatrixPtr = new_RMatrix(m.Height, m.Width, elementArray);
        for (int i = 0; i < elementCount; ++i)
        {
            NativeStdLib.Free(pArray[i]);
        }
        NativeStdLib.Free(elementArray);
        return rMatrixPtr;
    }
}

Allocating a native RMatrix required native memory allocations, both for the individual Rashunals and for an array of Rashunal pointers. In a pattern that seems familiar now, I wrapped those calls in a NativeStdLib class that I promise to get to very soon. Allocating a Rashunal involves declaring a managed Rashunal struct, allocating native memory for it, and marshaling the struct into that memory. The unsafe block is needed to treat the memory allocated for the pointer array as an actual array of pointers instead of a block of unstructured memory. To get this to compile I had to add <AllowUnsafeBlocks>True</AllowUnsafeBlocks> to the PropertyGroup in the project file. Finally, I have to free both the individually allocated native Rashunals and the array of pointers to them, since new_RMatrix makes copies of them all.
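If turning on unsafe code isn't desirable, the Marshal class can fill the same array of pointers with ordinary method calls, since each slot sits IntPtr.Size bytes after the previous one. This is a sketch of that alternative, not what the post's code does; it uses Marshal.AllocHGlobal/FreeHGlobal in place of NativeStdLib.Malloc/Free so it runs on its own, and `PointerArray` is a hypothetical name.

```csharp
using System;
using System.Runtime.InteropServices;

static class PointerArray
{
    // Allocates an unmanaged array with one pointer-sized slot per element and
    // fills it without an unsafe block.
    public static IntPtr Allocate(IntPtr[] pointers)
    {
        IntPtr array = Marshal.AllocHGlobal(IntPtr.Size * pointers.Length);
        for (int i = 0; i < pointers.Length; ++i)
        {
            // Slot i starts i * IntPtr.Size bytes into the block.
            Marshal.WriteIntPtr(array, i * IntPtr.Size, pointers[i]);
        }
        return array;
    }

    // Reads slot i back out of the unmanaged array.
    public static IntPtr Get(IntPtr array, int i) => Marshal.ReadIntPtr(array, i * IntPtr.Size);

    // Must pair with Allocate: AllocHGlobal memory is released with FreeHGlobal.
    public static void Free(IntPtr array) => Marshal.FreeHGlobal(array);
}
```

Marshal.Copy(IntPtr[], int, IntPtr, int) can also copy a whole managed array of pointers into unmanaged memory in a single call.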


private static Model.CsRMatrix AllocateManagedRMatrix(IntPtr m)
{
    int height = RMatrix_height(m);
    int width = RMatrix_width(m);
    var data = new CsRashunal[height * width];
    for (int i = 1; i <= height; ++i)
    {
        for (int j = 1; j <= width; ++j)
        {
            var rPtr = RMatrix_get(m, i, j);
            var r = Marshal.PtrToStructure<Rashunal>(rPtr);
            data[(i - 1) * width + (j - 1)] = new CsRashunal { Numerator = r.numerator, Denominator = r.denominator };
            NativeStdLib.Free(rPtr);
        }
    }
    return new Model.CsRMatrix { Height = height, Width = width, Data = data, };
}

After all that, allocating a managed RMatrix is not very interesting. The native RMatrix_get method returns a newly allocated copy of the Rashunal at a given position in the RMatrix, so it has to be freed the same way as before.
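The index arithmetic in AllocateManagedRMatrix converts RMatrix_get's 1-based row and column numbers into a position in the 0-based, row-major Data array. Pulled out on its own as hypothetical `RowMajor` helpers, with a delegate standing in for the RMatrix_get call and struct copy, it looks like this:

```csharp
using System;

static class RowMajor
{
    // Maps a 1-based (row, col) to a 0-based index in a row-major array.
    public static int Index(int row, int col, int width) => (row - 1) * width + (col - 1);

    // Same loop shape as AllocateManagedRMatrix: visit every 1-based (row, col)
    // and store the result of `get` at the matching row-major position.
    public static T[] Fill<T>(int height, int width, Func<int, int, T> get)
    {
        var data = new T[height * width];
        for (int i = 1; i <= height; ++i)
        {
            for (int j = 1; j <= width; ++j)
            {
                data[Index(i, j, width)] = get(i, j);
            }
        }
        return data;
    }
}
```

For a matrix 3 entries wide, row 2, column 3 lands at index (2 - 1) * 3 + (3 - 1) = 5.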

OK, finally, as promised, here is the class that loads the native standard library methods:


using System.Reflection;
using System.Runtime.InteropServices;

namespace CsRMatrix.Engine;

public static partial class NativeStdLib
{
    static NativeStdLib()
    {
        NativeLibrary.SetDllImportResolver(typeof(NativeStdLib).Assembly, ResolveLib);
    }

    private static IntPtr ResolveLib(string libraryName, Assembly assembly, DllImportSearchPath? searchPath)
    {
        if (libraryName == "c")
        {
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
                return NativeLibrary.Load("ucrtbase.dll", assembly, searchPath);
            if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
                return NativeLibrary.Load("libc.so.6", assembly, searchPath);
            if (RuntimeInformation.IsOSPlatform(OSPlatform.OSX))
                return NativeLibrary.Load("libSystem.dylib", assembly, searchPath);
        }
        return IntPtr.Zero;
    }

    [LibraryImport("c", EntryPoint = "free")]
    internal static partial void Free(IntPtr ptr);

    [LibraryImport("c", EntryPoint = "malloc")]
    internal static partial IntPtr Malloc(UIntPtr size);
}

The platform-specific switching and filenames are pretty ugly, but neither ChatGPT nor I could find a way around it. At least it's confined to a single method in a single class in the project.
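This doesn't get rid of the per-platform names, but one small variation is to separate choosing the file name from loading it, so the choice can be tested on its own and an unexpected platform falls through instead of throwing. OperatingSystem.IsWindows/IsLinux/IsMacOS and NativeLibrary.TryLoad are standard APIs in .NET 5 and later; `CRuntime` is a hypothetical name, and the file names are the same ones the post's resolver uses.

```csharp
#nullable enable
using System;
using System.Runtime.InteropServices;

static class CRuntime
{
    // File name of the C runtime that provides malloc/free on this OS,
    // or null if the platform isn't one of the three handled here.
    public static string? FileName()
    {
        if (OperatingSystem.IsWindows()) return "ucrtbase.dll";
        if (OperatingSystem.IsLinux()) return "libc.so.6";
        if (OperatingSystem.IsMacOS()) return "libSystem.dylib";
        return null;
    }

    // Tries to load that library. Returns IntPtr.Zero instead of throwing when the
    // platform is unknown or the load fails.
    public static IntPtr TryLoad()
    {
        string? name = FileName();
        return name != null && NativeLibrary.TryLoad(name, out IntPtr handle) ? handle : IntPtr.Zero;
    }
}
```

A resolver could then return `libraryName == "c" ? CRuntime.TryLoad() : IntPtr.Zero`; returning IntPtr.Zero from a DllImportResolver tells the runtime to fall back to its default search.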

ChatGPT really wanted there to be library-specific ways to free Rashunals and factorizations. Then those methods could be declared and called the same way as the new_* methods. But I remained stubborn and said I didn't want to change the source code of the libraries. I was willing to recompile them as needed, but not to change the source code or the CMake files. Eventually, we found this way of handling the standard native library calls.

Getting the file name right on Windows, and getting all of this to compile and work, was a little challenging. The C# code and the native code need to match exactly in operating system (obviously), architecture (64-bit vs. 32-bit), and configuration (Debug vs. Release). It took attention to a few more details than compiling the JNI code did.
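Of those three, the architecture is the one the managed side can check for itself. On Windows a mismatch typically surfaces as a BadImageFormatException on the first native call; a guard like this hypothetical `NativeAbiCheck`, called before any native call, reports it more directly. It cannot detect a Debug/Release mismatch.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeAbiCheck
{
    // Fails fast with a readable message if this process's architecture differs
    // from the one the native libraries were built for (e.g. Architecture.X64).
    public static void RequireArchitecture(Architecture expected)
    {
        Architecture actual = RuntimeInformation.ProcessArchitecture;
        if (actual != expected)
        {
            throw new PlatformNotSupportedException(
                $"Native libraries were built for {expected}, but this process is {actual}.");
        }
    }

    // Pointer width of the current process: IntPtr.Size is 8 in a 64-bit
    // process and 4 in a 32-bit one.
    public static bool Is64BitProcess() => IntPtr.Size == 8;
}
```

For the Windows build described below, the call would be `NativeAbiCheck.RequireArchitecture(Architecture.X64)`; on an Apple Silicon Mac the expected value would presumably be Architecture.Arm64.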

Compiling on Windows

Windows is very careful about which library can free memory: memory has to be freed by the same C runtime that allocated it. In practice, that meant making sure I allocated and freed memory through the same runtime, using the same C runtime model. So I needed to compile with the multi-threaded DLL flag (/MD) instead of the default multi-threaded flag (/MT). I also needed the right file name for the runtime library to link to. ChatGPT and I initially thought it was msvcrt. So I modified the steps to compile the library and checked its headers, imports, and dependencies. This was again in an x64 Native Tools Command Prompt for VS 2022.


>cmake .. -G "NMake Makefiles" ^
  -DCMAKE_BUILD_TYPE=Release ^
  -DCMAKE_INSTALL_PREFIX=C:/Users/john.todd/local/rashunal ^
  -DCMAKE_C_FLAGS_RELEASE="/MD /O2 /DNDEBUG"
>nmake
>nmake install
>cd /Users/john.todd/local/rashunal/bin
>dumpbin /headers rashunal.dll | findstr machine
            8664 machine (x64)

>dumpbin /imports rashunal.dll | findstr free
                          18 free

>dumpbin /dependents rashunal.dll

I didn't see msvcrt.dll, but did see VCRUNTIME140.DLL instead. ChatGPT said, "Ah, that's okay, that's actually better. msvcrt is the old way, ucrt (Universal CRT) is the new way." Then linking to "ucrtbase" in the NativeStdLib utility class (as shown above) worked.

As with JNI, I had to add the directories containing the Rashunal and RMatrix libraries to the PATH, and then it worked!


> $env:PATH += ";C:\Users\john.todd\local\rashunal\bin;C:\Users\john.todd\local\rmatrix\bin"
> dotnet run C:\Users\john.todd\source\repos\rmatrix\driver\example.txt
Using launch settings from C:\Users\john.todd\source\repos\GoingNative\CsRMatrix\CsRMatrix\Properties\launchSettings.json...
Reading matrix from C:/Users/john.todd/source/repos/rmatrix/driver/example.txt
Starting Matrix:
[ {-2/1} {1/3} {-3/4} ]
[ {6/1} {-1/1} {8/1} ]
[ {8/1} {3/2} {-7/1} ]


PInverse:
[ {1/1} {0/1} {0/1} ]
[ {0/1} {0/1} {1/1} ]
[ {0/1} {1/1} {0/1} ]


Lower:
[ {1/1} {0/1} {0/1} ]
[ {-3/1} {1/1} {0/1} ]
[ {-4/1} {0/1} {1/1} ]


Diagonal:
[ {-2/1} {0/1} {0/1} ]
[ {0/1} {17/6} {0/1} ]
[ {0/1} {0/1} {23/4} ]


Upper:
[ {1/1} {-1/6} {3/8} ]
[ {0/1} {1/1} {-60/17} ]
[ {0/1} {0/1} {1/1} ]

What's even more exciting is that when I committed this to GitHub and pulled it down on Linux and macOS, it also just worked (on macOS after adding the install directories to DYLD_LIBRARY_PATH, similarly to what I had to do with JNI).

Optimization

Remembering to free pointers allocated by native code isn't so bad. I had to do it in Java with the FFM and when writing the libraries in the first place. But ChatGPT suggested an optimization that has the CLR do it automatically. After I reassured it many times that the new_*, RMatrix_get, and RMatrix_gelim native methods return pointers to newly allocated copies of the relevant entities, not pointers to the entities themselves, it said this was the perfect application of the SafeHandle pattern. Who can pass that up?

First I wrote some wrapper classes for the pointers returned from the native code:


internal abstract class NativeHandle : SafeHandle
{
    protected NativeHandle() : base(IntPtr.Zero, ownsHandle: true) { }

    protected NativeHandle(IntPtr existing, bool ownsHandle)
        : base(IntPtr.Zero, ownsHandle)
        => SetHandle(existing);

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        NativeStdLib.Free(handle);
        return true;
    }
}

internal sealed class RashunalHandle : NativeHandle
{
    internal RashunalHandle() : base() { }

    internal RashunalHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

internal sealed class RMatrixHandle : NativeHandle
{
    internal RMatrixHandle() : base() { }

    internal RMatrixHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

internal sealed class GaussFactorizationHandle : NativeHandle
{
    internal GaussFactorizationHandle() : base() { }

    internal GaussFactorizationHandle(IntPtr existing, bool ownsHandle)
        : base(existing, ownsHandle) { }
}

Then I had most of the native and managed code use the handles as parameters and return values instead of the pointers returned by the native code:


[DllImport("rashunal", EntryPoint = "n_Rashunal")]
private static extern RashunalHandle n_Rashunal(int numerator, int denominator);

[DllImport("rmatrix", EntryPoint = "new_RMatrix")]
private static extern RMatrixHandle new_RMatrix(int height, int width, IntPtr data);

[DllImport("rmatrix", EntryPoint = "RMatrix_height")]
private static extern int RMatrix_height(RMatrixHandle m);

[DllImport("rmatrix", EntryPoint = "RMatrix_width")]
private static extern int RMatrix_width(RMatrixHandle m);

[DllImport("rmatrix", EntryPoint = "RMatrix_get")]
private static extern RashunalHandle RMatrix_get(RMatrixHandle m, int row, int col);

[DllImport("rmatrix", EntryPoint = "RMatrix_gelim")]
private static extern GaussFactorizationHandle RMatrix_gelim(RMatrixHandle m);

private static Model.CsRMatrix AllocateManagedRMatrix(RMatrixHandle m)
{
    int height = RMatrix_height(m);
    int width = RMatrix_width(m);
    var data = new CsRashunal[height * width];
    for (int i = 1; i <= height; ++i)
    {
        for (int j = 1; j <= width; ++j)
        {
            using var rPtr = RMatrix_get(m, i, j);
            var r = Marshal.PtrToStructure<Rashunal>(rPtr.DangerousGetHandle());
            data[(i - 1) * width + (j - 1)] = new CsRashunal { Numerator = r.numerator, Denominator = r.denominator };
        }
    }
    return new Model.CsRMatrix { Height = height, Width = width, Data = data, };
}

Note the switch from LibraryImport to DllImport on the function declarations. LibraryImport is newer and preferred, but for some reason I couldn't get it to do the automatic marshaling of pointers into handles the way DllImport does.

Now there's no need to explicitly free the pointers returned from RMatrix_get, n_Rashunal, new_RMatrix, and RMatrix_gelim. There are still some places where I have to remember to free memory, such as the array of Rashunal pointers allocated in AllocateNativeRMatrix. There are also some calls to ptr.DangerousGetHandle() when I need to marshal a pointer into a struct. I tried to get rid of those, but apparently they are unavoidable.
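The lifecycle the handles rely on can be seen without the native libraries. In this sketch, a hypothetical `HGlobalHandle` owns memory from Marshal.AllocHGlobal in place of NativeStdLib.Malloc, and a static counter (for illustration only) records when ReleaseHandle runs. The hypothetical `ReadStruct<T>` extension doesn't eliminate DangerousGetHandle(), but it confines it to one place and wraps it in DangerousAddRef/DangerousRelease, which keep the handle from being released while the raw pointer is in use.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Stand-in for the post's NativeHandle: frees with Marshal.FreeHGlobal
// instead of NativeStdLib.Free.
sealed class HGlobalHandle : SafeHandle
{
    // Number of times ReleaseHandle has run; only here to make the lifecycle visible.
    public static int ReleaseCount;

    public HGlobalHandle(int size) : base(IntPtr.Zero, ownsHandle: true)
        => SetHandle(Marshal.AllocHGlobal(size));

    public override bool IsInvalid => handle == IntPtr.Zero;

    // The runtime calls this at most once: from Dispose, or from the finalizer
    // if the handle was never disposed.
    protected override bool ReleaseHandle()
    {
        Marshal.FreeHGlobal(handle);
        Interlocked.Increment(ref ReleaseCount);
        return true;
    }
}

static class SafeHandleExtensions
{
    // Copies the struct the handle points to, holding a reference on the handle
    // so it can't be released on another thread while its raw pointer is in use.
    public static T ReadStruct<T>(this SafeHandle h) where T : struct
    {
        bool added = false;
        try
        {
            h.DangerousAddRef(ref added);
            return Marshal.PtrToStructure<T>(h.DangerousGetHandle());
        }
        finally
        {
            if (added) h.DangerousRelease();
        }
    }
}
```

With a helper like this, the loop body in AllocateManagedRMatrix could become `using var rPtr = RMatrix_get(m, i, j); var r = rPtr.ReadStruct<Rashunal>();`. That is a suggestion, not code from the repository.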

I didn't like the repeated boilerplate code in the concrete subclasses of NativeHandle. I wanted to just use NativeHandle as a generic, i.e. NativeHandle<T>, but that didn't work. ChatGPT said I needed a concrete class to marshal the native struct into, and that the structs I declared in the adapter wouldn't do it. That's also why the parameterless constructors are needed: the marshaling code uses them, even though they do nothing but defer to the base class. So be it.

Reflection

After struggling so much with FFM, I was pleasantly surprised by how easy it was to work with C# and its method of calling native code. Interspersing the native calls with the managed code was pretty fun and easy, especially after refactoring to use handles to automatically dispose of allocated memory. It was a little tricky figuring out when I still had to marshal pointers into structs or vice versa, but the compiler and ChatGPT helped me figure it out pretty quickly.

So far, if given the choice of how to call my native libraries, C# and the CLR are definitely how I would do it.

Code repository

https://github.com/proftodd/GoingNative/tree/main/CsRMatrix
