Discussion:
[Swig-devel] Questions about binary data typemaps
Vadim Zeitlin
2015-07-29 22:04:14 UTC
Hello,

As http://www.swig.org/Doc3.0/Library.html#Library_nn10 explains, SWIG
provides convenient (char* STRING, size_t LENGTH) typemaps for passing
binary data. Unfortunately I have two problems with it:

The first one is relatively trivial and is that in my case I have a
function taking (as is not uncommon IME) "unsigned char* data, size_t
length" parameters and doing

%apply (char *STRING, size_t LENGTH) { (unsigned char* data, size_t length) };

results in uncompilable code (at least for Java, but probably other
languages too) because the generated code tries to assign a "char*" to an
"unsigned char*", which C++ does not allow without an explicit cast. I could
work around this by doing

%extend {
void MyFunc(char* data, size_t length) {
self->MyFunc(reinterpret_cast<unsigned char*>(data), length);
}
}

which works but is rather ugly.


The second problem is that this typemap is not available in all languages,
notably not in C# so currently this doesn't work at all there.


I'd like to fix both problems, especially the latter one but also the
former one if there are no objections to extending the typemap to the
unsigned case. The trouble is that I'm not sure how to do it because the
existing typemap seems to be defined in a number of places:

1. Lib/cdata.i
2. In various files under Lib/{chicken,go,guile,java,ocaml,php,pike,r}
3. Lib/typemaps/string.swg (via Lib/typemaps/strings.swg included by it)

It looks like (3) is supposed to be the new way to do it, but why then
does Java do it in its own way? To be even more clear, my question is
whether C# should follow Java or use the generic typemaps library?

Thanks in advance for any hints,
VZ
William S Fulton
2015-08-03 19:10:18 UTC
I thought these typemaps already had the appropriate casts because
they are already used for slightly different types, eg:

Examples/test-suite/char_binary.i:%apply (char *STRING, size_t LENGTH) { (const char *str, size_t len) }
Examples/test-suite/director_binary_string.i:%apply (char* STRING, size_t LENGTH) { (const void* data, size_t datalen) };

Anyway, by all means add in the appropriate casts to char *. We use C
casts usually rather than reinterpret_cast, except in the UTL
languages which can use %reinterpret_cast which results in either a C
or C++ cast. I suggest adding this into all the different places and
making sure it is tested across the board by enhancing the
director_binary_string.i test.
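
The idea of a cast macro that expands to either a C or a C++ cast can be sketched roughly as below. This is only an illustration: the macro name MY_REINTERPRET_CAST and the helper FirstByte are hypothetical, and this is not SWIG's actual definition of %reinterpret_cast.

```cpp
// Hypothetical macro (not SWIG's own): picks the cast style by language.
#ifdef __cplusplus
#  define MY_REINTERPRET_CAST(Type, expr) reinterpret_cast<Type>(expr)
#else
#  define MY_REINTERPRET_CAST(Type, expr) ((Type)(expr))
#endif

// Hypothetical helper: reads the first byte of a char buffer as unsigned.
// The explicit cast is what lets a char* be used as an unsigned char*.
static unsigned char FirstByte(char *data) {
    unsigned char *bytes = MY_REINTERPRET_CAST(unsigned char *, data);
    return bytes[0];
}
```

The same source then compiles as either C or C++, which is presumably why a macro is preferred over spelling out one cast style in shared typemaps.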

I'd also like to see a solution for C#. A while ago when I expanded
support for a number of languages, I couldn't think of one for C#. The
other day I thought of some half baked ideas though around passing a
binary blob of data in something like a StringBuffer where the size is
in the first 4 or 8 bytes and the string in the remaining data. Did
you have a cunning plan up your sleeve? C# and Java do not use many
common typemaps in Lib/typemaps, so just add the C# support into
csharp.swg in the same sort of place that java.swg contains these
typemaps.

William

Vadim Zeitlin
2015-08-04 15:23:50 UTC
On Mon, 3 Aug 2015 20:10:18 +0100 William S Fulton <***@fultondesigns.co.uk> wrote:

WSF> I thought these typemaps already had the appropriate casts because
WSF> they are already used for slightly different types, eg:
WSF>
WSF> Examples/test-suite/char_binary.i:%apply (char *STRING, size_t LENGTH)
WSF> { (const char *str, size_t len) }
WSF> Examples/test-suite/director_binary_string.i:%apply (char* STRING,
WSF> size_t LENGTH) { (const void* data, size_t datalen) };

That's because casting from "char*" to both "const char*" and "const
void*" works -- but casting to "unsigned char*" doesn't. Anyhow, I'll add
the casts.
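
The difference between these conversions can be checked at compile time. The following is a small sketch using standard type traits; the helper name NeedsCast is made up for the example.

```cpp
#include <type_traits>

// Adding const, or converting an object pointer to (const) void*, is implicit.
static_assert(std::is_convertible<char*, const char*>::value,
              "char* -> const char* is implicit");
static_assert(std::is_convertible<char*, const void*>::value,
              "char* -> const void* is implicit");

// char and unsigned char are distinct types, so pointers to them
// do not convert implicitly; an explicit cast is needed.
static_assert(!std::is_convertible<char*, unsigned char*>::value,
              "char* -> unsigned char* needs a cast");

// Hypothetical helper so the same fact can also be checked at run time.
inline bool NeedsCast() {
    return !std::is_convertible<char*, unsigned char*>::value;
}
```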

WSF> Anyway, by all means add in the appropriate casts to char *. We use C
WSF> casts usually rather than reinterpret_cast, except in the UTL

It must be something embarrassingly simple, but I can't figure out what
"UTL" stands for.

WSF> I'd also like to see a solution for C#. A while ago when I expanded
WSF> support for a number of languages, I couldn't think of one for C#. The
WSF> other day I thought of some half baked ideas though around passing a
WSF> binary blob of data in something like a StringBuffer where the size is
WSF> in the first 4 or 8 bytes and the string in the remaining data. Did
WSF> you have a cunning plan up your sleeve?

Nothing really cunning, I just thought to pass it as 2 parameters in the
intermediate level and unwrap a byte[] into them in C# code. Am I missing
some reason for which this wouldn't work?

Regards,
VZ
William S Fulton
2015-08-04 19:03:28 UTC
Post by Vadim Zeitlin
> On Mon, 3 Aug 2015 20:10:18 +0100 William S Fulton <***@fultondesigns.co.uk> wrote:
>
> WSF> I thought these typemaps already had the appropriate casts because
> WSF> they are already used for slightly different types, eg:
> WSF>
> WSF> Examples/test-suite/char_binary.i:%apply (char *STRING, size_t LENGTH)
> WSF> { (const char *str, size_t len) }
> WSF> Examples/test-suite/director_binary_string.i:%apply (char* STRING,
> WSF> size_t LENGTH) { (const void* data, size_t datalen) };
>
> That's because casting from "char*" to both "const char*" and "const
> void*" works -- but casting to "unsigned char*" doesn't. Anyhow, I'll add
> the casts.
>
> WSF> Anyway, by all means add in the appropriate casts to char *. We use C
> WSF> casts usually rather than reinterpret_cast, except in the UTL
>
> It must be something embarrassingly simple, but I can't figure out what
> "UTL" stands for.

Yes, it doesn't appear in the docs, and I'm afraid that is the case for
most of what Marcelo did. However, it is in the CHANGES files:

10/18/2005: mmatus
            Added the Unified Typemap Library (UTL). It unifies the typemaps for
            python, ruby, tcl

Post by Vadim Zeitlin
> WSF> I'd also like to see a solution for C#. A while ago when I expanded
> WSF> support for a number of languages, I couldn't think of one for C#. The
> WSF> other day I thought of some half baked ideas though around passing a
> WSF> binary blob of data in something like a StringBuffer where the size is
> WSF> in the first 4 or 8 bytes and the string in the remaining data. Did
> WSF> you have a cunning plan up your sleeve?
>
> Nothing really cunning, I just thought to pass it as 2 parameters in the
> intermediate level and unwrap a byte[] into them in C# code. Am I missing
> some reason for which this wouldn't work?

Ah I'm afraid it wouldn't work. The multi-arg typemaps collapse all the
input typemaps into one argument for all layers.

My not so cunning plan was to pass the data as say byte[] on the C# side
and then set the first 8 bytes as the length. The C side would then have
the length from the 1st 8 bytes and set the char *pointer to the beginning
plus 8. For the directorin typemaps, the array would need to be copied to
get the 8 byte offset (not good), otherwise use a callback to C# to create
the byte[], I don't know!
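
To make the length-prefix idea concrete, here is a rough C++ sketch of both sides. The function names PackWithLength and UnpackWithLength and the host-byte-order 8-byte prefix are assumptions made for the illustration, not anything that exists in SWIG.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout: an 8-byte length in host byte order, then the payload.
static const std::size_t kPrefixSize = sizeof(std::uint64_t);

// Caller side (hypothetical): builds the single prefixed buffer from a
// pointer and a length. Note that this copies the payload.
std::vector<unsigned char> PackWithLength(const char* data, std::size_t length) {
    std::vector<unsigned char> buffer(kPrefixSize + length);
    const std::uint64_t stored = length;
    std::memcpy(buffer.data(), &stored, kPrefixSize);
    if (length > 0)
        std::memcpy(buffer.data() + kPrefixSize, data, length);
    return buffer;
}

// Callee side (hypothetical): recovers the (pointer, length) pair without
// copying the payload. The buffer must hold at least kPrefixSize bytes.
const char* UnpackWithLength(const unsigned char* buffer, std::size_t* length) {
    std::uint64_t stored = 0;
    std::memcpy(&stored, buffer, kPrefixSize);
    *length = static_cast<std::size_t>(stored);
    return reinterpret_cast<const char*>(buffer + kPrefixSize);
}
```

Unpacking is cheap because the data pointer is just the buffer start plus 8, while packing needs a copy to make room for the prefix, which is the cost mentioned above for the directorin direction.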

A less optimal solution might be to keep the 2 parameters in the C# layer
as say (byte[] data, object datalen) and use boxing, so that if datalen ==
null, obtain datalen = data.Length. A user can then optionally override
with a different value of the length.

William
Vadim Zeitlin
2015-08-05 21:40:46 UTC
On Tue, 4 Aug 2015 20:03:28 +0100 William S Fulton <***@fultondesigns.co.uk> wrote:

WSF> > WSF> I'd also like to see a solution for C#. A while ago when I expanded
WSF> > WSF> support for a number of languages, I couldn't think of one for C#. The
WSF> > WSF> other day I thought of some half baked ideas though around passing a
WSF> > WSF> binary blob of data in something like a StringBuffer where the size is
WSF> > WSF> in the first 4 or 8 bytes and the string in the remaining data. Did
WSF> > WSF> you have a cunning plan up your sleeve?
WSF> >
WSF> > Nothing really cunning, I just thought to pass it as 2 parameters in the
WSF> > intermediate level and unwrap a byte[] into them in C# code. Am I missing
WSF> > some reason for which this wouldn't work?
WSF>
WSF> Ah I'm afraid it wouldn't work. The multi-arg typemaps collapse all the
WSF> input typemaps into one argument for all layers.

As usual whenever I start looking at a new (for me) area of SWIG, I'm
completely lost: am I right in thinking that multi-argument typemaps are
not supported at all for C#? At least if I do

%typemap(imtype) (const char *data, size_t size) "byte[]"

I get a warning about no imtype being defined for size_t and the generated
code is completely wrong as it contains just "(byte[] data, size)". Am I
doing it wrong (but this seems to work for the other languages?) or is this
something that needs to be fixed in the C# module?

WSF> My not so cunning plan was to pass the data as say byte[] on the C# side

So I'm already having trouble doing just this...

WSF> and then set the first 8 bytes as the length. The C side would then have
WSF> the length from the 1st 8 bytes and set the char *pointer to the beginning
WSF> plus 8. For the directorin typemaps, the array would need to be copied to
WSF> get the 8 byte offset (not good), otherwise use a callback to C# to create
WSF> the byte[], I don't know!

Would it be acceptable to not support this typemap for the directors? I
don't see any good way to handle this either, and I think using binary data
with directors should be much rarer: typically you pass or get such
data directly from somewhere, and I don't think I've seen many virtual
functions using it in C++.

WSF> A less optimal solution might be to keep the 2 parameters in the C# layer
WSF> as say (byte[] data, object datalen) and use boxing, so that if datalen ==
WSF> null, obtain datalen = data.Length. A user can then optionally override
WSF> with a different value of the length.

FWIW this example works for me with C# (and also Java, Python, Perl,
Ruby but this is not new):

---------------------------------- >8 --------------------------------------
%module bd

#ifdef SWIGCSHARP
// Pass the data pointer as byte[] in the intermediary and proxy layers.
%typemap(imtype) const char *data "byte[]"
%typemap(cstype) const char *data "byte[]"

// Public overload that takes the length from the array itself.
%pragma(csharp) modulecode=%{
  public static uint CountZeroes(byte[] data) {
    return CountZeroes(data, (uint)data.Length);
  }
%}
// Hide the generated two-argument wrapper.
%csmethodmodifiers CountZeroes "private";
#else
%apply (char *STRING, size_t LENGTH) { (const char* data, size_t size) };
#endif

%inline %{
size_t CountZeroes(const char* data, size_t size) {
  size_t nuls = 0;
  for (size_t n = 0; n < size; n++) {
    if (*data++ == '\0')
      nuls++;
  }
  return nuls;
}
%}
---------------------------------- >8 --------------------------------------

But I have no idea how to generate the code currently in %pragma(csharp)
using the typemaps machinery. For classes this is actually done with the
cscode typemap, but that typemap applies to the class, not to the
parameters or even the function using them.
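
For comparison, the shape of that convenience overload can be sketched in plain C++. The std::vector overload below is a hypothetical analogue of the C# byte[] overload in the pragma, written only to show the forwarding pattern; it is not something SWIG generates.

```cpp
#include <cstddef>
#include <vector>

// Same logic as the CountZeroes function in the interface file above.
std::size_t CountZeroes(const char* data, std::size_t size) {
    std::size_t nuls = 0;
    for (std::size_t n = 0; n < size; n++) {
        if (data[n] == '\0')
            nuls++;
    }
    return nuls;
}

// Hypothetical convenience overload: the length comes from the container,
// so the caller passes a single argument, as with byte[] in C#.
std::size_t CountZeroes(const std::vector<char>& data) {
    return CountZeroes(data.data(), data.size());
}
```

The point of the pattern is that embedded NUL bytes survive, because the length is carried by the container rather than inferred from a terminator.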

Also, is there a way to avoid generating the method in the module class
instead of generating it and making it private? I could also call
bdPINVOKE.CountZeroes() directly from the public overload.

I'd like to make the above work automatically in C# but I'd need some
hints from you to advance. Otherwise I'll just stick with this ad hoc
solution for my own needs for now...

Thanks,
VZ
William S Fulton
2015-08-07 20:49:08 UTC
Post by Vadim Zeitlin
> WSF> > WSF> I'd also like to see a solution for C#. A while ago when I expanded
> WSF> > WSF> support for a number of languages, I couldn't think of one for C#. The
> WSF> > WSF> other day I thought of some half baked ideas though around passing a
> WSF> > WSF> binary blob of data in something like a StringBuffer where the size is
> WSF> > WSF> in the first 4 or 8 bytes and the string in the remaining data. Did
> WSF> > WSF> you have a cunning plan up your sleeve?
> WSF> >
> WSF> > Nothing really cunning, I just thought to pass it as 2 parameters in the
> WSF> > intermediate level and unwrap a byte[] into them in C# code. Am I missing
> WSF> > some reason for which this wouldn't work?
> WSF>
> WSF> Ah I'm afraid it wouldn't work. The multi-arg typemaps collapse all the
> WSF> input typemaps into one argument for all layers.
>
> As usual whenever I start looking at a new (for me) area of SWIG, I'm
> completely lost: am I right in thinking that multi-argument typemaps are
> not supported at all for C#? At least if I do
>
> %typemap(imtype) (const char *data, size_t size) "byte[]"
>
> I get a warning about no imtype being defined for size_t and the generated
> code is completely wrong as it contains just "(byte[] data, size)". Am I
> doing it wrong (but this seems to work for the other languages?) or is this
> something that needs to be fixed in the C# module?

Maybe the warnings aren't very good for multi-arg typemaps, but usage
of multi-arg typemaps always requires the 'in' typemap. If you don't
have this typemap defined not much will work. The argument count in
all the layers is based off the 'in' typemap. You will need to define
the whole family of other typemaps (in, imtype, ctype, cstype,
csin...) using the same multi-arg types.

Post by Vadim Zeitlin
> WSF> My not so cunning plan was to pass the data as say byte[] on the C# side
>
> So I'm already having trouble doing just this...
>
> WSF> and then set the first 8 bytes as the length. The C side would then have
> WSF> the length from the 1st 8 bytes and set the char *pointer to the beginning
> WSF> plus 8. For the directorin typemaps, the array would need to be copied to
> WSF> get the 8 byte offset (not good), otherwise use a callback to C# to create
> WSF> the byte[], I don't know!
>
> Would it be acceptable to not support this typemap for the directors? I
> don't see any good way to handle this either, and I think using binary data
> with directors should be much rarer: typically you pass or get such
> data directly from somewhere, and I don't think I've seen many virtual
> functions using it in C++.

It can't be much more work for directors and I think the tests will
fail if you miss them out. I'll have a go if you get a good solution
for the non-director typemaps.

Post by Vadim Zeitlin
> WSF> A less optimal solution might be to keep the 2 parameters in the C# layer
> WSF> as say (byte[] data, object datalen) and use boxing, so that if datalen ==
> WSF> null, obtain datalen = data.Length. A user can then optionally override
> WSF> with a different value of the length.
>
> FWIW this example works for me with C# (and also Java, Python, Perl,
> Ruby but this is not new):
>
> ---------------------------------- >8 --------------------------------------
> %module bd
>
> #ifdef SWIGCSHARP
> %typemap(imtype) const char *data "byte[]"
> %typemap(cstype) const char *data "byte[]"
>
> %pragma(csharp) modulecode=%{
>   public static uint CountZeroes(byte[] data) {
>     return CountZeroes(data, (uint)data.Length);
>   }
> %}
> %csmethodmodifiers CountZeroes "private";
> #else
> %apply (char *STRING, size_t LENGTH) { (const char* data, size_t size) };
> #endif
>
> %inline %{
> size_t CountZeroes(const char* data, size_t size) {
>   size_t nuls = 0;
>   for (size_t n = 0; n < size; n++) {
>     if (*data++ == '\0')
>       nuls++;
>   }
>   return nuls;
> }
> %}
> ---------------------------------- >8 --------------------------------------
>
> But I have no idea how to generate the code currently in %pragma(csharp)
> using the typemaps machinery. For classes this is actually done with the
> cscode typemap, but that typemap applies to the class, not to the
> parameters or even the function using them.

Not possible using typemaps currently. Now you can see why I didn't
really have any decent solution and why my proposal was around using a
single type and hacking it with the length prefix.

Post by Vadim Zeitlin
> Also, is there a way to avoid generating the method in the module class
> instead of generating it and making it private? I could also call
> bdPINVOKE.CountZeroes() directly from the public overload.

Not that I can think of, sorry.

William
