Monday, 3 June 2013

Parsing a C/C++ struct in Java

Suppose you are "lucky" enough that your Java application has to interface with a C/C++ application (e.g. some kind of server) that sends you TCP/UDP network messages which you need to parse. An example of such a C/C++ structure is shown below:


enum Gender { MALE, FEMALE };
struct msg {
  #ifdef INTEL_STYLE
      UCHAR spare4:4;
      UCHAR octal:3;
      UCHAR bool:1;
  #else
      UCHAR bool:1;
      UCHAR octal:3;
      UCHAR spare4:4;
  #endif
  UINT uint;  
  char str[5];
  float flt;

  enum Gender gender;
};


If your C/C++ application runs on an Intel (x86) machine, you receive the bytes in little-endian order (the INTEL_STYLE branch above reverses the order of the bit fields accordingly); otherwise, e.g. on SPARC machines, you receive them in big-endian order. Note that Java is big endian, too (a java.nio.ByteBuffer defaults to ByteOrder.BIG_ENDIAN). In the following we assume a big-endian architecture.
In this blog entry we are going to see how you can parse such a message in your receiving Java application. 
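
To make the byte-order difference concrete, here is a minimal sketch that uses only java.nio (no Javolution). The class name ByteOrderDemo and its helper are made up for illustration. It reads the same four bytes once as big endian and once as little endian:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of Javolution.
final class ByteOrderDemo {

    /** Reads the first four bytes of raw as a 32-bit int in the given byte order. */
    static int readInt(byte[] raw, ByteOrder order) {
        return ByteBuffer.wrap(raw).order(order).getInt(0);
    }

    public static void main(String[] args) {
        byte[] raw = { 0x00, 0x00, 0x00, 0x02 };
        System.out.println(readInt(raw, ByteOrder.BIG_ENDIAN));    // 2
        System.out.println(readInt(raw, ByteOrder.LITTLE_ENDIAN)); // 33554432 (0x02000000)
    }
}
```

If the sender is a little-endian machine and does not convert to network byte order, the receiving side has to read with ByteOrder.LITTLE_ENDIAN instead.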

What will you need?

  • the javolution library to parse the C/C++ struct in Java
  • a calculator that handles binaries, hexadecimals and decimals (Windows, Linux and MacOSX already provide such calculators. However, they don't handle decimal point numbers, so this online converter will prove useful, too).
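
If you prefer not to leave your IDE, plain Java can do the same conversions. The following is only a sketch; ConverterDemo and its two helpers are names I made up, while the JDK calls (Integer.toBinaryString, Float.floatToIntBits, Float.intBitsToFloat) are standard:

```java
// Hypothetical helper class for quick binary/hex/float conversions.
final class ConverterDemo {

    /** Formats the low 8 bits of value as an 8-character binary string. */
    static String toBinary8(int value) {
        String bits = Integer.toBinaryString(value & 0xFF);
        return "00000000".substring(bits.length()) + bits; // left-pad with zeros
    }

    /** Returns the IEEE 754 single-precision bit pattern of value as 8 hex digits. */
    static String floatToHex(float value) {
        return String.format("%08X", Float.floatToIntBits(value));
    }

    public static void main(String[] args) {
        System.out.println(toBinary8(0x90));                  // 10010000
        System.out.println(floatToHex(1.5f));                 // 3FC00000
        System.out.println(Float.intBitsToFloat(0x3FC00000)); // 1.5
    }
}
```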
The following table shows how the C/C++ data types correspond to Javolution Struct.

C            Java (Javolution Struct)
-----------  ------------------------
UCHAR        Unsigned8
UWORD        Unsigned16
UINT         Unsigned32
byte         Signed8
short        Signed16
int          Signed32
long         Signed64
long long    Signed64
float        Float32
double       Float64
pointer      Reference32
char[]       UTF8String
enum         Enum32
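
Java has no unsigned integer types, which is why an unsigned C type has to be read into the next wider Java type. The sketch below shows the idea with a plain java.nio.ByteBuffer and bit masks. UnsignedDemo and its method names are my own and only illustrate the principle; they are not how Javolution is implemented:

```java
import java.nio.ByteBuffer;

// Hypothetical demo class: reading unsigned C values with plain java.nio.
final class UnsignedDemo {

    /** Reads one byte as 0..255, like a C UCHAR. */
    static int readUnsigned8(ByteBuffer buf, int offset) {
        return buf.get(offset) & 0xFF;
    }

    /** Reads two bytes as 0..65535, like a C UWORD. */
    static int readUnsigned16(ByteBuffer buf, int offset) {
        return buf.getShort(offset) & 0xFFFF;
    }

    /** Reads four bytes as 0..4294967295, like a C UINT; needs a Java long. */
    static long readUnsigned32(ByteBuffer buf, int offset) {
        return buf.getInt(offset) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {
            (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF });
        System.out.println(buf.get(0));             // -1 (Java bytes are signed)
        System.out.println(readUnsigned8(buf, 0));  // 255
        System.out.println(readUnsigned16(buf, 0)); // 65535
        System.out.println(readUnsigned32(buf, 0)); // 4294967295
    }
}
```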

Let's get started. 

The following Java class represents the above C/C++ struct in Java:

import java.nio.ByteBuffer;

public class Message extends javolution.io.Struct {     
   private final Unsigned8 bool = new Unsigned8(1);     
   private final Unsigned8 octal = new Unsigned8(3);     
   private final Unsigned8 spare4 = new Unsigned8(4);
   private final Unsigned32 uint = new Unsigned32();

   private final UTF8String str = new UTF8String(5); 
   private final Float32 flt = new Float32();  
   private final Enum32<Gender> gender = new Enum32<Gender>(Gender.values());  

   public Message (byte[] b) {         
       this.setByteBuffer(ByteBuffer.wrap(b), 0);     
   }
   
   public boolean getBool() {      
       return bool.get() != 0;     
   }
   
   public int getOctal() {         
       return octal.get();     
   } 

   public long getUInt() {                 
       return uint.get();
   }

   public String getStr() {             

       return str.get();         
   } 

   public float getFlt() {         
       return flt.get();     
   } 


   public Gender getGender() {
       return gender.get();
   }
 }

enum Gender { MALE, FEMALE };

Our Message class corresponds to the C msg struct. It extends javolution.io.Struct, which maps its members onto an underlying java.nio.ByteBuffer. This crash course about Java ByteBuffer provides useful background information. 
The C/C++ struct starts with a UCHAR which corresponds to Unsigned8, i.e. one byte. The numbers after the colons (:) denote how many bits inside the byte represent each field of the UCHAR. Thus, octal:3 means that 3 bits represent the octal field. In Javolution this is represented by Unsigned8(3).

UINT is represented by Unsigned32 in Javolution, which is 4 bytes long.
The string char[5] is represented by UTF8String(5), and float by Float32.
Finally, the enum is represented by Enum32.
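
To see what a bit field boils down to, here is a hand-written sketch that extracts the three fields from a single byte with shifts and masks. It assumes the big-endian layout used in this post (bool in the most significant bit). BitFieldDemo and its method names are hypothetical, and this is not Javolution's actual implementation:

```java
// Hypothetical demo class: manual bit-field extraction from one byte.
final class BitFieldDemo {

    /** Most significant bit: the bool:1 field. */
    static boolean boolBit(int firstByte) {
        return ((firstByte >> 7) & 0x1) != 0;
    }

    /** The next three bits: the octal:3 field. */
    static int octal(int firstByte) {
        return (firstByte >> 4) & 0x7;
    }

    /** The lowest four bits: the spare4:4 field. */
    static int spare4(int firstByte) {
        return firstByte & 0xF;
    }

    public static void main(String[] args) {
        int b = 0x90; // binary 1001 0000
        System.out.println(boolBit(b)); // true
        System.out.println(octal(b));   // 1
        System.out.println(spare4(b));  // 0
    }
}
```

The masks also make the methods safe for a sign-extended Java byte such as (byte) 0x90.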

Let's create a unit test to test the above:

import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class MessageTest {

    private Message msg;

    @Before
    public void setUp() {
      byte[] bb = new byte[] {
        (byte) 0x90,  // 1001 0000

        (byte) 0x00, (byte) 0x00, (byte) 0x00, // alignment with previous!
        (byte) 0x00, (byte) 0x00, (byte) 0x00, (byte) 0x02, // uint
        (byte) 0x48, (byte) 0x41, (byte) 0x4C, (byte) 0x4C, (byte) 0x4F, // str

        (byte) 0x00, (byte) 0x00, (byte) 0x00, // alignment with previous!
        (byte) 0x3F, (byte) 0xC0, (byte) 0x00, (byte) 0x00, // flt
        (byte) 0x00, (byte) 0x00, (byte) 0x00, (byte) 0x01, // gender
      };
      msg = new Message(bb);
    }

    @After
    public void tearDown() {
    }

    @Test
    public void testMessage() {
       assertTrue(msg.getBool());         // 1 = true
       assertEquals(1, msg.getOctal());   // 001

       assertEquals(2, msg.getUInt());    
       assertEquals("HALLO", msg.getStr());
       assertEquals(1.5, msg.getFlt(), 0.0);   
       assertEquals(Gender.FEMALE, msg.getGender());   
    }
}



The first byte 0x90 corresponds to the binary value 1001 0000. The first bit (1) represents bool:1, the next three (001) octal:3, and the last four (0000) spare4:4.  
Be careful with the alignment! The first byte is followed by 3 bytes of padding, so the next field (uint) starts at the 5th byte and not at the 2nd as you might have expected. 
The next 4 bytes correspond to the uint. The next 5 ASCII characters correspond to the string "HALLO". Then come another 3 bytes of padding, followed by the float field. The last 4 bytes represent the Gender enum, which contains the value 1, i.e. Gender.FEMALE.
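
As a cross-check of the layout just described, the same 24 bytes can be decoded by hand with absolute reads on a java.nio.ByteBuffer. This is only a sketch under the offsets discussed above (0, 4, 8, 16, 20); ManualDecodeDemo and its method names are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: decoding the padded message without Javolution.
final class ManualDecodeDemo {

    /** The 24 sample bytes from the test above, padding included. */
    static byte[] sample() {
        return new byte[] {
            (byte) 0x90, 0, 0, 0,                  // bit fields + 3 padding bytes
            0, 0, 0, 2,                            // uint at offset 4
            0x48, 0x41, 0x4C, 0x4C, 0x4F, 0, 0, 0, // str at offset 8 + 3 padding bytes
            0x3F, (byte) 0xC0, 0, 0,               // flt at offset 16
            0, 0, 0, 1                             // gender at offset 20
        };
    }

    static long readUInt(ByteBuffer buf) {
        return buf.getInt(4) & 0xFFFFFFFFL;
    }

    static String readStr(ByteBuffer buf) {
        byte[] chars = new byte[5];
        for (int i = 0; i < chars.length; i++) {
            chars[i] = buf.get(8 + i); // absolute get, does not move the position
        }
        return new String(chars, StandardCharsets.US_ASCII);
    }

    static float readFlt(ByteBuffer buf) {
        return buf.getFloat(16);
    }

    static int readGender(ByteBuffer buf) {
        return buf.getInt(20);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.wrap(sample()); // big endian by default
        System.out.println(readUInt(buf));   // 2
        System.out.println(readStr(buf));    // HALLO
        System.out.println(readFlt(buf));    // 1.5
        System.out.println(readGender(buf)); // 1, i.e. FEMALE
    }
}
```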

Packed 

However, your data might be packed, i.e. contain no alignment padding. To handle this, override the isPacked() method of javolution.io.Struct:

public class Message extends javolution.io.Struct {
  ...
  @Override
  public boolean isPacked() {
      return true;
  }
  ...
}

Now the test data must contain no padding for the test to pass:


import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import static org.junit.Assert.assertEquals;
import static org.junit.Assert.assertTrue;

public class MessageTest {

    private Message msg;

    @Before
    public void setUp() {
      byte[] bb = new byte[] {
        (byte) 0x90,  // 1001 0000
   //    (byte) 0x00, (byte) 0x00, (byte) 0x00, // alignment with previous!
        (byte) 0x00, (byte) 0x00, (byte) 0x00, (byte) 0x02, // uint
        (byte) 0x48, (byte) 0x41, (byte) 0x4C, (byte) 0x4C, (byte) 0x4F, // str
   //    (byte) 0x00, (byte) 0x00, (byte) 0x00, // alignment with previous!
        (byte) 0x3F, (byte) 0xC0, (byte) 0x00, (byte) 0x00, // flt
        (byte) 0x00, (byte) 0x00, (byte) 0x00, (byte) 0x01, // gender
     };
     msg = new Message(bb);
    }

    @After
    public void tearDown() {
    }

    @Test
    public void testMessage() {
      assertTrue(msg.getBool());         // 1 = true
      assertEquals(1, msg.getOctal());   // 001
      assertEquals(2, msg.getUInt());    
      assertEquals("HALLO", msg.getStr());
      assertEquals(1.5, msg.getFlt(), 0.0);   
      assertEquals(Gender.FEMALE, msg.getGender());
    }
}
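
The difference between the two test arrays (24 bytes padded, 18 bytes packed) can be reproduced with a small offset calculation. The sketch below assumes the usual C rule that each field starts at a multiple of its own alignment. LayoutDemo and its methods are hypothetical names, and Javolution's internal rules may differ in detail:

```java
// Hypothetical demo class: computing struct sizes with and without padding.
final class LayoutDemo {

    /** Rounds offset up to the next multiple of alignment. */
    static int align(int offset, int alignment) {
        return ((offset + alignment - 1) / alignment) * alignment;
    }

    /** Total struct size for the given field sizes and alignments. */
    static int totalSize(int[] sizes, int[] alignments, boolean packed) {
        int offset = 0;
        int maxAlign = 1;
        for (int i = 0; i < sizes.length; i++) {
            if (!packed) {
                offset = align(offset, alignments[i]); // insert padding if needed
                maxAlign = Math.max(maxAlign, alignments[i]);
            }
            offset += sizes[i];
        }
        // An aligned struct is also padded at the end to its largest alignment.
        return packed ? offset : align(offset, maxAlign);
    }

    public static void main(String[] args) {
        // bit-field byte, uint, str[5], flt, gender
        int[] sizes = { 1, 4, 5, 4, 4 };
        int[] alignments = { 1, 4, 1, 4, 4 };
        System.out.println(totalSize(sizes, alignments, false)); // 24
        System.out.println(totalSize(sizes, alignments, true));  // 18
    }
}
```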

Conclusion 

This concludes what you need to know to parse a C/C++ struct in Java. However, keep in mind the following gotchas of Javolution:
  • All Structs should be declared final.
  • Javolution doesn't let you declare a nested struct as a plain member; either wrap it with the inner(...) method of javolution.io.Struct or flatten your C/C++ structs in Java. E.g.
struct Identification { 
    byte b; 
    long l; 
};

struct msg { 
    struct Identification id; 
    int i; 
};

should be represented by:


public class Message extends javolution.io.Struct {     

   private final Signed8 b = new Signed8();     

   private final Signed64 l = new Signed64();     

   private final Signed32 i = new Signed32(); 
...
}

  • A Javolution array can only hold instances of Struct.Member. The following will not work:
  private final Reference32[] refs = array(new Reference32[2]);

and you need to replace it by:

  private final Signed32[] refs = array(new Signed32[2]);

The following won't work either:
  
private final AStruct[] aStruct = array(new AStruct[2]);


Happy parsing!

(You may wish to write a parser that automatically parses the C/C++ source file and generates a Java javolution.io.Struct class based on the above mappings. If you do, please let me know.)

1 comment:

Unknown said...

How do we handle C- enum { MALE=10,FAMELE=11} in javolution Enum32 ?