Widening and narrowing in Java

While reading the Java Language Specification (JLS 14), my attention was immediately drawn to chapter 5, which describes how widening and narrowing conversions work for primitive data types in Java. I’m always interested in what goes on behind the scenes at a low level, for example how data is represented in binary.

Java has the following primitive data types, summarized in this table:

Type | Description | Size | Range | Default value
byte | two’s complement integer | 1 byte | whole numbers from -128 to 127 | 0
short | two’s complement integer | 2 bytes | whole numbers from -32,768 to 32,767 | 0
int | two’s complement integer | 4 bytes | whole numbers from -2,147,483,648 to 2,147,483,647 | 0
long | two’s complement integer | 8 bytes | whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 0L
float | IEEE 754 floating point | 4 bytes | approximately ±3.40282347E+38F (6-7 significant decimal digits) | 0.0f
double | IEEE 754 floating point | 8 bytes | approximately ±1.79769313486231570E+308 (15 significant decimal digits) | 0.0d
char | single UTF-16 code unit | 2 bytes | from ‘\u0000’ (or 0) to ‘\uffff’ (or 65,535) inclusive | ‘\u0000’
boolean | true or false | 1 bit of information (storage size is not precisely defined) | true and false | false

Widening happens implicitly when converting a type to another type that can represent a wider range of values:

  • byte to short, int, long, float, or double

  • short to int, long, float, or double

  • char to int, long, float, or double

  • int to long, float, or double

  • long to float or double

  • float to double

Narrowing must be requested explicitly with a cast when converting a type to another type that can represent a narrower range of values:

  • short to byte or char

  • char to byte or short

  • int to byte, short, or char

  • long to byte, short, char, or int

  • float to byte, short, char, int, or long

  • double to byte, short, char, int, long, or float
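
As a minimal sketch of the rule above, the following shows two narrowing conversions that only compile because of the cast. The class and method names (NarrowingCastDemo, toByte, toInt) are made up for illustration and do not come from the JLS.

```java
class NarrowingCastDemo {
    // int to byte: without the (byte) cast this return statement would not compile.
    static byte toByte(int value) {
        return (byte) value; // keeps only the 8 least significant bits
    }

    // double to int: the cast drops the fractional part (rounds toward zero).
    static int toInt(double value) {
        return (int) value;
    }

    public static void main(String[] args) {
        System.out.println(toByte(100));  // fits in a byte, prints 100
        System.out.println(toByte(200));  // does not fit, prints -56
        System.out.println(toInt(3.99));  // prints 3
        System.out.println(toInt(-3.99)); // prints -3
    }
}
```

Removing either cast turns the corresponding return statement into a compile-time error, which is how the compiler makes the possible loss visible.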

Let me introduce some examples to see how some of these work.

Widening conversion from byte to int

public class Main {
   public static void main(String[] args) {
       int largeNumber;
       byte smallNumber = 10;
       largeNumber = smallNumber;
       System.out.println(largeNumber);
   }
}

In this case, after the assignment largeNumber = smallNumber, largeNumber holds the value 10, so the conversion happens without losing information. A byte takes up just one byte, so every byte value fits within an int, which takes up 4 bytes.

Sign extension

A widening conversion of a signed integer value to an integral type T simply sign extends the two’s-complement representation of the integer value to fill the wider format. – JLS14

Let’s consider the above example to see how sign extension works in this case.

The two’s-complement representation of a byte with the value 10 is 00001010 (8 bits, because it is a byte). The sign bit is the most significant (leftmost) bit, which in this case is 0. To produce the value 10 as an int, sign extension copies the sign bit into all the extra bits on the left of the 8-bit representation until the result is 32 bits long (the size of an int). This results in 00000000 00000000 00000000 00001010, which is the two’s-complement representation of 10 as an int.

If smallNumber were initialized with -10, its two’s-complement representation would be 11110110, so the sign bit (leftmost bit) is 1. Sign extension then fills all the extra bits on the left of this representation with 1 until the result is 32 bits long. As a result, 11111111 11111111 11111111 11110110 is produced, which is the two’s-complement representation of -10 as an int.
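
The bit patterns described above can be checked with a small sketch like the following. The helper names bits8 and bits32 are hypothetical; they only wrap the standard Integer.toBinaryString method and pad the result with leading zeros.

```java
class SignExtensionDemo {
    // 8-bit two's-complement pattern of a byte.
    static String bits8(byte b) {
        // b & 0xFF keeps only the low 8 bits; then left-pad with zeros to 8 digits.
        String s = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", s).replace(' ', '0');
    }

    // Full 32-bit two's-complement pattern of an int.
    static String bits32(int i) {
        String s = Integer.toBinaryString(i);
        return String.format("%32s", s).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte positive = 10;
        byte negative = -10;
        int widenedPositive = positive; // implicit widening, sign bit 0 is copied
        int widenedNegative = negative; // implicit widening, sign bit 1 is copied
        System.out.println(bits8(positive) + " -> " + bits32(widenedPositive));
        System.out.println(bits8(negative) + " -> " + bits32(widenedNegative));
    }
}
```

The first line of output shows 24 zeros placed in front of 00001010, the second shows 24 ones placed in front of 11110110.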

char data type is a special one

Let’s talk about how the char data type is represented in Java. A char holds a single 16-bit code unit of the UTF-16 character encoding. Unicode assigns each character a number called its code point, written as the prefix U+ followed by a hexadecimal number. For example, the character 最 has the code point U+6700. UTF-16 encodes this code point as one code unit, 0x6700, so in Java the character can be stored in a char variable using 2 bytes: 0x67 and 0x00. The binary representation of this code unit is 110011100000000 (with the leading zero omitted). Let’s see a code example:

public class Main {
    public static void main(String[] args) {
        char c = '\u6700';
        System.out.println(c);
    }
}

Running the code above will output 最 in the console.

Since 110011100000000 in binary is 26368 in decimal, you can also use the integer literal 26368 to declare and initialize the character, as in the following:

public class Main {
    public static void main(String[] args) {
        char c = 26368;
        System.out.println(c);
    }
}

Running the code above will also output 最 in the console.
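
One more way in which char is special: it is unsigned, so according to the JLS a widening conversion from char zero-extends instead of sign-extending. The sketch below compares a char and a short that hold the same 16 bits. The names CharWideningDemo, widenChar and widenShort are made up for illustration.

```java
class CharWideningDemo {
    // char to int: the upper 16 bits are filled with zeros.
    static int widenChar(char c) {
        return c;
    }

    // short to int: the upper 16 bits are filled with copies of the sign bit.
    static int widenShort(short s) {
        return s;
    }

    public static void main(String[] args) {
        char c = '\uffff';        // all 16 bits set
        short s = (short) 0xFFFF; // the same 16 bits, read as a signed value
        System.out.println(widenChar(c));        // prints 65535
        System.out.println(widenShort(s));       // prints -1
        System.out.println(widenChar('\u6700')); // prints 26368
    }
}
```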

Widening conversion from int to float, or from long to float, or from long to double

These conversions count as widening because the target type covers a wider range of values, but the target is not actually larger: float (4 bytes) is smaller than long and the same size as int, and double is the same size as long. A floating-point type also has to spend some of its bits on the exponent, which leaves only 24 bits of precision in a float and 53 bits in a double. As a result, loss of precision might occur.

Let’s see a code example:

public class Main {
    public static void main(String[] args) {
        int big = 1234567890;
        float approx = big;
        System.out.println(big - (int)approx);
    }
}

Running the code above will output -46 in the console, which indicates that information was lost during the conversion from int to float, because values of type float are not precise to nine significant digits.
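
To see where the loss starts, here is a small sketch that measures the round-trip error around 2^24 for float and 2^53 for double, the points where consecutive integers stop being exactly representable. The helper names lossThroughFloat and lossThroughDouble are hypothetical.

```java
class PrecisionLossDemo {
    // Difference between an int and what survives a round trip through float.
    static int lossThroughFloat(int value) {
        float approx = value; // implicit widening, may round
        return value - (int) approx;
    }

    // Difference between a long and what survives a round trip through double.
    static long lossThroughDouble(long value) {
        double approx = value; // implicit widening, may round
        return value - (long) approx;
    }

    public static void main(String[] args) {
        System.out.println(lossThroughFloat(16777216));           // 2^24, prints 0
        System.out.println(lossThroughFloat(16777217));           // 2^24 + 1, prints 1
        System.out.println(lossThroughFloat(1234567890));         // prints -46
        System.out.println(lossThroughDouble(9007199254740992L)); // 2^53, prints 0
        System.out.println(lossThroughDouble(9007199254740993L)); // 2^53 + 1, prints 1
    }
}
```

A result of 0 means the value was preserved exactly; anything else is the amount by which the implicit conversion changed it.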

Narrowing conversion from int to short

Let’s consider a code example:

public class Main {
    public static void main(String[] args) {
        int largerNumber = 99990000;
        short smallerNumber;
        smallerNumber = (short) largerNumber;
        System.out.println(smallerNumber);
    }
}

Without the (short) cast this code will not compile, because an int variable cannot be assigned to a short implicitly. Java forces programmers to state their intention explicitly with a cast as a way of signalling that information about the sign, magnitude, and precision of the numeric value may be lost. That is what happens here: 99990000 is outside the range supported by the short data type, and the code prints -17936 to the console. How does that happen?

In Java the signed integral primitive data types byte, short, int, and long are represented in memory using two’s complement. The two’s-complement representation of 99990000 is 00000101 11110101 10111001 11110000. These are the 4 bytes stored in memory for the int. When casting to a short, which is 2 bytes in size, only the 16 least significant bits, 10111001 11110000, are kept and the rest is discarded (lost). 10111001 11110000 is the two’s-complement representation of -17936, which explains the result.
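
The same result can be reproduced by hand, without a cast, which makes the discarded bits visible. This is only a sketch of the arithmetic described above; the method name low16AsSigned is made up for illustration.

```java
class NarrowingBitsDemo {
    // Keeps the 16 least significant bits of value and reads them as a
    // signed 16-bit two's-complement number, mimicking what (short) does.
    static int low16AsSigned(int value) {
        int low16 = value & 0xFFFF; // discard the upper 16 bits
        // If bit 15 (the sign bit of a short) is set, the value is negative.
        return low16 >= 0x8000 ? low16 - 0x10000 : low16;
    }

    public static void main(String[] args) {
        int largerNumber = 99990000;
        System.out.println(largerNumber & 0xFFFF);       // prints 47600
        System.out.println(low16AsSigned(largerNumber)); // prints -17936
        System.out.println((short) largerNumber);        // prints -17936
    }
}
```

47600 is the unsigned reading of 10111001 11110000; subtracting 65536 (2^16) gives the signed reading, -17936.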

Summary: when working with primitive data types in Java, you can lose information when converting between them, either explicitly with a cast or implicitly without one. This loss can lead to unexpected, unintended errors. To prevent this, it is a good habit to keep in mind the range of the data types involved. Understanding the two’s-complement representation of integral types can also really help in some cases.
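
One possible way to turn the range habit into code is to check the range before narrowing and fail loudly instead of silently losing bits. The helper toShortExact below is hypothetical (it is not part of the JDK); Math.toIntExact is the standard method that does the same for long to int.

```java
class CheckedNarrowingDemo {
    // Hypothetical helper: narrows an int to a short only if the value fits.
    static short toShortExact(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("short overflow: " + value);
        }
        return (short) value; // safe: the value is known to be in range
    }

    public static void main(String[] args) {
        System.out.println(toShortExact(12345)); // prints 12345
        try {
            toShortExact(99990000);
        } catch (ArithmeticException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        // Standard library equivalent for long to int.
        System.out.println(Math.toIntExact(42L)); // prints 42
    }
}
```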