Widening and narrowing in Java
While reading the Java Language Specification (JLS 14), Chapter 5 caught my attention right away. It describes how widening and narrowing of primitive data types work in Java. I’m always interested in things that go beneath the surface and sit at a low level, for example the binary representation of data.
Java has the following primitive data types, summarized in the table below:
Type | Description | Size | Range | Default value |
---|---|---|---|---|
byte | two’s complement integer | 1 byte | whole numbers from -128 to 127 | 0 |
short | two’s complement integer | 2 bytes | whole numbers from -32,768 to 32,767 | 0 |
int | two’s complement integer | 4 bytes | whole numbers from -2,147,483,648 to 2,147,483,647 | 0 |
long | two’s complement integer | 8 bytes | whole numbers from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 | 0L |
float | IEEE 754 floating point | 4 bytes | approximately ±3.40282347E+38F (6-7 significant decimal digits) | 0.0f |
double | IEEE 754 floating point | 8 bytes | approximately ±1.79769313486231570E+308 (15 significant decimal digits) | 0.0d |
char | unsigned 16-bit integer (a UTF-16 code unit) | 2 bytes | ‘\u0000’ (or 0) to ‘\uffff’ (or 65,535), inclusive | ‘\u0000’ |
boolean | true or false | not precisely defined (represents 1 bit of information) | true and false | false |
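You can check the numbers in the table against the constants exposed by the wrapper classes in java.lang, such as Byte.MIN_VALUE and Integer.SIZE. The sketch below just prints them. The class name PrimitiveRanges is my own choice. boolean is left out because Java defines no size constant for it.

```java
// Prints size and range of each primitive type from the wrapper-class constants
// in java.lang (no imports needed). PrimitiveRanges is an illustrative name.
class PrimitiveRanges {
    public static void main(String[] args) {
        System.out.println("byte:   " + Byte.SIZE + " bits, " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short:  " + Short.SIZE + " bits, " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:    " + Integer.SIZE + " bits, " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:   " + Long.SIZE + " bits, " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);
        System.out.println("float:  " + Float.SIZE + " bits, max " + Float.MAX_VALUE);
        System.out.println("double: " + Double.SIZE + " bits, max " + Double.MAX_VALUE);
        // char is unsigned; cast to int so the bounds print as numbers, not characters
        System.out.println("char:   " + Character.SIZE + " bits, "
                + (int) Character.MIN_VALUE + " to " + (int) Character.MAX_VALUE);
    }
}
```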
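Widening happens implicitly when converting a value to a type with a wider range. “Wider” refers to range rather than storage size: long to float, for example, goes from 8 bytes to 4. The widening primitive conversions (JLS §5.1.2) are: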
- byte to short, int, long, float, or double
- short to int, long, float, or double
- char to int, long, float, or double
- int to long, float, or double
- long to float or double
- float to double
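Here is a small sketch of the widening list in action, using a hypothetical class named WideningChain. Every assignment in widenAll() compiles without a cast because each step appears in the list above. byte to char is not in the list, so it needs an explicit cast.

```java
// Illustrative class: each assignment in widenAll() is an implicit widening conversion.
class WideningChain {
    static double widenAll() {
        byte b = 42;   // the int constant 42 fits in a byte, so no cast is needed
        short s = b;   // byte  -> short
        int i = s;     // short -> int
        long l = i;    // int   -> long
        float f = l;   // long  -> float
        double d = f;  // float -> double
        return d;
    }

    static int charToInt(char c) {
        return c;      // char -> int, no cast needed
    }

    static char byteToChar(byte b) {
        // return b;   // does not compile: byte -> char is not a widening conversion
        return (char) b;
    }

    public static void main(String[] args) {
        System.out.println(widenAll());              // 42.0
        System.out.println(charToInt('A'));          // 65
        System.out.println(byteToChar((byte) 65));   // A
    }
}
```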
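Narrowing requires an explicit cast, written as the target type in parentheses such as (short), when converting a value to a type that cannot represent every value of the original type. One notable exception is assigning a constant expression whose value fits in the target type, as in byte b = 10, which the compiler narrows implicitly. The narrowing primitive conversions (JLS §5.1.3) are: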
- short to byte or char
- char to byte or short
- int to byte, short, or char
- long to byte, short, char, or int
- float to byte, short, char, int, or long
- double to byte, short, char, int, long, or float
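The sketch below uses a hypothetical class named NarrowingCasts to show a few narrowing casts and the results the JLS rules produce. A double cast to int is rounded toward zero, NaN becomes 0, and an out-of-range value is clamped to Integer.MAX_VALUE. An int cast to byte keeps only its low 8 bits. The sketch also shows that a constant which fits, such as 100, can be assigned to a byte without a cast, while an int variable cannot.

```java
// Illustrative class: a few narrowing casts and their JLS-defined results.
class NarrowingCasts {
    static int doubleToInt(double d) {
        return (int) d;   // double -> int: explicit cast required
    }

    static byte intToByte(int i) {
        return (byte) i;  // int -> byte: keeps only the low 8 bits
    }

    public static void main(String[] args) {
        System.out.println(doubleToInt(3.99));       // 3: rounds toward zero
        System.out.println(doubleToInt(-3.99));      // -3, not -4
        System.out.println(doubleToInt(Double.NaN)); // 0: NaN becomes zero
        System.out.println(doubleToInt(1e20));       // 2147483647: clamped to Integer.MAX_VALUE
        System.out.println(intToByte(200));          // -56, i.e. 200 - 256

        byte fromConstant = 100; // compiles: the constant 100 fits in a byte
        int variable = 100;
        // byte fromVariable = variable;  // does not compile: variable is not a constant
        byte fromVariable = (byte) variable;
        System.out.println(fromConstant == fromVariable); // true
    }
}
```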
Let me introduce some examples to see how some of these work.
Widening conversion from byte to int
public class Main {
    public static void main(String[] args) {
        int largeNumber;
        byte smallNumber = 10;
        largeNumber = smallNumber;
        System.out.println(largeNumber);
    }
}
In this case, after the assignment largeNumber = smallNumber, largeNumber holds the value 10, so the conversion loses no information. This is because every value a byte can hold (-128 to 127) also lies within the range of an int, which is 4 bytes wide.
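To convince yourself that widening loses no byte value, you can try all 256 of them. This brute-force sketch uses a made-up class name. It is only a sanity check and not something the JLS asks you to do.

```java
// Illustrative brute-force check: widening every byte value to int preserves it.
class ByteToIntCheck {
    static boolean wideningIsLossless() {
        for (int v = Byte.MIN_VALUE; v <= Byte.MAX_VALUE; v++) {
            byte b = (byte) v;  // v is already in byte range, so the cast changes nothing
            int widened = b;    // implicit widening byte -> int
            if (widened != v) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(wideningIsLossless()); // true
    }
}
```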
Sign extension
“A widening conversion of a signed integer value to an integral type T simply sign-extends the two’s-complement representation of the integer value to fill the wider format.” (JLS 14, §5.1.2)
Let’s consider the above example to see how sign extension works in this case.
The two’s-complement representation of a byte with the value 10 is 00001010 (8 bits, because a byte is 8 bits wide). The sign bit is the most significant, or leftmost, bit, which here is 0. To produce the int value, sign extension fills the 24 extra bits on the left with copies of the sign bit until the result is 32 bits long (the size of an int). This gives 00000000 00000000 00000000 00001010, the two’s-complement representation of 10 as an int.
If smallNumber were initialized with -10 instead, its two’s-complement representation would be 11110110, so the sign bit (leftmost bit) is 1. Sign extension fills all the extra bits on the left with 1 until the result is 32 bits long, producing 11111111 11111111 11111111 11110110, the two’s-complement representation of -10 as an int.
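The following sketch prints these bit patterns so you can compare them side by side. The helper bits() is a hypothetical formatter written for this example. It groups the bits by byte to match the notation used here. The last line contrasts sign extension with the common b & 0xFF idiom, which zero-extends instead.

```java
// Illustrative class: shows sign extension when a byte is widened to int.
class SignExtension {
    // Hypothetical helper: formats the low `width` bits of value as binary,
    // with a space between each group of 8 bits.
    static String bits(long value, int width) {
        StringBuilder sb = new StringBuilder();
        for (int i = width - 1; i >= 0; i--) {
            sb.append((value >>> i) & 1L);
            if (i % 8 == 0 && i != 0) {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] samples = {10, -10};
        for (byte b : samples) {
            int widened = b; // implicit widening: sign-extends to 32 bits
            System.out.println(b + " as byte: " + bits(b, 8));
            System.out.println(b + " as int:  " + bits(widened, 32));
        }
        byte minusTen = -10;
        // Masking with 0xFF zero-extends instead: prints 246, not -10
        System.out.println(minusTen & 0xFF);
    }
}
```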
char data type is a special one
Let’s talk about how the char data type is represented in Java. A char holds a single UTF-16 code unit, which is an unsigned 16-bit value. Unicode identifies each character by a code point, conventionally written as the prefix U+ followed by a hexadecimal number. For example, the character 最 has the code point U+6700. Characters in the Basic Multilingual Plane (code points U+0000 to U+FFFF) are encoded in UTF-16 as one code unit with the same value as the code point. So 最 fits in one char as the 2 bytes 0x67 and 0x00. In binary, this code unit is 0110011100000000 (110011100000000 without the leading zero). Characters above U+FFFF need two code units (a surrogate pair) and therefore do not fit in a single char. Let’s see a code example:
public class Main {
    public static void main(String[] args) {
        char c = '\u6700';
        System.out.println(c);
    }
}
Running the code above prints 最 to the console, assuming the console’s encoding and font can display it.
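One quick way to see the difference between code units and code points is to compare String.length(), which counts char code units, with String.codePointCount(), which counts code points. A character above U+FFFF is stored as two chars, called a surrogate pair. The sketch below uses U+1F600, an emoji I picked arbitrarily, as an example of such a character. The class and method names are my own.

```java
// Illustrative class: compares UTF-16 code units with Unicode code points.
class CodeUnits {
    // Number of char code units (UTF-16) in s.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = "\u6700"; // 最: inside the BMP, so one code unit
        String beyondBmp = new String(Character.toChars(0x1F600)); // outside the BMP: a surrogate pair

        System.out.println(codeUnits(bmp) + " unit, " + codePoints(bmp) + " code point");
        System.out.println(codeUnits(beyondBmp) + " units, " + codePoints(beyondBmp) + " code point");
        System.out.println(Integer.toBinaryString(bmp.charAt(0))); // 110011100000000
        System.out.println(Character.isHighSurrogate(beyondBmp.charAt(0))); // true
    }
}
```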
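Since 110011100000000 in binary is 26368 in decimal, you can also initialize the character with the integer literal 26368. This compiles without a cast because 26368 is a constant expression whose value fits in the range of char, so the compiler applies the narrowing conversion implicitly: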
public class Main {
    public static void main(String[] args) {
        char c = 26368;
        System.out.println(c);
    }
}
Running this version also prints 最 to the console.
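Because char is unsigned, JLS §5.1.2 specifies that widening a char to an integral type zero-extends it rather than sign-extending it. The sketch below uses a hypothetical class named CharZeroExtension. It reads the same 16 bits, all ones, first as a char and then as a short.

```java
// Illustrative class: char widens by zero extension, short by sign extension.
class CharZeroExtension {
    static int widen(char c) {
        return c;            // char -> int: zero-extends (char is unsigned)
    }

    static short asShort(char c) {
        return (short) c;    // char -> short: same 16 bits, read as signed
    }

    public static void main(String[] args) {
        char allOnes = '\uffff';
        System.out.println(widen(allOnes));   // 65535
        System.out.println(asShort(allOnes)); // -1
        short minusOne = -1;
        int signExtended = minusOne;          // short -> int: sign-extends
        System.out.println(signExtended);     // -1
    }
}
```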
Widening conversion from int to float, or from long to float, or from long to double
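These three widening conversions may lose precision. Converting from long to float moves to a smaller type (4 bytes instead of 8), while int to float and long to double move to a type of the same size. In each case, the floating-point type spends some of its bits on the sign and exponent, leaving a 24-bit significand for float and a 53-bit significand for double. An int or long value with more significant bits than that is rounded to the nearest representable value. The overall magnitude is preserved, but the least significant digits can be lost.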
Let’s see a code example:
public class Main {
    public static void main(String[] args) {
        int big = 1234567890;
        float approx = big;
        System.out.println(big - (int) approx);
    }
}
Running the code above prints -46 to the console. This shows that information was lost in the conversion from int to float, because values of type float are not precise to nine significant digits. The float nearest to 1234567890 is 1234567936, and 1234567890 - 1234567936 = -46.
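To see where float precision runs out, the sketch below searches for the smallest non-negative int that does not survive an int to float to int round trip. It also runs the same kind of spot check for long to double. With a 24-bit significand, float represents every integer up to 2^24 exactly, so the search should stop at 2^24 + 1. The class and method names are illustrative.

```java
// Illustrative class: locates where int -> float and long -> double start rounding.
class FloatPrecision {
    // Smallest non-negative int that changes after an int -> float -> int round trip.
    static int firstLossyInt() {
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            float f = i;            // implicit widening; may round to the nearest float
            if ((int) f != i) {
                return i;
            }
        }
        return -1;                  // not reached in practice
    }

    // Error introduced by a long -> double -> long round trip.
    static long doubleRoundTripError(long value) {
        double d = value;           // implicit widening; 53-bit significand
        return value - (long) d;
    }

    public static void main(String[] args) {
        System.out.println(firstLossyInt());                      // 16777217 (2^24 + 1)
        System.out.println(doubleRoundTripError((1L << 53) + 1)); // 1
        System.out.println(doubleRoundTripError(1L << 53));       // 0
    }
}
```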
Narrowing conversion from int to short
Let’s consider a code example:
public class Main {
    public static void main(String[] args) {
        int largerNumber = 99990000;
        short smallerNumber;
        smallerNumber = (short) largerNumber;
        System.out.println(smallerNumber);
    }
}
Without the (short) cast, this code would not compile. Converting an int variable to short is a narrowing conversion, and largerNumber is not a constant expression, so the compiler cannot verify that its value fits (here it does not: 99990000 is far outside the range of short). Java requires an explicit cast so that you acknowledge the possibility of losing information about the sign and magnitude of the value, as well as precision. In this case, the code prints -17936 to the console. How come?
In Java, the signed integral types byte, short, int, and long are represented in memory using two’s complement. The two’s-complement representation of 99990000 is 00000101 11110101 10111001 11110000, which is the 4 bytes stored in memory for this int. When casting to a short, which is 2 bytes in size, only the 16 least significant bits, 10111001 11110000, are kept and the rest are discarded. Because the leftmost of those 16 bits is 1, the result is negative. Read as an unsigned number the bits equal 47600, and 47600 - 65536 = -17936, which is the two’s-complement value of 10111001 11110000. That’s why the result is what it is.
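You can reproduce the truncation by hand, which makes the reasoning concrete. In this sketch, the hypothetical method lowSixteenBitsSigned() keeps the low 16 bits with a mask and then reinterprets them as a two’s-complement short. It should agree with the (short) cast for any int.

```java
// Illustrative class: reproduces the int -> short cast with a mask and a sign fix-up.
class IntToShortByHand {
    // Keep the low 16 bits, then interpret them as a 16-bit two's-complement number.
    static int lowSixteenBitsSigned(int x) {
        int low = x & 0xFFFF;                        // 0 .. 65535
        return low >= 0x8000 ? low - 0x10000 : low;  // top bit set means negative
    }

    public static void main(String[] args) {
        int largerNumber = 99990000;
        System.out.println(Integer.toHexString(largerNumber));  // 5f5b9f0
        System.out.println(largerNumber & 0xFFFF);              // 47600
        System.out.println(lowSixteenBitsSigned(largerNumber)); // -17936
        System.out.println((short) largerNumber);               // -17936
    }
}
```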
Summary: when working with primitive data types in Java, you can lose information when converting between them, whether explicitly with a cast or implicitly without one. Such loss can cause unexpected bugs. To prevent this, a good habit is to keep in mind the range of each data type involved. Understanding the two’s-complement representation of integral types can also really help in some cases.
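One way to act on this advice is to check the range before narrowing. For long to int, the JDK already provides Math.toIntExact, which throws ArithmeticException on overflow. I’m not aware of an equivalent for short, so toShortExact below is a hypothetical helper written for this sketch.

```java
// Illustrative class: range-checked narrowing instead of silent truncation.
class CheckedNarrowing {
    // Hypothetical helper (not part of the JDK): narrows int to short,
    // throwing instead of silently keeping only the low 16 bits.
    static short toShortExact(int value) {
        if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
            throw new ArithmeticException("short overflow: " + value);
        }
        return (short) value;
    }

    public static void main(String[] args) {
        System.out.println(toShortExact(12345));     // 12345
        System.out.println(Math.toIntExact(42L));    // 42 (JDK method for long -> int)
        try {
            toShortExact(99990000);
        } catch (ArithmeticException e) {
            System.out.println(e.getMessage());      // short overflow: 99990000
        }
    }
}
```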