Windows IME for Unicode

Abstract

This document investigates the construction of IMEs on Windows and, as a side effect, provides the boilerplate code for doing so, as well as a useful IME for Unicode.

Table of Contents

1.  Introduction
2.  Handling character names
 
2.1.  Getting the names
2.2.  Experimenting with the data
2.3.  Representing the data
2.4.  The key store
2.5.  The node store
2.6.  Parameters for execution
2.7.  The keyboard store
2.8.  The complete store
2.9.  Serializing the store
2.10.  Finding a character's name
3.  The IME core engine
 
3.1.  Cross-platform layer
3.2.  Formatting a USV
3.3.  Basic access to the store
3.4.  Finding the name of a character
3.5.  Completing a character name
3.6.  Building the candidate list
3.7.  Retrieving the scalar value corresponding to the input
3.8.  Representing the IME state
3.9.  IME Overall Logic
3.10.  The keymaps
3.11.  Filling the result string
4.  The Windows IME
 
4.1.  Platforms
4.2.  Tools
4.3.  Setting up
4.4.  Debugging
4.5.  The IME framework on Windows
4.6.  The IME static members
4.7.  The IME instance members
4.8.  Loading the names database
4.9.  The taskbar menu
4.10.  DLL setup
4.11.  Putting it together
4.12.  The def file
4.13.  The resource files
4.14.  The sources file
4.15.  Registry entries

1. Introduction

This document investigates the construction of Input Method Editors (IMEs) on recent Windows platforms. It attempts to complement the documentation provided by Microsoft, and it assumes a minimal understanding of Windows programming.

The IME I have chosen to develop is targeted at people working with Unicode: it supports the input of arbitrary Unicode characters, by scalar value or by character name. When a character is identified, the inserted text can be just the character itself, or the character and its name.

I have tried to isolate the parts specific to this IME from the parts found in pretty much all IMEs, so that developing a new IME should be easier.

2. Handling character names

Remember that our input method supports the specification of a character to insert either via its code point or via its name. Furthermore, regardless of how the character was selected, we optionally insert its name. Therefore, we need some data structures to record names and code points.

In this section, we deal with these data structures, and the computations they support. We will keep an eye on a generalization of this machinery: while we are primarily interested in the Unicode names of the characters (and their various localizations), we may as well build something that works with arbitrary lists of names.

2.1. Getting the names

First, let’s assemble the character names. We take them from an XML representation of the Unicode Character Database and process it with a stylesheet that extracts, for each character, its name, a semicolon, its scalar value, another semicolon, and a display indication.

We reject the names starting with CJK UNIFIED IDEOGRAPH- and HANGUL SYLLABLE: there are many such characters, and their names follow a regular pattern, so we will handle them separately. We also reject the control characters, since their name (<control>) is not very useful and they are unlikely to be entered (the U+ method still allows us to enter them), as well as the characters that have no name.

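The display indication is used when we insert the name of the character, following the Unicode convention. In all cases, we insert a string of the form “U+wxyz <character> <name>”, where the way <character> is shown depends on the display indication. The stylesheet derives that indication from the general category: 1 for combining marks (Mn, Mc, Me), 2 for space separators (Zs), 3 for format characters, controls, and line and paragraph separators (Cf, Cc, Zl, Zp), and 0 for everything else: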

names.xsl ==

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ucd="http://www.unicode.org/ns/2003/ucd/1.0"
                version="1.0">

  <xsl:output method="text"/>

  <xsl:template match='/'>
    <xsl:apply-templates select='//ucd:char[@cp]'/>
  </xsl:template>

  <xsl:template match="ucd:char">
    <!-- general category, taken from the enclosing element if absent -->
    <xsl:variable name='gc'>
      <xsl:choose>
        <xsl:when test='@gc'><xsl:value-of select='@gc'/></xsl:when>
        <xsl:otherwise><xsl:value-of select='../@gc'/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- name, taken from the enclosing element if absent -->
    <xsl:variable name='nabare'>
      <xsl:choose>
        <xsl:when test='@na'><xsl:value-of select='@na'/></xsl:when>
        <xsl:otherwise><xsl:value-of select='../@na'/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <!-- if the name contains "*", keep the text before it and append the code point -->
    <xsl:variable name='na'>
      <xsl:choose>
        <xsl:when test='contains ($nabare, "*")'><xsl:value-of select='substring-before($nabare, "*")'/><xsl:value-of select='@cp'/></xsl:when>
        <xsl:otherwise><xsl:value-of select='$nabare'/></xsl:otherwise>
      </xsl:choose>
    </xsl:variable>

    <xsl:choose>
      <!-- skip CJK ideographs, Hangul syllables, controls, and unnamed characters -->
      <xsl:when test='starts-with ($na, "CJK UNIFIED IDEOGRAPH-")'/>
      <xsl:when test='starts-with ($na, "HANGUL SYLLABLE ")'/>
      <xsl:when test='starts-with ($na, "&lt;control>")'/>
      <xsl:when test='$na = ""'/>
      <xsl:otherwise>
        <!-- NAME;CODE POINT;DISPLAY INDICATION -->
        <xsl:value-of select="$na"/>
        <xsl:text>;</xsl:text>
        <xsl:value-of select="@cp"/>
        <xsl:text>;</xsl:text>
        <xsl:choose>
          <xsl:when test='$gc="Mn" or $gc="Mc" or $gc="Me"'>
            <xsl:text>1</xsl:text>
          </xsl:when>
          <xsl:when test='$gc="Zs"'>
            <xsl:text>2</xsl:text>
          </xsl:when>
          <xsl:when test='$gc="Cf" or $gc="Cc" or $gc="Zl" or $gc="Zp"'>
            <xsl:text>3</xsl:text>
          </xsl:when>
          <xsl:otherwise>
            <xsl:text>0</xsl:text>
          </xsl:otherwise>
        </xsl:choose>
        <xsl:text>&#x0a;</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
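The same classification can also be computed in Java, which can be handy for checking the generated list. The sketch below is an assumption and not part of the original tool chain: the class `DisplayIndication` and its helper `line` are hypothetical. It relies on `java.lang.Character.getType`, which reflects the Unicode version of the running JDK rather than the UCD file processed by names.xsl, so the results may differ for characters whose general category changed between versions.

```java
class DisplayIndication {
    // Mirrors names.xsl: 1 for combining marks (Mn, Mc, Me), 2 for space separators (Zs),
    // 3 for format, control, line and paragraph separators (Cf, Cc, Zl, Zp), 0 otherwise.
    static int of(int codePoint) {
        switch (Character.getType(codePoint)) {
            case Character.NON_SPACING_MARK:
            case Character.COMBINING_SPACING_MARK:
            case Character.ENCLOSING_MARK:
                return 1;
            case Character.SPACE_SEPARATOR:
                return 2;
            case Character.FORMAT:
            case Character.CONTROL:
            case Character.LINE_SEPARATOR:
            case Character.PARAGRAPH_SEPARATOR:
                return 3;
            default:
                return 0;
        }
    }

    // Hypothetical helper: one line in the names.xsl output format, NAME;HEX;INDICATION.
    static String line(String name, int codePoint) {
        return name + ";" + String.format("%04X", codePoint) + ";" + of(codePoint);
    }
}
```

For example, `DisplayIndication.line("LATIN CAPITAL LETTER A", 0x41)` yields `LATIN CAPITAL LETTER A;0041;0`, which is the format the compiler reads below.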

2.2. Experimenting with the data

Before we build our data structures, let's experiment a little to understand our data.

The list of names for Unicode 3.2 contains 13,789 names (after removing the controls, CJK ideographs, and Hangul syllables) and is about 400,000 characters long. There is a fair amount of redundancy in the names (e.g. LATIN occurs 968 times), and indeed, compressing its UTF-8 serialization with a common compression utility yields a file of less than 100 Kbytes. Of course, this compressed file does not support our operations very well, but it gives us a good lower bound for the size of our data structures.

The completion operation, where we use the beginning of a name to prune the list of candidate names, suggests a tree organization: a node represents a prefix shared by some names, and its children are the possible continuations of that prefix. Here is a fragment for the three names ACCOUNT OF, ACUTE ACCENT and ACUTE ANGLE:

In this graph, red circles indicate complete names. Note that a non-leaf node can represent a character, as some names are prefixes of others: for example, the node for the final A of LATIN CAPITAL LETTER A represents the character U+0041, and it also has a child labelled with a space, leading to LATIN CAPITAL LETTER A WITH ACUTE.
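The following minimal sketch shows such a tree and the completion operation it supports. It is an illustration only: the class `PrefixTreeSketch` and its methods are hypothetical and are not the author's code. Each node keeps its children in a `TreeMap` keyed by one UTF-16 code unit. Completing a prefix means walking down to the node for that prefix and then collecting every complete name below it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixTreeSketch {
    int codePoint = -1;   // -1: this prefix is not a complete name
    final Map<Character, PrefixTreeSketch> children = new TreeMap<>();

    // Adds one node per UTF-16 code unit, reusing the nodes of shared prefixes.
    void insert(String name, int cp) {
        PrefixTreeSketch n = this;
        for (char c : name.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new PrefixTreeSketch());
        }
        n.codePoint = cp;
    }

    // Walks down to the node for prefix, or returns null if no name starts with it.
    PrefixTreeSketch find(String prefix) {
        PrefixTreeSketch n = this;
        for (char c : prefix.toCharArray()) {
            n = n.children.get(c);
            if (n == null) return null;
        }
        return n;
    }

    // Code point of a complete name, or -1 if the string is not a complete name.
    int lookup(String name) {
        PrefixTreeSketch n = find(name);
        return n == null ? -1 : n.codePoint;
    }

    // Every complete name starting with prefix, in code unit order.
    List<String> complete(String prefix) {
        List<String> out = new ArrayList<>();
        PrefixTreeSketch n = find(prefix);
        if (n != null) n.collect(prefix, out);
        return out;
    }

    private void collect(String soFar, List<String> out) {
        if (codePoint != -1) out.add(soFar);
        for (Map.Entry<Character, PrefixTreeSketch> e : children.entrySet()) {
            e.getValue().collect(soFar + e.getKey(), out);
        }
    }
}
```

With the three names of the fragment inserted, `complete("ACU")` returns ACUTE ACCENT and ACUTE ANGLE. The node for LATIN CAPITAL LETTER A both answers `lookup` with 0x41 and leads to LATIN CAPITAL LETTER A WITH ACUTE.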

Here is a data structure to represent a tree node.

Node class, to represent a node in the name tree ==

static class Node {
  public int codePoint;   // -1 if no name ends at this node
  public int gc;          // display indication (0 to 3), from names.xsl
  public Map children;    // key -> Node
  public Node parent;     // null for the root
  public String key;      // null for the root

  public Node (Node parent, String key) {
    this.codePoint = -1;
    this.parent = parent;
    this.children = new TreeMap ();
    this.key = key;
    this.gc = 0;
  }

  compiler.tree.methods: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
}

If the node does not represent a character, then its codePoint field is -1. For convenient access, the children are indexed by the character they add to the prefix, also called their key. The root of the tree is a little special: its parent field is null, and so is its key. For all the other nodes, these fields are non-null.

Inserting a new string in the tree is rather straightforward: s.substring (from, to) is the string to insert, codePoint is the code point for that name, and gc is its display indication:

Node method to insert a string in a tree ==

public void insert (String s, int from, int to, int codePoint, int gc) {
  if (from < to) {
    // descend into (or create) the child for the next character
    String key = s.substring (from, from + 1).intern ();
    from = from + 1;
    Node n = (Node) children.get (key);
    if (n == null) {
      n = new Node (this, key);
      children.put (key, n);
    }
    n.insert (s, from, to, codePoint, gc);
  } else {
    // the whole string is consumed: this node completes the name
    this.codePoint = codePoint;
    this.gc = gc;
  }
}

Let's write a few methods to investigate the tree. First, let's count the number of nodes:

public int count () {
  int c = 1;   // ourselves
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    c += ((Node) it.next ()).count ();
  }
  return c;
}

Just to make sure that our structure is correct, we can recreate the data from the tree and compare with the original:

public void restore (Writer f) throws Exception {
  restore2 (f, "");
}

public void restore2 (Writer f, String prefix) throws Exception {
  if (key != null) {
    prefix = prefix + key;
  }
  if (codePoint != -1) {
    // same format as the input: NAME;HEX;INDICATION
    f.write (prefix);
    f.write (";");
    f.write (format (codePoint));
    f.write (";");
    f.write (Integer.toString (gc));
    f.write ("\n");
  }
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    ((Node) it.next ()).restore2 (f, prefix);
  }
}

// upper case hexadecimal, padded to at least four digits
public String format (int codePoint) {
  String s = Integer.toHexString (codePoint).toUpperCase ();
  if (s.length () < 4) {
    return "0000".substring (s.length ()) + s;
  } else {
    return s;
  }
}

Let's run this to see where we are. First, our shell, with root as the root of our tree:

Compiler class ==

import java.io.LineNumberReader;
import java.io.OutputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.Writer;
import java.io.File;
import org.xml.sax.SAXException;
import javax.xml.parsers.SAXParserFactory;
import java.util.Iterator;
import java.util.Set;
import java.util.Map;
import java.util.SortedMap;
import java.util.HashMap;
import java.util.TreeMap;
import java.util.Vector;

class Compiler {
  compiler.tree: 1, 2, 3, 4

  public static void main (String [] args) throws Exception {
    Node root = new Node (null, null);
    root.codePoint = -1;
    Compiler body: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
  }
}

Let's load our names list and report some numbers:

{
  String name;
  int charCount = 0;
  int stringCount = 0;
  System.out.println ("---- loading the names in the tree");
  LineNumberReader rd = new LineNumberReader (new FileReader (args[0]));
  while ((name = rd.readLine ()) != null) {
    // each line is NAME;HEX-SCALAR-VALUE;DISPLAY-INDICATION
    int semi = name.indexOf (';');
    int semi2 = name.indexOf (';', semi + 1);
    int codePoint = Integer.parseInt (name.substring (semi + 1, semi2), 16);
    int gc = Integer.parseInt (name.substring (semi2 + 1));
    root.insert (name, 0, semi, codePoint, gc);
    stringCount++;
    charCount += semi;
  }
  rd.close ();
  System.out.println ("" + stringCount + " names");
  System.out.println ("" + charCount + " characters");
  int totalNodes = root.count ();
  System.out.println ("" + totalNodes + " char nodes");
}

We discover that we have 15,015 names, using 399,396 characters, but only 88,778 nodes. This is not too surprising, since many names share a common prefix (e.g. LATIN), and we represent the characters of those prefixes only once.

Let's restore the names list, to make sure our data structure is correct:

{
  String name = args [0] + ".restore.1";
  System.out.println ("---- restoring in " + name);
  FileWriter x = new FileWriter (name);
  root.restore (x);
  x.close ();
}

As expected, the restored names list matches the original names list, modulo the order.

Going back to our picture of the tree, we observe that many nodes are not leaves and have a single child. Again, this should not be surprising: after LATIN CAPI, the next characters are always TAL. The nodes I, T, and A have this property. Let's count those nodes to see if we can exploit that property:

public int countSimple () {
  int c = 0;
  // a node is simple if it completes no name and has exactly one child
  if (codePoint == -1 && children.size () == 1) {
    c++;
  }
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    c += ((Node) it.next ()).countSimple ();
  }
  return c;
}

{
  System.out.println ("---- counting the simple nodes");
  int simpleNodes = root.countSimple ();
  System.out.println ("" + simpleNodes + " simple nodes");
}

Indeed, 69,387 nodes, or about 80%, are simple! Representing the simple nodes as separate nodes does not buy us much, but it consumes quite a bit of space. If we merge each chain of simple nodes into the node that follows it, we get the following graph:

Let's do it, and verify that our collapsing is correct:

public void collapse () {
  if (codePoint == -1 && children.size () == 1) {
    // merge this simple node into its only child, which takes our place in our parent
    Node n = (Node) children.values ().iterator ().next ();
    parent.children.remove (key);
    n.key = (key + n.key).intern ();
    n.parent = parent;
    parent.children.put (n.key, n);
    n.collapse ();
  } else {
    // iterate over a copy, since collapsing a child modifies our children map
    Object [] n = children.values ().toArray ();
    for (int i = 0; i < n.length; i++) {
      ((Node) n[i]).collapse ();
    }
  }
}

The astute reader will have noticed that our restore does not depend on the keys being single characters, so we can use it here.

{
  System.out.println ("---- collapsing the simple nodes");
  root.collapse ();
  int nodes = root.count ();
  System.out.println ("" + nodes + " nodes");
  String name = args[0] + ".restore.2";
  System.out.println ("---- restoring in " + name);
  FileWriter x = new FileWriter (name);
  root.restore (x);
  x.close ();
}

We are left with 19,391 nodes, indeed quite a simplification.
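We can check the effect of collapsing on the three names of the earlier fragment. The sketch below is a self-contained re-implementation in modern Java, not the Compiler code itself. It uses generics and adds a `parent != null` guard so that the root is never merged. The code points are arbitrary placeholders. Before collapsing, the tree has 25 nodes, including the root. Afterwards it has 6: the root, AC, COUNT OF, UTE A, CCENT and NGLE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CollapseSketch {
    String key;             // null for the root
    CollapseSketch parent;  // null for the root
    int codePoint = -1;     // -1 if no name ends here
    final Map<String, CollapseSketch> children = new TreeMap<>();

    CollapseSketch(CollapseSketch parent, String key) {
        this.parent = parent;
        this.key = key;
    }

    // One node per UTF-16 code unit, as in Node.insert.
    void insert(String name, int cp) {
        CollapseSketch n = this;
        for (int i = 0; i < name.length(); i++) {
            String k = name.substring(i, i + 1);
            CollapseSketch next = n.children.get(k);
            if (next == null) {
                next = new CollapseSketch(n, k);
                n.children.put(k, next);
            }
            n = next;
        }
        n.codePoint = cp;
    }

    int count() {
        int c = 1;
        for (CollapseSketch child : children.values()) c += child.count();
        return c;
    }

    // Merges every non-root node that completes no name and has one child into that child.
    void collapse() {
        if (parent != null && codePoint == -1 && children.size() == 1) {
            CollapseSketch only = children.values().iterator().next();
            parent.children.remove(key);
            only.key = key + only.key;
            only.parent = parent;
            parent.children.put(only.key, only);
            only.collapse();
        } else {
            // copy first: collapsing a child replaces its entry in our map
            for (CollapseSketch child : new ArrayList<>(children.values())) child.collapse();
        }
    }

    // Rebuilds the complete names, like restore in the text.
    void names(String prefix, List<String> out) {
        if (key != null) prefix = prefix + key;
        if (codePoint != -1) out.add(prefix);
        for (CollapseSketch child : children.values()) child.names(prefix, out);
    }
}
```

As with restore, the names rebuilt after collapsing are the same as before: ACCOUNT OF, ACUTE ACCENT, ACUTE ANGLE.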

Here is a further step to take a look at the nodes we now have, by outputting all the keys along with their number of occurrences:

public void collectKeys (Map s) {
  if (key != null) {
    int n;
    if (s.get (key) != null) {
      n = ((Integer) s.get (key)).intValue () + 1;
    } else {
      n = 1;
    }
    s.put (key, Integer.valueOf (n));
  }
  for (Iterator it = children.values ().iterator (); it.hasNext ();) {
    ((Node) it.next ()).collectKeys (s);
  }
}

Map keys = new HashMap ();   // key -> number of occurrences
{
  String name = args [0] + ".keys";
  System.out.println ("---- collecting the keys in " + name);
  root.collectKeys (keys);
  System.out.println ("" + keys.size () + " distinct keys");
  FileWriter x = new FileWriter (name);
  for (Iterator it = keys.keySet ().iterator(); it.hasNext (); ) {
    String s = (String) (it.next ());
    int n = ((Integer) (keys.get (s))).intValue ();
    x.write (Integer.toString (n));
    x.write (" ");
    x.write (s);
    x.write ("\n");
  }
  x.close ();
}

We have 4,310 unique keys (for 19,391 nodes), which indicates that many keys occur multiple times. Not surprisingly, the most frequent keys are individual letters (920 nodes have the single-letter key A). We also find other frequent keys: "FINAL FORM" (172 occurrences), "SOLATED FORM" (116 occurrences), "NITIAL FORM" (116 occurrences), and " WITH" (104 occurrences). The second and third are interesting: most names that contain ISOLATED FORM have a corresponding name with INITIAL FORM instead, and the I common to both names of such a pair is a node of its own.

Clearly, storing 172 occurrences of "FINAL FORM" is redundant. Let's see how many characters we have if we store each unique key only once.

{
  int keyStore = 0;         // characters if every key occurrence is stored
  int uniqueKeyStore = 0;   // characters if each distinct key is stored once
  for (Iterator it = keys.keySet ().iterator(); it.hasNext (); ) {
    String key = (String) (it.next ());
    int count = ((Integer) (keys.get (key))).intValue ();
    // + 1 for a terminator after each key
    keyStore += (key.length () + 1) * count;
    uniqueKeyStore += (key.length () + 1);
  }
  System.out.println (keyStore + " characters in keys");
  System.out.println (uniqueKeyStore + " characters in unique keys");
}

If we store every key occurrence, we need 108,167 characters. If we store each distinct key only once, this number drops to 43,767, i.e. we save about 64K characters. Therefore, it is useful to represent the keys outside the tree itself, in a big array, and to have the nodes in the tree point into that array.

2.3. Representing the data

We are now ready to design our data structure.

Clearly, we want our IME to be relatively unobtrusive; in particular, we want it to initialize fast, since it may be loaded and unloaded on demand. The best approach is to craft a data structure that can be loaded in one big blob of memory and used right away. This implies in particular that all logical pointers are represented as offsets in that memory blob.

Not surprisingly, the code we have accumulated so far to investigate our data is useful to build this blob. It is not entirely a coincidence that the name of the class for this code is Compiler.

Here is a class to manipulate this blob of memory while we build it.

public static class Store { Members for Store: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14 }

2.4. The key store

In the Unicode standard, character names are made of printable ASCII characters only. However, the names can be translated into other languages, and indeed the French version of ISO 10646 has translated names. Thus, we really want to store character names made of arbitrary characters, and we therefore use one of the UTFs to represent the names.

In selecting a UTF, we have two conflicting approaches. The first observes that the overwhelming majority of characters in names will be ASCII characters, even in some translations; this points at UTF-8 as the most compact form. The second approach observes that the IME has to interact with other parts of the application platform, e.g. when it gets characters from the keyboard, when it displays its status and when it delivers its output; this points at UTF-16 as the most practical form. After experimenting a little bit with a UTF-8 representation, we decided that the cost of UTF-16 was justified by the corresponding simplification in the code.

Another concern is whether the names are normalized, and if so, how. At this point, it is difficult to make an intelligent choice, since the only names list we have is made of ASCII characters only, and is therefore in both NFC and NFD. We will ignore this aspect until we gain more experience with other names lists.

As we noted earlier, we gain significantly if we can share the keys among the nodes, as multiple nodes have the same key. To achieve this, we store all the keys in a big array, and the nodes simply point into this array.

To avoid storing the size of each key, we separate the keys with sentinels. Ideally, we would use a noncharacter as the sentinel; however, noncharacters occupy 3 bytes in UTF-8, and since we can be sure that no control character is part of a character name, we use U+0000 instead.

This approach also provides a further opportunity for space optimization: if key A is a suffix of key B, we only need to store B, and we can simply point in the middle of B to represent A. However, this is probably a small optimization and we do not implement it here.
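The store does not implement suffix sharing, but the idea is easy to sketch. The class `SuffixSharingKeyStore` below is hypothetical and simplifies the real layout in three ways. It keeps the keys in a `StringBuilder` rather than a byte array, it measures offsets in UTF-16 code units rather than bytes, and it uses a simple quadratic search. The keys are processed from longest to shortest, so any key that a shorter key could be a suffix of is already stored. Pointing into the middle of that key is enough, because its sentinel also terminates the shorter key.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class SuffixSharingKeyStore {
    static final char SENTINEL = (char) 0;
    final StringBuilder store = new StringBuilder();       // keys, each followed by SENTINEL
    final Map<String, Integer> offsets = new HashMap<>();   // key -> offset in code units
    private final List<String> hosts = new ArrayList<>();  // keys actually written to store

    SuffixSharingKeyStore(Collection<String> keys) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(keys));       // distinct keys
        sorted.sort((a, b) -> Integer.compare(b.length(), a.length()));   // longest first
        for (String k : sorted) {
            Integer offset = null;
            for (String h : hosts) {
                if (h.endsWith(k)) {
                    // k is a suffix of h: point into the middle of h
                    offset = offsets.get(h) + h.length() - k.length();
                    break;
                }
            }
            if (offset == null) {
                offset = store.length();
                store.append(k).append(SENTINEL);
                hosts.add(k);
            }
            offsets.put(k, offset);
        }
    }

    // Reads the key starting at offset, up to the next sentinel.
    String stringAt(int offset) {
        return store.substring(offset, store.indexOf(String.valueOf(SENTINEL), offset));
    }
}
```

Built from FINAL FORM, FORM, ORM and WITH, this store holds only "FINAL FORM" and "WITH" (16 code units with their sentinels). FORM and ORM point at offsets 6 and 7 inside FINAL FORM.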

Here are the pieces to represent and manipulate the key store inside a store:

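protected byte [] keyStore;      // UTF-16LE code units, keys separated by sentinels
protected int keyStoreCount;     // number of bytes used in keyStore
protected Map keyStoreMap;       // key -> Integer offset (in bytes) in keyStore
public static final int SENTINEL_UNIT = 0;

public void initKeyStore () {
  keyStoreCount = 0;
  keyStore = new byte [1000];
  appendUTF16Unit (SENTINEL_UNIT);   // offset 0 holds a lone sentinel
  keyStoreMap = new HashMap ();
}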

In keyStore, the UTF-16 code units are serialized in little-endian order. Only positions 0 .. keyStoreCount-1 are used. keyStoreMap maps each key to its offset, in bytes, in keyStore.

Finding the offset of a key, and adding it if not present, is simple:

// appends one UTF-16 code unit, low byte first
protected void appendUTF16Unit (int codeUnit) {
  keyStore [keyStoreCount++] = (byte) ((codeUnit & 0x00FF));
  keyStore [keyStoreCount++] = (byte) ((codeUnit & 0xFF00) >> 8);
}

public int findKeyOffset (String s) {
  if (keyStoreMap.get (s) == null) {
    char[] chars = s.toCharArray ();
    int needed = 2 * (chars.length + 1);   // code units plus the sentinel
    if (keyStoreCount + needed > keyStore.length) {
      // grow by at least 1000 bytes, and always by enough for this key
      byte[] temp = new byte [keyStore.length + Math.max (1000, needed)];
      System.arraycopy (keyStore, 0, temp, 0, keyStoreCount);
      keyStore = temp;
    }
    int offset = keyStoreCount;
    for (int i = 0; i < chars.length; i++) {
      appendUTF16Unit (chars [i]);
    }
    appendUTF16Unit (SENTINEL_UNIT);
    keyStoreMap.put (s, Integer.valueOf (offset));
    return offset;
  } else {
    return ((Integer) keyStoreMap.get (s)).intValue ();
  }
}

Retrieving is only complicated by the fact that Java does not support unsigned bytes:

protected int getKeyByte (int offset) {
  int x = keyStore [offset];
  if (x < 0) {
    x += 256;   // Java bytes are signed
  }
  return x;
}

protected char getUTF16Unit (int offset) {
  int b1 = getKeyByte (offset);
  int b2 = getKeyByte (offset + 1) << 8;
  return (char) (b1 | b2);
}

public String stringAt (int offset) {
  // count the code units up to the sentinel, then copy them
  int nChars = 0;
  int o = offset;
  while (getUTF16Unit (o) != SENTINEL_UNIT) {
    o += 2;
    nChars += 1;
  }
  char[] chars = new char [nChars];
  for (int i = 0; i < nChars; i++) {
    chars [i] = getUTF16Unit (offset);
    offset += 2;
  }
  return new String (chars);
}
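As a cross-check of this layout, standard Java classes can produce and read the same byte sequence. This is an alternative sketch, not the Store code, and the class name `KeyStoreLayout` is hypothetical. `String.getBytes` with `StandardCharsets.UTF_16LE` yields little-endian code units without a byte order mark. A `ByteBuffer` set to `ByteOrder.LITTLE_ENDIAN` reads them back. `Byte.toUnsignedInt` (Java 8 and later) does the same job as the manual adjustment in getKeyByte.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class KeyStoreLayout {
    // A key as it appears in the key store: UTF-16LE code units followed by a U+0000 sentinel.
    static byte[] encode(String key) {
        return (key + (char) 0).getBytes(StandardCharsets.UTF_16LE);
    }

    // Reads code units from a byte offset up to the sentinel, like Store.stringAt.
    static String decode(byte[] store, int offset) {
        ByteBuffer b = ByteBuffer.wrap(store).order(ByteOrder.LITTLE_ENDIAN);
        StringBuilder sb = new StringBuilder();
        for (int o = offset; b.getChar(o) != 0; o += 2) {
            sb.append(b.getChar(o));
        }
        return sb.toString();
    }

    // Same result as getKeyByte's "if (x < 0) x += 256".
    static int unsignedByte(byte x) {
        return Byte.toUnsignedInt(x);
    }
}
```

For instance, `encode("AB")` gives the six bytes 41 00 42 00 00 00. Decoding from byte offset 2 instead of 0 yields "B", which is the kind of interior pointer that suffix sharing would rely on.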

Let's report our size:

public int keyStoreSize () { return keyStoreCount; }

Here is the code that walks all the nodes in our tree and builds the key store. The keyStoreOffset member remembers the index (in bytes) in the key store of the key for the current node:

protected int keyStoreOffset;   // offset (in bytes) of our key in the key store

protected void populateKeyStore (Store store) {
  store.initKeyStore ();
  populateKeyStore2 (store);
}

protected void populateKeyStore2 (Store store) {
  if (key != null) {
    this.keyStoreOffset = store.findKeyOffset (key);
  }
  for (Iterator it = children.values ().iterator (); it.hasNext ();) {
    ((Node) it.next ()).populateKeyStore2 (store);
  }
}

2.5. The node store

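At this point, each node in our tree contains its key (and the offset of that key in the key store), possibly a scalar value with its display indication, a pointer to its parent, and a map of its children.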

One aspect we have not dealt with so far is finding the name of a character given its scalar value. Suppose we could somehow locate the node that corresponds to a character (that is, the last node forming the name of the character), and that we can walk up the tree: we can then reassemble the name from the keys of all the nodes on the way to the root.

To find the node that represents a character, we could build an explicit map. But here is an alternative that consumes no space, at the expense of a bit of computation. We represent each node as a sequence of fixed-size units, each unit holding either the pointer to the key, the scalar value, or a pointer to one of the children. Furthermore, each unit records the type of data it contains, and given a pointer to any unit of a node, it is easy to find the boundaries of that node (much like it is easy to find the boundaries of a character in a UTF code unit sequence). The tree as a whole is represented by the concatenation of its nodes, and we impose one more constraint: the nodes that hold characters must appear in increasing scalar value order. The point of all this is that we can locate the node for a character by doing a binary search in the tree, viewed as an array of units.
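Here is a minimal sketch of that search, under simplifying assumptions. The units are held in an `int[]` indexed by unit rather than by byte, with the control byte in bits 24 to 31 and the value in bits 0 to 23, following the byte layout of appendUnit. The caller passes the unit range [lo, hi) that holds only the character nodes. The class, its methods and the sample data are hypothetical. After backing up from the middle unit to the start of its node, the search compares scalar values and continues past the end of the node or before its start.

```java
class UnitSearchSketch {
    static final int FIRST_UNIT = 0x10, LAST_UNIT = 0x20, FUNCTION_MASK = 0x0F;
    static final int PARENT_OFFSET = 0x00, KEY_OFFSET = 0x01, USV_VALUE = 0x02, CHILD_OFFSET = 0x03;

    static int unit(int control, int value) { return (control << 24) | (value & 0xFFFFFF); }
    static int control(int unit) { return unit >>> 24; }

    // Backs up from any unit to the first unit of its node.
    static int nodeStart(int[] units, int i) {
        while ((control(units[i]) & FIRST_UNIT) == 0) i--;
        return i;
    }

    // Index of the last unit of the node starting at start.
    static int nodeEnd(int[] units, int start) {
        int i = start;
        while ((control(units[i]) & LAST_UNIT) == 0) i++;
        return i;
    }

    // Scalar value of the node starting at start (bits 0-20 of its USV unit), or -1.
    static int usvOf(int[] units, int start) {
        int end = nodeEnd(units, start);
        for (int i = start; i <= end; i++) {
            if ((control(units[i]) & FUNCTION_MASK) == USV_VALUE) return units[i] & 0x1FFFFF;
        }
        return -1;
    }

    // Binary search over units[lo..hi), which must hold character nodes in scalar value order.
    // Returns the index of the first unit of the node for usv, or -1.
    static int findNode(int[] units, int lo, int hi, int usv) {
        while (lo < hi) {
            int start = nodeStart(units, (lo + hi) >>> 1);
            int v = usvOf(units, start);
            if (v == usv) return start;
            if (v < usv) {
                lo = nodeEnd(units, start) + 1;   // continue after this node
            } else {
                hi = start;                       // continue before this node
            }
        }
        return -1;
    }

    // Three character nodes (illustrative data); the second USV unit also carries indication 1.
    static int[] sample() {
        return new int[] {
            unit(FIRST_UNIT | KEY_OFFSET, 0), unit(USV_VALUE, 0x41), unit(LAST_UNIT | PARENT_OFFSET, 0),
            unit(FIRST_UNIT | KEY_OFFSET, 0), unit(USV_VALUE, (1 << 21) | 0xC1), unit(LAST_UNIT | PARENT_OFFSET, 0),
            unit(FIRST_UNIT | KEY_OFFSET, 0), unit(USV_VALUE, 0x2014), unit(PARENT_OFFSET, 0),
            unit(LAST_UNIT | CHILD_OFFSET, 0),
        };
    }
}
```

On the sample, searching for 0xC1 returns unit 3 and searching for 0x2014 returns unit 6. Searching for 0x42 returns -1. Each step costs a scan back to the start of a node, which stays short because nodes have only a few units.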

To walk up the tree, the simplest approach is to store, in each child node, a back pointer to its parent.
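Reassembling a name then amounts to following the parent pointers up to the root and concatenating the keys in reverse order. The sketch below is hypothetical. It uses an array of (key, parent index) pairs instead of the node store, and its keys are the ones obtained by collapsing the names ACCOUNT OF, ACUTE ACCENT and ACUTE ANGLE.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NameFromParents {
    // A node reduced to what the walk needs: its key and its parent's index (-1 for the root).
    static final class FlatNode {
        final String key;
        final int parent;
        FlatNode(String key, int parent) { this.key = key; this.parent = parent; }
    }

    static String nameOf(FlatNode[] nodes, int index) {
        Deque<String> keys = new ArrayDeque<>();
        for (int i = index; i != -1; i = nodes[i].parent) {
            if (nodes[i].key != null) keys.addFirst(nodes[i].key);   // the root has no key
        }
        return String.join("", keys);
    }

    static FlatNode[] sample() {
        return new FlatNode[] {
            new FlatNode(null, -1),        // 0: root
            new FlatNode("AC", 0),         // 1
            new FlatNode("COUNT OF", 1),   // 2
            new FlatNode("UTE A", 1),      // 3
            new FlatNode("CCENT", 3),      // 4
            new FlatNode("NGLE", 3),       // 5
        };
    }
}
```

For example, `nameOf(sample(), 4)` walks from CCENT through UTE A and AC to the root and returns "ACUTE ACCENT".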

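We know that the key store is not too big: of the order of 40K characters, which, at two bytes per UTF-16 code unit, amounts to roughly 90 Kbytes. So an offset in the key store will fit comfortably in 24 bits. Similarly, our node store is of limited size: about 19K nodes, each node having about three units; again, an offset in the node store will fit comfortably in 24 bits. Finally, a scalar value also fits in 24 bits (it needs only 21). Thus, we can make our units 32 bits, and use the remaining 8 bits to record the type of unit and to mark the first and last units of a node.

More precisely, each unit occupies four bytes: the first three hold the value, least significant byte first, and the fourth is a control byte. The low four bits of the control byte give the type of the unit (0x00 for the parent offset, 0x01 for the key offset, 0x02 for the scalar value, 0x03 for a child offset), bit 0x10 marks the first unit of a node, and bit 0x20 marks the last one. In a scalar value unit, the value holds the scalar value in bits 0 to 20 and the display indication in bits 21 to 23.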

Note that all units are optional: for example, a node that does not represent a character does not have a character unit. Also, we do not store explicitly the number of child nodes; we put all the child units at the end of the node and the last one (if any) is therefore the last unit of the node, and marked as such.
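The byte order used by appendUnit is exactly that of a little-endian 32-bit integer whose high byte is the control byte. The following sketch uses `java.nio.ByteBuffer` instead of the manual shifts of the Store code, and its names are hypothetical. It encodes a unit and decodes the scalar value and display indication packed the same way as appendUSV.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class UnitEncoding {
    static final int FIRST_UNIT = 0x10, LAST_UNIT = 0x20, USV_VALUE = 0x02;

    // Bytes 0-2: the 24-bit value, least significant first; byte 3: the control byte.
    static byte[] encodeUnit(int control, int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN)
                .putInt((control << 24) | (value & 0xFFFFFF))
                .array();
    }

    // Value of a scalar value unit: scalar value in bits 0-20, display indication in bits 21-23.
    static int usvValue(int codePoint, int gc) {
        return (gc << 21) | codePoint;
    }

    static int word(byte[] unit) {
        return ByteBuffer.wrap(unit).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    static int decodeControl(byte[] unit) { return word(unit) >>> 24; }
    static int decodeUsv(byte[] unit) { return word(unit) & 0x1FFFFF; }
    static int decodeGc(byte[] unit) { return (word(unit) >> 21) & 0x7; }
}
```

Encoding the first unit of a node as the scalar value U+00C1 with display indication 1 gives the bytes C1 00 20 12. The value 0x2000C1 takes the first three bytes, and the control byte 0x12 (FIRST_UNIT | USV_VALUE) comes last.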

Here are the Store members to manipulate the node store:

public static final int UNIT_SIZE = 4;              // bytes per unit
protected static final byte FIRST_UNIT = 0x10;      // control flag: first unit of a node
protected static final byte LAST_UNIT = 0x20;       // control flag: last unit of a node
protected static final byte FUNCTION_MASK = 0x0F;   // control bits giving the type of unit
protected static final byte PARENT_OFFSET = 0x00;
protected static final byte KEY_OFFSET = 0x01;
protected static final byte USV_VALUE = 0x02;
protected static final byte CHILD_OFFSET = 0x03;

protected byte [] nodeStore;

public void initNodeStore (int size) {
  nodeStore = new byte [size];
  firstUnit = false;
  currentNodeOffset = 0;
}

To enter a node in the node store, the client must start with a call to startNode, follow with calls to the append* methods, and finish with a call to endNode.

protected boolean firstUnit;
protected int currentNodeOffset;

public void startNode (int offset) {
  firstUnit = true;
  currentNodeOffset = offset;
}

public void appendParentOffset (int offset) {
  appendUnit (PARENT_OFFSET, offset);
}

public void appendKeyOffset (int offset) {
  appendUnit (KEY_OFFSET, offset);
}

public void appendUSV (int codePoint, int gc) {
  // scalar value in bits 0-20, display indication in bits 21-23
  int n = (gc << 21) | codePoint;
  appendUnit (USV_VALUE, n);
}

public void appendChildOffset (int offset) {
  appendUnit (CHILD_OFFSET, offset);
}

public void endNode () {
  // flag the control byte of the last unit written
  nodeStore [currentNodeOffset - 1] |= LAST_UNIT;
}

public void appendUnit (byte control, int value) {
  if (firstUnit) {
    control |= FIRST_UNIT;
    firstUnit = false;
  }
  // 24-bit value, least significant byte first, then the control byte
  nodeStore [currentNodeOffset++] = (byte) ((value & 0x000000FF));
  nodeStore [currentNodeOffset++] = (byte) ((value & 0x0000FF00) >> 8);
  nodeStore [currentNodeOffset++] = (byte) ((value & 0x00FF0000) >> 16);
  nodeStore [currentNodeOffset++] = control;
}

Here are a few methods to access a node unit:

protected int getNodeByte (int offset) {
  int x = nodeStore [offset];
  if (x < 0) {
    x += 256;   // Java bytes are signed
  }
  return x;
}

public int getNodeUnitControl (int offset) {
  return getNodeByte (offset + 3);
}

public int getNodeUnitValue (int offset) {
  int b3 = getNodeByte (offset+0) << 0;
  int b2 = getNodeByte (offset+1) << 8;
  int b1 = getNodeByte (offset+2) << 16;
  return (b1 | b2 | b3);
}

public int getNodeUnitUSV (int offset) {
  return getNodeUnitValue (offset) & 0x1FFFFF;
}

public int getNodeUnitGc (int offset) {
  return (getNodeUnitValue (offset) >> 21) & 0x7;
}

Finally, a method to get the size of the node store.

public int nodeStoreSize () { return nodeStore.length; }

To build the node store, we do the following:

  1. we first build a map from scalar values to nodes
  2. we assign a position to each and every node by enumerating them, starting with the root and then the nodes in the map, in scalar value order (the remaining nodes come last, in any order). At the same time, we compute the size of each node, and therefore the position of the next node.
  3. finally, we can build the node store.
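The positioning step can be sketched independently of the Store class. This version is hypothetical: an unpositioned node is marked with -1 rather than 0. The test data is the collapsed tree for ACCOUNT OF, ACUTE ACCENT and ACUTE ANGLE, with the arbitrary placeholder code points 1 to 3. The root takes 4 bytes at offset 0. The three character nodes follow in scalar value order, then the two remaining nodes, for a total of 72 bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class PositionSketch {
    static final int UNIT_SIZE = 4;

    static final class N {
        final String key;        // null for the root
        final int codePoint;     // -1 if no name ends here
        final N parent;          // null for the root
        final List<N> children = new ArrayList<>();
        int offset = -1;         // -1 until positioned

        N(String key, int codePoint, N parent) {
            this.key = key;
            this.codePoint = codePoint;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }

        // One unit each for the key, the scalar value and the parent, plus one per child.
        int size() {
            int units = children.size();
            if (key != null) units++;
            if (codePoint != -1) units++;
            if (parent != null) units++;
            return units * UNIT_SIZE;
        }
    }

    // Root first, then character nodes by scalar value, then the rest; returns the total size.
    static int position(N root, List<N> all) {
        int offset = 0;
        root.offset = offset;
        offset += root.size();
        List<N> charNodes = new ArrayList<>();
        for (N n : all) if (n.codePoint != -1) charNodes.add(n);
        charNodes.sort(Comparator.comparingInt(n -> n.codePoint));
        for (N n : charNodes) { n.offset = offset; offset += n.size(); }
        for (N n : all) if (n.offset == -1) { n.offset = offset; offset += n.size(); }
        return offset;
    }
}
```

The total returned is the size to pass to initNodeStore. Since every offset is known before any unit is written, child and parent offsets can be filled in a single pass.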

protected int nodeStoreOffset;   // offset (in bytes) of this node in the node store

protected void collectCharNodes (SortedMap m) {
  if (codePoint != -1) {
    m.put (Integer.valueOf (codePoint), this);
  }
  for (Iterator it = children.values ().iterator (); it.hasNext ();) {
    ((Node) it.next ()).collectCharNodes (m);
  }
}

protected void populateNodeStore (Store store) {
  SortedMap m = new TreeMap ();
  collectCharNodes (m);
  int offset = 0;
  // put ourselves, the root, first
  offset = position (offset);
  // position the character nodes, in usv order
  for (Iterator it = m.values ().iterator (); it.hasNext ();) {
    offset = ((Node) it.next ()).position (offset);
  }
  // position the remaining nodes, in any order
  for (Iterator it = children.values ().iterator (); it.hasNext ();) {
    offset = ((Node) it.next ()).positionAndChildren (offset);
  }
  store.initNodeStore (offset);
  fillAndChildren (store);
}

// records our offset and returns the offset just past our units
protected int position (int offset) {
  this.nodeStoreOffset = offset;
  if (key != null) {
    offset += Store.UNIT_SIZE;
  }
  if (codePoint != -1) {
    offset += Store.UNIT_SIZE;
  }
  if (parent != null) {
    offset += Store.UNIT_SIZE;
  }
  offset += children.size () * Store.UNIT_SIZE;
  return offset;
}

protected int positionAndChildren (int offset) {
  // only the root is at offset 0, so 0 means "not positioned yet"
  if (this.nodeStoreOffset == 0) {
    offset = position (offset);
  }
  for (Iterator it = children.values ().iterator (); it.hasNext ();) {
    offset = ((Node) it.next ()).positionAndChildren (offset);
  }
  return offset;
}

protected void fill (Store store) {
  store.startNode (nodeStoreOffset);
  if (key != null) {
    store.appendKeyOffset (keyStoreOffset);
  }
  if (codePoint != -1) {
    store.appendUSV (codePoint, gc);
  }
  if (parent != null) {
    store.appendParentOffset (parent.nodeStoreOffset);
  }
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    store.appendChildOffset (((Node) it.next ()).nodeStoreOffset);
  }
  store.endNode ();
}

protected void fillAndChildren (Store store) {
  fill (store);
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    ((Node) it.next ()).fillAndChildren (store);
  }
}

2.6. Parameters for execution

It turns out to be very convenient to compute some numbers as we compile the store and to save them in it, so that we avoid computing them every time the IME is loaded. In general, those numbers determine the size of buffers allocated in the IME; this allows us to just pass those buffers around, knowing that they are large enough for the data they will receive, and therefore to avoid all the complexity and cost of dual-pass APIs (call once to compute the needed size, allocate the buffer, call a second time to actually fill the buffer).

The downside of this approach is that we need to keep the code in the compiler and the code in the IME in sync. It is also possible that a different implementation of the IME would not need the numbers we compute here (no big deal), or would need other numbers; we can add those as needed. In any case, the benefits seem worth the cost.

The first such number is the maximal length of a character name, measured in UTF-16 code units.

public int maxNameLength () {
  int max = 0;
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    int m = ((Node) it.next ()).maxNameLength ();
    if (m > max) {
      max = m;
    }
  }
  if (key != null) {
    max += key.length ();
  }
  return max;
}

The second number deals with completions on partially specified names. The number we are looking for is the size of the buffer that will contain all the candidates. One approach is to compute this length precisely; this would amount to a dual pass API, having just moved the first pass in the compiler. Instead, we are going to compute a close upper bound, which allows us to have a simpler “first pass”, and in particular gives us a better chance of proving that it matches the second pass.

If there are few enough candidates that we elect to show all of them in their entirety, then we know that each one will be no more than the maximal length of a name, plus at most 10 ASCII characters for the " U+xxxxxx" suffix and the terminating U+0000. We simply multiply this by the maximum number of candidates for this case.

If we elect to show the word completions instead, the same upper bound will apply just as well. Either a candidate is a complete name, or it is no bigger than a full name, and the suffix is "...", which is smaller than what we already accounted for.

Finally, if we elect to show the individual letter completions, then the number of candidates corresponding to child nodes is bounded by the maximum number of children in any node, and each such candidate occupies at most 5 units (2 for the individual letter, in case somebody uses supplementary characters in a character name, 2 for the ellipsis, and 1 for the terminator). To that, we need to add the node itself, which takes at most 10 units.

public int maxChildren () {
  int max = children.size ();
  for (Iterator it = children.values ().iterator (); it.hasNext (); ) {
    int m = ((Node) it.next ()).maxChildren ();
    if (m > max) {
      max = m;
    }
  }
  return max;
}

public int maxCandidatesLength () {
  // up to 15 full names, each followed by " U+xxxxxx" and a terminator
  int max = 15 * (maxNameLength () + 10);
  // individual letter completions: 5 units per child, plus 10 for the node itself
  int letterCompletions = maxChildren () * 5 + 10;
  if (letterCompletions > max) {
    max = letterCompletions;
  }
  return max;
}

Let's put those in the store:

public int maxNameLength; public int maxCandidatesLength;

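2.7. The keyboard store

The compiler also packages keyboard descriptions, read from the XML file named by the second command-line argument. Each keyboard element starts a new keyboard, and each key element maps a virtual key, unshifted or shifted, to the string in its out attribute. The key is given either by a ch attribute, whose first character is used (a character above U+005F, such as a lowercase letter, is moved down by 0x20 and taken as unshifted, while any other character is taken as is, as a shifted key), or by a vk attribute holding a numeric virtual-key code, with shift="yes" selecting the shifted state. Each keyboard is serialized as 512 little-endian 32-bit offsets (256 for the unshifted keys, then 256 for the shifted keys; 0 means no mapping), followed by the strings, each stored as a 16-bit length and that many UTF-16 code units; the offsets are relative to the start of the keyboard. The keyboard store itself starts with the number of keyboards, followed by the offset of each keyboard, followed by the keyboards.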

Each keyboard maps virtual-key codes, unshifted and shifted, to output strings. The keyboards are read from an XML description (the file named by args [1]), and all of them are serialized into a single byte array, keyboardStore:

static public Vector <Keyboard> keyboards = new Vector <Keyboard> ();

public static class Keyboard {
  String [] vk2string = new String [256];
  String [] vk2stringShift = new String [256];
  byte [] store;

  protected void setInt32 (int offset, int v) {
    store [offset] = (byte) ((v ) & 0xFF);
    store [offset+1] = (byte) ((v >> 8) & 0xFF);
    store [offset+2] = (byte) ((v >> 16) & 0xFF);
    store [offset+3] = (byte) ((v >> 24) & 0xFF); }

  protected void setInt16 (int offset, int v) {
    store [offset] = (byte) ((v & 0x00FF));
    store [offset+1] = (byte) ((v & 0xFF00) >> 8); }

  // writes s as a 16-bit length followed by its UTF-16 code units
  protected int putStringAt (String s, int currentStringOffset) {
    setInt16 (currentStringOffset, s.length ());
    currentStringOffset += 2;
    for (int j = 0; j < s.length (); j++) {
      setInt16 (currentStringOffset, s.charAt (j));
      currentStringOffset += 2; }
    return currentStringOffset; }

  public void serialize () {
    int byteCount = 4 * 512; // offsets to strings
    for (int i = 0; i < 256; i++) {
      if (vk2string [i] != null) {
        byteCount += 2 + 2 * vk2string [i].length (); }
      if (vk2stringShift [i] != null) {
        byteCount += 2 + 2 * vk2stringShift [i].length (); }}
    System.out.println (" " + byteCount + " for keyboard");
    store = new byte [byteCount];
    int currentStringOffset = 4 * 512;
    for (int i = 0; i < 256; i++) {
      if (vk2string [i] == null) {
        setInt32 (i * 4, 0); }
      else {
        setInt32 (i * 4, currentStringOffset);
        currentStringOffset = putStringAt (vk2string [i], currentStringOffset); }
      if (vk2stringShift [i] == null) {
        setInt32 (4 * (256 + i), 0); }
      else {
        setInt32 (4 * (256 + i), currentStringOffset);
        currentStringOffset = putStringAt (vk2stringShift [i], currentStringOffset); }}} }

static byte [] keyboardStore;

static void setInt32 (int offset, int v) {
  keyboardStore [offset] = (byte) ((v ) & 0xff);
  keyboardStore [offset+1] = (byte) ((v >> 8) & 0xff);
  keyboardStore [offset+2] = (byte) ((v >> 16) & 0xff);
  keyboardStore [offset+3] = (byte) ((v >> 24) & 0xff); }

static void buildKeyboardStore () {
  System.out.println ("--- serializing keyboard");
  int totalByteCount = 4; // nb keyboards
  for (Keyboard k : keyboards) {
    totalByteCount += 4;
    k.serialize ();
    totalByteCount += k.store.length; }
  System.out.println ("" + totalByteCount + " bytes");
  keyboardStore = new byte [totalByteCount];
  int currentKeyboardOffset = (1 + keyboards.size ()) * 4;
  int currentKeyboardIndex = 0;
  setInt32 (0, keyboards.size ());
  for (Keyboard k : keyboards) {
    setInt32 (4 * (1 + currentKeyboardIndex), currentKeyboardOffset);
    currentKeyboardIndex++;
    System.arraycopy (k.store, 0, keyboardStore, currentKeyboardOffset, k.store.length);
    currentKeyboardOffset += k.store.length; } }

static int keyboardStoreSize () {
  return keyboardStore.length; }

static int getKeyboardByte (int i) {
  int x = keyboardStore [i];
  if (x < 0) {
    x += 256; }
  return x; }

static class KeyboardDescHandler extends org.xml.sax.helpers.DefaultHandler {
  Keyboard currentKeyboard = null;

  public void startElement (String uri, String localName, String qName,
                            org.xml.sax.Attributes attributes) {
    if ("keyboard".equals (qName)) {
      currentKeyboard = new Keyboard ();
      keyboards.add (currentKeyboard); }
    if ("key".equals (qName)) {
      int vkey = -1;
      boolean shift = false;
      if (attributes.getValue ("ch") != null) {
        // ch="a" gives the unshifted VK 'A'; ch="A" gives the shifted VK 'A'
        // (any character above 0x5F is moved down by 0x20 and taken as unshifted)
        vkey = attributes.getValue ("ch").charAt (0);
        if (vkey > 0x5f) {
          vkey -= 0x20;
          shift = false; }
        else {
          shift = true; }}
      else if (attributes.getValue ("vk") != null) {
        vkey = Integer.decode (attributes.getValue ("vk"));
        shift = "yes".equals (attributes.getValue ("shift")); }
      if (vkey != -1) {
        if (shift) {
          currentKeyboard.vk2stringShift [vkey] = attributes.getValue ("out"); }
        else {
          currentKeyboard.vk2string [vkey] = attributes.getValue ("out"); }}} }

  public void endElement (String uri, String localName, String qName)
    throws SAXException { }

  public void characters (char[] ch, int start, int length) { } }

System.out.println ("---- loading the keyboards");
KeyboardDescHandler keyboardDescHandler = new KeyboardDescHandler ();
SAXParserFactory.newInstance ().newSAXParser ().parse (new File (args [1]), keyboardDescHandler);
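The keyboard store can be read back without the Java classes. The following C sketch assumes the layout that buildKeyboardStore and Keyboard.serialize produce:

- a 32-bit keyboard count, then one 32-bit offset per keyboard;
- for each keyboard, 512 offsets relative to the start of that keyboard: virtual keys 0 to 255 unshifted, then shifted, with 0 meaning no string;
- at each non-zero offset, a 16-bit length followed by that many UTF-16 code units.

All values are little-endian. The names kbReadInt32, kbReadInt16 and kbLookup are hypothetical. The IME's own accessor, getKeyboardEntry, is not shown in this section and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, matching setInt32 and setInt16 in the Java code. */
static uint32_t kbReadInt32 (const uint8_t *p)
{
  return (uint32_t) p [0] | ((uint32_t) p [1] << 8)
       | ((uint32_t) p [2] << 16) | ((uint32_t) p [3] << 24);
}

static uint16_t kbReadInt16 (const uint8_t *p)
{
  return (uint16_t) (p [0] | (p [1] << 8));
}

/* Hypothetical lookup: returns a pointer to the first code unit of the string
   for virtual key vk (shifted or not) in keyboard kb, with its length in
   *len, or NULL when the keyboard or the entry does not exist. */
static const uint8_t *kbLookup (const uint8_t *store, size_t size, uint32_t kb,
                                int shifted, unsigned vk, uint16_t *len)
{
  uint32_t kbOffset, entry, strOffset;
  assert (size >= 4 && vk < 256);
  if (kb >= kbReadInt32 (store)) { return NULL; }      /* number of keyboards */
  assert (4 * ((size_t) kb + 2) <= size);
  kbOffset = kbReadInt32 (store + 4 * (1 + kb));
  assert (kbOffset + 4 * 512 <= size);                 /* 512 offsets per keyboard */
  entry = (shifted ? 256 : 0) + vk;
  strOffset = kbReadInt32 (store + kbOffset + 4 * entry);
  if (strOffset == 0) { return NULL; }                 /* 0 means no string */
  assert (kbOffset + strOffset + 2 <= size);
  *len = kbReadInt16 (store + kbOffset + strOffset);   /* length prefix */
  assert (kbOffset + strOffset + 2 + 2 * (size_t) *len <= size);
  return store + kbOffset + strOffset + 2;
}
```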

2.8. The complete store

Populating the store from the root node means populating both the key store and the node store, and recording the maximum lengths of names and of candidate lists:

public void populateStore (Store store) {
  populateKeyStore (store);
  populateNodeStore (store);
  store.maxNameLength = maxNameLength ();
  store.maxCandidatesLength = maxCandidatesLength (); }

Let's build our store and report some interesting numbers:

Store store = new Store ();
{ System.out.println ("---- populating the store");
  root.populateStore (store);
  System.out.println ("" + store.keyStoreSize () + " bytes in key store");
  System.out.println ("" + store.nodeStoreSize () + " bytes in node store, ("
                      + (store.nodeStoreSize () / 4) + " units)"); }
System.out.println ("" + store.maxNameLength + " code units for names");
System.out.println ("" + store.maxCandidatesLength + " code units for candidates");

As is usual now, we verify that the original data can be recreated from our new data structure. Here is how a store restores itself:

public void restore (Writer f) throws Exception {
  restore2 (f, 0, ""); }

protected void restore2 (Writer f, int offset, String prefix) {
  boolean done = false;
  do {
    switch (getNodeUnitControl (offset) & FUNCTION_MASK) {
      case KEY_OFFSET: {
        prefix += stringAt (getNodeUnitValue (offset));
        break; }
      case USV_VALUE: {
        try {
          f.write (prefix);
          f.write (";");
          f.write (format (getNodeUnitUSV (offset)));
          f.write (";");
          f.write (Integer.toString (getNodeUnitGc (offset)));
          f.write ("\n"); }
        catch (java.io.IOException e) {
          System.err.println ("error writing to disk");
          System.exit (1); }
        break; }
      case CHILD_OFFSET: {
        restore2 (f, getNodeUnitValue (offset), prefix);
        break; }}
    if ((getNodeUnitControl (offset) & LAST_UNIT) != 0) {
      done = true; }
    offset += UNIT_SIZE; }
  while (! done); }

protected String formatGc (int flags) {
  if (flags == 0x01) {
    return "M"; }
  else if (flags == 0x02) {
    return "S"; }
  else {
    return "-"; } }

protected String format (int codePoint) {
  String s = Integer.toHexString (codePoint).toUpperCase ();
  if (s.length () < 4) {
    return "0000".substring (s.length ()) + s; }
  else {
    return s; } }

Let's see if it works:

{ String name = args[0] + ".restore.3";
  System.out.println ("---- restoring store in " + name);
  FileWriter x = new FileWriter (name);
  store.restore (x);
  x.close (); }

2.9. Serializing the store

We are now ready to represent our complete store in a single byte sequence, rather than the separate sequences we have used so far.

To prepare for future revisions of the format, we use an organization much like that of an sfnt. First comes a header, with version information, the offset of the table of contents, and the maximum name and candidates lengths. The table of contents gives the offset and size of the node store, the key store, and the keyboard store. Then come the node store, the key store, and the keyboard store themselves.

public void serialize (OutputStream f) throws java.io.IOException {
  // Header
  writeInteger (f, 1);    // major
  writeInteger (f, 0);    // minor
  writeInteger (f, 20);   // offset to table of contents
  writeInteger (f, maxNameLength);
  writeInteger (f, maxCandidatesLength);
  // TOC
  writeInteger (f, 44);
  writeInteger (f, nodeStoreSize ());
  writeInteger (f, 44 + nodeStoreSize ());
  writeInteger (f, keyStoreSize ());
  buildKeyboardStore ();
  writeInteger (f, 44 + nodeStoreSize () + keyStoreSize ());
  writeInteger (f, keyboardStoreSize ());
  // Node store
  for (int i = 0; i < nodeStoreSize (); i++) {
    f.write (nodeStore [i]); }
  // Key store
  for (int i = 0; i < keyStoreSize (); i++) {
    f.write (keyStore [i]); }
  // Keyboard store
  for (int i = 0; i < keyboardStoreSize (); i++) {
    f.write (keyboardStore [i]); } }

protected void writeInteger (OutputStream f, int i) throws java.io.IOException {
  f.write ((i ) & 0xFF);
  f.write ((i >> 8) & 0xFF);
  f.write ((i >> 16) & 0xFF);
  f.write ((i >> 24) & 0xFF); }
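As a cross-check of the offsets written by serialize, here is a C sketch of a reader that validates the header and table of contents of a database held in memory. The header takes 20 bytes and the table of contents 24, which puts the first store at offset 44. DbHeader and dbParseHeader are hypothetical names. Rejecting any major version other than 1 is an assumption about how future revisions would be handled. The IME's own loading code is not shown in this section and may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  uint32_t major, minor, tocOffset;
  uint32_t maxNameLength, maxCandidatesLength;
  uint32_t offset [3], size [3];   /* 0: node store, 1: key store, 2: keyboard store */
} DbHeader;

static uint32_t dbReadInt32 (const uint8_t *p)
{
  return (uint32_t) p [0] | ((uint32_t) p [1] << 8)
       | ((uint32_t) p [2] << 16) | ((uint32_t) p [3] << 24);
}

/* Returns 1 if the header and table of contents fit a buffer of the given
   size, filling *h; returns 0 otherwise. */
static int dbParseHeader (const uint8_t *db, size_t size, DbHeader *h)
{
  int i;
  assert (db != NULL && h != NULL);
  if (size < 20) { return 0; }
  h->major = dbReadInt32 (db);
  h->minor = dbReadInt32 (db + 4);
  h->tocOffset = dbReadInt32 (db + 8);
  h->maxNameLength = dbReadInt32 (db + 12);
  h->maxCandidatesLength = dbReadInt32 (db + 16);
  if (h->major != 1) { return 0; }                  /* unknown major version */
  if (h->tocOffset > size || size - h->tocOffset < 24) { return 0; }
  for (i = 0; i < 3; i++) {                         /* offset/size pairs */
    h->offset [i] = dbReadInt32 (db + h->tocOffset + 8 * i);
    h->size [i] = dbReadInt32 (db + h->tocOffset + 8 * i + 4);
    if (h->offset [i] > size || h->size [i] > size - h->offset [i]) { return 0; }
  }
  return 1;
}
```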

Here is an alternative serialization, in a form acceptable to the Windows RC compiler; this will allow us to put the store as a resource in the DLL that implements the IME.

public void dumpResource (FileWriter f) throws java.io.IOException {
  f.write ("0x" + Integer.toHexString (1) + "L,\n");
  f.write ("0x" + Integer.toHexString (0) + "L,\n");
  f.write ("0x" + Integer.toHexString (20) + "L,\n");
  f.write ("0x" + Integer.toHexString (maxNameLength) + "L,\n");
  f.write ("0x" + Integer.toHexString (maxCandidatesLength) + "L,\n");
  f.write ("0x" + Integer.toHexString (44) + "L,\n");
  f.write ("0x" + Integer.toHexString (nodeStoreSize ()) + "L,\n");
  f.write ("0x" + Integer.toHexString (44 + nodeStoreSize ()) + "L,\n");
  f.write ("0x" + Integer.toHexString (keyStoreSize ()) + "L,\n");
  f.write ("0x" + Integer.toHexString (44 + nodeStoreSize () + keyStoreSize ()) + "L,\n");
  f.write ("0x" + Integer.toHexString (keyboardStoreSize ()) + "L,\n");
  for (int i = 0; i < nodeStoreSize (); i += 2) {
    int b2 = getNodeByte (i+1) << 8;
    int b1 = getNodeByte (i);
    f.write ("0x" + Integer.toHexString (b2 | b1) + ",\n"); }
  for (int i = 0; i < keyStoreSize (); i += 2) {
    int b2 = getKeyByte (i+1) << 8;
    int b1 = getKeyByte (i);
    f.write ("0x" + Integer.toHexString (b2 | b1) + ",\n"); }
  for (int i = 0; i < keyboardStoreSize (); i += 2) {
    int b2 = getKeyboardByte (i+1) << 8;
    int b1 = getKeyboardByte (i);
    f.write ("0x" + Integer.toHexString (b2 | b1) + ",\n"); }
  f.write ("0x0L\n"); }
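In raw resource data, RC stores plain integers as WORD values and integers with an L suffix as DWORD values. dumpResource therefore writes the header values with L and packs each pair of store bytes into one 16-bit value, low byte first. As a result, the resource holds the same bytes as the .db file, plus the final 0x0L. This requires each store to have an even size. The following C sketch reproduces the pairing and the line format; rcWordAt and rcFormatWords are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bytes 2i and 2i+1 become one 16-bit value, low byte first, as in
   dumpResource (b2 = byte (i+1) << 8, b1 = byte (i)). */
static uint16_t rcWordAt (const uint8_t *bytes, size_t i)
{
  return (uint16_t) (bytes [2 * i] | (bytes [2 * i + 1] << 8));
}

/* Formats n bytes (n even) as RC lines "0x...,\n", in lowercase hexadecimal
   without leading zeros like Integer.toHexString. Stops before a line that
   does not fit; returns the number of characters written (out stays
   0-terminated when outSize > 0). */
static size_t rcFormatWords (const uint8_t *bytes, size_t n,
                             char *out, size_t outSize)
{
  size_t i, used = 0;
  assert (n % 2 == 0);
  if (outSize > 0) { out [0] = '\0'; }
  for (i = 0; i < n / 2; i++) {
    int w = snprintf (out + used, outSize - used, "0x%x,\n",
                      (unsigned) rcWordAt (bytes, i));
    if (w < 0 || (size_t) w >= outSize - used) {
      if (outSize > 0) { out [used] = '\0'; }     /* drop the partial line */
      break;
    }
    used += (size_t) w;
  }
  return used;
}
```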

Let's do it:

{ String name = args[0] + ".db";
  System.out.println ("---- creating the database in " + name);
  FileOutputStream x = new FileOutputStream (name);
  store.serialize (x);
  x.close (); }
{ String name = args[0] + ".rc";
  System.out.println ("---- creating the database (RC form) in " + name);
  FileWriter x = new FileWriter (name);
  store.dumpResource (x);
  x.close (); }

2.10. Finding a character's name

Let's start with a method that builds the name of a character, given an offset to its node.

public String characterName (int offset, String suffix) {
  int next = -1;
  boolean done = false;
  do {
    switch (getNodeUnitControl (offset) & FUNCTION_MASK) {
      case PARENT_OFFSET: {
        next = getNodeUnitValue (offset);
        break; }
      case KEY_OFFSET: {
        suffix = stringAt (getNodeUnitValue (offset)) + suffix;
        break; }}
    if ((getNodeUnitControl (offset) & LAST_UNIT) != 0) {
      done = true; }
    offset += UNIT_SIZE; }
  while (! done);
  if (next != -1) {
    return characterName (next, suffix); }
  else {
    return suffix; } }

As we described earlier, searching for the name of a character is essentially a binary search on the node store; this relies on the USV units appearing in the node store in increasing order of scalar value:

public String findCharacterName (int usv) {
  int first = 0;
  int last = nodeStore.length - 1;
  while (first <= last) {
    int mid = ((first / 4 + last / 4) / 2) * 4;
    while (   mid >= first
           && (getNodeUnitControl (mid) & FUNCTION_MASK) != USV_VALUE) {
      mid -= UNIT_SIZE; }
    if (mid < first) {
      mid = ((first / 4 + last / 4) / 2) * 4;
      while (   mid <= last
             && (getNodeUnitControl (mid) & FUNCTION_MASK) != USV_VALUE) {
        mid += UNIT_SIZE; }
      if (mid > last) {
        return null; }}
    int thisUsv = getNodeUnitUSV (mid);
    if (thisUsv < usv) {
      first = mid + UNIT_SIZE; }
    else if (thisUsv > usv) {
      last = mid - UNIT_SIZE; }
    else {
      while ((getNodeUnitControl (mid) & FIRST_UNIT) == 0) {
        mid -= UNIT_SIZE; }
      return characterName (mid, ""); }}
  return null; }

Let's see if this really works:

System.out.println ("name of U+20AC is '" + store.findCharacterName (0x20ac)
                    + "', should be 'EURO SIGN'");
System.out.println ("name of U+effff is '" + store.findCharacterName (0xeffff)
                    + "', should be 'null'");
System.out.println ("name of U+13FF is '" + store.findCharacterName (0x13ff)
                    + "', should be 'null'");

3. The IME core engine

3.1. Cross-platform layer

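We intend the core of the IME to be cross-platform, so the types it manipulates are named through macros: UTF16CodeUnit for a UTF-16 code unit, and USV for a Unicode scalar value.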

#define UTF16CodeUnit WCHAR
#define USV int
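On Windows, WCHAR is a 16-bit type, so UTF16CodeUnit is indeed a UTF-16 code unit there. The core also uses BOOL, TRUE and FALSE from the Windows headers, and a PRIVATE storage class. Below is a minimal sketch of how a non-Windows build might supply these. Defining PRIVATE as static is an assumption. isScalarValue is a hypothetical helper, added only to exercise the definitions.

```c
#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>              /* WCHAR, BOOL, TRUE, FALSE */
#else
typedef uint16_t WCHAR;           /* assumption: a 16-bit code unit, as on Windows */
typedef int BOOL;
#define TRUE 1
#define FALSE 0
#endif

#ifndef PRIVATE
#define PRIVATE static            /* assumption: core functions are file-local */
#endif

#define UTF16CodeUnit WCHAR
#define USV int

/* Hypothetical helper: a USV lies in the codespace and is not a surrogate. */
PRIVATE BOOL isScalarValue (USV usv)
{
  return usv >= 0 && usv <= 0x10FFFF && (usv < 0xD800 || usv > 0xDFFF);
}
```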

3.2. Formatting a USV

The function formatUSV fills the buffer pointed to by a with the four- to six-digit uppercase hexadecimal representation of usv (no terminating 0 is written) and returns the number of characters it produced:

PRIVATE int formatUSV (UTF16CodeUnit *a, int usv) {
  int n;
  if (usv <= 0xffff) {
    n = 4; }
  else if (usv <= 0xfffff) {
    n = 5; }
  else {
    n = 6; }
  { int i;
    for (i = n; i > 0; i--) {
      int nibble = (usv >> ((i-1)*4)) & 0xf;
      if (nibble < 10) {
        *(a++) = '0' + nibble; }
      else {
        *(a++) = 'A' + (nibble - 10); }}}
  return n; }

The function formatUSVuplus fills the buffer pointed to by a with the U+ representation of usv and returns the number of characters it produced:

PRIVATE int formatUSVuplus (UTF16CodeUnit *a, int usv) {
  int n = 0;
  a [n++] = 'U';
  a [n++] = '+';
  n += formatUSV (a + n, usv);
  return n; }

Here are the declarations for these functions:

PRIVATE int formatUSV (UTF16CodeUnit *buffer, int usv);
PRIVATE int formatUSVuplus (UTF16CodeUnit *buffer, int usv);
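A self-contained copy of the two functions makes the digit-count rule easy to check. It uses a five-digit bound of usv <= 0xFFFFF. UTF16CodeUnit and PRIVATE get stand-in definitions (uint16_t and static) for a standalone build, which is an assumption. sameAscii is a hypothetical test helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UTF16CodeUnit;   /* stand-in for the WCHAR-based macro */
#define PRIVATE static

PRIVATE int formatUSV (UTF16CodeUnit *a, int usv)
{
  int n = usv <= 0xFFFF ? 4 : usv <= 0xFFFFF ? 5 : 6;   /* number of digits */
  int i;
  for (i = n; i > 0; i--) {
    int nibble = (usv >> ((i - 1) * 4)) & 0xF;
    *(a++) = (UTF16CodeUnit) (nibble < 10 ? '0' + nibble : 'A' + (nibble - 10));
  }
  return n;
}

PRIVATE int formatUSVuplus (UTF16CodeUnit *a, int usv)
{
  int n = 0;
  a [n++] = 'U';
  a [n++] = '+';
  n += formatUSV (a + n, usv);
  return n;
}

/* Test helper: do the n code units in a spell exactly the ASCII string s? */
PRIVATE int sameAscii (const UTF16CodeUnit *a, int n, const char *s)
{
  int i;
  for (i = 0; i < n; i++) {
    if (s [i] == 0 || a [i] != (UTF16CodeUnit) s [i]) { return 0; }
  }
  return s [n] == 0;
}
```

For example, formatUSVuplus produces U+0041 for 0x41, U+FFFFF for 0xFFFFF and U+10FFFF for 0x10FFFF.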

3.3. Basic access to the store

Let's start with a few functions to access the key store and the node store.

getKeyChar returns the UTF-16 code unit stored, low byte first, at the given byte offset in the key store.

PRIVATE UTF16CodeUnit getKeyChar (int offset) { return keyStore [offset + 1] << 8 | keyStore [offset]; }

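Remember that each node unit is a four-byte structure, with the control part of the node unit in byte 3 and the value part in bytes 0 through 2 (low byte first). getNodeUnitControl returns the control part of the node unit at offset, and getNodeUnitValue returns the value part. In a USV unit, the value part is further split: getNodeUnitUSV returns the scalar value in bits 0 to 20, and getNodeUnitGc returns the 3-bit gc field in bits 21 to 23.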

PRIVATE int getNodeUnitControl (int offset) {
  return nodeStore [offset + 3]; }

PRIVATE int getNodeUnitValue (int offset) {
  return (nodeStore [offset]) | (nodeStore [offset+1] << 8) | (nodeStore [offset+2] << 16); }

PRIVATE int getNodeUnitUSV (int offset) {
  return getNodeUnitValue (offset) & 0x1FFFFF; }

PRIVATE int getNodeUnitGc (int offset) {
  return (getNodeUnitValue (offset) >> 21) & 0x7; }
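The Java builder does the packing for this layout. The following C sketch packs a USV unit with a hypothetical helper, packUsvUnit, and reads it back with the same shifts and masks as the accessors above. Like those accessors, it assumes the store is an array of unsigned bytes. With a signed char array, the shifts would see sign-extended values.

```c
#include <assert.h>
#include <stdint.h>

#define USV_VALUE  0x02
#define FIRST_UNIT 0x10
#define LAST_UNIT  0x20

/* Hypothetical packer: value in bytes 0-2 (scalar value in bits 0-20,
   gc field in bits 21-23), control part in byte 3. */
static void packUsvUnit (uint8_t *unit, int usv, int gc, int control)
{
  uint32_t value = ((uint32_t) usv & 0x1FFFFF) | (((uint32_t) gc & 0x7) << 21);
  unit [0] = (uint8_t) (value & 0xFF);
  unit [1] = (uint8_t) ((value >> 8) & 0xFF);
  unit [2] = (uint8_t) ((value >> 16) & 0xFF);
  unit [3] = (uint8_t) control;
}

/* Same logic as getNodeUnitControl/Value/USV/Gc, on an explicit buffer. */
static int unitControl (const uint8_t *store, int offset)
{
  return store [offset + 3];
}

static int unitValue (const uint8_t *store, int offset)
{
  return store [offset] | (store [offset + 1] << 8) | (store [offset + 2] << 16);
}

static int unitUSV (const uint8_t *store, int offset)
{
  return unitValue (store, offset) & 0x1FFFFF;
}

static int unitGc (const uint8_t *store, int offset)
{
  return (unitValue (store, offset) >> 21) & 0x7;
}
```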

The next batch of functions works at the scale of a node rather than a node unit. Here are some definitions to interpret the control part:

#define FUNCTION_MASK 0x0F
#define PARENT_OFFSET 0x00
#define KEY_OFFSET    0x01
#define USV_VALUE     0x02
#define CHILD_OFFSET  0x03
#define FIRST_UNIT    0x10
#define LAST_UNIT     0x20
#define UNIT_SIZE     4

In each function, nodeOffset must point to the first unit of a node. If there is no key offset unit (which should happen for the root node only), getNodeKeyOffset returns -1. If there is no USV unit, getNodeUSV and getNodeGc return -1. nodeHasChildren tells whether the node has at least one child offset unit.

PRIVATE int getNodeKeyOffset (int nodeOffset) {
  int control = getNodeUnitControl (nodeOffset);
  while (   (control & FUNCTION_MASK) != KEY_OFFSET
         && (control & LAST_UNIT) == 0) {
    nodeOffset += UNIT_SIZE;
    control = getNodeUnitControl (nodeOffset); }
  if ((control & FUNCTION_MASK) == KEY_OFFSET) {
    return getNodeUnitValue (nodeOffset); }
  else {
    return -1; } }

PRIVATE USV getNodeUSV (int nodeOffset) {
  int control = getNodeUnitControl (nodeOffset);
  while (   (control & FUNCTION_MASK) != USV_VALUE
         && (control & LAST_UNIT) == 0) {
    nodeOffset += UNIT_SIZE;
    control = getNodeUnitControl (nodeOffset); }
  if ((control & FUNCTION_MASK) == USV_VALUE) {
    return getNodeUnitUSV (nodeOffset); }
  else {
    return -1; } }

PRIVATE int getNodeGc (int nodeOffset) {
  int control = getNodeUnitControl (nodeOffset);
  while (   (control & FUNCTION_MASK) != USV_VALUE
         && (control & LAST_UNIT) == 0) {
    nodeOffset += UNIT_SIZE;
    control = getNodeUnitControl (nodeOffset); }
  if ((control & FUNCTION_MASK) == USV_VALUE) {
    return getNodeUnitGc (nodeOffset); }
  else {
    return -1; } }

PRIVATE BOOL nodeHasChildren (int nodeOffset) {
  int control = getNodeUnitControl (nodeOffset);
  while (   (control & FUNCTION_MASK) != CHILD_OFFSET
         && (control & LAST_UNIT) == 0) {
    nodeOffset += UNIT_SIZE;
    control = getNodeUnitControl (nodeOffset); }
  return (control & FUNCTION_MASK) == CHILD_OFFSET; }
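The node-level functions all share one loop. It scans the units of the node until it finds one with the wanted function or reaches the LAST_UNIT flag. The sketch below factors that loop into a hypothetical findUnit, working on a local array rather than the global nodeStore. It checks findUnit on an invented node (a parent offset, a key offset and a USV unit) followed by the first unit of another node, to show that the scan never runs past the end of the node.

```c
#include <assert.h>
#include <stdint.h>

#define FUNCTION_MASK 0x0F
#define PARENT_OFFSET 0x00
#define KEY_OFFSET    0x01
#define USV_VALUE     0x02
#define CHILD_OFFSET  0x03
#define FIRST_UNIT    0x10
#define LAST_UNIT     0x20
#define UNIT_SIZE     4

/* Offset of the first unit with the given function in the node starting
   at nodeOffset, or -1 if the node has no such unit. */
static int findUnit (const uint8_t *store, int nodeOffset, int function)
{
  int control = store [nodeOffset + 3];
  while ((control & FUNCTION_MASK) != function && (control & LAST_UNIT) == 0) {
    nodeOffset += UNIT_SIZE;
    control = store [nodeOffset + 3];
  }
  return (control & FUNCTION_MASK) == function ? nodeOffset : -1;
}

/* Hypothetical helper: writes a unit with a 24-bit value. */
static void putUnit (uint8_t *store, int offset, int value, int control)
{
  store [offset] = (uint8_t) (value & 0xFF);
  store [offset + 1] = (uint8_t) ((value >> 8) & 0xFF);
  store [offset + 2] = (uint8_t) ((value >> 16) & 0xFF);
  store [offset + 3] = (uint8_t) control;
}
```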

Here are the declarations for those functions:

PRIVATE UTF16CodeUnit getKeyChar (int keyOffset);
PRIVATE int getNodeUnitControl (int nodeOffset);
PRIVATE int getNodeUnitValue (int nodeOffset);
PRIVATE int getNodeUnitUSV (int nodeOffset);
PRIVATE int getNodeUnitGc (int nodeOffset);
PRIVATE int getNodeKeyOffset (int nodeOffset);
PRIVATE USV getNodeUSV (int nodeOffset);
PRIVATE int getNodeGc (int nodeOffset);
PRIVATE BOOL nodeHasChildren (int nodeOffset);

3.4. Finding the name of a character

This function assembles the name of a character given the offset of its node. buffer is filled with the character name, terminated by a 0 code unit, and the return value points to that terminating code unit. The name of the parent (if any) is produced first, by a recursive call, and the node's own key is appended after it. Making two passes over the node units keeps this independent of the order in which the parent and key units appear.

PRIVATE UTF16CodeUnit *characterName (int offset, UTF16CodeUnit* buffer) {
  int control;
  int o;
  // first pass: the name of the parent
  o = offset;
  do {
    control = getNodeUnitControl (o);
    if ((control & FUNCTION_MASK) == PARENT_OFFSET) {
      buffer = characterName (getNodeUnitValue (o), buffer); }
    o += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  // second pass: append the key of this node
  o = offset;
  do {
    control = getNodeUnitControl (o);
    if ((control & FUNCTION_MASK) == KEY_OFFSET) {
      int keyOffset = getNodeUnitValue (o);
      while (getKeyChar (keyOffset) != 0) {
        *(buffer++) = getKeyChar (keyOffset);
        keyOffset += 2; }
      buffer [0] = 0; }
    o += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return buffer; }

This function is given a scalar value, finds the node for that scalar value, assembles the name in buffer, and stores the node's gc field in *gc. It returns TRUE if and only if there is a name for the scalar value.

PRIVATE BOOL findCharacterName (int usv, UTF16CodeUnit* buffer, int* gc) {
  int first = 0;
  int last = nodeStoreLength - 1;
  int mid;
  int thisUsv;
  while (first <= last) {
    mid = ((first / 4 + last / 4) / 2) * 4;
    while (   mid >= first
           && (getNodeUnitControl (mid) & FUNCTION_MASK) != USV_VALUE) {
      mid -= UNIT_SIZE; }
    if (mid < first) {
      mid = ((first / 4 + last / 4) / 2) * 4;
      while (   mid <= last
             && (getNodeUnitControl (mid) & FUNCTION_MASK) != USV_VALUE) {
        mid += UNIT_SIZE; }
      if (mid > last) {
        return FALSE; }}
    thisUsv = getNodeUnitUSV (mid);
    debugMessage (DBG_CORE, L" usv = 0x%x", thisUsv);
    if (thisUsv < usv) {
      first = mid + UNIT_SIZE; }
    else if (thisUsv > usv) {
      last = mid - UNIT_SIZE; }
    else {
      while ((getNodeUnitControl (mid) & FIRST_UNIT) == 0) {
        mid -= UNIT_SIZE; }
      characterName (mid, buffer);
      *gc = getNodeGc (mid);
      return TRUE; }}
  return FALSE; }
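Because nodes span a variable number of units, the midpoint of the search range may not land on a USV unit. The search therefore probes backward first, then forward. The sketch below isolates that search on a reduced store in which each unit is just a (function, scalar value) pair and positions are unit indices. The Unit type, findUsvUnit and the sample data are hypothetical. Like findCharacterName, the sketch assumes that the USV units appear in increasing scalar value order.

```c
#include <assert.h>

#define USV_VALUE 0x02

typedef struct { int function; int usv; } Unit;   /* hypothetical reduced unit */

/* Returns the index of the USV unit holding usv, or -1. */
static int findUsvUnit (const Unit *units, int count, int usv)
{
  int first = 0, last = count - 1;
  while (first <= last) {
    int start = (first + last) / 2, mid = start;
    while (mid >= first && units [mid].function != USV_VALUE) { mid--; }   /* probe back */
    if (mid < first) {
      mid = start;
      while (mid <= last && units [mid].function != USV_VALUE) { mid++; }  /* then forward */
      if (mid > last) { return -1; }          /* no USV unit left in the range */
    }
    if (units [mid].usv < usv) { first = mid + 1; }
    else if (units [mid].usv > usv) { last = mid - 1; }
    else { return mid; }
  }
  return -1;
}
```

In the test data below, function 1 stands for a key offset unit and 3 for a child offset unit, so that some probes have to skip over units without a scalar value.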

Here are the declarations for those functions:

PRIVATE UTF16CodeUnit* characterName (int offset, UTF16CodeUnit* buffer);
PRIVATE BOOL findCharacterName (int usv, UTF16CodeUnit* buffer, int* gc);

3.5. Completing a character name

Let's start with a procedure that walks the node store given a string. There are two basic outcomes. In the first, the string is not the prefix of any character name; the return value then points to the first character of the string that makes this happen, and the values of nodeOffset and keyOffset are undefined. In the second, the string is the prefix of the names of one or more characters; the return value is 0, nodeOffset is the node the string leads to, and keyOffset points to the rest of that node's key that remains to be matched.

PRIVATE UTF16CodeUnit* followString (UTF16CodeUnit* string, int len,
                                     int *nodeOffset, int *keyOffset) {
  *nodeOffset = 0; // the root node
  *keyOffset = 0;
  return followString1 (string, len, nodeOffset, keyOffset); }

PRIVATE UTF16CodeUnit* followString1 (UTF16CodeUnit* string, int len,
                                      int *nodeOffset, int *keyOffset) {
  if (len == 0) {
    return 0; }
  else if (getKeyChar (*keyOffset) == 0) {
    int control;
    int nodeOffsetChild = *nodeOffset;
    // look for a correct child
    do {
      control = getNodeUnitControl (nodeOffsetChild);
      if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
        int childNodeOffset = getNodeUnitValue (nodeOffsetChild);
        int childKeyOffset = getNodeKeyOffset (childNodeOffset);
        if (getKeyChar (childKeyOffset) == *string) {
          *nodeOffset = childNodeOffset;
          *keyOffset = childKeyOffset + 2;
          return followString1 (string+1, len-1, nodeOffset, keyOffset); }}
      nodeOffsetChild += UNIT_SIZE; }
    while ((control & LAST_UNIT) == 0);
    // no children match
    return string; }
  else if (getKeyChar (*keyOffset) != *string) {
    return string; }
  else {
    (*keyOffset) += 2;
    return followString1 (string+1, len-1, nodeOffset, keyOffset); } }

PRIVATE UTF16CodeUnit* followString (UTF16CodeUnit* string, int len, int *nodeOffset, int *keyOffset);
PRIVATE UTF16CodeUnit* followString1 (UTF16CodeUnit* string, int len, int *nodeOffset, int *keyOffset);
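Here is a C sketch that runs the walk on concrete data: tiny key and node stores for two names, EURO-CURRENCY SIGN and EURO SIGN, which share the node for EURO. The walk is iterative rather than recursive. The helpers (nodeKey, childStartingWith, addKey, putUnit, buildExample) are hypothetical. The sketch assumes that the key store begins with an empty key, which followString's initial keyOffset of 0 appears to require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FUNCTION_MASK 0x0F
#define PARENT_OFFSET 0x00
#define KEY_OFFSET    0x01
#define USV_VALUE     0x02
#define CHILD_OFFSET  0x03
#define FIRST_UNIT    0x10
#define LAST_UNIT     0x20
#define UNIT_SIZE     4
static uint8_t keyStore [64], nodeStore [64];   /* tiny stores for the example */
static int keyUsed;                             /* bytes used in keyStore */

static int getKeyChar (int o) { return keyStore [o] | (keyStore [o + 1] << 8); }
static int unitControl (int o) { return nodeStore [o + 3]; }
static int unitValue (int o) { return nodeStore [o] | (nodeStore [o + 1] << 8) | (nodeStore [o + 2] << 16); }

/* Key offset of the node at n, or -1 if it has none (the root). */
static int nodeKey (int n)
{
  for (;; n += UNIT_SIZE) {
    if ((unitControl (n) & FUNCTION_MASK) == KEY_OFFSET) { return unitValue (n); }
    if (unitControl (n) & LAST_UNIT) { return -1; }
  }
}

/* Child of node n whose key starts with ch (its key offset goes to *key), or -1. */
static int childStartingWith (int n, int ch, int *key)
{
  for (;; n += UNIT_SIZE) {
    if ((unitControl (n) & FUNCTION_MASK) == CHILD_OFFSET) {
      int child = unitValue (n), k = nodeKey (child);
      if (k >= 0 && getKeyChar (k) == ch) { *key = k; return child; }
    }
    if (unitControl (n) & LAST_UNIT) { return -1; }
  }
}

/* Iterative equivalent of followString/followString1: returns NULL when the
   len code units are a prefix of some name (*nodeOffset is the node reached,
   *keyOffset the unmatched rest of its key), else the first mismatching unit. */
static const uint16_t *followString (const uint16_t *string, int len,
                                     int *nodeOffset, int *keyOffset)
{
  *nodeOffset = 0;                            /* the root node */
  *keyOffset = 0;                             /* the empty key at offset 0 */
  for (; len > 0; string++, len--) {
    if (getKeyChar (*keyOffset) != 0) {       /* still inside the current key */
      if (getKeyChar (*keyOffset) != *string) { return string; }
      *keyOffset += 2;
    } else {                                  /* key used up: follow a child */
      int k;
      int child = childStartingWith (*nodeOffset, *string, &k);
      if (child < 0) { return string; }       /* no child matches */
      *nodeOffset = child;
      *keyOffset = k + 2;
    }
  }
  return NULL;
}

static int addKey (const char *s)             /* appends a 0-terminated key */
{
  int o = keyUsed;
  do { keyStore [keyUsed++] = (uint8_t) *s; keyStore [keyUsed++] = 0; } while (*s++ != 0);
  return o;
}

static void putUnit (int o, int value, int control)
{
  nodeStore [o] = (uint8_t) value;             nodeStore [o + 1] = (uint8_t) (value >> 8);
  nodeStore [o + 2] = (uint8_t) (value >> 16); nodeStore [o + 3] = (uint8_t) control;
}

/* root -> "EURO" -> { "-CURRENCY SIGN" (U+20A0), " SIGN" (U+20AC) } */
static void buildExample (void)
{
  int euro, currency, sign;
  keyStore [0] = keyStore [1] = 0;            /* offset 0: the empty key of the root */
  keyUsed = 2;
  euro = addKey ("EURO");
  currency = addKey ("-CURRENCY SIGN");
  sign = addKey (" SIGN");
  putUnit (0, 4, CHILD_OFFSET | FIRST_UNIT | LAST_UNIT);     /* root */
  putUnit (4, 0, PARENT_OFFSET | FIRST_UNIT);                /* "EURO" */
  putUnit (8, euro, KEY_OFFSET);
  putUnit (12, 20, CHILD_OFFSET);
  putUnit (16, 32, CHILD_OFFSET | LAST_UNIT);
  putUnit (20, 4, PARENT_OFFSET | FIRST_UNIT);               /* "-CURRENCY SIGN" */
  putUnit (24, currency, KEY_OFFSET);
  putUnit (28, 0x20A0, USV_VALUE | LAST_UNIT);
  putUnit (32, 4, PARENT_OFFSET | FIRST_UNIT);               /* " SIGN" */
  putUnit (36, sign, KEY_OFFSET);
  putUnit (40, 0x20AC, USV_VALUE | LAST_UNIT);
}
```

After buildExample, following "EU" stops inside the key of the EURO node with "RO" still to match, while following "EUX" fails at the X.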

3.6. Building the candidate list

After the user enters the prefix of one or more names, they may ask for all the possible completions of that prefix. We want to present those completions in a somewhat intelligent way: there is no point in presenting the full list if it is too long. Consequently, we have three different levels of completion. For each level, we have one function that counts the number of completions, and one that actually builds them.

The first level is to enumerate all the completions in their entirety:

PRIVATE int countAllCompletions (int nodeOffset) {
  int count = 0;
  int control;
  do {
    control = getNodeUnitControl (nodeOffset);
    if ((control & FUNCTION_MASK) == USV_VALUE) {
      count++; }
    if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
      int childNodeOffset = getNodeUnitValue (nodeOffset);
      count += countAllCompletions (childNodeOffset); }
    nodeOffset += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return count; }

PRIVATE UTF16CodeUnit* buildAllCompletions (int nodeOffset, UTF16CodeUnit* names) {
  //TODO UTF16CodeUnit *prefix = (UTF16CodeUnit *) malloc (maxNameLength * 2);
  UTF16CodeUnit prefix [100];
  return buildAllCompletions1 (nodeOffset, names, prefix, 0); }

PRIVATE UTF16CodeUnit* buildAllCompletions1 (int nodeOffset, UTF16CodeUnit* names,
                                             UTF16CodeUnit* prefix, int prefixLength) {
  int control;
  do {
    control = getNodeUnitControl (nodeOffset);
    if ((control & FUNCTION_MASK) == USV_VALUE) {
      { int i;
        for (i = 0; i < prefixLength; i++) {
          *(names++) = prefix [i]; }}
      *(names++) = ' ';
      names += formatUSVuplus (names, getNodeUnitUSV (nodeOffset));
      *(names++) = 0; }
    if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
      int childNodeOffset = getNodeUnitValue (nodeOffset);
      int o = prefixLength;
      { int childKeyOffset = getNodeKeyOffset (childNodeOffset);
        while (getKeyChar (childKeyOffset) != 0) {
          prefix [o++] = getKeyChar (childKeyOffset);
          childKeyOffset += 2; }}
      names = buildAllCompletions1 (childNodeOffset, names, prefix, o); }
    nodeOffset += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return names; }

If the complete list is too long, we fall back to completions up to a word boundary:

PRIVATE int countWordCompletions (int nodeOffset) {
  int count = 0;
  int control;
  do {
    control = getNodeUnitControl (nodeOffset);
    if ((control & FUNCTION_MASK) == USV_VALUE) {
      count++; }
    if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
      int childNodeOffset = getNodeUnitValue (nodeOffset);
      int childKeyOffset = getNodeKeyOffset (childNodeOffset);
      if (wcschr ((WCHAR *)(keyStore + childKeyOffset), ' ') == 0) {
        count += countWordCompletions (childNodeOffset); }
      else {
        count ++; }}
    nodeOffset += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return count; }

PRIVATE UTF16CodeUnit* buildWordCompletions (int nodeOffset, UTF16CodeUnit* names) {
  //TODO UTF16CodeUnit *prefix = (UTF16CodeUnit *) malloc (maxNameLength * 2);
  UTF16CodeUnit prefix [100];
  return buildWordCompletions1 (nodeOffset, names, prefix, 0); }

PRIVATE UTF16CodeUnit* buildWordCompletions1 (int nodeOffset, UTF16CodeUnit* names,
                                              UTF16CodeUnit* prefix, int prefixLength) {
  int control;
  int i;
  do {
    control = getNodeUnitControl (nodeOffset);
    if ((control & FUNCTION_MASK) == USV_VALUE) {
      for (i = 0; i < prefixLength; i++) {
        *(names++) = prefix [i]; }
      if (prefixLength > 0) {
        *(names++) = ' '; }
      names += formatUSVuplus (names, getNodeUnitUSV (nodeOffset));
      *(names++) = 0; }
    if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
      int childNodeOffset = getNodeUnitValue (nodeOffset);
      int o = prefixLength;
      int lastSpace = -1;
      { int childKeyOffset = getNodeKeyOffset (childNodeOffset);
        while (getKeyChar (childKeyOffset) != 0) {
          if (getKeyChar (childKeyOffset) == ' ') {
            lastSpace = o; }
          prefix [o++] = getKeyChar (childKeyOffset);
          childKeyOffset += 2; }}
      if (! nodeHasChildren (childNodeOffset)) {
        names = buildWordCompletions1 (childNodeOffset, names, prefix, o); }
      else {
        if (lastSpace == -1) {
          names = buildWordCompletions1 (childNodeOffset, names, prefix, o); }
        else {
          for (i = 0; i <= lastSpace; i++) {
            *(names++) = prefix [i]; }
          *(names++) = '.';
          *(names++) = '.';
          *(names++) = '.';
          *(names++) = 0; }}}
    nodeOffset += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return names; }
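Sometimes a child's key contains a space and the child has children of its own. In that case buildWordCompletions1 stops there, and emits the prefix up to and including the last space of that key, followed by "...". The sketch below isolates that step on plain char strings; collapseAtLastSpace is a hypothetical name. As in the original, only spaces inside the child's key (the last keyLength characters of the prefix) are considered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* prefix is the completion built so far; its last keyLength characters come
   from the current child's key. Writes "prefix up to the last space of that
   key" + "..." into out and returns 1; returns 0 when the key has no space
   (the original then keeps descending) or when out is too small. */
static int collapseAtLastSpace (const char *prefix, size_t keyLength,
                                char *out, size_t outSize)
{
  size_t len = strlen (prefix), i, lastSpace = (size_t) -1;
  assert (keyLength <= len);
  for (i = len - keyLength; i < len; i++) {
    if (prefix [i] == ' ') { lastSpace = i; }
  }
  if (lastSpace == (size_t) -1 || lastSpace + 5 > outSize) { return 0; }
  memcpy (out, prefix, lastSpace + 1);        /* keeps the space itself */
  memcpy (out + lastSpace + 1, "...", 4);     /* includes the terminating 0 */
  return 1;
}
```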

Our final level falls back to completions of the next character only:

PRIVATE int countLetterCompletions (int nodeOffset) {
  int count = 0;
  int control;
  do {
    control = getNodeUnitControl (nodeOffset);
    if ((control & FUNCTION_MASK) == USV_VALUE) {
      count++; }
    if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
      count ++; }
    nodeOffset += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return count; }

PRIVATE UTF16CodeUnit* buildLetterCompletions (int nodeOffset, UTF16CodeUnit* names) {
  int control;
  do {
    control = getNodeUnitControl (nodeOffset);
    if ((control & FUNCTION_MASK) == USV_VALUE) {
      names += formatUSVuplus (names, getNodeUnitUSV (nodeOffset));
      *(names++) = 0; }
    if ((control & FUNCTION_MASK) == CHILD_OFFSET) {
      int childNodeOffset = getNodeUnitValue (nodeOffset);
      int childKeyOffset = getNodeKeyOffset (childNodeOffset);
      *(names++) = getKeyChar (childKeyOffset);
      *(names++) = '.';
      *(names++) = '.';
      *(names++) = '.';
      *(names++) = 0; }
    nodeOffset += UNIT_SIZE; }
  while ((control & LAST_UNIT) == 0);
  return names; }

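With that in place, here is the driver that figures out which level we should use, and invokes it. It uses the first level that yields fewer than 15 candidates, and falls back to the letter level otherwise.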

PRIVATE void buildCandidates (int nodeOffset, UTF16CodeUnit *names, int *count, int* length) {
  int c;
  WCHAR *endOfCandidates;
  names [0] = 0;
  c = countAllCompletions (nodeOffset);
  debugMessage (DBG_CORE, L" %d completions from there", c);
  if (c < 15) {
    endOfCandidates = buildAllCompletions (nodeOffset, names); }
  else {
    c = countWordCompletions (nodeOffset);
    debugMessage (DBG_CORE, L" %d word completions from there", c);
    if (c < 15) {
      endOfCandidates = buildWordCompletions (nodeOffset, names); }
    else {
      c = countLetterCompletions (nodeOffset);
      debugMessage (DBG_CORE, L" %d immediate completions from there", c);
      endOfCandidates = buildLetterCompletions (nodeOffset, names); }}
  *count = c;
  *length = endOfCandidates - names;
  { WCHAR *n = names;
    int i;
    for (i = 0; i < c; i++) {
      debugMessage (DBG_CORE, L" ..%s", n);
      while (*n != 0) {
        n++; }
      n++; }} }
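The policy in buildCandidates comes down to this: take the first level with fewer than 15 candidates, and otherwise settle for the letter level whatever its size. Here is a sketch of just that decision, with hypothetical names and the counts passed in rather than computed from the node store:

```c
#include <assert.h>

enum CompletionLevel { ALL_COMPLETIONS, WORD_COMPLETIONS, LETTER_COMPLETIONS };

#define MAX_CANDIDATES 15   /* the threshold hard-coded in buildCandidates */

/* all and word stand for the results of countAllCompletions and
   countWordCompletions; the word count only matters when all is too big. */
static enum CompletionLevel chooseLevel (int all, int word)
{
  if (all < MAX_CANDIDATES) { return ALL_COMPLETIONS; }
  if (word < MAX_CANDIDATES) { return WORD_COMPLETIONS; }
  return LETTER_COMPLETIONS;
}
```

buildCandidates computes the word count only when the full list is too long, which avoids a second walk of the subtree whenever the full list is short enough.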

Here are the declarations for all those functions:

PRIVATE void buildCandidates (int nodeOffset, UTF16CodeUnit *names, int *count, int *length);
PRIVATE int countAllCompletions (int nodeOffset);
PRIVATE int countWordCompletions (int nodeOffset);
PRIVATE int countLetterCompletions (int nodeOffset);
PRIVATE UTF16CodeUnit *buildAllCompletions (int nodeOffset, UTF16CodeUnit *names);
PRIVATE UTF16CodeUnit *buildAllCompletions1 (int nodeOffset, UTF16CodeUnit *names, UTF16CodeUnit *prefix, int prefixLength);
PRIVATE UTF16CodeUnit *buildWordCompletions (int nodeOffset, UTF16CodeUnit *names);
PRIVATE UTF16CodeUnit *buildWordCompletions1 (int nodeOffset, UTF16CodeUnit *names, UTF16CodeUnit *prefix, int prefixLength);
PRIVATE UTF16CodeUnit *buildLetterCompletions (int nodeOffset, UTF16CodeUnit *names);

3.7. Retrieving the scalar value corresponding to the input

These two functions interpret the composition string as a fully formed character specification, either as a hexadecimal number or as a character name, and return the corresponding scalar value. If the composition string cannot be interpreted properly, the return value is -1.

PRIVATE int getUSVFromCompositionAsUSV (UTF16CodeUnit* name, int len) {
  int usv = 0;
  int i;
  for (i = 0; i < len; i++) {
    UTF16CodeUnit ch = name [i];
    if ('0' <= ch && ch <= '9') {
      usv = usv * 16 + (ch - '0'); }
    else if ('A' <= ch && ch <= 'F') {
      usv = usv * 16 + (ch - 'A' + 10); }
    else {
      return -1; }
    if (usv > 0x10FFFF) { // checked at each digit, so that usv cannot overflow
      return -1; }}
  return usv; }

PRIVATE int getUSVFromCompositionAsName (UTF16CodeUnit *name, int len) {
  int nodeOffset;
  int keyOffset;
  UTF16CodeUnit *x;
  x = followString (name, len, &nodeOffset, &keyOffset);
  if (x != 0) {
    return -1; }   // not the prefix of any name
  else if (getKeyChar (keyOffset) != 0) {
    return -1; }   // only a proper prefix of a name
  else {
    return getNodeUSV (nodeOffset); } }

PRIVATE int getUSVFromCompositionAsUSV (UTF16CodeUnit *name, int len);
PRIVATE int getUSVFromCompositionAsName (UTF16CodeUnit *name, int len);
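A self-contained version of the hexadecimal interpretation shows which inputs are accepted. Only uppercase hexadecimal digits are allowed, since the callers upper-case the keystrokes first, and nothing above U+10FFFF is accepted. Checking the bound after each digit keeps the arithmetic from overflowing on long inputs. UTF16CodeUnit is stood in by uint16_t, which is an assumption for a standalone build. Like the function above, this version does not reject surrogate code points (U+D800 to U+DFFF).

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t UTF16CodeUnit;   /* stand-in for the WCHAR-based macro */

static int getUSVFromCompositionAsUSV (const UTF16CodeUnit *name, int len)
{
  int usv = 0, i;
  for (i = 0; i < len; i++) {
    UTF16CodeUnit ch = name [i];
    if ('0' <= ch && ch <= '9') { usv = usv * 16 + (ch - '0'); }
    else if ('A' <= ch && ch <= 'F') { usv = usv * 16 + (ch - 'A' + 10); }
    else { return -1; }                  /* not an uppercase hexadecimal digit */
    if (usv > 0x10FFFF) { return -1; }   /* beyond the Unicode codespace */
  }
  return usv;
}
```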

3.8. Representing the IME state

Of course, we need some structure to represent our IME state: which keystrokes have been typed, what feedback to give to the user, etc.

typedef struct {
  int prefixLength;
  UTF16CodeUnit prefixString [3];
  int compositionLength;
  UTF16CodeUnit compositionString [1000];
  int completionLength;
  UTF16CodeUnit completionString [1000];
  BOOL canStopHere;
  int resultLength;
  UTF16CodeUnit resultString [1000];
  int candidatesLength;
  UTF16CodeUnit candidates [6000];
  int candidatesCount;
  int curMap;
  int curMod;
} ImeState;

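prefixLength and prefixString hold the prefix characters typed by the user: the initial U, N or J, then a + and possibly a second +.

compositionLength and compositionString hold what the user has typed after the prefix.

completionLength and completionString are filled in based on what the user has typed, and represent the interpretation of the composition string: the character name for a U+ composition, or the rest of the name (followed by the scalar value once it is unambiguous) for an N+ composition.

canStopHere is TRUE when the composition can be accepted as it stands.

resultLength and resultString hold the text to insert into the application.

If candidatesCount is 0, then candidates and candidatesLength are undefined.

curMap and curMod select the soft keyboard and the modifier state used with the J prefix.

If prefixLength is 0, then the other fields are undefined, apart from curMap and curMod, which are not reset when a new composition starts.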

3.9. IME Overall Logic

In this section, we implement the state machine that underlies the IME.

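The first function is a quick filter that determines whether a keystroke warrants processing or should simply be passed through to the application. When no composition is in progress, only U, N and J, which start the three kinds of composition, are of interest: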

PRIVATE BOOL wantKeyStroke (ImeState* state, int ch) {
  if (state->prefixLength == 0) {
    return (   ch == 'U'
            || ch == 'N'
            || ch == 'J'); }
  else {
    return TRUE; } }

handleKeyStroke is given all the keystrokes that have not been filtered away by wantKeyStroke. It modifies the state of the IME based on the input, and prepares the feedback given to the user. The return value indicates what should be done next and takes one of the following values (MODIFIER_SHIFT and MODIFIER_CONTROL, defined alongside them, are bit flags for the modifiers argument):

#define REJECT_KEYSTROKE                      1
#define START_COMPOSITION                     2
#define CONTINUE_COMPOSITION                  3
#define INSERT_RESULT_AND_FINISH_COMPOSITION  4
#define INSERT_RESULT_AND_RESTART_COMPOSITION 5
#define ABORT_COMPOSITION                     6
#define SHOW_CANDIDATES                       7
#define SHOW_SOFT_KEYBOARD                    8
#define DO_NOTHING                            9

#define MODIFIER_SHIFT   0x01
#define MODIFIER_CONTROL 0x02

This function directly handles the initial states (0 or 1 prefix character already entered), the toggling between one and two + signs while nothing has been composed yet, and the escape key in any state. The remaining cases are dispatched to specialized functions, based on the prefix.

PRIVATE int handleKeyStroke (ImeState* state, int key, int ch, int modifiers) {
  debugMessage (DBG_CORE, L"handleKeyStroke modifiers=0x%x, key=0x%x, ch=0x%x (%c)",
                modifiers, key, ch, ch);
  if (state->prefixLength == 0) {
    state->prefixString [state->prefixLength++] = /*TODO*/(WCHAR) ch;
    state->compositionLength = 0;
    state->completionLength = 0;
    state->candidatesCount = 0;
    state->canStopHere = FALSE;
    return START_COMPOSITION; }
  if (ch == 0x1B) { // ESC
    state->prefixLength = 0;
    return ABORT_COMPOSITION; }
  if (state->prefixLength == 1) {
    if (ch == '+') {
      state->prefixString [state->prefixLength++] = /*TODO*/(WCHAR) ch;
      if (state->prefixString [0] == 'J') {
        return SHOW_SOFT_KEYBOARD; }
      else {
        return CONTINUE_COMPOSITION; }}
    else {
      state->resultString [0] = state->prefixString [0];
      state->resultString [1] = /*TODO*/(WCHAR) ch;
      state->resultLength = 2;
      state->prefixLength = 0;
      return INSERT_RESULT_AND_FINISH_COMPOSITION; }}
  else if (state->prefixLength > 1 && state->compositionLength == 0 && ch == '+') {
    if (state->prefixLength == 2) {
      state->prefixString [state->prefixLength++] = /*TODO*/(WCHAR) ch; }
    else {
      state->prefixLength--; }
    return CONTINUE_COMPOSITION; }
  if (state->prefixString [0] == 'N') {
    return handleNameKeyStroke (state, ch, modifiers); }
  else if (state->prefixString [0] == 'U') {
    return handleCodePointKeyStroke (state, ch, modifiers); }
  else if (state->prefixString [0] == 'J') {
    return handleMapKeyStroke (state, key, modifiers); }
  // cannot get there - silence the compiler
  return DO_NOTHING; }
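The prefix handling at the top of handleKeyStroke can be traced in isolation. The sketch below keeps only the prefix part of the state. It leaves out J, the result string and the dispatch. The PrefixState type and prefixStep are hypothetical names. The sketch shows how each further + toggles between the two-character and three-character prefixes, as long as nothing has been composed yet. With the three-character prefix, the handlers restart the composition after inserting a result, rather than finishing it.

```c
#include <assert.h>

#define START_COMPOSITION                    2
#define CONTINUE_COMPOSITION                 3
#define INSERT_RESULT_AND_FINISH_COMPOSITION 4
#define ABORT_COMPOSITION                    6

typedef struct {                  /* hypothetical reduced state */
  int prefixLength;
  int prefixString [3];
  int compositionLength;
} PrefixState;

/* Returns the action for ch, or 0 where handleKeyStroke would dispatch
   to the handler for the prefix. */
static int prefixStep (PrefixState *s, int ch)
{
  if (s->prefixLength == 0) {
    s->prefixString [s->prefixLength++] = ch;
    s->compositionLength = 0;
    return START_COMPOSITION;
  }
  if (ch == 0x1B) { s->prefixLength = 0; return ABORT_COMPOSITION; }   /* ESC */
  if (s->prefixLength == 1) {
    if (ch == '+') { s->prefixString [s->prefixLength++] = ch; return CONTINUE_COMPOSITION; }
    s->prefixLength = 0;          /* the prefix letter and ch are inserted as text */
    return INSERT_RESULT_AND_FINISH_COMPOSITION;
  }
  if (s->compositionLength == 0 && ch == '+') {
    if (s->prefixLength == 2) { s->prefixString [s->prefixLength++] = ch; }
    else { s->prefixLength--; }
    return CONTINUE_COMPOSITION;
  }
  return 0;
}
```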

This function handles the composition via a code point specification (U+ prefix).

PRIVATE int handleCodePointKeyStroke (ImeState* state, int ch, int modifiers) { state->completionLength = 0; state->canStopHere = FALSE; state->resultLength = 0; if (ch == 0x0D) { // RETURN if (state->prefixLength == 2 && state->compositionLength == 0) { state->resultString [state->resultLength++] = L'U'; state->resultString [state->resultLength++] = L'+'; state->prefixLength = 0; return INSERT_RESULT_AND_FINISH_COMPOSITION; } if (state->prefixLength == 3 && state->compositionLength == 0) { return REJECT_KEYSTROKE; } { int usv = getUSVFromCompositionAsUSV (state->compositionString, state->compositionLength); createResult (state, usv, modifiers); } if (state->prefixLength == 2) { state->prefixLength = 0; state->compositionLength = 0; return INSERT_RESULT_AND_FINISH_COMPOSITION; } else { state->compositionLength = 0; return INSERT_RESULT_AND_RESTART_COMPOSITION; }} if (ch == 0x08) { // BS if (state->compositionLength > 0) { state->compositionLength--; } else { return REJECT_KEYSTROKE; }} else { wint_t x = /*TODO*/(wint_t) ch; state->compositionString [state->compositionLength++] = towupper (x); if (getUSVFromCompositionAsUSV (state->compositionString, state->compositionLength) == -1) { state->compositionLength--; if (state->prefixLength == 3 && state->compositionLength == 0) { state->resultString [state->resultLength++] = /*TODO*/(WCHAR) ch; return INSERT_RESULT_AND_RESTART_COMPOSITION; } else { return REJECT_KEYSTROKE; }}} state->completionString [state->completionLength++] = L' '; { int usv = getUSVFromCompositionAsUSV (state->compositionString, state->compositionLength); int gc; BOOL found = findCharacterName (usv, state->completionString + 1, &gc); if (found) { while (state->completionString [state->completionLength] != 0) { state->completionLength++;}}} state->canStopHere = (state->compositionLength > 0); return CONTINUE_COMPOSITION; } PRIVATE int handleNameKeyStroke1 (ImeState* state, int ch, int modifiers) { if (ch == 0x0D) { // RETURN if (state->prefixLength 
== 2 && state->compositionLength == 0) { state->resultString [state->resultLength++] = L'N'; state->resultString [state->resultLength++] = L'+'; state->prefixLength = 0; return INSERT_RESULT_AND_FINISH_COMPOSITION; } { int usv = getUSVFromCompositionAsName (state->compositionString, state->compositionLength); if (usv == -1) { return REJECT_KEYSTROKE; } createResult (state, usv, modifiers); } state->compositionLength = 0; state->completionLength = 0; if (state->prefixLength == 2) { state->prefixLength = 0; return INSERT_RESULT_AND_FINISH_COMPOSITION; } else { return INSERT_RESULT_AND_RESTART_COMPOSITION; }} else if (ch == 0x08) { // BS if (state->compositionLength > 0) { state->compositionLength--; return CONTINUE_COMPOSITION; } else { return REJECT_KEYSTROKE; }} else if (ch == 0x20 && state->candidatesCount == 0) { // SPACE int nodeOffset, keyOffset; UTF16CodeUnit *upto = followString (state->compositionString, state->compositionLength, &nodeOffset, &keyOffset); if (getKeyChar (keyOffset) != 0) { while (getKeyChar (keyOffset) != 0) { state->compositionString [state->compositionLength++] = /*TODO*/(WCHAR) getKeyChar (keyOffset); keyOffset += 2; } return CONTINUE_COMPOSITION; } else { buildCandidates (nodeOffset, state->candidates, &state->candidatesCount, &state->candidatesLength); return SHOW_CANDIDATES; }} else { int nodeOffset, keyOffset; UTF16CodeUnit *upto; state->compositionString [state->compositionLength++] = (UTF16CodeUnit) towupper ((wchar_t) ch); upto = followString (state->compositionString, state->compositionLength, &nodeOffset, &keyOffset); if (upto != 0) { state->compositionLength--; return REJECT_KEYSTROKE; } else { if (state->candidatesCount != 0) { while (getKeyChar (keyOffset) != 0) { state->compositionString [state->compositionLength++] = /*TODO*/(WCHAR) getKeyChar (keyOffset); keyOffset += 2; }} return CONTINUE_COMPOSITION; }} } PRIVATE int handleNameKeyStroke (ImeState* state, int ch, int modifiers) { int res = handleNameKeyStroke1 (state, ch, 
modifiers); if (res != SHOW_CANDIDATES) { state->candidatesCount = 0; } if (res == CONTINUE_COMPOSITION) { int nodeOffset, keyOffset; UTF16CodeUnit *upto; upto = followString (state->compositionString, state->compositionLength, &nodeOffset, &keyOffset); if (getKeyChar (keyOffset) == 0 && getNodeUSV (nodeOffset) != -1) { state->canStopHere = TRUE; } else { state->canStopHere = FALSE; } state->completionLength = 0; while (getKeyChar (keyOffset) != 0) { state->completionString [state->completionLength++] = /*TODO*/(WCHAR) getKeyChar (keyOffset); keyOffset += 2; } if (! nodeHasChildren (nodeOffset)) { int usv = getNodeUSV (nodeOffset); state->completionString [state->completionLength++] = L' '; state->completionLength += formatUSVuplus (state->completionString + state->completionLength, getNodeUSV (nodeOffset)); }} return res; } PRIVATE int handleMapKeyStroke (ImeState* state, int scanCode, int modifiers) { unsigned short *keyboardString; int vk = MapVirtualKey (scanCode, 1); debugMessage (DBG_CORE, L" handleMapKey vk=0x%x", vk); state->compositionLength = 0; if (scanCode == 0x148) { // UP - vk = 0 (why?) state->curMap = (state->curMap + 1) % getNumKeyboards (); return SHOW_SOFT_KEYBOARD; } else if (scanCode == 0x150) { // DOWN - vk = 0 (why?) 
state->curMap = (state->curMap + getNumKeyboards () -1) % getNumKeyboards (); return SHOW_SOFT_KEYBOARD; } else if (vk == 0x10) { // SHIFT if (modifiers != state->curMod) { state->curMod = modifiers; return SHOW_SOFT_KEYBOARD; } else { return DO_NOTHING; }} keyboardString = getKeyboardEntry (state->curMap, state->curMod, vk); if (keyboardString == 0) { return REJECT_KEYSTROKE; } else { state->resultLength = 0; addStringToResult (state, keyboardString ); if (state->prefixLength == 2) { state->prefixLength= 0; return INSERT_RESULT_AND_FINISH_COMPOSITION; } else { return INSERT_RESULT_AND_RESTART_COMPOSITION; }} } PRIVATE BOOL wantKeyStroke (ImeState* state, int ch); PRIVATE int handleKeyStroke (ImeState* state, int key, int ch, int modifiers); PRIVATE int handleCodePointKeyStroke (ImeState* state, int ch, int modifiers); PRIVATE int handleNameKeyStroke (ImeState* state, int ch, int modifiers); PRIVATE int handleMapKeyStroke (ImeState* state, int ch, int modifiers);
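The loops above that copy the rest of the current key (`while (getKeyChar (keyOffset) != 0)`) extend the composition as far as the names allow without a choice. Assume the store is a compressed trie of the character names. Then that extension is the longest continuation shared by every name that begins with what has been typed. The sketch below models this over a plain array of ASCII names. `extendPrefix` and the sample names are hypothetical and are not part of the IME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the IME): copies `prefix` into `out` and
   appends the characters that every name starting with `prefix` has in
   common after it. Returns how many names start with `prefix`; 0 means the
   last keystroke would be rejected. */
size_t extendPrefix (const char *const *names, size_t count,
                     const char *prefix, char *out, size_t outSize)
{
  size_t prefixLen = strlen (prefix);
  size_t matches = 0;
  size_t common = 0;            /* length of the shared continuation so far */
  const char *first = NULL;     /* first matching name, used as the reference */
  size_t i;

  for (i = 0; i < count; i++) {
    const char *name = names [i];
    if (strncmp (name, prefix, prefixLen) != 0) {
      continue;                 /* does not start with the prefix */
    }
    matches++;
    if (first == NULL) {
      first = name;
      common = strlen (name) - prefixLen;
    } else {
      size_t k = 0;             /* shrink the continuation to what both share */
      while (k < common && name [prefixLen + k] == first [prefixLen + k]) {
        k++;
      }
      common = k;
    }
  }

  assert (outSize > prefixLen + common);
  memcpy (out, prefix, prefixLen);
  if (first != NULL) {
    memcpy (out + prefixLen, first + prefixLen, common);
  }
  out [prefixLen + common] = '\0';
  return matches;
}
```

For example, take the names LATIN SMALL LETTER A, LATIN SMALL LETTER B and LATIN CAPITAL LETTER A. The prefix LATIN S extends to "LATIN SMALL LETTER " (with the trailing space), where the next character has to decide between the two remaining names.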

3.10. The keymaps

PRIVATE int getNumKeyboards () { return *((int *) keyboardsStore); } PRIVATE unsigned short *getKeyboardEntry (int keyboard, int modifier, int vkey) { unsigned char *keyboardOffset = keyboardsStore + ((unsigned int *) keyboardsStore) [1 + keyboard]; int stringOffset = ((unsigned int *)(keyboardOffset)) [vkey + 256 * modifier]; if (stringOffset == 0) { return 0; } else { return ((unsigned short *) (keyboardOffset + stringOffset)); } } /* PRIVATE int nbKeyboards = 6; PRIVATE int kbd [6][2][128] = { //----------- common chars {{0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , // 00 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , // 08 0x00eb, 0x00ea, 0x00e8, 0x00e9, 0x00fc, 0x00fb, 0x00f9, 0x00ef, // 10 0x00ee, 0x00f4, 0x0 , 0x0 , 0x00 , 0x00 , 0x00e0, 0x00e2, // 18 0x2018, 0x201c, 0x0 , 0x2013, 0x201d, 0x2019, 0x0 , 0x0 , // 20 0x0 , 0x0 , 0x0 , 0x0 , 0x20ac, 0x0 , 0x00e7, 0x25cc, // 28 0x2023, 0x2022, 0x0 , 0x00e6, 0x0153, 0x0 , 0x0 , 0x0 }, // 30 {0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , // 00 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , // 08 0x00cb, 0x00ca, 0x00c8, 0x00c9, 0x00dc, 0x00db, 0x00d9, 0x00cf, // 10 0x00ce, 0x00d4, 0x0 , 0x0 , 0x00 , 0x00 , 0x00c0, 0x00c2, // 18 0x0 , 0x00ab, 0x0 , 0x2014, 0x00bb, 0x0 , 0x0 , 0x0 , // 20 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , 0x00c7, 0x0 , // 20 0x0 , 0x0 , 0x0 , 0x00c6, 0x0152, 0x0 , 0x0 , 0x0 }, // 30 }, //-------------- Circled digits // : circled digits {{0x0000, 0x0000, 0x2460, 0x2461, 0x2462, 0x2463, 0x2464, 0x2465, // 00 0x2466, 0x2467, 0x2468, 0x24ea, 0x0000, 0x0000, 0x0000, 0x0000, // 08 0x246a, 0x246b, 0x246c, 0x246d, 0x246e, 0x246f, 0x2470, 0x2471, // 10 0x2472, 0x2469, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, // 18 0x0020, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x2473, // 20 0x0000, 0x0000, 0x0000, 0x0000, 0x2780, 0x2781, 0x2782, 0x2783, // 28 0x2784, 0x2785, 0x2786, 0x2787, 0x2788, 0x2789, 0x0000, 0x0000}, // 30 // SHIFT: negative circled digits {0x0000, 0x0000, 0x2776, 0x2777, 
0x2778, 0x2779, 0x277a, 0x277b, // 00 0x277c, 0x277d, 0x277e, 0x24ff, 0x0000, 0x0000, 0x0000, 0x0000, // 08 0x24eb, 0x24ec, 0x24ed, 0x24ef, 0x24f0, 0x24f1, 0x24f2, 0x24f3, // 10 0x27ff, 0x277f, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, // 18 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x0000, 0x24f4, // 20 0x0000, 0x0000, 0x0000, 0x0000, 0x278a, 0x278b, 0x278c, 0x278d, // 28 0x278e, 0x278f, 0x2790, 0x2791, 0x2792, 0x2793, 0x0000, 0x0000}, // 30 }, //------------------------------------------------- Dingbats non-circled digits {{0x0000, 0x0000, 0x2701, 0x2702, 0x2703, 0x2704, 0x260e, 0x2706, // 00 0x2707, 0x2708, 0x2709, 0x261b, 0x261e, 0x270c, 0x0000, 0x0000, // 08 0x270d, 0x270e, 0x270f, 0x2710, 0x2711, 0x2712, 0x0 , 0x0 , // 10 0x0 , 0x0 , 0x0 , 0x0 , 0x0000, 0x0000, 0x2713, 0x2714, // 18 0x2715, 0x2716, 0x2717, 0x2718, 0x0 , 0x0 , 0x0 , 0x0 , // 20 0x0 , 0x0 , 0x0000, 0x0000, 0x2719, 0x271a, 0x271b, 0x271c, // 20 0x271d, 0x271e, 0x271f, 0x2720, 0x0 , 0x0 , 0x0000, 0x0000}, // 30 // SHIFT = arrows {0x0000, 0x0000, 0x2794, 0x2192, 0x2194, 0x2195, 0x2798, 0x2799, // 00 0x279a, 0x279b, 0x279c, 0x279d, 0x279e, 0x279f, 0x0000, 0x0000, // 08 0x27a0, 0x27a1, 0x27a2, 0x27a3, 0x27a4, 0x27a5, 0x27a6, 0x27a7, // 10 0x27a8, 0x27a9, 0x27aa, 0x27ab, 0x0000, 0x0000, 0x27ac, 0x27ad, // 18 0x27ae, 0x27af, 0x27b1, 0x27b2, 0x27b3, 0x27b4, 0x27b5, 0x27b6, // 20 0x27b7, 0x27b8, 0x0000, 0x0000, 0x27b9, 0x27ba, 0x27bb, 0x27bc, // 20 0x27bd, 0x27be, 0x0 , 0x0 , 0x0 , 0x0 , 0x0000, 0x0000}, // 30 }, {{0x0000, 0x0000, 0x2721, 0x2722, 0x2723, 0x2724, 0x2725, 0x2726, // 00 0x2727, 0x2605, 0x2729, 0x272a, 0x272b, 0x272c, 0x0000, 0x0000, // 08 0x272d, 0x272e, 0x272f, 0x2730, 0x2731, 0x2732, 0x2733, 0x2735, // 10 0x2736, 0x2737, 0x2738, 0x2739, 0x0000, 0x0000, 0x273a, 0x273b, // 18 0x273c, 0x273d, 0x273e, 0x273f, 0x2740, 0x2741, 0x2742, 0x2743, // 20 0x2744, 0x2745, 0x0000, 0x0000, 0x2746, 0x2747, 0x2748, 0x2749, // 20 0x274a, 0x274b, 0x0 , 0x0 , 0x0 , 0x0 , 0x0000, 0x0000}, // 30 
{0x0000, 0x0000, 0x25cf, 0x274d, 0x25a0, 0x274f, 0x2750, 0x2751, // 00 0x2752, 0x25b2, 0x25bc, 0x25c6, 0x2756, 0x25d7, 0x0000, 0x0000, // 08 0x2758, 0x2759, 0x275a, 0x0 , 0x0 , 0x0 , 0x0 , 0x0 , // 10 0x0 , 0x0 , 0x0 , 0x0 , 0x0000, 0x0000, 0x275b, 0x275c, // 18 0x275d, 0x275e, 0x0 , 0x0 , 0x0 , 0x0 , 0x2768, 0x2769, // 20 0x276a, 0x276b, 0x0000, 0x0000, 0x276c, 0x276d, 0x276e, 0x276f, // 20 0x2770, 0x2771, 0x2772, 0x2773, 0x2774, 0x2775, 0x0000, 0x0000}, // 30 }, //-------------------------------------------------------------- Devanagari {{0x0000, 0x0000, 0x093E, 0x093F, 0x0940, 0x0941, 0x0942, 0x0943, // 00 0x0947, 0x0948, 0x094B, 0x094C, 0x002D, 0x0 , 0x0000, 0x0000, // 08 0x0919, 0x091F, 0x090F, 0x0930, 0x0924, 0x092F, 0x0909, 0x0907, // 10 0x0913, 0x092A, 0x200D, 0x200c, 0x0000, 0x0000, 0x0905, 0x0938, // 18 0x0926, 0x0921, 0x0917, 0x0939, 0x091C, 0x0915, 0x0932, 0x0902, // 20 0x0 , 0x0 , 0x0000, 0x094D, 0x0937, 0x0 , 0x091A, 0x0935, // 28 0x092C, 0x0928, 0x092E, 0x002C, 0x002E, 0x0964, 0x0000, 0x0000, // 30 0x0 , 0x0020}, // 38 {0x0000, 0x0000, 0x0967, 0x0968, 0x0969, 0x096A, 0x096B, 0x096C, // 00 0x096D, 0x096e, 0x096F, 0x0966, 0x0 , 0x0 , 0x0000, 0x0000, // 08 0x091E, 0x0920, 0x0910, 0x0931, 0x0925, 0x0 , 0x090A, 0x0908, // 10 0x0914, 0x092B, 0x0 , 0x093D, 0x0000, 0x0000, 0x0906, 0x0936, // 18 0x0927, 0x0922, 0x0918, 0x0 , 0x091D, 0x0916, 0x0933, 0x0903, // 20 0x0 , 0x0 , 0x0000, 0x093C, 0x0 , 0x0 , 0x091B, 0x0 , // 20 0x092D, 0x0923, 0x0 , 0x0 , 0x0 , 0x0965, 0x0000, 0x0000}, // 30 }, //---------------------- bengali {{0x0000, 0x0000, 0x 1, 0x 2, 0x 3, 0x 4, 0x 5, 0x 6, // 00 0x 7, 0x 8, 0x 9, 0x 0, 0x -, 0x =, 0x0000, 0x0000, // 08 0x q, 0x w, 0x e, 0x r, 0x t, 0x y, 0x u, 0x i, // 10 0x o, 0x p, 0x [, 0x ], 0x0000, 0x0000, 0x a, 0x s, // 18 0x d, 0x f, 0x0997, 0x h, 0x099c, 0x0995, 0x l, 0x ;, // 20 0x ', 0x ~, 0x0000, 0x \, 0x z, 0x x, 0x099a, 0x v, // 20 0x b, 0x n, 0x m, 0x lt, 0x gt, 0x ?, 0x0000, 0x0000}, // 30 {0x0000, 0x0000, 0x 1, 0x 2, 0x 3, 0x 
4, 0x 5, 0x 6, // 00 0x 7, 0x 8, 0x 9, 0x 0, 0x -, 0x =, 0x0000, 0x0000, // 08 0x q, 0x w, 0x e, 0x r, 0x t, 0x y, 0x u, 0x i, // 10 0x o, 0x p, 0x [, 0x ], 0x0000, 0x0000, 0x a, 0x s, // 18 0x d, 0x f, 0x0998, 0x h, 0x099d, 0x0996, 0x l, 0x ;, // 20 0x ', 0x ~, 0x0000, 0x \, 0x z, 0x x, 0x099b, 0x v, // 20 0x b, 0x n, 0x m, 0x lt, 0x gt, 0x ?, 0x0000, 0x0000}, // 30 }, //------------- hebrew {{0x0000, 0x0000, 0x0031, 0x0032, 0x0033, 0x0034, 0x0035, 0x0036, // 00 0x0037, 0x0038, 0x0039, 0x0030, 0x002d, 0x003d, 0x0000, 0x0000, // 08 0x05e7, 0x0 , 0x0 , 0x05e8, 0x05ea, 0x05d9, 0x0 , 0x0 , // 10 0x0 , 0x05e4, 0x005b, 0x005d, 0x0000, 0x0000, 0x05d0, 0x05e9, // 18 0x05d3, 0x0 , 0x05d2, 0x05d4, 0x0 , 0x05db, 0x05dc, 0x003b, // 20 0x0027, 0x007e, 0x0000, 0x005c, 0x05d6, 0x05e2, 0x05e6, 0x05d5, // 20 0x05d1, 0x05e0, 0x05de, 0x002c, 0x002e, 0x003f, 0x0000, 0x0000, // 30 0x0, 0x0020}, {0x0000, 0x0000, 0x0021, 0x0040, 0x0023, 0x0024, 0x0025, 0x005e, // 00 0x0026, 0x002a, 0x0028, 0x0029, 0x005f, 0x002b, 0x0000, 0x0000, // 08 0x0 , 0x0 , 0x0 , 0x0 , 0x05d8, 0x0 , 0x0 , 0x0 , // 10 0x0 , 0x05e3, 0x0 , 0x0 , 0x0000, 0x0000, 0x0 , 0x05e1, // 18 0x0 , 0x0 , 0x0 , 0x05d7, 0x0 , 0x05da, 0x0 , 0x003a, // 20 0x0 , 0x0 , 0x0000, 0x0 , 0x0 , 0x0 , 0x05e5, 0x0 , // 20 0x0 , 0x05df, 0x05dd, 0x003c, 0x003e, 0x0 , 0x0000, 0x0000}, // 30 }, //--------------------- Myanmar //{{0x0000, 0x0000, 0x 1, 0x 2, 0x 3, 0x 4, 0x 5, 0x 6, // 00 // 0x 7, 0x 8, 0x 9, 0x 0, 0x -, 0x =, 0x0000, 0x0000, // 08 // 0x q, 0x101d, 0x e, 0x102b, 0x1010, 0x101a, 0x u, 0x i, // 10 // 0x o, 0x1015, 0x [, 0x ], 0x0000, 0x0000, 0x a, 0x101e, // 18 // 0x1012, 0x f, 0x1002, 0x101f, 0x1007, 0x1000, 0x102c, 0x ;, // 20 // 0x ', 0x ~, 0x0000, 0x103a, 0x z, 0x x, 0x1005, 0x v, // 20 // 0x1017, 0x1014, 0x1019, 0x lt, 0x gt, 0x ?, 0x0000, 0x0000}, // 30 // //{0x0000, 0x0000, 0x 1, 0x 2, 0x 3, 0x 4, 0x 5, 0x 6, // 00 // 0x 7, 0x 8, 0x 9, 0x 0, 0x -, 0x =, 0x0000, 0x0000, // 08 // 0x q, 0x103d, 0x e, 0x103c, 0x1011, 0x103b, 0x u, 
0x i, // 10 // 0x o, 0x1016, 0x [, 0x ], 0x0000, 0x0000, 0x a, 0x s, // 18 // 0x1013, 0x f, 0x1003, 0x103e, 0x1008, 0x1001, 0x l, 0x ;, // 20 // 0x ', 0x ~, 0x0000, 0x1039, 0x z, 0x x, 0x1006, 0x v, // 20 // 0x1018, 0x n, 0x m, 0x lt, 0x gt, 0x ?, 0x0000, 0x0000}, // 30 //}, // // {0x0000, 0x0000, 0x 1, 0x 2, 0x 3, 0x 4, 0x 5, 0x 6, // 00 // 0x 7, 0x 8, 0x 9, 0x 0, 0x -, 0x =, 0x0000, 0x0000, // 08 // 0x q, 0x w, 0x e, 0x r, 0x t, 0x y, 0x u, 0x i, // 10 // 0x o, 0x p, 0x [, 0x ], 0x0000, 0x0000, 0x a, 0x s, // 18 // 0x d, 0x f, 0x g, 0x h, 0x j, 0x k, 0x l, 0x ;, // 20 // 0x ', 0x ~, 0x0000, 0x \, 0x z, 0x x, 0x c, 0x v, // 20 // 0x b, 0x n, 0x m, 0x lt, 0x gt, 0x ?, 0x0000, 0x0000}, // 30 }; */ PRIVATE int getNumKeyboards (); PRIVATE unsigned short *getKeyboardEntry (int keyboard, int modifier, int vkey);
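The layout that getKeyboardEntry expects can be read off its pointer arithmetic:

- The store starts with the number of keyboards, followed by one byte offset per keyboard.
- Each keyboard is a table of byte offsets indexed by `vkey + 256 * modifier`, relative to the start of that keyboard, where 0 means no entry.
- Each entry is a UTF-16 string whose first unit is its length, which is what addStringToResult reads.

The sketch below builds such a store in memory with a single entry and looks it up again. `buildStore`, `lookupEntry` and `NUM_MODIFIERS` (two tables, unshifted and shifted) are assumptions made for illustration. The real keyboard store is the one described in section 2.7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NUM_MODIFIERS 2                     /* assumption: unshifted and shifted tables */
#define TABLE_ENTRIES (256 * NUM_MODIFIERS)

/* Same pointer arithmetic as getKeyboardEntry, but on an explicit store. */
const uint16_t *lookupEntry (const uint8_t *store, int keyboard, int modifier, int vkey)
{
  const uint8_t *kbd = store + ((const uint32_t *) store) [1 + keyboard];
  uint32_t stringOffset = ((const uint32_t *) kbd) [vkey + 256 * modifier];
  return stringOffset == 0 ? NULL : (const uint16_t *) (kbd + stringOffset);
}

/* Hypothetical builder for a store holding one keyboard with one entry:
     uint32 number of keyboards, then one uint32 byte offset per keyboard;
     per keyboard: TABLE_ENTRIES uint32 byte offsets relative to the
     keyboard, 0 meaning "no entry";
     per entry: a uint16 length in code units, then the UTF-16 code units. */
uint8_t *buildStore (int modifier, int vkey, const uint16_t *units, uint16_t len)
{
  size_t tableBytes = TABLE_ENTRIES * sizeof (uint32_t);
  size_t size = 2 * sizeof (uint32_t) + tableBytes + (1 + (size_t) len) * sizeof (uint16_t);
  uint8_t *store;
  uint8_t *kbd;
  uint16_t *entry;

  assert (0 <= modifier && modifier < NUM_MODIFIERS && 0 <= vkey && vkey < 256);
  store = calloc (1, size);                /* zeroed: every key starts unmapped */
  if (store == NULL) {
    return NULL;
  }
  kbd = store + 2 * sizeof (uint32_t);
  entry = (uint16_t *) (kbd + tableBytes);
  ((uint32_t *) store) [0] = 1;                          /* one keyboard */
  ((uint32_t *) store) [1] = (uint32_t) (kbd - store);   /* it starts at byte 8 */
  ((uint32_t *) kbd) [vkey + 256 * modifier] = (uint32_t) tableBytes;
  entry [0] = len;
  memcpy (entry + 1, units, len * sizeof (uint16_t));
  return store;
}
```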

3.11. Filling the result string

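When we are done elaborating the input and have a scalar value, we need to create the result string. If the Shift modifier is down, the result has the full form 'U+xxxx <ch> name'; otherwise it is just '<ch>'. If the Control modifier is down, the character is represented as a numeric character reference '&#x<usv>;'; otherwise, we insert the character itself. In the full form, a combining character is displayed on a dotted circle (U+25CC), and a space character is displayed between single quotation marks (U+2018 and U+2019).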

PRIVATE void addUSVToResult (ImeState *state, int usv, BOOL ncr)
{
  if (ncr) {
    // numeric character reference: &#x<hex digits>;
    state->resultString [state->resultLength++] = '&';
    state->resultString [state->resultLength++] = '#';
    state->resultString [state->resultLength++] = 'x';
    state->resultLength += formatUSV (state->resultString + state->resultLength, usv);
    state->resultString [state->resultLength++] = ';';
  } else {
    if (usv > 0xFFFF) {
      // supplementary character: emit a surrogate pair
      int biasedUsv = usv - 0x10000;
      state->resultString [state->resultLength++] = 0xD800 + ((biasedUsv >> 10) & 0x3ff);
      state->resultString [state->resultLength++] = 0xDC00 + ((biasedUsv) & 0x3ff);
    } else {
      state->resultString [state->resultLength++] = /*TODO*/(WCHAR) usv;
    }
  }
}

// s [0] is the number of UTF-16 code units that follow
PRIVATE void addStringToResult (ImeState *state, unsigned short* s)
{
  int i;
  int nbCodeUnits = s [0];
  for (i = 0; i < nbCodeUnits; i++) {
    int usv = s [1 + i];
    if (0xD800 <= usv && usv <= 0xDBFF && i + 1 < nbCodeUnits) {
      // high surrogate: combine it with the following low surrogate
      // (addition, not OR: the shifted bits can reach bit 16)
      i++;
      usv = 0x10000 + ((usv & 0x3FF) << 10) + (s [1 + i] & 0x3FF);
    }
    addUSVToResult (state, usv, FALSE);
  }
}

PRIVATE void createResult (ImeState *state, int usv, int modifiers)
{
  debugMessage (DBG_CORE, L"createResult usv=0x%x, modifiers=0x%x", usv, modifiers);
  state->resultLength = 0;

  if (modifiers & MODIFIER_SHIFT) {
    // full form: U+xxxx <ch> name
    UTF16CodeUnit buffer [200];  // TODO buffer = malloc ()
    BOOL found;
    int gc = 0;                  // in case findCharacterName does not set it
    found = findCharacterName (usv, buffer, &gc);
    state->resultString [state->resultLength++] = 'U';
    state->resultString [state->resultLength++] = '+';
    state->resultLength += formatUSV (state->resultString + state->resultLength, usv);
    if (gc != 3) {
      state->resultString [state->resultLength++] = ' ';
      if (gc == 1) {          // combining: show it on a dotted circle
        addUSVToResult (state, 0x25cc, modifiers & MODIFIER_CONTROL);
      } else if (gc == 2) {   // space: show it between quotes
        addUSVToResult (state, 0x2018, modifiers & MODIFIER_CONTROL);
      }
      addUSVToResult (state, usv, modifiers & MODIFIER_CONTROL);
    }
    if (gc == 2) {
      addUSVToResult (state, 0x2019, modifiers & MODIFIER_CONTROL);
    }
    if (found) {
      int i = 0;
      state->resultString [state->resultLength++] = ' ';
      while (buffer [i] != 0) {
        state->resultString [state->resultLength++] = buffer [i];
        i++;
      }
    }
  } else {
    addUSVToResult (state, usv, modifiers & MODIFIER_CONTROL);
  }
}

PRIVATE void createResult (ImeState *state, int usv, int modifiers);
PRIVATE void addUSVToResult (ImeState *state, int usv, BOOL ncr);
PRIVATE void addStringToResult (ImeState *state, unsigned short* s);
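The surrogate arithmetic used when filling the result string can be checked in isolation. The sketch below does three things:

- encodes a scalar value as UTF-16;
- decodes it back;
- formats a numeric character reference.

`encodeUTF16`, `decodeUTF16` and `formatNCR` are hypothetical helpers. `formatNCR` uses uppercase hexadecimal without padding, which may differ from what formatUSV produces.

A surrogate pair must be combined with addition. `0x10000 | ((high & 0x3FF) << 10) | (low & 0x3FF)` gives wrong results for many characters from U+20000 upward; for example, U+20000 would decode as U+10000.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Encodes one scalar value as UTF-16; returns the number of code units (1 or 2). */
int encodeUTF16 (uint32_t usv, uint16_t out [2])
{
  assert (usv <= 0x10FFFF && (usv < 0xD800 || usv > 0xDFFF));
  if (usv > 0xFFFF) {
    uint32_t biased = usv - 0x10000;                      /* 20 significant bits */
    out [0] = (uint16_t) (0xD800 + (biased >> 10));       /* high surrogate */
    out [1] = (uint16_t) (0xDC00 + (biased & 0x3FF));     /* low surrogate */
    return 2;
  }
  out [0] = (uint16_t) usv;
  return 1;
}

/* Decodes the scalar value that starts at s [*i] and advances *i past it.
   An unpaired surrogate is returned as is. */
uint32_t decodeUTF16 (const uint16_t *s, int len, int *i)
{
  uint32_t u = s [(*i)++];
  if (0xD800 <= u && u <= 0xDBFF && *i < len
      && 0xDC00 <= s [*i] && s [*i] <= 0xDFFF) {
    u = 0x10000 + ((u - 0xD800) << 10) + (uint32_t) (s [(*i)++] - 0xDC00);
  }
  return u;
}

/* Formats a numeric character reference such as "&#x1F600;";
   returns its length, as snprintf does. */
int formatNCR (uint32_t usv, char *out, size_t size)
{
  return snprintf (out, size, "&#x%lX;", (unsigned long) usv);
}
```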

4. The Windows IME

4.1. Platforms

I started this work under Windows 2000 and then moved on to Windows XP. It is my understanding that the framework is the same under Windows 98, although this particular IME may be less interesting there, since not all applications are capable of handling Unicode.

It is also possible that all this would work on Windows 95 and Windows NT; there were very few additions between 95/NT and 98/2000.

4.2. Tools

Let's first deal with the environment you need to develop IMEs.

The first thing you need is Visual C++. Version 5.0 with a standard installation is fine; it is the one I used.

Next, you need the Driver Development Kit (or DDK) for Windows NT. It turns out that Microsoft is retiring it and replacing it with the XP DDK. I have stashed a copy in case you need it. A standard installation is fine, and we assume that you put it in C:\ntddk.

The documentation for building IMEs is in c:\ntddk\src\ime\docs. There are two Word documents. The first is titled “Win32 Multilingual IME Overview for IME Development”, and the second is titled “Win32 Multilingual IME Application Programming Interface”. Both are labeled “Version 1.41, 04-01-1999”. These documents are not the easiest to navigate. In the code below, when implementing an API, we will point to the relevant place in the documentation with an annotation of the form “I/32”: “I” refers to the first document (“II” to the second), and “32” is the page number.

c:\ntddk\src\ime also contains three example IMEs; I believe that cht is a real one, while jpn is a demo only. cht seems to be the best written.

The documentation for using IMEs from applications is in the Platform SDK documentation, under Windows Base Services, International Features, Input Method Editor. This is for applications that want to behave differently when an IME is used. However, all applications should work fine without any modification.

For development, I have found two SDK tools useful: spy++, to look at the messages that are exchanged, and dbmon (which can be found under Visual C++, Visual C++ Samples, SDK Samples, SDK Tool Samples, SDK Windows NT Samples, Dbmon).

4.3. Setting up

pushd c:\winddk\2600.1106
call c:\winddk\2600.1106\bin\setenv.bat c:\winddk\2600.1106 chk
popd
@echo on
c:\winddk\2600.1106\bin\x86\build
copy objchk_wxp_x86\i386\uniime.ime c:\windows\system32

4.4. Debugging

Here are a few functions that are helpful while developing the input method, for “debugging by printf”. They all eventually call the Win32 function OutputDebugString. To see the messages, run dbmon in a shell.

All debugging output can be turned on or off, on a global basis (that is, for all input contexts), by switching the global variable debugMode:

PRIVATE BOOL debugMode = TRUE;

Here are a couple of macros to manipulate this variable. They offer a convenient choke point to debug the UI that normally controls debugging.

#define getDebugMode() (debugMode)
#define setDebugMode(x) (debugMode = (x))

The first function, debugMessage, is of the printf style. It outputs one line, which is made of a fixed prefix, the formatted arguments and a newline:

PRIVATE void debugMessage (int class, LPCTSTR format, ...)
{
  static const TCHAR *prefix = L"UniIME ";
  TCHAR msg [1024];
  size_t capacity = sizeof (msg) / sizeof (TCHAR);   // in characters, not bytes
  size_t prefixLen = wcslen (prefix);
  size_t len;

  if (getDebugMode () == FALSE) {
    return;
  }

  // if (class != DBG_INPUT) {
  //   return; }

  wcscpy (msg, prefix);
  {
    va_list marker;
    va_start (marker, format);
    // leave room for "\r\n" and the terminating 0
    _vsnwprintf (msg + prefixLen, capacity - prefixLen - 3, format, marker);
    va_end (marker);
  }
  msg [capacity - 3] = 0;   // _vsnwprintf does not terminate the string when it truncates
  len = wcslen (msg);
  msg [len++] = '\r';
  msg [len++] = '\n';
  msg [len] = 0;
  OutputDebugString (msg);
}
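The buffer arithmetic in debugMessage is easy to get wrong. The count passed to the formatting function is in characters, not bytes, and the prefix and the final "\r\n" must fit as well. Here is a portable model of the same logic that uses `char` and `vsnprintf` instead of TCHAR and `_vsnwprintf`. `formatDebugLine` is a hypothetical name, and it returns the line in a buffer rather than sending it to OutputDebugString. Unlike `_vsnwprintf`, `vsnprintf` always terminates its output, even when it truncates.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical portable model of debugMessage: prefix + formatted text +
   "\r\n", truncated to fit in `size` bytes. Returns the length of the line. */
size_t formatDebugLine (char *msg, size_t size, const char *format, ...)
{
  static const char prefix [] = "UniIME ";
  size_t prefixLen = sizeof (prefix) - 1;
  size_t len;
  va_list marker;

  assert (size > prefixLen + 3);      /* room for prefix, "\r\n" and NUL */
  memcpy (msg, prefix, prefixLen);
  va_start (marker, format);
  /* vsnprintf always terminates; keep 2 more bytes free for "\r\n" */
  vsnprintf (msg + prefixLen, size - prefixLen - 2, format, marker);
  va_end (marker);
  len = strlen (msg);
  msg [len++] = '\r';
  msg [len++] = '\n';
  msg [len] = '\0';
  return len;
}
```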

The next function is useful when reporting an error returned by GetLastError: it outputs a printf-style formatted line, and then another line with the system message corresponding to err.

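PRIVATE void debugLastError (DWORD err, LPCTSTR format, ...)
{
  TCHAR line [1024];
  TCHAR msg [1024];
  DWORD ret;
  va_list marker;

  if (getDebugMode () == FALSE) {
    return;
  }

  // format the caller's message here: a va_list cannot be passed on to
  // the variadic debugMessage, and va_start must name format, not err
  va_start (marker, format);
  _vsnwprintf (line, sizeof (line) / sizeof (TCHAR) - 1, format, marker);
  va_end (marker);
  line [sizeof (line) / sizeof (TCHAR) - 1] = 0;
  debugMessage (DBG_ERROR, L"%s", line);

  ret = FormatMessage (FORMAT_MESSAGE_FROM_SYSTEM, NULL, err,
                       MAKELANGID (LANG_NEUTRAL, SUBLANG_DEFAULT), // Default language
                       msg, sizeof (msg) / sizeof (TCHAR), NULL);
  if (ret >= 2) {
    msg [ret - 2] = 0;   // delete the trailing "\r\n"
  } else {
    msg [0] = 0;         // FormatMessage failed
  }
  debugMessage (DBG_ERROR, L"  LastError = %x, '%s'", err, msg);
}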
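A `va_list` cannot be forwarded to a function that takes `...`, and `va_start` must name the last named parameter (`format`, not `err`). The usual C pattern is a "v" variant that takes a `va_list`, with thin variadic wrappers around it. The sketch below shows that pattern with the portable `vsnprintf`. `vformatLine`, `formatLine` and `formatError` are hypothetical names, not functions of the IME.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* The v-variant does the actual work on an already started va_list. */
int vformatLine (char *out, size_t size, const char *format, va_list args)
{
  return vsnprintf (out, size, format, args);
}

/* A variadic wrapper captures its own arguments and hands them on. */
int formatLine (char *out, size_t size, const char *format, ...)
{
  int n;
  va_list args;
  assert (out != NULL && size > 0);
  va_start (args, format);
  n = vformatLine (out, size, format, args);
  va_end (args);
  return n;
}

/* In the style of debugLastError: `err` comes first but is not one of the
   format arguments, so va_start names `format`, the last named parameter. */
int formatError (char *out, size_t size, unsigned long err, const char *format, ...)
{
  int n, m;
  va_list args;
  assert (out != NULL && size > 0);
  va_start (args, format);
  n = vformatLine (out, size, format, args);
  va_end (args);
  if (n < 0 || (size_t) n >= size) {
    return n;                        /* error, or already truncated */
  }
  m = snprintf (out + n, size - (size_t) n, " (error %lu)", err);
  return m < 0 ? m : n + m;
}
```

Applied to the IME, this would mean giving debugMessage a `va_list` twin that both debugMessage and debugLastError call.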

Here are the declarations for those functions:

#define DBG_CORE  0
#define DBG_UI    1
#define DBG_ERROR 2
#define DBG_MISC  3
#define DBG_INPUT 4

PRIVATE void debugMessage (int class, LPCTSTR format, ...);
PRIVATE void debugLastError (DWORD err, LPCTSTR format, ...);

4.5. The IME framework on Windows

The overall organization of a traditional IME under Windows is this:

The code that implements an IME resides in a DLL, which is loaded and unloaded when needed by the Input Method Manager (or IMM). This DLL exports a few procedures that allow the IMM to control the global operation of the IME; for example, the DLL exports ImeInquire, which the IMM calls to discover some characteristics of the IME.

For (almost) every application window, the IMM ensures that there is a corresponding IME UI window and a corresponding INPUTCONTEXT structure. This window has no graphic appearance; it is simply a tool to receive messages. It is best to think of this window, together with the INPUTCONTEXT structure, as an IME instance object; of the various messages sent to that window as method calls; and of the calls to DLL functions that take an INPUTCONTEXT argument as other method calls.

When an IME is active for an application window, it gets the first crack at keystrokes. First, the keystroke is presented to the ImeProcessKey method (really, a DLL entry point that takes a handle to an INPUTCONTEXT), which determines whether the IME wants to handle that keystroke or whether it should be passed directly to the application. If the IME instance decides to handle the keystroke, the keystroke is then given to the ImeToAsciiEx method, which can then deal with it.
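The two-phase routing can be modelled without Windows. In the sketch below, `processKey` stands for ImeProcessKey, which accepts or declines a key, and `toAsciiEx` stands for ImeToAsciiEx, which acts on an accepted key. Declined keys go straight to the application. The `Model` type, the hexadecimal input and the ASCII-only conversion are simplifications for illustration; they are not how this IME or the IMM is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of one input context and its application. */
typedef struct {
  char composition [32];   /* hex digits typed so far, NUL-terminated */
  int length;
  char application [64];   /* what the application received, NUL-terminated */
  int appLength;
} Model;

/* Plays the role of ImeProcessKey: decide whether the IME wants the key. */
static bool processKey (const Model *m, char ch)
{
  if (isxdigit ((unsigned char) ch)) {
    return m->length < (int) sizeof (m->composition) - 1;   /* room left */
  }
  return m->length > 0 && (ch == '\r' || ch == '\b');       /* only while composing */
}

/* Plays the role of ImeToAsciiEx: act on a key the IME accepted. */
static void toAsciiEx (Model *m, char ch)
{
  if (ch == '\b') {
    m->length--;
  } else if (ch == '\r') {
    long usv = strtol (m->composition, NULL, 16);
    if (usv < 0x80) {                   /* this model only delivers ASCII */
      assert (m->appLength < (int) sizeof (m->application) - 1);
      m->application [m->appLength++] = (char) usv;
      m->application [m->appLength] = '\0';
    }
    m->length = 0;
  } else {
    m->composition [m->length++] = ch;
  }
  m->composition [m->length] = '\0';
}

/* The IMM side: keys the IME declines go straight to the application. */
void routeKey (Model *m, char ch)
{
  if (processKey (m, ch)) {
    toAsciiEx (m, ch);
  } else {
    assert (m->appLength < (int) sizeof (m->application) - 1);
    m->application [m->appLength++] = ch;
    m->application [m->appLength] = '\0';
  }
}
```

With this model, typing 4, 1 and Enter delivers A to the application. Typing x and then a second Enter, with nothing being composed, passes both keys through unchanged.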

As the IME elaborates keystrokes into a string to be given as input to the application, it manipulates a COMPOSITIONSTRING structure that is part of the INPUTCONTEXT. This structure seems to be targeted at Far East IMEs, and there is no clear definition of what the various fields are. For our purposes, we consider the COMPOSITIONSTRING to be two strings: the raw keystrokes, and the result string, which is "pasted" into the application.

As the IME modifies the COMPOSITIONSTRING, it generates some messages: WM_IME_STARTCOMPOSITION, WM_IME_COMPOSITION and WM_IME_ENDCOMPOSITION. The first and last just delimit the refinement operation, while the middle one is sent every time the COMPOSITIONSTRING is modified.
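The bracketing of the composition messages can be modelled as a small state machine. The sketch below records, for each change of the composition, the messages an IME would generate. `Session`, `compositionChanged` and the `MSG_*` names are hypothetical stand-ins for WM_IME_STARTCOMPOSITION, WM_IME_COMPOSITION and WM_IME_ENDCOMPOSITION. Ending the composition when the string becomes empty is an assumption of the model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for WM_IME_STARTCOMPOSITION, WM_IME_COMPOSITION
   and WM_IME_ENDCOMPOSITION. */
typedef enum { MSG_START, MSG_COMPOSITION, MSG_END } CompMsg;

typedef struct {
  bool composing;
  CompMsg log [16];
  int count;
} Session;

static void emit (Session *s, CompMsg msg)
{
  assert (s->count < 16);
  s->log [s->count++] = msg;
}

/* Called after each keystroke the IME handled. newLength is the length of the
   composition string afterwards; finished is true when a result was produced. */
void compositionChanged (Session *s, int newLength, bool finished)
{
  if (! s->composing) {                 /* first change: open the bracket */
    emit (s, MSG_START);
    s->composing = true;
  }
  emit (s, MSG_COMPOSITION);            /* every change is reported */
  if (finished || newLength == 0) {     /* assumption: an empty string also ends it */
    emit (s, MSG_END);
    s->composing = false;
  }
}
```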

Clearly, the user needs some feedback on this elaboration process, for example to see the keystrokes he types and the input into which they are transformed by the IME. This is where one needs to distinguish various classes of applications:

The SDK contains a sample fully aware application and a sample half aware application.

If the application decides to take control of the composition UI, it just handles the WM_IME_*COMPOSITION messages. Otherwise, those messages are routed to the IME UI window, which can then decide how to implement the composition UI.

4.6. The IME static members

In this section, we deal with the IME functions that do not depend on the input context; we can think of them as the static members of the IME object.

4.6.1. DllEntry

Since an IME is a DLL, it must have an entry point, in the usual Windows style.

BOOL CALLBACK DllEntry (HINSTANCE inst, DWORD reason, LPVOID reserved)
{
  switch (reason) {
    case DLL_PROCESS_ATTACH: {
      debugMessage (DBG_MISC, L"DLL_PROCESS_ATTACH");
      code to execute at DLL attach time: 1, 2, 3, 4, 5, 6
      break;
    }
    case DLL_PROCESS_DETACH: {
      debugMessage (DBG_MISC, L"DLL_PROCESS_DETACH");
      code to execute at DLL detach time: 1, 2, 3, 4
      break;
    }
  }
  return TRUE;
}

BOOL CALLBACK DllEntry (HINSTANCE inst, DWORD reason, LPVOID reserved);

4.6.2. Static memory

Besides our debugging support, the only global we need to keep around is the DLL instance, to create IME windows.

PRIVATE HINSTANCE dllInstance;

It is set when the IME DLL is attached:

dllInstance = inst;

4.6.3. ImeInquire

After the IME DLL is attached, the IMM calls the ImeInquire method, to get some sense of what the IME is capable of and how to interact with it.

// II/41
PUBLIC BOOL WINAPI ImeInquire (LPIMEINFO imeInfo, LPTSTR wndClass, DWORD flags)
{
  debugMessage (DBG_MISC, L"ImeInquire");
  return TRUE;
}

PUBLIC BOOL WINAPI ImeInquire (LPIMEINFO imeInfo, LPTSTR wndClass, DWORD flags);

The IMEINFO structure is described at I/37.

An INPUTCONTEXT includes a piece of memory that is private to the IME. This is useful for the IME to keep some data about the input context around. What we need to tell the IMM right now is the size of that private memory, which we will represent as an INPUTCONTEXT_PRIVATE structure.

The meaning of the conversion modes and sentence capabilities is unclear.

// II/37
imeInfo->dwPrivateDataSize = sizeof (INPUTCONTEXT_PRIVATE);
imeInfo->fdwProperty = IME_PROP_KBD_CHAR_FIRST
                     | IME_PROP_UNICODE
                  // | IME_PROP_IGNORE_UPKEYS
                     | IME_PROP_NEED_ALTKEY;
imeInfo->fdwConversionCaps = IME_CMODE_CHARCODE | IME_CMODE_NATIVE
                           | IME_CMODE_FULLSHAPE | IME_CMODE_NOCONVERSION;
imeInfo->fdwSentenceCaps = 0;
imeInfo->fdwUICaps = UI_CAP_ROTANY | UI_CAP_SOFTKBD;
imeInfo->fdwSCSCaps = 0;
imeInfo->fdwSelectCaps = (DWORD)0;

The wndClass string receives the name of the UI window class. The IMM needs to know this, since it creates that window as needed:

lstrcpy (wndClass, uiClassName);

4.6.4. ImeConfigure

// II/
PUBLIC BOOL WINAPI ImeConfigure (HKL hkl, HWND window, DWORD mode, LPVOID data)
{
  debugMessage (DBG_MISC, L"ImeConfigure");
  switch (mode) {
    case IME_CONFIG_GENERAL: {
      MessageBox (window, L"Configuration", L"uniime conf",
                  MB_OK | MB_ICONHAND | MB_TASKMODAL | MB_TOPMOST);
      return TRUE;
    }
  }
  return FALSE;
}

PUBLIC BOOL WINAPI ImeConfigure (HKL hkl, HWND window, DWORD mode, LPVOID data);

4.6.5. Others

Apparently, the IMM checks that all the IME static members are present, i.e. exported by the DLL. Here are the ones we do not implement:

// II/58
BOOL WINAPI ImeRegisterWord (LPCTSTR reading, DWORD style, LPCTSTR string)
{
  debugMessage (DBG_MISC, L"ImeRegisterWord");
  return FALSE;
}

// II/59
BOOL WINAPI ImeUnregisterWord (LPCTSTR reading, DWORD style, LPCTSTR string)
{
  debugMessage (DBG_MISC, L"ImeUnregisterWord");
  return FALSE;
}

// II/59
UINT WINAPI ImeGetRegisterWordStyle (UINT nItem, LPSTYLEBUF styleBuf)
{
  debugMessage (DBG_MISC, L"ImeGetRegisterWordStyle");
  return 0;
}

// II/60
UINT WINAPI ImeEnumRegisterWord (REGISTERWORDENUMPROC enumProc, LPCTSTR reading,
                                 DWORD style, LPCTSTR string, LPVOID data)
{
  debugMessage (DBG_MISC, L"ImeEnumRegisterWord");
  return 0;
}

BOOL WINAPI ImeRegisterWord (LPCTSTR reading, DWORD style, LPCTSTR string);
BOOL WINAPI ImeUnregisterWord (LPCTSTR reading, DWORD style, LPCTSTR string);
UINT WINAPI ImeGetRegisterWordStyle (UINT nItem, LPSTYLEBUF styleBuf);
UINT WINAPI ImeEnumRegisterWord (REGISTERWORDENUMPROC enumProc, LPCTSTR reading, DWORD style, LPCTSTR string, LPVOID data);

4.7. The IME instance members

Remember that we described an IME instance as made of a UI window and the set of DLL procedures that operate on an INPUTCONTEXT. In this section, we deal with those things.

4.7.1. The UI_PRIVATE structure

In addition, the instance also has a UI_PRIVATE structure (accessed via the IMMGWLP_PRIVATE key) attached to the UI window, which lets us store the composition window and the candidate window:

typedef struct {
  HWND compositionWindow;
  HWND candidateWindow;
} UI_PRIVATE, *LPUI_PRIVATE;

4.7.2. The INPUTCONTEXT_PRIVATE structure

This is a small extension to the INPUTCONTEXT structure, where we can store the state of our IME instance:

typedef struct {
  ImeState state;
  HWND softKbdWindow;
} INPUTCONTEXT_PRIVATE, *LPINPUTCONTEXT_PRIVATE;

4.7.3. The COMPOSITIONSTRING and CANDIDATEINFO structures

While composing the input to the application, the user gets feedback via a COMPOSITIONSTRING structure and a CANDIDATEINFO structure; more precisely, the input method is in charge of filling those structures.

Here is a method to fill those structures from the internal state of the IME. It is rather mechanical and long: the highlights are:

PRIVATE void updateCompositionAndResultString (LPINPUTCONTEXT imc, LPINPUTCONTEXT_PRIVATE imcP)
{
  LPCOMPOSITIONSTRING cs;
  int nbChars = imcP->state.prefixLength + imcP->state.compositionLength + imcP->state.completionLength;

  {
    int size = sizeof (COMPOSITIONSTRING)
             + (nbChars) * sizeof (TCHAR)                  // comp string
             + 2 * sizeof (DWORD)                          // comp clause
             + (nbChars)                                   // comp attr
             + imcP->state.resultLength * sizeof (TCHAR);  // result string

    if (imc->hCompStr == 0) {
      imc->hCompStr = ImmCreateIMCC (size);
    } else {
      HIMCC x = ImmReSizeIMCC (imc->hCompStr, size);
      if (x) {
        imc->hCompStr = x;
      } else {
        ImmDestroyIMCC (imc->hCompStr);
        imc->hCompStr = ImmCreateIMCC (size);
      }
    }

    cs = (LPCOMPOSITIONSTRING) ImmLockIMCC (imc->hCompStr);

    cs->dwCursorPos = imcP->state.prefixLength + imcP->state.compositionLength;
    cs->dwDeltaStart = 0;

    // lengths: strings in TCHARs, attributes and clauses in bytes
    cs->dwCompReadAttrLen = 0;
    cs->dwCompReadClauseLen = 0;
    cs->dwCompReadStrLen = 0;
    cs->dwCompStrLen = nbChars;
    cs->dwCompAttrLen = nbChars;
    cs->dwCompClauseLen = 2 * sizeof (DWORD);
    cs->dwResultReadClauseLen = 0;
    cs->dwResultReadStrLen = 0;
    cs->dwResultClauseLen = 0;
    cs->dwResultStrLen = imcP->state.resultLength;

    // offsets: each field immediately follows the previous one
    cs->dwCompReadAttrOffset = sizeof (COMPOSITIONSTRING);
    cs->dwCompReadClauseOffset = cs->dwCompReadAttrOffset + cs->dwCompReadAttrLen;
    cs->dwCompReadStrOffset = cs->dwCompReadClauseOffset + cs->dwCompReadClauseLen;
    cs->dwCompAttrOffset = cs->dwCompReadStrOffset + cs->dwCompReadStrLen * sizeof (TCHAR);
    cs->dwCompClauseOffset = cs->dwCompAttrOffset + cs->dwCompAttrLen;
    cs->dwCompStrOffset = cs->dwCompClauseOffset + cs->dwCompClauseLen;
    cs->dwResultReadClauseOffset = cs->dwCompStrOffset + cs->dwCompStrLen * sizeof (TCHAR);
    cs->dwResultReadStrOffset = cs->dwResultReadClauseOffset + cs->dwResultReadClauseLen;
    cs->dwResultClauseOffset = cs->dwResultReadStrOffset + cs->dwResultReadStrLen * sizeof (TCHAR);
    cs->dwResultStrOffset = cs->dwResultClauseOffset + cs->dwResultClauseLen;
    cs->dwSize = cs->dwResultStrOffset + cs->dwResultStrLen * sizeof (TCHAR);

    // composition string = prefix + composition + completion
    memcpy (((LPBYTE)cs) + cs->dwCompStrOffset,
            imcP->state.prefixString,
            imcP->state.prefixLength * sizeof (TCHAR));
    memcpy (((LPBYTE)cs) + cs->dwCompStrOffset + imcP->state.prefixLength * sizeof (TCHAR),
            imcP->state.compositionString,
            imcP->state.compositionLength * sizeof (TCHAR));
    memcpy (((LPBYTE)cs) + cs->dwCompStrOffset
              + (imcP->state.prefixLength + imcP->state.compositionLength) * sizeof (TCHAR),
            imcP->state.completionString,
            imcP->state.completionLength * sizeof (TCHAR));
    memcpy (((LPBYTE)cs) + cs->dwResultStrOffset,
            imcP->state.resultString,
            imcP->state.resultLength * sizeof (TCHAR));

    // a single clause covering the whole composition string
    {
      DWORD *clause = (DWORD*)(((LPBYTE)cs) + cs->dwCompClauseOffset);
      clause [0] = 0;
      clause [1] = cs->dwCompStrLen;
    }

    // every character gets the same attribute
    {
      BYTE *attr = (BYTE*) (((LPBYTE)cs) + cs->dwCompAttrOffset);
      DWORD i;
      for (i = 0; i < cs->dwCompStrLen; i++) {
        attr [i] = imcP->state.canStopHere ? ATTR_TARGET_CONVERTED : ATTR_INPUT;
      }
    }

    ImmUnlockIMCC (imc->hCompStr);
  }

  {
    int sizel = sizeof (CANDIDATELIST)
              + imcP->state.candidatesCount * sizeof (DWORD)
              + imcP->state.candidatesLength * sizeof (TCHAR);
    int size = sizeof (CANDIDATEINFO) + sizel;
    CANDIDATEINFO *ci;
    CANDIDATELIST *cl;

    if (imc->hCandInfo == 0) {
      imc->hCandInfo = ImmCreateIMCC (size);
    } else {
      HIMCC x = ImmReSizeIMCC (imc->hCandInfo, size);
      if (x) {
        imc->hCandInfo = x;
      } else {
        ImmDestroyIMCC (imc->hCandInfo);
        imc->hCandInfo = ImmCreateIMCC (size);
      }
    }

    ci = (LPCANDIDATEINFO) ImmLockIMCC (imc->hCandInfo);
    ci->dwSize = size;
    ci->dwCount = 1;
    ci->dwOffset [0] = sizeof (CANDIDATEINFO);
    ci->dwPrivateSize = 0;

    // the single candidate list immediately follows the CANDIDATEINFO
    cl = (LPCANDIDATELIST) (ci + 1);
    cl->dwSize = sizel;
    cl->dwStyle = IME_CAND_UNKNOWN;
    cl->dwCount = imcP->state.candidatesCount;
    cl->dwSelection = 0;
    cl->dwPageStart = 0;
    cl->dwPageSize = imcP->state.candidatesCount;

    // offsets (relative to the list), then the 0-terminated candidates
    {
      int j = 0;
      int i;
      DWORD *offset = &(cl->dwOffset [0]);
      TCHAR *data = (TCHAR *) (((BYTE *)(cl + 1)) + imcP->state.candidatesCount * sizeof (DWORD));
      for (i = 0; i < imcP->state.candidatesCount; i++) {
        offset [i] = ((BYTE *) data) - ((BYTE *) cl);
        while (imcP->state.candidates [j] != 0) {
          *(data ++) = imcP->state.candidates [j++];
        }
        *(data ++) = imcP->state.candidates [j++];   // copy the terminating 0
      }
    }

    ImmUnlockIMCC (imc->hCandInfo);
  }
}
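The candidate part of this function packs the candidates into a list. In state.candidates they are stored as 0-terminated strings placed back to back. In the list, a header is followed by one offset per candidate and then by the strings themselves, and each offset is measured from the start of the list. The sketch below reproduces that packing with a simplified, hypothetical `CandList` type, where a flexible array member replaces the real CANDIDATELIST, so that the offsets can be checked without Windows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for CANDIDATELIST. */
typedef struct {
  uint32_t size;        /* total size of the block, in bytes */
  uint32_t count;       /* number of candidates */
  uint32_t offset [];   /* byte offset of each candidate from the block start */
} CandList;

/* `packed` holds `count` NUL-terminated UTF-16 strings back to back,
   `units` code units in all (terminators included). */
CandList *packCandidates (const uint16_t *packed, uint32_t count, uint32_t units)
{
  size_t header = sizeof (CandList) + count * sizeof (uint32_t);
  size_t size = header + units * sizeof (uint16_t);
  CandList *cl = malloc (size);
  uint16_t *data;
  uint32_t i, j = 0;

  if (cl == NULL) {
    return NULL;
  }
  cl->size = (uint32_t) size;
  cl->count = count;
  data = (uint16_t *) ((uint8_t *) cl + header);   /* strings follow the offsets */
  for (i = 0; i < count; i++) {
    cl->offset [i] = (uint32_t) ((uint8_t *) data - (uint8_t *) cl);
    do {                                             /* copy up to and including the NUL */
      assert (j < units);
      *data++ = packed [j];
    } while (packed [j++] != 0);
  }
  return cl;
}
```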

Here is the declaration.

PRIVATE void updateCompositionAndResultString (LPINPUTCONTEXT imc, LPINPUTCONTEXT_PRIVATE imcP);
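The COMPOSITIONSTRING bookkeeping in updateCompositionAndResultString follows one rule: each field's offset is the previous field's offset plus the previous field's size in bytes. Beware that the lengths do not all use the same unit. The string lengths count TCHARs, while dwCompAttrLen and dwCompClauseLen are used as byte counts. The sketch below applies the rule to a reduced, hypothetical set of four fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reduced model: each field is a (length, offset) pair. */
typedef struct { uint32_t len; uint32_t offset; } Field;

enum { COMP_ATTR, COMP_CLAUSE, COMP_STR, RESULT_STR, NFIELDS };

/* Bytes per unit of `len`: as in updateCompositionAndResultString, the
   attribute and clause lengths are already byte counts, while the string
   lengths count UTF-16 code units. */
static const uint32_t bytesPerUnit [NFIELDS] = { 1, 1, 2, 2 };

/* Places the fields one after the other behind a header of headerSize bytes
   and returns the total size, the equivalent of dwSize. */
uint32_t layoutFields (Field f [NFIELDS], uint32_t headerSize)
{
  uint32_t next = headerSize;
  int i;
  assert (f != 0);
  for (i = 0; i < NFIELDS; i++) {
    f [i].offset = next;
    next += f [i].len * bytesPerUnit [i];
  }
  return next;
}
```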

4.7.4. The UI window

Here is the name of our UI window class:

PRIVATE WCHAR uiClassName[] = L"UniIMEui";

This class is registered when the DLL is attached. The only odd thing about this class is that it has the style CS_IME, and that the per-instance private data is 2 LONGs; these aspects are mandated by the IMM.

{
  WNDCLASSEX wc;
  wc.cbSize = sizeof (WNDCLASSEX);
  wc.style = CS_IME;
  wc.lpfnWndProc = uiWindowProc;
  wc.cbClsExtra = 0;
  wc.cbWndExtra = 2 * sizeof (LONG);
  wc.hInstance = dllInstance;
  wc.hCursor = LoadCursor (NULL, IDC_ARROW);
  wc.hIcon = LoadIcon (dllInstance, MAKEINTRESOURCE (ICON_RESOURCE));
  wc.hIconSm = (HICON) LoadImage (dllInstance, MAKEINTRESOURCE (ICON_RESOURCE),
                                  IMAGE_ICON, 16, 16, LR_DEFAULTCOLOR);
  wc.hbrBackground = 0;
  wc.lpszMenuName = 0;
  wc.lpszClassName = uiClassName;
  if (! RegisterClassEx ((LPWNDCLASSEX) &wc)) {
    DWORD err = GetLastError ();
    if (err != ERROR_CLASS_ALREADY_EXISTS) {
      debugLastError (err, L"registerWindowClasses, uiClass");
      return FALSE;
    }
  }
}

This class is unregistered at DLL detach time:

{ UnregisterClass (uiClassName, dllInstance); }

There is of course a window proc. Here is the shell of it:

LRESULT CALLBACK uiWindowProc (HWND uiWindow, UINT message, WPARAM wparam, LPARAM lparam)
{
  switch (message) {
    UI window proc message handlers: 1, 2, 3
    default: {
      debugMessage (DBG_UI, L"uiWindowProc %x", message);
      return DefWindowProc (uiWindow, message, wparam, lparam);
    }
  }
  return 0;
}

LRESULT CALLBACK uiWindowProc (HWND uiWindow, UINT message, WPARAM wparam, LPARAM lparam);

We have a few obvious tasks at window creation time, followed by the handlers for the composition and notification messages:

case WM_CREATE: {
  debugMessage (DBG_UI, L"uiWindowProc, WM_CREATE");
  // set the default position for the UI window; it is hidden for now
  SetWindowPos (uiWindow, NULL, 0, 0, 0, 0, SWP_NOACTIVATE|SWP_NOZORDER);
  ShowWindow (uiWindow, SW_SHOWNOACTIVATE);
  SetWindowLongPtr (uiWindow, IMMGWLP_PRIVATE, (LONG_PTR) GlobalAlloc (GHND, sizeof (UI_PRIVATE)));
  createCompositionWindow (uiWindow);
  createCandidateWindow (uiWindow);
  break;
}

case WM_IME_STARTCOMPOSITION: {
  debugMessage (DBG_UI, L"uiWindowProc, WM_IME_STARTCOMPOSITION");
  ShowWindow (getCompositionWindow (uiWindow), SW_SHOWNOACTIVATE);
  break;
}

case WM_IME_COMPOSITION: {
  debugMessage (DBG_UI, L"uiWindowProc, WM_IME_COMPOSITION");
  sizeCompositionWindow (uiWindow);
  positionCompositionWindow (uiWindow);
  RedrawWindow (getCompositionWindow (uiWindow), 0, 0, RDW_INVALIDATE);
  break;
}

case WM_IME_ENDCOMPOSITION: {
  debugMessage (DBG_UI, L"uiWindowProc, WM_IME_ENDCOMPOSITION");
  ShowWindow (getCompositionWindow (uiWindow), SW_HIDE);
  break;
}

// II/29
case WM_IME_CONTROL: {
  debugMessage (DBG_UI, L"uiWindowProc: WM_IME_CONTROL wparam=%x", wparam);
  break;
}

case WM_IME_NOTIFY: {
  switch (wparam) {
    case IMN_OPENSTATUSWINDOW: {
      HIMC imcHandle;
      debugMessage (DBG_UI, L"uiWindowProc: IMN_OPENSTATUSWINDOW");
      imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
      if (imcHandle != 0) {
        LPINPUTCONTEXT imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
        LPINPUTCONTEXT_PRIVATE imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
        if (imcP->softKbdWindow == 0) {
          imcP->softKbdWindow = createSoftKbdWindow (uiWindow);
          debugMessage (DBG_UI, L"uiWindowProc: soft keyboard window=%p", imcP->softKbdWindow);
        }
        // unlock in all cases, not only when the soft keyboard was just created
        ImmUnlockIMCC (imc->hPrivate);
        ImmUnlockIMC (imcHandle);
      }
      break;
    }
    case IMN_CLOSESTATUSWINDOW: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_CLOSESTATUSWINDOW");
      break;
    }
    case IMN_OPENCANDIDATE: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_OPENCANDIDATE");
      ShowWindow (getCandidateWindow (uiWindow), SW_SHOWNOACTIVATE);
      sizeCandidateWindow (uiWindow);
      RedrawWindow (getCandidateWindow (uiWindow), 0, 0, RDW_INVALIDATE);
      break;
    }
    case IMN_CLOSECANDIDATE: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_CLOSECANDIDATE");
      ShowWindow (getCandidateWindow (uiWindow), SW_HIDE);
      break;
    }
    case IMN_SETCANDIDATEPOS: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETCANDIDATEPOS %x", lparam);
      positionCandidateWindow (uiWindow);
      break;
    }
    case IMN_SETCONVERSIONMODE: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETCONVERSIONMODE");
      break;
    }
    case IMN_SETSENTENCEMODE: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETSENTENCEMODE");
      break;
    }
    case IMN_SETOPENSTATUS: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETOPENSTATUS");
      break;
    }
    case IMN_SETCOMPOSITIONFONT: {
      LOGFONT logfont;
      HIMC imcHandle;
      imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
      ImmGetCompositionFont (imcHandle, &logfont);
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETCOMPOSITIONFONT %s, %d",
                    logfont.lfFaceName, logfont.lfHeight);
      break;
    }
    case IMN_SETCOMPOSITIONWINDOW: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETCOMPOSITIONWINDOW");
      positionCompositionWindow (uiWindow);
      break;
    }
    case IMN_SETSTATUSWINDOWPOS: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_SETSTATUSWINDOWPOS");
      break;
    }
    case IMN_GUIDELINE: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_GUIDELINE");
      break;
    }
    case IMN_PRIVATE: {
      debugMessage (DBG_UI, L"uiWindowProc: IMN_PRIVATE");
      break;
    }
    default: {
      debugMessage (DBG_UI, L"uiWindowProc: WM_IME_NOTIFY (0x%x)", wparam);
      break;
    }
  }
  break;
}

4.7.5. The Composition Window

The composition window displays the composition string, that is, the keystrokes accumulated so far while building the input for the application. An application may choose to display the composition string itself, in which case the composition window provided by the IME is not used.

Let's register the class for this window when the IME DLL is attached.

{
  WNDCLASSEX wc;
  wc.cbSize = sizeof (WNDCLASSEX);
  wc.style = CS_IME | CS_HREDRAW | CS_VREDRAW;
  wc.lpfnWndProc = compositionWindowProc;
  wc.cbClsExtra = 0;
  wc.cbWndExtra = 0;
  wc.hInstance = dllInstance;
  wc.hCursor = LoadCursor (NULL, IDC_ARROW);
  wc.hIcon = NULL;
  wc.hIconSm = NULL;
  wc.hbrBackground = (HBRUSH) GetStockObject (LTGRAY_BRUSH);
  wc.lpszMenuName = 0;
  wc.lpszClassName = compClassName;
  if (! RegisterClassEx (&wc)) {
    DWORD err = GetLastError ();
    if (err != ERROR_CLASS_ALREADY_EXISTS) {
      debugLastError (err, L"registerWindowClasses, compClass");
      return FALSE;
    }
  }
}

Of course, we unregister it at DLL detach time:

{ UnregisterClass (compClassName, dllInstance); }

We have a bunch of functions to create, size, position, and paint the composition window:

PRIVATE void createCompositionWindow (HWND uiWindow);
PRIVATE HWND getCompositionWindow (HWND uiWindow);
PRIVATE void sizeCompositionWindow (HWND uiWindow);
PRIVATE void positionCompositionWindow (HWND uiWindow);
PRIVATE void paintCompositionWindow (HWND compWindow, HDC hDC);
PRIVATE LRESULT CALLBACK compositionWindowProc (HWND window, UINT message, WPARAM wparam, LPARAM lparam);

Creating the composition window is rather straightforward:

PRIVATE void createCompositionWindow (HWND uiWindow)
{
  HGLOBAL privateHandle;
  LPUI_PRIVATE private;

  privateHandle = (HGLOBAL) GetWindowLongPtr (uiWindow, IMMGWLP_PRIVATE);
  if (privateHandle == 0) { return; }
  private = (LPUI_PRIVATE) GlobalLock (privateHandle);
  if (private == 0) { return; }
  if (private->compositionWindow == 0) {
    private->compositionWindow =
      CreateWindowEx (0 /*WS_EX_WINDOWEDGE | WS_EX_DLGMODALFRAME*/,
                      compClassName, L"composition window",
                      WS_POPUP | WS_DISABLED,   // style
                      0, 0, 100, 20,            // position, size
                      uiWindow,                 // owner
                      0,                        // no menu
                      dllInstance,              // module instance (the IME DLL)
                      0);
  }
  GlobalUnlock (privateHandle);
}

Here is a utility function to retrieve the composition window from the UI window.

PRIVATE HWND getCompositionWindow (HWND uiWindow)
{
  HGLOBAL privateHandle;
  LPUI_PRIVATE private;
  HWND compositionWindow;

  privateHandle = (HGLOBAL) GetWindowLongPtr (uiWindow, IMMGWLP_PRIVATE);
  if (privateHandle == 0) { return 0; }
  private = (LPUI_PRIVATE) GlobalLock (privateHandle);
  if (private == 0) { return 0; }
  compositionWindow = private->compositionWindow;
  GlobalUnlock (privateHandle);
  return (compositionWindow);
}

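The composition window is made just large enough to hold the composition string, the cursor, and a small margin. As a side effect, this function also records where the candidate window should go: just to the right of the composition window.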

PRIVATE void sizeCompositionWindow (HWND uiWindow)
{
  HWND compositionWindow;
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  LPCOMPOSITIONSTRING cs;
  int width = 5, height = 5;

  compositionWindow = getCompositionWindow (uiWindow);
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }
  cs = (LPCOMPOSITIONSTRING) ImmLockIMCC (imc->hCompStr);
  if (! cs) { ImmUnlockIMC (imcHandle); return; }

  // figure out the size of the composition window
  {
    SIZE size;
    HDC hDC = GetDC (compositionWindow);
    TCHAR *str = (TCHAR *) (((LPBYTE) cs) + cs->dwCompStrOffset);
    int len = cs->dwCompStrLen;

    // text before the cursor
    GetTextExtentPoint32 (hDC, str, cs->dwCursorPos, &size);
    str += cs->dwCursorPos;
    len -= cs->dwCursorPos;
    width += size.cx;
    if (size.cy > height) { height = size.cy; }

    // cursor
    width += 1;

    // text after the cursor
    if (len > 0) {
      GetTextExtentPoint32 (hDC, str, len, &size);
      width += size.cx + 1;
      if (size.cy > height) { height = size.cy; }
    }

    // margins
    width += 2 * 5;
    height += 2 * 5;
    ReleaseDC (compositionWindow, hDC);
  }

  // position the candidate window just to the right of the composition window
  {
    POINT pt;
    RECT currentRect;
    GetWindowRect (compositionWindow, &currentRect);
    pt.x = currentRect.left + width + 4;
    pt.y = currentRect.top;
    ScreenToClient (imc->hWnd, &pt);
    imc->cfCandForm [0].ptCurrentPos = pt;
  }

  ImmUnlockIMCC (imc->hCompStr);
  ImmUnlockIMC (imcHandle);
  debugMessage (DBG_UI, L"resizing composition window to %d, %d", width, height);
  SetWindowPos (compositionWindow, NULL, 0, 0, width, height,
                SWP_NOACTIVATE | SWP_NOMOVE | SWP_NOZORDER);
  positionCandidateWindow (uiWindow);
}
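
The sizing arithmetic is easy to get subtly wrong, so here is a minimal portable sketch of it. GetTextExtentPoint32 is replaced by a hypothetical fixed-pitch measurement (GLYPH_WIDTH and GLYPH_HEIGHT are illustrative assumptions, not values used by the IME); the rest follows the function above: an initial 5 pixels, the text before the cursor, one pixel for the cursor, the text after it plus one pixel, and 5 pixels of margin on each side.

```c
#include <stddef.h>

/* Hypothetical fixed-pitch metrics standing in for GetTextExtentPoint32. */
#define GLYPH_WIDTH  8
#define GLYPH_HEIGHT 16
#define MARGIN       5

typedef struct { int cx; int cy; } Extent;

/* Extent of n characters under the fixed-pitch metrics above. */
static Extent measure (size_t n)
{
  Extent e;
  e.cx = (int) n * GLYPH_WIDTH;
  e.cy = GLYPH_HEIGHT;
  return e;
}

/* Same arithmetic as sizeCompositionWindow, for a composition string of
   len characters with the cursor at position cursor (0 <= cursor <= len). */
static Extent compositionWindowSize (size_t len, size_t cursor)
{
  Extent result = { MARGIN, MARGIN };   /* starting width and minimum height */
  Extent before = measure (cursor);

  result.cx += before.cx;               /* text before the cursor */
  if (before.cy > result.cy) { result.cy = before.cy; }
  result.cx += 1;                       /* the cursor itself */

  if (len > cursor) {                   /* text after the cursor, if any */
    Extent after = measure (len - cursor);
    result.cx += after.cx + 1;
    if (after.cy > result.cy) { result.cy = after.cy; }
  }

  result.cx += 2 * MARGIN;              /* margins on both sides */
  result.cy += 2 * MARGIN;
  return result;
}
```

Under these metrics, a three-character string with the cursor after its first character gives 5 + 8 + 1 + (16 + 1) + 10 = 41 pixels across and 16 + 10 = 26 pixels down.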

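In our IME, we place the composition window slightly below the current point, 50 pixels below the position given by the application, unless the application forces an exact position with CFS_FORCE_POSITION. To keep the window from dancing around as the user types (especially in U++ mode), we move it only when it is too far from its ideal position: when its vertical position differs, when a forced position is not matched exactly, or when it is more than 100 pixels off horizontally.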

PRIVATE void positionCompositionWindow (HWND uiWindow)
{
  HWND compositionWindow;
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  POINT newPosition;
  BOOL rightThere;
  RECT currentRect;

  compositionWindow = getCompositionWindow (uiWindow);
  if (compositionWindow == 0) { return; }
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }

  // the application may force the position; otherwise go 50 pixels below
  // the current point
  newPosition = imc->cfCompForm.ptCurrentPos;
  rightThere = (imc->cfCompForm.dwStyle & CFS_FORCE_POSITION) != 0;
  if (! rightThere) { newPosition.y += 50; }
  ClientToScreen (imc->hWnd, &newPosition);

  // move only if we are too far from the ideal position
  GetWindowRect (compositionWindow, &currentRect);
  if (   currentRect.top != newPosition.y
      || (rightThere && currentRect.left != newPosition.x)
      || abs (currentRect.left - newPosition.x) > 100) {
    debugMessage (DBG_UI, L" --- moving comp to %d, %d", newPosition.x, newPosition.y);
    SetWindowPos (compositionWindow, 0, newPosition.x, newPosition.y, 0, 0,
                  SWP_NOACTIVATE | SWP_NOSIZE | SWP_NOZORDER);
  }
  else {
    debugMessage (DBG_UI, L" --- comp stays at %d, %d", currentRect.left, currentRect.top);
  }

  ImmUnlockIMC (imcHandle);
}
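
The decision to move is a small hysteresis rule, which can be isolated from the Win32 calls and tested on its own. The following is a minimal sketch of that rule; shouldMoveWindow and MAX_HORIZONTAL_DRIFT are hypothetical names, and the forced flag plays the role of CFS_FORCE_POSITION.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Horizontal drift tolerated before the window is moved (the same
   100 pixels as in positionCompositionWindow). */
#define MAX_HORIZONTAL_DRIFT 100

/* Should a window whose top-left corner is at (curLeft, curTop) move to
   (newX, newY)?  'forced' means the application asked for that exact spot. */
static bool shouldMoveWindow (int curLeft, int curTop, int newX, int newY, bool forced)
{
  if (curTop != newY) {
    return true;                       /* any vertical mismatch: move */
  }
  if (forced && curLeft != newX) {
    return true;                       /* exact placement was requested */
  }
  return abs (curLeft - newX) > MAX_HORIZONTAL_DRIFT;
}
```

With this rule, the caret can run up to 100 pixels ahead of the window along a line before the window jumps to catch up, instead of following every keystroke.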

Painting is rather straightforward:

PRIVATE void paintCompositionWindow (HWND compWindow, HDC hDC)
{
  HWND uiWindow;
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  LPCOMPOSITIONSTRING cs;
  RECT rect;
  COLORREF textColor = RGB (0x00, 0x00, 0x00);

  debugMessage (DBG_UI, L"painting comp");
  uiWindow = GetWindow (compWindow, GW_OWNER);
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }
  cs = (LPCOMPOSITIONSTRING) ImmLockIMCC (imc->hCompStr);
  if (! cs) { ImmUnlockIMC (imcHandle); return; }

  // white background with a black frame
  GetClientRect (compWindow, &rect);
  FillRect (hDC, &rect, (HBRUSH) GetStockObject (WHITE_BRUSH));
  SelectObject (hDC, GetStockObject (WHITE_BRUSH));
  SelectObject (hDC, GetStockObject (BLACK_PEN));
  Rectangle (hDC, rect.left, rect.top, rect.right, rect.bottom);

  // a converted composition string is shown in green; do not read the
  // attributes of an empty composition string
  if (cs->dwCompAttrLen > 0) {
    BYTE *attr = (BYTE *) (((LPBYTE) cs) + cs->dwCompAttrOffset);
    if (attr [0] == ATTR_TARGET_CONVERTED) { textColor = RGB (0x00, 0x88, 0x00); }
  }

  {
    SIZE size;
    int x = 5;
    int y = 5;
    TCHAR *str = (TCHAR *) (((LPBYTE) cs) + cs->dwCompStrOffset);
    int len = cs->dwCompStrLen;

    // text before the cursor
    SetTextColor (hDC, textColor);
    ExtTextOut (hDC, x, y, 0, 0, str, cs->dwCursorPos, 0);
    GetTextExtentPoint32 (hDC, str, cs->dwCursorPos, &size);
    str += cs->dwCursorPos;
    len -= cs->dwCursorPos;
    x += size.cx;

    // cursor, drawn as a line with the black pen selected above
    MoveToEx (hDC, x, y, 0);
    LineTo (hDC, x, y + size.cy);
    x++;

    // text after the cursor
    if (len > 0) { ExtTextOut (hDC, x, y, 0, 0, str, len, 0); }
  }

  ImmUnlockIMCC (imc->hCompStr);
  ImmUnlockIMC (imcHandle);
  debugMessage (DBG_UI, L"done painting comp");
}

Finally, we have the window proc for the composition window:

LRESULT CALLBACK compositionWindowProc (HWND window, UINT message, WPARAM wparam, LPARAM lparam)
{
  switch (message) {
    case WM_CREATE: {
      debugMessage (DBG_UI, L"compositionWindowProc, WM_CREATE");
      return 0L; }
    case WM_DESTROY: {
      debugMessage (DBG_UI, L"compositionWindowProc, WM_DESTROY");
      return 0L; }
    case WM_PAINT: {
      HDC hDC;
      PAINTSTRUCT ps;
      debugMessage (DBG_UI, L"compositionWindowProc, WM_PAINT");
      hDC = BeginPaint (window, &ps);
      paintCompositionWindow (window, hDC);
      EndPaint (window, &ps);
      return 0L; }
    default: {
      debugMessage (DBG_UI, L"compositionWindowProc 0x%x", message); }}
  return DefWindowProc (window, message, wparam, lparam);
}

4.7.6. The Candidate Window

The candidate window displays the candidates whenever the user asks for them. An application may choose to display the candidates itself, in which case the candidate window provided by the IME is not used.

Let's register the class for this window when the IME DLL is attached.

{
  WNDCLASSEX wc;
  wc.cbSize = sizeof (WNDCLASSEX);
  wc.style = CS_IME | CS_HREDRAW | CS_VREDRAW;
  wc.lpfnWndProc = candidateWindowProc;
  wc.cbClsExtra = 0;
  wc.cbWndExtra = 0;
  wc.hInstance = dllInstance;
  wc.hCursor = LoadCursor (NULL, IDC_ARROW);
  wc.hIcon = NULL;
  wc.hIconSm = NULL;
  wc.hbrBackground = (HBRUSH) GetStockObject (LTGRAY_BRUSH);
  wc.lpszMenuName = 0;
  wc.lpszClassName = candClassName;
  if (! RegisterClassEx (&wc)) {
    DWORD err = GetLastError ();
    if (err != ERROR_CLASS_ALREADY_EXISTS) {
      debugLastError (err, L"registerWindowClasses, candClass");
      return FALSE;
    }
  }
}

Of course, we unregister it at DLL detach time:

{ UnregisterClass (candClassName, dllInstance); }

We have a bunch of functions to create, size, position, and paint the candidate window:

PRIVATE void createCandidateWindow (HWND uiWindow);
PRIVATE HWND getCandidateWindow (HWND uiWindow);
PRIVATE void sizeCandidateWindow (HWND uiWindow);
PRIVATE void positionCandidateWindow (HWND uiWindow);
PRIVATE void paintCandidateWindow (HWND candWindow, HDC hDC);
PRIVATE LRESULT CALLBACK candidateWindowProc (HWND window, UINT message, WPARAM wparam, LPARAM lparam);

Creating the candidate window is rather straightforward:

PRIVATE void createCandidateWindow (HWND uiWindow)
{
  HGLOBAL privateHandle;
  LPUI_PRIVATE private;

  privateHandle = (HGLOBAL) GetWindowLongPtr (uiWindow, IMMGWLP_PRIVATE);
  if (privateHandle == 0) { return; }
  private = (LPUI_PRIVATE) GlobalLock (privateHandle);
  if (private == 0) { return; }
  if (private->candidateWindow == 0) {
    private->candidateWindow =
      CreateWindowEx (0 /*WS_EX_WINDOWEDGE | WS_EX_DLGMODALFRAME*/,
                      candClassName, L"candidate window",
                      WS_POPUP | WS_DISABLED,   // style
                      0, 0, 0, 0,               // position, size
                      uiWindow,                 // owner
                      0,                        // no menu
                      dllInstance,              // module instance (the IME DLL)
                      0);
  }
  GlobalUnlock (privateHandle);
}

Here is a utility function to retrieve the candidate window from the UI window.

PRIVATE HWND getCandidateWindow (HWND uiWindow)
{
  HGLOBAL privateHandle;
  LPUI_PRIVATE private;
  HWND candidateWindow;

  privateHandle = (HGLOBAL) GetWindowLongPtr (uiWindow, IMMGWLP_PRIVATE);
  if (privateHandle == 0) { return 0; }
  private = (LPUI_PRIVATE) GlobalLock (privateHandle);
  if (private == 0) { return 0; }
  candidateWindow = private->candidateWindow;
  GlobalUnlock (privateHandle);
  return (candidateWindow);
}

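The candidate window is made just large enough to hold the candidates, one per line: its width is that of the widest candidate, its height the sum of their heights, plus a 5-pixel margin on each side. We assume there is a single candidate list, the first one in the CANDIDATEINFO structure.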

PRIVATE void sizeCandidateWindow (HWND uiWindow)
{
  HWND candidateWindow;
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  LPCANDIDATEINFO ci;
  LPCANDIDATELIST cl;
  int width = 0, height = 0;

  candidateWindow = getCandidateWindow (uiWindow);
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }
  ci = (LPCANDIDATEINFO) ImmLockIMCC (imc->hCandInfo);
  if (! ci) { ImmUnlockIMC (imcHandle); return; }
  cl = (LPCANDIDATELIST) (((LPBYTE) ci) + ci->dwOffset [0]);

  // figure out the size of the candidate window
  {
    DWORD i;
    SIZE size;
    HDC hDC = GetDC (candidateWindow);
    for (i = 0; i < cl->dwCount; i++) {
      // each candidate is a NUL-terminated UTF-16 string at dwOffset [i]
      WCHAR *name = (WCHAR *) (((LPBYTE) cl) + cl->dwOffset [i]);
      GetTextExtentPoint32W (hDC, name, lstrlenW (name), &size);
      if (size.cx > width) { width = size.cx; }
      height += size.cy;
    }
    width += 2 * 5;
    height += 2 * 5;
    ReleaseDC (candidateWindow, hDC);
  }

  ImmUnlockIMCC (imc->hCandInfo);
  ImmUnlockIMC (imcHandle);
  debugMessage (DBG_UI, L"resizing candidate window to %d, %d", width, height);
  SetWindowPos (candidateWindow, NULL, 0, 0, width, height,
                SWP_NOACTIVATE | SWP_NOMOVE | SWP_NOZORDER);
}
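
The candidate list is a single block of memory: a header, a table of offsets measured from the start of the list, and the strings themselves. The following portable sketch mimics that layout with a simplified, hypothetical CandList type (a count and the offsets only, with char strings instead of UTF-16) to show how such a block is packed and read back, and how the window size follows from it under assumed fixed-pitch metrics. It is not the real CANDIDATELIST, which has more header fields.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for CANDIDATELIST: a count, then offsets (from the
   start of the list) to NUL-terminated strings stored after the table. */
typedef struct {
  uint32_t count;
  uint32_t offset [];            /* count entries */
} CandList;

/* Pack n strings into one contiguous block; the caller frees it. */
static CandList *buildCandList (const char *const *names, uint32_t n)
{
  size_t header = sizeof (CandList) + n * sizeof (uint32_t);
  size_t total = header;
  uint32_t i;
  for (i = 0; i < n; i++) { total += strlen (names [i]) + 1; }

  CandList *cl = malloc (total);
  if (cl == NULL) { return NULL; }
  cl->count = n;
  size_t pos = header;
  for (i = 0; i < n; i++) {
    size_t len = strlen (names [i]) + 1;
    cl->offset [i] = (uint32_t) pos;
    memcpy ((char *) cl + pos, names [i], len);
    pos += len;
  }
  return cl;
}

/* Candidate i, found as in sizeCandidateWindow: base address + offset. */
static const char *candidate (const CandList *cl, uint32_t i)
{
  return (const char *) cl + cl->offset [i];
}

/* Widest entry by the sum of the line heights, plus 5 pixels per side. */
static void candWindowSize (const CandList *cl, int glyphWidth, int lineHeight,
                            int *width, int *height)
{
  uint32_t i;
  *width = 0;
  *height = 0;
  for (i = 0; i < cl->count; i++) {
    int w = (int) strlen (candidate (cl, i)) * glyphWidth;
    if (w > *width) { *width = w; }
    *height += lineHeight;
  }
  *width += 2 * 5;
  *height += 2 * 5;
}
```

Keeping offsets rather than pointers is what makes such a block relocatable: the IMM can move or copy it without fixing up any addresses.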

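We position the candidate window at the point recorded in the input context, cfCandForm [0].ptCurrentPos, converted from the client coordinates of the application window to screen coordinates. For our own composition window, that point is set by sizeCompositionWindow, just to the right of the composition window.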

PRIVATE void positionCandidateWindow (HWND uiWindow)
{
  HWND candidateWindow;
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  POINT pt;

  candidateWindow = getCandidateWindow (uiWindow);
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }
  pt = imc->cfCandForm [0].ptCurrentPos;
  ClientToScreen (imc->hWnd, &pt);
  ImmUnlockIMC (imcHandle);
  debugMessage (DBG_UI, L"moving candidate window to %d, %d", pt.x, pt.y);
  SetWindowPos (candidateWindow, NULL, pt.x, pt.y, 0, 0,
                SWP_NOACTIVATE | SWP_NOSIZE | SWP_NOZORDER);
}

Painting is rather straightforward:

PRIVATE void paintCandidateWindow (HWND candWindow, HDC hDC)
{
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  LPCANDIDATEINFO ci;
  LPCANDIDATELIST cl;
  RECT rect;
  HWND uiWindow;

  uiWindow = GetWindow (candWindow, GW_OWNER);
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }
  ci = (LPCANDIDATEINFO) ImmLockIMCC (imc->hCandInfo);
  if (! ci) { ImmUnlockIMC (imcHandle); return; }
  cl = (LPCANDIDATELIST) (((LPBYTE) ci) + ci->dwOffset [0]);

  // white background with a black frame
  GetClientRect (candWindow, &rect);
  FillRect (hDC, &rect, (HBRUSH) GetStockObject (WHITE_BRUSH));
  SelectObject (hDC, GetStockObject (WHITE_BRUSH));
  SelectObject (hDC, GetStockObject (BLACK_PEN));
  Rectangle (hDC, rect.left, rect.top, rect.right, rect.bottom);

  // one candidate per line
  {
    DWORD i;
    int x = 5;
    int y = 5;
    SIZE size;
    for (i = 0; i < cl->dwCount; i++) {
      WCHAR *name = (WCHAR *) (((LPBYTE) cl) + cl->dwOffset [i]);
      int len = lstrlenW (name);
      ExtTextOutW (hDC, x, y, 0, 0, name, len, 0);
      GetTextExtentPoint32W (hDC, name, len, &size);
      y += size.cy;
    }
  }

  ImmUnlockIMCC (imc->hCandInfo);
  ImmUnlockIMC (imcHandle);
}

Finally, we have the window proc for the candidate window:

PRIVATE LRESULT CALLBACK candidateWindowProc (HWND window, UINT message, WPARAM wparam, LPARAM lparam)
{
  switch (message) {
    case WM_CREATE: {
      debugMessage (DBG_UI, L"candidateWindowProc, WM_CREATE");
      return 0L; }
    case WM_DESTROY: {
      debugMessage (DBG_UI, L"candidateWindowProc, WM_DESTROY");
      return 0L; }
    case WM_PAINT: {
      HDC hDC;
      PAINTSTRUCT ps;
      debugMessage (DBG_UI, L"candidateWindowProc, WM_PAINT");
      hDC = BeginPaint (window, &ps);
      paintCandidateWindow (window, hDC);
      EndPaint (window, &ps);
      return 0L; }
    default: {
      debugMessage (DBG_UI, L"candidateWindowProc 0x%x", message); }}
  return DefWindowProc (window, message, wparam, lparam);
}

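4.7.7. The Soft Keyboard Window

The soft keyboard window draws the current keymap on a picture of the keyboard and lets the user enter a character by clicking on a key. As for the other windows, we register its class when the IME DLL is attached. This time, the class reserves 4 extra bytes per window (cbWndExtra), in which we keep the index of the key under the mouse, or -1 when there is none.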

{
  WNDCLASSEX wc;
  wc.cbSize = sizeof (WNDCLASSEX);
  wc.style = CS_IME | CS_HREDRAW | CS_VREDRAW;
  wc.lpfnWndProc = softKbdWindowProc;
  wc.cbClsExtra = 0;
  wc.cbWndExtra = 4;               // index of the key under the mouse
  wc.hInstance = dllInstance;
  wc.hCursor = LoadCursor (NULL, IDC_ARROW);
  wc.hIcon = NULL;
  wc.hIconSm = NULL;
  wc.hbrBackground = (HBRUSH) GetStockObject (LTGRAY_BRUSH);
  wc.lpszMenuName = 0;
  wc.lpszClassName = softKbdClassName;
  if (! RegisterClassEx (&wc)) {
    DWORD err = GetLastError ();
    if (err != ERROR_CLASS_ALREADY_EXISTS) {
      debugLastError (err, L"registerWindowClasses, softKbdClass");
      return FALSE;
    }
  }
}

Of course, we unregister it at DLL detach time:

{ UnregisterClass (softKbdClassName, dllInstance); }

We have a bunch of functions to create, size, position, and paint the soft keyboard window:

PRIVATE HWND createSoftKbdWindow (HWND uiWindow);
PRIVATE HWND getSoftKbdWindow (HWND uiWindow);
PRIVATE void sizeSoftKbdWindow (HWND uiWindow);
PRIVATE void positionSoftKbdWindow (HWND uiWindow);
PRIVATE void paintSoftKbdWindow (HWND softKbdWindow, HDC hDC);
PRIVATE LRESULT CALLBACK softKbdWindowProc (HWND window, UINT message, WPARAM wparam, LPARAM lparam);

Creating the soft keyboard window is rather straightforward:

PRIVATE HWND createSoftKbdWindow (HWND uiWindow)
{
  HWND window =
    CreateWindowEx (0 /*WS_EX_WINDOWEDGE | WS_EX_DLGMODALFRAME*/,
                    softKbdClassName, L"softKbd window",
                    WS_POPUP | WS_DISABLED,   // style
                    0, 0, 600, 215,           // position, size
                    uiWindow,                 // owner
                    0,                        // no menu
                    dllInstance,              // module instance (the IME DLL)
                    0);
  SetWindowLong (window, 0, -1);              // no key under the mouse yet
  return window;
}

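Unlike the composition and candidate windows, the soft keyboard window is recorded in the private data of the input context (imcP->softKbdWindow) rather than in that of the UI window, so this accessor is only a stub: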

PRIVATE HWND getSoftKbdWindow (HWND uiWindow) { return 0; }

The soft keyboard window has a fixed size, 600 by 215 pixels, chosen when it is created, so there is nothing to do here:

PRIVATE void sizeSoftKbdWindow (HWND uiWindow) { debugMessage (DBG_UI, L"resizing softKbd window"); }

Positioning is not implemented either: the soft keyboard window stays where it was created, in the top-left corner of the screen.

PRIVATE void positionSoftKbdWindow (HWND uiWindow) { debugMessage (DBG_UI, L"positioning softKbd window"); }

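Painting needs a description of the physical keyboard. We describe the main block of a 101-key keyboard (keyboard101) as a sequence of keys, each with a width in layout units and a scan code. An entry with a width of 0 ends a row, and a negative width marks a key whose outline is drawn in green, here the F and J home keys. Each key is drawn as a rounded rectangle, scaled by 1.6, labeled with the keymap entry for the virtual key its scan code maps to. The key under the mouse is also shown in large print, together with its scalar value and name.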

typedef struct {
  int width;      // in layout units; 0 ends a row, negative marks the key
  int scanCode;
} Key;

typedef struct {
  int nEntries;
  Key keys [100];
} KeyboardLayout;

KeyboardLayout keyboard101 = { 65, {
  // row 1: ` 1 2 ... = Backspace
  {18, 0x29}, {18, 0x2}, {18, 0x3}, {18, 0x4}, {18, 0x5}, {18, 0x6}, {18, 0x7},
  {18, 0x8}, {18, 0x9}, {18, 0xa}, {18, 0xb}, {18, 0xc}, {18, 0xd}, {36, 0xe}, {0},
  // row 2: Tab Q ... ] backslash
  {27, 0xf}, {18, 0x10}, {18, 0x11}, {18, 0x12}, {18, 0x13}, {18, 0x14}, {18, 0x15},
  {18, 0x16}, {18, 0x17}, {18, 0x18}, {18, 0x19}, {18, 0x1a}, {18, 0x1b}, {27, 0x2b}, {0},
  // row 3: Caps Lock A ... ' Enter; F and J are marked
  {32, 0x3a}, {18, 0x1e}, {18, 0x1f}, {18, 0x20}, {-18, 0x21}, {18, 0x22}, {18, 0x23},
  {-18, 0x24}, {18, 0x25}, {18, 0x26}, {18, 0x27}, {18, 0x28}, {42, 0x1c}, {0},
  // row 4: Shift Z ... / Shift
  {42, 0x2a}, {18, 0x2c}, {18, 0x2d}, {18, 0x2e}, {18, 0x2f}, {18, 0x30}, {18, 0x31},
  {18, 0x32}, {18, 0x33}, {18, 0x34}, {18, 0x35}, {52, 0x36}, {0},
  // row 5: Ctrl, Windows, space bar, and keys drawn without a scan code
  {23, 0x1d}, {23, 0x15b}, {23}, {121, 0x39}, {23}, {23}, {23}, {23}, {0}}};

BYTE keyState [256] = {0};

// create an "Arial Unicode MS" font of the given height; the caller deletes it
PRIVATE HFONT createSoftKbdFont (int height)
{
  LOGFONT lf;
  ZeroMemory (&lf, sizeof (lf));   // width, escapement, italic, etc. all 0
  lf.lfHeight = height;
  lf.lfWeight = FW_NORMAL;
  lf.lfCharSet = DEFAULT_CHARSET;
  lf.lfOutPrecision = OUT_DEFAULT_PRECIS;
  lf.lfClipPrecision = CLIP_DEFAULT_PRECIS;
  lf.lfQuality = DEFAULT_QUALITY;
  lf.lfPitchAndFamily = DEFAULT_PITCH;
  lstrcpy (lf.lfFaceName, L"Arial Unicode MS");
  return CreateFontIndirect (&lf);
}

PRIVATE void paintSoftKbdWindow (HWND softKbdWindow, HDC hDC)
{
  int curMap;
  int curMod;
  RECT rect;
  HFONT keyFont, bigFont, oldFont;
  HPEN markPen, oldPen;
  HBRUSH oldBrush;

  // fetch the current keymap and modifiers from the input context
  {
    HWND uiWindow;
    HIMC imcHandle;
    LPINPUTCONTEXT imc;
    LPINPUTCONTEXT_PRIVATE imcP;
    uiWindow = GetWindow (softKbdWindow, GW_OWNER);
    imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
    if (imcHandle == 0) { return; }
    imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
    if (imc == 0) { return; }
    imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
    if (! imcP) { ImmUnlockIMC (imcHandle); return; }
    curMap = imcP->state.curMap;
    curMod = imcP->state.curMod;
    ImmUnlockIMCC (imc->hPrivate);
    ImmUnlockIMC (imcHandle);
  }

  // light gray background with a black frame
  GetClientRect (softKbdWindow, &rect);
  oldBrush = (HBRUSH) SelectObject (hDC, GetStockObject (DC_BRUSH));
  oldPen = (HPEN) SelectObject (hDC, GetStockObject (BLACK_PEN));
  SetDCBrushColor (hDC, RGB (0xc0, 0xc0, 0xc0));
  Rectangle (hDC, rect.left, rect.top, rect.right, rect.bottom);

  // the GDI objects are created once per paint and deleted at the end
  keyFont = createSoftKbdFont (25);
  bigFont = createSoftKbdFont (100);
  markPen = CreatePen (PS_INSIDEFRAME, 1, RGB (0x00, 0xff, 0x00));
  oldFont = (HFONT) SelectObject (hDC, keyFont);
  SetTextAlign (hDC, TA_TOP | TA_CENTER);
  SetDCBrushColor (hDC, RGB (0xff, 0xff, 0xff));

  // draw the keys, with the geometry also used by findSoftKbdWindowKey
  {
    int i;
    int x = 10;
    int y = 10;
    for (i = 0; i < keyboard101.nEntries; i++) {
      int width = keyboard101.keys [i].width;
      if (width == 0) {                    // end of row
        x = 10;
        y += 20; }
      else {
        RECT r;
        int vk;
        unsigned short *entry;
        int mark = 0;
        if (width < 0) { mark = 1; width = - width; }
        r.top = (int) (y * 1.6);
        r.bottom = (int) ((y + 18) * 1.6);
        r.left = (int) (x * 1.6);
        r.right = (int) ((x + width) * 1.6);
        SelectObject (hDC, mark ? (HGDIOBJ) markPen : GetStockObject (NULL_PEN));
        RoundRect (hDC, r.left, r.top, r.right, r.bottom, 3, 3);
        x += width + 2;
        vk = MapVirtualKey (keyboard101.keys [i].scanCode, 1);   // scan code -> virtual key
        entry = getKeyboardEntry (curMap, curMod, vk);
        if (entry != 0) {
          // entry [0] is the length, the characters follow
          TextOut (hDC, (r.left + r.right) / 2, r.top + 1, (WCHAR *) entry + 1, *entry); }}}
  }

  // the key under the mouse: scalar value, name, and a large glyph
  {
    int curKey = GetWindowLong (softKbdWindow, 0);
    if (curKey != -1) {
      int vk = MapVirtualKey (keyboard101.keys [curKey].scanCode, 1);
      unsigned short *entry = getKeyboardEntry (curMap, curMod, vk);
      if (entry != 0) {
        UTF16CodeUnit buffer [1000];
        int n = 0;
        int gc;
        TCHAR tch = entry [1];
        buffer [n++] = 'U';
        buffer [n++] = '+';
        n += formatUSV (buffer + n, tch);
        buffer [n++] = ' ';
        buffer [n] = 0;
        findCharacterName (tch, buffer + n, &gc);
        SetTextAlign (hDC, TA_TOP | TA_LEFT);
        TextOut (hDC, 20, 185, buffer, wcslen (buffer));
        SelectObject (hDC, bigFont);
        TextOut (hDC, 500, 50, &tch, 1); }}
  }

  // restore the DC before releasing our GDI objects
  SelectObject (hDC, oldFont);
  SelectObject (hDC, oldPen);
  SelectObject (hDC, oldBrush);
  DeleteObject (keyFont);
  DeleteObject (bigFont);
  DeleteObject (markPen);
}

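Finally, we have two helpers, one that finds the key under a point and one that acts on a click on a key, followed by the window proc for the soft keyboard window. Since the window is created disabled, it receives no mouse input; the window proc watches WM_SETCURSOR instead, whose lparam carries the triggering mouse message in its high word.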

PRIVATE int findSoftKbdWindowKey (POINT p)
{
  int i;
  int x = 10;
  int y = 10;
  for (i = 0; i < keyboard101.nEntries; i++) {
    int width = abs (keyboard101.keys [i].width);
    if (width == 0) {                      // end of row
      x = 10;
      y += 20; }
    else {
      // same geometry as in paintSoftKbdWindow
      RECT r;
      r.top = (int) (y * 1.6);
      r.bottom = (int) ((y + 18) * 1.6);
      r.left = (int) (x * 1.6);
      r.right = (int) ((x + width) * 1.6);
      x += width + 2;
      if (PtInRect (&r, p)) { return i; }}}
  return -1;
}

PRIVATE void hitSoftKbdWindow (HWND window, int key)
{
  HWND uiWindow;
  HIMC imcHandle;
  LPINPUTCONTEXT imc;
  LPINPUTCONTEXT_PRIVATE imcP;

  uiWindow = GetWindow (window, GW_OWNER);
  imcHandle = (HIMC) GetWindowLongPtr (uiWindow, IMMGWLP_IMC);
  if (imcHandle == 0) { return; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (imc == 0) { return; }
  imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
  if (! imcP) { ImmUnlockIMC (imcHandle); return; }
  // treat the click as a keystroke on that key, then send the resulting messages
  actOnIMEAction (imc, imcP,
                  handleMapKeyStroke (&imcP->state, keyboard101.keys [key].scanCode, 0));
  ImmUnlockIMCC (imc->hPrivate);
  ImmUnlockIMC (imcHandle);
  ImmGenerateMessage (imcHandle);
}

LRESULT CALLBACK softKbdWindowProc (HWND window, UINT message, WPARAM wparam, LPARAM lparam)
{
  switch (message) {
    case WM_CREATE: {
      debugMessage (DBG_UI, L"softKbdWindowProc, WM_CREATE");
      return 0L; }
    case WM_DESTROY: {
      debugMessage (DBG_UI, L"softKbdWindowProc, WM_DESTROY");
      return 0L; }
    case WM_PAINT: {
      HDC hDC;
      PAINTSTRUCT ps;
      debugMessage (DBG_UI, L"softKbdWindowProc, WM_PAINT");
      hDC = BeginPaint (window, &ps);
      paintSoftKbdWindow (window, hDC);
      EndPaint (window, &ps);
      return 0L; }
    case WM_SETCURSOR: {
      // the window is disabled, so mouse activity is tracked here
      int key;
      POINT cursorPosition;
      debugMessage (DBG_UI, L"softKbdWindowProc, WM_SETCURSOR");
      SetCursor (LoadCursor (NULL, IDC_ARROW));
      GetCursorPos (&cursorPosition);
      ScreenToClient (window, &cursorPosition);
      key = findSoftKbdWindowKey (cursorPosition);
      if (HIWORD (lparam) == WM_LBUTTONDOWN && key != -1) {
        // a click on a key
        hitSoftKbdWindow (window, key);
        SetWindowLong (window, 0, -1); }
      else if (key != GetWindowLong (window, 0)) {
        // the mouse moved to another key: remember it and repaint
        SetWindowLong (window, 0, key);
        RedrawWindow (window, 0, 0, RDW_INVALIDATE); }
      return 0L; }
    default: {
      debugMessage (DBG_UI, L"softKbdWindowProc 0x%x", message); }}
  return DefWindowProc (window, message, wparam, lparam);
}
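
paintSoftKbdWindow and findSoftKbdWindowKey each recompute the same key geometry. A minimal portable sketch of computing it once and hit-testing against it follows; Rect, Point, layoutKeys and findKey are hypothetical names, and pointInRect follows the documented PtInRect rule (left and top edges inside, right and bottom edges outside).

```c
#include <stdlib.h>

/* Portable stand-ins for the Win32 RECT and POINT types. */
typedef struct { long left, top, right, bottom; } Rect;
typedef struct { long x, y; } Point;

typedef struct { int width; int scanCode; } Key;   /* width 0 ends a row */

#define SCALE    1.6   /* layout units to pixels */
#define KEY_H    18    /* key height, in layout units */
#define ROW_STEP 20    /* row pitch, in layout units */
#define GAP      2     /* horizontal gap between keys */
#define ORIGIN   10    /* top-left corner of the first key */

/* Pixel rectangle of every key, as in paintSoftKbdWindow; row separators
   get an empty rectangle. */
static void layoutKeys (const Key *keys, int n, Rect *out)
{
  int i, x = ORIGIN, y = ORIGIN;
  for (i = 0; i < n; i++) {
    int width = abs (keys [i].width);   /* negative widths mark home keys */
    if (width == 0) {
      Rect empty = { 0, 0, 0, 0 };
      out [i] = empty;
      x = ORIGIN;
      y += ROW_STEP;
      continue;
    }
    out [i].left = (long) (x * SCALE);
    out [i].top = (long) (y * SCALE);
    out [i].right = (long) ((x + width) * SCALE);
    out [i].bottom = (long) ((y + KEY_H) * SCALE);
    x += width + GAP;
  }
}

/* PtInRect rule: left and top inclusive, right and bottom exclusive. */
static int pointInRect (const Rect *r, Point p)
{
  return p.x >= r->left && p.x < r->right && p.y >= r->top && p.y < r->bottom;
}

/* Index of the key containing p, or -1, like findSoftKbdWindowKey. */
static int findKey (const Key *keys, const Rect *rects, int n, Point p)
{
  int i;
  for (i = 0; i < n; i++) {
    if (keys [i].width != 0 && pointInRect (&rects [i], p)) { return i; }
  }
  return -1;
}
```

Computing the rectangles once, for instance when the window is created, would keep painting and hit-testing from drifting apart if the layout constants ever change.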

4.7.8. IMM entry points

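Most interaction between the IME and the system is mediated by the IMM (Input Method Manager), which calls a number of entry points exported by the IME. Here they are, together with two helpers, addMessage and actOnIMEAction, which queue the messages that the IME sends back to the application: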

// II/50
PUBLIC BOOL WINAPI ImeProcessKey (HIMC imcHandle, UINT virtualKey, LPARAM lparam, CONST LPBYTE keyState)
{
  BYTE asciiChar [4];
  int nChars;
  LPINPUTCONTEXT imc;
  LPINPUTCONTEXT_PRIVATE imcP;
  BOOL result;

  nChars = ToAscii (virtualKey, HIWORD (lparam), keyState, (LPVOID) asciiChar, 0);
  debugMessage (DBG_INPUT, L"ImeProcessKey vk 0x%x lparam 0x%x char %d, 0x%x SC=%x %x",
                virtualKey, lparam, nChars, asciiChar [0], keyState [VK_SHIFT], keyState [VK_CONTROL]);
  if (! imcHandle) { return FALSE; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (! imc) { return FALSE; }
  imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
  if (! imcP) { ImmUnlockIMC (imcHandle); return FALSE; }

  if (imcP->state.prefixLength >= 2 && imcP->state.prefixString [0] == 'J') {
    // prefix starting with 'J': take every key except Control, Backspace,
    // Enter, and Control combinations
    if (   (virtualKey & 0xFF) == VK_CONTROL
        || (virtualKey & 0xFF) == VK_BACK
        || (virtualKey & 0xFF) == VK_RETURN
        || (keyState [VK_CONTROL] & 0x80) != 0) {
      result = FALSE; }
    else {
      result = TRUE; }}
  else if (imcP->state.prefixLength == 0) {
    // not composing: the core engine decides whether this character starts a composition
    if (nChars != 0) { result = wantKeyStroke (&imcP->state, asciiChar [0]); }
    else { result = FALSE; }}
  else {
    // composing: take every key
    result = TRUE; }

  debugMessage (DBG_INPUT, L" takeit=%d", result);
  ImmUnlockIMCC (imc->hPrivate);
  ImmUnlockIMC (imcHandle);
  return result;
}

// II/51
PUBLIC BOOL WINAPI NotifyIME (HIMC imcHandle, DWORD action, DWORD index, DWORD value)
{
  switch (action) {
    case NI_IMEMENUSELECTED: {
      debugMessage (DBG_UI, L"NotifyIME NI_IMEMENUSELECTED 0x%x, 0x%x", index, value);
      break; }
    case NI_SELECTCANDIDATESTR: {
      debugMessage (DBG_UI, L"NotifyIME NI_SELECTCANDIDATESTR 0x%x, 0x%x", index, value);
      // ShowWindow (imcP->softKbdWindow, SW_HIDE);
      // updateCompositionAndResultString (imc, imcP);
      // addMessage (WM_IME_COMPOSITION, 0, GCS_FLAGS| GCS_RESULTSTR);
      // addMessage (WM_IME_ENDCOMPOSITION, 0, 0);
      // addMessage (WM_IME_STARTCOMPOSITION, 0, 0);
      // addMessage (WM_IME_COMPOSITION, 0, GCS_FLAGS);
      // fill hmsgbuf with
      // ImmGenerateMessage (imcHandle);
      break; }
    case NI_CONTEXTUPDATED: {
      switch (value) {
        case IMC_SETCOMPOSITIONWINDOW: {
          debugMessage (DBG_UI, L"NotifyIME IMC_SETCOMPOSITIONWINDOW");
          break; }
        case IMC_SETCOMPOSITIONFONT: {
          LOGFONT logfont;
          ImmGetCompositionFont (imcHandle, &logfont);
          debugMessage (DBG_UI, L"NotifyIME IMC_SETCOMPOSITIONFONT %s, %d",
                        logfont.lfFaceName, logfont.lfHeight);
          break; }
        case IMC_SETCANDIDATEPOS: {
          debugMessage (DBG_UI, L"NotifyIME IMC_SETCANDIDATEPOS");
          break; }
        default: {
          debugMessage (DBG_UI, L"NotifyIME NI_CONTEXTUPDATED 0x%x, 0x%x", value, index);
          break; }}
      break; }
    default: {
      debugMessage (DBG_UI, L"NotifyIME %d (0x%x) 0x%x 0x%x", action, action, index, value);
      break; }}
  return TRUE;
}

// II/45
PUBLIC BOOL WINAPI ImeDestroy (UINT uReserved)
{
  debugMessage (DBG_MISC, L"ImeDestroy 0x%x", uReserved);
  return (uReserved == 0);
}

#define GCS_FLAGS (GCS_COMPSTR | GCS_COMPATTR | GCS_COMPCLAUSE | GCS_CURSORPOS)

PRIVATE void addMessage (LPINPUTCONTEXT imc, int message, WPARAM wParam, LPARAM lParam);
PRIVATE void actOnIMEAction (LPINPUTCONTEXT imc, LPINPUTCONTEXT_PRIVATE imcP, int action);

// append one message to the message buffer of the input context;
// ImmGenerateMessage later sends the buffered messages
PRIVATE void addMessage (LPINPUTCONTEXT imc, int message, WPARAM wParam, LPARAM lParam)
{
  LPTRANSMSG lpTransMsg;
  HIMCC newBuf = ImmReSizeIMCC (imc->hMsgBuf, (imc->dwNumMsgBuf + 1) * sizeof (TRANSMSG));
  if (newBuf == 0) { return; }     // resizing failed: keep the old buffer
  imc->hMsgBuf = newBuf;
  lpTransMsg = (LPTRANSMSG) ImmLockIMCC (imc->hMsgBuf);
  if (lpTransMsg == 0) { return; }
  lpTransMsg [imc->dwNumMsgBuf].message = message;
  lpTransMsg [imc->dwNumMsgBuf].wParam = wParam;
  lpTransMsg [imc->dwNumMsgBuf].lParam = lParam;
  ImmUnlockIMCC (imc->hMsgBuf);
  imc->dwNumMsgBuf++;
}

// translate an action of the core engine into messages for the application
PRIVATE void actOnIMEAction (LPINPUTCONTEXT imc, LPINPUTCONTEXT_PRIVATE imcP, int action)
{
  switch (action) {
    case START_COMPOSITION: {
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_STARTCOMPOSITION, 0, 0);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS);
      break; }
    case SHOW_SOFT_KEYBOARD: {
      if (imcP->softKbdWindow) {
        debugMessage (DBG_UI, L"showing soft keyboard, window=%d", imcP->softKbdWindow);
        ShowWindow (imcP->softKbdWindow, SW_SHOWNOACTIVATE);
        RedrawWindow (imcP->softKbdWindow, 0, 0, RDW_INVALIDATE); }
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS);
      break; }
    case CONTINUE_COMPOSITION: {
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS);
      break; }
    case INSERT_RESULT_AND_FINISH_COMPOSITION: {
      ShowWindow (imcP->softKbdWindow, SW_HIDE);
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS | GCS_RESULTSTR);
      addMessage (imc, WM_IME_ENDCOMPOSITION, 0, 0);
      break; }
    case INSERT_RESULT_AND_RESTART_COMPOSITION: {
      // ShowWindow (imcP->softKbdWindow, SW_HIDE);
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS | GCS_RESULTSTR);
      addMessage (imc, WM_IME_ENDCOMPOSITION, 0, 0);
      addMessage (imc, WM_IME_STARTCOMPOSITION, 0, 0);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS);
      break; }
    case ABORT_COMPOSITION: {
      ShowWindow (imcP->softKbdWindow, SW_HIDE);
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS);
      addMessage (imc, WM_IME_ENDCOMPOSITION, 0, 0);
      break; }
    case REJECT_KEYSTROKE: {
      MessageBeep (MB_ICONEXCLAMATION);
      break; }
    case SHOW_CANDIDATES: {
      updateCompositionAndResultString (imc, imcP);
      addMessage (imc, WM_IME_COMPOSITION, 0, GCS_FLAGS);
      addMessage (imc, WM_IME_NOTIFY, IMN_OPENCANDIDATE, 0x1);
      break; }}
}

// II/56
// messages are queued in hMsgBuf by addMessage and sent with
// ImmGenerateMessage, so transMsgList is not used
PUBLIC UINT WINAPI ImeToAsciiEx (UINT virtualKey, UINT scanCode, CONST LPBYTE keyState,
                                 LPTRANSMSGLIST transMsgList, UINT state, HIMC imcHandle)
{
  TCHAR keyChar [4];
  int nChars;
  int ch2;
  LPINPUTCONTEXT imc;
  LPINPUTCONTEXT_PRIVATE imcP;
  int action;
  int modifiers = 0;

  debugMessage (DBG_MISC, L"ImeToAsciiEx");
  if (! imcHandle) { return 0; }
  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (! imc) { return 0; }
  imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
  if (! imcP) { ImmUnlockIMC (imcHandle); return 0; }

  nChars = ToUnicode (virtualKey, scanCode, keyState, keyChar, sizeof (keyChar) / sizeof (TCHAR), 0);
  ch2 = MapVirtualKey (virtualKey & 0xFFFF, 2);
  debugMessage (DBG_INPUT, L" scanCode=0x%x virt=0x%x state=0x%x",
                scanCode, virtualKey, keyState [virtualKey & 0xff]);
  debugMessage (DBG_INPUT, L" S=0x%x C=0x%x lS=0x%x lC=0x%x rS=0x%x rC=0x%x",
                keyState [VK_SHIFT], keyState [VK_CONTROL], keyState [VK_LSHIFT],
                keyState [VK_LCONTROL], keyState [VK_RSHIFT], keyState [VK_RCONTROL]);
  debugMessage (DBG_INPUT, L" map=0x%x nchars=%d, keyChar=(0x%x) '%c'",
                ch2, nChars, keyChar [0], keyChar [0]);

  // decide whether this key event is of interest
  if (imcP->state.prefixLength >= 2 && imcP->state.prefixString [0] == 'J') {
    // key down events only, plus Shift and Control in both directions
    if (! (   (scanCode & 0x8000) == 0
           || virtualKey == VK_SHIFT
           || virtualKey == VK_CONTROL)) {
      goto done; }}
  else if (nChars == 0) {
    goto done; }

  addMessage (imc, WM_IME_NOTIFY, IMN_CLOSECANDIDATE, 0x1);
  if (keyState [VK_SHIFT] & 0x80) { modifiers |= MODIFIER_SHIFT; }
  // Enter always yields a carriage return (shift-enter would give 0x0a)
  if ((virtualKey & 0xFFFF) == VK_RETURN) { keyChar [0] = 0x0d; }
  // when we get a VK_SHIFT, pretend it's the left shift key
  if (VK_SHIFT == virtualKey) { scanCode = 0x2a; }
  action = handleKeyStroke (&imcP->state, scanCode, keyChar [0], modifiers);
  actOnIMEAction (imc, imcP, action);
  debugMessage (DBG_CORE, L" -> compLength=%d", imcP->state.compositionLength);

done:
  ImmUnlockIMCC (imc->hPrivate);
  ImmUnlockIMC (imcHandle);
  ImmGenerateMessage (imcHandle);
  return 0;
}

PUBLIC BOOL WINAPI NotifyIME (HIMC imcHandle, DWORD action, DWORD index, DWORD value);
PUBLIC BOOL WINAPI ImeDestroy (UINT uReserved);
PUBLIC BOOL WINAPI ImeProcessKey (HIMC imcHandle, UINT virtualKey, LPARAM lparam, CONST LPBYTE keyState);
PUBLIC UINT WINAPI ImeToAsciiEx (UINT virtualKey, UINT scanCode, CONST LPBYTE keyState,
                                 LPTRANSMSGLIST transMsgList, UINT state, HIMC imcHandle);
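
addMessage grows the message buffer by exactly one TRANSMSG at a time, and actOnIMEAction decides which sequence of messages each action of the core engine produces. The following portable sketch models that queue with realloc. The Msg and Action enumerations and the MsgQueue type are hypothetical stand-ins for the WM_IME_* messages, the core engine's actions, and the input context's hMsgBuf and dwNumMsgBuf; only three actions are shown.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the queued messages; MSG_RESULT plays the role
   of WM_IME_COMPOSITION with GCS_RESULTSTR set. */
typedef enum { MSG_START, MSG_COMPOSITION, MSG_RESULT, MSG_END } Msg;

/* Models hMsgBuf and dwNumMsgBuf of the input context. */
typedef struct {
  Msg *msgs;
  size_t count;
} MsgQueue;

/* Like addMessage: grow by one slot; on failure, keep the old buffer. */
static int queueAdd (MsgQueue *q, Msg m)
{
  Msg *grown = realloc (q->msgs, (q->count + 1) * sizeof (Msg));
  if (grown == NULL) { return 0; }
  grown [q->count++] = m;
  q->msgs = grown;
  return 1;
}

/* A subset of the actions, with the message sequences they produce. */
typedef enum { CONTINUE, FINISH, RESTART } Action;

static void queueAction (MsgQueue *q, Action a)
{
  switch (a) {
    case CONTINUE:                 /* composition string changed */
      queueAdd (q, MSG_COMPOSITION);
      break;
    case FINISH:                   /* result inserted, composition over */
      queueAdd (q, MSG_RESULT);
      queueAdd (q, MSG_END);
      break;
    case RESTART:                  /* result inserted, new composition begins */
      queueAdd (q, MSG_RESULT);
      queueAdd (q, MSG_END);
      queueAdd (q, MSG_START);
      queueAdd (q, MSG_COMPOSITION);
      break;
  }
}
```

Growing the buffer one slot at a time would be wasteful for long queues, but only a handful of messages are queued per keystroke, so the simple approach is adequate here.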

4.7.9. Others

Apparently, the IMM checks that all the IME instance members are present, i.e. exported by the DLL. Here are the ones we do not implement; each one only logs the call and returns 0 or FALSE:

// II/45
LRESULT WINAPI ImeEscape (HIMC imcHandle, UINT escape, LPVOID data)
{
  debugMessage (DBG_MISC, L"ImeEscape 0x%x", escape);
  return 0;
}

// II/42
DWORD WINAPI ImeConversionList (HIMC imcHandle, LPCTSTR src, LPCANDIDATELIST dst,
                                DWORD bufLen, UINT flag)
{
  debugMessage (DBG_MISC, L"ImeConversionList");
  return 0;
}

// II/49
BOOL WINAPI ImeSetActiveContext (HIMC imcHandle, BOOL flag)
{
  debugMessage (DBG_MISC, L"ImeSetActiveContext flag=0x%x", flag);
  return FALSE;
}

// II/54
BOOL WINAPI ImeSelect (HIMC imcHandle, BOOL select)
{
  debugMessage (DBG_MISC, L"ImeSelect select=0x%x", select);
  return FALSE;
}

// II/54
BOOL WINAPI ImeSetCompositionString (HIMC imcHandle, DWORD index, LPVOID comp,
                                     DWORD compLen, LPVOID read, DWORD readLen)
{
  debugMessage (DBG_MISC, L"ImeSetCompositionString");
  return FALSE;
}

LRESULT WINAPI ImeEscape (HIMC imcHandle, UINT escape, LPVOID data);
DWORD WINAPI ImeConversionList (HIMC imcHandle, LPCTSTR src, LPCANDIDATELIST dst,
                                DWORD bufLen, UINT flag);
BOOL WINAPI ImeSetActiveContext (HIMC imcHandle, BOOL flag);
BOOL WINAPI ImeSelect (HIMC imcHandle, BOOL select);
BOOL WINAPI ImeSetCompositionString (HIMC imcHandle, DWORD index, LPVOID comp,
                                     DWORD compLen, LPVOID read, DWORD readLen);

4.8. Loading the names database

Here are the globals that hold the parts of the store:

unsigned char *nodeStore;
int nodeStoreLength;
unsigned char *keyStore;
int keyStoreLength;
unsigned char *keyboardsStore;
int keyboardsStoreLength;
int maxNameLength;
int maxCandidatesLength;

The store is loaded when the DLL is loaded:

{
  unsigned char *nameStore;
  HRSRC hResource;
  HGLOBAL hGlobal;

  hResource = FindResource (inst, MAKEINTRESOURCE (NAMES_RESOURCE), RT_RCDATA);
  if (hResource == 0) {
    debugLastError (GetLastError (), L"FindResource failed");
    return FALSE;
  }
  hGlobal = LoadResource (inst, hResource);
  if (hGlobal == 0) {
    debugLastError (GetLastError (), L"LoadResource failed");
    return FALSE;
  }
  // The HGLOBAL returned by LoadResource is not a global memory handle:
  // the data pointer must be obtained with LockResource, not GlobalLock.
  nameStore = (unsigned char *) LockResource (hGlobal);
  if (nameStore == 0) {
    debugLastError (GetLastError (), L"Can't lock resource");
    return FALSE;
  }
  {
    int tocOffset;
    int *toc;
    int nodeOffset;
    int keyOffset;
    int keyboardsOffset;
    int major;
    int minor;

    // header: major, minor, TOC offset, max name length, max candidates length
    major = ((int *) (nameStore)) [0];
    minor = ((int *) (nameStore)) [1];
    tocOffset = ((int *) (nameStore)) [2];
    maxNameLength = ((int *) (nameStore)) [3];
    maxCandidatesLength = ((int *) (nameStore)) [4];

    debugMessage (DBG_CORE, L"major=0x%x, minor=0x%x, tocOffset = 0x%x",
                  major, minor, tocOffset);
    debugMessage (DBG_CORE, L"maxName=%d, maxCandidates=%d",
                  maxNameLength, maxCandidatesLength);

    // TOC: offset and length of the node, key and keyboards stores
    toc = (int *) (nameStore + tocOffset);
    nodeOffset = toc [0];
    nodeStoreLength = toc [1];
    keyOffset = toc [2];
    keyStoreLength = toc [3];
    keyboardsOffset = toc [4];
    keyboardsStoreLength = toc [5];

    debugMessage (DBG_CORE, L"nodeOffset = 0x%x, keyOffset = 0x%x", nodeOffset, keyOffset);
    debugMessage (DBG_CORE, L"nodelength = %d, keylength = %d",
                  nodeStoreLength, keyStoreLength);

    nodeStore = nameStore + nodeOffset;
    keyStore = nameStore + keyOffset;
    keyboardsStore = nameStore + keyboardsOffset;

    debugMessage (DBG_CORE, L"nodeStore = %x, %x, %x",
                  nodeStore [0], nodeStore [1], nodeStore [2]);

    {
      WCHAR buffer [200];
      int gc;
      findCharacterName (0x20ac, buffer, &gc);
      debugMessage (DBG_CORE, L"20AC is '%s', gc = %d", buffer, gc);
    }
  }
}

4.9. The taskbar menu

Our IME offers a few configuration parameters. They are available via the configuration panel, which can be reached either by right-clicking the pen icon in the system tray or from the Regional Options control panel (select the corresponding input language and click the "IME Settings" button). We would also like to offer faster access. We do that via the menu that appears when the user left-clicks the pen icon in the system tray.

First, we need a few constants to identify our menus:

#define MENU_INSERT                 1
#define MENU_INSERT_CHAR           10
#define MENU_INSERT_CHAR_USV_NAME  11

#define MENU_COMPPOSITION           3
#define MENU_COMPPOSITION_NEAR     31
#define MENU_COMPPOSITION_FIXED    32

#define MENU_DEBUG                  2
#define MENU_DEBUG_ON              20
#define MENU_DEBUG_OFF             21

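Second, we need to describe our menus to IMM. This is done by the ImeGetImeMenuItems function:

- A null parentMenu asks for the top-level items.
- Otherwise, parentMenu->wID identifies the submenu whose items are requested.
- When menu is null, the function only returns the number of items.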

PUBLIC DWORD WINAPI ImeGetImeMenuItems (HIMC imcHandle, DWORD flags, DWORD type,
                                        LPIMEMENUITEMINFO parentMenu,
                                        LPIMEMENUITEMINFO menu, DWORD size)
{
  LPINPUTCONTEXT imc;
  LPINPUTCONTEXT_PRIVATE imcP;
  int insertMode = 0;
  int compPositionMode = 0;
  int itemCount = 0;

  debugMessage (DBG_MISC, L"ImeGetImeMenuItems");

  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (!imc) {
    return 0;
  }
  imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
  if (!imcP) {
    ImmUnlockIMC (imcHandle);
    return 0;
  }
  //insertMode = imcP->insertMode;
  //compPositionMode = imcP->compPositionMode;
  ImmUnlockIMCC (imc->hPrivate);
  ImmUnlockIMC (imcHandle);

  if (! parentMenu) {
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = IMFT_SUBMENU;
      menu->fState = 0;
      menu->wID = MENU_INSERT;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"insert...");
      menu->hbmpItem = 0;
      menu++;
    }
    itemCount++;
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = IMFT_SUBMENU;
      menu->fState = 0;
      menu->wID = MENU_COMPPOSITION;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"position...");
      menu->hbmpItem = 0;
      menu++;
    }
    itemCount++;
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = IMFT_SUBMENU;
      menu->fState = 0;
      menu->wID = MENU_DEBUG;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"debug...");
      menu->hbmpItem = 0;
    }
    itemCount++;
  }
  else if (parentMenu->wID == MENU_INSERT) {
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = 0;
      menu->fState = (insertMode == INSERT_CHAR ? IMFS_CHECKED : 0);
      menu->wID = MENU_INSERT_CHAR;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"character");
      menu->hbmpItem = 0;
      menu++;
    }
    itemCount++;
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = 0;
      menu->fState = (insertMode == INSERT_CHAR_USV_NAME ? IMFS_CHECKED : 0);
      menu->wID = MENU_INSERT_CHAR_USV_NAME;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"character USV NAME");
      menu->hbmpItem = 0;
    }
    itemCount++;
  }
  else if (parentMenu->wID == MENU_COMPPOSITION) {
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = 0;
      menu->fState = (compPositionMode == COMPPOSITION_NEAR ? IMFS_CHECKED : 0);
      menu->wID = MENU_COMPPOSITION_NEAR;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"near");
      menu->hbmpItem = 0;
      menu++;
    }
    itemCount++;
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = 0;
      menu->fState = (compPositionMode == COMPPOSITION_FIXED ? IMFS_CHECKED : 0);
      menu->wID = MENU_COMPPOSITION_FIXED;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"fixed");
      menu->hbmpItem = 0;
    }
    itemCount++;
  }
  else if (parentMenu->wID == MENU_DEBUG) {
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = 0;
      menu->fState = (debugMode ? IMFS_CHECKED : 0);
      menu->wID = MENU_DEBUG_ON;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"on");
      menu->hbmpItem = 0;
      menu++;
    }
    itemCount++;
    if (menu) {
      menu->cbSize = sizeof (IMEMENUITEMINFO);
      menu->fType = 0;
      menu->fState = ( !debugMode ? IMFS_CHECKED : 0);
      menu->wID = MENU_DEBUG_OFF;
      menu->hbmpChecked = 0;
      menu->hbmpUnchecked = 0;
      lstrcpy (menu->szString, L"off");
      menu->hbmpItem = 0;
    }
    itemCount++;
  }
  return itemCount;
}

PUBLIC DWORD WINAPI ImeGetImeMenuItems (HIMC imcHandle, DWORD flags, DWORD type,
                                        LPIMEMENUITEMINFO parentMenu,
                                        LPIMEMENUITEMINFO menu, DWORD size);

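Finally, we need to respond when the user selects one of these menu items. IMM reports the selection by calling NotifyIME with the NI_IMEMENUSELECTED action, and the identifier of the selected item is passed in index. The following block handles that case: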

{
  LPINPUTCONTEXT imc;
  LPINPUTCONTEXT_PRIVATE imcP;

  imc = (LPINPUTCONTEXT) ImmLockIMC (imcHandle);
  if (!imc) {
    break;
  }
  imcP = (LPINPUTCONTEXT_PRIVATE) ImmLockIMCC (imc->hPrivate);
  if (!imcP) {
    ImmUnlockIMC (imcHandle);
    break;
  }
  switch (index) {
    case MENU_INSERT_CHAR: {
      //imcP->insertMode = INSERT_CHAR;
      break;
    }
    case MENU_INSERT_CHAR_USV_NAME: {
      //imcP->insertMode = INSERT_CHAR_USV_NAME;
      break;
    }
    case MENU_COMPPOSITION_NEAR: {
      //imcP->compPositionMode = COMPPOSITION_NEAR;
      break;
    }
    case MENU_COMPPOSITION_FIXED: {
      //imcP->compPositionMode = COMPPOSITION_FIXED;
      break;
    }
    case MENU_DEBUG_ON: {
      debugMode = TRUE;
      break;
    }
    case MENU_DEBUG_OFF: {
      debugMode = FALSE;
      break;
    }
  }
  ImmUnlockIMCC (imc->hPrivate);
  ImmUnlockIMC (imcHandle);
}

4.10. DLL setup

Here are the names of our window classes:

PRIVATE WCHAR compClassName[]    = L"UniIMEcomp";
PRIVATE WCHAR candClassName[]    = L"UniIMEcand";
PRIVATE WCHAR softKbdClassName[] = L"UniIMEsoftkbd";

4.11. Putting it together

#define PUBLIC
#define PRIVATE static

#include <windows.h>
#include <winuser.h>
#include <winerror.h>
#include <commdlg.h>
#include <immdev.h>
#include <string.h>
#include <stdio.h>
#include <stdarg.h>
/*#include <psapi.h>*/
#include "uniime.h"

#define INSERT_CHAR 0
#define INSERT_CHAR_USV_NAME 1
#define INSERT_CHAR_PAREN_USV_NAME_PAREN 2

#define COMPPOSITION_NEAR 0
#define COMPPOSITION_FIXED 1

/* ime.defines: fragments 1-6 */
/* ime.type.declarations: fragments 1-3 */
/* ime.globals: fragments 1-5 */
/* ime.functions.declarations: fragments 1-23 */
/* ime.functions: fragments 1-52 */

4.12. The def file

Here are the export definitions for our DLL:

LIBRARY uniime
EXPORTS
    DllEntry
    ImeInquire
    ImeConversionList
    ImeEscape
    ImeSetActiveContext
    ImeSelect
    ImeSetCompositionString
    ImeRegisterWord
    ImeUnregisterWord
    ImeGetRegisterWordStyle
    ImeEnumRegisterWord
    ImeConfigure
    ImeDestroy
    ImeProcessKey
    ImeToAsciiEx
    ImeGetImeMenuItems
    NotifyIME

4.13. The resource files

#define ICON_RESOURCE   0x100
#define NAMES_RESOURCE  0x200

The resource file carries the version information, the IME icon and the names database, which is included from names.rc. (The build script of the DDK requires a resource file in any case.)

#include <winuser.h>
#include <winver.h>
#include "uniime.h"

#define VER_FILEVERSION             3,0,0,0
#define VER_PRODUCTVERSION          6,1,0,0
#define VER_FILEFLAGSMASK           VS_FFI_FILEFLAGSMASK
#define VER_FILEFLAGS               (VS_FF_DEBUG)
#define VER_FILEOS                  VOS_DOS_WINDOWS32
#define VER_FILETYPE                VFT_DRV
#define VER_FILESUBTYPE             VFT2_DRV_INPUTMETHOD
#define VER_LANGNEUTRAL
#define VER_COMPANYNAME_STR         "Eric Muller"
#define VER_FILEDESCRIPTION_STR     "UniIME 6.1.0"
#define VER_FILEVERSION_STR         "6.1.0\0"
#define VER_INTERNALNAME_STR        "uniime"
#define VER_LEGALCOPYRIGHT_STR      "CC0"
#define VER_ORIGINALFILENAME_STR    "uniime.ime"
#define VER_PRODUCTNAME_STR         "uniime"
#define VER_PRODUCTVERSION_STR      "6.1.0\0"

#include "common.ver"

ICON_RESOURCE ICON "../uniime.ico"

NAMES_RESOURCE RCDATA
BEGIN
#include "names.rc"
END

4.14. The sources file

The DDK build procedure uses the 'sources' file to control the build. It is fairly straightforward:

TARGETNAME=uniime
TARGETEXT=ime
TARGETPATH=obj
TARGETTYPE=DYNLINK
TARGETLIBS=$(DDK_LIB_PATH)\user32.lib \
           $(DDK_LIB_PATH)\advapi32.lib \
           $(DDK_LIB_PATH)\kernel32.lib \
           $(DDK_LIB_PATH)\GDI32.LIB \
           $(DDK_LIB_PATH)\IMM32.LIB \
           $(DDK_LIB_PATH)\HTMLHELP.LIB \
           $(DDK_LIB_PATH)\COMDLG32.LIB \
           $(DDK_LIB_PATH)\psapi.lib
DLLBASE=0x73100000
DLLENTRY=DllEntry
USE_CRTDLL=1
C_DEFINES=-DBUILDDLL -DUNICODE
INCLUDES=.;$(BASEDIR)\src\ime\inc
SOURCES=\
    uniime.c \
    uniime.rc

!INCLUDE $(NTMAKEENV)\makefile.def

4.15. Registry entries

For the IME to be accessible via the Regional Options control panel, under the Input Locales tab, it must be described in the registry under the key HKLM\SYSTEM\CurrentControlSet\Control\Keyboard Layouts. This key contains one subkey per keyboard layout and one subkey per IME. (It seems that there should also be subkeys for other input systems, such as text to speech systems.)

The name of each subkey is a 32-bit hexadecimal number:

- The top nibble is 0 for a keyboard layout and E for an input method.
- The bottom four nibbles form the identifier of the language under which the IME is accessible, e.g. 040C for French.
- The middle three nibbles distinguish the various IMEs for a given language.

Thus, to make our IME accessible under French, we can use any of the subkeys E000040C, E001040C, etc. I have not been able to find a way to register an IME for all languages.

Inside the subkey, there must be three entries of type REG_SZ:

  1. 'IME file' points to the IME DLL; this path is relative to WINNT/System32. In our case, we just use uniime.ime.
  2. 'layout file' points to the keyboard layout file to use with the IME, also relative to WINNT/System32. In our case, we just use the US layout, kbdus.dll.
  3. 'layout text' is the name under which the IME will appear in the Regional Options control panel.