Object navigation: why is it okay for VoiceOver but not NVDA?

Since VoiceOver came to Mac OS X, many screen reader users have joyfully sung its praises - and there's certainly a lot to praise. Some who were previously Windows users have taken the plunge and moved to Mac OS X completely, never looking back. Others use both operating systems or perhaps dream of no longer being dependent on Windows for certain software. This is all fantastic, but as an NVDA developer, there are also aspects of this that frustrate and exasperate me.

Right from the start, NVDA has used "object navigation" to review the user interface, particularly to provide access to elements which can't otherwise be accessed using the keyboard. In short, every element in the user interface is generally represented by an object. These objects exist in a hierarchy or tree of objects. Object navigation allows the user to explore this hierarchy by moving between objects and then descending into (entering, or interacting with) objects of interest. For example, entering a list would allow you to see its list items. This is in contrast to the "flat review" (or "screen review") method that Windows screen readers have traditionally used for review, whereby the user can review the content of the screen in a flat fashion from top to bottom, left to right, similar to the way a simple text document would be read. While this might seem more logical at first, it's worth remembering that a sighted user doesn't necessarily read the screen in this ordered fashion. Instead, they are more likely to focus on specific elements of interest, which in some ways is more akin to object navigation. As always, both methods have advantages and disadvantages.
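
To make the difference concrete, here is a toy sketch in Python. It is not NVDA's actual code: the UIObject and ObjectNavigator classes, the flat_review function and the example "Mail" window are all invented for illustration. Real flat review also works from screen position rather than tree order, so the depth-first walk below is only a rough stand-in for it.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare by identity so sibling lookup is unambiguous
class UIObject:
    """Hypothetical stand-in for one element of the user interface."""
    name: str
    role: str
    children: List["UIObject"] = field(default_factory=list)
    parent: Optional["UIObject"] = field(default=None, repr=False)

    def add(self, child: "UIObject") -> "UIObject":
        child.parent = self
        self.children.append(child)
        return child


class ObjectNavigator:
    """Moves a single 'current object' around the tree, one step at a time."""

    def __init__(self, root: UIObject) -> None:
        self.current = root

    def _move_to(self, target: Optional[UIObject]) -> bool:
        # Stay put and report failure if there is nothing in that direction.
        if target is None:
            return False
        self.current = target
        return True

    def _sibling(self, offset: int) -> Optional[UIObject]:
        parent = self.current.parent
        if parent is None:
            return None
        index = parent.children.index(self.current) + offset
        if 0 <= index < len(parent.children):
            return parent.children[index]
        return None

    def next(self) -> bool:
        return self._move_to(self._sibling(1))

    def previous(self) -> bool:
        return self._move_to(self._sibling(-1))

    def enter(self) -> bool:
        # Descend to the first child, e.g. from a list to its first item.
        children = self.current.children
        return self._move_to(children[0] if children else None)

    def leave(self) -> bool:
        return self._move_to(self.current.parent)


def flat_review(obj: UIObject) -> Iterator[str]:
    """Flatten the tree into one linear sequence of leaf names."""
    if not obj.children:
        yield obj.name
    for child in obj.children:
        yield from flat_review(child)


window = UIObject("Mail", "window")
toolbar = window.add(UIObject("Toolbar", "toolbar"))
toolbar.add(UIObject("Reply", "button"))
toolbar.add(UIObject("Delete", "button"))
messages = window.add(UIObject("Messages", "list"))
messages.add(UIObject("Hello from Alice", "list item"))
messages.add(UIObject("Meeting at 3pm", "list item"))

nav = ObjectNavigator(window)
nav.enter()  # window -> toolbar
nav.next()   # toolbar -> messages list, skipping the toolbar's buttons
nav.enter()  # messages list -> its first item
print(nav.current.name)
print(list(flat_review(window)))
```

With object navigation, three steps take you from the window to the first message without ever hearing the toolbar buttons. With the flattened view, you read past every button before reaching the messages, but you never have to think about what is inside what.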

On the Mac, there is only object navigation; there is no concept of flat review of the entire screen using the keyboard. (In the latest version of Mac OS X, you can explore the screen in this fashion using the trackpad. However, NVDA's tracking and reading of text under the mouse allows you to do something similar.) It seems that Mac users are quite happy with this approach, and yet we are constantly bombarded by NVDA users with complaints about the difficulty of object navigation and requests for flat review functionality in NVDA.

So, here's my question for those of you who are either full or partial Mac converts. Why is object navigation quite acceptable on the Mac, yet not acceptable in Windows? It sometimes seems to me that some of the same users who frequently sing the praises of VoiceOver then turn and complain about the lack of flat review in NVDA. Perhaps I'm wrong and those users who want flat review would not be comfortable using VoiceOver. Even so, it sometimes seems that people are willing to accept a different approach on a completely new platform, yet are unwilling to accept it in a newer product on an existing platform. It's certainly true that NVDA's object navigation needs to be cleaned up a bit (removing extraneous objects, etc.), but I don't think this is the whole story.

On the web, this gets even more interesting. Due to the non-linear fashion of web pages, Windows screen readers have had to provide their own "flat" representation of web pages. They then override the cursor and other keys to navigate within that representation. Aside from the technical issues associated with this, working with interactive controls on web forms requires a separate mode of interaction (named forms mode, focus mode, etc.) where the screen reader lets the user interact directly with the control by allowing cursor and other keys to pass straight through to the control. (Some screen readers can automatically switch to this mode when appropriate, but there are still two modes.) This is becoming more of a challenge with the ever-growing number of web applications, where more keys are required to work with the application. Object navigation solves this problem because the cursor and other application keys are never overridden by the screen reader, so the screen reader doesn't interfere with the functionality of web applications.
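
The key-handling difference can be sketched as a toy model in Python. Everything here is hypothetical: the mode names, the set of overridden keys and the two routing functions are invented for illustration and do not reflect how any real screen reader is implemented. The object navigation case assumes that screen reader commands are reached through a dedicated modifier, modelled as a simple boolean.

```python
BROWSE = "browse"  # screen reader navigates its own flat representation
FOCUS = "focus"    # keys go straight to the focused control

# Hypothetical set of keys the screen reader overrides while browsing.
OVERRIDDEN_KEYS = {"up", "down", "left", "right", "home", "end"}


def route_two_mode(key: str, mode: str) -> str:
    """Two-mode model: who receives a key press depends on the current mode."""
    if mode == BROWSE and key in OVERRIDDEN_KEYS:
        return "screen reader"
    return "application"


def route_object_nav(key: str, modifier_held: bool) -> str:
    """Object navigation model: unmodified keys always reach the application."""
    return "screen reader" if modifier_held else "application"


# A web application that uses the down arrow for its own purposes:
print(route_two_mode("down", BROWSE))                 # screen reader
print(route_two_mode("down", FOCUS))                  # application
print(route_object_nav("down", modifier_held=False))  # application
print(route_object_nav("down", modifier_held=True))   # screen reader
```

In the two-mode model, the same key press can land in two different places, so the user has to know which mode is active. In the object navigation model, the answer depends only on whether the modifier is held.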

Again, VoiceOver uses object navigation on the web and VoiceOver users appear to be quite happy with this. NVDA currently does what other Windows screen readers do, but again, would you be happy if we abandoned this approach in favour of object navigation? It would certainly solve this web application problem once and for all. I get the impression that NVDA users would be unhappy if we changed this.

I want to hear your thoughts. Comment here, on Twitter, or send me an email.