Ideal OS Part II: The User Interface

In the future, touch interfaces will take over most computing tasks, but 10% of people will still need ‘full general purpose computers’. We can’t let the interface stagnate. This white paper represents a decade of my thinking on what is wrong with desktop-style (WIMP) operating systems, along with proposed solutions. PCs are not obsolete. They just need improvements to become ‘workstations’ again.

Last time I gave you an overview of what an operating system would look like if we took away all the bad parts (which doesn’t leave much) and started building replacements. But what would these new parts look like? How would you start programs and manage windows? Without a filesystem, how would desktop folders work? For the answers to these questions and so much more, keep reading.

Desktop Folders

Since the filesystem is now a database, finding your files is done through queries. Creating a folder is essentially creating a new saved query: a list of all audio files marked as songs, or a list of all code files marked as part of a particular project. Folder contents can be read-only queries based purely on the attributes of documents (like the list of songs in an album), or they can be ad hoc, where the user drags files into the folder. In that case the file receives a tag referring to that folder, simulating the old kind of folder. Unlike with traditional directories, however, a file can be in any number of folders at once.
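
To make that concrete, here is a minimal sketch of folders-as-queries, assuming files live in a document database as records with arbitrary attributes. All type and field names below are invented for illustration.

```typescript
// A sketch of "folders are saved queries": every file is a record with
// arbitrary attributes, and a folder is just a predicate over those records.
interface FileDoc {
  id: string;
  type: string;               // "audio", "code", ...
  tags: string[];             // ad hoc folder membership
  [attr: string]: unknown;    // album, project, etc.
}

type Folder = (f: FileDoc) => boolean;   // a folder is just a saved query

// Read-only, attribute-based folder: every song on a given album.
const albumFolder = (album: string): Folder =>
  (f) => f.type === "audio" && f["album"] === album;

// Ad hoc folder: dragging a file in just adds the folder's tag.
const adHocFolder = (tag: string): Folder => (f) => f.tags.includes(tag);
const dropInto = (f: FileDoc, tag: string) => { f.tags.push(tag); };

// The same file can satisfy any number of folder queries at once.
const contents = (db: FileDoc[], folder: Folder) => db.filter(folder);
```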

Command line

The new OS should have a command line. Part of the magic of Unix is being able to pipe simple commands together at the shell. We still need that, but the pipes would carry streams of simple objects instead of bytes. How much better would the ImageMagick operators be if they could stream proper metadata? Building new commands that talk with the old ones would be trivial.
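
Here is a minimal sketch of what an object-carrying pipe could look like today, using Node’s object-mode streams; the photo objects and their fields are made up for illustration.

```typescript
// A tiny object-stream pipeline: commands pass structured records, not text.
import { Readable, Transform, Writable, pipeline } from "stream";

interface Photo { name: string; width: number; rating: number; }

// Pretend this came from an ls-style command that emits objects, not bytes.
const photos = Readable.from([
  { name: "falls.jpg", width: 4000, rating: 4 },
  { name: "blurry.jpg", width: 4000, rating: 1 },
] as Photo[]);

// A filter-style command: passes along only well-rated photos.
const goodOnes = new Transform({
  objectMode: true,
  transform(photo: Photo, _enc, done) {
    if (photo.rating >= 3) this.push(photo);
    done();
  },
});

// A sink command that just prints what it receives.
const show = new Writable({
  objectMode: true,
  write(photo: Photo, _enc, done) {
    console.log(photo.name);
    done();
  },
});

pipeline(photos, goodOnes, show, (err) => { if (err) console.error(err); });
```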

With a single command line you could do complex operations like: find all photos taken in the last four years within 50 miles of Yosemite that have a star rating of 3 or higher, resize them to be 1000px on the longest side, then upload them to a new Flickr album called “Best of Yosemite” and link to the album on Facebook. This could all be done with built-in tools; no custom coding required, just a few primitives combined on the command line.
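
None of these tools exist yet, but here is a rough sketch, purely for illustration, of the shapes such built-in commands might take; every name and signature below is hypothetical.

```typescript
// Purely illustrative: `declare` is used only to sketch the shapes a
// built-in, pipeable toolset might expose. None of these commands exist.
interface Photo { id: string; taken: Date; location: [number, number]; rating: number; }

declare function findPhotos(q: {
  near: string; radiusMiles: number; sinceYears: number; minRating: number;
}): AsyncIterable<Photo>;
declare function resize(photos: AsyncIterable<Photo>, opts: { longestSide: number }): AsyncIterable<Photo>;
declare function flickrUpload(photos: AsyncIterable<Photo>, album: string): Promise<string>; // returns album URL
declare function facebookLink(albumUrl: string): Promise<void>;

async function bestOfYosemite() {
  const found   = findPhotos({ near: "Yosemite", radiusMiles: 50, sinceYears: 4, minRating: 3 });
  const resized = resize(found, { longestSide: 1000 });
  const albumUrl = await flickrUpload(resized, "Best of Yosemite");
  await facebookLink(albumUrl);
}
```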

Of course, a traditional command line is still difficult for novice users. Even with training you need to memorize a lot of commands. A better solution is a hybrid. In the short (default) form you chain commands with pipes, with auto-completion to help you remember commands and arguments. In the expanded form you chain commands together with a visual, drawing-like tool, similar to OS X’s Automator. Switching between the modes is always possible.

Windows

Windows are still a good thing. Sometimes you need to resize them to see multiple things at once. However, they could be a lot more powerful and flexible than they are today.

First, every window should be a tab. Every window. Snap any window to any other window, just like Chrome tabs. Who cares if the tabs aren’t from the same application? We don’t have applications anymore anyway. If the user wants to snap a todo list window onto an email window, let them. User desires trump ancient technical architecture.

Second, windows should be pluggable. We’ve covered how an email app is really multiple pieces, including an inbox view and a message view. Sometimes you might want those views connected, as with a traditional email client. Other times you might want them separate. This should be as trivial as snapping them together or dragging them apart.

For more complex layouts the system should have patterns like ‘master view’ and ‘vertical accordion’. The user can pull out a new empty pattern, then drop the views where they like. We already do similar things in WordPress and other web editors. Let’s make it universal.

At first this might seem to come with some challenges. What if the user accidentally creates multiple inbox views? Well, so what? If I want an inbox on each screen of my computer, I can do that. Maybe I want an inbox that shows just work email on my first screen, and personal email on my second. Maybe I want an inbox that shows just the emails around a particular project, or from a particular person. These are all just different database queries, so why not? If I want to set up my windows that way, I should be able to do so. The computer must adapt to how the human works, not the other way around.

The Window Manager

Now the window manager itself. A WM has a bunch of duties. It must render windows (obviously). It must let you move and resize them. It must handle notifications. It must manage the graphics card. It (usually) implements transparency and special effects. It shows dialog boxes. It starts and stops applications. There really are a lot of things required of a modern window manager.

Because of this complexity, many operating systems divide this task up in various ways. Some move app launching to a separate launcher system but still close apps with the window manager. Some split the drawing of windows from the moving. Some put notifications in a separate process. All of these are good ideas, but they don’t take things far enough. For the IdealOS we should explicitly chop the window manager into different pieces. When I say explicitly, I mean actual separate processes that communicate through fully documented APIs. Documented means hackable, and hackable means we can start extending the desktop in interesting ways.

Proposed Window Manager Architecture

Compositor: Individual applications draw to an offscreen buffer or directly to a texture in the OpenGL context. The compositor draws these buffers and textures to the real screen. Since only the compositor has access to the real screen, only the compositor can do interesting effects like Mac OS X’s Exposé system. For hackability, the compositor must expose a (protected) API to manipulate windows and apply shader effects.

Window Manager: This draws the actual window controls and handles moving and resizing. It should be easily swappable to support theming and playing around with interaction ideas. The window manager is also in charge of deciding where new windows go when they are created. We’ll get back to this in a second.

Launcher: This is an interface for launching apps. It is a separate program, but it still starts apps through the app service. It has no extra privileges; any other program could launch an app just the same. This means we can have multiple launchers at the same time, e.g. a big dock bar and a global search field (like the new Spotlight system in OS X Yosemite).

Notification Manager: The actual processing of notifications can happen in a separate notification manager, but showing the notifications on screen should happen in the window manager, because it’s the thing that decides where windows go and when. We’ll cover the details of notifications in a moment.
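
Pulling these pieces together, here is a sketch of what the documented messages between the processes could look like. All message names, fields, and the transport are invented for illustration.

```typescript
// All message names and fields below are invented; the point is only that
// each piece speaks a small, typed, documented protocol.
type CompositorMsg =
  | { kind: "window.create"; windowId: string; width: number; height: number }
  | { kind: "window.move";   windowId: string; x: number; y: number }
  | { kind: "effect.apply";  windowId: string; shader: string };

type LauncherMsg =
  | { kind: "app.start"; appId: string }
  | { kind: "app.stop";  appId: string };

type NotificationMsg =
  | { kind: "notify"; appId: string; title: string; body: string };

type Msg = CompositorMsg | LauncherMsg | NotificationMsg;

// Hypothetical transport; in practice this might be a Unix socket or ZeroMQ.
declare function sendTo(service: "compositor" | "launcher" | "notifications", msg: Msg): void;

// Because the APIs are documented, any program can do what the dock or the
// notification daemon does.
sendTo("launcher", { kind: "app.start", appId: "org.idealos.mail" });
sendTo("notifications", { kind: "notify", appId: "org.idealos.mail", title: "New mail", body: "3 unread messages" });
```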

The Fun Begins

Window Placement

What interesting things can we do with our new system? The first thing is to let the window manager be smarter about placing windows. xmonad has some good ideas about tiling windows. When you are doing a lot of work it’s common to want multiple windows at once laid out without overlapping. Sometimes you might want a grid. Other times two columns. Those can be just a keypress away with an xmonad style window manager.
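
As a sketch of what “a keypress away” could mean, here are two toy layout functions that compute non-overlapping window rectangles for a grid and a two-column arrangement. This is illustrative only, not how xmonad actually implements its layouts.

```typescript
// Tiling sketch: given N windows and a screen, compute non-overlapping rects.
interface Rect { x: number; y: number; w: number; h: number; }

function gridLayout(count: number, screen: Rect): Rect[] {
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  const w = screen.w / cols, h = screen.h / rows;
  return Array.from({ length: count }, (_, i) => ({
    x: screen.x + (i % cols) * w,
    y: screen.y + Math.floor(i / cols) * h,
    w, h,
  }));
}

function twoColumnLayout(count: number, screen: Rect): Rect[] {
  const left = Math.ceil(count / 2), right = count - left;
  const w = screen.w / 2;
  const rects: Rect[] = [];
  for (let i = 0; i < left; i++)
    rects.push({ x: screen.x, y: screen.y + i * (screen.h / left), w, h: screen.h / left });
  for (let i = 0; i < right; i++)
    rects.push({ x: screen.x + w, y: screen.y + i * (screen.h / right), w, h: screen.h / right });
  return rects;
}
```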

The window manager should be smarter about placing dialogs. Ideally we wouldn’t have dialogs at all, but they are sometimes needed. The WM should be smart about placing them so they don't obscure the content.

Current OSes have three strategies for window placement: attach the dialog to the app that opened it (save/open dialogs), center the dialog on the screen, or simply don’t use dialogs at all (90% of the iPad’s solution). A few WMs will take into account the available empty space on screen, but this is still very primitive. They consider the size of the new window but not its content.

This paper by Ishak & Feiner called Content Aware Layout has some great ideas. If the window manager considers the contents of windows, then it can be smart about placing new ones. If a background window has large blank spaces, then use that area for the new window, possibly with some transparency.
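
Here is a rough sketch of that idea (not the paper’s actual algorithm): score candidate positions by how much busy background content they would cover and pick the emptiest spot. The contentDensity function is a hypothetical stand-in for sampling the background window textures.

```typescript
// Content-aware dialog placement sketch: put the new window over the least
// important background pixels.
interface Rect { x: number; y: number; w: number; h: number; }

// Hypothetical: returns 0 for blank background regions, 1 for busy ones.
declare function contentDensity(screenRegion: Rect): number;

function placeDialog(dialog: { w: number; h: number }, screen: Rect): Rect {
  let best: Rect = { x: screen.x, y: screen.y, w: dialog.w, h: dialog.h };
  let bestCost = Infinity;
  const step = 50;                       // coarse grid of candidate positions
  for (let x = screen.x; x + dialog.w <= screen.x + screen.w; x += step) {
    for (let y = screen.y; y + dialog.h <= screen.y + screen.h; y += step) {
      const candidate = { x, y, w: dialog.w, h: dialog.h };
      const cost = contentDensity(candidate);
      if (cost < bestCost) { bestCost = cost; best = candidate; }
    }
  }
  return best;   // the spot that obscures the least content
}
```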

When searching your operating system for a particular word, you can search the contents of windows too. The window manager can zoom in on just those windows. Even better, it can show just a subset of each window: the part containing the found word. Windows are just bitmaps; we can slice and dice them to do all sorts of cool things.

Here’s another example: When copying text from one document to another the WM could help the user maintain their mental state by showing the windows involved. Move to window A. Select and copy some text. Now move to window B. The WM knows you are in the middle of a copy and paste action, so it can shrink but not hide Window A. That way you are always aware of where your clipboard content came from. The WM can also show you the current clipboard contents in a floating window.

This brings us to another horrible pain point of desktop operating systems: the clipboard.

Copy, Paste, and the Clipboard

Why should humans have to remember what is stored in a hidden data structure called the clipboard (or the pasteboard, for you old-school mac-enzies)? We should make it visible and relieve the human of this burden. A visible clipboard could also show previously copied contents. The clipboard should be a persistent, infinitely long data structure, not just a single slot.

Did you copy something the other day but can’t remember what or where? Just look in the clipboard’s history. Copying multiple things at once becomes trivial. Grab content from four different sources then paste them together into a new document. The clipboard isn’t a hidden box that holds only one chunk of data. Now it becomes a shelf or tray that holds the many things you are working with right now. Pick up what you need then place it all in the final destination. Gather and place, not copy and paste.
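
Here is a minimal sketch of a clipboard-as-history, assuming the OS keeps every copied item as a record; persistence and the on-screen shelf are out of scope, and all names are invented.

```typescript
// Clipboard as an append-only history rather than a single slot.
interface Clip {
  id: number;
  mimeType: string;        // "text/plain", "image/png", ...
  data: Uint8Array;
  sourceWindow: string;    // where it was copied from
  copiedAt: Date;
}

class Clipboard {
  private history: Clip[] = [];
  private nextId = 0;

  copy(clip: Omit<Clip, "id" | "copiedAt">): Clip {
    const entry = { ...clip, id: this.nextId++, copiedAt: new Date() };
    this.history.push(entry);        // nothing is ever overwritten
    return entry;
  }

  latest(): Clip | undefined { return this.history[this.history.length - 1]; }

  // "Gather and place": pick up several clips at once for a multi-paste.
  gather(ids: number[]): Clip[] {
    return this.history.filter((c) => ids.includes(c.id));
  }
}
```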

Working Sets

Finally, the window manager should implement working sets. A working set is a set of documents, resources, applications, or whatever that the human is currently using to do something.

In the ideal world, when faced with a task like making a presentation on Disneyland, the human would search through the library for some books, find some photos on the web, read a few articles, then distill all of this into a single document. When done, the human puts everything away, prints out the final document, and moves on to the next task. Very neat and orderly.

Of course, in the real world that doesn’t happen. We build up a collection of notes over a few months, and probably a stack of books related to the problem. Over a few days we read the books and articles and collect the quotes and images, then put it all on hold as other projects come up. You might be in the middle of writing when a phone call comes in, or a screaming child needs lunch, and then you finally come back to your office wondering what you were in the middle of.

When the project is finally over, the books and notes hang around the office until the annual cleanup. Real world work is messy and full of constant interruptions. Our tools should accept this reality and help us rather than hinder us.

A good window manager would let you group windows by topic and help you focus on a single task when you need to focus. One way to do this is to have multiple virtual screens, with each screen dedicated to a particular project. The screen can contain not just the windows of active documents but also all of the research files collected for that project. It will contain all of the emails and chat windows related to that project and no others.

A project screen is really a topic-specific slice of everything on your computer, across all applications and data types. Furthermore, such a screen can be saved and reloaded later, possibly months later. Remembering where you were in that open source project after a six-week hiatus will be easy: just load up the workspace and everything related to the project, even emails and GitHub issues, will come up on a single screen.
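
Here is a sketch of what a saved working set might contain and how restoring one could work; the record shape and the two service calls are hypothetical.

```typescript
// A working set: a named, topic-specific slice of windows, documents, and
// queries that can be reloaded months later.
interface WorkingSet {
  name: string;                       // "Disneyland presentation"
  screens: ScreenLayout[];            // which window goes where
  documents: string[];                // database ids of open documents
  queries: string[];                  // saved queries: project email, issues, chats
}

interface ScreenLayout {
  screen: number;
  windows: { docId: string; rect: [number, number, number, number] }[];
}

// Hypothetical services the window manager would call to restore the set.
declare function openDocument(docId: string, rect: [number, number, number, number]): void;
declare function runQuery(queryId: string): void;

function restore(ws: WorkingSet): void {
  for (const layout of ws.screens)
    for (const win of layout.windows) openDocument(win.docId, win.rect);
  for (const q of ws.queries) runQuery(q);   // emails, issues, chats come back too
}
```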

This paper by Keith Edwards has some great ideas on the topic.

Special Effects

By giving the window manager full control of the screen, combined with a good API, we can do amazing things on a modern GPU that were infeasible just a decade ago. After all, a window is simply a bitmap in GPU RAM. It can be manipulated and distorted like any other texture. We could make an area of the screen a black hole with all windows stretched as they approach it. We could render a window with icicles or dust on it to indicate how long it has been since the user interacted with it.

Snowflakes and other particle effects are trivial to implement on the GPU but the real power will be in manipulating windows automatically for the human. A window that would be partly off the screen could be distorted hyperbolically instead. While the text would be squished it would still be readable enough for the user to get the gist of it. When the user wants that window they just click on it and it stretches back to normal size.

How about window zooming? In a web browser I can zoom any page with the + or - buttons, and any web-based app can do the same. I often have multiple sizes of text at once in different browser windows. What if this weren’t restricted to just web pages? Any app should be able to respond to a zoom event by increasing its base font size. If all layout and window sizes are derived from the base font (as they should be), then the app will zoom just like a webpage. For apps which don’t support zooming, for whatever reason, the texture itself could be zoomed. While this would result in some blurriness, modern GPUs do a very good job of smoothing zoomed textures, and it won’t be an issue at all on the new HiDPI screens just arriving on the market.
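
A small sketch of that zoom behavior, assuming a hypothetical window interface: scale the base font when the app supports it, and fall back to scaling the texture when it doesn’t.

```typescript
// Universal zoom event: crisp re-layout when possible, texture scaling otherwise.
interface ZoomableWindow {
  supportsZoom: boolean;
  baseFontSize: number;                     // everything else is derived from this
  setBaseFontSize(px: number): void;
  setTextureScale(factor: number): void;    // GPU-side fallback, may blur
}

function zoom(win: ZoomableWindow, factor: number): void {
  if (win.supportsZoom) {
    win.setBaseFontSize(win.baseFontSize * factor);   // app re-lays itself out
  } else {
    win.setTextureScale(factor);                      // blurry but works for any app
  }
}
```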

Distorting Input

You may have seen effects like those I’ve described on Linux desktops using Compiz, a compositing window manager for the X Window System. The effects look cool, but they are largely useless because of a major flaw in X’s design. The window manager can control the output of windows - the actual bitmaps - but it cannot control the input. No matter how the windows are distorted, the apps themselves still receive input events as if nothing had changed. This means clicking on what appears to be a scaled button may instead send the mouse event to another part of the window. To fix this problem, both input and output must go through the window manager so that it can keep them in sync.
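
Here is a tiny sketch of that sync requirement using a simple scale-and-offset transform: whatever transform the compositor uses to draw a window, input routing must apply the inverse before delivering events to the app.

```typescript
// Keeping input and output in sync: draw with a transform, route input
// through its inverse.
interface Transform { scale: number; offsetX: number; offsetY: number; }

// Window space -> screen space (what the compositor uses to draw).
const toScreen = (t: Transform, x: number, y: number) =>
  ({ x: x * t.scale + t.offsetX, y: y * t.scale + t.offsetY });

// Screen space -> window space (what input routing must use).
const toWindow = (t: Transform, x: number, y: number) =>
  ({ x: (x - t.offsetX) / t.scale, y: (y - t.offsetY) / t.scale });

// A click at screen (500, 300) on a window drawn at half size, offset (400, 200),
// must reach the app as window-local (200, 200), not (500, 300).
const t: Transform = { scale: 0.5, offsetX: 400, offsetY: 200 };
console.log(toWindow(t, 500, 300));   // { x: 200, y: 200 }
```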

The sad thing is that these problems were identified and solved decades ago. This research paper, which I helped with as my senior project in 1997, talks about the problem and the solution: window modification must apply to both input and output.

Could we really build this?

We have to write everything from scratch. By not being backward compatible with anything, we can’t reuse existing programs. I think that’s okay, actually. iOS was built with all-new apps too. Existing code doesn’t matter as much as we think. It’s the ideas and protocols that matter. That’s what we get to reuse.

Versioning

How do we version modules? If your editor experience is actually the combination of 10 different modules working together, how are they upgraded? Do we have a fixed API between them that never changes? Could one module upgrade break the rest? Have we reinvented classpath hell? This new OS design doesn’t fix these issues, but it does make them explicit. We already have these problems today, but they are solved in ad hoc, inconsistent ways. The new OS would make dependencies in the system explicit, forcing us to deal with them. I expect we’d end up with a system like Firefox’s, where you have different channels to get modules from depending on the amount of risk you are comfortable with, probably with NPM-like semantic versioning.
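
As a sketch of what explicit dependencies could look like, here is a hypothetical module manifest with a release channel and NPM-style semver ranges; the format is invented for illustration.

```typescript
// Hypothetical module manifest: every module declares what it provides and
// what it requires, with explicit version ranges and a release channel.
interface ModuleManifest {
  name: string;
  version: string;                       // "2.3.1"
  channel: "stable" | "beta" | "nightly";
  provides: string[];                    // documented interfaces this module exports
  requires: Record<string, string>;      // interface -> acceptable semver range
}

const messageView: ModuleManifest = {
  name: "message-view",
  version: "2.3.1",
  channel: "stable",
  provides: ["email.message-view@2"],
  requires: {
    "email.inbox-query@1": "^1.4.0",     // any 1.x at or above 1.4.0
    "wm.window-slot@3": "^3.0.0",
  },
};
```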

So could we really build this? Yes, I think we could. However, rather than trying to reinvent absolutely everything we should start with a bare Linux system; similar to Android but without the Java stuff. Then add a good lightweight document database (CouchDB?) and a hardware accelerated scene graph (Amino?). Then we need an object stream oriented programming system. I’d suggest Smalltalk or Self, but NodeJS is more widely supported and has tons of modules. Perhaps ZeroMQ as the IPC mechanism.

The exact building blocks don’t matter very much, as we will probably change them over time anyway. The important thing is that we build an OS with a cohesive vision and consistent metaphors. Let’s bring back the idea that the users and their data are the central parts of a computing environment, not applications and system plumbing. Let’s make machines for getting stuff done, not for babysitting hardware. Let’s make workstations again.

Talk to me about it on Twitter

Posted January 10th, 2015

Tagged: idealos