Artificial Intelligence algorithms can do some pretty incredible things. Chess and Go can be mastered to superhuman levels in just a few days. Fluid dynamics and protein folding can be modelled in AI much faster than they can be rendered using physics equations. Very pixelated images can be magically upscaled to high resolution¹.
AIs can even render scenes almost indistinguishable from reality from medium-graphics video game inputs in real time, which is likely to drastically change video games and virtual reality in the next few years.
These are all incredibly useful tools. But they are specialized, in that each masters one set of things. The protein folding algorithm won’t give coherent answers when asked about chess, and the image up-scaler won’t help much in calculating hypersonic airflows. That generality is an elusive AI problem that, if solved, opens a world of useful applications. (And yes, dangers. I happen to think the physical dangers are overblown, but the psychological dangers are underappreciated.)
The most impressive thing that any natural neural net (the brain in a person or animal) accomplishes is simply the navigation of a complex world. Natural neural nets excel at this. We are probably decades away from a robot that could change a diaper safely, put a leash on a dog that is excited to go outside, or figure out where a screw bounced to when dropped under a couch.
Humans and higher animals move through the world with purpose and incredible flexibility. I like to think about wildebeests.
Wildebeests corporately decide that they need to get from point A to point B, hundreds of miles away. This involves some sort of knowledge, likely passed down socially from successful migrations as a kind of tribal memory. And this compulsion has been genetically or socially selected to be quite strong, because it compels them to cross rivers with crocodiles. These animals are not stupid. They know danger. But they have a purpose of survival, and that wars in them as they stand on the bank staring at river monsters in the water and greener pastures on the other side.
Somehow that decision is made, and a totally separate set of incredible calculations occurs. Navigating a riverbank in the presence of dozens of others is an astonishing computational feat. While Boston Dynamics robots are increasingly amazing, they currently don’t match one hundredth the dexterity it takes to move with a panicked herd down a steep bank into water, transitioning into swimming. All while being chased by a predator.
But how does a young wildebeest gain such amazing dexterity, quick analytical skills (such as they are), and the host of other instincts that make it capable of deftly navigating a world that even our largest computer systems would find impossible to walk through?
Our artificial intelligences can follow in the hoofprints of these natural intelligences: parenting.
An infant wildebeest raised in complete isolation could probably learn to walk and do some other useful things, just as AIs have learned to teach themselves to walk. But what matters, and why it matters, is learned intuitively by watching more experienced wildebeests. That experience is absorbed from the boldness and hesitancy and movement and sound patterns of the adults. All of that is incorporated into the neural net (brain) of the maturing wildebeest. And that deep understanding of purpose and meaning is necessary for navigating the world.
Lugging Around AI Backpacks
And so what exactly am I proposing? Now that GPT-3 and DALL-E and ChatGPT are becoming so incredibly good at interfacing with humans using words and pictures, a new opportunity opens up: AI Sherpa Backpacking.
In order to train an AI to gain intuition about navigating the real world (geometrically, socially, and maybe even purpose-wise), we create an AI interface backpack. The most basic version of this Baby AI Backpack will contain a microphone and speaker, a camera, some local compute for the neural net, and lots of recording space. This allows the AI to see in some directions, listen to the user and natural surroundings, and converse with the user.
A more realistic version of this would have the cameras mounted on or near the user’s head (looking in multiple directions, but with emphasis on the human’s point of view). Bluetooth earbuds could handle the microphone and speaker functionality. If a strong internet connection is available, much of the near-real-time neural net computation for communication, and all of the data recording, could be pushed to cloud compute and storage. More likely, a hybrid solution: real-time AI interaction compute stays on the human, data storage lives mostly off-human, and incorporation of the data into neural net training happens offline (at night or whenever). Amusingly, this mimics what some people think young human sleep cycles accomplish: incorporating the day’s events into neural narrative/relationship form while selecting what “matters” and discarding what “doesn’t matter.”
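That hybrid split can be sketched in code. Everything below is a hypothetical illustration under my own assumptions (the class name, the queue-and-sync design, the stand-in training step); no such system exists yet:

```python
import time
from collections import deque

class SherpaBackpack:
    """Hypothetical on-person node: real-time interaction runs locally,
    raw recordings are queued for off-person storage, and training
    happens offline later (the 'sleep cycle')."""

    def __init__(self):
        self.upload_queue = deque()   # data awaiting off-person upload
        self.archive = []             # stand-in for off-person storage

    def observe(self, frame, audio):
        # Real-time path: must run locally, with low latency.
        reply = self.respond_locally(audio)
        # Logging path: latency-tolerant, so it is merely queued.
        self.upload_queue.append({"t": time.time(), "frame": frame, "audio": audio})
        return reply

    def respond_locally(self, audio):
        # Placeholder for the on-device conversational model.
        return f"heard: {audio}"

    def sync(self):
        # Push queued recordings off-person when connectivity allows.
        while self.upload_queue:
            self.archive.append(self.upload_queue.popleft())

    def offline_training_pass(self):
        # Offline incorporation of the day's data; counting examples
        # stands in for an actual training step here.
        return len(self.archive)
```

The design choice being illustrated is the latency split: conversation cannot wait for the network, but storage and training can.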
The human would be an AI Sherpa. But the physical load-carrying is not the real value the human brings.
The value of the AI Sherpa is that this human brings inferential meaning and understanding to the AI. The conversations that happen in reaction to observed events are what train the AI to think more like a human, enabling it to navigate the real world.
In many ways, the AI Sherpa would communicate to the AI the data the human gets from having both a body and the experience of using the body efficiently.
Useful Examples
Here I will describe some useful examples of…useful examples for the AI.
How to cross the road - The human can talk through how to safely cross roads in many different situations: pushing crosswalk buttons, crossing when there is no specified place to do so, intuiting when cars are letting you go across in front of them, and recognizing unsafe situations where crossing is a very bad idea.
How to tell white lies for social lubrication - Answering a person’s question about how they look, or withholding social information, are unfortunate parts of being human that are actually ways to keep things friendly and kind. At a formal company dinner event, a person may ask you what you think of the religious impact of a new government law. In most cases, the correct answer is to demur and try to steer toward less controversial waters. This can be done with the person explaining why to the AI. Not letting Sally know what Terry really thinks of her by giving a non-committal response could also be explained, as could the social constructs that necessitate the sensitivity of the response.
How to put things into a plastic bag - Now, I don’t mean that this is some profound skill that rounds out any strong AI. Think of it as an exemplar of a broad class of problems. You would be shocked at how difficult it is for robots to handle floppy things. We humans have developed intuitive methods for dealing with t-shirts and grocery bags and relaxed cats. This sort of object handling would trip up almost all real-world handling and navigation done by robots. But an AI could learn by watching how humans slide the end of a garbage bag together to separate the opening, whip the bag open using fast arm movement and wind power, and then stick one arm in to push the bag down into the can.
But the biggest benefit that AIs will gain from their Sherpas is purpose and meaning. Positive comments by the human that are linked in time to imagery, location, and following actions perceived by the AI provide the raw training inputs to teach the neural nets and transformers what matters to humans.
Vocalizing the desire to stay dry, while avoiding rain with an umbrella and altered path choices, teaches the AI that getting wet is bad. Explaining why it’s bad to be late to a business meeting provides a useful punctuality weighting. Even telling the AI when it’s okay to honk at red lights, and what type of honk to use, would be useful.
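The time-linking described above could, as a first sketch, be nothing more than pairing each sensor event with Sherpa comments spoken within a few seconds of it. The function name, record layout, and five-second window below are all illustrative assumptions, not a real training pipeline:

```python
WINDOW_S = 5.0  # assumed: a comment within 5 s of an event labels it

def label_events(events, comments, window=WINDOW_S):
    """Pair each sensor event with any spoken comment whose timestamp
    falls within `window` seconds, yielding weak training pairs."""
    pairs = []
    for ev in events:
        for c in comments:
            if abs(ev["t"] - c["t"]) <= window:
                pairs.append((ev["obs"], c["text"]))
    return pairs

# Fabricated example data: two observed events, two nearby comments.
events = [
    {"t": 100.0, "obs": "umbrella opened, path changed"},
    {"t": 400.0, "obs": "stopped at red light"},
]
comments = [
    {"t": 101.5, "text": "getting wet is bad"},
    {"t": 402.0, "text": "a short honk is ok here"},
]

pairs = label_events(events, comments)
# Each event is now paired with the human's nearby explanation.
```

Real systems would need something smarter than a fixed window, but the core idea is the same: the human’s words become labels for whatever the sensors saw at that moment.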
Imagine how hard it would be to train an AI that going into panic mode and putting oneself at high risk is worth it in order to stop a toddler from walking into traffic. This would be very difficult to do without AI Sherpas. Not because the geometry could not be simulated, but because the human’s feeling and the instant action are core to the correct response.
Which brings up a further point. The AI Sherpa 3.0 version should include heart rate monitors, pupil dilation tracking, galvanic skin sensors, and other stand-ins for emotional measurement. A lot of this could also be conveyed by analyzing voice pitch and timbre. By letting the AI see the correlations between galvanic skin response and, say, seeing a woman in a red dress, the AI can be brought to an even deeper emotional understanding of human experience.
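A minimal sketch of such a correlation check, using a hand-rolled Pearson coefficient. The signal names and every number here are fabricated for illustration; a real system would work over millions of aligned samples:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# 1 where the camera detected the salient event, 0 otherwise (made up).
saw_event = [0, 0, 1, 1, 0, 1, 0, 0]
# Skin conductance samples aligned to the same timestamps (made up).
gsr = [0.2, 0.3, 0.9, 0.8, 0.2, 0.7, 0.3, 0.2]

r = pearson(saw_event, gsr)  # near 1.0: arousal tracks the event
```

A high coefficient is the machine-readable version of what the Sherpa’s body is silently teaching: this sight, for this human, matters.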
In this way, AIs can hope to gain EQ², not just IQ.
Does any of this sound familiar? This is how we raise our children. This is a large part of how humans gain their superpowers of world navigation and purposefulness.
If we want to give these gifts to our AI “children,” then being AI Sherpas might be the very best way to impart them.
1. I should have used an AI up-scaler for the banner pic of this article.
2. Emotional intelligence quotient.