This presentation may be hazardous or unsettling

This talk may be a little challenging. We're going to see some things that tickle the brain. Some images that are a little bit disturbing, although it won't be really clear why they trouble us so. We're going to talk about why. And we're going to talk about things we do not talk about. There's all sorts of things that pour downriver when you open locks like that. I really want you to be here for it. But if you can't be here for it, please: take care of yourself. If you can...
Hi. My name is Ashi. I'm a senior software engineer at GitHub, and that has nothing whatsoever to do with the talk you're about to hear. Today, I'm going to talk about a hobby. I'd like to share some things I've learned at the intersection of computational neuroscience and artificial intelligence. For years, I've been fascinated by how we think. How we perceive. How the machinery of our bodies results in qualitative experiences. Why our experiences are shaped like this and not that. Why we suffer. And for years, I've been fascinated by AI. Haven't we all? We're watching these machines begin to approximate the tasks of our cognition, in sometimes unsettling ways. Today, I want to share with you some of what I've learned. Some solid research. Some solid speculation. All of it speaks to a truth I have come to believe: we are computations our worlds created on an ancient computer, powerful beyond imagining. Let's begin.
Part one. Hallucinations.
cat dog person banana toaster
This person is... ...Miquel Perelló Nieto. And he has something to show us. It starts with simple patterns — splotches of light and dark, like images from the first eyes. These give way to lines and colors. And then curves, more complex shapes. We're diving through the layers of the Inception image classifier, and it seems there are worlds in here. Shaded, multichromatic hatches. The crystalline farm fields of an alien world. Cells of plants. To understand where these visuals are coming from, let's look inside. The job of an image classifier is to reshape its input... ...which is a square of pixels... into its output... ...a probability distribution. The probability that the image contains a cat. The probability of a dog, a banana, a toaster. It performs this reshaping through a series of convolutional filters. Convolutional filters are basically Photoshop filters. Each neuron in a convolutional layer has a receptive field: a small patch of the previous layer from which it takes its input. Each convolutional layer applies a filter. Specifically, it applies an image kernel. A kernel is a matrix of numbers, where each number represents the weight of the corresponding input neuron. Each pixel in each neuron's receptive field is multiplied by its weight, and then we sum them all to produce this neuron's value. The same filter is applied for every neuron across a layer. And the values in that filter are learned during training. We feed the classifier a labeled image —something where we know what's in it— it outputs predictions, we math to figure out how wrong that prediction was, and then we math again, nudging each and every single filter in the direction that would have produced a better result. The term for that is: gradient descent. The Deep Dream process inverts this. This visualization is recursive. To compute the next frame, we feed the current frame into the network.
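Before we follow the recursion down, the two mechanisms just described—an image kernel swept across receptive fields, and nudging an input along a gradient—can be sketched in a few lines of pure Python. This is a toy (a hand-written edge kernel and a numerical gradient), not Inception's learned filters or real backpropagation:

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image: each output value is the
    weighted sum of one receptive field."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(len(image[0]) - kw + 1)]
            for i in range(len(image) - kh + 1)]

# A hand-written vertical-edge kernel (real filters are learned in training).
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]

# A tiny "image": dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1, 1]] * 3

print(convolve2d(image, edge_kernel)[0])  # → [0, -3, -3, 0]: only the edge responds

# Deep-Dream-style gradient ascent: nudge every input pixel in the direction
# that makes this layer's total activation climb (numerical gradient here;
# real frameworks backpropagate instead).
def total_activation(img):
    return sum(sum(row) for row in convolve2d(img, edge_kernel))

img, eps, step = [list(row) for row in image], 1e-4, 0.1
for _ in range(10):
    grads = [[0.0] * len(img[0]) for _ in img]
    for i in range(len(img)):
        for j in range(len(img[0])):
            img[i][j] += eps
            up = total_activation(img)
            img[i][j] -= 2 * eps
            down = total_activation(img)
            img[i][j] += eps
            grads[i][j] = (up - down) / (2 * eps)
    for i in range(len(img)):
        for j in range(len(img[0])):
            img[i][j] += step * grads[i][j]  # step *up* the gradient

print(total_activation(img) > total_activation(image))  # → True
```

In Deep Dream proper, the gradient comes from backpropagating through all the layers below, and the "activation" is a layer chosen deep inside Inception.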
We run it through the network's many layers until we activate the layer we're interested in. Then we math: how could we adjust the input image to make this layer activate more? And we adjust the image in that direction. The term for that is: gradient ascent. Finally, we scale the image up very slightly before feeding it back into the network. This keeps the network from just enhancing the same patterns in the same places. It also creates the wild zooming effect. Every 100 frames, we move to a deeper layer. Or a layer to the side. Inception has a lot of layers. And that gives us this. We started with these rudiments of light and shadow, and now we sort of have a city-of-Kodamas situation happening. Then we enter the spider observation area, in which spiders observe you. But it's okay, because soon the spiders become corgis. And the corgis become the 70s. Later, we will find a space of nearly-human eyes. Which become dog slugs. And dog slug birds. Unfortunate saxophonist teleporter accidents. And finally, the flesh zones, with a side of lizards. When I first saw this, I thought it looked like Donald Trump. And I resolved to never tell anyone, until my best friend said the exact same thing. Says more about the state of our neural networks than this one. (I think it's the lizard juxtaposition.) But I do want you to notice, and think about what it means, that all this skin is so very white. So this is all... preeeety trippy. Why is that? What does it mean, for something to be trippy? To figure that out, let's take a look inside... ...ourselves. Meet Skully. Skully doesn't need all this cruft. We're just looking at Skully's visual system. Which starts here: in the retina. Your retina, our retinas... are weird. Light comes into them, and immediately hits a membrane. Then there's a layer of ganglions, which are not generally photosensitive—though some of them are a little.
There's a layer of some more stuff that does important things. And then at the back of your retina are your photoreceptors: rods and cones. So light comes in, winds its way through four layers of tissue, and hits a photoreceptor. That photoreceptor gets excited. It sends out a signal to its ganglions, which send it... where? Oh, right. To the optic nerve, routed right through the center of our eye. We mounted the sensor backwards and drilled a hole through the center. It's okay. We can patch it up in software. There are a couple of other problems here, too. One: our retinas have 120 million luminance receptors (rods) and 6 million color receptors (cones). There are about 10 times fewer ganglions. Two: our optic nerve has about 10 Mbps of bandwidth. So we're trying to stream the video from this hundred-megapixel camera through a pipe that's slower than WiFi. Our retinas do what you might, if faced with such a problem: they compress the data. Each ganglion connects to a patch of 100 or so photoreceptor cells—its receptive field—divided into a central disk and the surrounding region. In, and out. Center and surround. When there's no light on the entire field, the ganglion doesn't fire. When the whole field is illuminated, it fires weakly. When only the surround is illuminated, about half the ganglions fire rapidly, and half don't fire at all. That other half of ganglions behaves in exactly the opposite way: they fire wildly only when their center field is illuminated and their surround is dark. Taken together, these ganglions constitute an edge detection filter. We are doing processing even in our eyeballs. This processing lets us downsample the signal from our photoreceptors a hundred times while retaining vitally important information: where the boundaries of objects are. Then the signal goes through the brain.
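That center-surround arithmetic is easy to play with. A pure-Python toy, with a 3-pixel 1D receptive field standing in for the ~100-photoreceptor disk:

```python
def on_center_response(field):
    """Toy on-center ganglion: excited by its center, inhibited by its
    surround; rectified, because firing rates can't go negative."""
    left, center, right = field
    return max(0, 2 * center - left - right)

# A 1D "retina": a dark region meeting a bright region.
signal = [0, 0, 0, 1, 1, 1]
responses = [on_center_response(signal[i - 1:i + 2])
             for i in range(1, len(signal) - 1)]
print(responses)  # → [0, 0, 1, 0]: quiet everywhere except at the boundary
```

Uniform light (all 1s) gives zero here, where a real ganglion fires weakly; and the off-center half of the population, with the signs flipped, catches the opposite polarity of edge.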
It hits the optic chiasma, where the data streams from your left and right eyes cross, giving us 3D stereo vision. It's processed by the thalamus, which is responsible, amongst other things, for running our eyes' autofocus. Each step of this signal pathway is performing a little bit of processing, extracting a little something. And that's all before we get to... the visual cortex. All the way around here, in the back. Our visual cortex is arranged into a stack of neuronal layers.
The signal stays relatively spatially oriented through the visual cortex. So there's some slice of tissue in the back of the brain that's responsible for pulling faces out of this particular chunk of your visual field—with the redundancies and slop that we've come to expect from any NN, whether artificial or biological. Each neuron in a layer has a receptive field—some chunk of the entire visual field that it's “looking at”. Neurons in a given layer respond the same way to signals within their receptive field. That operation, distributed over a whole layer of receptive fields, extracts features from the visual signal. First simple features, like lines, and curves, and edges, and then more complex ones, like gradients and surfaces and objects, eyes, and faces. It's no accident that we see the same behavior in Inception—convolutional neural networks were inspired by the structure of our visual cortex.
Our visual cortex is of course different from Inception in... many ways. Inception is a straight shot through—one pass, input to output. The visual cortex contains feedback loops—pyramidal neurons that connect deeper layers to earlier ones. These feedback loops allow the results of deeper layers to inform the behavior of earlier ones. For example, we might turn up the edge detection gain where there's an object. This lets our visual system adapt and focus—not optically, but attentionally. It gives it the ability to ruminate on visual input, well before we become consciously aware of it, improving predictions over time. You know this feeling: thinking you see one thing, and then realizing it's something else. These loopback pyramidal cells in our visual cortex are covered in serotonin receptors. Different kinds of pyramidal cells respond to serotonin differently. Generally, they find it exciting. And don't we all? You might be familiar with serotonin from its starring role as the target of typical antidepressants, which are serotonin reuptake inhibitors—when serotonin gets released into your brain, they make it stick around longer, thereby treating depression (some side effects may occur). Most serotonin is located in your gut, where it controls bowel movement. It signals to your gut that it's got food in it and should go on and do what it does to food. What the molecule signals throughout your body: resource availability. And for animals with complex societies, like us, resources can be very abstract—social resources as well as energetic ones. That your pyramidal cells respond excitedly to serotonin suggests that we focus on that which we believe will nourish us. It's not correct, as a blanket statement, to say that pyramidal cells are excited by serotonin. In fact, there are different kinds of serotonin receptors, and their binding produces different effects. 5-HT1A receptors tend to be inhibitory. 5-HT3 receptors in the brain are associated with a sensation of queasiness and anxiety.
In the gut, they make it run… backwards. Anti-nausea drugs are frequently 5-HT3 antagonists. There’s another serotonin receptor, one that the pyramidal cells in your brain find particularly exciting.
This is the 5-HT2A receptor. The primary target for every known psychedelic drug. It is what enables our brains to create psychedelic experiences. So you go to a show and you eat a little piece of paper, and that piece of paper makes its way down into your stomach, where it dissolves, releasing molecules of LSD into your gut. LSD doesn't bind to 5-HT3 receptors, so if you feel butterflies in your stomach, it's likely just because you're excited for what's going to happen. What's about to happen is: LSD will diffuse into your blood. LSD has no trouble crossing the blood-brain barrier. It is tiny, but POWERFUL. Like you. It will diffuse deep into your brain, into your visual cortex, where it finds a pyramidal 5-HT2A receptor and locks into place. The LSD molecule stays bound for around 221 minutes (https://www.cell.com/cell/pdf/S0092-8674(16)31749-4.pdf)—about four hours. That's an astonishingly long time. They think a couple of proteins snap in and form a lid over top of the receptor, trapping the LSD inside. This would help explain why LSD is so very potent, with typical doses around a thousand times smaller than those of most drugs. And while it rattles around in there, our little LSD is stimulating a feedback loop in your visual cortex. It sends the signal: Pay attention. What you're looking at may be nourishing. The pattern-finding machinery in your cortex starts to run overtime, and at different rates. In one moment, the pattern in a tapestry seems to extend into the world beyond it; in the next, it is the trees that are growing and breathing, the perception of movement a visual hypothesis allowed to grow wild. With Deep Dream, we asked what would excite some layer of Inception, and then we adjusted the input image in that direction. There's no comparable gradient ascent process in biological psychedelic experience. That's because we aren't looking at a source image—we're looking at the output of the network. We are the output of the network.
The output of your visual cortex is a signal carrying visual perceptions—proto-qualia—which will be integrated by other circuits to produce your next moment of experience. Inception never gets that far. We never even run it all the way to the classification stage—we never ask it what it sees in all this. But we could. We could perform the amplification process on a final result, rather than an intermediate one.
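A minimal sketch of that amplify-the-final-output idea, using a toy linear scorer (the weights and feature values are invented; real attacks, like the fast gradient sign method, perturb along the network's gradient):

```python
def score(x, w):
    """Toy linear classifier: positive score reads "dog", negative "skier"."""
    return sum(xi * wi for xi, wi in zip(x, w))

w = [0.5, -1.0, 0.25, 0.8]   # hypothetical learned weights
x = [1.0, 1.0, 1.0, 0.1]     # an input the model scores as "skier"

# For a linear model, the gradient of the score with respect to the input
# is just w — so step each feature slightly in the sign of its weight.
eps = 0.3
x_adv = [xi + eps * (1 if wi > 0 else -1) for xi, wi in zip(x, w)]

print(score(x, w), score(x_adv, w))  # the small nudge flips the score's sign
```

The perturbation is tiny per feature, but aligned with the model's sensitivities—which is why adversarial stickers can look nearly innocuous to us while being emphatic to the network.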
Maybe we ask, “What would it take for you to see this banana as a toaster?" Or, “Say, don't these skiers look like a dog?" These are adversarial examples. Images tuned to give classifiers frank hallucinations. The confident belief that they're seeing something that just isn't there. They're not completely wild, these robot delusions. I mean, that sticker really does look like a toaster. And it's so shiny. And these skiers do kind of look like a dog, if you squint. See? There's the head, there's the body… A person might look at this and—if they're tired, far away, and drunk—think for a moment that it's a big dog. But they probably wouldn't conclude it's a big dog. The recurrent properties of our visual cortex—not to mention much of the rest of our brain—mean that our sense of the world is stateful. It's a continually refined hypothesis, whose state is held by the state of our neurons. Sara Sabour, introducing capsule networks, writes, "A parse tree is carved out of a fixed multilayer neural network like a sculpture is carved from a rock” ([Sabour 2017](https://arxiv.org/pdf/1710.09829.pdf)). Our perceptions are a process of continuous refinement. This may point the way towards more robust recognition architectures. Recurrent convolutional networks that ruminate upon images, making better classifications, or providing a signal that something is off about an input. There are adversarial examples for the human visual system, after all, and we call them optical illusions. And they feel weird to look at.
In this image, we can feel our sensory interpretation of the scene flipping between three alternatives: a little box in front of a big one; a box in a corner; and a box missing one. In this Munker illusion, there is something scintillating in the color of the dots—which are, of course, all the same. If we design CNNs with recurrence, they could exhibit such behavior as well. Which maybe doesn't sound like such a good thing, on the face of it. Let's make our image classifiers vacillating. Uncertain. But our ability to hem and haw and reconsider our own perceptions at many levels gives our perceptual system tremendous robustness. Paradoxically, being able to second-guess ourselves allows us greater confidence in our predictions. We are doing science in every moment, the cells of our brains continuously reconsidering and refining shifting hypotheses about the state of the world. This gives us the ability to adapt and operate within a pretty extreme range of conditions. Even while tripping face. Or... ...while asleep.
Part two. Dreams.
These are not real people. These are photos of fake celebrities, dreamt up by a generative adversarial network. A pair of networks which are particularly... creative. The networks get better through continuous, mutual refinement. It works like this: on the one side, we have the Creator. This is a deep learning network not unlike Inception, but trained to run in reverse. This network, we feed with noise. Literally, just a bunch of random numbers. And it learns to generate images. But it has no way to learn how to play this game —in the technical parlance, it lacks a gradient— without another network. Without an opponent. The Adversary. The Adversary is an image classifier, like Inception, but trained on only two classes: REAL and FAKE. Its job is to distinguish the Creator's forgeries from true faces. We feed this network with ground truth, with actual examples of celebrity faces. And the Adversary learns. And then, we use those results to train the Creator. If it makes a satisfying forgery, it's doing well. If its forgeries are detected, we backpropagate the failure, so it may learn. I should tell you that the technical terms for these networks are the Generator and the Discriminator. I changed the names, because names are important, and also, meaningless. They don't change the structure of the training methodology, which is incredibly powerful. A semi-supervised learning technique. Because, see, we haven't found every possible image that the generator might make and labeled them PLAUSIBLE and IMPLAUSIBLE. That would be an impossible task. Instead we have this process of recursive co-training, in which two circuits play a game, ruminating on a space of possibilities, and so extracting value from a relatively small amount of training data. A process that is perhaps helpful for neural circuits of all kinds. Though it does have some quirks.
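The shape of that training loop can be shown with a deliberately tiny sketch: one-parameter "networks" generating numbers instead of faces, and a gradient-free nudge standing in for backpropagation. All the specifics here are invented for illustration:

```python
import random
random.seed(0)  # deterministic toy

def real_sample():
    # Ground truth: "real" data clusters around 5.0 (a stand-in for faces).
    return random.gauss(5.0, 0.1)

class Adversary:
    """Toy discriminator: samples near its running estimate of the real
    data look REAL; far away looks FAKE."""
    def __init__(self):
        self.mean = 0.0
    def score(self, x):                    # higher = more plausibly real
        return -abs(x - self.mean)
    def train_on_real(self, x, lr=0.05):   # track the real distribution
        self.mean += lr * (x - self.mean)

class Creator:
    """Toy generator: one parameter, nudged toward whatever fools the
    Adversary (standing in for backpropagating the detected failure)."""
    def __init__(self):
        self.theta = 0.0
    def generate(self, z):
        return self.theta + 0.01 * z       # noise in, forgery out
    def train(self, adversary, z, lr=0.1):
        here = adversary.score(self.generate(z))
        up = adversary.score(self.generate(z) + 0.1)
        self.theta += lr if up > here else -lr

adv, gen = Adversary(), Creator()
for _ in range(500):
    adv.train_on_real(real_sample())       # Adversary learns from ground truth
    gen.train(adv, random.gauss(0, 1))     # Creator learns to fool it

print(gen.theta)  # forgeries now land near the real data, around 5
```

In a real GAN, both players are deep networks and both updates come from the same backpropagated loss; what this sketch preserves is the structure—forger and detective, co-training on a game.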
GANs are not particularly great at global structure. Here, it's grown a cow with an extra body. Just as you may have spent a night walking through a house that is your house, but with many extra rooms.
These networks are also not particularly great at counting. This monkey has eight eyes. Sometimes science goes *too far*. Do something for me: next time you think you're awake —which I think is now— count your fingers, just to be sure. Go ahead. Now, if you find you have more or fewer than you expected, please: don't wake up just yet. We're not quite done. Another interesting thing about this training methodology is that the Generator is being fed noise. A vector of noise—some random point in a high-dimensional space. So it learns a mapping from this space onto its generation target—in this case, faces. And if we take a point in that space and drag it around... we get this. This... is also quite trippy, no? It resembles the things that I've seen... ...the things that Someone Who Isn't Me has seen... on acid. It resembles the sorts of things you may have seen, in long-forgotten dreams. I don't have a magic school bus voyage to take us on to understand why that is. But I have a theory: when we see a face, a bunch of neurons in our brain light up and begin resonating a signal, which is the feeling of looking at that particular face. Taken together, all the neurons involved in face detection produce a vector embedding: a mapping from faces to positions in a high-dimensional space. And as we drag around the generator's vector here, so are we dragging around our own. A novel sensation. This is a wild theory. But not without neurocognitive precedent. Here we have a rat in a cage. We've hooked an electrode up to a particular neuron in the rat's brain, and those pink dots are the locations where it fires. If we speed it up... ...a pattern begins to emerge. This neuron is a grid cell. So named because the centers of its firing fields produce a triangular grid.
There are lots of grid cells in your brain, each aligning to a different grid. They collect data from your visual system, and from head direction cells, which similarly encode the position of your head. And together, these cells construct an encoding of our position in 2D Euclidean space. It operates even in our sleep. If, earlier, you discovered that you're dreaming, and you want to see the end of this talk but you're having trouble staying in it, oneironauts recommend spinning around. This detaches your perceived body—the one with twelve fingers and three extra bedrooms—from your physical body, which is lying in bed. This positioning system is something which, on some level, you always knew existed. After all, you know where you are in space. You have a sense of space as you move through it. And it's likely —even necessary, if we believe that cognition is computation— that our qualitative sense of position has a neurocognitive precursor. A signal in the web that tells us where we're at. In many senses of the word.
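That triangular firing map has a classic idealization (a standard textbook model, not something from the talk): sum three cosine gratings whose directions are 60° apart, and the peaks land on a triangular grid.

```python
import math

def grid_cell_rate(x, y, scale=1.0):
    """Idealized grid cell: the sum of three plane waves at 0°, 60°, and
    120°. Peaks (rate = 3.0) fall on the vertices of a triangular grid."""
    rate = 0.0
    for k in range(3):
        theta = k * math.pi / 3
        kx, ky = math.cos(theta), math.sin(theta)
        rate += math.cos(scale * (kx * x + ky * y))
    return rate

print(grid_cell_rate(0.0, 0.0))             # 3.0: a firing field at the origin
print(grid_cell_rate(math.pi, 0.0) < 3.0)   # off the grid, the cell is quieter
```

Different cells use different `scale` values and orientations; as the rat (or the dreamer) moves, the point `(x, y)` sweeps through the fields, and the population together pins down a position.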
Part three. Sticks and stones. They say you can't tickle yourself because you know it's coming. Specifically, when your brain sends an action command to your muscles...
That's called an efference. When an efference is sent, your brain makes a copy. "Makes a copy" sounds so... planned. Engineered. Your brain is this big, messy, evolved signal processing mesh. Another way to think of efference copies is as reflections. We take the efference... ...and send it out to our peripheral nerves, where it will presumably make some muscles contract. Meanwhile, from the efference copy... we predict how our body's state will change... and use that to update our brain's model of our body's state. If we didn't do this, we would have to wait for sensory data to come back to tell us what happened. Where is our hand right now? Then we'd face the same problem as trying to play a twitchy video game over a totally crap connection. Signals take 10ms to travel from our brain to our periphery, and another 10ms to travel back. It's just not that low-latency or high-bandwidth, this body of ours, at least not neurologically. To enable smooth, coordinated movements, our brain has to make predictions.
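A toy simulation of that bookkeeping (the tick counts and the "hand position" state are invented for illustration): the efference copy updates the internal model immediately, while real sensory feedback crawls back through a delay line.

```python
from collections import deque

DELAY = 2  # round trip in simulation ticks (~20 ms of nerve conduction)

actual_hand = 0.0     # where the hand really is
predicted_hand = 0.0  # the brain's forward-model estimate
sensory_pipe = deque([0.0] * DELAY)  # feedback still in flight

for tick in range(5):
    efference = 1.0                     # command: move the hand one unit
    actual_hand += efference            # the body obeys
    predicted_hand += efference         # efference copy: update the model NOW
    sensory_pipe.append(actual_hand)    # feedback enters the slow return path
    felt_hand = sensory_pipe.popleft()  # ...and arrives DELAY ticks late
    print(tick, predicted_hand, felt_hand)

# The model is always current; the *felt* position lags two ticks behind.
```

Attenuation falls out of the same structure: when the feedback that finally arrives matches what the model already predicted, there is nothing new to integrate.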

fingers did move

armpit was tickled

Life goes on. But in a moment, we have a problem. We will still receive sense data from our nerves. If we updated our models again, they would actually fall *out* of sync. So we attenuate this signal. And keep our model in sync. This attenuation applies to our sense of touch, when that touch is an expected consequence of our own movement. Aspects of this forward model are likely distributed throughout our brain. But there's one place that's particularly important in maintaining it: the cerebellum. The cerebellum is quite special. It contains half the neurons in our nervous system. All action commands from the brain to the body route through it. All sensations from the body to the brain, too. It has long been recognized as vitally important to our motor coordination. Like this: *touches end of clicker*. People with cerebellar damage have difficulty performing this action smoothly. With cerebellar damage, our movements become jerky, laggy. It's theorized that the cerebellum acts as a Smith predictor: our brain's controller for our latency-distant bodies, able to estimate the body's current state, integrate sensory feedback to update that model, and decompose gross actions generated elsewhere in the brain into a fine-tuned, continuously adjusted control signal. Once you've evolved it, such a thing has many uses. There's a growing body of evidence implicating the cerebellum in language. Which makes sense. Utterance is a kind of movement. Language—she said, gesticulating wildly—is not limited to utterance. The work of moving words is not so different from the work of moving the body. They are both transformations—from the space of internal states, efferences, and ideas, to the space of world coordinates and external sense impressions, and back again. What happens when this predictor encounters a problem? When there is an irreconcilable discontinuity in the model? These things are not so different. Visceral. Guttural. They shake our bones. Jokes, too, are shaped like trauma.
They are both shatterings. Illuminations of discontinuities. Paradoxes: things which cannot be, and yet are. Which we must revisit again and again, churning. Water smoothing the edges of cutting stone. The machinery of our brains trying to make sense of a world that resists it.
Preparing this talk has been a difficult time for me. I didn't think I could do it. The world appears to be falling apart. I got dumped, which didn't help. There were days when I would open my email And every subject line would be a stone And I would imagine putting them all into my dress and walking into the sea but I didn't Because I remembered I am a process of creation I am a song singing myself We are stories telling ourselves A sea understanding itself Our churning waves Creating every moment of exquisite joy and exquisite agony and everything else It's you. You are everything. Everything you have ever seen every place you have ever been every song you have ever sung every god you have ever prayed to every person you have ever loved and the boundaries between them and you and the sea and the stars are all in your head
@rakshesha ashi.io