I’m going to write down what I think is a really useful architecture for handling asynchronous events using serverless functions (lambdas) on GCP. The architecture is very flexible in terms of its constituent components, and there’s no particular need to use GCP; I just think GCP has the best queue service (Pub/Sub). I think this is a very appealing fundamental building block for message-oriented architectures, and part of it is framed in terms of chat systems, a very popular topic right now.
Basically, the flow is: a Receiver accepts incoming events and publishes them to a queue, a Worker consumes from the queue and does the actual processing, and a Sender delivers the results back out.
The benefits of this architecture are that the Receiver, Sender, and Worker are mostly decoupled: at a minimum, each only needs to know how to understand its own input. Every component of this system is also horizontally scalable because of the reliance on a message-passing scheme.
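To make the decoupling concrete, here is a minimal sketch of the Receiver/Worker/Sender pipeline. An in-memory `queue.Queue` stands in for Pub/Sub (in GCP, each hand-off would be a publish to a topic with a subscription on the other side), and all the names and the placeholder "work" are illustrative, not from the published repo.

```python
import json
import queue

# Stand-ins for two Pub/Sub topics: one for work, one for replies.
work_topic = queue.Queue()
reply_topic = queue.Queue()

def receiver(raw_request: str) -> None:
    """Validates an inbound event and publishes a work message.
    Knows nothing about how the work gets done."""
    body = json.loads(raw_request)
    work_topic.put(json.dumps({"user": body["user"], "text": body["text"]}))

def worker() -> None:
    """Pulls one work message, processes it, publishes a reply.
    Knows nothing about where the message came from."""
    msg = json.loads(work_topic.get())
    reply = {"user": msg["user"], "text": msg["text"].upper()}  # placeholder "work"
    reply_topic.put(json.dumps(reply))

def sender() -> str:
    """Pulls one reply and delivers it (here, just returns it)."""
    return json.loads(reply_topic.get())["text"]

receiver('{"user": "amy", "text": "hello"}')
worker()
print(sender())  # -> HELLO
```

The point of the sketch is that each function touches only its own input format and the shared message schema; swapping the queue for Pub/Sub, or the worker for a GPU box, changes none of the other code.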
The important thing about this shape is that the constituent components are very flexible. For instance, we could use Cloud Storage buckets as either the main or the secondary storage; GCP is very friendly when it comes to responding to storage events. The receiver/sender pair could be swapped for a long-running service if we wanted persistent connections instead. The worker can be a GPU machine, a lambda, or even an on-premises VM. The sender and receiver could serve different sources, or no client at all (a cron job, say).
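One way to see why the sources are swappable: each trigger type gets normalized into a common message envelope before anything is published, so downstream components never change. The event shapes and field names below are simplified illustrations I made up for this sketch, not exact GCP payloads.

```python
from datetime import datetime, timezone

def envelope(source: str, payload: dict) -> dict:
    """The one message format every downstream component understands."""
    return {
        "source": source,
        "payload": payload,
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def from_http(request_json: dict) -> dict:
    """Normalize a client HTTP request (e.g. a chat message)."""
    return envelope("http", {"text": request_json["text"]})

def from_storage_event(event: dict) -> dict:
    """Normalize a storage event, e.g. a new object landing in a bucket."""
    return envelope("storage", {"object": event["name"], "bucket": event["bucket"]})

def from_cron(job_name: str) -> dict:
    """Normalize a scheduled tick with no client at all."""
    return envelope("cron", {"job": job_name})

msgs = [
    from_http({"text": "hi"}),
    from_storage_event({"name": "upload.wav", "bucket": "inbox"}),
    from_cron("nightly-digest"),
]
print([m["source"] for m in msgs])  # -> ['http', 'storage', 'cron']
```

Adding a new source (Discord, Twilio, a bucket trigger) is then just one more small adapter function; the worker and sender are untouched.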
I cleaned up a version of this I had lying around and published it on GitHub. It uses Twilio, unfortunately, because that was the simplest and clearest version I had. I may replace that functionality with Discord in the near future.