Are there agents that can read window titles and contents, detect the positions and labels of controls, click buttons, and type into input fields?
Imagine a robotic process automation (RPA) clickbot, like something from UiPath or Power Automate, but driven by an LLM or another AI model.