Absolutely the easiest solution would have been to have a written exam on the cases and concepts that we discussed in class. It would take a few hours to create and grade the exam.
But at a university you should experiment and learn. And what better class to experiment and learn in than “AI Product Management”? The students themselves were actually intrigued by the idea.
The key goal: we wanted to ensure that the projects the students submitted were actually their own work, not “outsourced” (in a general sense) to teammates or to an LLM.
Gemini 3 and NotebookLM with slide generation were released in the middle of the class, and we realized that it is feasible for a student to give a flawless presentation in front of the class without deeply understanding what they are presenting.
We could schedule oral exams during finals week, which would be a major disruption for the students, or schedule them during the break, violating university rules and ruining the students' vacations.
But as I said, we learned that AI-driven interviews are more structured and better than human-driven ones, because humans do get tired, and they do have biases based on the person they are interviewing. That’s why we decided to experiment with voice AI for running the oral exam.
By the way, the voice agent flagged the session as “the student is obviously fooling around”. I was expecting this to be caught during the grading phase, but ElevenLabs has done such good work with their product.
Guys, thank you for all the fooling around. All these adversarial discussions will be great for stress-testing the system. Very likely we will use these conversations as part of the course in the Spring, to show students what it means to let AI systems loose “in the wild”.
Not the case for the class in the blog post, but we also have many online classes. Many professionals prefer these online classes because they can attend without having to commute, and can do it from a place of their own convenience.
Such classes do not have the luxury of pen-and-paper exams, and asking people to go to testing centers is huge overkill.
Take-home exams in such settings (or any other form of written exam) are becoming very prone to cheating, simply because the bar for cheating is very low. Oral exams like this make it a bit harder to cheat. Not impossible, but harder.
I did a C# module online run by a Norwegian university. It was worth 6 points; 180 points grants you a bachelor's degree in Norway (or did, I think there have been changes since). The course ran over ten weeks and there were weekly assignments. Of course it would have been easy to cheat on those, but there would be no point, because there was a five-hour invigilated open-book exam at the end of the course. I had to go to a testing centre about 35 km away to take the exam, but that really wasn't a great inconvenience. If I had wanted to pursue a whole degree, then I would have had 30 such exams, roughly one a month if you do the degree over the traditional three years. That doesn't seem like overkill to me; it's a lot less effort than attending lectures and tutorials for three years as I did for my Applied Physics degree.
Just in case, I am the author of the blog post. For our "AI" class, it felt like a good class to experiment with something novel.
No, we do not want to eliminate the pen and paper exam. It works well. We use it.
The oral exam is yet another tool. Not a solution for everything.
In our case, we wanted to ensure that the students who worked on the team project: (a) contributed enough to understand the project, and (b) actually understood their own project and did not rely solely on an LLM. (We do allow them to use LLMs; it would be stupid not to.)
The students who did badly in the oral exam were exactly the ones we expected to do badly, even though they aced their (team) project presentations.
Could we do it in person? Sure, we could schedule personalized interviews for all the 36 students. With two instructors, it would have taken us a couple of days to go through. Not a huge deal. At 100 students and one instructor, we would have a problem doing that.
But the key reason was the following: research has shown that human interviewers actually get worse when they are tired, and that AI conducts more standardized and fairer interviews. That result was a major reason for us to trust a voice agent with the final exam.
>We do allow them to use LLMs, it would be stupid not to.
I'm not sure why you're saying this so confidently. Using LLMs on school work is like using a forklift at the gym. You'll technically finish the task you set out to do, and it will be much easier. So why not use a forklift at the gym?
>But the key reason was the following: research has shown that human interviewers actually get worse when they are tired, and that AI conducts more standardized and fairer interviews. That result was a major reason for us to trust a voice agent with the final exam.
I think that in an "AI class" for MBA students, the material is probably not complex enough to require much more than a Zork interpreter, but if you tried this on something in which nuance is required, that comparison would change dramatically. For something like this, which is likely going to be little more than knowledge spot checks to catch the most blatant cheaters, why not just have students do multiple choice questions at a kiosk?
I agree that I am not yet confident enough to use this approach for my technical classes. I am still very unhappy with every assessment option for technical classes, but I would not trust an LLM to come up with good questions. NotebookLM does come up with decent quizzes, but nothing super hard.
For the use of LLMs in classes: I understand the reasoning, but I have found LLMs to be extremely educational for parsing dense material (e.g., parsing an NTSB report on an Uber self-driving crash). Prohibiting students from using LLMs would be counterproductive.
But I still want students to use LLMs responsibly, hence the oral exam.
This is a regulatory requirement, part of the "Know Your Customer" doctrine. In plain words, banks are required to know who the client opening an account is.
Most banks will be risk averse and will not open an account for anyone applying online from abroad. Even US persons applying online will be asked for quite a bit of documentation.
Some banks (but not all) will open an account for a non-US person, when the non-US person physically visits a US branch with proper identification (typically a passport) and documentation on why they want the account. But even in such cases, it is up to the discretion of the bank employee to decide whether the risk of opening an account for a non-US person is worth the benefit. So, the same bank may give different replies to the same inquiry, depending on the branch asked.
As a concrete example, TD Ameritrade will easily open an account for a foreigner at its Chinatown branch in NYC, but will not open an account when the same customer visits a midtown branch in NYC.
The comparison with Uber and WhatsApp is not the proper one. These are private companies that were funded and acquired, respectively, purely on growth potential.
OpenTable has been a public company for almost 5 years now (see http://finance.yahoo.com/echarts?s=OPEN). Revenues, costs, growth, and all other metrics have been publicly examined and scrutinized for a long time. The 46% premium paid by Priceline is based on how the new management estimates they can leverage OpenTable's assets, and is hardly a "bubble-ish" premium.
If you believe that OpenTable is part of a bubble, then the whole US stock market is in a bubble, which may be true, but that is again not directly connected to the Uber and WhatsApp valuations.
The initial stages of the industrial revolution were not so good for the workers. However, the mass-production phase did increase the standard of living.
If the ads were only on their own domains, this could have gone undetected. The whole thing was discovered as a result of using those 'legit' websites, and as far as I can tell from the article, using those was an essential part of the scam, i.e. without it, it might not work... but I'm just wondering why.
Correct. I started explicitly ignoring these categories: too much load for the crawler, and the financial indexes were already carrying this information.