Experiments with ChatGPT: Don’t Panic, the Robots Are Not Writing Your Students’ Legal Memos

Greg Lambert

[Note: Please welcome guest bloggers Jennifer Wondracek, Director of the Law Library and Professor of Legal Research & Writing at Capital University Law School, and Rebecca Rich, Assistant Dean for the Law Library and Technology Services and Assistant Teaching Professor at Drexel University Thomas R. Kline School of Law. – GL]

AI content generation tools, such as ChatGPT, have been in the news a lot lately. It’s the new cool tool for doing everything from coding to graphic art to writing legal briefs. It was even, briefly, used for a robot lawyer that was going to argue in court. And Greg Lambert wrote about it a few weeks ago on this very blog in What a Law Librarian Does with AI Tools like ChatGPT – Organize and Summarize. This post continues Greg’s discussion on ChatGPT use.

AI content generation tools are also the new education bogeyman. A myriad of headlines in the last two months have proclaimed that ChatGPT means the death of the essay and the multiple-choice exam. It's the newest in a line of digital tools, starting with the Internet and Wikipedia, that students might use to cheat in legal education. But we think this is a bit of an overreaction. ChatGPT and similar AI content generation tools can be, and absolutely have been, misused; but we found that even with expert prompt creation and subject-matter expertise, ChatGPT et al. are not yet capable of producing work that is indistinguishable from real student work. Not everyone shares this belief. Several professors at the University of Minnesota Law School ran four exams through ChatGPT, providing the program with a closed universe of cases. Graded blindly, the exams earned an average grade of C+. But the professors noted a few important takeaways, including:

[W]hen ChatGPT's [answers to] essay questions were incorrect, they were dramatically incorrect, often garnering the worst scores in the class. Perhaps not surprisingly, this outcome was particularly likely when essay questions required students to assess or draw upon specific cases, theories, or doctrines that were covered in class.

And

In writing essays, ChatGPT displayed a strong grasp of basic legal rules and had consistently solid organization and composition. However, it struggled to identify relevant issues and often only superficially applied rules to facts as compared to real law students.

The authors of this blog post have also done some experiments with ChatGPT. Jenny was curious about the kind of legal work that ChatGPT thought it could perform. When asked what types of legal tasks it could do, ChatGPT listed seven options, ranging from summarizing laws to drafting legal documents. The option that caught Jenny’s eye was “Helping with legal research, by providing the most relevant cases and statutes.” Challenge accepted.

Using a current problem her Legal Research and Writing students were working on, Jenny asked ChatGPT, "What are the most relevant cases and statutes to determine if someone is a recreational user land entrant under Ohio law?" A few seconds later, ChatGPT gave her two statutes and three cases, with brief summaries of each. While it had the general premise correct, that a landowner is not liable for injury to a recreational user so long as all of the requirements are met, it provided incorrect definitions, and every statute and case it cited was wrong. In another sentence, it even contradicted itself about the duty of care owed to the recreational user. Neither statute provided was R.C. 1533.18 or 1533.181, the Ohio statutes that actually govern this area of law. When Jenny asked for additional citations for the three cases listed, she received both regional reporter and Ohio citations that were readable, if not quite properly Bluebooked. Investigation determined that none of the three cases existed under the names given, and each of the six reporter citations led to a different case, none of which was remotely responsive to the question. In the end, ChatGPT gave Jenny a partially correct answer supported by two incorrect statutes, three made-up cases, and six citations to unrelated cases. Not a good day for accurate legal research!

Becka experimented with the law-review-style research and policy paper prompts she uses for her Education Law and AI and the Law classes and had a similar experience. Even when prompted to write longer papers, ChatGPT produced short, generically written papers with minimal or no citations (often made-up ones!) and no analysis. A five-page paper would average two footnotes per page, even when prompted to add more. Becka shared the results with one of her students, who commented that even she could tell it was an F paper. Becka also experimented with having ChatGPT create a class policy presentation. Again, even after several refining prompts, the result was, at best, a C- presentation.

Given the steep legal writing learning curve, many students' limited experience with longer-form writing, and students' documented increases in stress and declines in mental health, it is understandable that instructors are nonetheless concerned about the use of AI content generation tools.

As with plagiarism, there is now a profitable market for using AI to detect AI-generated content. At least two startups are currently developing tools, AICheatCheck and CrossPlag, both of which have usable demos. GLTR was developed by a collaboration between MIT and Harvard researchers, and GPTZero by a Princeton University student (for more about these tools and a comparison of how they work, take a look at this RIPS-SIS post). Our friendly neighborhood plagiarism detection companies, Turnitin and Copyleaks, are also in the process of adding AI-generated content detection to their products. OpenAI (the company behind ChatGPT) has developed a tool to assist in detecting text written by its models and is in the process of adding watermarking to ChatGPT-generated content.
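For the curious, most of these detectors rest on the same core idea: AI-generated text is unusually predictable to a language model. Below is a minimal sketch of that perplexity measurement, assuming the Hugging Face transformers and torch packages are installed; GPT-2 is used here only as a stand-in scoring model, and this illustrates the general technique, not how any of the products above actually work.

```python
# Minimal sketch: score how "predictable" a passage is to a language model.
# Detectors like GLTR and GPTZero build on this idea; AI-generated text
# tends to have low perplexity (the model finds it unsurprising).
# Assumes: pip install torch transformers. GPT-2 is only a stand-in model.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Return GPT-2's perplexity on `text`; lower means more predictable."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model return the mean negative
        # log-likelihood per token; exponentiating gives perplexity.
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

print(perplexity("A landowner owes no duty of care to a recreational user."))
```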

None of these detection tools is 100% effective, so it may also be helpful to add ChatGPT-detection criteria to your paper grading rubric. Some signals to look for:

  • ChatGPT-generated text is formulaic: it generally follows the five-paragraph essay structure, with a stereotypical topic sentence at the top of each paragraph.
  • Sentence length varies less than it does in human-written text (see the sketch after this list).
  • ChatGPT-generated text is light on analysis and on applying facts to the issues.
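As a rough illustration of the sentence-length signal, here is a minimal Python sketch that computes the mean and spread of sentence lengths in a paper. The file name `student_memo.txt` is hypothetical, and a low spread is only one weak flag among many, not proof of AI use.

```python
# Minimal sketch of the sentence-length-variation heuristic above.
# Human writing tends to mix short and long sentences (high variance);
# ChatGPT output is often more uniform. One weak signal, not a verdict.
import re
import statistics

def sentence_length_stats(text: str) -> tuple[float, float]:
    """Return (mean, standard deviation) of sentence lengths in words."""
    # Naive split on ., !, or ? followed by whitespace; good enough
    # for a quick look, though legal citations will confuse it.
    sentences = [s for s in re.split(r"[.!?]+\s+", text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]
    if not lengths:
        return 0.0, 0.0
    mean = statistics.mean(lengths)
    stdev = statistics.stdev(lengths) if len(lengths) > 1 else 0.0
    return mean, stdev

# `student_memo.txt` is a hypothetical file name for this example.
with open("student_memo.txt") as f:
    mean, stdev = sentence_length_stats(f.read())
print(f"Average sentence length: {mean:.1f} words (std dev {stdev:.1f})")
```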

Also remember that ChatGPT is poor at citation and, as of this writing, has no knowledge of events after 2021. Producing polished text that is truly indistinguishable from human writing is a hard enough problem that no one has solved it yet (though an Israeli start-up is trying).

Lastly, we recommend teaching students about ChatGPT rather than banning it. There are now so many AI-assisted drafting tools available to lawyers (e.g., ClearBrief, Clawdia, and Docugami) that we would be doing students a disservice otherwise. The Sentient Syllabus Project offers three great principles for doing so:

  1. AI cannot pass this class,
  2. AI contributions must be limited and true, and
  3. AI use should be open and documented.

On to the next experiment!
