Discussion about this post

Angus Batey

Bravo Mic, well said.

There's another aspect to this that you'd imagine journalists and publishers would want to consider: lobbing stuff into a public LLM is a form of publication. Presumably the Cleveland Plain Dealer is using a business-specific LLM, confined to that one workplace, so that what's done with it inside the company doesn't feed into the public model. But if not, all the story ideas, quotes and research materials that its staff are putting into it (presumably staff; I hope they're not insisting freelances work this way too) become grist to the mill for the public version of the chatbot.

Samsung infamously found this out the hard way a couple of years ago, when employees dropped proprietary semiconductor design info into ChatGPT and the company reckoned the resulting leak of trade secrets cost it over $60 million. Obviously, someone would need to ask the right questions to get the LLM to republish the relevant information, but a journalist who puts material into a public LLM has basically just trashed any hope of their story being in any way exclusive.

Similarly, anyone using generative LLM-type tools to transcribe interview audio needs to make sure that the terms and conditions they've signed up to don't allow the LLM company to use that audio and the transcription output to "train" the public model. The risk there is not just of blowing your exclusivity, but also of potentially breaching data protection regulations and, if the audio includes any moments where you agreed that the interviewee was speaking off the record, of compromising your relationship with your source and failing to honour any expectation of privacy they thought they'd agreed with you.
