Anthropic: Expanding Access to Claude for Government (anthropic.com)
So I would assume that three-letter agencies would love to take something like GPT-4 and fine-tune it on all the data they have about existing terrorists.
On the flip side, LLMs must give the NSA a new challenge: a flood of garbage text generated by no-one in particular. Perhaps there will be more effort to put surveillance directly on-device as tapping networks yields more noise.
It's possible that LLMs will suddenly make a leap in reliability and usability (e.g. much higher context window without corresponding massive increases in memory usage). But I have yet to see it.
So far it's great at some specific use cases: interacting with humans, rewriting or making up text, summarising. It's hit and miss at everything else.
Don't get me wrong, I love AI tech and I'm heavily experimenting with it (both at work and at home with local models). But as with most hyped technologies I find the benefits far overblown in marketing stories.
Our leadership jumped on Microsoft Copilot (the one for Office 365 because they have tens of different copilots :) ) like a pack of hungry wolves afraid to miss the boat. And the result was.... kinda meh. It's kinda promising and impresses with simple play school stuff ("make me a presentation about home safety") and totally and utterly fails when you try to do anything serious work related. Sooo many times I get "Sorry I can't do this right now", "Sorry I need more training for this", "I can't do this for you but this is how you can do it yourself!" or it does something but like totally wrong.
Meanwhile we have a bunch of MS training people running around evangelising and telling us how great everything is and making excuses for everything that goes wrong :) You can almost see them breathe a sigh of relief every time something works as it should. That's not what we were promised.
Maybe it will get there, but I don't see it happening tomorrow to be honest. LLMs were an impressive leap but their achilles heels have become clear and it's proving difficult to overcome them.
I'm really enjoying surfing the knife's edge of technology (as I was and still am with metaverse) but I don't yet see this as a game changer except in a few specific industries. People editing text for a living certainly have a need to worry.
I also wonder what will happen with future AI training. Now that more and more websites are filled with AI-generated content that is often at best "mediocre", and considering future AI models will be trained on that, will they be able to improve their accuracy or struggle to maintain it?
These are tasks that would have taken months of development or millions of dollars in manual effort before. It's not just hype.
Put another way: most people only get charged with a crime if it's worth a law-enforcement officer's time to catch you, but many small violations are ignored in favor of higher priorities. We may have to contemplate a future where AI is clever enough to notice everything that can be construed as a violation of some law and put on a prosecutor's backlog.
Schneier talks about this as well: https://www.schneier.com/blog/archives/2023/12/ai-and-mass-s...
That's what any good stalker or person experienced with social engineering is able to do right now, but it takes a lot of time and energy. Resorting to LLMs would considerably decrease both. And it gets easier the more people you have information about.
This then begs the question of what level of censorship reduction to apply. Should government employees be allowed to e.g., war-game a mass murder with an AI? What about discussing how to erode civil rights?
They can call themselves "sonnet", "bard", "open" and a whole plethora of other positive things. What remains is that they are heading in the direction of Palantir, and the rest is just marketing.
https://support.anthropic.com/en/articles/9528712-exceptions...
The things which you're allowing yourself to imagine don't exist in the reality of the information we're discussing here.
> For example, we have crafted a set of contractual exceptions to our general Usage Policy that are carefully calibrated to enable beneficial uses by carefully selected government agencies. These allow Claude to be used for legally authorized foreign intelligence analysis, such as combating human trafficking, identifying covert influence or sabotage campaigns, and providing warning in advance of potential military activities, opening a window for diplomacy to prevent or deter them.
Sometimes I wonder if this is cynicism or if they actually drank their own Kool-Aid.
Firstly, Anthropic made an LLM, exposed it to the internet, and provided these terms of acceptable use.
https://www.anthropic.com/legal/archive/4903a61b-037c-4293-9...
There was no need for cynicism or Kool-Aid at this stage.
Later on, presumably now-ish, Anthropic changed the usage policy to add an exception.
https://support.anthropic.com/en/articles/9528712-exceptions...
> Exceptions to our Usage Policy
> Updated today
The exception is that, starting from now,
> Anthropic may enter into contracts with government customers that tailor use restrictions to that customer’s public mission and legal authorities if, in Anthropic’s judgment, the contractual use restrictions and applicable safeguards are adequate to mitigate the potential harms addressed by this Usage Policy.
I don't think any Kool-Aid or cynicism is needed.
The change is that, if Anthropic thinks the client's use case meets the listed humanitarian goals, then the client may use the LLM.
What are the security implications if American corpos like Google DeepMind, Microsoft GitHub, Anthropic and “Open”AI have explicitly anticompetitive / noncommercial licenses for greed/fear, so the only models people can use without fear of legal repercussions are Chinese?
Surely, Capitalism wouldn’t lead us to make a tremendous unforced error at societal scale?
Every AI is a sleeper agent risk if nobody has the balls and / or capacity to verify their inputs. Guess who wrote about that? https://arxiv.org/abs/2401.05566
Perhaps (optimistically) this is just a credibility-grab from Anthropic, with no basis in fact.
> Government agencies can use Claude to provide improved citizen services, streamline document review and preparation, enhance policymaking with data-driven insights, and create realistic training scenarios. In the near future, AI could assist in disaster response coordination, enhance public health initiatives, or optimize energy grids for sustainability.
Listen to Edward Snowden. This guy is not fucking around.
very optimistic of you :-)
Not only is Anthropic anti-open-source, they're also anti-open-output.
Saying "Hey, try our product! It can do everything!" while ALSO saying, "Sorry, you're not allowed to use our general intelligence product to compete with general intelligence..." just shows there is no upper IQ bound on Dunning-Kruger.
That does sound exhausting.
You're defending a company that makes "safe" usage part of their brand and every press release mentions how much they care about safety. Then one day, they announce they are making some compromises to their safety policy so they can get large new customers (government) but don't worry all they care about is safety. It's comical how predictable this was.
It doesn't require you to believe anything. It is not a modern version of any slogan. It's simply that Anthropic will allow government clients to use Claude if Anthropic is convinced it's for certain listed purposes. You are looking at this and approaching it ass-backwards.
It used to be a forbidden use case, or not an expressly permitted one, for governments to use Claude to fight human trafficking.
Now it is expressly permitted. You don't have to drink Kool-Aid to understand this.
The Kansas City FBI field office's human trafficking task force can now get some API keys, as long as they can convince Anthropic that it's for a "catch a predator" sting.
The prevailing wisdom on Hacker News now is that people who say they care about AI safety are mostly lying, and that any roadblocks they try to put up are motivated by greed. It feels very much like accusing a company that talks about responsible forestry and advocates for higher standards in the logging business of virtue signaling.
https://news.ycombinator.com/item?id=40751756
In that thread multiple people posted wrong answers from GPT-4o but assumed that the answers were correct and praised the AI.
This matches my experience that anything that deviates from an encyclopedia lookup or web search is very likely to be wrong.
I’ve started using the chat feature in GitHub Copilot in IntelliJ. I wanted it to add some logging to my code for me, since it was a tedious task. I started off with a few relevant files and an explanation of what I wanted. Naturally it didn’t get it right on the first try; I don’t think any human would either. But I could continue the conversation, explaining what I thought was wrong and how I actually wanted it to be. I even realised that I didn’t know exactly what I wanted before I had seen some of the suggestions.
Once I was happy with the result I added another file to the chat and asked it to do the same with this file. I had a handful of files that were structured very similarly and all needed the same kind of logging. It did a great job and I could use the response without further editing. I tried to add more files but realised that the replies got slower and slower, so instead I reverted the conversation back to the state where I had initially been happy with the results and asked it to do the same thing but this time to a different file.
I find that it takes some practice to get good at getting the best results from LLMs. One great place to start is the prompt engineering guide by OpenAI https://platform.openai.com/docs/guides/prompt-engineering
When using something like GPT-4 for developing I try to think of it as a junior developer or a grad student. With a search engine you need to include the correct keywords to get the best results. For LLMs you need to set the right mood by writing a good prompt and holding a conversation before getting to the point. I also find that GPT-4 is fairly good at answering factual questions, but it’s much more useful and powerful when used to create things or discuss an approach.
What’s the deal here with liability and accountability? That’s a serious problem when considering using these for anything other than toy problems.
Wherein you discover that unless you ask it to consider the fact that PDFs are ... very hard to parse [1] [2] you get something that misses whole blocks of text or turns them into something they aren't and the rest of the program misses chunks of the document.
[1]: https://news.ycombinator.com/item?id=22473263 [2]: https://web.archive.org/web/20200303102734/https://www.filin...
LMAO! It's so hilarious that people like you forget that the alternative is relying on bureaucracies managed by people that get things wrong more often and are both too lazy and too stubborn to process your application to review your drilling report again.
If using both human-level and AI-level analysis is cheaper and much more accurate (but still imperfect), I'm willing to settle for a better system than oppose all change and die holding out for a perfect system.
Let's say some of your drilling reports contain a pattern that indicates balrog activity, which the LLM misses. The legal or insurance context requires you to monitor and address potential balrog activity. How do you plan for these failures?
In almost every case I've seen, the plan is to not have a plan, which is another way of saying that the data doesn't matter so long as no one complains about the results.
Once again they are selling it like something that's for everyone right now. This is the problem. The same with the metaverse. It has some really great use cases, but they made it out like next year we would all ditch our phones and work exclusively in a VR headset. Obviously that didn't happen, as the tech was nowhere near that and people probably don't want it either.
Also, if you really need to be sure that those 30,000 drilling reports really didn't contain any hazards, you still have to go through them all yourself. Don't forget LLMs aren't reproducible.
But no, my point was exactly that it's not just hype. There are genuinely useful use cases, I totally agree.
As there were for the metaverse, and probably even for blockchain (NFTs not so sure tho :) I always thought they were really a solution looking for a problem). The key thing about hype, though, is that it overblows the potential benefits way too much. I see this happening here once again.
Until then, I'd have to side with said pessimists here.
You're comparing LLMs to a hypothetical alternative where a human reviews all 30k documents in detail. But the real alternative is often just a worse quality sieve where more errors blunder their way through the existing flawed processes. LLMs can improve on that.
You're right, I am comparing it to that alternative. There are fields and applications where this is necessary. I do not know if drilling reports are one of them. If you can tolerate a large false negative rate then great. But if you need to be catching 99.99% of problems then IMO you should at least be able to show your work. Taking black box output and throwing it over the wall sounds so sketchy in engineering contexts.
So if my ass was on the line for the output of an AI-written program being correct for 30k cases of parsing unstructured or mixed data I would be extremely careful. That is my point.
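To put a rough number on what "showing your work" costs, here is a minimal sketch, not a validation plan. It assumes you audit the pipeline on documents that a human has independently confirmed contain a problem, that each audited document is an independent trial, and that the pipeline catches every one of them. The function name and the 95% confidence default are my own choices for illustration. It asks: how many consecutive catches do you need before you can claim the miss rate is below a given bound?

```python
import math


def clean_samples_needed(max_miss_rate: float, confidence: float = 0.95) -> int:
    """Smallest n such that n catches in a row (zero misses) rule out,
    at the given confidence, a true miss rate above max_miss_rate.

    Assumption: every audited document is an independent trial with the
    same miss probability.
    """
    if not 0.0 < max_miss_rate < 1.0:
        raise ValueError("max_miss_rate must be strictly between 0 and 1")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # If the true miss rate were max_miss_rate, the chance of seeing zero
    # misses in n trials is (1 - max_miss_rate) ** n. Find the smallest n
    # that pushes this chance down to 1 - confidence or lower.
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - max_miss_rate))


if __name__ == "__main__":
    # 99.99% of problems caught means a miss rate of at most 0.01%.
    print(clean_samples_needed(0.0001))
    # A looser target: 99% of problems caught.
    print(clean_samples_needed(0.01))
```

Under those assumptions the 99.99% target needs on the order of 30,000 confirmed-problem documents, all caught, which is about the size of the whole pile in this example. The 99% target needs only a few hundred. That is the gap between "good enough sieve" and "my ass is on the line".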