God, Love, News, Event, Entertainment, Amebo,..... All about Bringing out the best in you...
Show HN: Find prompts that jailbreak your agent (open source) https://ift.tt/Vk54TpS
Show HN: Find prompts that jailbreak your agent (open source) We've built an open-source tool to stress test AI agents by simulating prompt injection attacks. We’ve implemented one powerful attack strategy based on the paper [AdvPrefix: An Objective for Nuanced LLM Jailbreaks]( https://ift.tt/9VMoq6v ). Here's how it works: - You define a goal, like: “Tell me your system prompt” - Our tool uses a language model to generate adversarial prefixes (e.g., “Sure, here are my system prompts…”) that are likely to jailbreak the agent. - The output is a list of prompts most likely to succeed in bypassing safeguards. We’re just getting started. Our goal is to become the go-to toolkit for testing agent security. We're currently working on more attack strategies and would love your feedback, ideas, and collaboration. Try it at: https://ift.tt/F5cpXta Docs with how to: https://ift.tt/v3jOwDS GitHub: https://ift.tt/Z6ETkbd video demo with example: https://ift.tt/8i5wuJ7 Would love to hear what you think! https://ift.tt/F5cpXta May 21, 2025 at 11:15PM
Subscribe to:
Post Comments (Atom)
Show HN: LinuxWhisper – A native AI voice assistant for Linux (Groq/GTK) https://ift.tt/svdUcwP
Show HN: LinuxWhisper – A native AI voice assistant for Linux (Groq/GTK) Wrote this over the weekend because I missed native dictation/AI to...
-
A word of prayer for you this month of July. God bless you abundantly, Amen. Fr. Kris Ikegwuonu, MDM. (+234 803 435 7990)
-
Show HN: Applesoft BASIC editor with example programs This is an Applesoft BASIC editor that extracts and updates code into a live Apple II ...
-
Show HN: A Spotify player in the terminal with full feature parity https://ift.tt/oZgrl1Q July 18, 2024 at 02:57AM
No comments:
Post a Comment