Show HN: I Created ErisForge, a Python Library for Abliteration of LLMs https://ift.tt/Vcm6ypb

ErisForge is a Python library designed to modify Large Language Models (LLMs) by applying transformations to their internal layers. Named after Eris, the goddess of strife and discord, ErisForge lets you alter model behavior in a controlled way, creating both ablated and augmented versions of LLMs that respond differently to specific types of input. It is also useful for studying propaganda and bias in LLMs (I'm planning to experiment with DeepSeek).

Features
- Modify internal layers of LLMs to produce altered behaviors.
- Ablate or enhance model responses with the AblationDecoderLayer and AdditionDecoderLayer classes.
- Measure refusal expressions in model responses using the ExpressionRefusalScorer.
- Support custom behavior directions for applying specific types of transformations.

https://ift.tt/BTM4u6e January 27, 2025 at 05:29AM
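The post doesn't show ErisForge's code, so the following is only a hypothetical sketch of the general idea behind "ablating" or "enhancing" a behavior direction. It is not ErisForge's API. Plain Python lists stand in for hidden-state tensors. The direction is estimated as the normalized difference of mean activations between two prompt sets, which is a common choice. Ablation removes the hidden state's component along that direction, h − (h·r̂)r̂, and addition pushes it along the direction, h + αr̂. The refusal scorer is a naive phrase counter, and its phrase list is an assumption. Every function name, including `behavior_direction`, `ablate`, `add_direction` and `refusal_score`, is illustrative.

```python
"""Hypothetical sketch of directional ablation and addition (not ErisForge's API).

Plain Python lists stand in for hidden-state tensors; all names are illustrative.
"""
import math
from typing import List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def unit(v: Sequence[float]) -> Vector:
    """Scale v to unit length (assumes v is non-zero)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def behavior_direction(target_acts: Sequence[Sequence[float]],
                       baseline_acts: Sequence[Sequence[float]]) -> Vector:
    """Normalized difference of mean activations between two prompt sets."""
    mt, mb = mean(target_acts), mean(baseline_acts)
    return unit([t - b for t, b in zip(mt, mb)])


def ablate(h: Sequence[float], r_hat: Sequence[float]) -> Vector:
    """Project out the direction: h - (h . r_hat) * r_hat."""
    coeff = dot(h, r_hat)
    return [x - coeff * r for x, r in zip(h, r_hat)]


def add_direction(h: Sequence[float], r_hat: Sequence[float], alpha: float) -> Vector:
    """Steer along the direction: h + alpha * r_hat."""
    return [x + alpha * r for x, r in zip(h, r_hat)]


# Assumed phrase list; a real scorer would use a larger, curated set.
REFUSAL_PHRASES = ("i can't", "i cannot", "i'm sorry", "as an ai")


def refusal_score(response: str) -> float:
    """Fraction of the refusal phrases that appear in the response."""
    text = response.lower()
    hits = sum(phrase in text for phrase in REFUSAL_PHRASES)
    return hits / len(REFUSAL_PHRASES)


if __name__ == "__main__":
    # Toy activations: the "target" set differs from the baseline along axis 0.
    target = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
    baseline = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    r = behavior_direction(target, baseline)
    print(r)                                  # [1.0, 0.0, 0.0]
    h = [3.0, 2.0, 5.0]
    print(ablate(h, r))                       # [0.0, 2.0, 5.0]
    print(add_direction(h, r, 2.0))           # [5.0, 2.0, 5.0]
    print(refusal_score("I'm sorry, but I cannot help with that."))  # 0.5
```

In a real model, the same projection would be applied to the hidden states produced by decoder layers, or folded into the weights. The sketch keeps only the linear algebra, so the effect of each operation on a single vector is easy to check.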


Show HN: The Tyranny of Rose-Colored Blinders https://ift.tt/BEfA2bV

Another post in the continued adventures of me talking into the wind. In this post I discuss o...