Advanced usage¶
Bulk editing¶
Edits can be performed in batches to make better use of GPU resources using editor.swap_subject_concepts_and_predict_greedy_bulk(), as below:
from linear_relational import CausalEditor, ConceptSwapAndPredictGreedyRequest
concepts = trainer.train_relation_concepts(...)
editor = CausalEditor(model, tokenizer, concepts=concepts)
swap_requests = [
    ConceptSwapAndPredictGreedyRequest(
        text="Shanghai is located in the country of",
        subject="Shanghai",
        remove_concept="located in country: China",
        add_concept="located in country: France",
        predict_num_tokens=1,
    ),
    ConceptSwapAndPredictGreedyRequest(
        text="Berlin is located in the country of",
        subject="Berlin",
        remove_concept="located in country: Germany",
        add_concept="located in country: Japan",
        predict_num_tokens=1,
    ),
]
edited_answers = editor.swap_subject_concepts_and_predict_greedy_bulk(
    requests=swap_requests,
    edit_single_layer=False,
    magnitude_multiplier=0.1,
    batch_size=4,
)
print(edited_answers) # [" France", " Japan"]
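To illustrate what the batch_size parameter controls, here is a minimal conceptual sketch (not the library's actual code): requests are processed in fixed-size chunks so that each forward pass through the model stays within GPU memory.

```python
def chunked(items, batch_size):
    """Yield successive batches of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# With batch_size=2, five requests are processed as three batches.
batches = list(chunked(["req1", "req2", "req3", "req4", "req5"], batch_size=2))
# batches == [["req1", "req2"], ["req3", "req4"], ["req5"]]
```

A larger batch_size means fewer model passes but higher peak GPU memory use, so tune it to your hardware.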
Bulk concept matching¶
We can perform concept matches in batches to better utilize GPU resources using matcher.query_bulk(), as below:
from linear_relational import ConceptMatcher, ConceptMatchQuery
concepts = trainer.train_relation_concepts(...)
matcher = ConceptMatcher(model, tokenizer, concepts=concepts)
match_queries = [
    ConceptMatchQuery("Beijing is a northern city", subject="Beijing"),
    ConceptMatchQuery("I saw him in Marseille", subject="Marseille"),
]
matches = matcher.query_bulk(match_queries, batch_size=4)
print(matches[0].best_match.concept) # located in country: China
print(matches[1].best_match.concept) # located in country: France
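Conceptually, matching scores the subject's activation against each trained concept vector and reports the concept with the highest score. The sketch below is illustrative only (toy vectors, cosine similarity as an assumed scoring function), not the library's implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for trained concept vectors (real ones live in the
# model's activation space and have thousands of dimensions).
concept_vectors = {
    "located in country: China": [1.0, 0.1],
    "located in country: France": [0.1, 1.0],
}
subject_activation = [0.9, 0.2]  # toy activation for the subject "Beijing"
best_match = max(
    concept_vectors,
    key=lambda name: cosine(subject_activation, concept_vectors[name]),
)
# best_match == "located in country: China"
```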
Customizing LRC training¶
The base trainer.train_relation_concepts() function is a convenience wrapper which trains an LRE, performs a low-rank inversion of the LRE, and uses the inverted LRE to generate concepts. If you want to customize this process, you can generate an LRE using trainer.train_lre(), invert the LRE with lre.invert(), and finally train concepts from the inverted LRE with trainer.train_relation_concepts_from_inv_lre(). This process is shown below:
from linear_relational import Trainer
trainer = Trainer(model, tokenizer)
prompts = [...]
lre = trainer.train_lre(...)
inv_lre = lre.invert(rank=200)
concepts = trainer.train_relation_concepts_from_inv_lre(
    inv_lre=inv_lre,
    prompts=prompts,
)
It’s also possible to pass a lambda function as the inv_lre param to allow using a different inverted LRE for each object. This lambda takes the object as a string and returns the inverted LRE for that object. However, if you use this approach, you must also pass in relation, object_aggregation, and object_layer, as these cannot be inferred from the inverted LRE when it is passed as a function. This is shown below:
from linear_relational import Trainer
trainer = Trainer(model, tokenizer)
prompts = [...]
lre1 = trainer.train_lre(...)
inv_lre1 = lre1.invert(rank=200)
lre2 = trainer.train_lre(...)
inv_lre2 = lre2.invert(rank=200)

def inv_lre_fn(object_name):
    return inv_lre1 if object_name == "Paris" else inv_lre2
concepts = trainer.train_relation_concepts_from_inv_lre(
    inv_lre=inv_lre_fn,
    prompts=prompts,
    relation="located_in_country",
    object_aggregation="mean",
    object_layer=20,
)
Custom objects in prompts¶
By default, when you create a Prompt, the answer to the prompt is assumed to be the object corresponding to an LRC. For instance, in the prompt Prompt("Paris is located in", "France", subject="Paris"), the answer, “France”, is assumed to be the object. However, if this is not the case, you can specify the object explicitly using the object_name parameter as below:
from linear_relational import Prompt
prompt1 = Prompt(
    text="PARIS IS LOCATED IN",
    answer="FRANCE",
    subject="PARIS",
    object_name="france",
)
prompt2 = Prompt(
    text="Paris is located in",
    answer="France",
    subject="Paris",
    object_name="france",
)
Skipping prompt validation¶
By default, the Trainer will validate that the model answers every prompt passed in correctly, and will filter out any prompts where this is not the case. If you want to skip this validation, you can pass validate_prompts=False to any method on the trainer, like Trainer.train_relation_concepts(prompts, validate_prompts=False).
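Conceptually, validation keeps only the prompts where the model's own greedy answer agrees with the expected answer. The toy sketch below is hypothetical (a dict stands in for greedy generation), not the library's actual implementation:

```python
def filter_valid_prompts(prompts, greedy_generate):
    """Keep only prompts whose greedy continuation starts with the answer."""
    return [
        p for p in prompts
        if greedy_generate(p["text"]).strip().startswith(p["answer"])
    ]

# Stand-in for a model's greedy generation:
fake_model = {"Paris is located in": " France", "Rome is located in": " Spain"}
prompts = [
    {"text": "Paris is located in", "answer": "France"},
    {"text": "Rome is located in", "answer": "Italy"},  # model answers wrong
]
valid = filter_valid_prompts(prompts, lambda text: fake_model[text])
# Only the Paris prompt survives; the Rome prompt is filtered out.
```

Skipping validation is faster but risks training concepts on prompts the model does not actually understand.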
Multi-token object aggregation¶
If a prompt has an answer which is multiple tokens, by default the Trainer will use the mean activation of the tokens in the answer when training an LRE. An example of a prompt with a multi-token answer is “The CEO of Microsoft is Bill Gates”, where the object, “Bill Gates”, has two tokens. Alternatively, you can use just the first token of the object by passing object_aggregation="first_token" when training an LRE. For instance, you can run the following:
lre = trainer.train_lre(
prompts=prompts,
object_aggregation="first_token",
)
If the answer is a single token, “mean” and “first_token” are equivalent.
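The difference between the two modes can be sketched with toy numbers (conceptual only, not the library's code):

```python
# Per-token activation vectors for a two-token object like "Bill Gates"
# (toy 2-dimensional vectors; real activations are much larger).
token_activations = [[1.0, 2.0], [3.0, 4.0]]

# object_aggregation="mean": average element-wise across the object's tokens.
mean_agg = [sum(dims) / len(token_activations) for dims in zip(*token_activations)]

# object_aggregation="first_token": keep only the first token's activation.
first_token_agg = token_activations[0]
# mean_agg == [2.0, 3.0]; first_token_agg == [1.0, 2.0]
```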
Custom layer selection¶
By default, the library will try to guess which layers correspond to the hidden activations of the model, and will use these layers for reading activations and training LREs. If the layers the library guesses are not correct, or if you want to use different layers to extract activations and train LREs, you can pass a custom layer_matcher to the Trainer, CausalEditor, and ConceptMatcher when creating these objects.
A layer_matcher is typically a string, and must include the substring "{num}", which will be replaced with the layer number to select a layer in the model. For instance, for GPT models, the matcher for hidden layers is "transformer.h.{num}". You can find a list of all layers in a model by calling model.named_modules().
For most cases, using a string is sufficient, but if you want to customize the layer matcher further, you can pass a function to layer_matcher which takes in the layer number as an int and returns the layer in the model as a string. For instance, for GPT models, this could be provided as lambda num: f"transformer.h.{num}".
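The two forms are interchangeable: each resolves a layer number to the same module path, as this small self-contained demonstration shows:

```python
# String template form: "{num}" is substituted with the layer number.
layer_matcher_str = "transformer.h.{num}"

# Function form: takes the layer number as an int, returns the module path.
layer_matcher_fn = lambda num: f"transformer.h.{num}"

path_from_str = layer_matcher_str.format(num=5)
path_from_fn = layer_matcher_fn(5)
# Both produce "transformer.h.5"
```

The function form is useful when the mapping from layer number to module name is not a simple substitution, e.g. when layers are nested unevenly in the module tree.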