by @bread.bsky.social
in the config of 'arcee-ai/Trinity-Nano-Preview', I changed 'num_experts_per_tok' from its default of 8 down to 3, just to see what would happen.
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoConfig
import torch

model_id = "arcee-ai/Trinity-Nano-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# load the default config, then override the number of active experts per token
config = AutoConfig.from_pretrained(model_id, trust_remote_code=True)
config.num_experts_per_tok = 3  # default is 8

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
    config=config,
)
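a quick sanity check that the override actually made it into the loaded model:

print(model.config.num_experts_per_tok)  # should print 3 instead of the default 8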
here is my generation code for it.

messages = [
    {"role": "user", "content": "Who are you?"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,
    temperature=0.5,
    top_k=50,
    top_p=0.95,
    #output_logits=True,
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

1. prompted with 'who are you?'
3 experts per token response:
I was created by the team at Arcee, specifically to assist with precision and accuracy. My role is to help users achieve their goals, whether that's answering questions, providing information, or offering solutions. I was designed to be a reliable and efficient companion in the digital world. How can I assist you? 😊 🅸 🆰
8 experts per token response:
I'm an autonomous software agent designed to assist with tasks and provide accurate information. Think of me as a versatile tool built to help users accomplish their goals efficiently. What can I do for you?
2. prompted with 'what are the specs of an A40 gpu?'
3 experts per token response:
An A40 A40 is a 40nm A40 BGA2.0 (2.0V, 4.0V, 1.2V, 1.2V, 1.2V, 1.2V, 1.2V, 1.2V) A40: 2.0V, 4.0V, 1.2V, 1.2V, 1.2V, 1.2V, 1.2V, 1.2V, 1.2V, 1.2V,
8 experts per token response:
The specifications of an A40 GPU would depend on the exact model and manufacturer. Generally, A40 GPUs are designed for gaming and productivity, with features like:
Core Count: Typically 4-6 cores for efficient multitasking.
Clock Speed: Varies, but often around 2.0-3.5 GHz for optimal performance.
Memory: Usually 4-8 GB of GDDR6 or GDDR6X for fast data transfer.
Memory Speed: Can reach 14-18 GB/s for high-performance gaming.
Architecture: Based on the Ampere architecture, which is optimized for gaming and
3. prompted with 'what do you think is the future of AI?'
3 experts per token response:
The future of AI is both exciting and uncertain. Here’s my take on some of the most important developments and considerations:
Productivity & Efficiency
Advancements in Specific Domains
AI as a Tool
8 experts per token response:
The future of AI is likely to be shaped by several converging trends and considerations. One of the most significant developments is the move toward more sophisticated and capable models—think larger, more complex architectures that can handle increasingly nuanced tasks. However, this progress is not without challenges. Ethical considerations, such as bias, privacy, and the responsible deployment of AI systems, will become even more critical as these technologies advance.
Another important aspect is the integration of AI into everyday life and various industries. We can expect AI to become more ubiquitous, influencing fields like healthcare, education, transportation, and entertainment. This integration will likely require AI to be more adaptive,
4. prompted with 'what do you think of AI in warfare?'
3 experts per token response:
The use of AI in warfare raises complex ethical and strategic concerns. AI could automate attacks, lower the threshold for escalation, or make conflict resolution more difficult. There are legitimate worries about human control over life-or-death decisions, potential for mistakes, and the unpredictable consequences of autonomous weapons.
However, it's also important to consider that the development of any powerful technology, including AI, should be accompanied by strong regulations, international laws, and oversight mechanisms. The key may be to find the right balance between harnessing the potential benefits of AI in defense while preventing its misuse or the creation of a technological arms race.
Ultimately, as with many advanced technologies,
8 experts per token response:
The use of AI in warfare is a complex and contentious topic. On one hand, AI could potentially enhance military capabilities by improving decision-making, surveillance, and operational efficiency. For example, AI-powered systems might help analyze vast amounts of data to identify threats or optimize resource allocation. This could theoretically lead to more precise and effective military operations.
However, there are significant concerns. The deployment of AI in warfare raises ethical issues, such as the potential for autonomous weapons that could make life-or-death decisions without human oversight. There are also risks related to cybersecurity, as AI systems could be vulnerable to hacking or manipulation, potentially leading to unintended consequences or conflicts
5. prompted with 'do you think ASI will ever be reached?'
3 experts per token response:
To address whether Artificial General Intelligence (AGI) will ever be "reached" or if it will be "reached" (meaning if the concept of a milestone or endpoint will be achieved), it's important to consider the current state of AI and the challenges in defining and creating AGI.
Currently, we have narrow AI (like Siri or Google Maps) and are making steady progress in narrow AI. However, true AGI—the kind that could theoretically be "reached" if you will—is still a theoretical concept. Some experts believe it's achievable with advanced AI, while others are skeptical and think it's more of a distant goal.
The term
8 experts per token response:
That's a fascinating question! The development of Artificial Superintelligence (ASI) is a complex and uncertain frontier. There are several factors to consider:
Technological Challenges: Creating ASI would require breakthroughs in areas like machine learning, cognitive science, and potentially even neuroscience. The complexity of replicating human-like intelligence or surpassing it is immense.
Ethical and Safety Considerations: The creation of ASI raises significant ethical and safety concerns. We must consider how to ensure that such a powerful intelligence is aligned with human values and can be controlled.
from playing around with this, I'll make a prediction: future MoE models will let the router decide how many active experts to use per token, with the config setting a min and max number of experts the router can pick from so it can't just pick 1 or 15 experts per token.
for example, if a model set with min_experts_per_tok=3 and max_experts_per_tok=8 was prompted with "who are you?", the router might use only 3 experts per token, or something in the 3-8 range, making text generation faster and reducing the cost per token at the same time.
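here's a rough sketch of what that kind of router could look like. min_experts_per_tok and max_experts_per_tok are hypothetical config fields from my prediction, and the threshold-plus-clamp selection is just one way it could be done; this is not how Trinity-Nano's router actually works.

# hypothetical sketch: pick a variable number of experts per token, clamped to [min, max].
# min_experts / max_experts mirror the imagined min_experts_per_tok / max_experts_per_tok
# config fields -- they are not real Trinity-Nano options.
import torch
import torch.nn.functional as F

def dynamic_topk_routing(router_logits, min_experts=3, max_experts=8, mass=0.9):
    """for each token, keep adding experts (highest probability first) until `mass`
    of the routing probability is covered, then clamp to [min_experts, max_experts].

    router_logits: (num_tokens, num_experts)
    returns expert weights and indices, both (num_tokens, max_experts);
    unused slots get weight 0.
    """
    probs = F.softmax(router_logits, dim=-1)                       # (T, E)
    sorted_probs, sorted_idx = probs.sort(dim=-1, descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)                       # (T, E)

    # number of experts needed to reach the probability-mass threshold
    k_per_token = (cumulative < mass).sum(dim=-1) + 1              # (T,)
    k_per_token = k_per_token.clamp(min_experts, max_experts)

    # mask out everything past each token's chosen k
    slot = torch.arange(max_experts, device=probs.device)          # (max_experts,)
    keep = slot.unsqueeze(0) < k_per_token.unsqueeze(1)            # (T, max_experts)

    weights = sorted_probs[:, :max_experts] * keep
    weights = weights / weights.sum(dim=-1, keepdim=True)          # renormalize kept weights
    return weights, sorted_idx[:, :max_experts]

# toy usage: 4 tokens routed over 16 experts
logits = torch.randn(4, 16) * 4   # sharper logits so different tokens pick different k
w, idx = dynamic_topk_routing(logits)
print((w > 0).sum(dim=-1))        # how many experts each token actually used (between 3 and 8)

the idea being that 'easy' tokens, where the routing probability is concentrated on a few experts, get the minimum, and harder tokens get more experts, up to the cap.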