That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a remarkable flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I likewise simply discovered the MIT license files say "Copyright (c) 2023 DeepSeek" so they might [require](https://titikaka.unap.edu.pe) to pay a bit more [attention](https://www.wartasia.com) to how they copied those in.)
+
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those: unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
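That looks something like this; the `:Q8_0` quantization tag is an assumption, so substitute whichever of Unsloth's published quants you want:

```shell
# Fetch and chat with the distilled 8B model directly from Hugging Face
# (the :Q8_0 tag is an assumption; any of Unsloth's published quants should work)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```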
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
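Sketched out, with the model name as an assumption (it should match whatever `ollama list` reports):

```shell
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt; the model name here is an assumption - check `ollama list` for yours
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```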
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place plates on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine), I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this (a 34GB download):
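Roughly like this; the `Q3_K_M` quantization tag is an assumption that lines up with a 34GB download:

```shell
# The 70B distill via Ollama (the quant tag is an assumption; ~34GB at Q3_K_M)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```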
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
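That prompt, run through LLM against the local distill (the model name is an assumption; check `ollama list`):

```shell
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'Generate an SVG of a pelican riding a bicycle'
```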
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
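A sketch of that entry; the field names follow LLM's extra-openai-models.yaml format and the endpoint is DeepSeek's documented base URL, but treat both as assumptions to verify against the current docs:

```yaml
# extra-openai-models.yaml (endpoint and field names are assumptions; check
# the LLM and DeepSeek documentation)
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```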
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
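Spelled out as commands:

```shell
# Store the API key under the name LLM expects (paste the key when prompted)
llm keys set deepseek

# Then run a prompt against DeepSeek's hosted reasoner model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```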
This won't show you the reasoning tokens, unfortunately. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.