That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a whole lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
> To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
This is a remarkable flex! They've got models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those, the unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
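Something like this, where the Q8_0 quantization tag is an assumption on my part that matches the roughly 8.5GB download described next:

```bash
# Pull and run the Unsloth GGUF build directly from Hugging Face via Ollama
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```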
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
I installed the plugin and ran it like this:
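Presumably along these lines, with the model ID mirroring the Ollama name used above (again an assumption on my part):

```bash
llm install llm-ollama
# llm-ollama exposes models already pulled by Ollama under their Ollama names
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```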
Each response starts with a `<think>`...`</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
> Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
> First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
> [...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript, and here's the output at the end, after the `</think>` tag:
> The Joke:
> A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfy chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place plates on the table, causing a few spills.
> After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and says, "Maybe not, but we do have a lot of krill in our tea!"
> Explanation:
> - Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
> - Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
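Likely a command of this shape, where the Q3_K_M tag is a guess consistent with the 34GB size:

```bash
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```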
Can it draw a pelican?
I tried my classic `Generate an SVG of a pelican riding a bicycle` prompt too. It did not do well:
It looked to me like it got the order of the elements wrong, so I followed up with:
> the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try the model out without installing anything you can do so using chat.deepseek.com: you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that through LLM by dropping this into your extra-openai-models.yaml configuration file:
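A sketch of that configuration, following the shape of LLM's extra-openai-models.yaml entries (the exact keys here are my assumption):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```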
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
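In full, assuming you already have a DeepSeek API key:

```bash
llm keys set deepseek
# paste in your API key when prompted
llm -m deepseek-reasoner 'a joke about a pelican and a walrus'
```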
This won't show you the reasoning tokens, unfortunately. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.