# Run DeepSeek R1 Locally - with all 671 Billion Parameters

Recently, I demonstrated how to quickly run distilled versions of the DeepSeek R1 model locally. A distilled model is a compressed version of a larger language model, where knowledge from the larger model is transferred to a smaller one to reduce resource usage without losing too much performance. These models are based on the Llama and Qwen architectures and come in versions ranging from 1.5 to 70 billion parameters.

Some pointed out that this is not the REAL DeepSeek R1 and that it is impossible to run the full model locally without several hundred GB of memory. That sounded like a challenge - I thought!

## First Attempt - Warming Up with a 1.58-bit Quantized Version of DeepSeek R1 671b in Llama.cpp

The developers behind Unsloth dynamically quantized DeepSeek R1 so that it could run on as little as 130GB while still benefiting from all 671 billion parameters.

A quantized LLM is an LLM whose parameters are stored in lower-precision formats (e.g., 8-bit or 4-bit instead of 16-bit). This significantly reduces memory usage and speeds up processing, with minimal impact on performance. The full version of DeepSeek R1 uses 16-bit precision.

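To see why 1.58-bit quantization lands near 130GB, here is a back-of-the-envelope sketch (the bit widths are standard precisions, not figures from Unsloth, and the calculation ignores everything but the weights):

```python
# Approximate weight-only memory footprint of 671 billion parameters.
# Ignores activations, KV cache, and runtime overhead.
PARAMS = 671e9

def footprint_gb(bits_per_param: float) -> float:
    """Gigabytes (1e9 bytes) needed to store the weights alone."""
    return PARAMS * bits_per_param / 8 / 1e9

print(f"16-bit:   {footprint_gb(16):.0f} GB")    # full-precision model
print(f"4-bit:    {footprint_gb(4):.0f} GB")     # Q4-style quantization
print(f"1.58-bit: {footprint_gb(1.58):.0f} GB")  # roughly Unsloth's dynamic quant
```

The 1.58-bit figure works out to roughly 133 GB for the weights alone, which is consistent with the ~130GB Unsloth reports once their mixed per-layer precisions are taken into account.
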
The trade-off in precision is ideally compensated for by increased speed.

I downloaded the files from this collection on Hugging Face and ran the following command with Llama.cpp.

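The command itself only appears as a screenshot in the original post, so this is a sketch of what such a Llama.cpp invocation could look like - the file name follows Unsloth's naming scheme and the prompt is inferred from the reasoning shown later, but the flag values are illustrative assumptions:

```shell
# Illustrative llama.cpp invocation; adjust paths and values to your setup.
./llama-cli \
    --model DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    --cache-type-k q4_0 \
    --threads 16 \
    --n-gpu-layers 7 \
    --prompt "<|User|>What was Alan Turing's most important contribution to how we live our lives in 2025?<|Assistant|>"
```
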
The following table from Unsloth shows the recommended values for the n-gpu-layers parameter, which indicates how much work can be offloaded to the GPU.

According to the table, I thought 7 should be the maximum, but I got it running with 12. According to Windows Task Manager my GPU has 40 GB of memory, and not 24 as I thought. So then it adds up (7 / 24 * 40 ≈ 12).

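The scaling I used can be written out explicitly (my reading of the numbers above, not a formula from Unsloth):

```python
# Scale Unsloth's recommended layer count by the ratio of actual
# to assumed GPU memory.
recommended_layers = 7   # from the table, which assumes a 24 GB GPU
assumed_vram_gb = 24
actual_vram_gb = 40      # what Windows Task Manager reported

n_gpu_layers = recommended_layers / assumed_vram_gb * actual_vram_gb
print(round(n_gpu_layers))  # → 12
```
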
If you prefer to run the model directly with Ollama, you must first merge the three GGUF files using Llama.cpp or a similar tool.

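Llama.cpp ships a helper for exactly this; assuming the Unsloth file names, the merge would look something like:

```shell
# Merge split GGUF parts into one file; the tool locates the
# remaining parts from the first one.
./llama-gguf-split --merge \
    DeepSeek-R1-UD-IQ1_S-00001-of-00003.gguf \
    DeepSeek-R1-UD-IQ1_S.gguf
```
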
Above is some of the text that appeared on screen while running the command.

Llama.cpp and the quantized model are heavy on memory and CPU, but they also use the GPU (a little) and continuously read from the disk where the model files are stored.

The model repeats the question and is preparing to start reasoning.

The reasoning, or thinking, is quite lengthy - which also adds to the total time it takes.

Okay, so I need to figure out Alan Turing's most important contribution to how we live our lives in 2025. Hmm. Let's start by remembering who Alan Turing was. He was a British mathematician and computer scientist, right? He's known for his work in code-breaking during World War II, particularly with the Enigma machine. Oh, and he's also considered one of the fathers of theoretical computer science and artificial intelligence.

So, his contributions ... Well, the Turing Machine concept is fundamental in computer science. It's a theoretical model of a computer, which laid the groundwork for modern computing. Then there's the Turing Test, which is about determining whether a machine can exhibit intelligent behavior indistinguishable from a human's. That relates to AI. Also, his work on morphogenesis, which I think is about biological patterns and growth, like how organisms develop shapes and patterns.

Now, how do these contributions impact life in 2025? Well, we're in a world where AI is becoming more common. So maybe his work on the Turing Test is influencing how AI is developed and integrated into society. If the Turing Test is a benchmark for AI intelligence, then advances in AI that pass this test might be significant. But wait, the Turing Test is more of a philosophical benchmark. Maybe the actual algorithms or principles he developed are more foundational.

The Turing Machine model is the basis for the stored-program computer, which is the architecture of modern computers. All our computers today, including smartphones, servers, etc., are based on this idea. So, in 2025, even as technology advances, the fundamental principles he laid down would still be the basis for all computational devices. That's pretty huge. Without that, we wouldn't have the computers that power everything from the internet to AI algorithms.

Additionally, his work on code-breaking during WWII led to advances in cryptography. Modern encryption methods may have roots in his work, which is essential for secure communication in 2025. Everything from online banking to private messaging relies on encryption, so that's a major contribution.

Another angle is his influence on the development of algorithms. The concept of a universal computing machine allows for the creation of complex algorithms that drive AI, data analysis, machine learning, and so on. So, any AI systems in 2025 would stem from his foundational work.

Also, morphogenesis may have influenced fields like biology and genetics. Maybe in 2025, understanding biological patterns helps in areas like genetic engineering or synthetic biology. But I'm not sure how directly that ties into daily life compared to computing.

Putting it all together, the most important contribution might be the Turing Machine and the theory of computation, which underpin all modern computing technology. Without that, the digital age wouldn't exist as we know it. In 2025, even with all the advancements, the basic framework of computation he provided would still be essential. Alternatively, his work on AI and the Turing Test might be more prominent in an AI-driven future, but the underlying computing theory seems more fundamental.

Wait, but the Turing Test is more about evaluating AI than a contribution to technology itself. The real contribution is the foundation of computing, which enabled all the digital tech we have. So, in 2025, all the smart devices, AI, cloud computing, etc., are built on the principles he developed. Therefore, his most important contribution is the concept of the universal Turing machine, which is the basis for all modern computers, leading to the digital revolution that defines contemporary life.

And finally, the response:

Alan Turing's most significant contribution to how we live in 2025 is his foundational work in theoretical computer science, particularly the concept of the universal Turing machine. This theoretical model established the principles of modern computation, forming the basis for all digital computers. In 2025, this foundation underpins the entire digital world, enabling technologies like artificial intelligence, cloud computing, smartphones, and the internet. Without Turing's conceptual framework, the development of programmable computers and the subsequent technologies that define contemporary life - from AI algorithms to secure communication systems - would not have been possible. His work on cryptography and the Turing Test also contributed to secure communications and AI development, but it is the universal Turing machine that remains the cornerstone of our digital existence.

In summary: the 1.58-bit quantized model generated 0.39 tokens per second. In total, it took about 37 minutes to answer the question.

I was kind of surprised that I was able to run the model with only 32GB of RAM.

## Second Attempt - DeepSeek R1 671b in Ollama

Ok, I get it, a quantized model of only 130GB isn't really the full model. Ollama's model library appears to include a full version of DeepSeek R1. It's 404GB with all 671 billion parameters - that should be the real thing, right?

No, not really! The version hosted in Ollama's library is the 4-bit quantized version. See Q4_K_M in the screenshot above? It took me a while to notice!

With Ollama installed on my home PC, I just needed to clear 404GB of disk space and run the following command while grabbing a cup of coffee:

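The command is only shown as a screenshot in the original; given the tag in Ollama's model library, it would presumably be:

```shell
# Pulls about 404GB on first use, then starts an interactive session.
ollama run deepseek-r1:671b
```
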
Okay, it took more than one coffee before the download was complete.

But finally, the download was done, and the excitement grew ... until this message appeared!

After a quick visit to an online store selling various types of memory, I concluded that my motherboard wouldn't support such large amounts of RAM anyway. But there must be alternatives?

Windows allows virtual memory, meaning you can swap disk space for virtual (and rather slow) memory. I figured 450GB of extra virtual memory, on top of my 32GB of real RAM, should be enough.

Note: Be aware that SSDs have a limited number of write operations per memory cell before they wear out. Avoid excessive use of virtual memory if this concerns you.

A new attempt, and rising excitement ... before another error message!

This time, Ollama tried to push more of the Chinese language model into the GPU's memory than it could handle. After searching online, it appears this is a known issue, and the solution is to let the GPU rest and let the CPU do all the work.

Ollama uses a "Modelfile" containing the configuration for the model and how it should be used. When using models directly from Ollama's model library, you normally don't deal with these files as you must when downloading models from Hugging Face or similar sources.

I ran the following command to display the current configuration for DeepSeek R1:

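The screenshot of the command isn't reproduced here; with Ollama's CLI it would be along these lines (the model tag is my assumption):

```shell
# Print the model's Modelfile; redirect it to a file for editing.
ollama show --modelfile deepseek-r1:671b > Modelfile
```
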
Then, I added the following line to the output and saved it in a new file named Modelfile:

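The added line isn't legible in this rendering; since the goal is to keep everything off the GPU, it is presumably Ollama's num_gpu parameter set to zero:

```
# Offload zero layers to the GPU - run entirely on the CPU.
PARAMETER num_gpu 0
```
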
I then created a new model configuration with the following command, where the last parameter is my name for the model, which now runs entirely without GPU use:

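With Ollama's create command, that looks roughly like this ("deepseek-r1-671b-cpu" is a placeholder for whatever name you choose):

```shell
# Build a new local model from the edited Modelfile.
ollama create deepseek-r1-671b-cpu -f Modelfile
```
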
Once again, the excitement grew as I nervously typed the following command:

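Using a placeholder name for the CPU-only model and the question inferred from the reasoning transcript (both assumptions on my part):

```shell
ollama run deepseek-r1-671b-cpu \
    "What was Alan Turing's most important contribution to how we live our lives in 2025?"
```
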
Suddenly, it happened! No error message, but tokens started appearing! Admittedly, the speed was such that if you were quick enough, you could almost get a coffee in between each screen update - but it was working! A language model on par with the best from OpenAI, with all its 671 billion parameters, running on my three-year-old PC with 32GB (real) RAM!

I had asked the same question to ChatGPT (4o, o1, o3-mini-high), DeepSeek R1 hosted in China, and DeepSeek R1 671b hosted locally.

This is the reasoning from DeepSeek R1 running in Ollama on my own computer, as it appeared slowly - token by token.

And finally, this is the response from DeepSeek R1 running in Ollama. The model outputs the answer with markdown formatting.

Alan Turing's most significant contribution to contemporary life, particularly as we look toward 2025, lies in his foundational work on computational theory and artificial intelligence, which underpin nearly every aspect of modern technology. Here's a structured breakdown:

1. Universal Turing Machine (1936):
   - This conceptual model for general-purpose computing is the bedrock of all digital technology today. It established that a single machine could perform any computable task given the right program, enabling:
     - Modern computing devices (smartphones, cloud infrastructure, IoT).
     - Software development, enabling flexible applications across industries.
   - In 2025, advances in quantum computing, AI, and edge computing still rely on Turing's principles of programmable logic.

2. Artificial Intelligence & the Turing Test (1950):
   - His framework for evaluating machine intelligence spurred decades of research into human-like AI. By 2025, this legacy appears in:
     - Advanced chatbots, virtual assistants (e.g., GPT-4/5), and autonomous systems.
     - Ethical debates around AI decision-making in healthcare, finance, and law.

3. Cryptography & Security:
   - Turing's WWII codebreaking work foreshadowed modern encryption methods essential for:
     - Securing digital transactions, personal data, and blockchain technologies.
     - Combating cyber threats in an increasingly connected world.

Why the Universal Machine Stands Out: While AI and cryptography are transformative, Turing's theoretical model of computation is the most essential. It enabled the development of the programmable systems that drive today's innovations - from AI algorithms to quantum computing research. Without this foundation, the digital infrastructure of 2025 simply wouldn't exist.

So, how long did it take, using the 4-bit quantized model? A long time! At 0.05 tokens per second - meaning 20 seconds per token - it took almost 7 hours to get an answer to my question, including 35 minutes to load the model.

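As a sanity check on those numbers (the token count of the full reasoning plus answer is my estimate; the article doesn't state it):

```python
# Verify that 0.05 tokens/s plus a 35-minute load comes to about 7 hours.
tokens_per_second = 0.05   # i.e. 20 seconds per token
load_minutes = 35
answer_tokens = 1150       # assumed length of reasoning + answer

generation_hours = answer_tokens / tokens_per_second / 3600
total_hours = generation_hours + load_minutes / 60
print(f"about {total_hours:.1f} hours")
```
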
While the model was thinking, the CPU, memory, and the disk (used as virtual memory) were close to 100% busy. The disk where the model file was stored was not busy during generation of the response.

After some reflection, I thought: maybe it's all right to wait a bit? Maybe we shouldn't ask language models about everything all the time? Perhaps we should think for ourselves first and be willing to wait for an answer.

This might resemble how computers were used in the 1960s, when machines were large and access was very limited. You prepared your program on a stack of punch cards, which an operator loaded into the machine when it was your turn, and you could (if you were lucky) pick up the result the next day - unless there was an error in your program.

## Compared to the Response from Other LLMs, With and Without Reasoning

DeepSeek R1, hosted in China, thinks for 27 seconds before delivering this answer, which is slightly shorter than my locally hosted DeepSeek R1's response.

ChatGPT answers similarly to DeepSeek but in a much shorter format, with each model giving slightly different responses. The reasoning models from OpenAI spend less time reasoning than DeepSeek.

That's it - it's definitely possible to run different quantized versions of DeepSeek R1 locally, with all 671 billion parameters - on a three-year-old computer with 32GB of RAM - just as long as you're not in too much of a hurry!

If you really want the full, non-quantized version of DeepSeek R1, you can find it on Hugging Face. Please let me know your tokens/s (or rather seconds/token) if you get it running!