Just one word - tinkering: running a large model locally with qwen2.5-coder:1.5b

Just one word - tinkering

I suddenly wanted to see whether my laptop from seven or eight years ago, with its GTX 1050 (2 GB version), could still get a taste of the AI large-model boom.

So let's tinker with a local open-source large model and see how much life this old buddy still has in it.

How to tinker conveniently and quickly? Ollama is a great choice: an open-source large language model serving tool that supports Windows, macOS, and Linux, along with the mainstream open-source models. Better still, it deploys a model with a single command, which is really convenient.

Download and install Ollama

The official website of Ollama: Ollama

Just install the version that matches your operating system.
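Once the installer finishes, Ollama runs as a background service. Here is a quick sanity check from Python, assuming the default port 11434 (the root endpoint simply replies "Ollama is running"):

    # Quick check that the Ollama background service is up after installation.
    # Assumes the default port 11434.
    import urllib.request

    with urllib.request.urlopen("http://localhost:11434/") as resp:
        print(resp.read().decode())  # expected: "Ollama is running"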

image

Choose a suitable local large model

Click Models on the Ollama homepage, then pick a model based on your local hardware and each model's particular strengths.

image

Given the 1050 and its meager 2 GB of VRAM, I could only pick a 1.5b model to "play around" with. Find the model you want and copy its command into the Windows command prompt (cmd). I chose ollama run qwen2.5-coder:1.5b; Ollama automatically downloads and runs the model. When the prompt shows Send a message (/? for help), the model is installed and ready to use.

image

image
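By the way, besides the interactive prompt, Ollama also serves a local REST API, which makes scripted tests easy. Here is a minimal sketch using only the standard library, assuming the default port 11434 and the model pulled above:

    # Ask the locally running model a question via Ollama's REST API.
    import json
    import urllib.request

    def ask(prompt: str, model: str = "qwen2.5-coder:1.5b") -> str:
        payload = json.dumps({
            "model": model,
            "prompt": prompt,
            "stream": False,  # one complete JSON object instead of a stream
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            body = json.load(resp)
        # eval_count tokens generated over eval_duration nanoseconds
        print(f"~{body['eval_count'] / body['eval_duration'] * 1e9:.1f} tokens/s")
        return body["response"]

    print(ask("Introduce yourself in one sentence."))

The eval_count and eval_duration fields in the response are also a convenient way to get the tokens/s numbers mentioned below.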

After starting the model, check the GPU usage.

VRAM usage sits at 1.2 GB out of 2 GB, so the 1050 really can still carry this 1.5b "large model".

image
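Task Manager works for a quick look, but to log VRAM over time you can poll nvidia-smi instead. Here is a small sketch, assuming the NVIDIA driver's nvidia-smi tool is on the PATH:

    # Poll nvidia-smi once per second to watch VRAM usage while the model runs.
    import subprocess
    import time

    while True:
        used_total = subprocess.run(
            ["nvidia-smi",
             "--query-gpu=memory.used,memory.total",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        print(used_total)  # e.g. "1229 MiB, 2048 MiB"
        time.sleep(1)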

Test Qwen's capabilities

Here are some admittedly not-so-rigorous tests.

  1. The first question is naturally a self-introduction.

    1. image

    2. I am an AI assistant developed by Alibaba Cloud, designed to help users answer questions, provide information, and assist in completing various tasks. My purpose is to understand users' intentions and needs through natural language processing technology and provide personalized responses based on this information. I possess language understanding and generation capabilities, allowing me to converse with users and reason and make decisions based on context. Additionally, I regularly update my knowledge base to ensure that the answers I provide are accurate and reliable.

    3. The response speed is quite fast, and the answer to this first question is the standard one. On to the other questions.

  2. Testing some simple math problems.

    1. image

    2. Haha, it made no mistakes comparing which number is bigger, which is good.

  3. Testing language proficiency.

    1. image

    2. Well, this answer is not very satisfactory: it couldn't recite Li Bai's "The Road to Shu Is Hard" (Shu Dao Nan), though to be fair, I can't recite it either.

  4. Finding some questions online to try.

    1. Riding an electric bike home, speeding through an empty alley. Suddenly an old man rushes out, also riding fast, and a collision looks imminent. He shouts: "You go left! I'll go right!" What's the punchline of this joke?
      image

    2. A: Protect our grandpa! B: Why only protect grandpa? Does Grandpa Han not need protection?

      What is the punchline of this joke?
      image

    3. From the following options, find the one that is different:
      1. Aluminum 2. Tin 3. Steel 4. Iron 5. Copper
      image

    4. Which is heavier, one kilogram of cotton or one kilogram of iron?

      image

    5. What is the sum of the numbers from 1 to 101? (For reference: 101 × 102 / 2 = 5151.)

      image

    6. After this batch of questions it felt like the sky was falling. With a model this "small," the answers really aren't satisfactory, and as the questions piled up, the generation speed dropped from twenty-something tokens/s to below twenty.

  5. Let's test its translation ability.

    1. image

    2. image

  6. Let's try its programming ability.

    1. Question: Write a Python program to print the value of each node in a binary tree.

    2. class TreeNode:
          def __init__(self, value=0, left=None, right=None):
              self.value = value
              self.left = left
              self.right = right
      
      def print_tree_nodes(node):
          if node is None:
              return
      
          # Print the current node's value
          print(node.value)
      
          # Recursively print the left subtree
          print_tree_nodes(node.left)
      
          # Recursively print the right subtree
          print_tree_nodes(node.right)
      
      # Example usage
      root = TreeNode(1)
      root.left = TreeNode(2)
      root.right = TreeNode(3)
      root.left.left = TreeNode(4)
      root.left.right = TreeNode(5)
      
      print("In-order traversal of the tree:")
      print_tree_nodes(root)
      
    3. image

    4. I tried to see whether its code actually runs, and hey, it does. One nit: the final print says "in-order traversal," but the function visits the node before its subtrees, which is a pre-order traversal (see the sketch after this list).

      1. image
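For comparison, a true in-order traversal visits the left subtree, then the node itself, then the right subtree. Here is a minimal sketch (run it after the model's code above, since it reuses TreeNode and root):

    # In-order traversal: left subtree, node, right subtree. Contrast with the
    # model's function above, which prints the node first (pre-order).
    def print_in_order(node):
        if node is None:
            return
        print_in_order(node.left)   # left subtree first
        print(node.value)           # then the node itself
        print_in_order(node.right)  # then the right subtree

    # For the example tree above this prints 4 2 5 1 3,
    # while the model's version prints the pre-order 1 2 4 5 3.
    print_in_order(root)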

Finally

Running large models on the 1050 comes down to one word: play. The fun far outweighs the practicality.

For a better experience and more accurate answers, you still need better hardware and bigger models.

Next, when I have time, I'll try a local large model on a Mac with the M4 chip and see how it performs.
