[CODE] 🧑‍💻Data Representations

Visualization of Images as Tensors

  • Tensors are specialized data structures that are very similar to arrays and matrices.

  • The code snippet below visualizes grayscale images from the MNIST dataset as tensors with values between 0 and 255, where 0 is black, 255 is white, and the values in between are different shades of gray.

  • The images here are grayscale, but typical images have more than one channel. For example, colored RGB images have 3 channels denoting the intensities of the red, green, and blue values. This gives the tensor the shape \((n\_channels \times height \times width)\); see the small sketch after this list.
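
As a quick illustration of these shapes (a minimal sketch; the tensors here are hypothetical all-zero images with MNIST- and CIFAR-10-sized dimensions):

import torch

# A hypothetical grayscale image tensor: 1 channel, 28 x 28 pixels (MNIST size)
gray_image = torch.zeros(1, 28, 28)
# A hypothetical RGB image tensor: 3 channels, 32 x 32 pixels (CIFAR-10 size)
rgb_image = torch.zeros(3, 32, 32)
print(gray_image.shape, rgb_image.shape)  # torch.Size([1, 28, 28]) torch.Size([3, 32, 32])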

import torchvision
import torch
import matplotlib.pyplot as plt
import os
from matplotlib import rc
from matplotlib import animation
from matplotlib.animation import FuncAnimation
import matplotlib.image as mpimg
rc("animation", html="jshtml")

frn = 10  # Number of frames to process in the animation
fps = 0.5  # Frames per second
mywriter = animation.PillowWriter(fps=fps)
mnist_dataset = torchvision.datasets.MNIST(
    root="data/mnist",
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)

if not os.path.exists("assets/gif/image0"):
    os.makedirs("assets/gif/image0")

for loop_idx, (image_tensor, label) in enumerate(mnist_dataset):
    fig, ax = plt.subplots(figsize=(10, 10))
    image_tensor_gray = image_tensor[0]  # drop the channel dimension: (1, 28, 28) -> (28, 28)
    image_tensor_gray = image_tensor_gray * 255  # rescale from [0, 1] back to [0, 255]
    ax.matshow(image_tensor_gray, cmap="gray")
    # Overlay the integer value of each pixel on the image
    for i in range(image_tensor_gray.shape[0]):
        for j in range(image_tensor_gray.shape[1]):
            ax.text(
                i,
                j,
                str(int(image_tensor_gray[j][i].item())),
                va="center",
                ha="center",
                color="blue",
                fontsize="small",
            )

    plt.axis("off")
    plt.tight_layout()
    plt.savefig(f"assets/gif/image0/{loop_idx}.png")
    plt.close(fig)
    if loop_idx >= frn:
        break

fig, ax = plt.subplots(figsize=(10, 10))

plot = [ax.imshow(mpimg.imread("assets/gif/image0/0.png"))]


def change_plot(frame_number):
    # Swap in the pre-rendered PNG for this frame
    plot[0].remove()
    plt.axis("off")
    plt.tight_layout()
    plot[0] = ax.imshow(mpimg.imread(f"assets/gif/image0/{frame_number}.png"))


ani = FuncAnimation(fig, change_plot, frn, interval=1000 / fps)
plt.tight_layout()
display(ani)
ani.save("mnist_gray_values.gif", writer=mywriter)
plt.clf()
plt.close(fig)
cifar_dataset = torchvision.datasets.CIFAR10(
    root="data/cifar",
    train=True,
    download=True,
    transform=torchvision.transforms.ToTensor(),
)

if not os.path.exists("assets/gif/image1"):
    os.makedirs("assets/gif/image1")

for image_tensor, label in cifar_dataset:
    fig, (ax1, ax2, ax3, ax4) = plt.subplots(nrows=1, ncols=4, figsize=(30, 10))
    image_tensor *= 255  # rescale from [0, 1] back to [0, 255]
    image_tensor = image_tensor.int()
    for ax in (ax1, ax2, ax3, ax4):  # turn the axes off on all four panels
        ax.axis("off")
    plt.tight_layout()
    # Show each channel's intensities separately, then the combined RGB image
    ax1.imshow(image_tensor[0], cmap="Reds")
    ax2.imshow(image_tensor[1], cmap="Greens")
    ax3.imshow(image_tensor[2], cmap="Blues")
    ax4.imshow(image_tensor.permute(1, 2, 0))  # (C, H, W) -> (H, W, C) for imshow
    for i in range(image_tensor[0].shape[0]):
        for j in range(image_tensor[0].shape[1]):
            ax1.text(
                i,
                j,
                str(int(image_tensor[0][j][i].item())),
                va="center",
                ha="center",
                color="blue",
                fontsize="x-small",
            )
            ax2.text(
                i,
                j,
                str(int(image_tensor[1][j][i].item())),
                va="center",
                ha="center",
                color="blue",
                fontsize="x-small",
            )
            ax3.text(
                i,
                j,
                str(int(image_tensor[2][j][i].item())),
                va="center",
                ha="center",
                color="blue",
                fontsize="x-small",
            )
    fig.savefig("assets/gif/image1/0.png", dpi=500)
    plt.close(fig)
    break

fig, ax = plt.subplots(figsize=(30, 10))
plt.axis("off")
plt.tight_layout()
ax.imshow(mpimg.imread("assets/gif/image1/0.png"))
Files already downloaded and verified
[Figure: the red, green, and blue channel intensities of a CIFAR-10 image shown separately with per-pixel values, followed by the combined RGB image]

Document as a “Bag of Words” Model

Now we move to the representation of a document, which can be seen as a sequence of letters. However, the frequency distribution of letters is roughly the same across English documents, so letters make a poor representation, as the sketch below illustrates. Hence, we take individual words as the smallest unit.
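
To see why, here is a minimal sketch in plain Python (the first sentence is taken from the review used below; the second is made up for illustration) comparing the letter-frequency vectors of two unrelated sentences; their cosine similarity comes out fairly high even though the topics share nothing:

from collections import Counter
import math


def letter_freq(text):
    # Normalized frequency of each letter a-z in the text
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return [counts[chr(ord("a") + i)] / total for i in range(26)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


doc1 = "I love sci-fi and am willing to put up with a lot."
doc2 = "The weather in the mountains was cold and clear today."
print(cosine(letter_freq(doc1), letter_freq(doc2)))  # fairly high despite unrelated topics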

Building a Bag of Words model involves multiple steps. We will explore them with the help of an example from the IMDB reviews dataset.

import torchtext
import re

imdb_dataset = torchtext.datasets.IMDB(root="./data/imdb", split="test")
item = next(iter(imdb_dataset))  # take the first (label, review) pair from the streaming dataset
label = item[0]
review_text = item[1]
display(review_text)
'I love sci-fi and am willing to put up with a lot. Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood. I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original). Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn\'t match the background, and painfully one-dimensional characters cannot be overcome with a \'sci-fi\' setting. (I\'m sure there are those of you out there who think Babylon 5 is good sci-fi TV. It\'s not. It\'s clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf. Star Trek). It may treat important issues, yet not as a serious philosophy. It\'s really difficult to care about the characters here as they are not simply foolish, just missing a spark of life. Their actions and reactions are wooden and predictable, often painful to watch. The makers of Earth KNOW it\'s rubbish as they have to always say "Gene Roddenberry\'s Earth..." otherwise people would not continue watching. Roddenberry\'s ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks really brings this home) trudging Trabant of a show lumbers into space. Spoiler. So, kill off a main character. And then bring him back as another actor. Jeeez! Dallas all over again.'

Step 1: Breaking the text into sentences

review_sent_tokens = re.split(
    r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s", review_text
)  # Use this regex instead of simple .split("."), as review contains "..."
display(review_sent_tokens)
['I love sci-fi and am willing to put up with a lot.',
 'Sci-fi movies/TV are usually underfunded, under-appreciated and misunderstood.',
 'I tried to like this, I really did, but it is to good TV sci-fi as Babylon 5 is to Star Trek (the original).',
 "Silly prosthetics, cheap cardboard sets, stilted dialogues, CG that doesn't match the background, and painfully one-dimensional characters cannot be overcome with a 'sci-fi' setting.",
 "(I'm sure there are those of you out there who think Babylon 5 is good sci-fi TV.",
 "It's not.",
 "It's clichéd and uninspiring.) While US viewers might like emotion and character development, sci-fi is a genre that does not take itself seriously (cf.",
 'Star Trek).',
 'It may treat important issues, yet not as a serious philosophy.',
 "It's really difficult to care about the characters here as they are not simply foolish, just missing a spark of life.",
 'Their actions and reactions are wooden and predictable, often painful to watch.',
 'The makers of Earth KNOW it\'s rubbish as they have to always say "Gene Roddenberry\'s Earth..." otherwise people would not continue watching.',
 "Roddenberry's ashes must be turning in their orbit as this dull, cheap, poorly edited (watching it without advert breaks really brings this home) trudging Trabant of a show lumbers into space.",
 'Spoiler.',
 'So, kill off a main character.',
 'And then bring him back as another actor.',
 'Jeeez! Dallas all over again.']
review_sent_tokens = review_sent_tokens[:5]  # Take only the first 5 sentences for illustration

Step 2: Breaking the sentences into words (tokenization)

def tokenise(sentence):
    # Split the sentence into word-level tokens:
    #   [A-Z]{2,}(?![a-z])    all-caps acronyms such as "TV" or "CG"
    #   [A-Z][a-z]+(?=[A-Z])  a capitalized word glued to a following capital
    #   ['\w\-]+              ordinary words, keeping apostrophes and hyphens
    return re.findall(r"[A-Z]{2,}(?![a-z])|[A-Z][a-z]+(?=[A-Z])|['\w\-]+", sentence)


review_word_tokens = [tokenise(sent) for sent in review_sent_tokens]
print(*review_word_tokens, sep="\n")
['I', 'love', 'sci-fi', 'and', 'am', 'willing', 'to', 'put', 'up', 'with', 'a', 'lot']
['Sci-fi', 'movies', 'TV', 'are', 'usually', 'underfunded', 'under-appreciated', 'and', 'misunderstood']
['I', 'tried', 'to', 'like', 'this', 'I', 'really', 'did', 'but', 'it', 'is', 'to', 'good', 'TV', 'sci-fi', 'as', 'Babylon', '5', 'is', 'to', 'Star', 'Trek', 'the', 'original']
['Silly', 'prosthetics', 'cheap', 'cardboard', 'sets', 'stilted', 'dialogues', 'CG', 'that', "doesn't", 'match', 'the', 'background', 'and', 'painfully', 'one-dimensional', 'characters', 'cannot', 'be', 'overcome', 'with', 'a', "'sci-fi'", 'setting']
["I'm", 'sure', 'there', 'are', 'those', 'of', 'you', 'out', 'there', 'who', 'think', 'Babylon', '5', 'is', 'good', 'sci-fi', 'TV']

Step 3a: Stemming

Stemming heuristically chops word endings to reduce each word to a root form, so the result is not always a valid word (e.g., "movies" becomes "movi" below).

import nltk
from nltk.stem import PorterStemmer

ps = PorterStemmer()


def stem_sentence(sent_word_tokens):
    return [ps.stem(word) for word in sent_word_tokens]


review_word_stem_tokens = [
    stem_sentence(sent_word_tokens) for sent_word_tokens in review_word_tokens
]
print("Stemmer::")
print(*review_word_stem_tokens, sep="\n")
Stemmer::
['I', 'love', 'sci-fi', 'and', 'am', 'will', 'to', 'put', 'up', 'with', 'a', 'lot']
['sci-fi', 'movi', 'TV', 'are', 'usual', 'underfund', 'under-appreci', 'and', 'misunderstood']
['I', 'tri', 'to', 'like', 'thi', 'I', 'realli', 'did', 'but', 'it', 'is', 'to', 'good', 'TV', 'sci-fi', 'as', 'babylon', '5', 'is', 'to', 'star', 'trek', 'the', 'origin']
['silli', 'prosthet', 'cheap', 'cardboard', 'set', 'stilt', 'dialogu', 'CG', 'that', "doesn't", 'match', 'the', 'background', 'and', 'pain', 'one-dimension', 'charact', 'cannot', 'be', 'overcom', 'with', 'a', "'sci-fi'", 'set']
["i'm", 'sure', 'there', 'are', 'those', 'of', 'you', 'out', 'there', 'who', 'think', 'babylon', '5', 'is', 'good', 'sci-fi', 'TV']

Step 3b: Lemmatization

Lemmatization instead maps each word to its dictionary form (lemma), so the output is always a valid word (e.g., "movies" becomes "movie" below).

from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")
wordnet_lemmatizer = WordNetLemmatizer()


def lemmatize_sentence(sent_word_tokens):
    return [wordnet_lemmatizer.lemmatize(word) for word in sent_word_tokens]


review_word_lemma_tokens = [
    lemmatize_sentence(sent_word_tokens) for sent_word_tokens in review_word_tokens
]
print("Lemmatizer::")
print(*review_word_lemma_tokens, sep="\n")
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Unzipping corpora/wordnet.zip.
Lemmatizer::
['I', 'love', 'sci-fi', 'and', 'am', 'willing', 'to', 'put', 'up', 'with', 'a', 'lot']
['Sci-fi', 'movie', 'TV', 'are', 'usually', 'underfunded', 'under-appreciated', 'and', 'misunderstood']
['I', 'tried', 'to', 'like', 'this', 'I', 'really', 'did', 'but', 'it', 'is', 'to', 'good', 'TV', 'sci-fi', 'a', 'Babylon', '5', 'is', 'to', 'Star', 'Trek', 'the', 'original']
['Silly', 'prosthetics', 'cheap', 'cardboard', 'set', 'stilted', 'dialogue', 'CG', 'that', "doesn't", 'match', 'the', 'background', 'and', 'painfully', 'one-dimensional', 'character', 'cannot', 'be', 'overcome', 'with', 'a', "'sci-fi'", 'setting']
["I'm", 'sure', 'there', 'are', 'those', 'of', 'you', 'out', 'there', 'who', 'think', 'Babylon', '5', 'is', 'good', 'sci-fi', 'TV']

Step 4: Removing Stopwords

from nltk.corpus import stopwords

nltk.download("stopwords")


def remove_stop_words(sent_word_tokens):
    stop_words = set(stopwords.words("english"))  # build the lookup set once
    # Matching is case-sensitive, so capitalized tokens such as "I" survive
    return [word for word in sent_word_tokens if word not in stop_words]


review_word_stem_tokens = [
    remove_stop_words(sent_word_tokens) for sent_word_tokens in review_word_stem_tokens
]
print(*review_word_stem_tokens, sep="\n")
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Unzipping corpora/stopwords.zip.
['I', 'love', 'sci-fi', 'put', 'lot']
['sci-fi', 'movi', 'TV', 'usual', 'underfund', 'under-appreci', 'misunderstood']
['I', 'tri', 'like', 'thi', 'I', 'realli', 'good', 'TV', 'sci-fi', 'babylon', '5', 'star', 'trek', 'origin']
['silli', 'prosthet', 'cheap', 'cardboard', 'set', 'stilt', 'dialogu', 'CG', 'match', 'background', 'pain', 'one-dimension', 'charact', 'cannot', 'overcom', "'sci-fi'", 'set']
["i'm", 'sure', 'think', 'babylon', '5', 'good', 'sci-fi', 'TV']

Step 5: Building Unigrams, Bigrams, Trigrams, Skip-grams etc.

An n-gram is a contiguous sequence of n tokens (n = 1: unigram, n = 2: bigram, n = 3: trigram), while a skip-gram pairs tokens that lie a fixed distance apart. Below, each sentence is represented as a vector of n-gram counts over the vocabulary built from all five sentences.

Unigrams

import pandas as pd
import numpy as np

unigram_vocab = {}
for sent in review_word_stem_tokens:
    for word in sent:
        if word not in unigram_vocab:
            unigram_vocab[word] = len(unigram_vocab)

review_sent_count_vectors = np.zeros(
    (len(review_word_stem_tokens), len(unigram_vocab)), dtype=np.int32
)

for sent_idx in range(len(review_word_stem_tokens)):
    for word in review_word_stem_tokens[sent_idx]:
        review_sent_count_vectors[sent_idx][unigram_vocab[word]] += 1

df = pd.DataFrame(
    data=review_sent_count_vectors, columns=sorted(unigram_vocab, key=unigram_vocab.get)
)
display(df)
I love sci-fi put lot movi TV usual underfund under-appreci misunderstood tri like thi realli good babylon 5 star trek origin silli prosthet cheap cardboard set stilt dialogu CG match background pain one-dimension charact cannot overcom 'sci-fi' i'm sure think
0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 1 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 2 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 0 0 0
4 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
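
Once each sentence is a count vector, ordinary vector arithmetic applies. As a small usage sketch (reusing review_sent_count_vectors from above), the cosine similarity between sentences 2 and 4, which share tokens such as 'sci-fi', 'TV', 'babylon', '5', and 'good', comes out noticeably non-zero:

import numpy as np

u = review_sent_count_vectors[2].astype(float)
v = review_sent_count_vectors[4].astype(float)
similarity = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
print(similarity)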

Bigrams

bigram_vocab = {}
for sent in review_word_stem_tokens:
    for word_idx in range(len(sent) - 1):
        bigram = sent[word_idx] + " " + sent[word_idx + 1]
        if bigram not in bigram_vocab:
            bigram_vocab[bigram] = len(bigram_vocab)

review_sent_bigram_count_vectors = np.zeros(
    (len(review_word_stem_tokens), len(bigram_vocab)), dtype=np.int32
)

for sent_idx in range(len(review_word_stem_tokens)):
    for word_idx in range(len(review_word_stem_tokens[sent_idx]) - 1):
        bigram = (
            review_word_stem_tokens[sent_idx][word_idx]
            + " "
            + review_word_stem_tokens[sent_idx][word_idx + 1]
        )
        review_sent_bigram_count_vectors[sent_idx][bigram_vocab[bigram]] += 1

df = pd.DataFrame(
    data=review_sent_bigram_count_vectors,
    columns=sorted(bigram_vocab, key=bigram_vocab.get),
)
display(df)
I love love sci-fi sci-fi put put lot sci-fi movi movi TV TV usual usual underfund underfund under-appreci under-appreci misunderstood I tri tri like like thi thi I I realli realli good good TV TV sci-fi sci-fi babylon babylon 5 5 star star trek trek origin silli prosthet prosthet cheap cheap cardboard cardboard set set stilt stilt dialogu dialogu CG CG match match background background pain pain one-dimension one-dimension charact charact cannot cannot overcom overcom 'sci-fi' 'sci-fi' set i'm sure sure think think babylon 5 good good sci-fi sci-fi TV
0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1

Trigrams

trigram_vocab = {}
for sent in review_word_stem_tokens:
    for word_idx in range(len(sent) - 2):
        trigram = " ".join(sent[word_idx : word_idx + 3])
        if trigram not in trigram_vocab:
            trigram_vocab[trigram] = len(trigram_vocab)

review_sent_trigram_count_vectors = np.zeros(
    (len(review_word_stem_tokens), len(trigram_vocab)), dtype=np.int32
)

for sent_idx in range(len(review_word_stem_tokens)):
    for word_idx in range(len(review_word_stem_tokens[sent_idx]) - 2):
        trigram = " ".join(review_word_stem_tokens[sent_idx][word_idx : word_idx + 3])
        review_sent_trigram_count_vectors[sent_idx][trigram_vocab[trigram]] += 1

df = pd.DataFrame(
    data=review_sent_trigram_count_vectors,
    columns=sorted(trigram_vocab, key=trigram_vocab.get),
)
display(df)
I love sci-fi love sci-fi put sci-fi put lot sci-fi movi TV movi TV usual TV usual underfund usual underfund under-appreci underfund under-appreci misunderstood I tri like tri like thi like thi I thi I realli I realli good realli good TV good TV sci-fi TV sci-fi babylon sci-fi babylon 5 babylon 5 star 5 star trek star trek origin silli prosthet cheap prosthet cheap cardboard cheap cardboard set cardboard set stilt set stilt dialogu stilt dialogu CG dialogu CG match CG match background match background pain background pain one-dimension pain one-dimension charact one-dimension charact cannot charact cannot overcom cannot overcom 'sci-fi' overcom 'sci-fi' set i'm sure think sure think babylon think babylon 5 babylon 5 good 5 good sci-fi good sci-fi TV
0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1

Skip-1-gram

skip_1_gram_vocab = {}
for sent in review_word_stem_tokens:
    for word_idx in range(len(sent) - 2):
        # Pair each token with the one two positions ahead, skipping the token in between
        skipgram = sent[word_idx] + " " + sent[word_idx + 2]
        if skipgram not in skip_1_gram_vocab:
            skip_1_gram_vocab[skipgram] = len(skip_1_gram_vocab)

review_sent_skipgram_count_vectors = np.zeros(
    (len(review_word_stem_tokens), len(skip_1_gram_vocab)), dtype=np.int32
)

for sent_idx in range(len(review_word_stem_tokens)):
    for word_idx in range(len(review_word_stem_tokens[sent_idx]) - 2):
        skipgram = (
            review_word_stem_tokens[sent_idx][word_idx]
            + " "
            + review_word_stem_tokens[sent_idx][word_idx + 2]
        )
        review_sent_skipgram_count_vectors[sent_idx][skip_1_gram_vocab[skipgram]] += 1

df = pd.DataFrame(
    data=review_sent_skipgram_count_vectors,
    columns=sorted(skip_1_gram_vocab, key=skip_1_gram_vocab.get),
)
display(df)
I sci-fi love put sci-fi lot sci-fi TV movi usual TV underfund usual under-appreci underfund misunderstood I like tri thi like I thi realli I good realli TV good sci-fi TV babylon sci-fi 5 babylon star 5 trek star origin silli cheap prosthet cardboard cheap set cardboard stilt set dialogu stilt CG dialogu match CG background match pain background one-dimension pain charact one-dimension cannot charact overcom cannot 'sci-fi' overcom set i'm think sure babylon think 5 babylon good 5 sci-fi good TV
0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
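
In practice these count matrices are rarely built by hand. scikit-learn's CountVectorizer produces the same style of bag-of-n-grams representation in one call; a hedged sketch (assuming scikit-learn is installed; its default tokenizer lowercases and drops single-character tokens, so the vocabulary will differ slightly from ours):

from sklearn.feature_extraction.text import CountVectorizer

sentences = [" ".join(sent) for sent in review_word_stem_tokens]

# ngram_range=(1, 2) counts unigrams and bigrams in a single pass
vectorizer = CountVectorizer(ngram_range=(1, 2))
count_matrix = vectorizer.fit_transform(sentences)  # sparse matrix: (n_sentences, vocab_size)
print(count_matrix.shape)
print(vectorizer.get_feature_names_out()[:10])
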
Author(s): Sachin Yadav