The output data is therefore a one-hot encoding of each word, representing an idealized probability distribution with a value of 0 at every word position except the position of the actual word, which has a value of 1.
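For example, here is a quick sketch of what such an idealized distribution looks like with Keras' to_categorical (illustrative only, not part of the tutorial code; it uses a toy vocabulary of 5 words instead of the real one):

# illustrative only: toy vocabulary of 5 words, with the actual word at index 3
from keras.utils import to_categorical
print(to_categorical([3], num_classes=5)[0])
# [0. 0. 0. 1. 0.] -> 1 at the actual word's position, 0 everywhere else

The create_sequences() function below applies this encoding to every description in the dataset.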
# create sequences of images, input sequences and output words for an image
def create_sequences(tokenizer, max_length, descriptions, photos):
    X1, X2, y = list(), list(), list()
    # walk through each image identifier
    for key, desc_list in descriptions.items():
        # walk through each description for the image
        for desc in desc_list:
            # encode the sequence
            seq = tokenizer.texts_to_sequences([desc])[0]
            # split one sequence into multiple X,y pairs
            for i in range(1, len(seq)):
                # split into input and output pair
                in_seq, out_seq = seq[:i], seq[i]
                # pad input sequence
                in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
                # encode output sequence
                out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
                # store
                X1.append(photos[key][0])
                X2.append(in_seq)
                y.append(out_seq)
    return array(X1), array(X2), array(y)
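To make the splitting loop above concrete, here is a small sketch (illustrative only; the toy caption, the toy_tokenizer, and the maxlen of 5 are made up and are not part of the tutorial code) that expands a single caption into its input-output pairs:

# illustrative only: expand one toy caption into multiple X,y pairs
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
caption = 'startseq dog runs endseq'
toy_tokenizer = Tokenizer()
toy_tokenizer.fit_on_texts([caption])
seq = toy_tokenizer.texts_to_sequences([caption])[0]  # [1, 2, 3, 4]
for i in range(1, len(seq)):
    in_seq, out_word = seq[:i], seq[i]
    print(pad_sequences([in_seq], maxlen=5)[0], '->', out_word)
# [0 0 0 0 1] -> 2
# [0 0 0 1 2] -> 3
# [0 0 1 2 3] -> 4

The same photo feature vector is stored once per generated pair, which is why the number of training samples is much larger than the number of captions.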
We need to calculate the maximum number of words in the longest description. A helpful function named max_length() is listed below.
# calculate the length of the description with the most words
def max_length(descriptions):
    lines = to_lines(descriptions)
    return max(len(d.split()) for d in lines)
We can now load the data for the training and development datasets and transform the loaded data into input-output pairs for fitting the deep learning model.
Define the Model
We will define the deep learning model based on the "merge-model" described by Marc Tanti, et al. in their 2017 papers.
The authors provide a nice schematic of the model, reproduced below:
We will describe the model in three parts:
Photo Feature Extractor: the input photo features are a 4096-element vector, which a fully connected layer processes to produce a 256-element representation of the photo.
Sequence Processor: the input sequence of a pre-defined length (34 words) is fed to a word embedding layer that uses a mask to ignore padded values, followed by an LSTM layer with 256 memory units.
Both input models produce a 256-element vector. In addition, both input models use regularization in the form of 50% dropout, which is intended to reduce overfitting of the training dataset, as this model configuration learns very fast.
Decoder: the decoder model merges the vectors from both input models using an addition operation, then feeds the result to a Dense layer of 256 neurons and on to a final output Dense layer that makes a softmax prediction over the entire output vocabulary for the next word in the sequence.
The define_model() function below defines and returns the model ready to be fit.
# define the captioning model
def define_model(vocab_size, max_length):
    # feature extractor model
    inputs1 = Input(shape=(4096,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation='relu')(fe1)
    # sequence model
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)
    # decoder model
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation='relu')(decoder1)
    outputs = Dense(vocab_size, activation='softmax')(decoder2)
    # tie it together [image, seq] [word]
    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    # summarize model
    print(model.summary())
    plot_model(model, to_file='model.png', show_shapes=True)
    return model
To get a feel for the structure of the model, specifically the shapes of the layers, see the summary listed below.
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
input_2 (InputLayer) (None, 34) 0
____________________________________________________________________________________________________
input_1 (InputLayer) (None, 4096) 0
____________________________________________________________________________________________________
embedding_1 (Embedding) (None, 34, 256) 1940224 input_2[0][0]
____________________________________________________________________________________________________
dropout_1 (Dropout) (None, 4096) 0 input_1[0][0]
____________________________________________________________________________________________________
dropout_2 (Dropout) (None, 34, 256) 0 embedding_1[0][0]
____________________________________________________________________________________________________
dense_1 (Dense) (None, 256) 1048832 dropout_1[0][0]
____________________________________________________________________________________________________
lstm_1 (LSTM) (None, 256) 525312 dropout_2[0][0]
____________________________________________________________________________________________________
add_1 (Add) (None, 256) 0 dense_1[0][0]
lstm_1[0][0]
____________________________________________________________________________________________________
dense_2 (Dense) (None, 256) 65792 add_1[0][0]
____________________________________________________________________________________________________
dense_3 (Dense) (None, 7579) 1947803 dense_2[0][0]
====================================================================================================
Total params: 5,527,963
Trainable params: 5,527,963
Non-trainable params: 0
____________________________________________________________________________________________________
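As a quick sanity check of these counts against the configuration above: the embedding layer has 7,579 × 256 = 1,940,224 weights, the image Dense layer has 4,096 × 256 + 256 = 1,048,832, the LSTM has 4 × ((256 + 256) × 256 + 256) = 525,312, and the output layer has (256 + 1) × 7,579 = 1,947,803, which together with the 65,792 weights of the 256-neuron decoder Dense layer gives the 5,527,963 total.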
We also create a plot to visualize the structure of the network, which helps in understanding the two streams of input.
Plot of the deep learning model for image caption generation.
Fit the Model
Now that we know how to define the model, we can fit it on the training dataset.
The model learns fast and quickly overfits the training dataset. For this reason, we monitor how well the trained model generalizes on the holdout development dataset. When the skill of the model on the development dataset improves at the end of an epoch, we save the whole model to file.
At the end of the run, we can then use the saved model with the best skill on the development dataset as our final model.
We can achieve this by defining a ModelCheckpoint in Keras and specifying it to monitor the minimum loss on the validation dataset, saving the model to a file whose name contains both the training loss and the validation loss.
# define checkpoint callback
filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
The checkpoint is then specified via the callbacks argument in the call to fit(). We must also specify the development dataset via the validation_data argument of fit().
We will only fit the model for 20 epochs; given the amount of training data, each epoch may take around 30 minutes on modern hardware.
# fit model
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
Complete Example
The complete example of fitting the model on the training data is listed below:
from numpy import array
from pickle import load
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.utils import to_categorical
from keras.utils import plot_model
from keras.models import Model
from keras.layers import Input
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Embedding
from keras.layers import Dropout
from keras.layers.merge import add
from keras.callbacks import ModelCheckpoint
# load doc into memory
def load_doc(filename):
    # open the file as read only
    file = open(filename, 'r')
    # read all text
    text = file.read()
    # close the file
    file.close()
    return text

# load a pre-defined list of photo identifiers
def load_set(filename):
    doc = load_doc(filename)
    dataset = list()
    # process line by line
    for line in doc.split('\n'):
        # skip empty lines
        if len(line) < 1:
            continue
        # get the image identifier
        identifier = line.split('.')[0]
        dataset.append(identifier)
    return set(dataset)

# load clean descriptions into memory
def load_clean_descriptions(filename, dataset):
    # load document
    doc = load_doc(filename)
    descriptions = dict()
    for line in doc.split('\n'):
        # split line by white space
        tokens = line.split()
        # split id from description
        image_id, image_desc = tokens[0], tokens[1:]
        # skip images not in the set
        if image_id in dataset:
            # create list
            if image_id not in descriptions:
                descriptions[image_id] = list()
            # wrap description in tokens
            desc = 'startseq ' + ' '.join(image_desc) + ' endseq'
            # store
            descriptions[image_id].append(desc)
    return descriptions

# load photo features
def load_photo_features(filename, dataset):
    # load all features
    all_features = load(open(filename, 'rb'))
    # filter features
    features = {k: all_features[k] for k in dataset}
    return features

# convert a dictionary of clean descriptions to a list of descriptions
def to_lines(descriptions):
    all_desc = list()
    for key in descriptions.keys():
        [all_desc.append(d) for d in descriptions[key]]
    return all_desc

# fit a tokenizer given caption descriptions
def create_tokenizer(descriptions):
    lines = to_lines(descriptions)
    tokenizer = Tokenizer()
    tokenizer.fit_on_texts(lines)
    return tokenizer

# calculate the length of the description with the most words
def max_length(descriptions):
    lines = to_lines(descriptions)
    return max(len(d.split()) for d in lines)

# create sequences of images, input sequences and output words for an image
def create_sequences(tokenizer, max_length, descriptions, photos):
    X1, X2, y = list(), list(), list()
    # walk through each image identifier
    for key, desc_list in descriptions.items():
        # walk through each description for the image
        for desc in desc_list:
            # encode the sequence
            seq = tokenizer.texts_to_sequences([desc])[0]
            # split one sequence into multiple X,y pairs
            for i in range(1, len(seq)):
                # split into input and output pair
                in_seq, out_seq = seq[:i], seq[i]
                # pad input sequence
                in_seq = pad_sequences([in_seq], maxlen=max_length)[0]
                # encode output sequence
                out_seq = to_categorical([out_seq], num_classes=vocab_size)[0]
                # store
                X1.append(photos[key][0])
                X2.append(in_seq)
                y.append(out_seq)
    return array(X1), array(X2), array(y)

# define the captioning model
def define_model(vocab_size, max_length):
    # feature extractor model
    inputs1 = Input(shape=(4096,))
    fe1 = Dropout(0.5)(inputs1)
    fe2 = Dense(256, activation='relu')(fe1)
    # sequence model
    inputs2 = Input(shape=(max_length,))
    se1 = Embedding(vocab_size, 256, mask_zero=True)(inputs2)
    se2 = Dropout(0.5)(se1)
    se3 = LSTM(256)(se2)
    # decoder model
    decoder1 = add([fe2, se3])
    decoder2 = Dense(256, activation='relu')(decoder1)
    outputs = Dense(vocab_size, activation='softmax')(decoder2)
    # tie it together [image, seq] [word]
    model = Model(inputs=[inputs1, inputs2], outputs=outputs)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    # summarize model
    print(model.summary())
    plot_model(model, to_file='model.png', show_shapes=True)
    return model
# train dataset
# load training dataset (6K)
filename = 'Flickr8k_text/Flickr_8k.trainImages.txt'
train = load_set(filename)
print('Dataset: %d' % len(train))
# descriptions
train_descriptions = load_clean_descriptions('descriptions.txt', train)
print('Descriptions: train=%d' % len(train_descriptions))
# photo features
train_features = load_photo_features('features.pkl', train)
print('Photos: train=%d' % len(train_features))
# prepare tokenizer
tokenizer = create_tokenizer(train_descriptions)
vocab_size = len(tokenizer.word_index) + 1
print('Vocabulary Size: %d' % vocab_size)
# determine the maximum sequence length
max_length = max_length(train_descriptions)
print('Description Length: %d' % max_length)
# prepare sequences
X1train, X2train, ytrain = create_sequences(tokenizer, max_length, train_descriptions, train_features)
# dev dataset
# load test set
filename = 'Flickr8k_text/Flickr_8k.devImages.txt'
test = load_set(filename)
print('Dataset: %d' % len(test))
# descriptions
test_descriptions = load_clean_descriptions('descriptions.txt', test)
print('Descriptions: test=%d' % len(test_descriptions))
# photo features
test_features = load_photo_features('features.pkl', test)
print('Photos: test=%d' % len(test_features))
# prepare sequences
X1test, X2test, ytest = create_sequences(tokenizer, max_length, test_descriptions, test_features)
# fit model
# define the model
model = define_model(vocab_size, max_length)
# define checkpoint callback
filepath = 'model-ep{epoch:03d}-loss{loss:.3f}-val_loss{val_loss:.3f}.h5'
checkpoint = ModelCheckpoint(filepath, monitor='val_loss', verbose=1, save_best_only=True, mode='min')
# fit model
model.fit([X1train, X2train], ytrain, epochs=20, verbose=2, callbacks=[checkpoint], validation_data=([X1test, X2test], ytest))
Running the example first prints a summary of the loaded training and development datasets.
Dataset: 6000
Descriptions: train=6000
Photos: train=6000
Vocabulary Size: 7579
Description Length: 34
Dataset: 1000
Descriptions: test=1000
Photos: test=1000
We then get an idea of the total number of training and validation (development) input-output pairs.
Train on 306404 samples, validate on 50903 samples
The model then runs, saving the best model to a .h5 file along the way.
During my run, the model with the best validation result was saved to file. That model was saved at the end of epoch 2, with a loss of 3.245 on the training dataset and a loss of 3.612 on the development dataset. Your specific results will vary. If you ran the example on AWS, copy the model file back into your current working directory.
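If you later want to reuse the saved checkpoint, it can be reloaded with Keras' load_model. A minimal sketch follows; the filename is only an assumption derived from my run's epoch-2 losses and the naming pattern above, so yours will differ:

# illustrative only: reload the best saved checkpoint (substitute your own filename)
from keras.models import load_model
model = load_model('model-ep002-loss3.245-val_loss3.612.h5')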
Original article from: 机器之心