I followed the TensorFlow tutorial and tried to recreate the code myself using multi-label input features, and ran into the error below. My reconstruction of the sample code follows.
DataFrame creation:
import pandas as pd
import tensorflow as tf

sample_df = pd.DataFrame({
    "feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']],
    "feature_2": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']],
})
Output:
      feature_1     feature_2
0  [aa, bb, cc]  [aa, bb, cc]
1  [cc, dd, ee]  [cc, dd, ee]
2  [cc, aa, ee]  [cc, aa, ee]
Input layers:
inputs = {}
inputs['feature_1'] = tf.keras.Input(shape=(), name='feature_1', dtype=tf.string)
inputs['feature_2'] = tf.keras.Input(shape=(), name='feature_2', dtype=tf.string)
Output:
{'feature_1': <KerasTensor: shape=(None,) dtype=string (created by layer 'feature_1')>,
'feature_2': <KerasTensor: shape=(None,) dtype=string (created by layer 'feature_2')>}
Preprocessing layers:
preprocessed = []
for name, column in sample_df.items():
    vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
    lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')
    print(f'name: {name}')
    print(f'vocab: {vocab}\n')

    x = inputs[name][:, tf.newaxis]
    x = lookup(x)
    preprocessed.append(x)
Output:
name: feature_1
vocab: ['aa', 'bb', 'cc', 'dd', 'ee']
name: feature_2
vocab: ['aa', 'bb', 'cc', 'dd', 'ee']
[<KerasTensor: shape=(None, 6) dtype=float32 (created by layer 'string_lookup_27')>,
<KerasTensor: shape=(None, 6) dtype=float32 (created by layer 'string_lookup_28')>]
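(As a side note, the multi-hot width of 6 rather than 5 comes from `StringLookup` reserving one slot for out-of-vocabulary tokens by default, i.e. `num_oov_indices=1`, at index 0. A quick check:)

```python
import tensorflow as tf

vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')

# Output width is len(vocab) + num_oov_indices (1 by default);
# 'aa' lands at index 1 and 'cc' at index 3, index 0 is the OOV slot.
out = lookup(tf.constant([['aa', 'cc']]))
print(out.shape)  # (1, 6)
```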
Model creation:
preprocessed_result = tf.concat(preprocessed, axis=-1)
preprocessor = tf.keras.Model(inputs, preprocessed_result)
tf.keras.utils.plot_model(preprocessor, rankdir="LR", show_shapes=True)
Output:
<KerasTensor: shape=(None, 12) dtype=float32 (created by layer 'tf.concat_4')>
Error:
preprocessor(dict(sample_df.iloc[:1]))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
.../sample.ipynb Cell 63' in <cell line: 1>()
----> 1 preprocessor(dict(sample_df.iloc[:1]))
File ~/.local/lib/python3.10/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
65 except Exception as e: # pylint: disable=broad-except
66 filtered_tb = _process_traceback_frames(e.__traceback__)
---> 67 raise e.with_traceback(filtered_tb) from None
68 finally:
69 del filtered_tb
File ~/.local/lib/python3.10/site-packages/tensorflow/python/framework/constant_op.py:102, in convert_to_eager_tensor(value, ctx, dtype)
100 dtype = dtypes.as_dtype(dtype).as_datatype_enum
101 ctx.ensure_initialized()
--> 102 return ops.EagerTensor(value, ctx.device_name, dtype)
ValueError: Exception encountered when calling layer "tf.__operators__.getitem_20" (type SlicingOpLambda).
Failed to convert a NumPy array to a Tensor (Unsupported object type list).
Call arguments received:
• tensor=0 [aa, bb, cc]
Name: feature_2, dtype: object
• slice_spec=({'start': 'None', 'stop': 'None', 'step': 'None'}, 'None')
• var=None
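For context on what is failing here: `dict(sample_df.iloc[:1])` produces pandas Series of dtype `object` whose elements are Python lists, and that is what `tf.constant` cannot convert ("Unsupported object type list"). The shape of the offending data can be inspected without TensorFlow at all:

```python
import numpy as np
import pandas as pd

sample_df = pd.DataFrame({
    "feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']],
})

batch = dict(sample_df.iloc[:1])
arr = batch["feature_1"].to_numpy()

# An object-dtype array whose single element is a Python list --
# not a (1, 3) string array as the model would need.
print(arr.dtype, type(arr[0]))  # object <class 'list'>
```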
https://www.tensorflow.org/tutorials/load_data/pandas_dataframe#create_and_train_a_model
Any help with the error, or pointers toward understanding it better, would be greatly appreciated. Thanks in advance.
Answer 1
I've put together a workaround for anyone interested in or facing a similar problem. Note that this is only a workaround, not a proper solution.
Workaround: since my multi-hot encodings are binary by nature, I simply broke each label out into its own feature column.
Sample code:
sample_df = pd.DataFrame({"feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']]})

# Collect the set of labels across all rows
feature_1_labels = set()
for i in range(sample_df.shape[0]):
    feature_1_labels.update(sample_df.iloc[i]['feature_1'])

# Add one binary indicator column per label
for label in sorted(feature_1_labels):
    sample_df[label] = 0
for i in range(sample_df.shape[0]):
    for label in sample_df.iloc[i]['feature_1']:
        sample_df.iloc[i, sample_df.columns.get_loc(label)] = 1

sample_df
Output:
      feature_1  aa  bb  cc  dd  ee
0  [aa, bb, cc]   1   1   1   0   0
1  [cc, dd, ee]   0   0   1   1   1
2  [cc, aa, ee]   1   0   1   0   1
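The same expansion can also be done more compactly with pandas alone (a sketch using `Series.str.join` plus `str.get_dummies`, which should produce the same indicator table as the loop above):

```python
import pandas as pd

sample_df = pd.DataFrame({"feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']]})

# Join each list into 'aa|bb|cc', then split on '|' into 0/1 indicator columns
dummies = sample_df["feature_1"].str.join('|').str.get_dummies()
expanded = sample_df.join(dummies)
```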
Note: doing this significantly increases the number of input features, which is something to keep in mind.
Feel free to let me know about better workarounds, or if I've made any mistakes :)
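For what it's worth, one way to keep the original `StringLookup(output_mode='multi_hot')` approach is to feed the list-valued column as a ragged string tensor rather than a plain `dict` of the DataFrame. A sketch (assuming TF 2.x; not verified against the exact tutorial pipeline):

```python
import pandas as pd
import tensorflow as tf

sample_df = pd.DataFrame({"feature_1": [['aa', 'bb', 'cc'], ['cc', 'dd', 'ee'], ['cc', 'aa', 'ee']]})

vocab = ['aa', 'bb', 'cc', 'dd', 'ee']
lookup = tf.keras.layers.StringLookup(vocabulary=vocab, output_mode='multi_hot')

# Convert the column of Python lists into a ragged string tensor,
# which StringLookup can consume directly.
ragged = tf.ragged.constant(sample_df["feature_1"].tolist())
multi_hot = lookup(ragged)
print(multi_hot.shape)  # (3, 6)
```

If this were wired into the functional model, the matching input would presumably be `tf.keras.Input(shape=(None,), dtype=tf.string, ragged=True)` instead of a scalar string input, but that part is an assumption worth checking against your TF version.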