Experiments

We conducted experiments on four datasets: three correspond to downstream generative tasks and one to a classification task. Generative tasks are typically more challenging than classification tasks because they are evaluated by next-token prediction accuracy, which requires the synthetic data to preserve fine-grained textual information from the private data. In contrast, classification tasks only require maintaining the co-occurrence patterns between labels and words in the private data.

The three generative tasks were chosen to cover a diverse set of practical scenarios: PubMed (medical paper abstracts), Chatbot Arena (human-to-machine interactions), and Multi-Session Chat (human-to-human daily dialogues). To evaluate the quality of the generated synthetic data, we followed the setup of Aug-PE: train a small downstream language model on the synthetic data, then compute its next-token prediction accuracy on the real test data.
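To make the metric concrete, here is a minimal sketch of next-token prediction accuracy. It is not the Aug-PE setup: it assumes a toy bigram model with whitespace tokenization in place of a small neural language model, and the function names and example sentences are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigram(texts):
    """Count, for each token, how often each next token follows it."""
    counts = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()  # assumption: whitespace tokenization
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_accuracy(model, texts):
    """Fraction of positions where the top-1 prediction equals the true next token."""
    correct = total = 0
    for text in texts:
        tokens = text.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            total += 1
            # A context never seen in training counts as a miss.
            if prev in model and model[prev].most_common(1)[0][0] == nxt:
                correct += 1
    return correct / total if total else 0.0

# Hypothetical data: train on synthetic text, evaluate on real test text.
synthetic = [
    "the patient was treated with aspirin",
    "the patient was treated with care",
]
real_test = ["the patient was treated with ibuprofen"]

model = train_bigram(synthetic)
acc = next_token_accuracy(model, real_test)
print(f"next-token accuracy: {acc:.2f}")
```

The score is high only when the synthetic text reproduces the word-level regularities of the real text, which is why this metric is demanding.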

The classification task is performed on the OpenReview (academic paper reviews) dataset. To evaluate the quality of the generated synthetic data, we train a downstream classifier on the synthetic data and compute its classification accuracy on the real test data.
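The train-on-synthetic, test-on-real protocol can be sketched as follows. This is an illustration under stated assumptions, not the classifier used in the experiments: it uses a small multinomial naive Bayes model, and the review snippets and the "accept"/"reject" labels are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(examples):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest Laplace-smoothed log-probability."""
    word_counts, label_counts, vocab = model
    n_examples = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n_examples)
        for word in text.lower().split():
            if word in vocab:  # words unseen in training are skipped
                score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    hits = sum(1 for text, label in examples if predict(model, text) == label)
    return hits / len(examples) if examples else 0.0

# Hypothetical data: synthetic reviews for training, real reviews for testing.
synthetic = [
    ("clear novel strong results", "accept"),
    ("strong clear writing", "accept"),
    ("weak unclear experiments", "reject"),
    ("weak limited novelty", "reject"),
]
real_test = [
    ("strong results and clear writing", "accept"),
    ("unclear and weak experiments", "reject"),
]

clf = train_nb(synthetic)
clf_acc = accuracy(clf, real_test)
print(f"classification accuracy: {clf_acc:.2f}")
```

A classifier of this kind relies only on which words co-occur with which labels, so synthetic data can support it without reproducing the private text closely.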

To mitigate concerns about data contamination, we carefully analyzed our selected datasets. Our analysis showed no overlap between our pre-training data and the downstream datasets.
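The text does not describe how the overlap analysis was carried out. One common way to check for contamination is shared word n-grams. The sketch below assumes that approach; the helper names, the n-gram length, and the example documents are hypothetical.

```python
def ngrams(text, n):
    """Return the set of word n-grams in a text (whitespace tokenization)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def overlap_fraction(candidate_docs, reference_docs, n=8):
    """Fraction of candidate documents sharing at least one n-gram with the reference corpus."""
    reference = set()
    for doc in reference_docs:
        reference |= ngrams(doc, n)
    flagged = sum(1 for doc in candidate_docs if ngrams(doc, n) & reference)
    return flagged / len(candidate_docs) if candidate_docs else 0.0

# Hypothetical toy corpora; n=3 is used only because the examples are short.
pretraining = ["the quick brown fox jumps over the lazy dog"]
downstream = [
    "a quick brown fox appeared",
    "completely unrelated sentence about medicine here",
]

frac = overlap_fraction(downstream, pretraining, n=3)
print(f"documents with overlap: {frac:.0%}")
```

A result of zero under a check like this would be consistent with the no-overlap finding reported above, although exact n-gram matching does not detect paraphrased content.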


