Stash pre-receive hook to check file encoding

tinyCabbage August 25, 2015

I have a basic plugin that gives me a list of all the files that were checked in. Is it possible to scan the content of the files to see whether they match the expected encoding?

 

The part I'm struggling with is getting the files from refChanges and reading them.


3 votes
Answer accepted
Mibex_Software
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
August 25, 2015

Hi,

You can get the changesets for a refChange by

commitService.getChangesetsBetween(new ChangesetsBetweenRequest.Builder(repository)
						.exclude(refChange.getFromHash())
						.include(refChange.getToHash())
						.build(), pageRequest);

and with that, you can fetch the DetailedChangesets

commitService.getDetailedChangesets(new DetailedChangesetsRequest.Builder(repository)
						.changesetIds(changeSetIds)
						.build(), pageRequest);

and with that you will receive the change objects which have a getPath method. With that, you can easily grab the file content and analyze it accordingly by using something like this:

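contentService.streamFile(repository, refChange.getToHash(), change
                .getPath().toString(), new TypeAwareOutputSupplier() {
            @Override
            public OutputStream getStream(String contentType) throws IOException {
                // contentType is the detected content type (e.g. "text/plain"), not the file content.
                // Return the stream the file content should be written to (here a placeholder
                // "out" declared elsewhere, e.g. a ByteArrayOutputStream), then analyze its encoding.
                return out;
            }
        });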
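
As a rough illustration of the "analyze encoding" step, here is a minimal sketch that only inspects a leading byte order mark (BOM) in the bytes you captured. The class and method names (BomSniffer, charsetFromBom) are made up for this example and are not part of the Stash API. A file without a BOM simply yields null, so this is a quick first check rather than a full detector.

```java
import java.nio.charset.StandardCharsets;

class BomSniffer {
    /** Returns the charset name implied by a leading byte order mark, or null if none is present. */
    static String charsetFromBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        // Check UTF-32LE before UTF-16LE: its BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
        if (startsWith(data, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(data, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(data, 0xFF, 0xFE)) return "UTF-16LE";
        if (startsWith(data, 0xFE, 0xFF)) return "UTF-16BE";
        return null;
    }

    /** True if data begins with the given unsigned byte values. */
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        byte[] plain = "hi".getBytes(StandardCharsets.US_ASCII);
        System.out.println(charsetFromBom(utf16le)); // prints "UTF-16LE"
        System.out.println(charsetFromBom(plain));   // prints "null"
    }
}
```

In a hook, the byte array would come from whatever stream you returned from getStream.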


Hope this helps.

Best regards,

Michael

tinyCabbage August 25, 2015

Perfect, just what I was looking for.

tinyCabbage August 26, 2015

Hi, I've got the DetailedChangeset, but I can't see a getPath method on it; e.g. values.getPath doesn't exist:

Page<DetailedChangeset> dcs = commitService.getDetailedChangesets(
        new DetailedChangesetsRequest.Builder(repository).changesetIds(changeset.getId()).build(),
        new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT));
for (DetailedChangeset detailChangeset : dcs.getValues()) {
    final Iterable<? extends Change> values = detailChangeset.getChanges().getValues();
}

Mibex_Software
Rising Star
August 26, 2015

It's the Change objects that have the getPath method.

tinyCabbage August 26, 2015

Here's the full code:

public boolean onReceive(@Nonnull RepositoryHookContext context, @Nonnull Collection<RefChange> refChanges, @Nonnull HookResponse hookResponse) {
    CommitService commitService = null;
    Repository repository = context.getRepository();
    final ChangesetsBetweenRequest request = new ChangesetsBetweenRequest.Builder(context.getRepository())
            .exclude(((RefChange)refChanges).getFromHash())
            .include(((RefChange)refChanges).getToHash())
            .build();
    final Page<Changeset> cs = commitService.getChangesetsBetween(request, new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT));
    for (Changeset changeset : cs.getValues()) {
        System.out.println("Peter " + changeset.getAttributes() + changeset.getId() + changeset.getMessage());
        writelogs("Peter " + changeset.getAttributes() + changeset.getId() + changeset.getMessage());
        Page<DetailedChangeset> dcs = commitService.getDetailedChangesets(
                new DetailedChangesetsRequest.Builder(repository).changesetIds(changeset.getId()).build(),
                new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT));
        for (DetailedChangeset detailChangeset : dcs.getValues()) {
        }
    }
    return true;
}

Mibex_Software
Rising Star
August 26, 2015

You can use detailChangeset.getChanges() to retrieve the Change objects, then call getPath() on them and with the paths, you can get the file contents of the files you want to detect the content type for.

tinyCabbage September 1, 2015

I got to the stage where I've got the Change object, yay! But the file content seems to be empty. The code below gives me an empty file, and the String fileContent says "text/plain". What am I doing wrong?

for (Change changes : Pete) {
    writelogs("path is = " + changes.getPath().toString());
    try {
        contentService.streamFile(repository, changes.getContentId(), changes.getPath().toString(), new TypeAwareOutputSupplier() {
            @Override
            public OutputStream getStream(String fileContent) throws IOException {
                return new FileOutputStream(new File("c:\\" + changes.getPath().toString()));
            }
        });
    } catch (Exception e) {
    }
}

Mibex_Software
Rising Star
September 1, 2015

What about if you use refChange.getUntilId instead of changes.getContentId() as your second argument to streamFile?

tinyCabbage September 2, 2015

No, that didn't help. I ended up using the streamChanges method instead.

0 votes
tinyCabbage September 2, 2015

For anyone else who wants to do a similar thing:

 

import com.atlassian.stash.hook.*;
import com.atlassian.stash.hook.repository.*;
import com.atlassian.stash.io.TypeAwareOutputSupplier;
import com.atlassian.stash.repository.*;
import java.util.Collection;
import com.atlassian.stash.commit.CommitService;
import com.atlassian.stash.content.Change;
import com.atlassian.stash.content.ChangeCallback;
import com.atlassian.stash.content.ChangeContext;
import com.atlassian.stash.content.ChangeSummary;
import com.atlassian.stash.content.ChangesRequest;
import com.atlassian.stash.content.Changeset;
import com.atlassian.stash.content.DetailedChangeset;
import com.atlassian.stash.content.DetailedChangesetsRequest;
import com.atlassian.stash.repository.RefChange;
import com.atlassian.stash.scm.PluginCommandBuilderFactory;
import com.atlassian.stash.scm.git.GitCommandBuilderFactory;
import com.atlassian.stash.util.Page;
import com.atlassian.stash.util.PageProvider;
import com.atlassian.stash.util.PageRequest;
import com.atlassian.stash.util.PageRequestImpl;
import com.atlassian.stash.util.PagedIterable;
import com.atlassian.stash.content.ChangesetsBetweenRequest;
import com.atlassian.stash.content.ContentService;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.annotation.Nonnull;
import com.glaforge.i18n.io.CharsetToolkit;
import static com.google.common.collect.Iterables.transform;
import com.google.common.base.Function;
import com.google.common.collect.ImmutableSet;
import com.google.common.collect.Iterables;

public class fileEncodingChecker implements PreReceiveRepositoryHook{
	
	/*
	 * Checks the file's encoding isn't little endian
	 * 
	 */
	private static final PageRequestImpl PAGE_REQUEST = new PageRequestImpl(0, 100);
	private final CommitService commitService;
	private final ContentService contentService;
	private final PluginCommandBuilderFactory commandFactory;
	private static final int MAX_CHANGES = 100;
	static String filePaths = "";
	static boolean passedTest = true;
	
	private static final Function<DetailedChangeset, Iterable<Change>> PLUCK_CHANGES = new Function<DetailedChangeset, Iterable<Change>>() {
		@Override
		public Iterable<Change> apply(DetailedChangeset input) {
			return (Iterable<Change>) input.getChanges().getValues();
		}
	};
	private static final Function<Changeset, String> PLUCK_ID = new Function<Changeset, String>() {
		@Override
		public String apply(Changeset input) {
			return input.getId();
		}
	};
	private static final Function<Change, String> PLUCK_CONTENT_ID = new Function<Change, String>() {
		@Override
		public String apply(Change input) {
			return input.getContentId();
		}
	};
	
    public fileEncodingChecker(CommitService commitService, GitCommandBuilderFactory commandFactory, ContentService contentService) {
		this.commitService = commitService;
		this.commandFactory = commandFactory;
		this.contentService = contentService;
    }
    
    
    @SuppressWarnings("null")
	@Override
    public boolean onReceive(@Nonnull RepositoryHookContext context, @Nonnull Collection<RefChange> refChanges, @Nonnull final HookResponse hookResponse) {
    	// final so the anonymous callbacks below can reference these on older Java versions
    	final Repository repository = context.getRepository();
    	// passedTest is static, so reset it; otherwise one rejected push would fail every later push
    	passedTest = true;
    	Iterable<Change> Pete = Iterables.concat(getChanges(refChanges, repository)); //Gets all the changes
    	
    	for (RefChange dave : refChanges){
    		Iterable<Changeset> chris = getChangesetsBetween(repository, dave);
    		
    		for (Changeset stu : chris){	
    			final ChangesRequest changeReq = new ChangesRequest.Builder(context.getRepository(),stu.getId()).build();
    			commitService.streamChanges(changeReq, new ChangeCallback(){
        			@Override
        			public boolean onChange(final Change change) throws IOException {
    	    			filePaths = change.getPath().toString();
    		    		contentService.streamFile(repository, changeReq.getUntilId(), filePaths, new TypeAwareOutputSupplier() {
    		    			@Override
	    		    		public OutputStream getStream(String arg0) throws IOException {
	    		    			return new FileOutputStream(new File("c:\\a" + filePaths));
	    		    		} 
    		    		});
					return false;
        			}
					@Override
					public void onEnd(ChangeSummary summary)  {				
						String encoding = guessCharset2(new File("c:\\a" + filePaths));
						if("UTF-16LE".equals(encoding)){ // equals() compares string contents and is safe when encoding is null
							hookResponse.out().println("==========ERROR=============== ");
							hookResponse.out().println(filePaths + " uses unsupported encoding " + encoding);
							hookResponse.out().println("==========ERROR=============== ");
							passedTest = false;
						}
						
						try { //clean the temp file we created to check the encoding
							Files.delete(new File("c:\\a" + filePaths).toPath());
						} catch (IOException e) {
							e.printStackTrace();
						}
					}
					@Override
					public void onStart(ChangeContext context)
							throws IOException {	
						// TODO Auto-generated method stub
					}
    			});
    		} 
    	}
    	return passedTest;
    }
	
	public String guessCharset2(File file)  {
		try {
			return CharsetToolkit.guessEncoding(file, 4096, StandardCharsets.UTF_8).name(); 
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}
		return null; 
	}
	
	private Iterable<Iterable<Change>> getChanges(Iterable<RefChange> refChanges, final Repository repository) {
		return Iterables.transform(refChanges, new Function<RefChange, Iterable<Change>>() {
			@Override
			public Iterable<Change> apply(RefChange refChange) {
				// TODO Ideally this is one diff-tree git call
				Iterable<String> csetss = transform(getChangesetsBetween(repository, refChange), PLUCK_ID);
				return Iterables.concat(Iterables.transform(getDetailedChangesets(repository, csetss), PLUCK_CHANGES));
			}
		});
	}
	
	private Iterable<DetailedChangeset> getDetailedChangesets(final Repository repository, Iterable<String> changesets) {
		final Collection<String> csets = ImmutableSet.copyOf(changesets);
		return new PagedIterable<DetailedChangeset>(new PageProvider<DetailedChangeset>() {
			@Override
			public Page<DetailedChangeset> get(PageRequest pageRequest) {
				return commitService.getDetailedChangesets(new DetailedChangesetsRequest.Builder(repository)
						.changesetIds(csets)
						.maxChangesPerCommit(MAX_CHANGES)
						.build(), pageRequest);
			}
		}, PAGE_REQUEST);
	}
	
	private Iterable<Changeset> getChangesetsBetween(final Repository repository, final RefChange refChange) {
		return new PagedIterable<Changeset>(new PageProvider<Changeset>() {
			@Override
			public Page<Changeset> get(PageRequest pageRequest) {
				return commitService.getChangesetsBetween(new ChangesetsBetweenRequest.Builder(repository)
						.exclude(refChange.getFromHash())
						.include(refChange.getToHash())
						.build(), pageRequest);
			}
		}, PAGE_REQUEST);
	}  
}
tinyCabbage September 2, 2015

Oh, a lot of this was borrowed from Christian Galastrerer's filehooks plugin, a great plugin and a good example of how to use the Stash API.

0 votes
JamieA
Rising Star
August 25, 2015

Also look at com.atlassian.stash.io.ContentDetectionUtils#detectEncoding - this will do most of the hard work for you.

Bear in mind it's a heuristic, not 100% guaranteed to get it right. 
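
To illustrate why any detector is only a heuristic, here is a small self-contained sketch using plain JDK classes (nothing Stash-specific; the class and method names are made up for this example). It shows that the same bytes can decode without error under more than one charset, so a detector often has to guess between several valid candidates.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingAmbiguity {
    /** True if the bytes decode under the given charset without malformed or unmappable input. */
    static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);         // 63 61 66 C3 A9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);  // 63 61 66 E9
        // The UTF-8 bytes are also valid ISO-8859-1, so both checks succeed.
        System.out.println(decodesCleanly(utf8, StandardCharsets.UTF_8));        // prints "true"
        System.out.println(decodesCleanly(utf8, StandardCharsets.ISO_8859_1));   // prints "true"
        // The reverse does not hold: a lone E9 byte is a truncated UTF-8 sequence.
        System.out.println(decodesCleanly(latin1, StandardCharsets.UTF_8));      // prints "false"
    }
}
```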

tinyCabbage September 2, 2015

Thanks, I tried this, but in the end I used guessEncoding instead, because you need to save the stream to a file (because of the way contentService.streamFile works) and then turn it into a stream again to use detectEncoding.

JamieA
Rising Star
September 2, 2015

Do you want the content type or the encoding? If it's the encoding, which is what I think you originally said, it works with streams.

tinyCabbage September 2, 2015

It's mainly this part here:

contentService.streamFile(repository, changeReq.getUntilId(), filePaths, new TypeAwareOutputSupplier() {
    @Override
    public OutputStream getStream(String arg0) throws IOException {
        return new FileOutputStream(new File("c:\\a" + filePaths));
    }
});

I didn't know how to pass that stream directly to com.atlassian.stash.io.ContentDetectionUtils#detectEncoding without first saving the file and then turning it into a stream again.
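
One possible way around the temp file, sketched under assumptions: return a ByteArrayOutputStream from getStream, then wrap the captured bytes in a ByteArrayInputStream for any stream-based detector. So that the sketch runs without Stash, OutputSupplier and streamFile below are hypothetical stand-ins for TypeAwareOutputSupplier and contentService.streamFile, not the real API. It also assumes the files are small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class InMemoryCapture {
    /** Hypothetical stand-in for TypeAwareOutputSupplier so the sketch runs without Stash. */
    interface OutputSupplier {
        OutputStream getStream(String contentType) throws IOException;
    }

    /** Hypothetical stand-in for contentService.streamFile: writes the file bytes to the supplied stream. */
    static void streamFile(byte[] fileBytes, OutputSupplier supplier) throws IOException {
        try (OutputStream out = supplier.getStream("text/plain")) {
            out.write(fileBytes);
        }
    }

    /** Captures the streamed bytes in memory instead of writing a temp file. */
    static byte[] capture(byte[] fileBytes) {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            streamFile(fileBytes, new OutputSupplier() {
                @Override
                public OutputStream getStream(String contentType) {
                    return buffer; // the content is written here, not to disk
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf16le = {(byte) 0xFF, (byte) 0xFE, 'h', 0, 'i', 0};
        // Wrap the captured bytes so they can be handed to a stream-based detector.
        InputStream in = new ByteArrayInputStream(capture(utf16le));
        System.out.println(in.read() == 0xFF && in.read() == 0xFE); // prints "true"
    }
}
```

In the real hook, the anonymous supplier would be the TypeAwareOutputSupplier passed to contentService.streamFile, and the resulting InputStream would go to whichever detector you choose.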
