Stash pre-receive hook to check file encoding

I have a basic plugin that gives me a list of all the files that were pushed. Is it possible to scan the content of those files to check whether they use the expected encoding?

The part I'm struggling with is getting the files from the refChanges and reading them.

Answer accepted

Hi,

You can get the changesets for a refChange with:

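PageRequest pageRequest = new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT);
Page<Changeset> changesets = commitService.getChangesetsBetween(
        new ChangesetsBetweenRequest.Builder(repository)
                .exclude(refChange.getFromHash())
                .include(refChange.getToHash())
                .build(),
        pageRequest);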

With those, you can fetch the DetailedChangesets:

Page<DetailedChangeset> detailedChangesets = commitService.getDetailedChangesets(
        new DetailedChangesetsRequest.Builder(repository)
                .changesetIds(changesetIds) // the IDs of the changesets returned above
                .build(),
        pageRequest);

Each DetailedChangeset gives you its Change objects (detailedChangeset.getChanges().getValues()), and each Change has a getPath() method. With the path, you can stream the file content and analyze it, for example:

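final ByteArrayOutputStream content = new ByteArrayOutputStream();
contentService.streamFile(repository, refChange.getToHash(), change.getPath().toString(),
        new TypeAwareOutputSupplier() {
            @Override
            public OutputStream getStream(String contentType) throws IOException {
                // contentType is the MIME type (e.g. "text/plain"); the file's bytes are written to the returned stream
                return content;
            }
        });
// analyze the encoding of content.toByteArray()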
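For the "analyze encoding" step, here is a minimal, self-contained sketch that only looks for a byte order mark (BOM). It is plain Java with no Stash API; the names `BomSniffer` and `charsetFromBom` are made up for illustration, and it assumes you already have the file's bytes in a byte array. UTF-16LE files written by Windows tools often start with the bytes FF FE, but a BOM is optional, so a missing one proves nothing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class BomSniffer {

    /**
     * Returns the charset indicated by a byte order mark at the start of the
     * content, or null when no BOM is present. A missing BOM proves nothing:
     * UTF-8 files usually have none, and UTF-16 files may also omit it.
     */
    public static Charset charsetFromBom(byte[] content) {
        if (startsWith(content, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(content, 0xFF, 0xFE)) {
            // FF FE 00 00 would be a UTF-32LE BOM; that case is not distinguished here
            return StandardCharsets.UTF_16LE;
        }
        if (startsWith(content, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        return null;
    }

    private static boolean startsWith(byte[] content, int... prefix) {
        if (content.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            // Mask to compare the unsigned byte value with the int literal
            if ((content[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] utf16le = "hi".getBytes(StandardCharsets.UTF_16LE); // encoder writes no BOM
        byte[] withBom = new byte[utf16le.length + 2];
        withBom[0] = (byte) 0xFF;
        withBom[1] = (byte) 0xFE;
        System.arraycopy(utf16le, 0, withBom, 2, utf16le.length);
        System.out.println(charsetFromBom(withBom)); // UTF-16LE
        System.out.println(charsetFromBom(utf16le)); // null
    }
}
```

In a hook you could then reject a file when `StandardCharsets.UTF_16LE.equals(BomSniffer.charsetFromBom(bytes))`.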


Hope this helps.

Best regards,

Michael

Perfect, just what I was looking for.

Hi, I've got the DetailedChangeset, but I can't see a getPath method on it, e.g. values.getPath() doesn't exist:

Page<DetailedChangeset> dcs = commitService.getDetailedChangesets(
        new DetailedChangesetsRequest.Builder(repository).changesetIds(changeset.getId()).build(),
        new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT));
for (DetailedChangeset detailChangeset : dcs.getValues()) {
    final Iterable<? extends Change> values = detailChangeset.getChanges().getValues();
}

It's the Change objects that have the getPath method.

Here's the full code:

public boolean onReceive(@Nonnull RepositoryHookContext context, @Nonnull Collection<RefChange> refChanges, @Nonnull HookResponse hookResponse) {
    // commitService must be the injected field; a local variable set to null throws a NullPointerException
    Repository repository = context.getRepository();
    // refChanges is a collection, so iterate over it instead of casting it to RefChange
    for (RefChange refChange : refChanges) {
        final ChangesetsBetweenRequest request = new ChangesetsBetweenRequest.Builder(repository)
                .exclude(refChange.getFromHash())
                .include(refChange.getToHash())
                .build();
        final Page<Changeset> cs = commitService.getChangesetsBetween(request, new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT));
        for (Changeset changeset : cs.getValues()) {
            System.out.println("Peter " + changeset.getAttributes() + changeset.getId() + changeset.getMessage());
            writelogs("Peter " + changeset.getAttributes() + changeset.getId() + changeset.getMessage());
            Page<DetailedChangeset> dcs = commitService.getDetailedChangesets(
                    new DetailedChangesetsRequest.Builder(repository).changesetIds(changeset.getId()).build(),
                    new PageRequestImpl(0, PageRequest.MAX_PAGE_LIMIT));
            for (DetailedChangeset detailChangeset : dcs.getValues()) {
            }
        }
    }
    return true;
}

You can use detailChangeset.getChanges().getValues() to retrieve the Change objects, then call getPath() on each one. With those paths, you can fetch the contents of the files you want to check.

I got to the stage where I've got the Change objects, yay! But the file content seems to be empty. The code below gives me an empty file, and the String fileContent says "text/plain". What am I doing wrong?

for (Change changes : Pete) {
    writelogs("path is = " + changes.getPath().toString());
    try {
        contentService.streamFile(repository, changes.getContentId(), changes.getPath().toString(), new TypeAwareOutputSupplier() {
            @Override
            public OutputStream getStream(String fileContent) throws IOException {
                return new FileOutputStream(new File("c:\\" + changes.getPath().toString()));
            }
        });
    } catch (Exception e) {
    }
}

What about if you use refChange.getUntilId instead of changes.getContentId() as your second argument to streamFile?

No, that didn't help. I ended up using the streamChanges method instead.

For anyone else who wants to do something similar:

import com.atlassian.stash.hook.*;
import com.atlassian.stash.hook.repository.*;
import com.atlassian.stash.io.TypeAwareOutputSupplier;
import com.atlassian.stash.repository.*;
import com.atlassian.stash.commit.CommitService;
import com.atlassian.stash.content.Change;
import com.atlassian.stash.content.ChangeCallback;
import com.atlassian.stash.content.ChangeContext;
import com.atlassian.stash.content.ChangeSummary;
import com.atlassian.stash.content.ChangeType;
import com.atlassian.stash.content.ChangesRequest;
import com.atlassian.stash.content.Changeset;
import com.atlassian.stash.content.ChangesetsBetweenRequest;
import com.atlassian.stash.content.ContentService;
import com.atlassian.stash.util.Page;
import com.atlassian.stash.util.PageProvider;
import com.atlassian.stash.util.PageRequest;
import com.atlassian.stash.util.PageRequestImpl;
import com.atlassian.stash.util.PagedIterable;
import com.glaforge.i18n.io.CharsetToolkit;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Collection;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.annotation.Nonnull;

public class fileEncodingChecker implements PreReceiveRepositoryHook {

	/*
	 * Rejects a push if any added or modified file appears to be UTF-16LE (little endian).
	 * CharsetToolkit.guessEncoding is a heuristic, so it can miss files or flag false positives.
	 */
	private static final PageRequestImpl PAGE_REQUEST = new PageRequestImpl(0, 100);
	private static final String REJECTED_ENCODING = "UTF-16LE";

	private final CommitService commitService;
	private final ContentService contentService;

	public fileEncodingChecker(CommitService commitService, ContentService contentService) {
		this.commitService = commitService;
		this.contentService = contentService;
	}

	@Override
	public boolean onReceive(@Nonnull RepositoryHookContext context, @Nonnull Collection<RefChange> refChanges, @Nonnull final HookResponse hookResponse) {
		final Repository repository = context.getRepository();
		// Local rather than static: a static flag would stay false and reject every later push
		final AtomicBoolean passedTest = new AtomicBoolean(true);

		for (RefChange refChange : refChanges) {
			for (Changeset changeset : getChangesetsBetween(repository, refChange)) {
				final ChangesRequest changeReq = new ChangesRequest.Builder(repository, changeset.getId()).build();
				commitService.streamChanges(changeReq, new ChangeCallback() {
					@Override
					public void onStart(ChangeContext changeContext) {
					}

					@Override
					public boolean onChange(Change change) throws IOException {
						if (change.getType() == ChangeType.DELETE) {
							return true; // a deleted file has no content at this commit; skip it
						}
						String filePath = change.getPath().toString();
						String encoding = guessCharset(repository, changeReq.getUntilId(), filePath);
						if (REJECTED_ENCODING.equals(encoding)) { // compare strings with equals(), not ==
							hookResponse.out().println("==========ERROR===============");
							hookResponse.out().println(filePath + " uses unsupported encoding " + encoding);
							hookResponse.out().println("==========ERROR===============");
							passedTest.set(false);
						}
						return true; // keep going; returning false would stop after the first change
					}

					@Override
					public void onEnd(ChangeSummary summary) {
					}
				});
			}
		}
		return passedTest.get();
	}

	// Streams one file into a temporary file, guesses its encoding, then deletes the temp file
	private String guessCharset(Repository repository, String commitId, String path) throws IOException {
		File tempFile = File.createTempFile("encoding-check", ".tmp");
		try {
			try (final OutputStream out = new FileOutputStream(tempFile)) {
				contentService.streamFile(repository, commitId, path, new TypeAwareOutputSupplier() {
					@Override
					public OutputStream getStream(String contentType) {
						// contentType is the MIME type (e.g. "text/plain"), not the file content;
						// the file's bytes are written to the stream returned here
						return out;
					}
				});
			}
			return CharsetToolkit.guessEncoding(tempFile, 4096, StandardCharsets.UTF_8).name();
		} finally {
			Files.deleteIfExists(tempFile.toPath());
		}
	}

	private Iterable<Changeset> getChangesetsBetween(final Repository repository, final RefChange refChange) {
		return new PagedIterable<Changeset>(new PageProvider<Changeset>() {
			@Override
			public Page<Changeset> get(PageRequest pageRequest) {
				return commitService.getChangesetsBetween(new ChangesetsBetweenRequest.Builder(repository)
						.exclude(refChange.getFromHash())
						.include(refChange.getToHash())
						.build(), pageRequest);
			}
		}, PAGE_REQUEST);
	}
}

Oh, a lot of this was borrowed from Christian Galsterer's File Hooks plugin, a great plugin and a good example of how to use the Stash API.

JamieA Rising Star Aug 25, 2015

Also look at com.atlassian.stash.io.ContentDetectionUtils#detectEncoding - this will do most of the hard work for you.

Bear in mind it's a heuristic, not 100% guaranteed to get it right. 
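Because detection is a guess, it can help to flip the question when you already know the expected encoding: instead of guessing, check whether the bytes are valid in that encoding. A minimal sketch using only java.nio (the names `EncodingValidator` and `isValid` are illustrative, not part of Stash):

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class EncodingValidator {

    /**
     * Returns true if the bytes form a valid sequence in the given charset.
     * The decoder is told to REPORT (throw) on malformed or unmappable input
     * instead of silently substituting replacement characters.
     */
    public static boolean isValid(byte[] content, Charset expected) {
        CharsetDecoder decoder = expected.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(content));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "h\u00e9llo"; // escaped so the source file's own encoding doesn't matter
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16le = text.getBytes(StandardCharsets.UTF_16LE);
        System.out.println(isValid(utf8, StandardCharsets.UTF_8));    // true
        System.out.println(isValid(utf16le, StandardCharsets.UTF_8)); // false
    }
}
```

One caveat: an ASCII-only file saved as UTF-16LE is still valid UTF-8, because the zero bytes between characters decode as NUL characters. Pairing this check with a BOM check, or with a rule that rejects NUL bytes in text files, closes that gap.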

Thanks, I tried this, but in the end I used guessEncoding instead. Because of the way contentService.streamFile works, I had to save the stream to a file and then turn it back into a stream to use detectEncoding.

JamieA Rising Star Sep 02, 2015

Do you want the content type or the encoding? If it's the encoding, which is what I think you originally said, it works with streams.

It's mainly this part here:

contentService.streamFile(repository, changeReq.getUntilId(), filePaths, new TypeAwareOutputSupplier() {
    @Override
    public OutputStream getStream(String arg0) throws IOException {
        return new FileOutputStream(new File("c:\\a" + filePaths));
    }
});

I didn't know how to pass that stream directly to com.atlassian.stash.io.ContentDetectionUtils#detectEncoding without first saving the file and then turning it into a stream again.
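One way to skip the temporary file is to return a ByteArrayOutputStream from getStream and, once streamFile has finished writing, wrap its bytes in a ByteArrayInputStream for any stream-based detector. The sketch below runs without Stash, so it uses hypothetical stand-ins: `OutputSupplier` mimics the shape of TypeAwareOutputSupplier, and `fakeStreamFile` plays the role of contentService.streamFile; neither is a real Stash type. It assumes the files are small enough to hold in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public final class InMemoryCapture {

    /** Hypothetical stand-in with the same shape as Stash's TypeAwareOutputSupplier. */
    interface OutputSupplier {
        OutputStream getStream(String contentType) throws IOException;
    }

    /** Hypothetical stand-in for contentService.streamFile: asks for a stream, then writes the bytes. */
    static void fakeStreamFile(byte[] fileBytes, OutputSupplier supplier) throws IOException {
        try (OutputStream out = supplier.getStream("text/plain")) {
            out.write(fileBytes);
        }
    }

    /** Captures whatever the streamer writes into memory and returns it as an InputStream. */
    public static InputStream capture(byte[] fileBytes) throws IOException {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        fakeStreamFile(fileBytes, new OutputSupplier() {
            @Override
            public OutputStream getStream(String contentType) {
                // contentType is only the MIME type; the content arrives on the stream returned here
                return buffer;
            }
        });
        // Closing a ByteArrayOutputStream has no effect, so the buffer is still readable here
        return new ByteArrayInputStream(buffer.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_16LE);
        InputStream in = capture(original);
        byte[] roundTrip = new byte[original.length];
        System.out.println(in.read(roundTrip)); // 10
    }
}
```

In the real hook, the anonymous TypeAwareOutputSupplier would return the buffer the same way. For very large files you may want to cap how many bytes you keep, since an encoding guess usually only needs the first few kilobytes (the code above passes 4096 to guessEncoding).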
