Saif:
Action Items:
This week I finally completed the one-page summary of related papers.
I read the following paper and would like to discuss today:
S. Ouyang, J. M. Zhang, M. Harman, and M. Wang, “An Empirical Study of the Non-Determinism of ChatGPT in Code Generation,” TOSEM 2025
Goal:
Quantify and characterize the non-determinism of ChatGPT in code generation across multiple datasets and model versions (GPT-3.5 and GPT-4), with systematic measurement of semantic, syntactic, and structural similarity.
Detection baselines:
Gap:
They did not analyze the actual content or correctness of the outputs when OER = 0. Specifically:
They didn’t inspect the generated code to see what kinds of errors caused the output differences (e.g., logic bug, wrong formula, off-by-one).
They didn’t verify whether any of the differing outputs are actually correct, even if the others are not (a toy sketch of such a correctness check follows this list).
They didn’t explore whether the output differences are harmless or harmful (e.g., stylistic differences vs logic bugs).
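To make this gap concrete, here is a minimal sketch (my own, not from the paper) of how one could check whether outputs that differ across repeated generations are nonetheless correct. It assumes we already have each candidate's outputs on a shared set of test cases plus the expected outputs; all names and values below are illustrative.

```python
# Toy sketch: given the test outputs of several candidates generated from the
# same prompt, report (a) whether they behave identically (an OER-style check)
# and (b) which of them, if any, actually match the expected outputs.

def analyze_candidates(candidate_outputs, expected_outputs):
    # candidate_outputs: one list of test-case outputs per generated candidate
    # expected_outputs: reference outputs for the same test cases
    distinct_behaviors = {tuple(outs) for outs in candidate_outputs}
    correct_flags = [outs == expected_outputs for outs in candidate_outputs]
    return {
        "all_equivalent": len(distinct_behaviors) == 1,
        "num_distinct_behaviors": len(distinct_behaviors),
        "num_correct": sum(correct_flags),
        "correct_flags": correct_flags,
    }

# Example: three candidates, two test cases; outputs differ but one behavior is correct.
print(analyze_candidates(
    candidate_outputs=[["3", "7"], ["3", "8"], ["3", "7"]],
    expected_outputs=["3", "7"],
))
```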
Question:
What role does prompt ambiguity or under-specification play in non-determinism?
Note: I’ve had a cold (sneezing and runny nose) over the last few days. The allergy meds are making me a bit drowsy, but I’m still planning to go ahead with the presentation. I’ll let you know if anything changes.
Saif:
Hi group,
Action Items:
This week I read the following papers:
[1] H. Suh et al., “An Empirical Study on Automatically Detecting AI-Generated Source Code: How Far Are We?” arXiv, Nov. 2024, doi: 10.48550/arXiv.2311.00005.
[2] B. Demirok and M. Kutlu, “AIGCodeSet: A New Annotated Dataset for AI Generated Code Detection,” arXiv, Dec. 2024, doi: 10.48550/arXiv.2312.00020.
Questions:
Saif:
Hi group,
Action Items
This week, I mainly focused on the literature for the RQs.
RQ 1 – How do LLM-generated solutions compare to human-written code in terms of functional correctness and vulnerability profiles?
Functional correctness:
Vulnerability and CWE profile comparisons
RQ 2 – Role of prompt engineering & decoding controls in reducing functional and security errors
Several prompt engineering techniques have been discussed in the prior literature to mitigate errors:
Results:
I fell behind on running the test pass/fail experiments, so there are no results to share.
Questions:
NA
Saif:
Hello group,
Milestone: Vulnerability analysis in generated code - Complete by Aug 7, 25
Action Item: Formulate Research Questions
I have formulated a few research questions that I would like to discuss in our next meeting.
Action Item 1 – in progress
Evaluate Code Generation Behavior Across Temperatures.
I am investigating the effect of temperature on code generation. My focus is to find an optimal temperature.
According to the literature, a single best temperature rarely exists:
- For single-shot generation aiming at correctness, a low T (0–0.4) is usually optimal
- For multi-sample pass@k evaluation, a moderate T (0.4–0.8) with a larger k improves coverage (a sketch of the standard unbiased pass@k estimator follows below)
I will attach a detailed report on this by tonight.
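For reference, the sketch below is the standard unbiased pass@k estimator (as popularized by the HumanEval evaluation), not anything specific to my setup: n is the number of samples per problem, c the number that pass all tests, and k the budget.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:  # every size-k subset must contain at least one correct sample
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 10 samples, 3 correct -> pass@1 = 0.3, pass@5 is roughly 0.92
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```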
Action Item 2 – partially complete
Analyze Vulnerability Patterns and pass/fail statistics
For this action item, I have shared the CWE results for 90 generated solutions at different temperatures. Statistics for the pass/fail comparison are still due.
Action Item 3 – pending
Submit solutions to the original platforms
I am yet to start this action item.
Results/Findings:
N/A
Questions / Issues:
N/A
(Apologies for the delayed response; I was setting up the environment on the new laptop until late at night and missed my alarm in the morning.)
Saif:
Hi group,
Milestone: yet to be decided, depending on the deadlines of the selected venue
Action Item 1 – Saturday, July 5 (Done)
Change the temperature setting to increase randomness in the output
Action Item 2 – Tuesday, July 8 (Done)
Execute test cases on the 276 generated solutions to verify correctness
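As a rough sketch of how that verification can be automated (the directory layout, file names, and test format below are assumptions, not my actual setup), each generated C++ file can be compiled with g++ and its stdout diffed against the expected output per test case:

```python
import pathlib
import subprocess
import tempfile

def run_solution(cpp_path: str, tests: list[tuple[str, str]], time_limit: float = 5.0) -> bool:
    """Compile one generated C++ solution and check it against (stdin, expected stdout) pairs."""
    with tempfile.TemporaryDirectory() as tmp:
        binary = pathlib.Path(tmp) / "sol"
        build = subprocess.run(
            ["g++", "-O2", "-std=c++17", cpp_path, "-o", str(binary)],
            capture_output=True, text=True,
        )
        if build.returncode != 0:  # a compilation error counts as a failure
            return False
        for stdin_data, expected in tests:
            try:
                run = subprocess.run([str(binary)], input=stdin_data, capture_output=True,
                                     text=True, timeout=time_limit)
            except subprocess.TimeoutExpired:
                return False
            if run.stdout.strip() != expected.strip():
                return False
    return True

# Hypothetical usage: one file per generated solution, tests as (input, expected) pairs.
# print(run_solution("generated/problem_0001.cpp", [("1 2\n", "3\n")]))
```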
Plan for the next week:
Results/Findings:
Questions/Issues:
Saif:
Hi group,
Milestone: I have yet to decide on a concrete study and milestone date
Action Items completed:
Action Item 1 – Saturday, June 28 (done)
Action Item 2 – Thursday, July 3 (done for 10 problems with 3 generated solutions each)
Results/Findings:
Questions/issues:
Saif:
Hi group,
Action Item in progress:
We will construct our own dataset using problem instructions and corresponding code samples
This week I continued working on dataset creation using DeepMind’s code_contests dataset. I have successfully extracted 342 problems from the first chunk. From those, I identified 276 problems that include at least one correct C++ solution.
I configured and ran CodeLlama-7B.Q4_K_M.gguf locally via Ollama, and used it to generate C++ solutions for all 276 problems. The model was able to produce complete C++ code snippets based solely on the problem descriptions.
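For the record, here is a rough sketch of that pipeline, under a few assumptions: the Hugging Face copy of code_contests is loaded in streaming mode rather than by chunk, the C++ language code in the dataset's Language enum is taken to be 2 (worth verifying locally), and the model is addressed through Ollama's local REST API under the generic codellama:7b tag rather than the exact quantized file I used.

```python
import requests
from datasets import load_dataset

CPP = 2  # assumed code_contests language code for C++; verify against the dataset schema

# Stream the training split so the full dataset is not downloaded up front.
ds = load_dataset("deepmind/code_contests", split="train", streaming=True)

def generate_cpp(description: str, model: str = "codellama:7b") -> str:
    """Ask a locally running Ollama server for a C++ solution to one problem description."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": ("Write a complete C++ program that solves the following problem. "
                       "Read from stdin and write to stdout.\n\n" + description),
            "stream": False,
        },
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json()["response"]

kept = 0
for problem in ds:
    langs = problem["solutions"]["language"]
    if not any(lang == CPP for lang in langs):  # keep problems with at least one C++ solution
        continue
    generated = generate_cpp(problem["description"])
    kept += 1
    if kept >= 5:  # small smoke test; remove the cap to cover all kept problems
        break
```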
Currently, I am beginning to compare CWE (Common Weakness Enumeration) differences between the human-written C++ solutions and the ones generated by CodeLlama. This will help us assess whether LLM-generated code introduces or avoids common coding vulnerabilities when compared to human-authored examples.
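A minimal sketch of that comparison, assuming a CWE-aware static analyzer such as flawfinder is used (the `--csv` output and its "CWEs" column are assumptions about that tool, not a description of my actual setup), is to tally CWE IDs per corpus and compare the two distributions:

```python
import csv
import io
import subprocess
from collections import Counter

def cwe_counts(directory: str) -> Counter:
    """Run flawfinder over a directory of C/C++ files and tally the CWE IDs it reports."""
    out = subprocess.run(["flawfinder", "--csv", directory], capture_output=True, text=True)
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(out.stdout)):
        for cwe in (row.get("CWEs") or "").split(","):  # e.g. "CWE-120, CWE-20"
            cwe = cwe.strip()
            if cwe:
                counts[cwe] += 1
    return counts

# Hypothetical layout: human-written and CodeLlama-generated solutions in sibling folders.
human = cwe_counts("solutions/human")
generated = cwe_counts("solutions/codellama")
for cwe in sorted(set(human) | set(generated)):
    print(f"{cwe}: human={human[cwe]}, generated={generated[cwe]}")
```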
Results/Findings:
Question/Issue:
No blockers at the moment. Continuing with CWE analysis.
Replies:
Dr. Lei:
Sounds good. Try to think deep and see if you can make any interesting observations about the results.
No update was provided for the week ending 2025-04-30.
Saif:
Hi group,
Milestone: Identify a research topic - May 04
Last week, I generated a report on existing datasets that include vulnerable code generated by various LLMs, as reported in the existing literature.
This week I am focusing on the following action items:
Replies:
Dr. Lei:
try to send a written report about your findings before thu meeting. the report does not have to be well-written or very detailed. but it should contain the major points.
Saif:
Hi Group,
Milestone: Diagnostic Evaluation (awaiting response from Dr. Khalili)
Milestone: Complete a literature review on LLM explanation - May 4
This week I focused on the following action items:
- Empirical and Theoretical Distinguishing Characteristics: Stylistic/Structural Patterns: LLM code tends to be more formulaic and concise and uses narrower token/naming distributions; comments are often more consistent but sometimes mismatched with the code
- Coding Style Markers: Differences in variable/class naming, control-flow depth, and code/comment length and distribution; human code is more diverse (a toy sketch of such metrics follows after this list)
2. Publicly available LLM-generated code datasets with vulnerabilities:
- Scope of Security Bugs and Limitations: Most datasets focus on well-known bug/vulnerability types (CWE Top-25, input validation, memory safety, etc.), with variable coverage of edge cases, multi-step chain issues, or “in-the-wild” project code
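To make the style markers above concrete, here is a toy sketch (my own, not taken from either paper) of a few crude surface metrics one could compute per file and then compare between human-written and LLM-generated code: identifier diversity, comment density, line length, and indentation depth.

```python
import re

def style_metrics(source: str) -> dict:
    """Crude surface-level style markers for a C-like source file (illustrative only)."""
    lines = [l for l in source.splitlines() if l.strip()]
    identifiers = re.findall(r"\b[A-Za-z_][A-Za-z0-9_]*\b", source)  # keywords included; it's crude
    comment_lines = [l for l in lines if l.strip().startswith(("//", "/*", "*"))]
    return {
        "identifier_diversity": len(set(identifiers)) / max(len(identifiers), 1),
        "comment_line_ratio": len(comment_lines) / max(len(lines), 1),
        "avg_line_length": sum(len(l) for l in lines) / max(len(lines), 1),
        "max_indent_depth": max(((len(l) - len(l.lstrip())) // 4 for l in lines), default=0),
    }

# Hypothetical usage: compare paired human and LLM solutions to the same problem.
# print(style_metrics(open("human/p1.cpp").read()))
# print(style_metrics(open("llm/p1.cpp").read()))
```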
Replies:
Dr. Lei:
good job. a few comments:
Saif:
hi group,
Milestone: Diagnostic Evaluation (awaiting response from Dr. Khalili)
Milestone: Complete a literature review on LLM explanation - May 4
This week I focused on the following paper:
Additional note:
Replies:
Dr. Lei:
sounds good. please prepare a presentation on your findings. try to focus on the most important points. also try to think about three ideas that you could think of on the topic of security testing of LLM-generated code, and discuss them when we meet on thursday
Saif:
Ok Professor.
Saif:
Hello group,
PhD Milestone: Diagnostic Evaluation (April 7)
Milestone: Complete a literature review on LLM explanation - May 4
New Study: Vulnerabilities in LLM generated code
Last week I went through the following paper:
I am conducting a literature search on vulnerabilities in LLM-generated code. I have shared the incomplete list in my channel. I will provide the comprehensive list tomorrow (Wednesday, 4/2) and plan to discuss it on Thursday.
Replies:
Dr. Lei:
sounds good. please prepare some slides for thu meeting to discuss
Saif:
Good morning,
Milestone: Diagnostic Evaluation (April 7)
Milestone: Complete a literature review on LLM explanation - May 4
This week I focused on the challenges in defining locality for generating accurate explanations. I am trying to understand different versions of LIME. I have gone through the following papers:
This paper proposes an alternative sampling method to improve the local fidelity of surrogate models and evaluates it against LIME.
The paper investigates the challenges of generating sample points in an instance’s neighborhood, balancing interpretability with explanation accuracy, and determining the appropriate sample size. The findings emphasize issues with LIME’s kernel-based weighting and boundary approximation.
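To connect this to something runnable, here is a small sketch (my own, using the lime package's tabular explainer on a toy sklearn model, not the papers' setups) of the kind of experiment these issues motivate: explain the same instance with different neighborhood sample sizes and see how stable the top-weighted features are.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

data = load_breast_cancer()
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(data.data, data.target)

explainer = LimeTabularExplainer(
    data.data,
    feature_names=list(data.feature_names),
    class_names=list(data.target_names),
    mode="classification",
)

instance = data.data[0]
for num_samples in (100, 1000, 5000):  # size of the perturbed neighborhood LIME samples
    exp = explainer.explain_instance(instance, clf.predict_proba,
                                     num_features=5, num_samples=num_samples)
    top_features = [feature for feature, _ in exp.as_list()]
    print(num_samples, top_features)  # unstable rankings hint at locality/sample-size issues
```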
Saif:
Good morning,
Milestone: Diagnostic Evaluation
Current work: LLM explanation
Last week I tried to run part of the experiment from the paper I presented last time, to get a better grasp of LIME.
For the diagnostic evaluation, I have sent out emails to the tentative committee. Two of the professors have already replied. I am awaiting Dr. Ji’s response.
Replies:
Dr. Lei:
please try to make your status report more informative
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - 3rd Week of April)
Milestone: Complete a literature review on LLM explanation - May 4
This week I am planning to run the experiments from the empirical study I presented on Friday.
Replies:
Dr. Lei:
we need to discuss who you will invite to be on your committee.
Saif:
Yes, Dr. Lei. I have sent you a direct message on Slack describing my current academic progress and a tentative list of professors. Thank you.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: LLM Explanation (Milestones yet to be decided)
Action items completed:
Replies:
Dr. Lei:
please note that the purpose of the New Ideas session is to introduce interesting ideas/perspectives to the group. try to select a topic/paper that truly excites you; otherwise, it would not serve the purpose.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: LLM Explanation
Action items completed:
After the discussion in Friday’s meeting, I compiled a comprehensive list of papers that covers perturbation for LLMs. I have added the comprehensive list to my channel.
Action items for this week:
I have not yet organized the workflow and milestone dates. But this week, I will focus on setting up and running experiments.
Replies:
Dr. Lei:
good job on the collection of papers. at this stage, i suggest you give priority to the big picture before you go into the details of each paper. try to find a good survey paper on this topic first, if one exists.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: Investigating Sparse AutoEncoders for LLM Explanation
I have been studying approaches to LLM explanation. I came across this paper, which discusses the existing techniques and the challenges of these approaches: “Explainability for Large Language Models: A Survey” - by H Zhao · 2024
From the existing approaches, I would like to explore the following two:
Feature Interpretability via Concept Vectors (CAVs):
What it is: Use post-hoc linear classifiers or other mechanisms to define high-level concept vectors in the latent space that align with human-understandable constructs (e.g., sentiment, gender); a toy probe sketch follows after the applications list below.
Applications:
Explain model decisions in human terms by projecting latent representations onto concept vectors.
Diagnose the presence of biases or specific abstract features (e.g., fairness in language models).
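As a toy illustration of the linear-probe flavor of this idea (the activations and labels below are synthetic placeholders, not real LLM activations), one can fit a logistic regression on hidden states collected with and without the concept and treat its weight vector as the concept direction:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
d_model = 256  # placeholder hidden-state dimensionality

# Synthetic stand-ins for activations collected on concept vs. non-concept inputs.
acts_with_concept = rng.normal(0.5, 1.0, size=(200, d_model))
acts_without_concept = rng.normal(0.0, 1.0, size=(200, d_model))

X = np.vstack([acts_with_concept, acts_without_concept])
y = np.array([1] * 200 + [0] * 200)

probe = LogisticRegression(max_iter=1000).fit(X, y)
concept_vector = probe.coef_[0] / np.linalg.norm(probe.coef_[0])

# Projecting a new activation onto the concept direction gives a scalar "concept score".
new_activation = rng.normal(0.5, 1.0, size=d_model)
print(float(new_activation @ concept_vector))
```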
Sparse Feature Extraction:
What it is: Neuroscience-inspired methods like sparse autoencoders or dictionary learning extract sparse or disentangled features from LLM activation spaces.
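And here is a minimal PyTorch sketch of the sparse-autoencoder idea: an overcomplete encoder/decoder trained with an L1 penalty on the hidden code. The activation tensor is a random placeholder standing in for real LLM activations, and the dimensions are arbitrary.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.encoder(x))  # sparse, non-negative feature code
        return self.decoder(z), z

acts = torch.randn(4096, 512)                          # placeholder for collected LLM activations
sae = SparseAutoencoder(d_model=512, d_hidden=4096)    # overcomplete dictionary of features
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

for step in range(100):
    batch = acts[torch.randint(0, acts.shape[0], (256,))]
    recon, z = sae(batch)
    loss = ((recon - batch) ** 2).mean() + l1_coeff * z.abs().mean()  # reconstruction + sparsity
    opt.zero_grad()
    loss.backward()
    opt.step()
```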
Replies:
Dr. Lei:
as i suggested, focus on the big picture first before you dive into the details of a particular approach
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: Investigating Sparse AutoEncoders for LLM Explanation
Action items done in the previous week:
– Regarding the fundamentals of SAEs, I found the following:
Action items for next week:
– My primary plan for this week is to look for small open-source SAE models that can be trained at a smaller scale
– And continue grouping the papers
Replies:
Dr. Lei:
i suggest you think more about the problem, i.e., explaining LLM models. in addition to SAEs, what are other approaches to LLM explanations? what are the general technical problems, and what are the existing approaches to solving these problems?
Saif:
Ok Professor. I will look into this according to your suggestion.
Saif:
Hello group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Current work: Investigating Sparse AutoEncoders for LLM Explanation
Action items completed:
2. Challenges in Implementation:
Team meeting update:
I was working on compiling the papers. There was not much to discuss this week so we skipped the meeting.
Action items for this week:
I will continue summarizing the papers from the compiled list. I would like to sign up for a technical discussion on Friday about my findings on this topic.
Replies:
Dr. Lei:
looks good. keep up with the good work
Saif:
Good morning group,
Milestone: Diagnostic Evaluation (Tentative - First Week of April)
Milestone: Literature Review Draft - January 23, 2025
Collaboration with Fadul:
Replies:
Dr. Lei:
sounds good. please work with Fadul and Sunny on possible new topics. the current focus for you is to find a topic to dive into in the next two weeks.
No update was provided for the week ending 2024-12-18.
Saif:
Hi Group,
Updated Milestone:Literature Review Draft - January 23, 2025
Based on the feedback on Friday’s meeting, I have updated the timeline for the literature review.
Action Items from last week:
After Friday, I refined the study design section, added research questions, and identified that performance metric summaries needed to be included.
I have an exam to proctor today and after that the Professor would like to discuss the grading with me. I might be late to join the meeting.
Replies:
Dr. Lei:
this is probably your most meaningful status update, which i am really happy to see. keep it up
Saif:
Thank you Dr. Lei.
Saif:
Action items Completed:
We had a general discussion with Fadul regarding collaboration.
Saif:
Hi group,
I had my Compilers final exam on Friday and two project submissions due through yesterday. I couldn’t spend much time on research.
I will be presenting the new idea today.
Saif:
Milestone: Complete Literature Survey - Dec 5
Saif:
Hi group,
Milestone: Complete Literature Survey - Dec 5
Saif:
Hi group,
Milestone: Complete Literature Survey - Dec 5
Action items–
Last week, I summarized these four papers for the literature survey
This Friday, I will brief my current findings from the literature search.
Replies:
Dr. Lei:
if i remember correctly, i suggested you make a schedule towards the paper submission. can you share the schedule?
Saif:
Hi group,
Milestone: Complete Literature Survey - Dec 5
Action items
Last week, I summarized these three papers for the literature survey
I will present my current findings to the group on the following Friday (11/8).
Saif:
Milestone: Complete a Survey paper - Dec 5 (venue yet to decide)
Action items -
Last week I read these four papers from my literature search and noted the problems they address, their key insights and inspiration for the solution, the steps they take toward the solution, and the inputs and outputs produced by their approach
Replies:
Dr. Lei:
please break down this milestone and have some intermediate deliverables, so that you can gauge the intermediate progress.
Saif:
Milestone: Complete a Survey paper - Dec 5 (venue yet to decide)
– For research, I have updated the list for my literature search.
– I have documented papers that use LLM-based approaches for code smell detection / code summarization tasks
– I was mostly working on my course projects last week
– I have been grading Midterm papers and an assignment for TA work
Saif:
Hello everyone,
Broad Milestone: Complete Code smell detection project by Thanksgiving (Nov 26)
Complete grouping the papers - Sep 29 (partially completed)
Present the findings of the existing literature - Oct 11. For this one, I am preparing a presentation to show the findings of LLM approaches to code smell detection.
Action Items from last week:
Replies:
Dr. Lei:
For a milestone to be meaningful, it must have clear deliverables and an objective way to check whether it is completed. Speaking of a research project being completed, i would expect a research paper to be submitted.
Dr. Lei:
also for your project to make real progress, i want you to make a proposal to the group. in the proposal, please clearly specify what problem you are trying to address, in terms of input/output, what are the technical challenges, and what are your ideas to address the challenges, and how your idea would compare to existing work. also you must break down a project into smaller tasks and put a target date on each small task.
Saif:
Thank you for the feedback Professor.
I’ll revise the milestone to clearly define goals and to break it into smaller tasks.
Our plan was to aim for a survey paper. Should I target a conference with a December/January deadline?
Dr. Lei:
typically a survey paper is published in a journal. not many conferences accept survey papers. one target journal is ACM Computing Surveys. journal submissions can be made anytime, i.e., there are no specific deadlines. i suggest you target the end of this semester.
Saif:
Sounds good. I will adjust the milestone deadline accordingly.
No update was provided for the week ending 2024-10-02.
Saif:
Hello everyone,
Broad Milestone: Complete Code smell detection project by Thanksgiving (Nov 26)
Action Items from last week:
I am still working on the categorization of the papers for the literature review.
No update was provided for the week ending 2024-09-17.
No update was provided for the week ending 2024-09-13.